Read UTF-8 characters from text files

shiguera
Hello:

I'm trying to read some text files that contain Spanish characters such as
ñ, ó and others.

I don't know how to do it. I have tried fscanf(fid, '%s'), but I get a
warning message:  'warning: range error for conversion to character value'

Can you help me?

Thanks, regards


Santiago Higuera



_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave

Re: Read UTF-8 characters from text files

shiguera
Hello:

I have prepared a sample file with UTF-8 characters to illustrate my
problem with fscanf(fid,'%s'). The file can be downloaded from [1].

If I use fread(), it reads the correct byte values for the extended
characters. If I use fscanf(), I get the message 'warning: range
error for conversion to character value', and the extended characters
are missing from the resulting string. I can read the line correctly
with fgets() or fgetl(), but I need single words to process the
file, and the same problem occurs with sscanf() when I try to read the
words from a string variable.

I checked the problem in Matlab and verified that Matlab reads the
characters correctly from strings with UTF-8 extended codes. I have
spent a lot of time googling for a solution, but I have not found
anything that solves the problem.

Can you help me? Is there any workaround for this problem? I'm
teaching Octave to my students, and we need to process a lot of
statistics files in Spanish, which contain many characters of this
kind. My university offers free Matlab licences to students and
professors, but I always recommend Octave to my students. I wouldn't
like to have to fall back on Matlab because of this problem.

Thanks in advance.

Regards

Santiago Higuera

[1] http://mercatorlab.com/p1.txt
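
A possible workaround, sketched from the observation above that fgetl() returns the line intact: read the file line by line and split each line into words with strsplit() instead of using fscanf(fid, '%s'). This is only a sketch, not a fix for the fscanf() behaviour. The temporary file and the sample words ("niño camión") are assumptions made for illustration and stand in for the real data file.

```octave
% Assumption: a small UTF-8 test file stands in for the real data.
% The bytes below encode the line "niño camión" followed by a newline
% (195 177 is "ñ" and 195 179 is "ó" in UTF-8).
fname = tempname ();
fid = fopen (fname, 'w');
fwrite (fid, uint8 ([110 105 195 177 111 32 99 97 109 105 195 179 110 10]));
fclose (fid);

% Read line by line with fgetl() and split each line at whitespace.
fid = fopen (fname, 'r');
words = {};
txt = fgetl (fid);
while (ischar (txt))              % fgetl returns -1 at end of file
  txt = strtrim (txt);            % avoid empty words at the line ends
  if (~isempty (txt))
    words = [words, strsplit(txt)];
  end
  txt = fgetl (fid);
end
fclose (fid);
delete (fname);

% Each word keeps its UTF-8 bytes, so the accented letters survive.
printf ('%s\n', words{:});
```

The words are stored as the raw UTF-8 bytes, so a word such as "niño" has five elements rather than four. Whether it displays correctly depends on the terminal encoding.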






Re: Read UTF-8 characters from text files

mmuetzel
Santiago,

I am seeing the same warning on Linux and Windows with Octave 4.2.1 and a
recent development version.

That message also appears if you try to convert an out-of-range number to a
character (e.g. char(300) or char(-2)). The cause might be inconsistent use of
char, unsigned char and signed char in the code (probably in
oct-stream.cc).

Can you please open a bug report at
https://savannah.gnu.org/bugs/?group=octave describing your issue?

Markus
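
To illustrate the signed/unsigned suspicion in the reply above, the following sketch shows how the two UTF-8 bytes of "ñ" look when the same bit patterns are read as signed 8-bit values. It only demonstrates the arithmetic; it is not a confirmed diagnosis of what oct-stream.cc does. The variable names are chosen for illustration.

```octave
% The two bytes that encode "ñ" in UTF-8.
bytes = uint8 ([195 177]);

% The same bit patterns reinterpreted as signed 8-bit integers.
as_signed = typecast (bytes, 'int8');

printf ('unsigned: %d %d\n', bytes);      % 195 and 177
printf ('signed:   %d %d\n', as_signed);  % -61 and -79
```

Read as signed values, the bytes become negative, which would fall outside the valid range for a character in the same way as char(-2).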




Re: Read UTF-8 characters from text files

shiguera
Thanks, Markus, I'll do that.



Re: Read UTF-8 characters from text files

shiguera
It's done.

