Putting international characters into output files

Ian McCallion
Hi,

I'm trying to put a UK pound sign into an output file. Is there a way
to do this in Octave?

Cheers... Ian


Re: Putting international characters into output files

Andreas Weber-6
On 04.08.2018 at 18:36, Ian McCallion wrote:
> I'm trying to put a UK Pounds sign into an output file. Is there a way
> to do this in Octave.

What kind of "output file"? I'm sure it's possible to write any UTF-8
character into a text file.
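
A minimal sketch of one way to do that, assuming the pound sign should be stored as UTF-8. The file name pound_utf8.txt is hypothetical. It writes the two UTF-8 bytes of £ (194 and 163) plus a line feed with fwrite, then reads the raw bytes back to check what landed in the file:

```octave
% Hypothetical output file in the temporary directory.
fname = fullfile (tempdir (), 'pound_utf8.txt');

% UTF-8 encoding of the pound sign (U+00A3), followed by a line feed.
data = uint8 ([194 163 10]);

% Write the bytes unchanged (binary mode, no line-ending translation).
fid = fopen (fname, 'wb');
fwrite (fid, data, 'uint8');
fclose (fid);

% Read the raw bytes back to check the file contents.
fid = fopen (fname, 'rb');
bytes = fread (fid, Inf, 'uint8')';
fclose (fid);
delete (fname);
```

Writing explicit bytes sidesteps any question of how the script file itself is encoded, which is why the sketch avoids a literal "£" in the source.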



Re: Putting international characters into output files

mmuetzel
In reply to this post by Ian McCallion
Which Operating System? Which version of Octave are you using?

Can you give the specific lines of code that you tried to use? Please
describe what you expected and what you got instead.

Markus



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html


Re: Putting international characters into output files

Ian McCallion
My apologies for this too-hasty question. The Octave step correctly placed the pound sign into the file. The pound sign I was expecting was garbled by the second (non-Octave) step of the process I was creating, and the Octave GUI apparently cannot show a pound sign, which completed my confusion.

I'm using Octave 4.4.1 on Windows, by the way.

Cheers... Ian


On Sun, 5 Aug 2018, 16:49 mmuetzel, <[hidden email]> wrote:
> Which Operating System? Which version of Octave are you using?
>
> Can you give the specific lines of code that you tried to use? Please
> describe what you expected and what you got instead.
>
> Markus




Re: Putting international characters into output files

mmuetzel
No worries.
The Pound symbol should display correctly in the command window (even on
Windows).
If I write the following in a script, select the whole line and execute
it using F9, it shows up correctly. (It will be less tedious once Octave 5.0
is available.):
t = "£"


Which steps are you doing that provoke the character not being shown? Are
you referring to the Command Window or to the Workspace panel? Or both?
I am interested in this because there currently are ongoing efforts to make
Octave more Unicode aware across all supported platforms.

Markus





Re: Putting international characters into output files

Ian McCallion
On 6 August 2018 at 08:42, mmuetzel <[hidden email]> wrote:
> No worries.
> The Pound symbol should display correctly in the command window (even on
> Windows).
> If I write the following in a script, select the whole line and execute
> using F9, it shows up correctly. (It will be less tedious once Octave 5.0
> will be available.):
> t = "£"
It does not work for me. If I copy your text and paste it into the
Octave GUI Command Window, it shows as t = "". If I copy and paste it
into the editor, it shows correctly.

> Which steps are you doing that provoke the character not being shown? Are
> you referring to the Command Window or to the Workspace panel? Or both?
> I am interested in this because there currently are ongoing efforts to make
> Octave more Unicode aware across all supported platforms.

My original code, which edited an incoming file and produced an output
file, worked apart from the screen output, where the £ sign was
displayed as a black diamond containing a '?'.

I then wrote this function:
function [] = codes ()
  fid=fopen('codes.txt','w');
  for i=1:4:512
      fprintf(fid,'%d => %s %s %s %s\n',i, char(i), char(i+1), char(i+2), char(i+3));
      fprintf(    '%d => %s %s %s %s\n',i, char(i), char(i+1), char(i+2), char(i+3));
  end
  fclose(fid);
end

which has a £ sign in position 163 in the file, but all characters
above code 127 show as the black diamond in the command window.

Hope that helps.

Cheers... Ian


Re: Re: Putting international characters into output files

mmuetzel
> Sent: Monday, 06 August 2018 at 17:07
> From: "Ian McCallion" <[hidden email]>
> To: mmuetzel <[hidden email]>
> Cc: [hidden email]
> Subject: Re: Putting international characters into output files
>
> On 6 August 2018 at 08:42, mmuetzel <[hidden email]> wrote:
> > No worries.
> > The Pound symbol should display correctly in the command window (even on
> > Windows).
> > If I write the following in a script, select the whole line and execute
> > using F9, it shows up correctly. (It will be less tedious once Octave 5.0
> > will be available.):
> > t = "£"
> It does not work for me. If I copy your text and paste it into the
> Octave GUI Command window
> it shows as t = "". If I copy and paste it into the editor it shows correctly.
>

Typing or pasting non-ASCII characters in the command window doesn't currently work on Windows. But execution from the editor (with F9) should make it display correctly in the results.

> > Which steps are you doing that provoke the character not being shown? Are
> > you referring to the Command Window or to the Workspace panel? Or both?
> > I am interested in this because there currently are ongoing efforts to make
> > Octave more Unicode aware across all supported platforms.
>
> My original code, which edited an incoming file and produced an output
> file worked apart from the screen output where the £ sign was
> displayed as a black diamond containing a '?'.
>
> I then wrote this function:
> function [] = codes ()
>   fid=fopen('codes.txt','w');
>   for i=1:4:512
>       fprintf(fid,'%d => %s %s %s %s\n',i, char(i), char(i+1),
> char(i+2), char(i+3))
>       fprintf(    '%d => %s %s %s %s\n',i, char(i), char(i+1),
> char(i+2), char(i+3))
>   end
>   fclose(fid);
> end
>
> which has a £ sign in position 163 in the file but all characters
> above code 127 show as the black diamond in the command window
>

Despite what the name might suggest, char can hold any byte array, not only strings.
Octave uses UTF-8 as its internal encoding for strings. A single byte with a value of 128 or above is not a valid UTF-8 sequence on its own. You would have to use the corresponding two-byte sequence to see what you expect (for £, the bytes 194 and 163). That's why you are seeing the replacement character.
Your editor probably uses a different codepage to display the file, one in which byte 163 is the £ sign (as it is in Windows-1252, for example). That's why you are seeing the £ sign among others.
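
To make the two-byte form concrete, here is a rough sketch of the UTF-8 rule for code points 128 to 2047, which are encoded as two bytes. The helper name utf8_two_byte is made up for illustration and is not an Octave function:

```octave
% Hypothetical helper: UTF-8 bytes for a code point cp with 128 <= cp <= 2047.
% First byte:  110xxxxx = 192 + upper bits of cp.
% Second byte: 10xxxxxx = 128 + lower six bits of cp.
utf8_two_byte = @(cp) [192 + floor(cp / 64), 128 + mod(cp, 64)];

% Code point 163 (U+00A3) is the pound sign.
pound_bytes = utf8_two_byte (163);

% A char array holding those two bytes is the UTF-8 string for the pound sign.
pound = char (pound_bytes);
disp (pound);
```

In a loop like the one in codes, something along the lines of char (utf8_two_byte (i)) for i of 128 and above would be one way to get displayable characters instead of lone bytes.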

If necessary, you could use native2unicode to convert from any codepage (e.g. Windows-1252) to UTF-8, and unicode2native to convert back.
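
A short sketch of that round trip, assuming the input really is Windows-1252 and that the codepage name 'windows-1252' is accepted by the iconv library Octave was built with:

```octave
% Byte 163 is the pound sign in Windows-1252.
cp1252_bytes = uint8 (163);

% Convert the codepage bytes to Octave's internal UTF-8 representation.
utf8_str = native2unicode (cp1252_bytes, 'windows-1252');
utf8_bytes = double (utf8_str);

% Convert the UTF-8 string back to Windows-1252 bytes.
back = unicode2native (utf8_str, 'windows-1252');
```

Here utf8_bytes should hold the two bytes 194 and 163, and back should hold the single byte 163 again.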

But if you don't care about what Octave is displaying, it may be safe not to touch the encoding at all, as long as all of your operations are on ASCII characters only (or you know how to treat characters at codepoints >127).

I don't think Octave is doing anything wrong here.

Hope this helps anyway.

Cheers
Markus


Re: Re: Putting international characters into output files

Ian McCallion
On Mon, 6 Aug 2018, 23:36 "Markus Mützel", <[hidden email]> wrote:

> Unlike the name might suggest, char can be used to hold any byte array. Not only strings.
> Octave used UTF-8 as its internal encoding for strings. 128 (and above) doesn't correspond to a valid codepoint in UTF-8. You would have to use the corresponding double byte to see what you might expect. That's why you are seeing the replacement character.
> Your editor probably uses a different codepage to display the file. That's why you are seeing the £ sign among others.
>
> If that should be necessary you could use native2unicode to convert from any codepage (e.g. Windows-1252)  to UTF-8. And unicode2native to convert back.
>
> But if you don't care about what Octave is displaying, it might be safe to don't mess with the encoding at all as long as all of your operations are on ASCII characters only (or you know how to treat characters at codepoints >127).
>
> I don't think Octave is doing anything wrong here.

Hi Markus,

Thanks for the explanation, really helpful. Based on that, I agree Octave is acting correctly when executing my function. I'm less convinced it is acting correctly when it ignores non-ASCII characters typed into the console, though, as I feel it should be possible to deduce the correct UTF-8 code even if it then displays as unknown. However, it is not a problem for me now that I understand what is going on.

Cheers... Ian