akyahoo
Joined: 10 Nov 2009 Posts: 24 Location: Beijing, China
Posted: Wed Nov 11, 2009 11:35 am Post subject: Why does a .txt file become ANSI after removing the BOM?
|
|
I used AkelPad to save a .txt file as UTF-8 without a BOM.
Later, I opened it in Windows Notepad, and Notepad shows its encoding as ANSI, not UTF-8. Why?
|
FeyFre
Joined: 07 Aug 2007 Posts: 2240 Location: Vinnitsa, Ukraine
Posted: Wed Nov 11, 2009 1:02 pm
|
|
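akyahoo,
Because the ASCII subset of UTF-8 is identical to the ASCII subset of the ANSI code page (Windows-1252, often loosely called Latin-1): both encode those characters as single bytes in the range 0-127. So if your text contains only ASCII characters, a BOM-less UTF-8 file and a Windows-1252 file are byte-for-byte identical, and Notepad has no way to tell which encoding was intended, so it labels the file ANSI.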
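To see this concretely, here is a minimal Python sketch. It assumes Python's `cp1252` codec as a stand-in for the ANSI code page; the actual ANSI code page depends on the system locale (on a Simplified Chinese system it is code page 936, which is also ASCII-compatible). The helper name `encodings_agree` is hypothetical, chosen for illustration.

```python
def encodings_agree(text: str, ansi_codepage: str = "cp1252") -> bool:
    """Return True if `text` is stored as identical bytes in BOM-less UTF-8
    and in the given ANSI code page (Windows-1252 by default).

    Raises UnicodeEncodeError if `text` contains characters that the
    chosen ANSI code page cannot represent.
    """
    return text.encode("utf-8") == text.encode(ansi_codepage)


# Pure ASCII: both files would be byte-for-byte identical.
print(encodings_agree("Hello, world!"))  # True

# "é" is one byte (0xE9) in Windows-1252 but two bytes (C3 A9) in UTF-8.
print(encodings_agree("café"))   # False
print("café".encode("utf-8"))    # b'caf\xc3\xa9'
print("café".encode("cp1252"))   # b'caf\xe9'
```

Only once a byte above 0x7F appears do the two encodings diverge, which is the only point where an editor has any evidence to go on.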
If you save UTF-8 text that contains Cyrillic characters without a BOM, you'll notice that each of those characters is stored as two bytes. A smart enough text editor (I don't know whether Notepad is one) can analyze those byte sequences and conclude that the text is BOM-less UTF-8.
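The kind of analysis described above can be sketched as a simple heuristic in Python. This is a hypothetical illustration, not Notepad's or AkelPad's actual detection algorithm: check for the UTF-8 BOM (EF BB BF) first, treat pure-ASCII data as ambiguous, and otherwise guess UTF-8 only if the non-ASCII bytes form valid UTF-8 sequences. `cp1251` (the Cyrillic ANSI code page) is used in the usage lines to produce non-UTF-8 input; the function name `guess_encoding` is an assumption.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic for labelling a .txt file's encoding."""
    # 1. A UTF-8 BOM (EF BB BF) settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 (with BOM)"
    # 2. Pure ASCII bytes are valid both as UTF-8 and in any ASCII-compatible
    #    ANSI code page, so there is no evidence either way: fall back to ANSI.
    if all(b < 0x80 for b in data):
        return "ANSI (ambiguous: also valid UTF-8)"
    # 3. Non-ASCII bytes that decode as valid UTF-8 (e.g. the two-byte
    #    sequences used for Cyrillic) are strong evidence of UTF-8.
    try:
        data.decode("utf-8")
        return "UTF-8 (no BOM)"
    except UnicodeDecodeError:
        return "ANSI"


word = "Привет"
print(len(word), len(word.encode("utf-8")))        # 6 12
print(guess_encoding(b"Hello"))                    # ANSI (ambiguous: also valid UTF-8)
print(guess_encoding(word.encode("utf-8")))        # UTF-8 (no BOM)
print(guess_encoding(word.encode("utf-8-sig")))    # UTF-8 (with BOM)
print(guess_encoding(word.encode("cp1251")))       # ANSI
```

Any heuristic of this kind can still misjudge short ANSI text whose non-ASCII bytes happen to form valid UTF-8, which is why the BOM remains the only unambiguous marker.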
akyahoo
Joined: 10 Nov 2009 Posts: 24 Location: Beijing, China
Posted: Thu Nov 19, 2009 3:09 am
|
|
Thank you.