
Why does a .txt file become ANSI after removing the BOM?

Posted: Wed Nov 11, 2009 11:35 am
by akyahoo

I use AkelPad to save a .txt file as UTF-8. The BOM is removed.


Later, I use Windows Notepad to open it. It shows its encoding is ANSI, not UTF-8. Why?

Posted: Wed Nov 11, 2009 1:02 pm
by FeyFre
akyahoo
Because the ASCII subset of UTF-8 is identical to the ASCII subset of the ANSI Latin-1 encoding (Windows-1252): both use the byte range [0-127] for those characters. So a text that contains only such characters is byte-for-byte the same in UTF-8 without a BOM and in ANSI 1252, and Notepad has no way to tell which of the two it was.
If you try to save UTF-8 text that contains Cyrillic characters without a BOM, you'll notice that those characters are represented by two bytes each, and any smart enough text editor (I don't know if Notepad is one) will analyze those pairs and correctly decide that the text is in BOM-less UTF-8.
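
To make this concrete, here is a small Python sketch of the point above. It is only an illustration: the sample strings and the helper name `looks_like_utf8` are made up for this example, and the check is a simple heuristic, not the detection logic of Notepad, AkelPad or any other editor.

```python
import codecs

def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical heuristic: True only if the data contains non-ASCII
    bytes and all of them form valid UTF-8 sequences."""
    try:
        data.decode("utf-8")  # strict decoding fails on invalid sequences
    except UnicodeDecodeError:
        return False
    # Pure ASCII decodes fine too, but it gives no evidence either way.
    return any(b >= 0x80 for b in data)

ascii_text = "Hello, world"   # only characters in the range [0-127]
cyrillic_text = "Привет"      # six Cyrillic letters

# ASCII-only text produces the same bytes in UTF-8 and Windows-1252.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("cp1252")

# Each of these Cyrillic letters takes two bytes in UTF-8.
utf8_cyrillic = cyrillic_text.encode("utf-8")
bytes_per_char = len(utf8_cyrillic) // len(cyrillic_text)

# The BOM is just three marker bytes placed in front of the text.
with_bom = ascii_text.encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)

print(same_bytes, bytes_per_char, has_bom)
print(looks_like_utf8(ascii_text.encode("utf-8")))   # ambiguous, so False
print(looks_like_utf8(utf8_cyrillic))                # valid multi-byte pairs
```

With ASCII-only content the file without a BOM carries no hint of its encoding, which is why an editor may fall back to ANSI; once multi-byte sequences are present, a check like this has something to work with.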

Posted: Thu Nov 19, 2009 3:09 am
by akyahoo
Thank you.