Why does a .txt file become ANSI after removing the BOM?

akyahoo · Post by **akyahoo** » Wed Nov 11, 2009 11:35 am

I use AkelPad to save a .txt file as UTF-8 without the BOM.

Later, when I open it in Windows Notepad, it reports the encoding as ANSI, not UTF-8. Why?

FeyFre · Post by **FeyFre** » Wed Nov 11, 2009 1:02 pm

akyahoo
Because the ASCII subset of UTF-8 is identical to the ASCII subset of the ANSI code page (Windows-1252): both use the byte range [0-127]. So a text that contains only those characters is byte-for-byte the same in BOM-less UTF-8 and in ANSI 1252, and Notepad has no way to tell which of the two it was.
If you save UTF-8 text that contains Cyrillic characters without a BOM, you'll notice that each of those characters is represented by two bytes. Any smart enough text editor (I don't know if Notepad is one) will analyze those byte pairs and correctly decide that the text is BOM-less UTF-8.
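
To make the point concrete, here is a minimal Python sketch. The helper name `guess_encoding` is made up for illustration, and the heuristic is only an assumption about how an editor might guess; it is not how Notepad or AkelPad actually detect encodings.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding of raw file bytes."""
    # A UTF-8 BOM is an explicit marker, so nothing needs to be guessed.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    # Only bytes 0-127: the file is identical in UTF-8 and Windows-1252,
    # so there is nothing that could tell the two apart.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Bytes above 127 are present: check whether they form valid
    # UTF-8 multi-byte sequences.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: fall back to the ANSI code page (Windows-1252 here).
        return "cp1252"


ascii_text = "Hello"
cyrillic_text = "Привет"

# Pure ASCII text produces the same bytes in both encodings.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("cp1252")

# Each Cyrillic letter takes two bytes in UTF-8.
cyrillic_bytes = cyrillic_text.encode("utf-8")

print(same_bytes)
print(len(cyrillic_text), len(cyrillic_bytes))
print(guess_encoding(ascii_text.encode("utf-8")))
print(guess_encoding(cyrillic_bytes))
```

A file with only ASCII characters gives the detector nothing to work with, which is why an editor falls back to its default (ANSI) when the BOM is missing.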

akyahoo · Post by **akyahoo** » Thu Nov 19, 2009 3:09 am

Thank you.