Why .txt file becomes ANSI after removing BOM?

English main discussion
Offline
Posts: 24
Joined: Tue Nov 10, 2009 2:43 am
Location: Beijing, China
Contact:

Why .txt file becomes ANSI after removing BOM?

Post by akyahoo »


I use AkelPad to save a .txt file as UTF-8. The BOM is removed.


Later, I use Windows Notepad to open it. It shows its encoding is ANSI, not UTF-8. Why?

Offline
Posts: 2247
Joined: Tue Aug 07, 2007 2:03 pm
Location: Vinnitsa, Ukraine

Post by FeyFre »

akyahoo
Because the ASCII subset of UTF-8 is identical to the ASCII subset of ANSI Latin-1 encoding (Windows-1252): both use the byte range 0-127 for the same characters. So ASCII-only text saved as UTF-8 without a BOM and as ANSI 1252 is binary-equal, and Notepad has no way to tell which encoding was meant.
If you try to save UTF-8 text containing Cyrillic characters without a BOM, you'll notice that those characters are represented by two bytes each, and any smart-enough text editor (don't know if Notepad is) will analyze those byte pairs and correctly decide that the text is in BOM-less UTF-8 encoding.
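The point above can be checked in a few lines of Python (the sample strings are just illustrations, not taken from the original files):

```python
# ASCII-only text: UTF-8 and Windows-1252 (ANSI) produce identical bytes,
# so a BOM-less UTF-8 file is indistinguishable from an ANSI file.
ascii_text = "Hello, world!"
assert ascii_text.encode("utf-8") == ascii_text.encode("cp1252")

# Cyrillic text: each character becomes two bytes in UTF-8,
# which gives an editor something to detect.
cyrillic_text = "Привет"
utf8_bytes = cyrillic_text.encode("utf-8")
print(len(cyrillic_text), len(utf8_bytes))  # 6 12 -> two bytes per character

# Python's "utf-8-sig" codec writes the BOM (EF BB BF) explicitly,
# which is what makes Notepad recognize the file as UTF-8 reliably.
with_bom = cyrillic_text.encode("utf-8-sig")
print(with_bom[:3].hex())  # efbbbf
```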

Offline
Posts: 24
Joined: Tue Nov 10, 2009 2:43 am
Location: Beijing, China
Contact:

Post by akyahoo »

FeyFre wrote:akyahoo
Because the ASCII subset of UTF-8 is identical to the ASCII subset of ANSI Latin-1 encoding (Windows-1252): both use the byte range 0-127 for the same characters. So ASCII-only text saved as UTF-8 without a BOM and as ANSI 1252 is binary-equal, and Notepad has no way to tell which encoding was meant.
If you try to save UTF-8 text containing Cyrillic characters without a BOM, you'll notice that those characters are represented by two bytes each, and any smart-enough text editor (don't know if Notepad is) will analyze those byte pairs and correctly decide that the text is in BOM-less UTF-8 encoding.
Thank you.