Why does a .txt file become ANSI after removing the BOM?
I used AkelPad to save a .txt file as UTF-8 with the BOM removed.
Later, I opened it in Windows Notepad, which reports its encoding as ANSI, not UTF-8. Why?
-
akyahoo
Because the ASCII subset of UTF-8 is identical to the ASCII subset of the ANSI code page (Windows-1252 for Latin-1 text): both encode those characters as single bytes in the range [0-127]. So a text containing only such characters is binary identical whether it is saved as UTF-8 without a BOM or as ANSI 1252, and Notepad has no way to tell which one it was.
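A quick way to check the point above is to encode the same ASCII-only string both ways and compare the bytes. This is only a small sketch in Python: the sample string is made up, and "cp1252" is assumed to be the ANSI code page in question.

```python
# Hypothetical sample text that contains only ASCII characters.
text = "Hello, AkelPad!"

utf8_bytes = text.encode("utf-8")          # UTF-8 without a BOM
ansi_bytes = text.encode("cp1252")         # Windows-1252, assumed to be the ANSI code page
utf8_bom_bytes = text.encode("utf-8-sig")  # UTF-8 with a BOM prepended

same_bytes = utf8_bytes == ansi_bytes           # are the two files byte-for-byte equal?
all_ascii = all(b < 128 for b in utf8_bytes)    # is every byte in the range [0-127]?
bom = utf8_bom_bytes[:3]                        # the three BOM bytes

print(same_bytes, all_ascii, bom)  # prints: True True b'\xef\xbb\xbf'
```

Without the three BOM bytes at the start, nothing in such a file marks it as UTF-8.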
If you save UTF-8 text that contains Cyrillic characters without a BOM, you'll notice that each of those characters is represented by two bytes. Any smart enough text editor (I don't know if Notepad is) will analyze those byte pairs and correctly decide that the text is in BOM-less UTF-8.
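The kind of analysis described above could look roughly like the sketch below. It is not the actual detection code of Notepad or AkelPad. The function name `guess_encoding` and its return strings are hypothetical, and "cp1251" is assumed as the Cyrillic ANSI code page for the comparison.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: guess how a text file's bytes were encoded."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 with BOM"
    if all(b < 128 for b in data):
        # Pure ASCII: UTF-8 and ANSI produce exactly the same bytes.
        return "ascii (ambiguous)"
    try:
        # Strict decoding fails on byte sequences that are not well-formed UTF-8.
        data.decode("utf-8")
        return "utf-8 without BOM"
    except UnicodeDecodeError:
        return "ansi"


cyrillic_utf8 = "Привет".encode("utf-8")   # 6 Cyrillic letters, 2 bytes each
cyrillic_ansi = "Привет".encode("cp1251")  # 6 Cyrillic letters, 1 byte each

print(len(cyrillic_utf8), len(cyrillic_ansi))  # prints: 12 6
print(guess_encoding(cyrillic_utf8))           # prints: utf-8 without BOM
print(guess_encoding(cyrillic_ansi))           # prints: ansi
print(guess_encoding(b"plain text"))           # prints: ascii (ambiguous)
```

A successful strict UTF-8 decode is strong evidence rather than proof, since some ANSI byte sequences also happen to be valid UTF-8, so a real editor may combine this with other heuristics.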
-
Thank you, FeyFre.