UTF-8 (no BOM) format support?
-
Offline
- Posts: 47
- Joined: Sat Jul 05, 2008 11:30 am
- Location: Odesa, Ukraine
Re: UTF-8 (no BOM) format support?
Diamen
It can be a problem to recognize UTF-8 files without BOM properly if there's only one line in the file (no line break(s) present), if the first multi-byte character appears late in the file, etc.
Nevertheless, my AkelPad recognizes your "test òà" saved as a UTF-8 without BOM .txt pretty stably.
Also, there's an option in AkelPad that affects recognition quality:
Options - Settings... General [tab] - Codepage recognition [section]: Buffer: [size].
Increasing the buffer size could help.
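To illustrate why the buffer size can matter, here is a minimal Python sketch of a detector that only samples the first bytes of a file. The function name classify_prefix and its return labels are hypothetical, and this is not AkelPad's actual algorithm; it only shows that a late multi-byte character is invisible to a small sample.

```python
import codecs

def classify_prefix(data: bytes, buffer_size: int) -> str:
    """Classify only the first buffer_size bytes, as a sampling detector might."""
    sample = data[:buffer_size]
    if all(b < 0x80 for b in sample):
        # Pure ASCII in the sample: no evidence for UTF-8 or for ANSI.
        return "ascii-only"
    # The incremental decoder tolerates a multi-byte character that is
    # cut off at the end of the sample (final=False).
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"

# 100 ASCII characters, then the two accented characters from the test file.
late_accents = ("x" * 100 + "òà").encode("utf-8")
small = classify_prefix(late_accents, 64)
large = classify_prefix(late_accents, 1024)
print(small, large)
```

With a 64-byte sample the accented characters are never seen, so the sample gives no hint; with a larger buffer the same file is classified as UTF-8.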
-
Offline
- Posts: 165
- Joined: Fri Aug 15, 2008 8:58 am
Re: UTF-8 (no BOM) format support?
I save it as UTF-8 without BOM.
When I reopen it, AkelPad does not recognize the codepage and opens it as ANSI.
Even if I add a line break or increase the buffer, it never recognizes the UTF-8 codepage without BOM.
There is no problem with notepad.exe, even without a line break.
-
Offline
- Posts: 47
- Joined: Sat Jul 05, 2008 11:30 am
- Location: Odesa, Ukraine
Re: UTF-8 (no BOM) format support?
Diamen
There's way more than one way to arrange the options, including disabling automatic recognition entirely.
And your "test òà" file is still getting opened fine.
https://i.imgur.com/UVpGbvq.png
-
Offline
- Posts: 165
- Joined: Fri Aug 15, 2008 8:58 am
Re: UTF-8 (no BOM) format support?
ewild
Yours works because the default is UTF-8.
But then ANSI files will not be recognized.
There is no problem with notepad.exe.
Save the same file with accented characters both as UTF-8 without BOM and as ANSI,
then open them with AkelPad and Notepad.
-
Offline
- Posts: 1292
- Joined: Thu Nov 16, 2006 11:53 am
- Location: Kyiv, Ukraine
Re: UTF-8 (no BOM) format support?
I think I understand. The file is too small, and its two non-Latin characters are not enough for AkelPad to recognize the file's content as UTF-8.
We may ask Instructor to add a new option to the settings, such as "Treat non-recognized encoding as UTF-8 no BOM". Or rather "Prefer UTF-8 no BOM to ANSI" while detecting the encoding.
(The 2nd suggestion is more correct in terms of technical implementation. First AkelPad should check whether the file content satisfies the byte sequence of UTF-8 without BOM - and if it does, treat it as "UTF-8 no BOM", without further attempts to recognize an ANSI encoding).
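As a rough illustration of what "satisfies the byte sequence of UTF-8" means, the sketch below checks well-formedness by hand, following the lead-byte and continuation-byte ranges from the UTF-8 definition (RFC 3629). The function name is hypothetical and this is not AkelPad's code.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence (ASCII included)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:                      # single-byte (ASCII)
            i += 1
            continue
        # need = number of continuation bytes; lo..hi = allowed range of the
        # FIRST continuation byte (narrowed to reject overlongs, surrogates
        # and code points above U+10FFFF).
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF
        elif b == 0xED:
            need, lo, hi = 2, 0x80, 0x9F
        elif 0xE1 <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F
        else:
            return False                   # 0x80-0xC1 or 0xF5-0xFF cannot start a character
        if i + need >= n:
            return False                   # sequence is cut off at end of data
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True

utf8_file = "test òà".encode("utf-8")
ansi_file = "test òà".encode("cp1252")
print(is_well_formed_utf8(utf8_file), is_well_formed_utf8(ansi_file))
```

The Windows-1252 version of "test òà" ends in the bytes F2 E0, which cannot form a UTF-8 character, so a "prefer UTF-8" check would still let that file fall through to ANSI detection.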
-
Offline
- Posts: 165
- Joined: Fri Aug 15, 2008 8:58 am
Re: UTF-8 (no BOM) format support?
How does Notepad recognize it even with a small file?
If the file contains a 0xC3 byte (the lead byte of "À", "É", "ñ", ...) followed by a byte in the range 0x80-0xBF, it is much more likely to be UTF-8 than ANSI, even if that is not guaranteed.
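The likelihood argument can be seen by decoding the same bytes both ways. This is a small Python sketch (the helper count_c3_pairs is hypothetical): the UTF-8 bytes of "òà" are also decodable as Windows-1252, but the result is an implausible run of "Ã" followed by a symbol.

```python
def count_c3_pairs(data: bytes) -> int:
    """Count occurrences of 0xC3 followed by a byte in 0x80-0xBF."""
    return sum(
        1
        for first, second in zip(data, data[1:])
        if first == 0xC3 and 0x80 <= second <= 0xBF
    )

raw = "òà".encode("utf-8")           # bytes C3 B2 C3 A0
as_utf8 = raw.decode("utf-8")        # the intended text
as_ansi = raw.decode("cp1252")       # what an ANSI (1252) reading would show
pairs = count_c3_pairs(raw)

print(raw.hex(" "))
print(as_utf8)
print(repr(as_ansi))
print(pairs)
```

Read as Windows-1252, the four bytes become "Ã", "²", "Ã" and a no-break space, which is rarely what anyone types on purpose.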
-
Offline
- Posts: 1292
- Joined: Thu Nov 16, 2006 11:53 am
- Location: Kyiv, Ukraine
Re: UTF-8 (no BOM) format support?
According to Instructor, it's enough to specify just one setting to detect UTF-8 without BOM properly:
Code:
Default codepage:
65001 (UTF-8)
However, be sure to save your file as either "UTF-8 no BOM" or "UTF-8"!
File saving is a manual action where you can specify the desired encoding.
If you specify an ANSI encoding (such as 1250, 1251, 1252, etc.), AkelPad explicitly warns you:
Code:
Line "1" contains symbols which will be lost at saving in this encoding. Continue?
-
Offline
- Posts: 165
- Joined: Fri Aug 15, 2008 8:58 am
Re: UTF-8 (no BOM) format support?
DV wrote: Mon Jan 27, 2025 2:57 pm
According to Instructor, it's enough to specify just one setting to detect UTF-8 without BOM properly:
Code:
Default codepage:
65001 (UTF-8)
It's not enough to specify just this setting to detect UTF-8 without BOM properly.
If I have a 1252 ANSI file and open it, AkelPad then loads it as UTF-8 without BOM, not correctly as 1252 ANSI.
-
Offline
- Posts: 1292
- Joined: Thu Nov 16, 2006 11:53 am
- Location: Kyiv, Ukraine
Re: UTF-8 (no BOM) format support?
You are mixing 2 different things. Let's isolate them:
1. For proper detection of "UTF-8 without BOM", it is enough to set the "Default codepage" to "65001 (UTF-8)". It will detect "UTF-8 without BOM" properly.
2. For proper detection of ANSI encodings, you need to set the "Codepage recognition" to the proper value such as "Western European".
These are two different options that work independently.
When the "Default codepage" is set to "65001 (UTF-8)", AkelPad first checks whether a file being opened can be recognized as "UTF-8 without BOM". If it can, AkelPad identifies it as "UTF-8 without BOM" without further attempts to detect an ANSI encoding because this option literally says: "Treat files as UTF-8 by default".
If the file being opened contains bytes that cannot be interpreted as a UTF-8 sequence, then AkelPad uses the "Codepage recognition" setting to detect an ANSI encoding.
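The order described above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not AkelPad's implementation: choose_encoding and its parameters are hypothetical, and recognized_ansi stands in for whatever the "Codepage recognition" setting would pick.

```python
def choose_encoding(data: bytes, default_codepage: str, recognized_ansi: str) -> str:
    """Model of the described order: UTF-8 check first, ANSI recognition second."""
    if default_codepage == "utf-8":
        try:
            data.decode("utf-8")      # strict: raises on any invalid sequence
        except UnicodeDecodeError:
            pass                      # not UTF-8, fall through to ANSI recognition
        else:
            return "utf-8"
    # Assumption: the recognition step has already settled on this ANSI codepage.
    return recognized_ansi

utf8_file = "test òà".encode("utf-8")
ansi_file = "test òà".encode("cp1252")

utf8_result = choose_encoding(utf8_file, "utf-8", "cp1252")
ansi_result = choose_encoding(ansi_file, "utf-8", "cp1252")
print(utf8_result, ansi_result)
```

Under this model a 1252 file with accented characters should end up as cp1252, because its bytes fail the strict UTF-8 check. Note that a pure-ASCII file also passes the UTF-8 check, since ASCII is valid UTF-8.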
-
Offline
- Posts: 165
- Joined: Fri Aug 15, 2008 8:58 am
Re: UTF-8 (no BOM) format support?
I tried your settings, but they do not work:
"Default codepage" set to "65001 (UTF-8)"
"Codepage recognition" set to "Western European" or "none"
When I load a 1252 ANSI file with accented characters, AkelPad loads it as UTF-8 with Chinese characters.
-
Offline
- Posts: 1292
- Joined: Thu Nov 16, 2006 11:53 am
- Location: Kyiv, Ukraine
Re: UTF-8 (no BOM) format support?
You are correct, this needs to be addressed in AkelPad.
I've contacted Instructor regarding this issue.
-
Offline
- Site Admin
- Posts: 6403
- Joined: Thu Jul 06, 2006 7:20 am
-
Offline
- Posts: 165
- Joined: Fri Aug 15, 2008 8:58 am
Re: UTF-8 (no BOM) format support?
It seems that it now works well with:
Codepage recognition:
Western European (1252, OEM, UTF-8)
Default codepage:
65001 (UTF-8)
New file:
65001 (UTF-8) without BOM
Thank you.
-
Offline
- Posts: 165
- Joined: Fri Aug 15, 2008 8:58 am
Re: UTF-8 (no BOM) format support?
With these settings, if I save an empty file as UTF-8 without BOM and close it, it reopens as ANSI.