Re: UTF-8 (no BOM) format support?
Posted: Wed Jan 22, 2025 6:58 pm
by ewild
Diamen
It can be hard to recognize UTF-8 files without BOM properly if there's only one line in the file (no line breaks present), if the first multi-byte character appears late in the file, etc.
Nevertheless, my AkelPad recognizes your "test òà" .txt file saved as UTF-8 without BOM quite reliably.
Also, there's an option in AkelPad that affects recognition quality:
Options - Settings... General [tab] - Codepage recognition [section]: Buffer: [size].
Increasing the buffer size could help.
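
To illustrate why the buffer size can matter, here is a minimal sketch. It assumes, purely for illustration, that a detector inspects only the first `buffer_size` bytes of a file. The function name `sees_non_ascii` and the sample data are hypothetical and are not taken from AkelPad's code.

```python
def sees_non_ascii(data: bytes, buffer_size: int) -> bool:
    """Return True if the first buffer_size bytes contain any non-ASCII byte."""
    return any(b >= 0x80 for b in data[:buffer_size])


# Hypothetical file: 2000 bytes of plain ASCII, then the accented text.
sample = b"a" * 2000 + "test òà".encode("utf-8")

print(sees_non_ascii(sample, 1024))  # False: the buffer ends before the accents
print(sees_non_ascii(sample, 4096))  # True: the buffer now covers them
```

With the smaller buffer, the inspected bytes are pure ASCII, so nothing in them distinguishes UTF-8 from an ANSI codepage.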
Re: UTF-8 (no BOM) format support?
Posted: Wed Jan 22, 2025 8:17 pm
by Diamen
DV wrote: ↑Wed Jan 22, 2025 6:30 pm
To be sure these "ò" and "à" characters are properly saved and then properly displayed on any system, you should save your file as "UTF-8".
I save it as UTF-8, but without BOM.
When I reopen it, AkelPad does not recognize the codepage and opens it as ANSI.
Even if I add a line break or increase the buffer, it never recognizes the UTF-8 codepage without BOM.
There is no problem with notepad.exe, even without a line break.
Re: UTF-8 (no BOM) format support?
Posted: Wed Jan 22, 2025 8:36 pm
by ewild
Diamen
There's way more than one way to arrange the options, including disabling automatic recognition entirely.
And your "test òà" file is still getting opened fine.
https://i.imgur.com/UVpGbvq.png
Re: UTF-8 (no BOM) format support?
Posted: Wed Jan 22, 2025 9:00 pm
by Diamen
ewild
Yours works because the default is UTF-8.
But then it will not recognize ANSI files.
There is no problem with notepad.exe.
Save the same file with accented characters both as UTF-8 without BOM and as ANSI.
Then open both with AkelPad and with Notepad.
Re: UTF-8 (no BOM) format support?
Posted: Wed Jan 22, 2025 10:51 pm
by DV
Diamen wrote: ↑Wed Jan 22, 2025 8:17 pm
I save it as UTF-8, but without BOM.
When I reopen it, AkelPad does not recognize the codepage and opens it as ANSI.
I think I understand. The file is too small, and the two non-Latin characters are not enough for AkelPad to recognize the file's content as UTF-8.
We may ask Instructor to add a new option to the settings, such as "Treat non-recognized encoding as UTF-8 no BOM". Or rather "Prefer UTF-8 no BOM to ANSI" while detecting the encoding.
(The 2nd suggestion is more correct in terms of technical implementation. First, AkelPad should check whether the file content is a valid UTF-8 byte sequence - and if it is, treat it as "UTF-8 no BOM", without further attempts to recognize an ANSI encoding).
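
The validity check mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not AkelPad's implementation; the helper name `is_valid_utf8` is made up for this example, and it relies on Python's built-in strict UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (no BOM required)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8("test òà".encode("utf-8")))   # True
print(is_valid_utf8("test òà".encode("cp1252")))  # False
print(is_valid_utf8(b"plain ASCII"))              # True
```

Note that pure ASCII text also passes the check, so for such files the choice between UTF-8 and ANSI makes no visible difference.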
Re: UTF-8 (no BOM) format support?
Posted: Thu Jan 23, 2025 5:52 am
by Diamen
How does Notepad recognize it, even with a small file?
If the file contains a 0xC3 byte followed by a byte in the range 0x80-0xBF (as in "À", "É", "ñ", ...), it is much more likely, though not certain, to be UTF-8 and not ANSI.
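
The byte pattern described above can be checked directly. The sketch below is a simple illustration; the helper `count_c3_pairs` is hypothetical, and cp1252 is assumed as the ANSI codepage. It does not claim to be how Notepad or AkelPad detect encodings.

```python
import re

# In UTF-8, 0xC3 followed by a continuation byte (0x80-0xBF)
# encodes the code points U+00C0..U+00FF (À, É, ñ, ò, à, ...).
C3_PAIR = re.compile(rb"\xc3[\x80-\xbf]")


def count_c3_pairs(data: bytes) -> int:
    """Count occurrences of 0xC3 followed by a UTF-8 continuation byte."""
    return len(C3_PAIR.findall(data))


utf8_bytes = "test òà".encode("utf-8")
ansi_bytes = "test òà".encode("cp1252")

print(utf8_bytes)                  # b'test \xc3\xb2\xc3\xa0'
print(ansi_bytes)                  # b'test \xf2\xe0'
print(count_c3_pairs(utf8_bytes))  # 2
print(count_c3_pairs(ansi_bytes))  # 0

# The same two bytes read as cp1252 give an unlikely character pair:
print("ñ".encode("utf-8").decode("cp1252"))  # Ã±
```

A genuine cp1252 text would have to contain "Ã" followed by a symbol such as "±" or "²" to produce the same bytes, which is why the pattern is a strong hint but not a proof.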
Re: UTF-8 (no BOM) format support?
Posted: Mon Jan 27, 2025 2:57 pm
by DV
According to Instructor, it's enough to specify just one setting to detect UTF-8 without BOM properly:
However, be sure to save your file as either "UTF-8 no BOM" or "UTF-8"!
File saving is a manual action where you can specify the desired encoding.
If you specify an ANSI encoding (such as 1250, 1251, 1252, etc.), AkelPad explicitly warns you:
Code:
Line "1" contains symbols which will be lost at saving in this encoding. Continue?
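
The idea behind such a warning can be reproduced with a short sketch: try to encode each line in the target codepage and report the lines that fail. The function name `lines_with_lost_symbols` is invented for this example, and the code is not meant to mirror how AkelPad performs the check.

```python
def lines_with_lost_symbols(text: str, encoding: str) -> list:
    """Return 1-based numbers of lines that cannot be encoded without loss."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        try:
            line.encode(encoding)
        except UnicodeEncodeError:
            bad.append(number)
    return bad


print(lines_with_lost_symbols("test òà", "cp1252"))             # []
print(lines_with_lost_symbols("test òà\nπ ≈ 3.14", "cp1252"))   # [2]
print(lines_with_lost_symbols("test òà", "cp1251"))             # [1]
```

The accented characters fit into codepage 1252 but not into 1251, so the same text is safe in one ANSI encoding and lossy in another.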
Re: UTF-8 (no BOM) format support?
Posted: Tue Jan 28, 2025 11:40 am
by Diamen
DV wrote: ↑Mon Jan 27, 2025 2:57 pm
According to Instructor, it's enough to specify just one setting to detect UTF-8 without BOM properly:
It's not enough to specify just this setting to detect UTF-8 without BOM properly.
If I have a 1252 ANSI file and open it, AkelPad then loads it as UTF-8 without BOM, not correctly as 1252 ANSI.
Re: UTF-8 (no BOM) format support?
Posted: Tue Jan 28, 2025 6:41 pm
by DV
Diamen wrote: ↑Tue Jan 28, 2025 11:40 am
It's not enough to specify just this setting to detect UTF-8 without BOM properly.
If I have a 1252 ANSI file and open it, AkelPad then loads it as UTF-8 without BOM, not correctly as 1252 ANSI.
You are mixing 2 different things. Let's isolate them:
1. For proper detection of "UTF-8 without BOM", it is enough to set the "Default codepage" to "65001 (UTF-8)". It will detect "UTF-8 without BOM" properly.
2. For proper detection of ANSI encodings, you need to set the "Codepage recognition" to the proper value such as "Western European".
These are two different options that work independently.
When the "Default codepage" is set to "65001 (UTF-8)", AkelPad first checks whether the file being opened can be recognized as "UTF-8 without BOM". If it can, AkelPad identifies it as "UTF-8 without BOM" without further attempts to detect an ANSI encoding, because this option literally says: "Treat files as UTF-8 by default".
If the file being opened contains bytes that cannot be interpreted as a UTF-8 sequence, then AkelPad uses the "Codepage recognition" setting to detect an ANSI encoding.
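
The order of the two steps described above could look roughly like this. It is a sketch of the description only, not AkelPad's code: `choose_encoding` and its parameters are hypothetical names, "Western European" is mapped to cp1252 as an assumption, and the behavior when the default codepage is not 65001 is also an assumption.

```python
def choose_encoding(data: bytes,
                    default_codepage: int = 65001,
                    recognition_codepage: str = "cp1252") -> str:
    """Pick an encoding: UTF-8 first if it is the default, else ANSI recognition."""
    if default_codepage == 65001:
        try:
            data.decode("utf-8")
            return "utf-8"        # step 1: valid UTF-8 wins
        except UnicodeDecodeError:
            pass                  # step 2: fall through to ANSI recognition
    # Assumption: any other default simply defers to the recognition setting.
    return recognition_codepage


print(choose_encoding("test òà".encode("utf-8")))   # utf-8
print(choose_encoding("test òà".encode("cp1252")))  # cp1252
```

Under this model, a 1252 file with accented characters fails the UTF-8 check and ends up in the ANSI branch.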
Re: UTF-8 (no BOM) format support?
Posted: Tue Jan 28, 2025 10:35 pm
by Diamen
I tried your settings, but they do not work:
"Default codepage" set to "65001 (UTF-8)"
"Codepage recognition" set to "Western European" or "none"
When I load a 1252 ANSI file with accented characters, AkelPad loads it as UTF-8 with Chinese characters.
Re: UTF-8 (no BOM) format support?
Posted: Fri Jan 31, 2025 10:52 am
by DV
Diamen wrote: ↑Tue Jan 28, 2025 10:35 pm
I tried your settings, but they do not work:
"Default codepage" set to "65001 (UTF-8)"
"Codepage recognition" set to "Western European" or "none"
When I load a 1252 ANSI file with accented characters, AkelPad loads it as UTF-8 with Chinese characters.
You are correct, this needs to be addressed in AkelPad.
I've contacted Instructor regarding this issue.
Re: UTF-8 (no BOM) format support?
Posted: Fri Jan 31, 2025 3:14 pm
by Diamen
ty
Re: UTF-8 (no BOM) format support?
Posted: Mon Feb 24, 2025 2:52 pm
by Instructor
Diamen
Test version: x86 / x64
Re: UTF-8 (no BOM) format support?
Posted: Tue Feb 25, 2025 4:11 pm
by Diamen
It seems that now it works well with:
Codepage recognition: Western European (1252, OEM, UTF-8)
Default codepage: 65001 (UTF-8)
New file: 65001 (UTF-8) without BOM
ty.
Re: UTF-8 (no BOM) format support?
Posted: Wed Feb 26, 2025 4:40 am
by Diamen
With these settings, if I save an empty file as UTF-8 without BOM and close it, it reopens as ANSI.
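
One general observation, offered only as an illustration and not as a statement about AkelPad's internals: an empty file contains no bytes to inspect, so it is valid in every encoding and any detector has to fall back on some default. The helper `candidate_encodings` below is hypothetical.

```python
def candidate_encodings(data: bytes, encodings=("utf-8", "cp1252")) -> list:
    """Return every listed encoding in which data decodes without error."""
    ok = []
    for name in encodings:
        try:
            data.decode(name)
            ok.append(name)
        except UnicodeDecodeError:
            pass
    return ok


print(candidate_encodings(b""))                          # ['utf-8', 'cp1252']
print(candidate_encodings("test òà".encode("cp1252")))   # ['cp1252']
```

For the empty file, both candidates remain, so the content alone cannot tell the editor which one was chosen when saving.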