UTF-8 (no BOM) format support?

Posts: 47
Joined: Sat Jul 05, 2008 11:30 am
Location: Odesa, Ukraine

Re: UTF-8 (no BOM) format support?

Post by ewild »

Diamen
It can be difficult to recognize UTF-8 files without a BOM reliably, for example if the file contains only one line (no line breaks) or if the first multi-byte character appears late in the file.
Nevertheless, my AkelPad recognizes your "test òà" file, saved as UTF-8 without BOM (.txt), quite reliably.
Also, there's an option in AkelPad that affects recognition quality:
Options - Settings... General [tab] - Codepage recognition [section]: Buffer: [size].
Increasing the buffer size could help.
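
To illustrate why the buffer size can matter, here is a minimal Python sketch. It is not AkelPad's actual algorithm; the helper name `looks_like_utf8` and the buffer sizes are made up for this example. The check only looks at the first `buffer_size` bytes, so a file whose first multi-byte character lies beyond the buffer looks like plain ASCII.

```python
import codecs


def looks_like_utf8(data: bytes, buffer_size: int) -> bool:
    """Hypothetical check: do the first buffer_size bytes contain
    non-ASCII bytes that form valid UTF-8?"""
    sample = data[:buffer_size]
    if all(byte < 0x80 for byte in sample):
        # Pure ASCII within the buffer: nothing points to UTF-8.
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the buffer edge.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


# 100 ASCII bytes first, accented characters only at the very end.
late = b"a" * 100 + "òà".encode("utf-8")
print(looks_like_utf8(late, 64))    # False: the buffer ends before "òà"
print(looks_like_utf8(late, 1024))  # True: the buffer covers the whole file
```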

Posts: 165
Joined: Fri Aug 15, 2008 8:58 am

Re: UTF-8 (no BOM) format support?

Post by Diamen »

DV wrote: Wed Jan 22, 2025 6:30 pm To be sure these "ò" and "à" characters are properly saved and then properly displayed on any system, you should save your file as "UTF-8".
I save it as UTF-8, but without BOM.
When I reopen it, AkelPad does not recognize the codepage and opens it as ANSI.
Even if I add a line break or increase the buffer, it never recognizes the UTF-8 codepage without BOM.
There is no problem with notepad.exe, even without a line break.

Posts: 47
Joined: Sat Jul 05, 2008 11:30 am
Location: Odesa, Ukraine

Re: UTF-8 (no BOM) format support?

Post by ewild »

Diamen
There is more than one way to arrange the options, including disabling automatic recognition entirely.
And your "test òà" file is still getting opened fine.
https://i.imgur.com/UVpGbvq.png

Posts: 165
Joined: Fri Aug 15, 2008 8:58 am

Re: UTF-8 (no BOM) format support?

Post by Diamen »

ewild
Yours works because the default is UTF-8.
But then it will not recognize ANSI files.
No problem with notepad.exe.

Save the same file with accented characters as UTF-8 without BOM and as ANSI,
then open both with AkelPad and Notepad.

DV
Posts: 1292
Joined: Thu Nov 16, 2006 11:53 am
Location: Kyiv, Ukraine

Re: UTF-8 (no BOM) format support?

Post by DV »

Diamen wrote: Wed Jan 22, 2025 8:17 pm I save it as UTF-8, but without BOM.
When I reopen it, AkelPad does not recognize the codepage and opens it as ANSI.
I think I understand. The file is too small, and two non-ASCII characters are not enough for AkelPad to recognize the file's content as UTF-8.
We may ask Instructor to add a new option to the settings, such as "Treat non-recognized encoding as UTF-8 no BOM". Or rather "Prefer UTF-8 no BOM to ANSI" while detecting the encoding.
(The 2nd suggestion is more correct in terms of technical implementation. AkelPad should first check whether the file content is a valid UTF-8 byte sequence, and if it is, treat it as "UTF-8 no BOM" without further attempts to recognize an ANSI encoding.)
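
The "Prefer UTF-8 no BOM to ANSI" idea can be sketched in a few lines of Python. This is only an illustration of the proposed order of checks, not AkelPad code; the function name `decode_prefer_utf8` and the `cp1252` fallback are assumptions made for the example.

```python
def decode_prefer_utf8(data: bytes, ansi_codepage: str = "cp1252") -> tuple[str, str]:
    """Return (text, encoding), trying strict UTF-8 before the ANSI fallback."""
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # errors="replace" covers the few byte values cp1252 leaves undefined.
        return data.decode(ansi_codepage, errors="replace"), ansi_codepage


print(decode_prefer_utf8("test òà".encode("utf-8")))   # ('test òà', 'utf-8')
print(decode_prefer_utf8("test òà".encode("cp1252")))  # ('test òà', 'cp1252')
```

Note that pure ASCII content is reported as UTF-8 here, which is harmless because ASCII bytes mean the same in both encodings. A short ANSI text can, in rare cases, happen to be valid UTF-8 and would then be misdetected.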

Posts: 165
Joined: Fri Aug 15, 2008 8:58 am

Re: UTF-8 (no BOM) format support?

Post by Diamen »

How does Notepad recognize it even with a small file?
If the file contains a 0xC3 byte followed by a byte in the range 0x80-0xBF (as in the UTF-8 encoding of "À", "É", "ñ", ...), it is much more likely to be UTF-8 than ANSI, even if not necessarily so.
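
For reference, a small Python sketch of the byte patterns involved. The helper `has_c3_pair` is hypothetical and only mirrors the heuristic described above; it is not how Notepad or AkelPad are known to work.

```python
def has_c3_pair(data: bytes) -> bool:
    """Hypothetical heuristic: is there a 0xC3 byte followed by 0x80-0xBF?"""
    return any(
        first == 0xC3 and 0x80 <= second <= 0xBF
        for first, second in zip(data, data[1:])
    )


text = "òà"
utf8_bytes = text.encode("utf-8")
ansi_bytes = text.encode("cp1252")

print(utf8_bytes.hex(" "))  # c3 b2 c3 a0
print(ansi_bytes.hex(" "))  # f2 e0

print(has_c3_pair(utf8_bytes))  # True
print(has_c3_pair(ansi_bytes))  # False

# Reading the UTF-8 bytes as cp1252 gives "Ã²Ã" plus a no-break space,
# which is the typical garbage seen when UTF-8 is opened as ANSI.
misread = utf8_bytes.decode("cp1252")
```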

DV
Posts: 1292
Joined: Thu Nov 16, 2006 11:53 am
Location: Kyiv, Ukraine

Re: UTF-8 (no BOM) format support?

Post by DV »

According to Instructor, it's enough to specify just one setting to detect UTF-8 without BOM properly:

Default codepage:
65001 (UTF-8)
However, be sure to save your file as either "UTF-8 no BOM" or "UTF-8"!
File saving is a manual action where you can specify the desired encoding.
If you specify an ANSI encoding (such as 1250, 1251, 1252, etc.), AkelPad explicitly warns you:

Line "1" contains symbols which will be lost at saving in this encoding. Continue?

Posts: 165
Joined: Fri Aug 15, 2008 8:58 am

Re: UTF-8 (no BOM) format support?

Post by Diamen »

DV wrote: Mon Jan 27, 2025 2:57 pm According to Instructor, it's enough to specify just one setting to detect UTF-8 without BOM properly:

Default codepage:
65001 (UTF-8)
It's not enough to specify just this setting to detect UTF-8 without BOM properly.
If I open a 1252 ANSI file, AkelPad loads it as UTF-8 without BOM instead of correctly as 1252 ANSI.

DV
Posts: 1292
Joined: Thu Nov 16, 2006 11:53 am
Location: Kyiv, Ukraine

Re: UTF-8 (no BOM) format support?

Post by DV »

Diamen wrote: Tue Jan 28, 2025 11:40 am It's not enough to specify just this setting to detect UTF-8 without BOM properly.
If I open a 1252 ANSI file, AkelPad loads it as UTF-8 without BOM instead of correctly as 1252 ANSI.
You are mixing 2 different things. Let's isolate them:
1. For proper detection of "UTF-8 without BOM", it is enough to set the "Default codepage" to "65001 (UTF-8)". It will detect "UTF-8 without BOM" properly.
2. For proper detection of ANSI encodings, you need to set the "Codepage recognition" to the proper value such as "Western European".

These are two different options that work independently.
When the "Default codepage" is set to "65001 (UTF-8)", AkelPad first checks whether the file being opened can be recognized as "UTF-8 without BOM". If it can, AkelPad identifies it as "UTF-8 without BOM" without further attempts to detect an ANSI encoding, because this option literally says: "Treat files as UTF-8 by default".
If the file being opened contains bytes that cannot be interpreted as a UTF-8 sequence, AkelPad uses the "Codepage recognition" setting to detect an ANSI encoding.

Posts: 165
Joined: Fri Aug 15, 2008 8:58 am

Re: UTF-8 (no BOM) format support?

Post by Diamen »

I tried your settings, but they do not work:
"Default codepage" set to "65001 (UTF-8)"
"Codepage recognition" set to "Western European" or "none"
When I load a 1252 ANSI file with accented characters, AkelPad loads it as UTF-8 and shows Chinese characters.

DV
Posts: 1292
Joined: Thu Nov 16, 2006 11:53 am
Location: Kyiv, Ukraine

Re: UTF-8 (no BOM) format support?

Post by DV »

Diamen wrote: Tue Jan 28, 2025 10:35 pm I tried your settings, but they do not work:
"Default codepage" set to "65001 (UTF-8)"
"Codepage recognition" set to "Western European" or "none"
When I load a 1252 ANSI file with accented characters, AkelPad loads it as UTF-8 and shows Chinese characters.
You are correct, this needs to be addressed in AkelPad.
I've contacted Instructor regarding this issue.

Posts: 165
Joined: Fri Aug 15, 2008 8:58 am

Re: UTF-8 (no BOM) format support?

Post by Diamen »

ty

Site Admin
Posts: 6403
Joined: Thu Jul 06, 2006 7:20 am

Re: UTF-8 (no BOM) format support?

Post by Instructor »

Diamen
Test version x86 / x64

Posts: 165
Joined: Fri Aug 15, 2008 8:58 am

Re: UTF-8 (no BOM) format support?

Post by Diamen »

It seems that it now works well with:
Codepage recognition:
Western European (1252, OEM, UTF-8)
Default codepage:
65001 (UTF-8)
New file:
65001 (UTF-8) without BOM
ty.

Posts: 165
Joined: Fri Aug 15, 2008 8:58 am

Re: UTF-8 (no BOM) format support?

Post by Diamen »

With these settings, if I save an empty file as UTF-8 without BOM and close it, it reopens as ANSI.
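
One plausible reason, shown as a quick Python check rather than a statement about AkelPad's internals: an empty file saved as UTF-8 without BOM contains zero bytes, so on disk it is identical to an empty ANSI file. A detector has nothing to examine and can only fall back to some default.

```python
import codecs

empty_utf8 = "".encode("utf-8")           # UTF-8 without BOM
empty_ansi = "".encode("cp1252")          # ANSI (Windows-1252)
empty_with_bom = codecs.BOM_UTF8 + empty_utf8

print(empty_utf8 == empty_ansi)  # True: both are zero bytes long
print(len(empty_with_bom))       # 3: only the BOM marks the file as UTF-8
```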