Why can't some .txt files be displayed correctly?

Posts: 24
Joined: Tue Nov 10, 2009 2:43 am
Location: Beijing, China


Post by akyahoo

If Codepage recognition: is set to None, AkelPad can detect (given a sufficient buffer size) only Unicode files:

with a BOM (byte order mark) present in the file;
UTF-16LE or UTF-16BE without a BOM.
The Buffer: value is the number of characters tested by the recognition algorithm. It must be large enough for the internal algorithm to discern the codepage. The minimum buffer size needed to correctly determine the codepage or Unicode type also varies somewhat with the size of the file.
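To make the two detectable cases concrete, here is a minimal Python sketch of BOM detection plus a zero-byte heuristic for BOM-less UTF-16, applied to a limited buffer. This is not AkelPad's actual algorithm: the function name `detect_unicode`, the `buffer_size` parameter (counted in bytes here, not characters) and the 40%/10% thresholds are hypothetical. It does show why the buffer size matters: the heuristic only has evidence once enough bytes have been sampled.

```python
from typing import Optional

# Hypothetical sketch, NOT AkelPad's implementation.
# Byte order marks checked at the start of the sample (UTF-32 is ignored here).
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8 BOM"),
    (b"\xff\xfe", "UTF-16LE BOM"),
    (b"\xfe\xff", "UTF-16BE BOM"),
]


def detect_unicode(data: bytes, buffer_size: int = 1024) -> Optional[str]:
    """Return a Unicode type label, or None if the sample gives no evidence."""
    sample = data[:buffer_size]  # only the first buffer_size bytes are examined
    for bom, label in BOMS:
        if sample.startswith(bom):
            return label

    # BOM-less UTF-16: text that is mostly ASCII/Latin-1 has a zero byte in
    # every other position (odd offsets for LE, even offsets for BE).
    pairs = len(sample) // 2
    if pairs == 0:
        return None
    even_zeros = sum(1 for i in range(0, 2 * pairs, 2) if sample[i] == 0)
    odd_zeros = sum(1 for i in range(1, 2 * pairs, 2) if sample[i] == 0)
    if odd_zeros >= 0.4 * pairs and even_zeros < 0.1 * pairs:
        return "UTF-16LE"
    if even_zeros >= 0.4 * pairs and odd_zeros < 0.1 * pairs:
        return "UTF-16BE"
    # Legacy codepages such as GBK or Big5 produce no such pattern.
    return None
```

For example, `detect_unicode("hello".encode("utf-16-le"))` returns `"UTF-16LE"`, while GBK-encoded Chinese text returns `None`, which matches the manual: with recognition set to None, only Unicode files are detected.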

==========================================================


I have checked the manual. Codepage recognition in my AkelPad is set to None.

Sometimes it cannot display Chinese characters correctly. Most Chinese text is encoded in GB2312, GBK or Big5, not UTF-8. :roll:

Is this the cause? What should I do when AkelPad cannot display the characters? :?:

Posts: 147
Joined: Fri Feb 08, 2008 6:41 pm
Location: British Columbia, Canada

Post by Surveyor

akyahoo wrote: Sometimes, it cannot display Chinese characters correctly.
I have not checked with Instructor on this topic, but perhaps you already know that Windows contains MANY codepages (how many depends on your installation). A codepage often shares most of its characters with several others and has only a few unique ones. Your text files may not use enough of the characters unique to their codepage, so the AkelPad algorithm cannot make the determination.
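The ambiguity is easy to demonstrate with a short Python sketch: the same bytes often decode without error under several codepages, so a successful decode proves nothing on its own. The helper `plausible_codepages` and its candidate list are hypothetical illustrations, not part of AkelPad.

```python
from typing import List, Sequence

# Hypothetical candidate list; these are Python codec names, not AkelPad codepage IDs.
CANDIDATES = ("gbk", "big5", "shift_jis", "cp1252", "latin-1")


def plausible_codepages(data: bytes, candidates: Sequence[str] = CANDIDATES) -> List[str]:
    """Return every candidate codepage under which `data` decodes without error."""
    plausible = []
    for name in candidates:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        plausible.append(name)
    return plausible


sample = "中文".encode("gbk")  # b'\xd6\xd0\xce\xc4'
print(plausible_codepages(sample))
```

For these two characters, GBK, Shift_JIS (reading them as half-width katakana), cp1252 (reading them as "ÖÐÎÄ") and Latin-1 all accept the bytes, so a detector needs longer text and some statistical scoring to choose between them.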

If you are not getting the correct codepage, perhaps it is better to load the file using the codepage that you know is correct. If you enable the option "Options/Settings.../Registry/Remember code page", AkelPad tries to keep track of the codepage used for each file, but only for the files on the "Recent files" list; this may help a little.
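As a rough model of the behavior described here, the Python sketch below remembers one codepage per file but, like a Recent files list, forgets the oldest entry once the list is full; `open_text` then loads a file with the remembered codepage or an explicitly chosen default. The class name `RecentCodepages`, the capacity of 10 and the `gbk` default are hypothetical; this only models the described behavior and is not how AkelPad is implemented.

```python
from collections import OrderedDict
from pathlib import Path
from typing import Optional


class RecentCodepages:
    """Hypothetical 'Recent files' list that also remembers each file's codepage."""

    def __init__(self, capacity: int = 10) -> None:
        self.capacity = capacity  # hypothetical length of the Recent files list
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def remember(self, path: str, codepage: str) -> None:
        self._items[path] = codepage
        self._items.move_to_end(path)  # most recently used entry goes last
        while len(self._items) > self.capacity:
            # The oldest file drops off the list, and its codepage is forgotten.
            self._items.popitem(last=False)

    def lookup(self, path: str) -> Optional[str]:
        return self._items.get(path)


def open_text(path: str, recent: RecentCodepages, default: str = "gbk") -> str:
    """Load a file with its remembered codepage, falling back to a known default."""
    codepage = recent.lookup(path) or default
    text = Path(path).read_text(encoding=codepage)
    recent.remember(path, codepage)
    return text
```

This is why the option "may help a little": once a file falls off the list, its codepage has to be chosen again.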

I'm not sure this behavior is a failing in AkelPad; it probably just points out the complexities of language.

Also, the "Options/Settings.../General" page contains settings for a default codepage; if you always work in the same codepage, set it as the default.

Posts: 26
Joined: Sun Mar 02, 2008 12:53 pm


Post by infimum

akyahoo wrote: I have checked the manual. Codepage recognition in my AkelPad is set to None.

Sometimes it cannot display Chinese characters correctly. Most Chinese text is encoded in GB2312, GBK or Big5, not UTF-8. :roll:

Is this the cause? What should I do when AkelPad cannot display the characters? :?:
AkelPad doesn't automatically recognize those Chinese encodings.

Automatic recognition actually involves a very intricate algorithm. Considering how many encodings there are, it's too much to ask of a free program. That's one of the reasons Unicode was invented.
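Since the manual excerpt at the top of the thread says AkelPad does detect Unicode files that start with a BOM, one practical workaround, once you know a file's real codepage, is to convert it to UTF-8 with a BOM. A minimal Python sketch, assuming the source codepage is already known (GBK here is only an example); `convert_file` is a hypothetical helper name.

```python
from pathlib import Path


def to_utf8_with_bom(data: bytes, source_codepage: str) -> bytes:
    """Re-encode legacy-codepage bytes as UTF-8 prefixed with a byte order mark."""
    # Raises UnicodeDecodeError if the bytes are invalid in source_codepage.
    text = data.decode(source_codepage)
    # The 'utf-8-sig' codec writes the EF BB BF BOM before the encoded text.
    return text.encode("utf-8-sig")


def convert_file(src: str, dst: str, source_codepage: str) -> None:
    """Hypothetical helper: convert one file on disk."""
    Path(dst).write_bytes(to_utf8_with_bom(Path(src).read_bytes(), source_codepage))
```

Choosing the wrong source codepage still produces wrong text whenever the bytes happen to be valid in it, so the conversion only helps once the correct codepage has been confirmed by eye.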

In fact, 100% accuracy is almost impossible. Many shareware programs aren't good at this either. It's rare to find programmers who are well versed in both programming and natural languages :wink:


Post by akyahoo

Thank you.