utf-8 recognition
Posted: Wed Jan 14, 2009 7:55 am
by harfman
Hi,
Thanks for the new feature in the latest version:
Added: Chinese recognition (UTF-8).
However, automatic recognition of Korean UTF-8 characters is still unavailable.
If you want to test, visit
http://www.cineast.co.kr/ and click View Source;
the Korean UTF-8 characters are then shown broken.
Posted: Wed Jan 14, 2009 9:36 am
by Instructor
Test version for Japanese and Korean codepage recognition.
Posted: Wed Jan 14, 2009 10:53 am
by lupin1984
It doesn't work.
If the default codepage is UTF-8, codepage recognition doesn't work (you can choose None, Cyrillic, Latin, or Chinese).
BOM-less UTF-8 text can be auto-recognized,
but I don't always use UTF-8.

Posted: Wed Jan 14, 2009 11:04 am
by harfman
Thanks for your fast reply.
Test version 4.14 still doesn't work for Korean UTF-8 characters.
Posted: Wed Jan 14, 2009 11:19 am
by Instructor
lupin1984 & harfman
1. Turn on "Options->Settings...->General->Codepage recognition->Chinese or Korean".
2. Turn off "Options->Settings...->Registry->Remember code page" (not necessary, but gives clean results).
3. Change the default codepage back to your native one (if you changed it). Don't use UTF-8 as your default ANSI codepage.
"Options->Settings...->General->Default codepage"
4. Open the file again.
Note:
The file must not be too small.
Posted: Wed Jan 14, 2009 12:46 pm
by harfman
OK, it works well. Thanks for your efforts.
Posted: Fri Jan 16, 2009 3:32 pm
by u_u86
Since the work is going this way, is it possible to add recognition of the Turkish codepage (ANSI 1254)? If you need any information about it, feel free to ask.
Posted: Fri Jan 16, 2009 4:48 pm
by Instructor
u_u86
Test version "Turkish (OEM, UTF-8)".
Posted: Fri Jan 16, 2009 5:27 pm
by u_u86
It doesn't work for me. And what about ANSI 1254? Example: Turkish.rc, the AkelPad language resource file, is in cp1254. When I open it (with the default codepage set to cp1251 or cp1252), I want it to open automatically in cp1254.
The possible workaround of setting the default codepage to cp1254 and also recognizing cp1251 doesn't work either: the text is always recognized as cp1251.
File with Turkish text:
http://www.box.net/shared/iio2nq3dum
The only difference between 1254 and 1252 is about 6 characters.
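To see how small that difference is, the following Python sketch compares the two codepages byte by byte with Python's built-in cp1252 and cp1254 codecs. It is only an illustration and has nothing to do with AkelPad's internals; the helper name decode_byte is made up for this example.

```python
def decode_byte(value: int, codepage: str):
    """Decode one byte in the given codepage; return None if the byte is unassigned."""
    try:
        return bytes([value]).decode(codepage)
    except UnicodeDecodeError:
        return None


differences = {}
for value in range(0x80, 0x100):
    latin = decode_byte(value, "cp1252")
    turkish = decode_byte(value, "cp1254")
    # Compare only bytes that are assigned in both codepages.
    if latin is not None and turkish is not None and latin != turkish:
        differences[value] = (latin, turkish)

# Print code points instead of the characters so the output is console-safe.
for value, (latin, turkish) in differences.items():
    print(f"0x{value:02X}: cp1252 U+{ord(latin):04X}, cp1254 U+{ord(turkish):04X}")
```

Every byte that differs is still an ordinary letter in both codepages, so the raw bytes alone give no hint which of the two was intended.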
Posted: Fri Jan 16, 2009 6:43 pm
by Instructor
u_u86 wrote:Don't work for me.
I hope you understand that you must turn on "Options->Settings...->General->Codepage recognition->Turkish (OEM, UTF-8)", as I wrote earlier in this thread.
u_u86 wrote:... i want to automaticaly open it in cp1254
It will work as you want only if you set 1254 as your default codepage.
Posted: Sat Jan 17, 2009 3:23 am
by u_u86
Of course I set it.
If I set cp1254 as the default, it always opens all files in that codepage, so what is the reason for recognition? Only to recognize UTF-8 and OEM?
Cyrillic recognition works well when the default codepage is set to 1252 or 1254, and it recognizes the text as 1251. Maybe it uses some algorithm?
Posted: Sat Jan 17, 2009 5:26 am
by Instructor
u_u86 wrote:Only to recognize UTF-8 and OEM?
Yes.
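The behavior described here, where recognition settles only UTF-8 (and OEM) and any other ANSI text falls back to the default codepage, could look roughly like the Python sketch below. It is a simplified illustration under stated assumptions, not AkelPad's actual algorithm: pick_codepage is a hypothetical name, and OEM detection is left out entirely.

```python
import codecs


def pick_codepage(data: bytes, default_codepage: str) -> str:
    """Hypothetical detection order: BOM, then UTF-8 validity, then the default."""
    # 1. A byte order mark identifies UTF-8 unambiguously.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # 2. Non-ASCII bytes that happen to form valid UTF-8 are unlikely to be ANSI text.
    if not data.isascii():
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            pass
    # 3. Otherwise the bytes do not reveal which single-byte codepage was used.
    return default_codepage


turkish_ansi = "Türkçe ğüş".encode("cp1254")
print(pick_codepage(turkish_ansi, "cp1252"))  # falls back to the default
print(pick_codepage(turkish_ansi, "cp1254"))  # correct only because the default matches
```

In this sketch a cp1254 file opens correctly only when cp1254 is the default, which matches the answer given above.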
Posted: Sat Jan 17, 2009 6:12 am
by u_u86
OK, understood. Thanks for the implementation!
Posted: Fri Jan 23, 2009 2:50 am
by lupin1984
You can test these two programs; they have no UTF-8 recognition problem and work perfectly.
But AkelPad is faster, lighter, and more efficient.
They are open source software. Thanks.
Notepad++
http://notepad-plus.sourceforge.net/uk/site.htm
Notepad2
http://www.flos-freeware.ch/notepad2.html
Posted: Fri Jan 23, 2009 3:42 am
by Instructor
lupin1984
Did you get my answers to your emails? Try reading them...
lupin1984 wrote:I tested on XP and Vista.
Vista is OK, but XP...
Thanks.
This one is detected correctly. Make sure you follow all these steps on XP:
1. Turn on "Options->Settings...->General->Codepage recognition->Chinese".
2. Turn off "Options->Settings...->Registry->Remember code page" (not necessary, but gives clean results).
3. Change the default codepage back to your native one (if you changed it).
Don't use UTF-8 as your default ANSI codepage.
"Options->Settings...->General->Default codepage"
4. Open the file again.
lupin1984 wrote:This text file can't be detected.
This one is too small (it does not have many Chinese characters) to be detected as UTF-8. Try duplicating its contents and it will be detected correctly:
Code:
测试文本thanks谢谢
18:42 2009/1/13
测试文本thanks谢谢
18:42 2009/1/13
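Why duplicating the text helps can be illustrated with a small Python sketch. It assumes a detector that accepts BOM-less UTF-8 only after seeing a minimum number of multi-byte sequences. The threshold of 8 and both function names are hypothetical, chosen only for this example; AkelPad's real criteria may differ.

```python
def count_multibyte_sequences(data: bytes) -> int:
    """Return the number of UTF-8 multi-byte sequences, or -1 if data is not valid UTF-8."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return -1
    # Every non-ASCII code point is stored as a 2- to 4-byte sequence in UTF-8.
    return sum(1 for ch in text if ord(ch) > 0x7F)


def looks_like_utf8(data: bytes, min_sequences: int = 8) -> bool:
    # min_sequences is a hypothetical threshold, not a documented AkelPad value.
    return count_multibyte_sequences(data) >= min_sequences


short_sample = "测试文本thanks谢谢\n18:42 2009/1/13\n".encode("utf-8")
print(looks_like_utf8(short_sample))      # False: only 6 multi-byte characters
print(looks_like_utf8(short_sample * 2))  # True: 12 multi-byte characters
```

With so few multi-byte sequences, a short file gives a statistical detector too little evidence, which is consistent with the advice that the file must not be too small.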
lupin1984 wrote:It's Firefox's Simplified Chinese language pack.
All the text files have no BOM; pageInfo.properties can't be auto-detected. You can test it.
Increase the recognition buffer, for example to 8096:
"Options->Settings...->General->Buffer"
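The effect of the buffer size can be sketched in Python, assuming the detector inspects only the first N bytes of a file. The file layout below (a long ASCII header followed by Chinese text) and the 1024-byte starting value are assumptions made for illustration, and non_ascii_in_sample is a hypothetical helper, not an AkelPad function.

```python
import codecs


def non_ascii_in_sample(data: bytes, buffer_size: int) -> int:
    """Count the non-ASCII characters visible in the first buffer_size bytes."""
    sample = data[:buffer_size]
    decoder = codecs.getincrementaldecoder("utf-8")()
    # final=False tolerates a multi-byte sequence cut off at the end of the sample.
    text = decoder.decode(sample, final=False)
    return sum(1 for ch in text if ord(ch) > 0x7F)


# Hypothetical file: 2200 bytes of ASCII header, then lines with Chinese text.
header = ("# license header line\n" * 100).encode("utf-8")
body = "title=页面信息\n".encode("utf-8") * 50
data = header + body

print(non_ascii_in_sample(data, 1024))  # 0: the sample ends inside the ASCII header
print(non_ascii_in_sample(data, 8096))  # 200: the sample now reaches the Chinese text
```

If the small sample contains only ASCII, there is nothing to distinguish UTF-8 from an ANSI codepage, so a larger buffer gives the recognition something to work with.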