utf-8 recognition

Posts: 15
Joined: Wed Jan 14, 2009 7:45 am

Post by harfman »

Hi,

Thanks for the new feature in the latest version:

Added: Chinese recognition (UTF-8).

However, automatic recognition of Korean UTF-8 characters is still unavailable.

If you want to test, visit http://www.cineast.co.kr/ and click view source.

The Korean UTF-8 characters will then be shown as broken text.

Site Admin
Posts: 6311
Joined: Thu Jul 06, 2006 7:20 am

Post by Instructor »

Test version for Japanese and Korean codepage recognition.

Posts: 20
Joined: Mon May 07, 2007 6:14 pm

Post by lupin1984 »

It doesn't work :(

If the default codepage is UTF-8, codepage recognition doesn't work (you can choose None, Cyrillic, Latin, or Chinese).

UTF-8 text without a BOM can be auto-recognized :D

But I don't always use UTF-8 :)

Post by harfman »

Thanks for your fast reply.

Test version 4.14 still doesn't work for Korean UTF-8 characters.

Post by Instructor »

lupin1984 & harfman
1. Turn on "Options->Settings...->General->Codepage recognition->Chinese or Korean".
2. Turn off "Options->Settings...->Registry->Remember code page" (not necessary, but it gives clean results).
3. Change the default codepage to your native one (if you changed it). Don't use UTF-8 as your default ANSI codepage.
"Options->Settings...->General->Default codepage"
4. Open the file again.

Note:
The file must not be too small.
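
To illustrate why a very small file may not be recognized, here is a minimal Python sketch of BOM-less UTF-8 detection. It is not AkelPad's actual algorithm: the function names and the `min_sequences` threshold are assumptions made for illustration. The idea is that data is accepted as UTF-8 only if it decodes cleanly and contains enough multi-byte characters to rule out coincidence.

```python
def count_multibyte_chars(data: bytes) -> int:
    """Return the number of non-ASCII characters in data if it is
    well-formed UTF-8, or -1 if it is not valid UTF-8 at all."""
    try:
        text = data.decode("utf-8")  # strict decoding rejects malformed input
    except UnicodeDecodeError:
        return -1
    return sum(1 for ch in text if ord(ch) > 0x7F)


def looks_like_utf8(data: bytes, min_sequences: int = 8) -> bool:
    """Hypothetical heuristic: require at least min_sequences multi-byte
    characters before reporting UTF-8. The threshold is an assumption."""
    return count_multibyte_chars(data) >= min_sequences
```

With such a threshold, a file holding only a couple of Korean characters is valid UTF-8 but stays below the limit, while the same text repeated a few times passes.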

Post by harfman »

OK, it works well. Thanks for your efforts.

Posts: 16
Joined: Wed Jul 09, 2008 7:04 am

Post by u_u86 »

Since the work is going this way, is it possible to add recognition of the Turkish codepage (ANSI 1254)? If you need any information about it, feel free to ask.

Post by Instructor »

u_u86
Test version "Turkish (OEM, UTF-8)".

Post by u_u86 »

Instructor wrote:u_u86
Test version "Turkish (OEM, UTF-8)".
It doesn't work for me. And what about ANSI 1254? Example: Turkish.rc, the AkelPad language resource file, is in cp1254. When I open it (with the default codepage set to cp1251 or cp1252), I want it to open automatically in cp1254.

The possible workaround of setting the default codepage to cp1254 and recognizing cp1251 doesn't work either: the text is always recognized as cp1251.

File with Turkish text: http://www.box.net/shared/iio2nq3dum
The only difference between cp1254 and cp1252 is about 6 characters.
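
The "about 6 characters" figure can be checked with a short Python sketch that compares the two codepages byte by byte using Python's built-in `cp1252` and `cp1254` codecs. The helper names are made up for this example. Bytes that are undefined in either codepage are skipped, so only positions that map to two different characters are reported.

```python
def decode_byte(value: int, codec: str):
    """Decode a single byte with the given codec; None if it is undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def differing_bytes(codec_a: str, codec_b: str) -> list:
    """List byte values that both codecs define but map to different characters."""
    result = []
    for value in range(256):
        a = decode_byte(value, codec_a)
        b = decode_byte(value, codec_b)
        if a is not None and b is not None and a != b:
            result.append(value)
    return result


for value in differing_bytes("cp1252", "cp1254"):
    print(hex(value), decode_byte(value, "cp1252"), decode_byte(value, "cp1254"))
```

Because so few byte values differ, and all of them are valid letters in both codepages, a byte-level check alone cannot tell the two apart; any recognition would have to rely on something like letter statistics.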

Post by Instructor »

u_u86 wrote:Don't work for me.
I hope you understand that you must turn on "Options->Settings...->General->Codepage recognition->Turkish (OEM, UTF-8)", as I wrote earlier in this thread.
u_u86 wrote:... i want to automaticaly open it in cp1254
It will work the way you want only if you set 1254 as your default codepage.

Post by u_u86 »

Of course I set it.
If I set cp1254 as the default, it always opens all files in that codepage, so what is the recognition for? Only to recognize UTF-8 and OEM?

Cyrillic recognition works well when the default codepage is set to 1252 or 1254: files are recognized as 1251. Maybe it uses some algorithm?

Post by Instructor »

u_u86 wrote:Only to recognize UTF-8 and OEM?
Yes.

Post by u_u86 »

OK, understood. Thanks for the implementation!

Post by lupin1984 »

You can test these two programs: no UTF-8 recognition problems, perfect.

But AkelPad is faster, lighter, and more efficient :D

They are open source software, thanks.

Notepad++
http://notepad-plus.sourceforge.net/uk/site.htm

notepad2
http://www.flos-freeware.ch/notepad2.html

Post by Instructor »

lupin1984
Did you get my answers to your emails? Try reading them...
lupin1984 wrote:i test on xp and vista

vista is ok ,but xp...

thanks
This one is detected correctly. Make sure you follow all these steps on XP:
1. Turn on "Options->Settings...->General->Codepage recognition->Chinese".
2. Turn off "Options->Settings...->Registry->Remember code page" (not necessary, but it gives clean results).
3. Change the default codepage to your native one (if you changed it). Don't use UTF-8 as your default ANSI codepage.
"Options->Settings...->General->Default codepage"
4. Open the file again.
lupin1984 wrote:this text file can't be detected
This one is too small (it does not have many Chinese characters) to be detected as UTF-8. Try copying the contents a second time and it will be detected correctly:

Code:

测试文本thanks谢谢
18:42 2009/1/13
测试文本thanks谢谢
18:42 2009/1/13
lupin1984 wrote:it's firefox's simple chinese lang package

all no-bom text files,the pageInfo.properties can't be auto detected . you can test
Increase the recognition buffer, for example to 8096:

"Options->Settings...->General->Buffer"
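
A rough Python sketch of why a larger recognition buffer can help. It assumes that detection only looks at the first `buffer_size` bytes of the file, and that a file like pageInfo.properties begins with a long ASCII-only header; both are assumptions for illustration, and the function is not AkelPad's code. If the non-ASCII text starts after the buffer ends, the detector sees nothing but ASCII.

```python
def multibyte_chars_in_prefix(data: bytes, buffer_size: int) -> int:
    """Count non-ASCII characters visible in the first buffer_size bytes.

    Assumes data is valid UTF-8. errors="ignore" is used only so that a
    multi-byte sequence cut in half by the buffer boundary is dropped
    instead of raising an exception.
    """
    prefix = data[:buffer_size]
    text = prefix.decode("utf-8", errors="ignore")
    return sum(1 for ch in text if ord(ch) > 0x7F)


# Hypothetical file: 1500 bytes of ASCII comments followed by Chinese text.
header = b"# comment line\n" * 100
body = "标题=页面信息\n".encode("utf-8")
sample = header + body

print(multibyte_chars_in_prefix(sample, 1024))  # buffer ends inside the header
print(multibyte_chars_in_prefix(sample, 8096))  # buffer covers the whole file
```

With the smaller buffer the count is zero, so a threshold-based detector has nothing to work with; with the larger buffer all the Chinese characters become visible.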