| Author |
Message |
harfman
Joined: 14 Jan 2009 Posts: 14
|
Posted: Wed Jan 14, 2009 7:55 am Post subject: utf-8 recognition |
|
|
Hi,
Thanks for the new feature in the latest version:
Added: Chinese recognition (UTF-8).
But automatic recognition of Korean UTF-8 text is still unavailable.
If you want to test it, visit http://www.cineast.co.kr/ and click View Source;
the Korean UTF-8 characters will be displayed garbled.
 |
Instructor Site Admin
Joined: 06 Jul 2006 Posts: 4644
|
Posted: Wed Jan 14, 2009 9:36 am Post subject: |
|
|
Test version for Japanese and Korean codepage recognition.
 |
lupin1984
Joined: 07 May 2007 Posts: 20
|
Posted: Wed Jan 14, 2009 10:53 am Post subject: |
|
|
It doesn't work.
If the default codepage is UTF-8, codepage recognition doesn't work (you can choose None, Cyrillic, Latin, or Chinese).
UTF-8 text without a BOM can be auto-recognized,
but I don't always use UTF-8.
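For readers unfamiliar with the BOM / no-BOM distinction mentioned above, here is a minimal Python sketch of the general idea. It is not AkelPad's actual recognition code; the function name `guess_encoding` and the `default_codepage` parameter are made up for illustration, and the fallback simply stands in for the editor's "Default codepage" setting.

```python
import codecs

def guess_encoding(data, default_codepage="cp1252"):
    """Hypothetical helper: pick an encoding name for raw file bytes."""
    # A UTF-8 BOM (EF BB BF) at the start is an explicit signature.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Without a BOM, the bytes must at least be well-formed UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return default_codepage
    # Pure ASCII is valid UTF-8 too, but gives no evidence either way,
    # so fall back to the configured default codepage.
    if all(b < 0x80 for b in data):
        return default_codepage
    return "utf-8"

if __name__ == "__main__":
    korean = "한국어"
    print(guess_encoding(korean.encode("utf-8")))
    print(guess_encoding(codecs.BOM_UTF8 + korean.encode("utf-8")))
    print(guess_encoding(korean.encode("cp949"), default_codepage="cp949"))
```

A real editor would probably be less strict than this (for example, it may only inspect part of the file), so treat the sketch as an explanation of the concept only.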
 |
harfman
Joined: 14 Jan 2009 Posts: 14
|
Posted: Wed Jan 14, 2009 11:04 am Post subject: |
|
|
Thanks for your fast reply.
Test version 4.14 still doesn't work for Korean UTF-8 characters.
 |
Instructor Site Admin
Joined: 06 Jul 2006 Posts: 4644
|
Posted: Wed Jan 14, 2009 11:19 am Post subject: |
|
|
lupin1984 & harfman
1. Turn on "Options->Settings...->General->Codepage recognition->Chinese or Korean".
2. Turn off "Options->Settings...->Registry->Remember code page" (not necessary, but it gives clean results).
3. Change the default codepage back to your native one (if you changed it). Don't use UTF-8 as your default ANSI codepage.
"Options->Settings...->General->Default codepage"
4. Open the file again.
Note:
The file must not be too small.
 |
harfman
Joined: 14 Jan 2009 Posts: 14
|
Posted: Wed Jan 14, 2009 12:46 pm Post subject: |
|
|
OK, it works well. Thanks for your efforts.
 |
u_u86
Joined: 09 Jul 2008 Posts: 16
|
Posted: Fri Jan 16, 2009 3:32 pm Post subject: |
|
|
Since the work is going this way, is it possible to add recognition of the Turkish codepage (ANSI 1254)? If you need any information about it, feel free to ask.
 |
Instructor Site Admin
Joined: 06 Jul 2006 Posts: 4644
|
Posted: Fri Jan 16, 2009 4:48 pm Post subject: |
|
|
u_u86
Test version "Turkish (OEM, UTF-8)". |
 |
u_u86
Joined: 09 Jul 2008 Posts: 16
|
Posted: Fri Jan 16, 2009 5:27 pm Post subject: |
|
|
It doesn't work for me. And what about ANSI 1254? Example: Turkish.rc, the AkelPad language resource file, is in cp1254; when I open it (with the default codepage set to cp1251 or cp1252), I want it to open automatically in cp1254.
The possible workaround of setting the default codepage to cp1254 and also recognizing cp1251 doesn't work either: the text is always recognized as cp1251.
File with Turkish text: http://www.box.net/shared/iio2nq3dum
The only difference between cp1254 and cp1252 is about 6 characters.
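The "about 6 characters" remark can be checked with Python's built-in `cp1252` and `cp1254` codecs. The sketch below only compares the two code tables; it says nothing about how AkelPad itself works, and the helper names `decode_byte` and `differing_bytes` are invented for this example.

```python
def decode_byte(value, codepage):
    """Return the character for one byte, or None if the byte is unassigned."""
    try:
        return bytes([value]).decode(codepage)
    except UnicodeDecodeError:
        return None

def differing_bytes(cp_a="cp1252", cp_b="cp1254"):
    """List (byte, char_a, char_b) where both codepages assign different characters."""
    result = []
    for value in range(0x80, 0x100):
        a = decode_byte(value, cp_a)
        b = decode_byte(value, cp_b)
        # Skip bytes that are unassigned in either codepage.
        if a is not None and b is not None and a != b:
            result.append((value, a, b))
    return result

if __name__ == "__main__":
    for value, a, b in differing_bytes():
        print(f"0x{value:02X}: cp1252={a!r} cp1254={b!r}")
```

Because so few byte values differ, and every byte is a valid character in both codepages, a short text gives a detector very little to distinguish one single-byte Western codepage from the other.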
 |
Instructor Site Admin
Joined: 06 Jul 2006 Posts: 4644
|
Posted: Fri Jan 16, 2009 6:43 pm Post subject: |
|
|
u_u86 wrote:
Don't work for me.
I hope you understand that you must turn on "Options->Settings...->General->Codepage recognition->Turkish (OEM, UTF-8)", as I wrote earlier in this thread.
u_u86 wrote:
... i want to automaticaly open it in cp1254
It will work the way you want only if you set 1254 as your default codepage.
 |
u_u86
Joined: 09 Jul 2008 Posts: 16
|
Posted: Sat Jan 17, 2009 3:23 am Post subject: |
|
|
Of course, I did.
If I set cp1254 as the default, all files are always opened in that codepage, so what is the point of recognition? Only to recognize UTF-8 and OEM?
Cyrillic recognition works well when the default codepage is set to 1252 or 1254, and it recognizes 1251. Maybe it uses some algorithm?
 |
Instructor Site Admin
Joined: 06 Jul 2006 Posts: 4644
|
Posted: Sat Jan 17, 2009 5:26 am Post subject: |
|
|
u_u86 wrote:
Only to recognize UTF-8 and OEM?
Yes.
 |
u_u86
Joined: 09 Jul 2008 Posts: 16
|
Posted: Sat Jan 17, 2009 6:12 am Post subject: |
|
|
OK, understood. Thanks for the implementation!
 |
lupin1984
Joined: 07 May 2007 Posts: 20
|
 |
Instructor Site Admin
Joined: 06 Jul 2006 Posts: 4644
|
Posted: Fri Jan 23, 2009 3:42 am Post subject: |
|
|
lupin1984
Did you get my answers to your emails? Try reading them...
Quote:
i test on xp and vista
vista is ok ,but xp...
thanks
This one is detected correctly. Make sure you follow all of these steps on XP:
1. Turn on "Options->Settings...->General->Codepage recognition->Chinese".
2. Turn off "Options->Settings...->Registry->Remember code page" (not necessary, but it gives clean results).
3. Change the default codepage back to your native one (if you changed it). Don't use UTF-8 as your default ANSI codepage.
"Options->Settings...->General->Default codepage"
4. Open the file again.
Quote:
this text file can't be detected
This one is too small (it does not contain enough Chinese characters) to be detected as UTF-8. Try duplicating the contents and it will be detected correctly:
| Code: | 测试文本thanks谢谢
18:42 2009/1/13
测试文本thanks谢谢
18:42 2009/1/13 |
Quote:
it's firefox's simple chinese lang package
all no-bom text files,the pageInfo.properties can't be auto detected . you can test
Increase the recognition buffer, for example to 8096:
"Options->Settings...->General->Buffer"
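To illustrate why both the amount of non-ASCII text and the buffer size matter, here is a rough Python sketch of a buffer-limited heuristic. It is only an assumption about how such a detector might behave, not AkelPad's real algorithm: the names `count_utf8_multibyte` and `looks_like_utf8`, the threshold `min_sequences=8`, and the default `buffer_size=1024` are all invented for the example.

```python
def count_utf8_multibyte(buf):
    """Count well-formed multi-byte UTF-8 sequences; return -1 if malformed.

    Simplified check: lead and continuation bytes only, without the
    overlong/surrogate rules of a full UTF-8 validator.
    """
    count = 0
    i = 0
    n = len(buf)
    while i < n:
        b = buf[i]
        if b < 0x80:            # plain ASCII carries no evidence
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:
            need = 1            # 2-byte sequence
        elif 0xE0 <= b <= 0xEF:
            need = 2            # 3-byte sequence
        elif 0xF0 <= b <= 0xF4:
            need = 3            # 4-byte sequence
        else:
            return -1           # invalid lead byte
        if i + need >= n:
            break               # sequence cut off by the end of the buffer
        if not all(0x80 <= buf[j] <= 0xBF for j in range(i + 1, i + need + 1)):
            return -1           # bad continuation byte
        count += 1
        i += need + 1
    return count

def looks_like_utf8(data, buffer_size=1024, min_sequences=8):
    """Inspect only the first buffer_size bytes (hypothetical threshold)."""
    return count_utf8_multibyte(data[:buffer_size]) >= min_sequences

if __name__ == "__main__":
    sample = "测试文本thanks谢谢\n18:42 2009/1/13\n".encode("utf-8")
    print(looks_like_utf8(sample))        # few Chinese characters
    print(looks_like_utf8(sample * 2))    # contents duplicated
    late = b"x" * 2000 + sample * 2       # non-ASCII text starts late in the file
    print(looks_like_utf8(late, buffer_size=1024))
    print(looks_like_utf8(late, buffer_size=8096))
```

Under these made-up numbers, the short sample falls below the threshold while the duplicated one passes, and a file whose non-ASCII text starts beyond the inspected buffer is only recognized once the buffer is enlarged.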
 |
|