Search using regular expressions

 
nbsp



Joined: 17 Jan 2016
Posts: 19

Posted: Wed Feb 03, 2016 4:58 am    Post subject: Search using regular expressions

1. If I use the Find dialog to search for a regex such as [^\x00-\x7F], AkelPad finds newline characters in addition to any non-ASCII characters. The same regex using "FindReplace.js" finds only non-ASCII characters.

2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
Instructor
Site Admin


Joined: 06 Jul 2006
Posts: 6188

Posted: Sun Feb 21, 2016 9:18 am

nbsp
1. AkelPad's new lines have negative values internally. Use [^\x00-\x7F\n].
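For comparison, a minimal sketch in Python's `re` module (an assumption for illustration; it is not AkelPad's engine). In an engine where the newline is the ordinary code point 0x0A, both patterns give the same result, so adding `\n` to the negated class is harmless elsewhere and only changes behavior in the Find dialog. The sample text is made up.

```python
import re

# Hypothetical sample: two non-ASCII letters and two newlines.
text = "caf\u00e9\nna\u00efve\n"

# Original pattern: anything outside the ASCII range.
without_newline_guard = re.findall(r"[^\x00-\x7F]", text)

# Suggested workaround: additionally exclude the newline explicitly.
with_newline_guard = re.findall(r"[^\x00-\x7F\n]", text)

print(without_newline_guard)
print(with_newline_guard)
```

Here the newline already falls inside `\x00-\x7F`, so neither pattern reports it.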
Instructor
Site Admin


Joined: 06 Jul 2006
Posts: 6188

Posted: Mon Feb 22, 2016 6:32 am

2. Test version
nbsp



Joined: 17 Jan 2016
Posts: 19

Posted: Sun Mar 06, 2016 10:25 pm

Instructor
Thanks. The test version works fine.
KDJ



Joined: 06 Mar 2010
Posts: 1907
Location: Poland

Posted: Sat Apr 23, 2016 4:57 pm

Instructor
What is the hex code of a new line (\n)? :)
Code:
[\x{0}-\x{FF}]         - does not match \n
[\x{100}-\x{FFFF}]     - matches \n, so the hex code lies between 100 and FFFF
[\x{10000}-\x{10FFFF}] - does not match \n
[\x{100}-\x{FFF8}]     - does not match \n
[\x{FFFA}-\x{FFFF}]    - does not match \n
[\x{FFF9}-\x{FFF9}]    - matches \n, so the hex code is probably FFF9
\x{FFF9}               - does not match \n, so unfortunately the hex code is not FFF9
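The narrowing procedure above is essentially a bisection over character-class ranges. A minimal sketch of the same idea in Python's `re` module (an assumption for illustration; `locate_code_point` is a hypothetical helper, and this says nothing about how AkelPad stores newlines internally):

```python
import re

def locate_code_point(ch: str, lo: int = 0x0000, hi: int = 0xFFFF) -> int:
    # Binary search using only range tests of the form [\uLLLL-\uMMMM].
    # Assumes the code point of `ch` lies somewhere in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        # Character class covering the lower half [lo, mid].
        pattern = f"[\\u{lo:04x}-\\u{mid:04x}]"
        if re.fullmatch(pattern, ch):
            hi = mid        # the character is in the lower half
        else:
            lo = mid + 1    # otherwise it must be in the upper half
    return lo

print(hex(locate_code_point("\n")))
```

In a conventional engine the final single-value test agrees with the range tests; the listing above shows that this last step is exactly where the Find dialog behaved differently.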
KDJ



Joined: 06 Mar 2010
Posts: 1907
Location: Poland

Posted: Sat Apr 23, 2016 6:45 pm    Post subject: Re: Search using regular expressions

nbsp wrote:
2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.

It seems to me that (q?)b\1 should match only b and qbq (and not qb).
SearchReplace.js does not match qb.
Instructor
Site Admin


Joined: 06 Jul 2006
Posts: 6188

Posted: Sat Apr 23, 2016 10:05 pm

KDJ wrote:
What is the hex code of a new line (\n)? :)
Code:
\x{7FFFFFEC}


KDJ wrote:
It seems to me that (q?)b\1 should match only b and qbq (and not qb).
Test version
KDJ



Joined: 06 Mar 2010
Posts: 1907
Location: Poland

Posted: Sun Apr 24, 2016 1:31 pm

Instructor
Now (q?)b\1 is OK, but [\x{100}-\x{FFFF}] still matches the new line.
nbsp



Joined: 17 Jan 2016
Posts: 19

Posted: Mon May 16, 2016 1:22 am    Post subject: Re: Search using regular expressions

Hi KDJ,
nbsp wrote:
The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.

KDJ wrote:
SearchReplace.js does not match qb.

Try this:
    1. Create a new file with the following content:
    Code:
    b
    qb
    qbq

    2. Execute "SearchReplace.js" and set:
    Code:
    What: (q?)b\1
    With: @
    Check: Regular expressions
    Direction: Beginning

    3. Clicking "Find all" yields the following on my system:
    Log::Output wrote:
    3:
    (1,1) b
    (2,2) qb
    (3,1) qbq

    4. Clicking "Replace all" changes the new file created in step 1 to:
    Code:
    @
    q@
    @
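Steps 3 and 4 can be cross-checked outside AkelPad. A minimal sketch using Python's `re` module (an assumption for illustration, not the engine behind "SearchReplace.js"), reading the pairs in the log as 1-based (line, column), which is also an assumption:

```python
import re

PATTERN = re.compile(r"(q?)b\1")
sample = "b\nqb\nqbq"

# "Find all": 1-based (line, column) of each match start, plus the matched text.
hits = []
for line_no, line in enumerate(sample.split("\n"), start=1):
    for m in PATTERN.finditer(line):
        hits.append((line_no, m.start() + 1, m.group(0)))

# "Replace all" with "@".
replaced = PATTERN.sub("@", sample)

print(hits)      # [(1, 1, 'b'), (2, 2, 'b'), (3, 1, 'qbq')]
print(replaced)  # the three lines become "@", "q@", "@"
```

The positions agree with the log, and on line 2 the matched text itself is the single character at column 2, which is why the replacement leaves "q@".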

KDJ wrote:
It seems to me that (q?)b\1 should match only b and qbq (and not qb).

This is also what I thought when I first played with this regex. I spent quite a few hours trying to understand why it matches "qb". Several regex engines I tested it on appear to match "qb" in addition to "b" and "qbq". These include the JavaScript and JScript regex engines, PCRE (Perl Compatible Regular Expressions), PCRE2, CL-PPCRE (which should be a PCRE implementation for Common Lisp), and Python's regex engine.

I found a very useful (and free) tool called The Regex Coach, which lets you step through the matching process interactively. Set the "Regular Expression" field to "(q?)b\1" and the "Target string" field to "qb", then switch to the "Step" tab at the bottom to follow the match in this case.

In short:
1. "(q?)" first matches (and captures) the "q" at the beginning of "qb". Then, after matching the "b", the "\1" (which holds the "q") cannot be matched.
2. At this point the engine backtracks and... the rest of the explanation is from the page I have linked before:
Quote:
q? is optional and matches nothing, causing (q?) to successfully match and capture nothing. b matches b and \1 successfully matches the nothing captured by the group.
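The backtracking described above can be traced with a small hand-rolled matcher. This is only a sketch of how a typical backtracking engine treats `(q?)b\1`; the helper names `match_at` and `search` are made up for illustration and do not come from any of the tools mentioned.

```python
def match_at(text, start):
    # Try the pattern (q?)b<backreference> at one fixed start position.
    # Greedy order: first let the group capture "q" (if present), then "".
    options = ["q", ""] if text.startswith("q", start) else [""]
    for group in options:
        pos = start + len(group)
        if not text.startswith("b", pos):
            continue                      # literal b fails, try next option
        pos += 1
        if text.startswith(group, pos):   # backreference repeats the capture
            return pos + len(group)       # end index of the match
    return None                           # no option worked at this start

def search(text):
    # Like a regex search: advance the start position until a match is found.
    for start in range(len(text) + 1):
        end = match_at(text, start)
        if end is not None:
            return (start, end)
    return None

for target in ("b", "qb", "qbq"):
    print(target, search(target))
```

For "qb", both options fail at start position 0 (with the group empty, the literal b is compared against "q"), so the reported match comes from the retry at position 1 and spans only the "b".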

If you have RegexBuddy, you can also use the Debug feature.

Hope this helps.
Instructor
Site Admin


Joined: 06 Jul 2006
Posts: 6188

Posted: Mon May 16, 2016 3:48 pm

KDJ wrote:
... but [\x{100}-\x{FFFF}] still matches the new line.
Test version
KDJ



Joined: 06 Mar 2010
Posts: 1907
Location: Poland

Posted: Mon May 16, 2016 6:00 pm

Instructor
It is OK in the test version. Thank you.

-------
nbsp
In your example:
Code:
b
qb
qbq

(q?)b\1 matches:
b - in line 1,
b - in line 2 (not qb),
qbq - in line 3.
And this is what I had in mind.