Search using regular expressions

Posts: 19
Joined: Sun Jan 17, 2016 5:41 pm

Post by nbsp »

1. If I use the Find dialog to search for a regex such as [^\x00-\x7F], AkelPad finds newline characters in addition to any non-ASCII characters. The same regex using "FindReplace.js" finds only non-ASCII characters.

2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.

Site Admin
Posts: 6311
Joined: Thu Jul 06, 2006 7:20 am

Post by Instructor »

nbsp
1. AkelPad's new lines have negative values internally, so they fall outside \x00-\x7F and are matched by the negated class. Use [^\x00-\x7F\n] instead.
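For comparison, here is a minimal sketch in Python. Python's `re` is used only as an example of a conventional engine, not as AkelPad's implementation. In such an engine a newline is the ordinary character U+000A, which already lies inside \x00-\x7F, so [^\x00-\x7F] never matches it and the extra \n in the suggested class is harmless. The helper name `non_ascii_chars` is hypothetical.

```python
import re

# Matches any single character whose code point is NOT in U+0000..U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")
# Same class with \n excluded explicitly, as suggested for the Find dialog.
# Redundant in this engine, because "\n" is U+000A and already in the range.
NON_ASCII_NO_NL = re.compile(r"[^\x00-\x7F\n]")


def non_ascii_chars(text, pattern=NON_ASCII):
    """Return every character of `text` matched by `pattern`, in order."""
    return pattern.findall(text)


sample = "caf\u00e9\nna\u00efve\r\nplain ASCII\n"
print(non_ascii_chars(sample))                   # ['é', 'ï']
print(non_ascii_chars(sample, NON_ASCII_NO_NL))  # ['é', 'ï']
print(NON_ASCII.search("line 1\nline 2"))        # None: the newline is not matched
```

Both classes give the same result here; the explicit \n only makes a difference in an engine that stores new lines outside the ASCII range, as AkelPad does.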

Site Admin
Posts: 6311
Joined: Thu Jul 06, 2006 7:20 am

Post by Instructor »


Posts: 19
Joined: Sun Jan 17, 2016 5:41 pm

Post by nbsp »

Instructor
Thanks. The test version works fine.

KDJ
Posts: 1949
Joined: Sat Mar 06, 2010 7:40 pm
Location: Poland

Post by KDJ »

Instructor
What is the hex code of a new line (\n)? :)

Code:

[\x{0}-\x{FF}]         - does not match \n
[\x{100}-\x{FFFF}]     - matches \n, so the hex code lies between 100 and FFFF
[\x{10000}-\x{10FFFF}] - does not match \n
[\x{100}-\x{FFF8}]     - does not match \n
[\x{FFFA}-\x{FFFF}]    - does not match \n
[\x{FFF9}-\x{FFF9}]    - matches \n, so the hex code is probably FFF9
\x{FFF9}               - does not match \n; unfortunately, the hex code is not FFF9
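KDJ's probing is a binary search over code points done by hand. The sketch below automates the same idea, under the assumption that Python's `re` stands in for AkelPad's engine; `range_matches` and `find_code_point` are hypothetical helpers. It relies on the engine being consistent, meaning a range matches a character exactly when the character's code lies inside it. The last two lines of the table above show that AkelPad's engine was not consistent at that point, which appears to be what the later test version addressed.

```python
import re

MAX_CODE_POINT = 0x10FFFF


def range_matches(ch, lo, hi):
    """True if the character class [lo-hi] (given as code points) matches ch."""
    pattern = "[\\U%08X-\\U%08X]" % (lo, hi)
    return re.fullmatch(pattern, ch) is not None


def find_code_point(ch, lo=0, hi=MAX_CODE_POINT):
    """Locate the code point of ch using only range tests, halving the
    interval each time, like KDJ's manual range probes.
    Returns None if ch is not matched by [lo-hi] at all."""
    if not range_matches(ch, lo, hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if range_matches(ch, lo, mid):
            hi = mid      # the code point is in the lower half
        else:
            lo = mid + 1  # otherwise it must be in the upper half
    return lo


print(hex(find_code_point("\n")))           # 0xa
print(hex(find_code_point("\u00e9")))       # 0xe9
print(find_code_point("A", 0x100, 0xFFFF))  # None
```

With a consistent engine the search returns 0xA for \n. The internal value Instructor gives for AkelPad, \x{7FFFFFEC}, lies beyond U+10FFFF and so outside the interval this sketch searches.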

KDJ
Posts: 1949
Joined: Sat Mar 06, 2010 7:40 pm
Location: Poland

Post by KDJ »

nbsp wrote: 2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
It seems to me that (q?)b\1 should match only b and qbq (and not qb).
SearchReplace.js does not match qb.

Site Admin
Posts: 6311
Joined: Thu Jul 06, 2006 7:20 am

Post by Instructor »

KDJ wrote: What is the hex code of a new line (\n)? :)

Code:

\x{7FFFFFEC}
KDJ wrote: It seems to me that (q?)b\1 should match only b and qbq (and not qb).
Test version

KDJ
Posts: 1949
Joined: Sat Mar 06, 2010 7:40 pm
Location: Poland

Post by KDJ »

Instructor
Now (q?)b\1 is OK, but still [\x{100}-\x{FFFF}] matches the new line.

Posts: 19
Joined: Sun Jan 17, 2016 5:41 pm

Post by nbsp »

Hi KDJ,
nbsp wrote: The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
KDJ wrote: SearchReplace.js does not match qb.
Try this:
1. Create a new file with the following content:

    Code:

    b
    qb
    qbq

2. Execute "SearchReplace.js" and set:

    Code:

    What: (q?)b\1
    With: @
    Check: Regular expressions
    Direction: Beginning

3. Clicking "Find all" yields the following on my system:

    Log::Output wrote:
    3:
    (1,1) b
    (2,2) qb
    (3,1) qbq

4. Clicking "Replace all" changes the new file created in step 1 to:

    Code:

    @
    q@
    @
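The same experiment can be reproduced with Python's `re`, one of the engines mentioned in this post. This is a minimal sketch: `find_all` is a hypothetical helper, and its (line,column) layout only imitates the Log::Output listing. Printing the matched text, rather than the whole line, shows what each match actually covers.

```python
import re

TEXT = "b\nqb\nqbq"
PATTERN = re.compile(r"(q?)b\1")


def find_all(text, pattern):
    """List (line, column, matched_text) for every match, 1-based like the log."""
    results = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for m in pattern.finditer(line):
            results.append((line_no, m.start() + 1, m.group(0)))
    return results


for line_no, col, matched in find_all(TEXT, PATTERN):
    print(f"({line_no},{col}) {matched}")
# Output:
# (1,1) b
# (2,2) b
# (3,1) qbq

print(PATTERN.sub("@", TEXT))
# Output:
# @
# q@
# @
```

In line 2 the match is the single "b" starting at column 2, which is consistent with "Replace all" producing q@.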
KDJ wrote: It seems to me that (q?)b\1 should match only b and qbq (and not qb).
This is also what I thought when I first played with this regex. I spent quite a few hours trying to understand why it matches "qb". It seems that several regex engines I tested this regex on match "qb" in addition to "b" and "qbq". These include the JavaScript and JScript regex engines, PCRE (Perl Compatible Regular Expressions), PCRE2, CL-PPCRE (a Perl-compatible regex library for Common Lisp), and Python's regex engine.

I found a very useful (and free) tool called The Regex Coach, which lets you see interactively how the matching process works. Set the "Regular Expression" field to "(q?)b\1" and the "Target string" field to "qb", then switch to the "Step" tab at the bottom to follow the matching process step by step.

In short:
1. "(q?)" first matches (and captures) the "q" at the beginning of "qb". Then, after matching the "b", the "\1" (which holds the "q") cannot be matched.
2. At this point the engine backtracks and... the rest of the explanation is from the page I linked earlier:
q? is optional and matches nothing, causing (q?) to successfully match and capture nothing. b matches b and \1 successfully matches the nothing captured by the group.
If you have RegexBuddy, you can also use the Debug feature.
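The two steps can also be checked in Python, again used only as an example engine; the variable names are illustrative. One detail worth noting: retrying the attempt that starts at the "q" with (q?) empty does not succeed, because b would then have to match the "q". The match that is found starts one character later, at the "b", with group 1 capturing the empty string.

```python
import re

PATTERN = re.compile(r"(q?)b\1")

# Attempt anchored at offset 0 (the "q"): (q?) takes "q", b takes "b",
# \1 needs another "q" and fails; retrying with (q?) empty fails too,
# because b would then have to match the "q".
anchored = PATTERN.match("qb")
print(anchored)  # None

# An unanchored search moves on to offset 1, where (q?) captures "".
m = PATTERN.search("qb")
print(m.span(), repr(m.group(0)), repr(m.group(1)))  # (1, 2) 'b' ''

# The whole string "qb" is never a match.
print(PATTERN.fullmatch("qb"))  # None
```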

Hope this helps.

Site Admin
Posts: 6311
Joined: Thu Jul 06, 2006 7:20 am

Post by Instructor »

KDJ wrote: ... but still [\x{100}-\x{FFFF}] matches the new line.
Test version

KDJ
Posts: 1949
Joined: Sat Mar 06, 2010 7:40 pm
Location: Poland

Post by KDJ »

Instructor
In the test version it is OK. Thank you.

-------
nbsp
In your example:

Code:

b
qb
qbq
(q?)b\1 matches:
b - in line 1,
b - in line 2 (not qb),
qbq - in line 3.
And this is what I had in mind.