Search using regular expressions

Posted: Wed Feb 03, 2016 4:58 am
by nbsp
1. If I use the Find dialog to search for a regex such as [^\x00-\x7F], AkelPad finds newline characters in addition to any non-ASCII characters. The same regex using "FindReplace.js" finds only non-ASCII characters.

2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.

Posted: Sun Feb 21, 2016 9:18 am
by Instructor
nbsp
1. AkelPad's new lines have negative values internally. Use [^\x00-\x7F\n].
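
For comparison, here is a small sketch of how a conventional engine treats the original class. It assumes Python's `re` module and a made-up sample string, and it says nothing about AkelPad's internals. In such an engine the newline is code point 0x0A, so the negated ASCII class already skips it, and adding `\n` to the class changes nothing.

```python
import re

# Hypothetical sample text: ASCII letters, newlines, and two non-ASCII characters.
sample = "abc\n\u00e9\n\u00fc"

# The class from the question, and the variant suggested for AkelPad.
plain = re.findall(r"[^\x00-\x7F]", sample)
with_newline = re.findall(r"[^\x00-\x7F\n]", sample)

print(plain)
print(with_newline)
```

Both lists hold only the two non-ASCII characters here, so the `\n` variant should be safe to use in engines where the newline is an ordinary ASCII character.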

Posted: Mon Feb 22, 2016 6:32 am
by Instructor

Posted: Sun Mar 06, 2016 10:25 pm
by nbsp
Instructor
Thanks. The test version works fine.

Posted: Sat Apr 23, 2016 4:57 pm
by KDJ
Instructor
What is the hex code of a new line (\n)? :)

Code: Select all

[\x{0}-\x{FF}]         - does not match \n
[\x{100}-\x{FFFF}]     - matches \n, so the hex code lies between 100 and FFFF
[\x{10000}-\x{10FFFF}] - does not match \n
[\x{100}-\x{FFF8}]     - does not match \n
[\x{FFFA}-\x{FFFF}]    - does not match \n
[\x{FFF9}-\x{FFF9}]    - matches \n, so the hex code is probably FFF9
\x{FFF9}               - does not match \n, so unfortunately the hex code is not FFF9
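
The narrowing done by hand above is a bisection over character-class ranges. Below is a minimal sketch of the same idea, assuming Python's `re` module rather than AkelPad's engine; `find_code_point` is a hypothetical helper name. It only works when class ranges and literal escapes agree on the character's value, which the last line of the list above shows was not the case in AkelPad.

```python
import re

def find_code_point(ch, lo=0x0, hi=0x10FFFF):
    """Locate the code point of ch using only character-class range tests."""
    # Assumes ch is a single character whose code point lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        # Build a class such as [\U00000000-\U00087fff] covering the lower half.
        lower_half = "[\\U%08x-\\U%08x]" % (lo, mid)
        if re.fullmatch(lower_half, ch):
            hi = mid        # the character is in the lower half
        else:
            lo = mid + 1    # the character is in the upper half
    return lo

print(hex(find_code_point("\n")))
```

In Python the search ends at 0xa for a newline, since the newline there is an ordinary character.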

Posted: Sat Apr 23, 2016 6:45 pm
by KDJ
nbsp wrote:2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
It seems to me that (q?)b\1 should match only b and qbq (and not qb).
SearchReplace.js does not match qb.

Posted: Sat Apr 23, 2016 10:05 pm
by Instructor
KDJ wrote:What is the hex code of a new line (\n)? :)

Code: Select all

\x{7FFFFFEC}
KDJ wrote:It seems to me that (q?)b\1 should match only b and qbq (and not qb).
Test version

Posted: Sun Apr 24, 2016 1:31 pm
by KDJ
Instructor
Now (q?)b\1 is OK, but still [\x{100}-\x{FFFF}] matches the new line.

Posted: Mon May 16, 2016 1:22 am
by nbsp
Hi KDJ,
nbsp wrote:The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
KDJ wrote:SearchReplace.js does not match qb.
Try this:
    1. Create a new file with the following content:

    Code: Select all

    b
    qb
    qbq
    2. Execute "SearchReplace.js" and set:

    Code: Select all

    What: (q?)b\1
    With: @
    Check: Regular expressions
    Direction: Beginning
    
    3. Clicking "Find all" yields the following on my system:
    Log::Output wrote:3:
    (1,1) b
    (2,2) qb
    (3,1) qbq
    4. Clicking "Replace all" changes the new file created in step 1 to:

    Code: Select all

    @
    q@
    @
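
The four steps can be approximated outside AkelPad. This is only a sketch, assuming Python's `re` module treats this pattern the way the script's engine does. The variable names are made up, and the last field of each tuple is the matched text, which may not be what the log prints in its last column.

```python
import re

pattern = re.compile(r"(q?)b\1")
lines = ["b", "qb", "qbq"]

# "Find all": record (line, column, matched text), 1-based like the log.
found = []
for line_no, line in enumerate(lines, start=1):
    for m in pattern.finditer(line):
        found.append((line_no, m.start() + 1, m.group(0)))

# "Replace all" with "@", applied line by line.
replaced = [pattern.sub("@", line) for line in lines]

print(found)
print(replaced)
```

The line and column numbers agree with the log above, and the replaced lines agree with step 4. On line 2 the match starts at column 2.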
KDJ wrote:It seems to me that (q?)b\1 should match only b and qbq (and not qb).
That is also what I thought when I first played with this regex, and I spent quite a few hours working out why it matches "qb". Several regex engines I tested match "qb" in addition to "b" and "qbq": the JavaScript and JScript engines, PCRE (Perl Compatible Regular Expressions), PCRE2, CL-PPCRE (a Perl-compatible regex library for Common Lisp), and Python's regex engine.

I found a very useful (and free) tool called The Regex Coach, which lets you step through the matching process interactively. Set the "Regular Expression" field to "(q?)b\1" and the "Target string" field to "qb", then switch to the "Step" tab at the bottom to follow the match for this case.

In short:
1. "(q?)" first matches (and captures) the "q" at the beginning of "qb". Then, after matching the "b", the "\1" (which holds the "q") cannot be matched.
2. At this point the engine backtracks and... the rest of the explanation is from the page I linked earlier:
q? is optional and matches nothing, causing (q?) to successfully match and capture nothing. b matches b and \1 successfully matches the nothing captured by the group.
If you have RegexBuddy, you can also use the Debug feature.
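
Without either tool, the attempt at each start position can be checked directly. This is a sketch assuming Python's `re` module, where `Pattern.match(string, pos)` only tries a match that begins exactly at `pos`. It suggests one detail the steps above leave implicit: with `q?` matching nothing at the first character, `b` cannot match the `q`, so the engine has to move its start position forward by one before the quoted explanation applies.

```python
import re

pattern = re.compile(r"(q?)b\1")
target = "qb"

# Try a match anchored at index 0, then at index 1.
at_0 = pattern.match(target, 0)
at_1 = pattern.match(target, 1)

print(at_0)  # None: no match can start at the "q"
print(at_1.group(0), repr(at_1.group(1)), at_1.span())
```

In this engine the match found in "qb" is the single character "b" at index 1, with an empty capture group.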

Hope this helps.

Posted: Mon May 16, 2016 3:48 pm
by Instructor
KDJ wrote:... but still [\x{100}-\x{FFFF}] matches the new line.
Test version

Posted: Mon May 16, 2016 6:00 pm
by KDJ
Instructor
It is OK in the test version. Thank you.

-------
nbsp
In your example:

Code: Select all

b
qb
qbq
(q?)b\1 matches:
b - in line 1,
b - in line 2 (not qb),
qbq - in line 3.
And this is what I had in mind.