Search using regular expressions
Posted: Wed Feb 03, 2016 4:58 am
by nbsp
1. If I use the Find dialog to search for a regex such as [^\x00-\x7F], AkelPad finds newline characters in addition to any non-ASCII characters. The same regex using "FindReplace.js" finds only non-ASCII characters.
2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
Posted: Sun Feb 21, 2016 9:18 am
by Instructor
nbsp
1. AkelPad's new lines have negative values internally. Use [^\x00-\x7F\n].
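For comparison, here is a minimal sketch of how the same character class behaves in an engine where the newline is the ordinary code point U+000A. It uses Python's re module, not AkelPad's engine, and the sample string is a made-up illustration. In such an engine \n already falls inside \x00-\x7F, so the extra \n in the workaround changes nothing there; it is only needed in the Find dialog because of the internal newline representation described above.

```python
import re

# Hypothetical sample: ASCII text, two newlines, one non-ASCII character (U+00E9).
sample = "abc\n\u00e9\nxyz"

# Negated class from the original question: anything outside 0x00-0x7F.
non_ascii = re.compile(r"[^\x00-\x7F]")
matches = non_ascii.findall(sample)

# Workaround pattern with \n excluded explicitly.
workaround = re.compile(r"[^\x00-\x7F\n]")
workaround_matches = workaround.findall(sample)

print(hex(ord("\n")))        # 0xa, inside the excluded range
print(matches)               # ['é']
print(workaround_matches)    # ['é']
```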
Posted: Mon Feb 22, 2016 6:32 am
by Instructor
Posted: Sun Mar 06, 2016 10:25 pm
by nbsp
Instructor
Thanks. The test version works fine.
Posted: Sat Apr 23, 2016 4:57 pm
by KDJ
Instructor
What is the hex code of a new line (\n)?
[\x{0}-\x{FF}] - does not match \n
[\x{100}-\x{FFFF}] - matches \n, so the hex code is between 100 and FFFF
[\x{10000}-\x{10FFFF}] - does not match \n
[\x{100}-\x{FFF8}] - does not match \n
[\x{FFFA}-\x{FFFF}] - does not match \n
[\x{FFF9}-\x{FFF9}] - matches \n, so the hex code is probably FFF9
\x{FFF9} - does not match \n; unfortunately, the hex code is not FFF9
Posted: Sat Apr 23, 2016 6:45 pm
by KDJ
nbsp wrote:2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
It seems to me that (q?)b\1 should match only b and qbq (and not qb).
SearchReplace.js does not match qb.
Posted: Sat Apr 23, 2016 10:05 pm
by Instructor
KDJ wrote:What is hex code of any new line (\n)?
KDJ wrote:It seems to me that (q?)b\1 should match only b and qbq (and not qb).
Test version
Posted: Sun Apr 24, 2016 1:31 pm
by KDJ
Instructor
Now (q?)b\1 is OK, but still [\x{100}-\x{FFFF}] matches the new line.
Posted: Mon May 16, 2016 1:22 am
by nbsp
Hi KDJ,
nbsp wrote:The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
KDJ wrote:SearchReplace.js does not match qb.
Try this:
1. Create a new file with the following content:
b
qb
qbq
2. Execute "SearchReplace.js" and set:
What: (q?)b\1
With: @
Check: Regular expressions
Direction: Beginning
3. Clicking "Find all" yields the following on my system:
Log::Output wrote:3:
(1,1) b
(2,2) qb
(3,1) qbq
4. Clicking "Replace all" changes the new file created in step 1 to:
KDJ wrote:It seems to me that (q?)b\1 should match only b and qbq (and not qb).
This is also what I thought when I first played with this regex. I spent quite a few hours working out why it matches "qb". Several regex engines I tested it on match "qb" in addition to "b" and "qbq". These include the JavaScript and JScript regex engines, PCRE (Perl Compatible Regular Expressions), PCRE2, CL-PPCRE (which should be a PCRE implementation for Common Lisp), and the Python regex engine.
I found a very useful (and free) tool called The Regex Coach, which you can use interactively to see how the matching process works. Set the "Regular Expression" field to "(q?)b\1" and the "Target string" field to "qb", then switch to the "Step" tab at the bottom to see how the matching process works in this case.
In short:
1. "(q?)" first matches (and captures) the "q" at the beginning of "qb". Then, after matching the "b", the "\1" (which holds the "q") cannot be matched.
2. At this point the engine backtracks and... the rest of the explanation is from the page I have linked before:
q? is optional and matches nothing, causing (q?) to successfully match and capture nothing. b matches b and \1 successfully matches the nothing captured by the group.
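The steps above can be checked with a small sketch in Python's re module (one of the engines mentioned; other engines are assumed, not verified, to behave the same way). The helper name first_match is made up for this illustration. Note what the search on "qb" actually returns: the match is the single character "b" at offset 1 with an empty group, not the whole string "qb". That appears consistent with the (2,2) position in the log output above, and a whole-string match on "qb" fails.

```python
import re

pattern = re.compile(r"(q?)b\1")

def first_match(text):
    """Return (start offset, matched text, group 1) for the first match, or None."""
    m = pattern.search(text)
    return None if m is None else (m.start(), m.group(0), m.group(1))

for text in ("b", "qb", "qbq"):
    print(repr(text), first_match(text))
# 'b' (0, 'b', '')
# 'qb' (1, 'b', '')
# 'qbq' (0, 'qbq', 'q')

# Requiring the pattern to cover the whole string rejects "qb".
print(pattern.fullmatch("qb"))  # None
```

So a search reports a hit in the line "qb", which is easy to read as "the regex matches qb", while the matched text itself is only "b".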
If you have RegexBuddy, you can also use the Debug feature.
Hope this helps.
Posted: Mon May 16, 2016 3:48 pm
by Instructor
KDJ wrote:... but still [\x{100}-\x{FFFF}] matches the new line.
Test version
Posted: Mon May 16, 2016 6:00 pm
by KDJ
Instructor
In the test version it is OK. Thank you.
-------
nbsp
In your example, (q?)b\1 matches:
b - in line 1,
b - in line 2 (not qb),
qbq - in line 3.
And this is what I had in mind.