Search using regular expressions
-
Offline
- Posts: 19
- Joined: Sun Jan 17, 2016 5:41 pm
Search using regular expressions
1. If I use the Find dialog to search for a regex such as [^\x00-\x7F], AkelPad finds newline characters in addition to any non-ASCII characters. The same regex using "FindReplace.js" finds only non-ASCII characters.
2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
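For comparison with item 1, here is a minimal sketch in Python. Python's `re` module serves only as a reference engine here; it is not what AkelPad or "FindReplace.js" use, and the helper name `find_non_ascii` is made up for illustration. It shows that [^\x00-\x7F] skips \r and \n, since both are ASCII:

```python
import re

# Negated ASCII range: matches any character with a code point above 0x7F.
NON_ASCII = re.compile(r'[^\x00-\x7F]')

def find_non_ascii(text):
    """Return (offset, character) pairs for every non-ASCII character."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]

# Sample text with both Windows (\r\n) and Unix (\n) line endings.
sample = "abc\r\ncaf\u00e9\n\u20ac5\n"
hits = find_non_ascii(sample)
print(hits)
```

Only the two accented/currency characters are reported; none of the line endings appear in the result.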
-
Offline
- Posts: 1949
- Joined: Sat Mar 06, 2010 7:40 pm
- Location: Poland
Instructor
What is the hex code of a new line (\n)?
Code: Select all
[\x{0}-\x{FF}] - does not match \n
[\x{100}-\x{FFFF}] - matches \n, so the hex code lies between 100 and FFFF
[\x{10000}-\x{10FFFF}] - does not match \n
[\x{100}-\x{FFF8}] - does not match \n
[\x{FFFA}-\x{FFFF}] - does not match \n
[\x{FFF9}-\x{FFF9}] - matches \n, so the hex code is probably FFF9
\x{FFF9} - does not match \n, so unfortunately the hex code is not FFF9
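The range narrowing done by hand above can be automated by halving. The following is only a sketch of that technique in Python, where \n is simply U+000A, so it does not reproduce AkelPad's behaviour. The helper names `class_matches` and `bisect_code_point` are hypothetical.

```python
import re

def class_matches(lo, hi, ch):
    """Return True if the class [lo-hi] (inclusive code points) matches ch."""
    # \UXXXXXXXX is Python's 8-digit code point escape for str patterns.
    pattern = "[\\U%08x-\\U%08x]" % (lo, hi)
    return re.fullmatch(pattern, ch) is not None

def bisect_code_point(ch, lo=0x0, hi=0x10FFFF):
    """Halve the range [lo, hi] until a single code point remains."""
    if not class_matches(lo, hi, ch):
        return None  # the character is outside the initial range
    while lo < hi:
        mid = (lo + hi) // 2
        if class_matches(lo, mid, ch):
            hi = mid        # the character is in the lower half
        else:
            lo = mid + 1    # the character is in the upper half
    return lo

print(hex(bisect_code_point("\n")))
```

Each probe corresponds to one of the manual range tests in the list above; the loop needs at most 21 probes to cover the full Unicode range.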
-
Offline
- Posts: 1949
- Joined: Sat Mar 06, 2010 7:40 pm
- Location: Poland
Re: Search using regular expressions
nbsp wrote: 2. The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
It seems to me that (q?)b\1 should match only b and qbq (and not qb).
SearchReplace.js does not match qb.
-
Offline
- Site Admin
- Posts: 6311
- Joined: Thu Jul 06, 2006 7:20 am
KDJ wrote: What is the hex code of a new line (\n)?
Code: Select all
\x{7FFFFFEC}
KDJ wrote: It seems to me that (q?)b\1 should match only b and qbq (and not qb).
Test version
-
Offline
- Posts: 19
- Joined: Sun Jan 17, 2016 5:41 pm
Re: Search using regular expressions
Hi KDJ,
nbsp wrote:The regex (q?)b\1 should match b, qb and qbq (see here). However, it only matches all 3 using "FindReplace.js", and only qbq when using the Find dialog.
KDJ wrote: SearchReplace.js does not match qb.
Try this:
1. Create a new file with the following content:
Code: Select all
b
qb
qbq
2. Execute "SearchReplace.js" and set:
Code: Select all
What: (q?)b\1
With: @
Check: Regular expressions
Direction: Beginning
3. Clicking "Find all" yields the following on my system:
Log::Output wrote:
3:
(1,1) b
(2,2) qb
(3,1) qbq
4. Clicking "Replace all" changes the new file created in step 1 to:
Code: Select all
@
q@
@
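The four steps above can be approximated outside AkelPad. This is a minimal sketch using Python's `re` module as a stand-in for the script's engine. The function names `find_all` and `replace_all` are hypothetical and are not part of "SearchReplace.js".

```python
import re

PATTERN = re.compile(r'(q?)b\1')

# Same three-line file as in step 1.
lines = ["b", "qb", "qbq"]

def find_all(lines):
    """Return (line, column, matched_text) for every match, 1-based."""
    found = []
    for line_no, line in enumerate(lines, start=1):
        for m in PATTERN.finditer(line):
            found.append((line_no, m.start() + 1, m.group()))
    return found

def replace_all(lines, replacement="@"):
    """Replace every match on every line with the replacement text."""
    return [PATTERN.sub(replacement, line) for line in lines]

matches = find_all(lines)
replaced = replace_all(lines)
print(matches)
print(replaced)
```

In this sketch the reported positions are (1,1), (2,2) and (3,1), and the replaced lines are @, q@ and @, which agrees with the log and the result shown in steps 3 and 4.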
KDJ wrote: It seems to me that (q?)b\1 should match only b and qbq (and not qb).
This is also what I thought when I initially played with this regex. I spent quite a few hours trying to understand why it matches "qb". It seems that several regex engines I have tested this regex on match "qb" in addition to "b" and "qbq". These include: the JavaScript and JScript regex engines, PCRE (Perl Compatible Regular Expressions), PCRE2, CL-PPCRE (which should be a PCRE implementation for Common Lisp), and the Python regex engine.
I found a very useful (and free) tool called The Regex Coach which you can use interactively to see how the matching process works. Set the "Regular Expression" field to "(q?)b\1" and the "Target string" field to "qb", switch to the "Step" tab at the bottom to see how the matching process works in this case.
In short:
1. "(q?)" first matches (and captures) the "q" at the beginning of "qb". Then, after matching the "b", the "\1" (which holds the "q") cannot be matched.
2. At this point the engine backtracks and... the rest of the explanation is from the page I have linked before:
q? is optional and matches nothing, causing (q?) to successfully match and capture nothing. b matches b and \1 successfully matches the nothing captured by the group.
If you have RegexBuddy, you can also use the Debug feature.
Hope this helps.
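To see what the group captures after backtracking, a short sketch in Python may help. It assumes Python's `re` behaves like the other backtracking engines listed above, and the helper name `explain` is made up for illustration.

```python
import re

REGEX = r'(q?)b\1'

def explain(target):
    """Return the matched text, its start offset and the content of group 1."""
    m = re.search(REGEX, target)
    if m is None:
        return None
    return {"match": m.group(0), "start": m.start(), "group1": m.group(1)}

for target in ("b", "qb", "qbq"):
    print(target, explain(target))

# Requiring the whole string to match shows whether "qb" itself is matched.
print(re.fullmatch(REGEX, "qb"))
```

With the target "qb", the successful attempt in this engine starts at offset 1 with an empty capture, so the matched text is the single character b inside "qb" rather than the whole string.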
-
Offline
- Site Admin
- Posts: 6311
- Joined: Thu Jul 06, 2006 7:20 am
KDJ wrote: ... but still [\x{100}-\x{FFFF}] matches the new line.
Test version
-
Offline
- Posts: 1949
- Joined: Sat Mar 06, 2010 7:40 pm
- Location: Poland
Instructor
The test version is OK. Thank you.
-------
nbsp
In your example:
Code: Select all
b
qb
qbq
(q?)b\1 matches:
b - in line 1,
b - in line 2 (not qb),
qbq - in line 3.
And this is what I had in mind.