A regular expression for primeness
Lazy Saturday afternoons spent on the Internets lead to the niftiest discoveries.
A Regular Expression that tests a number for (un)primeness:
/^1?$|^(11+?)\1+$/
The input string that the above regex is matched with is a sequence of N ones, where N is the number tested for primeness.
The confounding terseness of regular expressions isn’t surprising after an year of daily use, but this one led to record amounts of head-scratching.
Let’s expand that out, shall we?
qr/
^1?$ # 1 or nothing, not prime
| # OR
^(11+?) # 2 or more 1’s, matched minimally
\1+$ # Followed by one or more instances of captured pattern (pattern in parens above)
/x
When matched against a sequence of N ones (111..), the special cases of zero or one ones is handled by the first part of the regex; the latter part is the workhorse.
The string is tested against an integer number of repeats of the captured pattern. 111111 is matched with three occurrences of 11- once during the capture and twice during the backreference (\1)+, so 111111 is not prime.
But here’s the trick: Since (11+?) matches minimally, failure to match nine (111111111) as four occurrences of 11 causes the regex engine to backtrack and capture 111; Nine is then matched with three occurrences of 111! So the regular expression engine is trying to match the input with a multiple of a successively increasing sequence of ones, which, of course, is the usual way of testing a prime.
This piece of regex magic was picked off StackOverflow, where there’s a full explanation with examples.
One comment