# A regular expression for primeness

Lazy Saturday afternoons spent on the Internets lead to the niftiest discoveries.

A Regular Expression that tests a number for (un)primeness:

/^1?\$|^(11+?)\1+\$/

The input string that the above regex is matched with is a sequence of N ones, where N is the number tested for primeness.

The confounding terseness of regular expressions isn’t surprising after an year of daily use, but this one led to record amounts of head-scratching.

Let’s expand that out, shall we?

qr/
^1?\$       # 1 or nothing, not prime
|             # OR
^(11+?) # 2 or more 1’s, matched minimally
\1+\$       # Followed by one or more instances of captured pattern (pattern in parens above)
/x

When matched against a sequence of N ones (111..), the special cases of  zero or one ones is handled by the first part of the regex; the latter part is the workhorse.

The string is tested against an integer number of repeats of the captured pattern. 111111 is matched with three occurrences of 11- once during the capture and twice during the backreference (\1)+, so 111111 is not prime.

But here’s the trick: Since (11+?) matches minimally, failure to match nine (111111111) as four occurrences of 11 causes the regex engine to backtrack and capture 111; Nine is then matched with three occurrences of 111! So the regular expression engine is trying to match the input with a multiple of a successively increasing sequence of ones, which, of course, is the usual way of testing a prime.

This piece of regex magic was picked off StackOverflow, where there’s a full explanation with examples.

# One comment

1. Pingback: >> RIGHTSHIFT » The Illegal Prime