Note
This post is not written by me, just mirrored/archived from pboyd.io. See original post here.
The regex [,-.]
I stumbled on this regex recently: \d{2}[,-.]\d{2}
.
The intention is clear enough: match two sets of two digits separated by a comma, a dash, or a period. Of course, it shouldn’t work.
Dashes in character classes are special because they’re used for ranges (like [a-z]
to match lower-case ASCII letters).
If you want - in a character class you put it at the beginning, or the end, never the middle. So this should be [-,.]
not [,-.]
.
I assumed that [,-.]
was a typo and it wouldn’t match -, but I couldn’t find a bug. In fact, it works fine, you can try it yourself:
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[,-.]\d{2}/'
ok
Try a few variations with other characters, if you want. It won’t match - unless it’s at the beginning or end of the character class, except for [,-.]
:
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[*-,]\d{2}/'
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[a-z]\d{2}/'
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[-*,]\d{2}/'
ok
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[-a-z]\d{2}/'
ok
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[,-.]\d{2}/'
ok
What’s going on? Well, if you haven’t figured it out already, the comma, the dash, and the period are right next to each other in ASCII:
$ man ascii
...
054 44 2C ,
055 45 2D -
056 46 2E .
...
So grabbing all characters from , to . also includes - and nothing else. [,-.]
is the only possible character class with a - in the middle that only matches -.
I can’t tell if the author was being clever or if it was a particularly lucky typo. Either way, I will continue putting - at the beginning: [-,.]
.
But if you feel the need to send someone down a rabbit trail [,-.]
is an option for you.