Contract Work

Friday, November 22, 2013

Session 5: Regular Expressions

For those of you who don’t follow me closely on twitter, you may have missed that I recently wrote my first regular expression!! Now, it was a very simple one but I think regular expressions can be some of the most intimidating code. You look at this blob of symbols and weirdness and there is no chance in hell you’ll be able to interpret it until you actually do some research. The first time I saw a regular expression I literally thought, what the hell is that? And is that actually I’m supposed to be able to understand one day? And then I actually wrote one, which was awesome. BUT more awesome than writing my first regular expression was going to Nell’s talk on Regex’s. It was mind-blowing. I think the best thing about Nell’s talk was that it was high level enough that more experienced developers in the room were really interested in what she was saying and n00bs like me understood everything she said because she stepped through regex’s slowly, clearly, and explained it amazingly well. I know that this will be a talk that I continue to look back on as I write more regex’s and gain more experience with them.

TL;DR: Regular expressions explained in a comprehensible, clear way are not so scary and can be really powerful.

First, Nell spoke about the onigmo library, which I had never even heard of before. Basically, it is onigmo that reads a regular expression and parses it into abstract syntax tree and then it compiles that into instructions. The expressions relate to a finite state machine. Breaking this down the machine is the subject, the state is the state of that machine, and finite are what the possible states are.

When making a regex, in order to call the regex in code you set it up this way:

Re = ‘/force/’
String = “Use the force”
re.match(string)


She then went into exactly what happens. Basically, the regex starts at the beginning of the string and it works through the string until it finds the beginning of what could be a match… in the case ‘f’ of ‘force’. Once it finds that first letter, it will work through the remaining letters to determine if there is a match and if there is one, then it’ll return that match.

From here, the talk went into different ways you can create regex’s. For example, if there is a pipe in the phrase then a match can be one part OR the other part of a string. The example used was /Y(olk|oda)/ so a match would be either Yolk or Yoda. Another variation is a plus quantifier. The + at the end of a string means that you can have infinite repetitions of the last letter of the string. The example she used was /No+/ which means a match could be “Noo”, “Nooo”, or “Noooooooooooo”. There are also lazy quantifiers, star quantifiers, possessive quantifiers which she shows using a bunch of cool examples.

Then, and this part was just really fascinating, she brought it together using an example of a challenge sent out by Avdi a little while ago. The challenge was to make a snake_case into a CamelCase using a regex. Now, beyond just showing us that, she TDD’ed it!!! I had absolutely no idea how you’d TDD a regex (or that you even could) so that was great to walk through. She started by mapping out what needed to be done and then going to it. To give an example of picking apart a regex she did this:

Def upcase_chars(string)
Re = /\A_?[a-z] /
String.gsub(re){|char| char.upcase}


Which means: \A anchors the expression to the beginning of the string
_ is the underscore
? means that the underscore is optional (it will match the string with or without the underscore)
[a-z] matches any letters and not the underscore
and then the next line says for the string, substitute the characters for uppercase characters.

This regex continued to get more complex for the next few minutes including a look back, figuring out some additional complications based on the underscore, and then combining the methods to create the solution.

You can check out the final code and final specs here (which I highly recommend doing).

At the end she made some final recommendations for building regular expressions. First, develop regex’s in small pieces and the combine them. Second, use Rubular and make a permalink of it. This way you can put the permalink into the code as a comment! Last, don’t be afraid of regular expressions.

1 comment: