Musings of an anonymous geek

May 18, 2007

Regular Expressions with Python’s “re” Module

Filed under: Python,Scripting,Sysadmin,Technology — m0j0 @ 3:11 pm

If you’re moving over from PHP, Perl, Ruby or something similar, don’t be intimidated by all the Python regular expression documentation. It doesn’t really have to be complicated or even all that much different from Perl (though it can be, if you want to go there).

Here’s a search and replace I ripped out of a Perl script for use in the Python script that replaces it. It ensures that every field of any MAC address fed to it has two hex digits. So, for example, this would change “0:c:e:fe:d0:ae” to “00:0c:0e:fe:d0:ae”. This is good if you need to insert the value into a PostgreSQL column of type ‘macaddr’, or you just want to be consistent.

Perl: $macaddr =~ s/\b([0-9a-f])\b/0\1/ig

Python: macaddr = re.sub(r'(?i)\b([0-9a-f])\b', r'0\1', macaddr)
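To try the Python version on its own, here is a minimal sketch that wraps the substitution in a function. The name “normalize_mac” is my own invention for illustration, not something from the original script.

```python
import re

def normalize_mac(macaddr):
    """Left-pad every single-hex-digit field of a MAC address with a zero."""
    # (?i) makes the match case-insensitive; \b...\b isolates a lone hex digit.
    return re.sub(r'(?i)\b([0-9a-f])\b', r'0\1', macaddr)

print(normalize_mac("0:c:e:fe:d0:ae"))  # 00:0c:0e:fe:d0:ae
```

Fields that already have two digits are left alone, because there is no word boundary between the two digits.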

There are a few differences when moving to Python. First, Python has no pattern-binding operator like Perl’s “=~”, so we’re calling a function and assigning its return value with a plain “=” instead. That’s fine with me. Fewer cryptic symbols are better.

Second, calling a function also means that the operation is explicit: we’re doing substitution using the “sub” function from the “re” module. There’s no “s/” like there is in Perl.

Third, there’s also no “/ig” in Python like at the end of the Perl example. The “i” means “ignore case”, and in Python, that indication (the “(?i)”) goes at the start of the pattern instead of at the end of the expression. The “g” means “replace every match”, which “re.sub” already does by default, so it needs no equivalent. That’s easier for my brain to parse. I like to read what I’m doing in my native language (English), and if you think in that context, then reading regexes in Perl is kinda like reading in German, not English.
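As a rough sketch of how the two Perl modifiers map over: “re.sub” has a “count” argument that limits the number of replacements if you want the non-global behaviour, and in Python 2.7 and later the case flag can be passed as “flags=re.IGNORECASE” instead of the inline “(?i)”. The variable names below are just for illustration.

```python
import re

mac = "0:C:e:fe:d0:ae"
pattern = r'\b([0-9a-f])\b'

# Inline flag, as in the example above.
inline = re.sub(r'(?i)' + pattern, r'0\1', mac)

# The same thing with the flag passed as a keyword argument.
keyword = re.sub(pattern, r'0\1', mac, flags=re.IGNORECASE)

# count=1 stops after the first match, like Perl's s/// without "g".
first_only = re.sub(pattern, r'0\1', mac, count=1, flags=re.IGNORECASE)

print(inline)      # 00:0C:0e:fe:d0:ae
print(keyword)     # 00:0C:0e:fe:d0:ae
print(first_only)  # 00:C:e:fe:d0:ae
```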

Finally, calling a function also means that the pattern, the replacement, and the string you want to apply them to are separate arguments to the function instead of things delimited by more “/” characters. In fact, the only slashes in the Python example are the backslashes inside the regular expression itself. Python never uses “/” as a pattern delimiter; in Python it is just the division operator.
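Because the pattern, the replacement and the target string are ordinary values, the pattern can also be compiled once and reused across many strings. This is only a sketch; the name “SINGLE_HEX_DIGIT” and the sample addresses are made up for illustration.

```python
import re

# Compile once; re.IGNORECASE here takes the place of the inline (?i).
SINGLE_HEX_DIGIT = re.compile(r'\b([0-9a-f])\b', re.IGNORECASE)

addresses = ["0:c:e:fe:d0:ae", "1:2:3:4:5:6"]

# The compiled pattern's sub() takes (replacement, string).
padded = [SINGLE_HEX_DIGIT.sub(r'0\1', addr) for addr in addresses]

print(padded)  # ['00:0c:0e:fe:d0:ae', '01:02:03:04:05:06']
```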

Though there are lots of differences in just this one very simple example, I’ll also note that the actual regex syntax itself (the parts inside quotes in the Python example) is not different at all, except for the addition of the “ignore case” flag “(?i)” in the Python example!
