Regular Expressions in Python

by antonh last modified 07-Sep-06 11:48 AM

Today I (finally) had the need to substitute something in a string. My immediate thought, based on my time as a Perl nut, was "use regular expressions!". So I had to get together some resources about how to use regular expressions in Python.

I used 3 resources in putting this together:

Character Escape Codes


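Naturally, the first place to start. I just wanted to check that they are the same as Perl. Almost: Python string literals do not recognise Perl's \e (Escape), so write \x1b instead. Here they are:

\<newline>  Line continuation (the backslash and the newline are ignored)
\\          Backslash
\'          Single quote
\"          Double quote
\a          Bell
\b          Backspace
\n          Line feed
\v          Vertical tab
\t          Horizontal tab
\r          Carriage return
\0          Null
\ooo        Octal character value (one to three octal digits)
\xhh        Hex character value (two hex digits)
\x1b        Escape (Python has no \e)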
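
These escapes matter for regexes because the re module has its own meaning for some backslash sequences. A minimal sketch, with sample strings made up for illustration, showing that a normal string collapses an escape to one character while a raw string (r'...') passes the backslash through to re untouched:

```python
import re

# In a normal string literal each escape becomes a single character.
newline = '\n'
hex_a = '\x41'      # hex 41 is 'A'
octal_a = '\101'    # octal 101 is also 'A'

# In a raw string the backslash is kept, so this is two characters.
raw_newline = r'\n'

# '\b' is a backspace in a normal string, but re reads r'\b' as a word
# boundary, so patterns are usually safest written as raw strings.
boundary_hits = re.findall(r'\bab', 'abracadabra ab')
backspace_hits = re.findall('\bab', 'abracadabra ab')

print(len(newline), len(raw_newline))
print(hex_a, octal_a)
print(boundary_hits, backspace_hits)
```

The last line shows the difference: the raw pattern finds 'ab' at the start of each word, while the non-raw one looks for a literal backspace and finds nothing.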

Regular Expression Metacharacters


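Again, nothing surprising here - the same as Perl.

.     Any character except a newline (unless the DOTALL flag is set)
^     Beginning of the string
$     End of the string
\s    Whitespace character
\S    Non-whitespace character
\d    Digit character
\D    Non-digit character
\w    Word character (letter, digit or underscore)
\W    Non-word character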
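
A quick way to get a feel for these is to try them on a short string. The following is only a sketch; the sample text and variable names are invented for the example:

```python
import re

line = 'Order 66 shipped'

# \d matches one digit, so \d\d picks out the two-digit number.
number = re.search(r'\d\d', line).group()

# ^ anchors the pattern to the start of the string.
starts_with_order = re.search(r'^Order', line) is not None
starts_with_shipped = re.search(r'^shipped', line) is not None

# $ anchors the pattern to the end of the string.
ends_with_shipped = re.search(r'shipped$', line) is not None

# \s matches whitespace and . matches any one character except a newline.
after_space = re.findall(r'\s.', line)

print(number, starts_with_order, starts_with_shipped, ends_with_shipped)
print(after_space)
```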

Repetition Specifiers

*        0 or more repetitions
+        1 or more repetitions
?        0 or 1 repetitions
{n}      Exactly n repetitions
{n,}     At least n repetitions
{n,m}    At least n but not more than m repetitions
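
To compare the specifiers side by side, here is a small sketch that tests each one against the same word list. The helper matching_words() is hypothetical, written just for this illustration; it anchors the pattern with ^ and $ so the whole word has to fit:

```python
import re

words = ['a', 'ab', 'abb', 'abbb', 'abbbb']

def matching_words(pattern):
    # Anchor with ^ and $ so the whole word has to fit the pattern.
    return [w for w in words if re.search('^' + pattern + '$', w)]

star = matching_words(r'ab*')              # zero or more b's
plus = matching_words(r'ab+')              # one or more b's
optional = matching_words(r'ab?')          # zero or one b
exactly_two = matching_words(r'ab{2}')     # exactly two b's
at_least_two = matching_words(r'ab{2,}')   # two or more b's
one_to_three = matching_words(r'ab{1,3}')  # between one and three b's

for name, hits in [('*', star), ('+', plus), ('?', optional),
                   ('{2}', exactly_two), ('{2,}', at_least_two),
                   ('{1,3}', one_to_three)]:
    print(name, hits)
```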

Flags


DOTALL, S        Make . match any character, including newlines
IGNORECASE, I    Do case-insensitive matches
LOCALE, L        Do a locale-aware match
MULTILINE, M     Multi-line matching: ^ and $ also match at the start and end of each line
VERBOSE, X       Enable verbose REs: whitespace in the pattern is ignored and # starts a comment
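
Flags are attributes of the re module and are passed as an extra argument; several can be combined with |. A rough sketch of what each one changes, using a made-up two-line string:

```python
import re

text = 'First line\nsecond LINE'

# Without flags, 'line' only matches the lower-case occurrence.
plain = re.findall(r'line', text)
# IGNORECASE (or I) also picks up 'LINE'.
ignore_case = re.findall(r'line', text, re.IGNORECASE)

# Without MULTILINE, ^ only matches at the very start of the string.
first_words = re.findall(r'^\w+', text)
# With MULTILINE (or M), ^ also matches just after each newline.
line_starts = re.findall(r'^\w+', text, re.MULTILINE)

# . normally stops at a newline; DOTALL (or S) lets it cross one.
no_dotall = re.search(r'First.*LINE', text)
with_dotall = re.search(r'First.*LINE', text, re.DOTALL)

# VERBOSE (or X) ignores whitespace in the pattern and allows comments.
verbose = re.compile(r"""
    (\w+)    # a word
    \s+      # some whitespace
    LINE     # the literal text LINE
    """, re.VERBOSE)
word_before_line = verbose.search(text).group(1)

print(plain, ignore_case)
print(first_words, line_starts)
print(no_dotall, with_dotall.group() == text)
print(word_before_line)
```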

Search Functions

match()       Determine if the RE matches at the beginning of the string.
search()      Scan through a string, looking for the first location where this RE matches.
findall()     Find all substrings where the RE matches, and return them as a list.
finditer()    Find all substrings where the RE matches, and return them as an iterator of match objects.
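
The difference between the four is easiest to see on one string. A small sketch, reusing 'abracadabra' as sample input; note that match() and search() return None when nothing matches:

```python
import re

src = 'abracadabra'

# match() only looks at position 0, so 'bra' is not found there.
at_start = re.match(r'bra', src)

# search() scans forward and stops at the first 'bra'.
anywhere = re.search(r'bra', src)

# findall() returns every non-overlapping match as a list of strings.
all_hits = re.findall(r'a[bc]', src)

# finditer() yields match objects, handy when positions are needed.
positions = [m.start() for m in re.finditer(r'a[bc]', src)]

print(at_start)
print(anywhere.start())
print(all_hits)
print(positions)
```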

Result Object Functions


group()    Return the string matched by the RE
start()    Return the starting position of the match
end()      Return the ending position of the match
span()     Return a tuple containing the (start, end) positions of the match
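
These methods live on the object returned by match() or search(), so it is worth checking for None before calling them. A minimal sketch with an illustrative pattern:

```python
import re

src = 'abracadabra'
result = re.search(r'cad', src)

if result is not None:
    matched = result.group()    # the matched text
    begin = result.start()      # index of the first matched character
    finish = result.end()       # index just past the last matched character
    both = result.span()        # the same two numbers as a tuple
    # Slicing the source with start/end gives back the matched text.
    same_text = src[begin:finish] == matched
    print(matched, begin, finish, both, same_text)
else:
    print('no match')
```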

Simple example

>>> import re
>>> src = 'abracadabra'
>>> result = re.search(r'ab*', src)   # an 'a' followed by zero or more 'b's
>>> print(result.group())
ab

Note that if you are going to use the same regex a lot, you can improve performance by compiling it ahead of time with re.compile(regex) and then calling the search functions on the compiled object.
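
A sketch of what compiling ahead of time looks like; the pattern and the input lines are invented for the example. The compiled object has the same match(), search(), findall() and finditer() methods as the module, minus the pattern argument. The re module also caches recently used patterns, so the gain may be modest for a handful of calls:

```python
import re

# Compile once, outside the loop.
letters_then_digits = re.compile(r'[a-z]+\d+')

lines = ['abc123', 'no digits here', 'x9 and y10']

hits = []
for line in lines:
    # The compiled object is reused on every iteration.
    hits.extend(letters_then_digits.findall(line))

# The module-level function gives the same answer for a single call.
same_as_module = re.findall(r'[a-z]+\d+', lines[2]) == letters_then_digits.findall(lines[2])

print(hits)
print(same_as_module)
```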

Other points of interest

You can also use a regex as the basis for a split or a substitution. A split looks like this:

>>> regex = re.compile(r'\W+')   # one or more non-word characters
>>> result = regex.split('This is a test')
>>> print(result)
['This', 'is', 'a', 'test']
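
Substitution, which is what I needed in the first place, works through sub(). A minimal sketch with made-up input strings; the count argument and the \1, \2 group references are standard features of re.sub:

```python
import re

regex = re.compile(r'\s+')
messy = 'This   is a\ttest'

# sub() replaces every match; each run of whitespace becomes one underscore.
joined = regex.sub('_', messy)

# count limits how many replacements are made.
first_only = regex.sub('_', messy, count=1)

# Text captured by (...) can be reused in the replacement as \1, \2, ...
swapped = re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'user@example')

print(joined)
print(repr(first_only))
print(swapped)
```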
 

