13 January 2021

grep and Regular Expressions

Here is grep's usage.

unix> grep search_string file_name

The argument file_name is optional but nearly always supplied; when it is omitted, grep reads from stdin. The grep command puts every line containing search_string to stdout.

Open the man page for grep. Look at the options. Which ones do you think will be handy? I put one in the table below; let's discuss others.

Option               Action
-i, --ignore-case    ignores case

Wait!!!! But there's more!

There is a language for textual patterns called regular expressions. We can use this along with grep as an immensely powerful search tool. You can also use regular expressions in vi to search in a file. We are going to learn how to use this tool today.

Character Classes

These are character wildcards and are the "bricks" of regular expressions. Download the file sampler.txt at the left; we will use it to do some spelunking.

Regexes level 1: Juxtaposition

The regex101 site is an excellent tool for practicing regexes and for debugging.

Kris Jordan on regular expressions

Special Character Classes

^ start of line
$ end of line
. any one character except \n
\d any decimal digit
\s whitespace (\n, \t, \r, " ")
\b word boundary
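
Here is a small sketch of each of these in action. It uses Python's re module rather than grep, and the sample strings are made up for illustration. One caution: plain grep may not understand \d, so [0-9] is the safer spelling on the command line.

```python
import re

# ^ and $ anchor a match to the start and the end of the line.
starts = re.search(r"^but", "butter") is not None       # True
not_at_start = re.search(r"^but", "rebut") is not None  # False
ends = re.search(r"ter$", "butter") is not None         # True

# . stands for any one character except a newline.
dots = re.findall(r"b.t", "bat bet b\nt boot")          # ['bat', 'bet']

# \d matches a decimal digit.
digits = re.findall(r"\d", "room 101")                  # ['1', '0', '1']

# \s matches a whitespace character.
spaces = re.findall(r"\s", "a b\tc\n")                  # [' ', '\t', '\n']

# \b matches the empty spot at the edge of a word, so only whole words count.
words = re.findall(r"\bcat\b", "cat concat cats cat.")  # ['cat', 'cat']
```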
clue: butter

g__t
All multiplicity operators are postfix unary operators.
Multiplicity has precedence over juxtaposition.
Override with ()
{2} matches pattern occurring twice
{n} matches pattern occurring n times
+   one or more of
?   once or not at all
*   zero or more of
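
To see the multiplicity operators side by side, here is a minimal Python sketch. The helper fully_matches is a name invented for this example; it asks whether the whole string fits the pattern.

```python
import re

def fully_matches(pattern, text):
    """Return True when the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# {2}: exactly two o's between g and d.
two = [fully_matches(r"go{2}d", s) for s in ("god", "good", "goood")]

# +: one or more o's.
plus = [fully_matches(r"go+d", s) for s in ("gd", "god", "good")]

# ?: the u appears once or not at all.
optional = [fully_matches(r"colou?r", s) for s in ("color", "colour", "colouur")]

# *: zero or more o's.
star = [fully_matches(r"go*d", s) for s in ("gd", "god", "good")]
```

Each list holds one answer per test string: two is [False, True, False], plus is [False, True, True], optional is [True, True, False], and star is [True, True, True].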

BEGIN
a possible + or -                     [+-]
a sequence of one or more digits      \d+ or [0-9]+
END
^[+-]?\d+$
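
The pattern above can be tried out in Python. This is only a sketch; the function name is_integer is made up for the example.

```python
import re

# Start of line, an optional sign, one or more digits, end of line.
INTEGER = re.compile(r"^[+-]?\d+$")

def is_integer(text):
    """Return True when text looks like a signed or unsigned integer."""
    return INTEGER.search(text) is not None

accepted = [s for s in ("42", "+7", "-150") if is_integer(s)]
rejected = [s for s in ("4.2", "", "+", "12a", "--3") if not is_integer(s)]
```

All three strings in the first tuple are accepted and all five in the second are rejected. The lone "+" fails because \d+ demands at least one digit.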
| is or
M 1000
D 500
C 100
L 50
X 10
V 5 
I 1

CM 900
CD 400
XC 90
XL 40
IX 9
IV 4
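
One way to turn the two tables above into a regex for Roman numerals up to 3999 is sketched below in Python. Treat it as one possible answer, not the only one. It uses {0,3}, meaning between zero and three repetitions, a form of multiplicity not listed earlier. The names ROMAN, TOKEN, is_roman, and roman_value are invented for this example.

```python
import re

# Values copied from the tables above.
VALUES = {"M": 1000, "CM": 900, "D": 500, "CD": 400, "C": 100, "XC": 90,
          "L": 50, "XL": 40, "X": 10, "IX": 9, "V": 5, "IV": 4, "I": 1}

ROMAN = re.compile(
    r"^M{0,3}"               # thousands: up to three M's
    r"(CM|CD|D?C{0,3})"      # hundreds: 900, 400, or an optional D then C's
    r"(XC|XL|L?X{0,3})"      # tens: 90, 40, or an optional L then X's
    r"(IX|IV|V?I{0,3})$"     # ones: 9, 4, or an optional V then I's
)

# Two-letter pairs are listed first so "CM" is tried before "C".
TOKEN = re.compile(r"CM|CD|XC|XL|IX|IV|[MDCLXVI]")

def is_roman(text):
    """Return True for a well-formed numeral; the empty string is refused."""
    return text != "" and ROMAN.search(text) is not None

def roman_value(text):
    """Add up the table values of the pieces of a well-formed numeral."""
    return sum(VALUES[token] for token in TOKEN.findall(text))
```

For example, is_roman("MCMXCIV") is True and roman_value("MCMXCIV") is 1994, while is_roman("IIII") and is_roman("IC") are both False.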

Regexes level 2: Multiplicity

These are all postfix unary operators with precedence over juxtapositions. Use () to override the order of operations.
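
A quick Python sketch of what that precedence rule means in practice. The strings are made up for illustration.

```python
import re

def fully_matches(pattern, text):
    """Return True when the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# Without parentheses, + applies only to the b right before it.
plain_abbb = fully_matches(r"ab+", "abbb")      # True
plain_abab = fully_matches(r"ab+", "abab")      # False

# With parentheses, + applies to the whole group ab.
group_abab = fully_matches(r"(ab)+", "abab")    # True
group_abbb = fully_matches(r"(ab)+", "abbb")    # False
```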

Orring

The | symbol does or. Bound its enthusiasm with parentheses.
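
Here is a small Python sketch of why the parentheses matter. Without them, | splits the entire pattern in two, anchors included. The example words are made up.

```python
import re

def found(pattern, text):
    """Return True when the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# Unbounded: this means "^cat" OR "dogs$".
loose_catalog = found(r"^cat|dogs$", "catalog")     # True, it starts with cat
loose_hotdogs = found(r"^cat|dogs$", "hotdogs")     # True, it ends with dogs

# Bounded: the line must be exactly cats or dogs.
tight_catalog = found(r"^(cat|dog)s$", "catalog")   # False
tight_cats = found(r"^(cat|dog)s$", "cats")         # True
tight_dogs = found(r"^(cat|dog)s$", "dogs")         # True
```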

A Python or Java Warmup

Write cat in Python or Java. This program takes one or more files and puts them to stdout. If no file is specified, have it read from stdin (sys.stdin in Python). Can you make it behave like UNIX's cat?
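
If you get stuck, here is one possible starting point in Python. It is a sketch, not a full clone of cat. The parameters source and out are additions of this example so the function can be tested without touching the real stdin and stdout, and treating "-" as standard input follows the usual UNIX convention.

```python
import sys

def cat(paths, source=None, out=None):
    """Copy each named file to out, in order; with no names, copy source."""
    source = sys.stdin if source is None else source
    out = sys.stdout if out is None else out
    # An empty argument list behaves like a single "-".
    for path in paths or ["-"]:
        if path == "-":
            for line in source:
                out.write(line)
        else:
            with open(path) as handle:
                for line in handle:
                    out.write(line)
```

Adding a last line cat(sys.argv[1:]) turns this into a command you can run from the shell. Things the real cat does that this sketch does not: it copes with binary files, and it reports a missing file and carries on with the rest.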

Using Regexes in Python
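
As a first taste, here is a minimal grep-like function built on Python's standard re module. The function name grep and its ignore_case parameter are made up for this sketch; re.compile, re.IGNORECASE, and search are the real library pieces.

```python
import re

def grep(pattern, lines, ignore_case=False):
    """Return the lines in which the regex pattern matches somewhere."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in lines if regex.search(line)]

sample = ["Butter the toast", "a bitter end", "batter up", "42"]
hits = grep(r"b.tter", sample)
hits_any_case = grep(r"b.tter", sample, ignore_case=True)
numbers = grep(r"^[+-]?\d+$", sample)
```

Here hits holds "a bitter end" and "batter up", hits_any_case also picks up "Butter the toast" in the same way grep -i would, and numbers holds only "42".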

Using Regexes in Java

sort

uniq

tr

tee

What does this do?

fold

nl