grep and Regular Expressions
Here is grep's usage.
unix> grep search_string file_name
The argument file_name is optional but nearly always supplied;
when it is omitted, grep reads from stdin. The grep command
writes every line containing search_string to stdout.
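To make the behavior concrete, here is a minimal Python sketch of what grep does with its input. It treats search_string as a plain substring, which is how grep -F behaves; ordinary grep treats it as a regular expression. The function name grep and the sample lines are made up for illustration.

```python
import sys

def grep(search_string, lines):
    """Yield each line that contains search_string as a plain substring."""
    for line in lines:
        if search_string in line:
            yield line

# A hypothetical in-memory "file" standing in for file_name or stdin.
sample = ["butter goat\n", "margarine\n", "Butterfly\n"]

for line in grep("butter", sample):
    sys.stdout.write(line)
```

Notice that "Butterfly" is skipped because the comparison is case sensitive, which is exactly the situation the -i option is for.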
Open the man page for grep. Look at the options. Which ones do
you think will be handy? I put one in the table below; let's discuss others.
| Option | Action |
|---|---|
| -i, --ignore-case | ignores case |
Wait!!!! But there's more!
There is a language for textual patterns called regular expressions. We can
use this along with grep as an immensely powerful search tool. You can
also use regular expressions in vi to search in a file. We are going to learn
how to use this tool today.
Character Classes
These are character wildcards and are the "bricks" of regular expressions.
Download the file sampler.txt at the left; we will use it to do some
spelunking.
Regexes level 1: Juxtaposition
The regex101 site is an excellent tool for practicing regexes and for debugging.
Kris Jordan on regular expressions
Special Character Classes
| Pattern | Meaning |
|---|---|
| ^ | start of line |
| $ | end of line |
| . | any one character except \n |
| \d | any decimal digit |
| \s | whitespace (\n, \t, " ", \r) |
| \b | word boundary |
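Here is a small Python sketch that tries each of these classes on a made-up two-line string (it is not taken from sampler.txt). It assumes Python's re module, where re.MULTILINE is needed so that ^ and $ work line by line the way they do in grep.

```python
import re

# A made-up two-line sample, not from sampler.txt.
text = "cat 42\nconcat cat9"

# re.MULTILINE makes ^ and $ apply to each line rather than the whole string.
starts = re.findall(r"^cat", text, re.MULTILINE)   # ['cat']
ends = re.findall(r"\d$", text, re.MULTILINE)      # ['2', '9']
dots = re.findall(r"c.t", text)                    # ['cat', 'cat', 'cat']
words = re.findall(r"\bcat\b", text)               # ['cat']
spaces = re.findall(r"\s", text)                   # [' ', '\n', ' ']
digits = re.findall(r"\d", text)                   # ['4', '2', '9']

for name, found in [("^cat", starts), (r"\d$", ends), ("c.t", dots),
                    (r"\bcat\b", words), (r"\s", spaces), (r"\d", digits)]:
    print(name, found)
```

The \b line is the interesting one: only the first cat counts, because the cat inside concat has a letter before it and the cat in cat9 has a digit after it.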
clue: butter g__t
All multiplicity operators are postfix unary operators. Multiplicity has precedence over juxtaposition; override with ().
{2} matches pattern occurring twice
{n} matches pattern occurring n times
+ one or more of
? once or nonce
* zero or more of
| Piece | Pattern |
|---|---|
| BEGIN | ^ |
| a possible + or - | [+-]? |
| a sequence of one or more digits | \d+, or [0-9]+ |
| END | $ |
^[+-]?\d+$
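A quick way to convince yourself the pattern is right is to throw a few strings at it. This is a minimal sketch using Python's re module; the helper name is_signed_int is made up for illustration.

```python
import re

# The signed-integer pattern: optional sign, then one or more digits.
SIGNED_INT = re.compile(r"^[+-]?\d+$")

def is_signed_int(s):
    """Return True if s looks like an integer with an optional sign."""
    return SIGNED_INT.search(s) is not None

for candidate in ["42", "-7", "+0", "4.2", "+-3", "12a", ""]:
    print(repr(candidate), is_signed_int(candidate))
```

The first three candidates pass and the rest fail: the ? allows at most one sign, and the ^ and $ anchors rule out stray characters on either side.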
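The | symbol means or.
| Numeral | Value |
|---|---|
| M | 1000 |
| D | 500 |
| C | 100 |
| L | 50 |
| X | 10 |
| V | 5 |
| I | 1 |
| CM | 900 |
| CD | 400 |
| XC | 90 |
| XL | 40 |
| IX | 9 |
| IV | 4 |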
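The numeral values above suggest a nice exercise in combining | with multiplicity. Here is one possible sketch, not the only answer: it assumes numerals from 1 to 3999 in the usual subtractive form, and it uses the {m,n} range form of the braces. The names ROMAN and is_roman are made up for illustration.

```python
import re

# One group per decimal place; each group is an "or" of the legal spellings.
ROMAN = re.compile(
    r"^M{0,3}"              # thousands: up to three M's
    r"(CM|CD|D?C{0,3})"     # hundreds: 900, 400, or optional D then up to three C's
    r"(XC|XL|L?X{0,3})"     # tens: 90, 40, or optional L then up to three X's
    r"(IX|IV|V?I{0,3})$"    # ones: 9, 4, or optional V then up to three I's
)

def is_roman(s):
    """Return True if s is a well-formed Roman numeral (under the assumptions above)."""
    # Every piece of the pattern is optional, so the empty string would match; rule it out.
    return bool(s) and ROMAN.search(s) is not None

for candidate in ["MCMXCIV", "XLII", "IIII", "IC", ""]:
    print(repr(candidate), is_roman(candidate))
```

The parentheses matter here: without them each | would split the whole pattern in two instead of choosing among the spellings for a single decimal place.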
Regexes level 2: Multiplicity
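These are all postfix unary operators with precedence over juxtaposition. Use () to override the order of operations.
| Operator | Meaning |
|---|---|
| {n} | Exactly n times |
| {m,n} | At least m, but not more than n times |
| * | Zero or more times |
| + | One or more times |
| ? | Zero or one time |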
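The precedence rule is easiest to see side by side. This minimal sketch uses Python's re module; the variable names are made up for illustration, and the range form is written without a space between m and n.

```python
import re

# + binds to the single character before it: one "a", then one or more "b".
tight = re.compile(r"^ab+$")

# Parentheses override that: one or more copies of the whole group "ab".
grouped = re.compile(r"^(ab)+$")

# The brace forms are postfix too: here, two or three digits.
two_or_three = re.compile(r"^\d{2,3}$")

for s in ["abbb", "abab"]:
    print(s, bool(tight.search(s)), bool(grouped.search(s)))

for s in ["1", "12", "123", "1234"]:
    print(s, bool(two_or_three.search(s)))
```

Here abbb matches only the first pattern and abab matches only the second, which is the whole story of multiplicity versus juxtaposition.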
Orring
The | symbol means or. It has the lowest precedence of all, so bound its enthusiasm with parentheses.
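Here is a small sketch of what happens when | is left unbounded, using Python's re module. The words and variable names are made up for illustration.

```python
import re

# Without parentheses, | splits the entire pattern: (^gra) or (ey$).
loose = re.compile(r"^gra|ey$")

# With parentheses, the choice is only between "a" and "e".
bounded = re.compile(r"^gr(a|e)y$")

for word in ["gray", "grey", "grape", "honey"]:
    print(word, bool(loose.search(word)), bool(bounded.search(word)))
```

The loose pattern happily accepts grape and honey, while the bounded one accepts only gray and grey.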
A Python or Java Warmup
Write cat in Python or Java. The program takes
one or more files and writes them to stdout. If no file
is specified, have it read from stdin (sys.stdin in Python).
Can you make it behave like UNIX's cat?
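If you get stuck, here is one possible starting point in Python; try it yourself first. It is a minimal sketch, not a full clone: it works in text mode, and the function name cat and its out and default_in parameters are my own choices so the function can be tested without touching the real stdin and stdout.

```python
import sys

def cat(paths, out=None, default_in=None):
    """Copy each named file to out; with no paths, copy default_in instead."""
    out = sys.stdout if out is None else out
    default_in = sys.stdin if default_in is None else default_in

    if not paths:
        # No file names given: behave like a filter on standard input.
        for line in default_in:
            out.write(line)
        return

    for path in paths:
        with open(path) as f:
            for line in f:
                out.write(line)
```

In a script, a final line such as cat(sys.argv[1:]) would wire it to the command line. The real cat also treats a file name of - as stdin and copies raw bytes rather than text, so those are two refinements to consider.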
Using Regexes in Python
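As a starting point, here is a minimal sketch of three functions from Python's standard re module. The sample lines are made up. Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the regex engine sees them.

```python
import re

# Made-up sample lines for illustration.
lines = ["total: 42", "no digits here", "offset -7 and +13"]

# re.search looks for a match anywhere in the string, much as grep does per line.
matching = [line for line in lines if re.search(r"\d+", line)]

# re.findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r"[+-]?\d+", lines[2])

# re.fullmatch succeeds only if the entire string matches, as if ^ and $ were added.
whole = re.fullmatch(r"[+-]?\d+", "-7") is not None

print(matching)
print(numbers)
print(whole)
```

The first print shows the two lines that contain digits, the second shows -7 and +13, and the third shows True.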
Using Regexes in Java
sort
uniq
tr
tee What does this do?
fold
nl