Regular Expressions and Stream Editor

Xinyi Xiang
Feb 18, 2021

Even if you are not an expert with UNIX commands, you have probably encountered grep or sed when searching, replacing, or adding text. Although both commands work on text, each accepts options and expressions that can speed up your workflow.

grep takes its name from the ed command g/re/p, short for "globally search for a regular expression and print" the matching lines. It is a command-line utility originally written for UNIX, and it is also available on UNIX-like operating systems.
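grep is not demonstrated elsewhere in this post, so here is a small sketch. The input format (one "Last,First" name per line) mirrors the names example later in the post, and the helper name filter_names is hypothetical. -E turns on extended regular expressions, and -c counts matching lines instead of printing them.

```shell
# Hypothetical helper: keep only lines shaped like "Last,First", i.e.
# one or more non-commas, a single comma, then one or more non-commas.
filter_names() {
  grep -E '^[^,]+,[^,]+$'
}

# Made-up sample input; the last line has no comma and is filtered out.
printf 'Lovelace,Ada\nHopper,Grace\nno comma here\n' | filter_names
# Lovelace,Ada
# Hopper,Grace

# Count the matching lines instead of printing them.
printf 'Lovelace,Ada\nHopper,Grace\nno comma here\n' | grep -cE '^[^,]+,[^,]+$'
# 2
```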

sed is non-interactive: it reads its input, applies the editing commands you give it, and by default writes the result to standard output, leaving the original file untouched unless you explicitly tell sed to edit it in place. First developed in 1973 by Lee E. McMahon of Bell Labs, the stream editor sed is designed for text transformations.

With that background, let us look at what a typical sed command looks like.

sed -r 's/REGEX/TEXT/' file.txt

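Here REGEX is a regular expression describing the text pattern you want to match, and TEXT is what each match is replaced with. The -r option enables extended regular expressions, so characters such as (, ), + and ? have their special meaning without backslashes. The result is printed to the console.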
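A quick way to confirm that sed only prints its result is to run it against a scratch file and then look at the file again. This sketch assumes a POSIX-like shell with mktemp available, and it uses -E, which GNU sed accepts as a synonym for -r and which BSD/macOS sed also understands.

```shell
# Work on a throwaway file so the demo does not touch anything real.
tmp=$(mktemp)
printf 'Lovelace,Ada\n' > "$tmp"

# Replace the first comma on the line with a space; the result goes to stdout.
sed -E 's/,/ /' "$tmp"    # prints: Lovelace Ada

# The file itself is unchanged, because sed only printed its output.
cat "$tmp"                # prints: Lovelace,Ada

rm -f "$tmp"
```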

Now imagine a file, names.txt, holding a list of names where each line stores the last name, then a comma, then the first name. To print each name as first name followed by last name, separated by a space, we could use the following command:

sed -r 's/^(.*),(.*)$/\2 \1/' names.txt

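In the pattern, . matches any single character and * repeats it zero or more times, so the first .* matches everything in front of the comma; wrapping it in parentheses captures that text (the last name) so we can refer to it later. The second (.*) likewise captures everything after the comma, which is the first name. In the replacement, \1 and \2 refer to the first and second captured groups, so \2 \1 prints the first name, a space, and the last name. If you parenthesize more expressions, you can refer to each one with a backslash followed by its number n, where n is the group's position counting from the left (\1 through \9).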
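To make the group numbering concrete, here is a sketch with a hypothetical three-field format, "Last,First,Middle", and a helper, reorder_full_name, that is our own illustrative name. It uses [^,]* (any run of non-comma characters) instead of .*, and the second command shows why: .* is greedy, so with two commas the first group swallows everything up to the last comma.

```shell
# Hypothetical input format: "Last,First,Middle".
# Each parenthesized group is captured and reused as \1, \2, \3.
reorder_full_name() {
  sed -E 's/^([^,]*),([^,]*),([^,]*)$/\2 \3 \1/'
}

printf 'Hopper,Grace,Brewster\n' | reorder_full_name
# prints: Grace Brewster Hopper

# With the greedy .* pattern, group 1 becomes "Hopper,Grace" and group 2 "Brewster".
printf 'Hopper,Grace,Brewster\n' | sed -E 's/^(.*),(.*)$/\2 \1/'
# prints: Brewster Hopper,Grace
```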

sed can also cope with inconsistent formats. The * used above means zero or more occurrences of the character (or parenthesized group) in front of it; extended regular expressions also provide ?, which means zero or one occurrence, and +, which means one or more.
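As a sketch of handling inconsistent input, assume (hypothetically) that some lines have a space after the comma, some have a space before it, and some are missing the last name entirely; normalize_names is an illustrative name, not part of sed. In the pattern, [^ ,]+ requires at least one character of last name, " *" allows any number of spaces before the comma, and " ?" allows at most one space after it. The first name must start with a non-space ([^ ].*), which forces the optional space to be consumed by " ?". Lines that do not match, such as one with no last name, pass through unchanged.

```shell
# Hypothetical helper: turn "Last,First" into "First Last" while
# tolerating spaces around the comma.
#   [^ ,]+  one or more characters that are neither space nor comma (last name)
#   " *"    zero or more spaces before the comma
#   " ?"    zero or one space after the comma
normalize_names() {
  sed -E 's/^([^ ,]+) *, ?([^ ].*)$/\2 \1/'
}

printf 'Lovelace,Ada\nHopper, Grace\nTuring ,Alan\n,Nobody\n' | normalize_names
# Ada Lovelace
# Grace Hopper
# Alan Turing
# ,Nobody
```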

Sometimes we want the changes written back to the file. As mentioned earlier, sed leaves the input file unchanged by default; to edit the file in place, add the -i option. (An I placed after the final / does something different: in GNU sed it makes the match case-insensitive, and it does not modify the file.)

sed -r -i 's/^(.*),(.*)$/\2 \1/' names.txt
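Because -i overwrites the file, it can be worth keeping a backup. This sketch assumes GNU sed, where a suffix attached directly to -i (here .bak) saves the original under that extension; BSD/macOS sed also accepts the attached -i.bak form, but a bare -i with no suffix behaves differently there. The helper swap_in_place is a hypothetical name.

```shell
# Hypothetical helper: swap "Last,First" to "First Last" in place,
# keeping the original file as "<file>.bak".
swap_in_place() {
  sed -E -i.bak 's/^(.*),(.*)$/\2 \1/' "$1"
}

# Demo on a throwaway file so nothing real is overwritten.
tmp=$(mktemp)
printf 'Lovelace,Ada\nHopper,Grace\n' > "$tmp"
swap_in_place "$tmp"

cat "$tmp"      # the edited file
# Ada Lovelace
# Grace Hopper
cat "$tmp.bak"  # the untouched backup
# Lovelace,Ada
# Hopper,Grace

rm -f "$tmp" "$tmp.bak"
```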

Other flags can be appended after the final / as well. The g flag replaces every match on a line instead of only the first, and I (a GNU extension) makes the match case-insensitive. Combined with -i, the file is updated in place:

sed -r -i 's/^(.*),(.*)$/\2 \1/gI' names.txt
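Because the names pattern is anchored with ^ and $, it can match at most once per line, so g has no visible effect in that command. A simpler, unanchored substitution shows the difference. The I example assumes GNU sed, since I is a GNU extension.

```shell
# Without g, only the first match on each line is replaced.
printf 'a,b,c\n' | sed 's/,/ /'            # prints: a b,c

# With g, every match on the line is replaced.
printf 'a,b,c\n' | sed 's/,/ /g'           # prints: a b c

# I (GNU sed) makes the match case-insensitive; g still replaces all matches.
printf 'Ada ADA ada\n' | sed 's/ada/X/gI'  # prints: X X X
```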

Originally published at http://xinyix.wordpress.com on February 18, 2021.
