Regular Expressions:
The term regular expression comes from theoretical computer science. In its simplest form, a regular expression is a language for specifying patterns that match a sequence of characters. Unix evaluates text against the pattern to determine whether the text and the pattern match. Some of the most powerful Unix utilities, such as grep and sed, use regular expressions.
In Unix, regular expressions are constructed using the alphanumeric characters along with certain metacharacters such as ^ (caret), $ (dollar), . (dot) and * (asterisk).
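Metacharacters and their meaning: Special characters, or metacharacters, carry a special meaning instead of matching themselves. The shell has metacharacters of its own, which can be used as wildcards to specify the name of a file without having to type out the file's full name. Those wildcards follow different rules from the regular-expression metacharacters described here, so a pattern passed to grep or sed should be quoted to keep the shell from interpreting it.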
(^) The Caret or Circumflex Character: This metacharacter is used to search for and extract lines or records that begin with a specific pattern. For example, if all the lines or records that begin with the word Murthy are to be searched for and extracted, the search pattern will be ‘^Murthy’.
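
The caret anchor can be tried out with a minimal sketch. It uses Python's re module as a stand-in for grep, since both treat ^ the same way here; the sample records are made up for illustration.

```python
import re

# Hypothetical sample records, invented for this illustration.
lines = [
    "Murthy works in accounts",
    "Report filed by Murthy",
    "Murthi joined in May",
]

# '^Murthy' matches only when the line starts with Murthy.
caret_matches = [line for line in lines if re.search(r"^Murthy", line)]
print(caret_matches)
```

Only the first record is kept: the second contains Murthy but not at the start, and the third starts with a different spelling.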
($) The Dollar Character: This metacharacter is used to search for and extract lines or records that end with a specific pattern. For example, if all the lines or records that end with the word Murthy are to be searched for and extracted, the search pattern will be ‘Murthy$’.
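
A similar sketch shows the dollar anchor, again with Python's re module standing in for grep and with invented sample records.

```python
import re

# Hypothetical sample records, invented for this illustration.
lines = [
    "Murthy works in accounts",
    "Report filed by Murthy",
    "Approved by Murthy.",
]

# 'Murthy$' matches only when Murthy is the last thing on the line.
dollar_matches = [line for line in lines if re.search(r"Murthy$", line)]
print(dollar_matches)
```

The third record is rejected because a full stop follows the name, so the line does not end with Murthy.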
(.) The Dot Character: The dot matches any single character except a newline character. For example, if the user is interested in extracting all lines or records having the name spelled either as Murthy or Murthi, the search pattern will be ‘Murth.’. Because the dot accepts any character, this pattern also matches other strings such as Murtha.
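
The behaviour of the dot can be checked with a short sketch, assuming Python's re module as a stand-in for grep. The list of spellings is hypothetical.

```python
import re

# Hypothetical spellings, invented for this illustration.
names = ["Murthy", "Murthi", "Murtha", "Murth"]

# 'Murth.' needs exactly one more character after Murth, whatever it is.
dot_matches = [name for name in names if re.search(r"Murth.", name)]
print(dot_matches)
```

Murth on its own is not matched because the dot must consume one character, while Murtha is matched even though it may not be a spelling the user wanted.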
(*) The Asterisk Character: The asterisk is used to match repeated characters. This metacharacter stands for zero or more occurrences of the preceding character. For example, to search for all the lines that contain a run of one or more of the letter M, the search pattern will be ‘MM*’. The pattern ‘M*’ on its own also allows zero occurrences, so it matches every line.
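
The effect of "zero or more" is easy to miss, so here is a small sketch comparing ‘M*’ with ‘MM*’. It assumes Python's re module as a stand-in for grep, and the sample lines are invented.

```python
import re

# Hypothetical sample lines, invented for this illustration.
lines = ["M", "MMM", "no capital em here"]

# 'M*' accepts zero occurrences of M, so it finds a match in every line.
star_all = [line for line in lines if re.search(r"M*", line)]

# 'MM*' needs one M followed by zero or more further M's.
star_one_or_more = [line for line in lines if re.search(r"MM*", line)]

print(star_all)
print(star_one_or_more)
```

The first list contains all three lines, including the one with no capital M at all; the second contains only the lines that actually hold an M.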
Character Class: There are situations when it is necessary to match one character from within a set of characters. In Unix, a set of characters out of which only one character is matched is referred to as a character class. The set of characters is written within a pair of square brackets. For example, ‘Murth[iy]’ matches Murthi and Murthy but no other spelling.
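
A character class can be sketched as follows, with Python's re module standing in for grep. The pattern and the spellings are chosen only for illustration.

```python
import re

# Hypothetical spellings, invented for this illustration.
names = ["Murthy", "Murthi", "Murtha"]

# '[iy]' matches exactly one character, which must be either i or y.
class_matches = [name for name in names if re.search(r"Murth[iy]", name)]
print(class_matches)
```

Unlike the dot, the class rejects Murtha, because a is not one of the listed characters.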
Searching for patterns having Metacharacters: Sometimes it is necessary to search for and extract lines containing the metacharacters themselves. This can be done by de-specialising the metacharacter that appears in the search pattern. The metacharacter \ (backslash) removes the special meaning of the character that immediately follows it. For example, ‘\$’ matches a literal dollar sign.
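
The difference an escaping backslash makes can be seen in a brief sketch, assuming Python's re module as a stand-in for grep. The file names are hypothetical.

```python
import re

# Hypothetical file names, invented for this illustration.
names = ["price.txt", "price-txt"]

# Unescaped, the dot still means "any single character".
unescaped = [name for name in names if re.search(r"price.txt", name)]

# Escaped with a backslash, the dot matches only a literal full stop.
escaped = [name for name in names if re.search(r"price\.txt", name)]

print(unescaped)
print(escaped)
```

The unescaped pattern accepts both names, while the escaped pattern accepts only the one that really contains a full stop.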
Searching for words that Begin or End with a specific pattern:
All the lines or records containing words such as Indonesia, India, Ink and others that begin with the pattern In, anywhere in the line, are searched for and extracted by using the regular expression ‘\<In’.
All the lines or records having words such as Asia, India, Bolivia and others that end with the pattern ia, anywhere in a line or record, are searched for and extracted by using the regular expression ‘ia\>’.
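
Word-anchored searches can be sketched in Python as well, with one caveat: Python's re module does not support the \< and \> notation, so this sketch swaps in \b, which marks a word boundary. Placed before In it acts as a start-of-word anchor, and placed after ia it acts as an end-of-word anchor. The sample lines are invented.

```python
import re

# Hypothetical sample lines, invented for this illustration.
lines = [
    "She moved to India",
    "The Ink has dried",
    "Asia borders Europe",
    "MacIntosh apples",
    "A giant diamond",
]

# r"\bIn" stands in for the start-of-word form: In must begin a word.
begins_with_in = [line for line in lines if re.search(r"\bIn", line)]

# r"ia\b" stands in for the end-of-word form: ia must finish a word.
ends_with_ia = [line for line in lines if re.search(r"ia\b", line)]

print(begins_with_in)
print(ends_with_ia)
```

MacIntosh is skipped by the first search because In sits in the middle of the word, and giant and diamond are skipped by the second because ia is followed by more letters.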