Monday, December 21, 2009

The Unix Grep Command Under Linux

I love the grep command.
It is so useful!

grep can be used to find
useful information inside of
a file.

For example, if I'm looking for
a name in my address book, I might
use grep. That's one way
to do it.

Here. Let me look for my own name
as an example:

grep -i Abbott AddressBook

The above grep command looks
for every occurrence of the word
Abbott in my AddressBook
file.

I've asked grep to ignore
case sensitivity. That is to say,
I want to find Abbott, which
is capitalized, or abbott, which
is uncapitalized.

The following grep option ensures
that case sensitivity is ignored:

-i

What does grep do? It prints
all lines that have Abbott in
them on my screen.

Here's the command again:

grep -i Abbott AddressBook

Here's the output:

@Abbott, Ed

That's one line of output.
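
Here's a small sketch you can run to see
what -i changes. The address book entries
are made up for illustration, and the file
lives in a scratch directory that gets
removed at the end:

```shell
# Build a throwaway address book; these entries are invented for the example.
workdir=$(mktemp -d)
cat > "$workdir/AddressBook" <<'EOF'
@Abbott, Ed
@Baker, Ann
@Cooper, Lee
EOF

# -i ignores case, so a lowercase pattern still finds the capitalized name.
with_i=$(grep -i abbott "$workdir/AddressBook")
echo "with -i:    $with_i"

# Without -i the lowercase pattern matches nothing.
# grep exits non-zero when nothing matches, hence the "|| true".
without_i=$(grep abbott "$workdir/AddressBook" || true)
echo "without -i: $without_i"

rm -rf "$workdir"
```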

If I want more, I can ask
the grep command to
print more.

For example, let's say I
want 3 lines of context.

Here's how I would specify
3 lines of context:

grep -i -C 3 Abbott AddressBook

The above command should give
me up to 7 lines.

  • Three lines before
  • The line with Abbott in it
  • Three lines after

This could be handy as my name in my
address book could immediately be
followed by my phone number.
Sometimes, context is helpful.

My examples here are somewhat
contrived. In reality, I use
the grep command for other things.

For example, I keep a journal
on my laptop computer. If I want
to look up a specific topic, such
as basketball, which may span
more than one journal, I can do it
this way:

grep -i -C 3 basketball *

This simple command will look at
every journal in the current directory
and pick out 7 lines of context for
each mention of basketball.

This is very helpful if you are trying
to find information fast.
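
Here's a sketch of the same search on two
made-up journal files. The file names and
their contents are assumptions for the sake
of the example. When grep searches more than
one file, it labels each line of output with
the file it came from:

```shell
# Two invented journal files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'Went to the gym.' 'Played Basketball with Sam.' 'Home by nine.' > journal-nov.txt
printf '%s\n' 'Snowed all day.' 'Stayed in and read.' > journal-dec.txt

# Search every file in the current directory, ignoring case,
# with up to 3 lines of context on each side of a match.
result=$(grep -i -C 3 basketball *)
echo "$result"

# Count the lines that came back (the match plus its context).
line_count=$(printf '%s\n' "$result" | wc -l | tr -d ' ')
echo "lines printed: $line_count"

cd /
rm -rf "$workdir"
```

The journal here is only three lines long,
so there is less than 3 lines of context
available on either side of the match.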

The -C option of grep
is actually part of a family of options.

This family gives context. Here's the
family:

-A    after context
-B    before context
-C    before and after context

Note that in this family of context options,
each member of the family is mnemonic for
something.

Here's the same table expressed as a memory
aid:


-A    after
-B    before
-C    context

The context options, as I call them,
always print context lines in addition
to the line that matches.

I'll give an example.

Let's say I have a file called Numbers
that consists of the following ten lines
of text:

ONE
TWO
THREE
FOUR
FIVE
SIX
SEVEN
EIGHT
NINE
TEN

OK. Now let's say I
type the following command:

grep SIX Numbers

The grep command will
go looking for the pattern
SIX in the file. When
it finds it, we get the following
input/output:

$ grep SIX Numbers
SIX
$

The final $ is just my
command line prompt being returned
to me.

Notice that grep defaults
to printing just one line. That's
important! This helps you to understand
the behavior of the context options.

All context options print lines that
are in addition to the one line that
grep defaults to.

Here's a chart of how many lines each
context option will print:


-A 3    4 lines total
-B 3    4 lines total
-C 3    7 lines total

Here's a chart that describes what each
option does:

-A 3    print the line plus 3 lines after the line
-B 3    print 3 lines before the line plus the line
-C 3    print 3 lines before the line, the line, and 3 lines after the line
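
Here's a sketch that rebuilds the Numbers
file in a scratch directory and counts the
lines each context option prints for SIX.
The last command is my own addition. It shows
that a match near the end of the file gets
less context, simply because there are fewer
lines left to print:

```shell
# Recreate the ten-line Numbers file from the example.
workdir=$(mktemp -d)
printf '%s\n' ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN > "$workdir/Numbers"

# Count the lines printed by each context option.
after=$(grep -A 3 SIX "$workdir/Numbers" | wc -l | tr -d ' ')
before=$(grep -B 3 SIX "$workdir/Numbers" | wc -l | tr -d ' ')
both=$(grep -C 3 SIX "$workdir/Numbers" | wc -l | tr -d ' ')
echo "-A 3: $after lines, -B 3: $before lines, -C 3: $both lines"

# NINE is the ninth line, so only one line (TEN) can follow it.
near_end=$(grep -C 3 NINE "$workdir/Numbers" | wc -l | tr -d ' ')
echo "-C 3 on NINE: $near_end lines"

rm -rf "$workdir"
```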


Ed Abbott

Tuesday, December 1, 2009

The Ping Command

Ping is very useful if you
wish to look up an IP address
for a website. For example:

ping www.websiterepairguy.com

OK. I just did a ping on my
own website.

Here's one of the lines of output:

64 bytes from box458.bluehost.com
   (74.220.219.58): 
   icmp_seq=1 ttl=52 time=86.0 ms

Ping will keep pinging away and
produce many, many lines of output
that look like the above.

Here's how to limit ping to one
line of output only:

ping -c 1 www.websiterepairguy.com

OK. Now my website gets pinged once
and then ping quits. In most cases,
this is what I want. I only need to
ping once.

Here's the total output for pinging
my website just once:

PING websiterepairguy.com 
    (74.220.219.58) 56(84) 
    bytes of data.
    64 bytes from 
    box458.bluehost.com 
    (74.220.219.58): 
    icmp_seq=1 ttl=52 
    time=90.2 ms

--- websiterepairguy.com 
    ping statistics ---
    1 packets transmitted, 
    1 received, 
    0% packet loss, 
    time 0ms
    rtt min/avg/max/mdev = 
    90.219/90.219/90.219/0.000 ms

Now let's say I'm only interested in
one thing and that is the IP address
for my website.

In this case, I'm only interested in
one line of output, the line that reveals
my IP address.

If I pipe the ping command to the grep
command, it might look like this:

ping -c 1 www.websiterepairguy.com | 
     grep PING

OK. The grep command narrows the output
down to just one line. Here's that one line:

PING websiterepairguy.com 
     (74.220.219.58) 56(84) bytes of data.


How does grep work? It is a line printer
but a very special line printer. It only prints
lines that match what you gave it.

Since I gave it a capital PING as a pattern
to match, it only prints lines that match this
pattern.
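
Here's a sketch that shows the filtering
without touching the network. It feeds grep
a saved copy of the ping output shown earlier.
The second command, with -i, is my own addition
to show why the capital letters matter:

```shell
# Saved copy of the ping output, so nothing is sent over the network.
ping_output='PING websiterepairguy.com (74.220.219.58) 56(84) bytes of data.
64 bytes from box458.bluehost.com (74.220.219.58): icmp_seq=1 ttl=52 time=90.2 ms

--- websiterepairguy.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 90.219/90.219/90.219/0.000 ms'

# grep is case sensitive by default: only the first line has PING in capitals.
first_line=$(printf '%s\n' "$ping_output" | grep PING)
echo "$first_line"

# With -i, the lowercase "ping statistics" line matches as well.
loose_count=$(printf '%s\n' "$ping_output" | grep -i ping | wc -l | tr -d ' ')
echo "lines matching with -i: $loose_count"
```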

Now let's narrow the output further. Let's
go for my domain name and my IP address only.
Here's the command:

ping -c 1 www.websiterepairguy.com | 
     grep PING | cut -f2-3 -d " "

Note the addition of the cut command.
I'm cutting out fields two and three, using
a space as my field delimiter.

Delimiters are what define fields. They are
field separator characters.

I've placed a space character between double
quotes to indicate that it is my delimiter. Here's
what it looked like when I did this:

-d " "


I only want the second and third field. Here's
how I specified this:

-f2-3


Here's the whole command again:

ping -c 1 www.websiterepairguy.com | 
      grep PING | cut -f2-3 -d " "


Here's the output to this command:

websiterepairguy.com (74.220.219.58)
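
Here's a sketch of cut working on the PING
line by itself, so you can see which field
is which. The last command is my own addition.
It shows that cut treats every single space
as a delimiter, so two spaces in a row leave
an empty field between them:

```shell
line='PING websiterepairguy.com (74.220.219.58) 56(84) bytes of data.'

# Field 1 is everything before the first space.
field1=$(printf '%s\n' "$line" | cut -f1 -d " ")
echo "field 1:    $field1"

# Fields 2 through 3 are the domain name and the IP address in parentheses.
fields=$(printf '%s\n' "$line" | cut -f2-3 -d " ")
echo "fields 2-3: $fields"

# Two spaces in a row: the second field is the empty string between them.
empty=$(printf 'a  b\n' | cut -f2 -d " ")
echo "empty field: [$empty]"
```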


OK. The only undesirable thing that remains
is the parentheses. Here's a tricky way
to get rid of these:

ping -c 1 www.websiterepairguy.com | 
     grep PING | cut -f2-3 -d " " | 
     sed s/[\(\)]//g


In this case, the syntax for the sed
command is not very pretty. Here's what the
sed looks like:

sed s/[\(\)]//g


The sed command is being asked to substitute
one thing for another. In this case, it is being
asked to replace parentheses with nothing.

More simply, take the parentheses out, both left
and right, and replace them with nothing.

The parentheses appear with a backslash in front
because without the backslash, the shell would give
the parentheses a special meaning. Putting the whole
expression in single quotes, as in 's/[()]//g',
protects it from the shell just as well.

The sed command is using its own substitution
operator. Here's how the sed command works:

sed s/before/after/g


before is replaced by after

Here's the whole command again:

ping -c 1 www.websiterepairguy.com | 
      grep PING | 
      cut -f2-3 -d " " | 
      sed s/[\(\)]//g


Here's the output:

websiterepairguy.com 74.220.219.58
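
Here's a sketch of the last step on its own,
using the single-quoted form of the sed
expression. The version without the g and the
tr version are my own additions for comparison.
They are not part of the original pipeline:

```shell
fields='websiterepairguy.com (74.220.219.58)'

# Single quotes keep the shell away from the expression,
# so the parentheses need no backslashes.
cleaned=$(printf '%s\n' "$fields" | sed 's/[()]//g')
echo "with g:    $cleaned"

# Without the trailing g, sed replaces only the first match on each line.
first_only=$(printf '%s\n' "$fields" | sed 's/[()]//')
echo "without g: $first_only"

# tr -d deletes every listed character, which gives the same result here.
cleaned_tr=$(printf '%s\n' "$fields" | tr -d '()')
echo "with tr:   $cleaned_tr"
```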


Ok. Seems like a lot of code but
actually, it is a quick way to get
things done, once you know what you
are doing.

Ed Abbott

Friday, November 20, 2009

The Unix ls Command

One of my all-time favorite
Unix commands is the ls command.

In its simplest form, ls looks
like this:

ls banana


The above lists a file named banana
if there is such a file.

Here's one that's more useful:

ls


This lists all the files in the current
directory.

OK. Here's something that could be even
more useful:

ls -l


This is the so-called long listing.
On my system, it lists eight columns of
information on every file in the current
directory.

Included in the long listing is a time-stamp
for the file.

Note that a directory is just simply a
folder. Folders are directories and
directories are folders.

People who work on the command line
call them directories. People who
use a GUI (Graphical User Interface)
call them folders.

OK. Here's one that lists hidden files:

ls -A


The -A option means almost all.
That is to say, list both hidden and
non-hidden files, leaving out only the
two special entries . and .. that a
lowercase -a would also show.
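
Here's a sketch that compares -A with its
lowercase cousin -a in a scratch directory.
The two file names are made up for the
example:

```shell
# A scratch directory with one ordinary file and one hidden file.
workdir=$(mktemp -d)
touch "$workdir/visible" "$workdir/.hidden"

# -A adds hidden files to the listing.
almost_all=$(ls -A "$workdir" | wc -l | tr -d ' ')
echo "ls -A shows $almost_all entries"

# -a also adds the special entries . and ..
all=$(ls -a "$workdir" | wc -l | tr -d ' ')
echo "ls -a shows $all entries"

rm -rf "$workdir"
```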

Here's one that does a long listing on
all files, including hidden files:

ls -lA


Here's one that will look for subdirectories
inside of directories. Not only does it list
the current directory, it also lists anything
that belongs to the current directory:

ls -R


The -R suggests that the command is
recursive. It recursively descends into
directories finding sub-directories under
those directories going just as deep and
as far as it can.

Here's one that does a recursive long listing
on everything from the current directory on
down:

ls -lAR


Here's one that potentially looks at thousands
of files, listing those most recently modified
last:

ls -lAR | sort -k6


Use the above command to find a needle in a
haystack. The needle? A recently modified
file that is buried somewhere in listings of
thousands of files.
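
Sorting the listing as text only works when
the date column is written in a form that
sorts cleanly. For a single directory, here's
a sketch of another way to get the same
"most recent last" order: let ls sort by
modification time itself with -t, and reverse
the order with -r. The file names and dates
are made up for the example:

```shell
# Two files with known modification times (format: YYYYMMDDhhmm).
workdir=$(mktemp -d)
touch -t 200911120900 "$workdir/old.txt"
touch -t 200912210900 "$workdir/new.txt"

# -t sorts by modification time, newest first; -r reverses that order.
ls -ltr "$workdir"

# The most recently modified file ends up on the last line.
newest=$(ls -tr "$workdir" | tail -n 1)
echo "most recently modified: $newest"

rm -rf "$workdir"
```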

I love Unix commands because they are so simple.
Yet, you can do very complex things with them
when you start to string them together.

Ed Abbott

Thursday, November 19, 2009

The Unix File Command

Another favorite Unix command
that I frequently use is the
file command.

The file command is great
because it allows you to easily
determine what kind of file you
have in front of you.

Here's how you use the file
command:

file mystery-file


The file command tells you
what kind of file mystery-file
is.

Often, you can tell a file by extension.

For example, .jpg is a file extension.
The file, photo.jpg, is one such file.

To those in-the-know, any file with a
.jpg extension is an image or
photograph.

But what if you don't know? It's hard to keep
up with all possible file types out there.

If you don't know, the Unix file command
can be very very handy.

Ed Abbott

Thursday, November 12, 2009

The Unix Sort Utility

OK. This is a new blog.

Here's where I talk about some
of my favorite Unix commands under
Linux.

One of my favorites is the sort
utility. Often, I use this one
with a pipe.

Here's an example:

ls  -l  |  sort  -k6

This one sorts the output of a
long listing by date.

Therefore, if I'm only interested
in files that have been worked on
recently, I will find these at the
end of the listing.

Note that the -k option for
sort picks a whitespace
separated field as the one to do
the sort on.

That is to say, whitespace is what
separates potential sort keys.

Let's see. The long form of the
ls command is ls  -l.
On my system, ls  -l has 8 fields.

Therefore, ls  -l has 8
potential sort keys that you can
sort on.

A sort key is something to sort on.
A filename could be a sort key. A
date could be a sort key.

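On my system, it is the sixth field that
contains the date. Therefore, -k6 means
sort on the sixth field, which is the
date field. This only puts files in date
order if the date is written year first,
such as 2009-11-12. A date that starts
with a month name sorts alphabetically
instead.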
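
Here's a sketch of -k6 at work on a saved
listing, so the result does not depend on
what is in your own directory. The three
lines are made up, and I'm assuming the
eight-field layout with a year-first date
in field six:

```shell
# A saved long listing: 8 whitespace-separated fields, date in field 6.
listing='-rw-r--r-- 1 ed ed  120 2009-11-12 09:15 notes.txt
-rw-r--r-- 1 ed ed 4096 2009-10-01 18:40 budget.csv
-rw-r--r-- 1 ed ed   88 2009-11-19 07:05 todo.txt'

# Sort on field 6 (the date) through the end of the line.
sorted=$(printf '%s\n' "$listing" | sort -k6)
echo "$sorted"

# The most recently modified file is now on the last line.
last_line=$(printf '%s\n' "$sorted" | tail -n 1)
newest=${last_line##* }
echo "most recent: $newest"
```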

Ok. Putting it all together, I'm taking
an ls command and sending
it to a sort command.

Putting the two commands, ls
and sort, together gives me
something greater than either command alone.

In many ways, that's what Unix is all about.
Putting commands together to make something
greater.

In this case, it is the pipe symbol, which lies
between the two commands, that allows me to put
them together to make something greater.

Make sense?

Here's a wonderful web page that describes in
detail many of the sort utility options:

More About Sort

Ed Abbott