Friday, October 29, 2010

Finding Files On Your Hard Drive
With the Linux Find Command

 
I love the Linux find command. The
find command is used to find files.

Here are some of my favorite things about
the find command:

  1. You can use it to find a file by name
  2. You can use wildcards with it
  3. It automatically descends into folders
    underneath the current folder
  4. It prints out the path to every file
    it finds

The ability of the find command to descend
into directories (folders) is known as recursive
descent. Each layer of directories found under
the current layer of directories is another layer
of recursion. Recursive descent is a well-known
computer algorithm used by many programmers.

Basically, the find command consists of
4 parts:

  1. The name find
  2. Where to start looking
  3. What files to look for
  4. What to do when you find files

Here's an example:

find . -name abc -print

Here are the 4 parts of the above
find command:

  1. The name of the command is find
  2. Dot is the name of the current directory
  3. We are looking for a file called abc
  4. Once a file called abc is found, the
    find command will print the path to it

Here's what happens in plain English:

We start looking in the current directory
(dot or period) for a file called abc.
We will uncover all possible sub-directories
of the current directory. Any files found
that are called abc will be printed.
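
Here's a small sketch you could run to watch
this happen. The scratch directory and the names
one, two and xyz are made up for the demonstration.
Only the find command itself comes from the
example above.

```shell
# Build a throwaway tree (hypothetical names) to search in.
demo=$(mktemp -d)
mkdir -p "$demo/one/two"
touch "$demo/abc" "$demo/one/two/abc" "$demo/one/xyz"

# Start in the scratch directory and look for files called abc.
# The subshell keeps the cd from changing your own directory.
found=$(cd "$demo" && find . -name abc -print | sort)
printf '%s\n' "$found"
```

Both copies of abc are reported, including the
one that sits two directories down. The file
called xyz is skipped.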

Here's the only thing that is tricky about
the find command: It shares wildcards with
the shell. This can be trickier than it
sounds. Let's say I wish to find a file
that starts with the letters abc.
I might type the following command:

find . -name abc* -print

This will probably work. As long as
there are no files in the current
directory that start with abc,
all will be well.

However, let's say we have a file called
abcdef in the current directory.
We are now in trouble. We are in trouble
because the shell is going to do file-name
expansion prior to executing
the find command.

Here's what we type:
find . -name abc* -print

Here's how the shell interprets what we
typed:

find . -name abcdef -print

Do you see the problem? The find command
never sees the asterisk. What happens is that
the find command sees abc* only
after it has been expanded to abcdef.
Big difference!

Of course, there is a way around this and that
is to remove the special meaning of the asterisk
with a backslash. Here's what this would look like:

find . -name abc\* -print

In practice, though, using backslashes
on a command line is very clumsy. Most
people use double quotes instead. Here's what double
quotes look like:

find . -name "abc*" -print

The double quotes stop the shell from expanding
filename wildcards such as the asterisk.
Now we can rest easy and know that our filename
expansion characters will reach the
find command untouched.
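
To see the difference for yourself, here is a
rough sketch. The scratch directory, the sub
directory and the file abcxyz are invented for
the demonstration. It assumes a Bourne-style
shell such as bash.

```shell
# Scratch tree: abcdef in the top directory, abcxyz one level down.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/abcdef" "$demo/sub/abcxyz"

# Unquoted on purpose: the shell expands abc* to abcdef before find runs.
unquoted=$(cd "$demo" && find . -name abc* -print | sort)

# Quoted: find receives the pattern abc* itself.
quoted=$(cd "$demo" && find . -name "abc*" -print | sort)

echo "unquoted:"; echo "$unquoted"
echo "quoted:";   echo "$quoted"
```

The unquoted search only reports abcdef. The
quoted search also turns up sub/abcxyz, because
the wildcard survived long enough for find to
use it in every directory.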

The double quotes are a wonderful habit to get
into. Basically, you can use the double quotes
regardless of whether you are using filename
expansion characters or not. Let's say, for
example, we are looking for a file called
abc.

Here's how I might apply the double quotes:

find . -name "abc" -print

In this case, the double quotes do not
matter. Since there are no filename
expansion characters, the double
quotes serve no purpose.

Here's why I use double quotes anyway:

If you always use double quotes, you
never need to rethink the find command.
It just works no matter what. Rather
than think whether double quotes are needed,
just use them. They don't cost anything
other than 2 keystrokes.

This is more valuable than it might
appear. When you are in the heat of
battle and you are trying to solve
a problem, considering whether or not
to use double quotes is a mental
distraction.

Rather than suffer the distraction, just
use the double quotes. It's not hard to
figure out whether or not you need double
quotes, but why think about it at all?

Ed Abbott

Tuesday, October 12, 2010

The diff Command Under Linux

 
The diff command under Linux
is one of my favorite Unix commands.
It's an ancient command that's still very
useful in a modern world. I call any command
that dates back to the 1970's ancient.

I was recently asked by someone over the
phone how to find the changes made to a
website by a web developer. How do you
find their changes if you have a complete
copy of the website before the changes and
a complete copy of the website after the
changes?

I told him that you need 3 things to do
this:

  1. A complete copy of the website
    before and after
  2. The Unix ls -lt command
  3. The diff command

Start by looking at the complete copy
of the new website. Start in the topmost
directory (folder) of the website and do
this command:

ls -lt

This will give you a list of both files
and directories sorted in timestamp
order. Directories recently modified
need to be investigated further. Files
recently modified need to be noted.

In any case, both files and directories
of recent vintage will rise to the top
of the ls -lt listing.

Using this list, you can easily find things
that have been modified after the web developer
(who made changes) took over.

If a file, make a note. If a directory, look
further.

Keep looking into directories that have been
modified since the new web developer started
working on the site. Once you've found all
the files that have been modified after a certain
date, you are done with ls -lt.

This will take less time than it might seem as
web developers typically only modify a few files
on each occasion that they work on a site. For
example, if the web developer only worked on the
Contact Us page, this may be the only file
that was modified. This being the case, you will
find the file relatively quickly.

Next, use the diff command to figure out
what changed on the Contact Us page.

Here's how you might use the diff command
hypothetically:

diff ../old/contactus.html ../new/contactus.html >temp

I've fictionalized the directories where the
old and new Contact Us pages would be
found. Undoubtedly, you will have to do a bit
more typing than I did in my hypothetical example
above to get a diff on the two files.

Notice that I've placed the difference between
the two files in a file called temp. This
is a temporary file that has all the changes.

If the changes are not too extensive, the file
called temp will be quite short. It could
be something as simple as a new phone number or
a new business address.
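
Here's a minimal sketch of that situation. The
two tiny pages and the phone numbers in them are
made up. The scratch directory stands in for
wherever your old and new copies really live.

```shell
# Two hypothetical copies of the page that differ by one phone number.
demo=$(mktemp -d)
mkdir "$demo/old" "$demo/new"
printf 'Call us: 555-0100\nEmail: info@example.com\n' > "$demo/old/contactus.html"
printf 'Call us: 555-0199\nEmail: info@example.com\n' > "$demo/new/contactus.html"

# diff exits with status 1 when the files differ, so "|| true"
# keeps a script that runs under "set -e" from stopping here.
diff "$demo/old/contactus.html" "$demo/new/contactus.html" > "$demo/temp" || true

cat "$demo/temp"
```

The temp file mentions only the line with the
phone number. The email line is the same in both
copies, so diff leaves it out.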

A Contact Us page consists of contact
information so the changes to it would not
necessarily be anything more than a slight
update.

How long would it take me to find all of this
out? Discounting the time it takes me to obtain
two copies of the website, I'd say maybe
5 minutes.

Here are the steps I would take in that 5 minutes:

  1. Find the most recent timestamp in the old
    copy of the website. In other words, do a
    ls -lt on the old topmost directory. Be
    sure to discount things like server logs and
    other things that are automatically updated
  2. Use the timestamp discovered at the old site
    to determine what is new at the new site
  3. Do the steps given above to discover what
    files are newer than the timestamp discovered
    on the old copy of the website

That, in a nutshell, would be how I would discover
work done recently by a web developer. Here
are some basic principles that are at work here:

  1. In life you generally need a reference point
    if you are to get anywhere. In this case the
    reference is the file last worked on on the old
    copy of the website
  2. In life, it is helpful to know how far you've
    come since you last referenced where you were. The
    technique of using ls -lt to progressively
    descend directories looking for recent file changes
    to the new copy of the website does this. It tells
    you how things have progressed since the last
    checkpoint
  3. It helps to have a basis of comparison. The
    diff command gives you a wonderful way to
    compare two files looking for changes

Because of their primitive nature, I don't know
of anything that supersedes the old Unix command-line
commands. I've never ever discovered anything that
is quite like them in flexibility, scope, and power.

Of course, it takes a little bit of creativity to
combine and use these commands effectively. If there
is a downside, that would be it. You cannot be half
asleep and use Unix commands effectively. You have
to be a person who does not mind exercising a little
creativity. If you enjoy being creative, Unix command-line
commands may be for you.

One more thing: I've oversimplified things somewhat
to help you find files on a small website. On a
large website you may have problems with the
techniques outlined above.

A directory's timestamp changes only when a file is added to,
removed from, or renamed in that directory. A file edited in
place may leave its parent directory's timestamp untouched, and
a change deep in the tree does not update the grandparent
directory at all. In other words, the topmost directory
of a website does not necessarily reflect the most recently
modified file on the website. That's one problem.

Another problem is if you have to go digging through many
many files and directories. If this is the case, the
technique outlined above could prove difficult or impossible.

In the case of added complexity, you may wish to change
technique somewhat. The following web page describes how
to use the find command to find a file of recent
vintage:

Using the -newer option of the find command

Even though the find command is more efficient
if you absolutely need to know the most recently modified
files on a website, the ls -lt command is still
useful. The ls -lt is a much quicker and simpler
way to survey the general situation and get a general
take on how recently the website has been modified.

There's another principle at work here and that's scaling
your solutions to your problems. In life you generally
don't want to use a large-scale solution to solve a small
problem. You typically would not use a backhoe to dig
a hole for a fence post.

So, depending on the scale of what you are looking at,
either find -newer or ls -lt may be the
ideal tool to pick up and use in your situation.
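
As a rough sketch of the find -newer approach,
assume a file called reference carries the
timestamp of the last change in the old copy.
The reference file, the site directory and the
dates are all invented for the demonstration.

```shell
# Scratch site: index.html is as old as the reference file,
# contactus.html was touched five months later.
demo=$(mktemp -d)
mkdir -p "$demo/site/contact"
touch -t 201001010000 "$demo/reference" "$demo/site/index.html"
touch -t 201006010000 "$demo/site/contact/contactus.html"

# List regular files modified more recently than the reference file.
changed=$(cd "$demo/site" && find . -type f -newer ../reference -print)
echo "$changed"
```

Only contact/contactus.html is listed, however
deep in the tree it sits. There is no need to
walk the directories by hand.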

Ed Abbott

Sunday, September 26, 2010

Feeding Standard Input
to The bc Command

 
I'm always learning something
new.

For years I've used the
bc command in interactive
mode. bc stands for
basic calculator.

Here's a nice blog post that
introduces bc:

Unix Basic Calculator

Interactive mode is fine. That's
where you sit there and type things
like this:

1 + 1 + 1

After you hit enter, you
get this answer back:

3

That's great if you are doing
a simple calculation. But what
about lots and lots of numbers?
How do you deal with this?

This morning, I was working with
the vim editor. I was pulling
some numbers out of a web page. I
wanted to add all these numbers
together like this:

24 + 7 + 9 + 27

It was a lot more numbers than
I'm showing you here. There were
about 52 numbers embedded in a web
page that I wanted to add together.
That's way too many numbers to add
together interactively. What if I
made a typo?

Also, why should I have to type the
numbers at all? That's what vim
is for, right? In this case, I used vim
to format the numbers for me and then
put plus signs (+) in between each number.

After I was finished editing the web page
with vim, I was left with a single line
that consisted of nothing but numbers separated
by plus signs (+).

Vim is my all-time favorite programmer's
editor:

My Favorite vim Commands

With vim, I used regular expressions
and vim commands to pare my saved
web page down to just 52 numbers on one line
with a plus sign (+) separating each number.

The next step? I was wondering about that.
I figured there must be a way to use bc
in batch mode. By batch mode, I mean
a way to get bc to run a bunch of commands
that are stored in a text file.

Turns out there is a way. I discovered
it when I read the bc man page. It's
simple. Just put the math operations in
a text file and feed these numbers to bc.

Here's what I typed:

bc <numbers.txt

It worked like a charm! My 52 numbers
were all added together. This is a very
powerful feature that I'm sure I'll be
making great use of in the future.

The simplicity of this approach is that
you feed your batch file into bc
via standard input and you get your
answer on standard output. In other
words, a Unix filter.
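
Here's a small sketch of the same filter idea.
It swaps in the paste command to put the plus
signs between the numbers, which is a different
route than the vim editing described above. The
scratch file holds only the four sample numbers,
one per line, and the sketch assumes bc is
installed.

```shell
# One number per line, as you might have after pulling them from a page.
work=$(mktemp -d)
printf '24\n7\n9\n27\n' > "$work/numbers.txt"

if command -v bc >/dev/null 2>&1; then
    # paste -s joins the lines into one, -d+ puts a plus sign between them.
    # bc reads the resulting sum on standard input.
    total=$(paste -sd+ "$work/numbers.txt" | bc)
    echo "$total"
else
    echo "bc is not installed" >&2
fi
```

With these four numbers, bc prints 67.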

Here's the input and output together:

bc <numbers.txt
256


I love Linux for this reason. So many
primitive commands that you can do such
great things with. Linux is a great time-saver.

Ed Abbott

Saturday, May 22, 2010

The Unix-Linux Identify Command

 
Another great command discovered!

The identify command enables
me to discover the width and height
of an image without having to go
into an image viewer.

This is very handy if, say, I'm in
the Vim editor and I need to know
the dimensions of an image without
leaving the editor.

Here's what I might type in Vim:

:r !identify goofy.jpg

The above command pulls into Vim
the width and height of the
image called goofy.jpg.

This is another example of how a
primitive command can be put to
great use when combined with other
tools.

The old cliche is a cliche because
it is so true: The whole is greater
than the sum of its parts.

Ed Abbott

Monday, March 22, 2010

Spell Check Web Pages Easily With Ispell

 
I use a command-line utility
called ispell to spell
check my text files under Linux.

Here's an example of what I mean:

ispell readme.txt

The above command will go through
the readme.txt file looking
for misspelled words. It will
offer suggestions as to how the
word might be spelled correctly.
Should you choose to accept the
suggested corrected spelling, it
will correct the spelling of the
word for you with a single keypress.

How about web pages that you are
working on locally on your hard
drive? How do you spell check
these?

Of course, you could upload the web
page to the web server and then
spell check it on the web. That's
one way of doing things.

However, what if you want to spellcheck
locally? What if you'd like to spellcheck
your text that is marked up with HTML
without having to go to the web?

Here's a sample command that demonstrates
what I do in this case:

ispell -h index.html

In the above example, index.html
is on my hard drive in the current
directory. I add the -h option
to let ispell know that I want
it to ignore HTML markup and only
spellcheck the body of text itself.

This is very very handy if you work in
a simple text editor but wish to
spellcheck without having to go to
the web to find a spellcheck application.

A favorite feature of mine is ispell's
ability to respond on one keypress. One
keypress gets you many things.

One of my favorite keypresses is the
letter i. The letter i allows you
to add a word to your own personal
dictionary. Here's where your personal
dictionary is stored in a hidden
file under your home directory:

~/.ispell_default

Once a word has been stored in the
.ispell_default file under your
home directory, it becomes a regular
word that ispell now considers to
be correctly spelled. It will even
suggest that word from time to time
should you come up with a misspelling
that is an approximation of the correct
spelling.

Words from ispell's built-in dictionary
and words from your personal dictionary
are both first-class citizens. Words
from both sources are likely to be suggested
as possible correct spellings.

How does ispell suggest words? It's
one single keypress all over again.
ispell might suggest 10 different
spellings. The suggestions will appear
as keypresses 0 through 9.

Let's say ispell suggests 36 different
spellings. In that case, the choices
will range from 00 through 35. Thus
two keypresses will be required to make
a choice.

Normally, though, only one keypress is
needed. Ten suggestions or fewer is typical.
More than ten suggestions is the exception.

Ed Abbott

Monday, March 1, 2010

Linux cp Command

 
One of my favorite commands under
Linux is the cp command. In
its most primitive form, you use
it to copy a file like this:

cp strawberry raspberry

With the cp command, you
don't make extra work
for yourself. Instead
of recreating a file, you
copy it.

In the above example, I copied
strawberry to raspberry.
After I've done this, I should have
two identical files.

Note that the copy command is not
limited to files. Here's how you
copy a directory and all its
contents:

cp -R banana apple

In the above example, banana
is the original directory. The
new directory is apple.

Copy always goes in this direction:

cp old new

You always copy left to right. Another
way of saying this is that you always
copy an old file to a new file, the
old file being on the left, the new
file being on the right.

Back to cp -R. Here's the example
I gave above:

cp -R banana apple

In this example, I recursively copy
a directory called banana into a new
directory called apple. I started
with banana but I ended up with both
banana and apple.

Note that it is not just the directory that
is copied. It is also all the contents of
the directory, including other files and
directories to any level of depth.

If you are used to using the term folder,
think of a directory as a folder. Folders
and directories are the same thing.

I use the term directory because the
cp command is used on the command line.
On the command line, folders are directories.

Here's one more example of the cp command:

cp -Rp pear peach

In this example, the directory pear is
being copied to the new directory called peach.
However, there is an additional nuance here. The
directory peach may be new but its timestamp
is old. That's because the -p option asks
copy to preserve both permissions and timestamps.

Had we left off the -p option, peach
would have a different timestamp. The timestamp
would be the moment peach came into existence.

Also, peach could potentially have a different
owner as well. Without the -p option, ownership
defaults to the person who typed and executed the
cp command.

Again, the -p option preserves both permissions
and timestamps.
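
Here's a rough sketch that puts the two copies
side by side. The scratch directory, the inner
directory, the file note.txt and the copy called
plain are made up for the comparison. Only pear
and peach come from the example above.

```shell
# A small pear directory whose one file carries an old timestamp.
demo=$(mktemp -d)
mkdir -p "$demo/pear/inner"
echo "fruit" > "$demo/pear/inner/note.txt"
touch -t 200901010000 "$demo/pear/inner/note.txt"

cp -R  "$demo/pear" "$demo/plain"   # copy gets the time of copying
cp -Rp "$demo/pear" "$demo/peach"   # copy keeps the 2009 timestamp

# Compare the dates shown for the two copies of note.txt.
ls -l "$demo/plain/inner/note.txt" "$demo/peach/inner/note.txt"
```

The file under plain shows the moment the copy
was made. The file under peach still shows the
2009 date. One thing to watch: if the destination
directory already exists, cp -R puts the copy
inside it instead of creating it.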

Knowing the cp command can save you
much time and energy. This is especially
true if you know the many different ways in
which it can be used.

Ed Abbott

Thursday, January 14, 2010

The Unix ls Command Sorted by Date

In the past, I've written
about using the sort command
to sort long
listings by date.

There's an easier way, I've
just learned.

The ls command has a
-t option. You
can use this to sort files by
timestamp.

It looks like this:

ls -t

If you wish to confirm that
the files are really being
sorted by date, you could
turn it into a long listing:

ls -lt

A long listing will not tell
you if files that are more
than 6 months old are in
precise date order to the
minute and to the second.

Here's yet another way to do
it:

ls -t --full-time

The above command will give a
long listing, but with the
modification time in hours,
minutes and seconds included.

Of course, it is the -t that
sorts the listing in timestamp
order.
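
Here's a minimal sketch you could try. The three
file names and their dates are invented so that
the expected order is obvious.

```shell
# Three scratch files with timestamps months apart.
demo=$(mktemp -d)
touch -t 201001010000 "$demo/oldest"
touch -t 201006010000 "$demo/middle"
touch -t 201012010000 "$demo/newest"

# -t sorts by modification time, most recent first.
listing=$(ls -t "$demo")
echo "$listing"
```

The names come back as newest, middle, oldest,
no matter what order the files were created in.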

I believe that the --full-time
option is of recent vintage. I don't
think we ever had it in the old days.

In fact, the only ls that I know
of that has a --full-time option
is GNU ls.

However, since I use Linux currently,
this is not a problem for me. I suspect
that the ls command under Linux is,
in most cases, GNU ls.

What's the lesson in all of this?
Orderliness and Godliness are related.
Commonplace things take on a Godly nature
when they are put into good order.

Just the other day, I used ls -t
on a directory of sub-directories. I was
trying to find the directory I had most
recently worked on. I could not find it.

I was so confused. What could possibly be
wrong? I thought maybe the ls -t was
broken.

Turned out I was logged into Linux as
another user and had forgotten that fact.
I had forgotten I was logged into an
account other than my normal user account.

As it turns out, both user accounts have a directory
tree that is a mirror in terms of directory names but
not in terms of file content. However, to all
appearances, the name of the directory and the names
of the sub-directories are the same.

The confusion I felt over ls -t not
working cleared up later when I realized I was in
the wrong file hierarchy entirely.

This is what I mean by orderliness and Godliness being
related. A clear mental vision and a clear spiritual
vision often are the result of living an orderly life.

While the ls -t cannot order my entire life,
it can take the chaos out of a small corner of it.

Ed Abbott

Wednesday, January 6, 2010

The Unix du Command Under Linux

One of my favorite Linux commands
is the du command.

Here's an article:

The du Command

Why do I like the
du command?
Because it tells me
if I'm being a disk
hog. That's why.

Also, it helps me
to identify disk hog
directories.

As mentioned in the article,
there's a command that will
give you a summary on a specific
directory. Here it is:

du -hs

Note that you have to have permissions
on all the directories below this current
directory or you end up getting a lot of
goofy error messages.

Perhaps the quickest and easiest way to
get around the goofy error messages is
to log in as root.

Note that the above command will only
give you one line of output.

Here's an input and output example:

$ du -hs mary
363M mary
$

Note that the final dollar sign is
my prompt coming back.

In this case, I'm being told that
the directory called mary
has used up 363 megabytes of
storage.
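
To hunt for disk hog directories, one approach
is to summarize several directories at once and
sort the result. This is only a sketch. The
directories small and big and the file sizes are
made up, and it uses -k rather than -h so that
sort can compare plain numbers.

```shell
# Two scratch directories: one nearly empty, one holding about 200 KB.
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/big"
echo "tiny" > "$demo/small/note.txt"
dd if=/dev/urandom of="$demo/big/blob" bs=1024 count=200 2>/dev/null

# One summary line per directory, in kilobytes, smallest first.
report=$(cd "$demo" && du -sk small big | sort -n)
echo "$report"

# The last line of the sorted report names the biggest directory.
largest=$(echo "$report" | tail -n 1 | awk '{ print $2 }')
echo "Biggest: $largest"
```

The exact kilobyte figures depend on your
filesystem, but big should land on the last line.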

Ed Abbott

Sunday, January 3, 2010

The Unix df Command Under Linux

One of the commands I use the
most frequently under Linux is
the df command.

df stands for disk
free. Basically, the
df command tells you
how much of your hard drive
has been used up.

This is very useful.

With df, you know whether
you have 50 percent of your hard
drive left for additional storage
or only 10 percent left.

Big difference.

Again, df means disk
free. This is exactly what
df tells you. How much
of the hard drive is free for
you to add other things to it.

Here's how to use the df
command:


df

Simple, isn't it?

Here is some input and output:

$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda3            123079960  83666812  33161052  72% /
tmpfs                   518136         0    518136   0% /lib/init/rw
udev                     10240       672      9568   7% /dev
tmpfs                   518136         0    518136   0% /dev/shm
$

The final dollar sign is my prompt
coming back.

You can make the df command
quite a bit more useful by adding
the -h option to it.

The -h means human
readable. With -h, df
reports usage in megabytes
and gigabytes, and other units of
measure that are easy to digest.

Here's what df looks like with
-h present:

df -h

Here's a sample input and output for
df with the -h present:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda3             118G   80G   32G  72% /
tmpfs                 506M     0  506M   0% /lib/init/rw
udev                   10M  672K  9.4M   7% /dev
tmpfs                 506M     0  506M   0% /dev/shm
$

Again, the final dollar sign
is my prompt coming back.
The initial dollar sign is the
prompt as it appears before
I've typed anything.

Note that df gives me 6
columns of information.

Perhaps the most important column
is the fourth column. This is the
Avail column.

This tells me, in megabytes or
gigabytes, how much space I have
left on my hard drive.

In my case, I have used up 80
gigabytes of storage and have
only 32 gigabytes left.

To see this, look at the first line
below the header and ignore the other lines.

Like many Unix commands, df
gives you more information than you
want initially.

Likely as not, the information you
will want from df is all on
the first line. At least, that's
true in this case.
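
If you only care about one filesystem, you can
name a directory on it and df will report just
that line. Here's a rough sketch that assumes
you want the root filesystem. The variable name
avail_kb is my own invention.

```shell
# Human-readable report for the filesystem that holds /.
df -h /

# -P asks for the portable format (one line per filesystem),
# -k reports in kilobytes. Column 4 of line 2 is the space available.
avail_kb=$(df -Pk / | awk 'NR == 2 { print $4 }')
echo "Available on /: ${avail_kb} KB"
```

Pulling out the one number this way is handy
inside a script, where nobody is around to read
the whole table.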

Ed Abbott