The diff command under Linux
is one of my favorite Unix commands.
It's an ancient command that's still very
useful in a modern world. I call any command
that dates back to the 1970s ancient.
I was recently asked by someone over the
phone how to find the changes made to a
website by a web developer. How do you
find their changes if you have a complete
copy of the website before the changes and
a complete copy of the website after the
changes?
I told him that you need 3 things to do
this:
- A complete copy of the website before and after
- The Unix ls -lt command
- The diff command
Start by looking at the complete copy
of the new website. Start in the topmost
directory (folder) of the website and do
this command:
ls -lt
This will give you a list of both files
and directories sorted in timestamp
order. Directories recently modified
need to be investigated further. Files
recently modified need to be noted.
In any case, both files and directories
of recent vintage will rise to the top
of the ls -lt listing.
Using this list, you can easily find things
that have been modified after the web developer
(who made changes) took over.
If a file, make a note. If a directory, look
further.
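Here's a minimal sketch of that walk. The scratch directory, the
file names, and the backdated timestamp are all made up for the
example, and it assumes mktemp is available on your system:

```shell
# Hypothetical layout: a scratch copy of a small site.
site=$(mktemp -d)
mkdir "$site/images"
echo "about" > "$site/about.html"
echo "contact" > "$site/contactus.html"

# Backdate everything except contactus.html (to 2020-01-01 00:00)
# so that it stands out as the recent change.
touch -t 202001010000 "$site/about.html" "$site/images"

# Newest entries are listed first. A file near the top gets noted;
# a directory near the top gets a closer look.
ls -lt "$site"

# Descend into a directory and repeat the same listing there.
ls -lt "$site/images"
```

In this sketch contactus.html is listed first because it is the
only entry that was not backdated.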
Keep looking into directories that have been
modified since the new web developer started
working on the site. Once you've found all
the files that have been modified after a certain
date, you are done with ls -lt.
This will take less time than it might seem as
web developers typically only modify a few files
on each occasion that they work on a site. For
example, if the web developer only worked on the
Contact Us page, this may be the only file
that was modified. This being the case, you will
find the file relatively quickly.
Next, use the diff command to figure out
what changed on the Contact Us page.
Here's how you might use the diff command
hypothetically:
diff ../old/contactus.html ../new/contactus.html >temp
I've fictionalized the directories where the
old and new Contact Us pages would be
found. Undoubtedly, you will have to do a bit
more typing than I did in my hypothetical example
above to get a diff on the two files.
Notice that I've placed the difference between
the two files in a file called temp. This
is a temporary file that has all the changes.
If the changes are not too extensive, the file
called temp will be quite short. It could
be something as simple as a new phone number or
a new business address.
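To give a feel for what ends up in temp, here's a small sketch
with two made-up Contact Us pages that differ only in the phone
number. The directory names, file contents, and phone numbers are
invented for the example:

```shell
# Hypothetical old and new copies of the page in a scratch directory.
work=$(mktemp -d)
mkdir "$work/old" "$work/new"
printf 'Contact Us\nPhone: 555-0100\n' > "$work/old/contactus.html"
printf 'Contact Us\nPhone: 555-0199\n' > "$work/new/contactus.html"

# diff exits with status 1 when the files differ, so "|| true"
# keeps a strict shell (set -e) from stopping here.
diff "$work/old/contactus.html" "$work/new/contactus.html" > "$work/temp" || true

# Show the recorded differences.
cat "$work/temp"
```

The temp file here is four lines long. The first line, 2c2, says
that line 2 changed. The line starting with < comes from the old
file and the line starting with > comes from the new file.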
A Contact Us page consists of contact
information so the changes to it would not
necessarily be anything more than a slight
update.
How long would it take me to find all of this
out? Discounting the time it takes me to obtain
two copies of the website, I'd say maybe
5 minutes.
Here are the steps I would take in that 5 minutes:
- Find the most recent timestamp in the old
copy of the website. In other words, do an
ls -lt on the old topmost directory. Be
sure to discount things like server logs and
other things that are automatically updated
- Use the timestamp discovered at the old site
to determine what is new at the new site
- Do the steps given above to discover what
files are newer than the timestamp discovered
on the old copy of the website
That, in a nutshell, would be how I would discover
work done recently by a web developer. Here
are some basic principles that are at work here:
- In life you generally need a reference point
if you are to get anywhere. In this case the
reference is the file last worked on in the old
copy of the website
- In life, it is helpful to know how far you've
come since you last referenced where you were. The
technique of using ls -lt to progressively
descend directories looking for recent file changes
to the new copy of the website does this. It tells
you how things have progressed since the last
checkpoint
- It helps to have a basis of comparison. The
diff command gives you a wonderful way to
compare two files looking for changes
Because of their primitive nature, I don't know
of anything that supersedes the old Unix command-line
commands. I've never ever discovered anything that
is quite like them in flexibility, scope, and power.
Of course, it takes a little bit of creativity to
combine and use these commands effectively. If there
is a downside, that would be it. You cannot be half
asleep and use Unix commands effectively. You have
to be a person who does not mind exercising a little
creativity. If you enjoy being creative, Unix command-line
commands may be for you.
One more thing: I've oversimplified things somewhat
to help you find files on a small website. On a
large website you may have problems with the
techniques outlined above.
A directory's timestamp changes when a file is added to it,
removed from it, or renamed, and that change does not travel
upward: the grandparent directory will not necessarily reflect
the same timestamp. A file edited in place may not change its
parent directory's timestamp at all. In other words, the topmost
directory of a website does not necessarily reflect the most
recently modified file on the website. That's one problem.
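You can see this for yourself with a small experiment. The
directory names and the backdated timestamp below are made up,
and the sketch assumes an ordinary Unix filesystem:

```shell
# Hypothetical site with a page two levels down.
site=$(mktemp -d)
mkdir -p "$site/products/widgets"
echo "v1" > "$site/products/widgets/index.html"

# Backdate the file and every directory above it to 2020-01-01 00:00.
touch -t 202001010000 \
    "$site/products/widgets/index.html" \
    "$site/products/widgets" \
    "$site/products" \
    "$site"

# Add a new file deep in the tree, as a web developer might.
echo "new" > "$site/products/widgets/specs.html"

# -d lists the directories themselves rather than their contents.
# Only widgets shows a current timestamp; products and the top
# directory still show the backdated one.
ls -ld "$site" "$site/products" "$site/products/widgets"
```

So an ls -lt at the top of this tree would give no hint that
anything changed two levels down.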
Another problem is if you have to go digging through many,
many files and directories. If this is the case, the
technique outlined above could prove difficult or impossible.
In the case of added complexity, you may wish to change
technique somewhat. The following web page describes how
to use the find command to find a file of recent
vintage:
Using the -newer option of the find command
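As a rough sketch of that approach, the example below uses a file
from the old copy as the reference point and asks find for
everything in the new copy that is newer. The directory layout,
file names, and backdated timestamps are invented for the example:

```shell
# Hypothetical old and new copies in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/old" "$work/new/pages"
echo "old contact page" > "$work/old/contactus.html"
echo "unchanged" > "$work/new/index.html"
echo "new contact page" > "$work/new/pages/contactus.html"

# Backdate the reference file and the unchanged page. The edited
# page keeps its current timestamp.
touch -t 202006010000 "$work/old/contactus.html"
touch -t 202001010000 "$work/new/index.html"

# List regular files anywhere under the new copy that were
# modified after the reference file.
find "$work/new" -type f -newer "$work/old/contactus.html"
```

Unlike ls -lt, find searches every subdirectory in one pass, so
the page under pages/ turns up without any manual descending.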
Even though the find command is more efficient
if you absolutely need to know the most recently modified
files on a website, the ls -lt command is still
useful. The ls -lt command is a much quicker and simpler
way to survey the general situation and get a general
take on how recently the website has been modified.
There's another principle at work here and that's scaling
your solutions to your problems. In life you generally
don't want to use a large-scale solution to solve a small
problem. You typically would not use a backhoe to dig
a hole for a fence post.
So, depending on the scale of what you are looking at,
either find -newer or ls -lt may be the
ideal tool to pick up and use in your situation.
Ed Abbott