This page is a companion to my Useless Use of Cat Award Page and probably not much fun unless you've read the Award page already.
This is just an annotated example shell script snippet, with some observations of two phenomena you see a lot in newbie scripts.
The example below is a Useless Use of wc -l. Of course, it's overdone, and it also contains several Useless Uses of Backticks. It's also a bit off the mark because it's not a real-world example.
Don't spend too much time studying this, in other words:

    if [ `echo \`cat food | grep *.* | wc -l\` | grep -v 0 | wc -l` gt 0 ]; then ...

The point I originally wanted to make was that if you're using wc just to see if something produced any output, you're probably doing something wrong. Particularly if the "something" was a grep.
However misdirected this example might be, it might be worth considering what the hypothetical newbie author was trying to accomplish, and why it came out this way. (For the newbie authors out there, it also serves as a warning to not post "live" code examples of your pretzel logic, because too many syntax errors and clumsy constructions will divert attention away from your actual question, and in many situations produce replies at least as long as this web page, and probably at least as intimidating :-)
Let's say the program "searchpattern" produces a string which you later want to look for in some files with the aid of grep. So you say

    grep `searchpattern` file1.txt file2.txt

This means, take the output of searchpattern and use that as the first parameter for grep (to which the first parameter means the pattern to look for). Oftentimes, you want to look for a literal string, not a regular expression, in which case you should probably use fgrep (also spelled grep -F) instead, or massage the output from searchpattern a little bit before handing it over to grep.
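Here is a small sketch of the difference between handing the pattern to grep as a regular expression and as a literal string. The searchpattern function and the sample file are made up for the illustration; they merely stand in for whatever program and files you really have.

```shell
# Hypothetical stand-in for the "searchpattern" program: it prints
# a string that happens to contain a regex metacharacter (the dot).
searchpattern() { printf '%s\n' 'a.c'; }

# Made-up sample file with three lines.
tmp=`mktemp`
printf '%s\n' 'abc' 'a.c' 'xyz' >"$tmp"

# Capture the pattern once, using backticks as in the text.
pattern=`searchpattern`

# As a regular expression, the dot matches any character,
# so both "abc" and "a.c" are counted.
regex_hits=`grep -c "$pattern" "$tmp"`

# As a literal string (grep -F, a.k.a. fgrep), only "a.c" is counted.
literal_hits=`grep -cF "$pattern" "$tmp"`

echo "regex: $regex_hits, literal: $literal_hits"
rm -f "$tmp"
```

The pattern is quoted here so that the shell passes it to grep as a single argument, whatever it contains.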
Next, you find that the output of grep needs to be assigned to a variable, so you need a second pair of backticks. The basic syntax for that is

    VAR=`grep ...`

but now you'd like the actual grep to use backticks, too. You need to escape the inner set of backticks:

    VAR=`grep \`searchpattern\` file1.txt file2.txt`

Incidentally, POSIX specifies a nicer alternate syntax for this which doesn't lead to heavy backslashitis when you want something like backticks inside backticks inside backticks inside backticks:

    VAR=$(grep $(searchpattern) file1.txt file2.txt)

This is not compatible with Bourne Classic(TM) but it sure is convenient for your private Bash or Ksh scripts.
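To convince yourself that the two spellings really do the same thing, something like the following sketch should do. The searchpattern function and the two sample files are invented for the purpose, not part of any real setup.

```shell
# Hypothetical stand-in for the "searchpattern" program.
searchpattern() { echo 'needle'; }

# Made-up sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'hay' 'needle one' >"$dir/file1.txt"
printf '%s\n' 'needle two' 'straw' >"$dir/file2.txt"

# Classic syntax: the inner backticks must be backslash-escaped.
OLD=`grep \`searchpattern\` "$dir/file1.txt" "$dir/file2.txt"`

# POSIX $(...) syntax: nests without any escaping.
NEW=$(grep $(searchpattern) "$dir/file1.txt" "$dir/file2.txt")

[ "$OLD" = "$NEW" ] && echo "both forms captured the same output"
rm -rf "$dir"
```

Each extra level of nesting with backticks needs another layer of backslashes, while the $(...) form stays as it is.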
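*.* is not a meaningful regular expression here. (Strictly speaking, a leading asterisk in a basic regular expression is taken literally, so it would look for a literal asterisk followed by anything; and because it is not quoted, the shell may expand it as a glob before grep ever sees it.) It does in fact lead one to believe that (a) the script author has had previous exposure to MS-DOG scripting, which is probably a Bad Thing, just like they used to think experience with BASIC would make you a bad high-level language programmer; and (b) the author needs to get straight the difference between glob patterns and regular expressions once and for all.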
To say "any string" in regex-ese, you say

    .*

which means any character (.) any number of times (*). But that includes zero times, so it's completely redundant to search for this; every conceivable input line will match this pattern.

Probably the author would be content to find any one character (followed by, and in theory preceded by, anything, but that's implicit in how grep works):

    .

(yes, that's a single dot) or perhaps any non-whitespace character:

    [^ 	]

(that's open square bracket, caret, space, tab, close square bracket, or in slightly higher-level terms, any one character other than space or tab, or newline, of course).
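The three patterns are easy to compare on a tiny made-up input. This is only a sketch; the sample function and the variable names are invented for the illustration.

```shell
# Made-up input: a normal line, an empty line, and a line
# containing nothing but a space, a tab and another space.
sample() { printf 'food\n\n \t \n'; }

# Put a real tab in a variable so the character class is unambiguous.
tab=$(printf '\t')

# ".*" matches every line, even the empty one.
any_string=$(sample | grep -c '.*')

# "." needs at least one character, so only the empty line is skipped.
any_char=$(sample | grep -c '.')

# "[^ <tab>]" needs one character that is neither space nor tab,
# so the whitespace-only line is skipped as well.
non_blank=$(sample | grep -c "[^ $tab]")

echo "$any_string $any_char $non_blank"
```

With three input lines, the counts come out as 3, 2 and 1 respectively.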
There is a second problem with the second grep, which is however more of a thinko. The intent is probably to throw away matches where wc produced a line count of zero lines, but this expression will of course throw away anything with the character zero anywhere in it (such as if wc found ten lines, or twenty, or a hundred). Perhaps a better guess would be to use grep's -x option:

    $ wc -l </dev/null | grep -vx 0
           0

Whoops, that didn't work out either. This is because the output from wc is, on many systems, padded with spaces (you could find this out by piping it to od or cat -A or viz or whatever your system has for looking at character codes). We have to ask grep to tolerate leading whitespace:

    $ wc -l </dev/null | grep -v '^[ 	]*0'

That regular expression we pass to grep means beginning of line, followed by a character class containing space and tab, any number of times, followed by a literal zero.
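A quick way to see the thinko in action is to feed both filters a handful of fake counts. The counts function below is made up; it merely imitates the space-padded output some wc implementations produce.

```shell
# Made-up line counts, padded with spaces the way some wc versions do.
counts() { printf '%s\n' '       0' '      10' '       7' '     100'; }

# A real tab for the character class.
tab=$(printf '\t')

# Naive filter: drops every line with a zero digit anywhere in it,
# so 10 and 100 are lost along with the real zero.
naive_survivors=$(counts | grep -vc 0)

# Anchored filter: drops only a line whose number starts with zero
# after optional leading whitespace, which for a count means zero itself.
anchored_survivors=$(counts | grep -vc "^[ $tab]*0")

echo "naive keeps $naive_survivors, anchored keeps $anchored_survivors"
```

Out of the four fake counts, the naive filter keeps only the 7, while the anchored one keeps 10, 7 and 100.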
The gt argument to test needs a dash in front of it.
You may not be aware of this, but the open bracket often seen after the if keyword is actually the name of a program which is also known by the name test. This is the reason why naming your own programs test is a bad idea, by the way. The test program is usually implemented as a built-in in modern shells, but it doesn't have to be, and indeed in "classic" Unix, it was an external program.
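If you want to see this for yourself, a sketch along these lines should work in any reasonably POSIX-ish shell. The variable names are arbitrary, and the exact wording of the error message and of the type output varies from shell to shell.

```shell
# "[" and "test" are two names for the same command; "]" is just
# the closing argument that "[" insists on.
if [ 3 -gt 0 ]; then bracket=yes; else bracket=no; fi
if test 3 -gt 0; then plain=yes; else plain=no; fi

# Without the dash, "gt" is not an operator: test complains on
# standard error and returns a non-zero exit code.
if [ 3 gt 0 ] 2>/dev/null; then nodash=yes; else nodash=no; fi

echo "bracket=$bracket plain=$plain nodash=$nodash"

# Ask the shell what "[" is: typically a builtin, sometimes a file on disk.
type [
```

Note that the version without the dash quietly takes the else branch, which is a good reason not to throw away error messages in real scripts.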
Unfortunately, there are various incompatible versions of test out there, and not all of them even understand the -gt (numeric greater than) test. Putting that aside for the moment, we now have the following fixed shell code snippet:

    if [ `echo \`cat food | grep . ...

Hold it. Since readers should by now be painfully aware of the Useless Use of Cat Award, we might as well fix that immediately:

    if [ `echo \`grep . food | wc -l\` | grep -v '^[ 	]*0' | wc -l` -gt 0 ]; then ...

Now we're ready to start dissecting this script.
First, we grep for any character in the file food, and count the number of lines. Like we already saw on the Award Page, this can be shortened, because grep already knows how to report the number of matching lines. This can be repeated with the outer backticks, too:

    if [ `echo \`grep -c . food\` | grep -cv ...

Hold it again. This is one of the really classical examples of completely redundant backticks. The command

    echo `backticks`

produces exactly the same thing as just the program

    backticks

n'est-ce pas?
(Strictly speaking, that is not true, because
(a)
the backticks will trim away all trailing newlines in the output, and
(b)
because the argument to echo is not quoted, any runs of whitespace
-- including newlines -- will be replaced with single space characters
in the output from echo,
but none of that makes any significant difference here.)
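The two caveats can be made visible with a command whose output has runs of spaces and trailing empty lines. The backticks function here is a made-up stand-in, as are the variable names.

```shell
# Hypothetical command: prints two lines with a run of spaces,
# followed by an extra empty line.
backticks() { printf 'one   two\nthree\n\n'; }

# Unquoted: trailing newlines are trimmed, then the shell splits the
# result into words and echo joins them with single spaces.
unquoted=$(echo `backticks`)

# Quoted: trailing newlines are still trimmed, but the inner spacing
# and the newline between the two lines survive.
quoted=$(echo "`backticks`")

echo "unquoted: [$unquoted]"
echo "quoted:   [$quoted]"
```

For a command that prints a single number, like grep -c, neither effect changes anything, which is why the extra echo is pure overhead.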
Taking into account the fact that grep -c doesn't produce space-padded output (so we can simplify the regular expression for the second grep, which we adjusted to cater for wc's space-padded output format above) we now have

    if [ `grep -c . food | grep -cv '^0'` -gt 0 ]; then ...

Of course, the silly check against zero output lines is completely redundant, so we take it out:

    if [ `grep -c . food` -gt 0 ]; then ...

and this is already almost decent-looking.
However, we can simplify this even more if we understand what this does. So let's dissect it a little bit more. Remember that the open bracket is actually the name of a program, test? What the if statement does is, run a program, and look at its exit code. If the exit code is zero (this is the conventional exit code for success on Unix), take the then branch. Otherwise, if there is an else branch, take that instead.
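A minimal sketch of this, using the standard true and false commands, which do nothing except exit with code zero and non-zero respectively. The variable names are arbitrary.

```shell
# "if" runs whatever command follows it and branches on the exit code.
if true; then a=then-branch; else a=else-branch; fi
if false; then b=then-branch; else b=else-branch; fi

# "[" is just one such command. Record its exit code in both cases:
# start from 0 and overwrite with $? only if the command fails.
gt_status=0; [ 1 -gt 0 ] || gt_status=$?
eq_status=0; [ 0 -gt 0 ] || eq_status=$?

echo "$a $b $gt_status $eq_status"
```

So the brackets are nothing special to if; any command which sets its exit code sensibly will do in their place.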
As it quite conveniently happens, grep and all other well-behaving Unix programs return an exit code which is useful precisely for this -- if grep returns zero, it means there was a match, if it returns one, it means there wasn't (and if it's something else, it means there was some sort of error -- see the manual page for details). So in fact we can say

    if grep . food >/dev/null; then ...

(The redirection to /dev/null is necessary because we run grep simply for its side effect of setting the exit code. This would "work" without the redirection, but you'd end up having all matching lines in the file food copied to standard output. Not nice ...)

Incidentally, many newish implementations of grep have a -q option which means to not print anything, just set the exit code.
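Put together, the whole exercise might look something like the sketch below. The helper functions and the scratch files are made up for the illustration, and the -q variant assumes your grep supports that option.

```shell
# Made-up sample files: one with content, one empty.
full=$(mktemp)
empty=$(mktemp)
echo 'kibble' >"$full"
: >"$empty"

# Hypothetical helper: does the file have any non-empty line?
# Decided by grep's exit code alone; the output is thrown away.
has_content() {
    if grep . "$1" >/dev/null; then echo yes; else echo no; fi
}

# Same test with -q, so no redirection is needed.
has_content_q() {
    if grep -q . "$1"; then echo yes; else echo no; fi
}

full_plain=$(has_content "$full")
full_q=$(has_content_q "$full")
empty_plain=$(has_content "$empty")
empty_q=$(has_content_q "$empty")

# The count-based test from earlier reaches the same verdict with more work.
if [ `grep -c . "$full"` -gt 0 ]; then full_count=yes; else full_count=no; fi

echo "full: $full_plain $full_q $full_count, empty: $empty_plain $empty_q"
rm -f "$full" "$empty"
```

All three ways of asking agree; the exit code version just gets there with one process and no arithmetic.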
:-)
Seriously, the most important insight you should have from this is that exit codes from programs are useful and something you need to think about.
Generally, running programs for side effects they have is a tricky
thing to do, and some schools of thought argue that you should not
play too much with side effects. In the present case, the "side effect"
of finding out whether there was a match in food
is
precisely why we're running grep, but if you're writing code that
newbies need to understand, you should usually add a comment when
you play around with side effects.
(Philosophical remark: Whether finding a match is a "side effect" of grep or not depends on how you define its primary purpose. The name historically comes from the ed command g/re/p (globally search for a regular expression and print the matching lines), and so any use where you don't specifically use grep to print out the matches is a use of side effects. On the other hand, as the presence of the -q option sort of proves, you could argue that finding matches is the primary purpose of grep, and printing them is a side effect. Sort of ...)