Useless Use of Backticks -- An Example

This page is a companion to my Useless Use of Cat Award Page and probably not much fun unless you've read the Award page already.

This is just an annotated example shell script snippet, with some observations of two phenomena you see a lot in newbie scripts.

The Example

The following snippet is something I cooked up as an example of Useless Use of wc -l. Of course, it's overdone, and it contains several Useless Uses of Backticks, too. It's also a bit off the mark because it's not a real-world example.

Don't spend too much time studying this, in other words:

if [ `echo \`cat food | grep *.* | wc -l\` | grep -v 0 | wc -l` gt 0 ]; then ...
The point I originally wanted to make was that if you're using wc just to see if something produced any output, you're probably doing something wrong. Particularly if the "something" was a grep.

However misdirected this example might be, it might be worth considering what the hypothetical newbie author was trying to accomplish, and why it came out this way. (For the newbie authors out there, it also serves as a warning to not post "live" code examples of your pretzel logic, because too many syntax errors and clumsy constructions will divert attention away from your actual question, and in many situations produce replies at least as long as this web page, and probably at least as intimidating :-)

Backticks Inside Backticks

Most newbies can't figure out how to get backticks inside backticks, so this example is somewhat misleading in that respect. But let's take this example apart a little bit, or rather try to understand it with some more examples.

Let's say the program "searchpattern" produces a string which you later want to look for in some files with the aid of grep. So you say

	grep `searchpattern` file1.txt file2.txt
This means: take the output of searchpattern and use it as the first parameter to grep, which grep takes as the pattern to look for. (Oftentimes, you want to look for a literal string, not a regular expression, in which case you should probably use fgrep instead, or massage the output from searchpattern a little bit before handing it over to grep.)
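To see why the literal-versus-regex distinction matters, here is a minimal sketch. The searchpattern function and the sample file are made up for the illustration, and grep -F is used as the POSIX spelling of fgrep. The pattern is stored in a variable first, purely to keep the quoting simple.

```shell
# searchpattern is the hypothetical program from the text, stubbed here as
# a shell function whose output happens to contain a regex metacharacter.
searchpattern() { echo 'a.c'; }

# Made-up sample input: one literal match, and one line that matches only
# when the dot is read as "any character".
tmp=`mktemp`
printf 'a.c\nabc\n' >"$tmp"

pattern=`searchpattern`
regex_hits=`grep -c "$pattern" "$tmp"`        # dot matches any character
literal_hits=`grep -F -c "$pattern" "$tmp"`   # dot is just a dot

echo "regex: $regex_hits, literal: $literal_hits"
rm -f "$tmp"
```

With this input, the regex search should report two matching lines and the literal search only one.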

Next, you find that the output of grep needs to be assigned to a variable, so you need a second pair of backticks. The basic syntax for that is

	VAR=`grep ...`
but now you'd like the actual grep to use backticks, too. You need to escape the inner set of backticks:
	VAR=`grep \`searchpattern\` file1.txt file2.txt`
Incidentally, POSIX specifies a nicer alternate syntax for this which doesn't lead to heavy backslashitis when you want something like backticks inside backticks inside backticks inside backticks:
	VAR=$(grep $(searchpattern) file1.txt file2.txt)
This is not compatible with Bourne Classic(TM) but it sure is convenient for your private Bash or Ksh scripts.
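A quick way to convince yourself that the two spellings do the same thing is to run them side by side. This is only a sketch: searchpattern is stubbed as a shell function and a temporary file stands in for file1.txt, both invented for the purpose.

```shell
# searchpattern is the hypothetical program from the text, stubbed here as
# a shell function so the example is self-contained.
searchpattern() { echo 'kibble'; }

# Made-up sample file standing in for file1.txt.
tmp=$(mktemp)
printf 'kibble\nbiscuits\n' >"$tmp"

# Old style: the inner backticks must be escaped with backslashes.
OLD=`grep \`searchpattern\` "$tmp"`

# POSIX style: $(...) nests without any escaping.
NEW=$(grep $(searchpattern) "$tmp")

echo "old: $OLD"
echo "new: $NEW"
rm -f "$tmp"
```

Both variables should end up holding the single matching line.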

Regular Expression Errors

My example contains a subtle syntax error -- the pattern *.* is not a valid regular expression (and does in fact lead one to believe that (a) the script author has had previous exposure to MS-DOG scripting, which is probably a Bad Thing, just like they used to think experience with BASIC would make you a bad high-level language programmer; and (b) the author needs to get straight the difference between glob patterns and regular expressions once and for all).

To say "any string" in regex-ese, you say

	.*
which means any character (.) any number of times (*). But that includes zero times, so it's completely redundant to search for this; every conceivable input line will match this pattern.

Probably the author would be content to find any one character (followed by, and in theory preceded by, anything, but that's implicit in how grep works):

	.
(yes, that's a single dot) or perhaps any non-whitespace character:
	[^ 	]
(that's open square bracket, caret, space, tab, close square bracket, or in slightly higher-level terms, any one character other than space or tab, or newline, of course).
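The difference between these three patterns is easy to check on a few made-up lines. In this sketch, grep -c (which prints the number of matching lines) is used just to keep the output short, and the tab goes into a variable so it stays visible in the source.

```shell
# Made-up input: a word, an empty line, a line holding only a space and
# a tab, and another word.
sample() { printf 'food\n\n \t\nmore\n'; }

# A literal tab, kept in a variable so it is visible in the source.
tab=$(printf '\t')

any_string=$(sample | grep -c '.*')        # matches every line, even the empty one
any_char=$(sample | grep -c '.')           # skips only the empty line
non_blank=$(sample | grep -c "[^ $tab]")   # skips the whitespace-only line, too

echo "any string: $any_string, any char: $any_char, non-blank: $non_blank"
```

Of the four input lines, the first pattern should match all four, the second three, and the third two.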

There is a second problem with the second grep, which is however more of a thinko. The intent is probably to throw away matches where wc produced a line count of zero lines, but this expression will of course throw away anything with the character zero anywhere in it (such as if wc found ten lines, or twenty, or a hundred). Perhaps a better guess would be to use grep's -x option:

	$ wc -l </dev/null | grep -vx 0
	       0
Whoops, that didn't work out either. This is because the output from wc is padded with spaces (you could find this out by piping it to od or cat -A or viz or whatever your system has for looking at character codes). We have to ask grep to tolerate leading whitespace:
	$ wc -l </dev/null | grep -v '^[ 	]*0'
That regular expression we pass to grep means beginning of line, followed by a character class containing space and tab, any number of times, followed by a literal zero.
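Here is a small sketch of the thinko and the fix, run against simulated wc output. The counts function and its padding are invented for the example; real wc output differs between systems.

```shell
# A literal tab, kept in a variable so it is visible in the source.
tab=$(printf '\t')

# Simulated wc -l results (zero, ten and seven lines), padded with spaces
# the way some wc implementations print them.
counts() { printf '       0\n      10\n       7\n'; }

naive=$(counts | grep -v 0)                 # also throws away the 10
anchored=$(counts | grep -v "^[ $tab]*0")   # throws away only the real zero

echo "naive:"
echo "$naive"
echo "anchored:"
echo "$anchored"
```

The naive version should leave only the 7, while the anchored version keeps both the 10 and the 7.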

Syntax Errors Fixed

There is one more syntax error remaining before we can start dissecting the problems with this script. Namely, the gt argument to test needs a dash in front of it.

You may not be aware of this, but the open bracket often seen after the if keyword is actually the name of a program which is also known by the name test. This is the reason why naming your own programs test is a bad idea, by the way. The test program is usually implemented as a built-in in modern shells, but it doesn't have to be, and indeed in "classic" Unix, it was an external program.
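If you want to see this for yourself, the following sketch runs the same comparison under both names, and then once more with the dash missing. The exact status and error message for the broken variant depend on your shell, so the sketch only records that it is not zero.

```shell
# Both spellings run the same test; the closing bracket is simply a
# required last argument of the [ form.
test 3 -gt 0; status_test=$?
[ 3 -gt 0 ];  status_bracket=$?

# Without the dash, gt is not an operator, so the command fails with a
# complaint on standard error (discarded here).
status_broken=0
[ 3 gt 0 ] 2>/dev/null || status_broken=$?

echo "test: $status_test, bracket: $status_bracket, broken: $status_broken"
```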

Unfortunately, there are various incompatible versions of test out there, and not all of them even understand the -gt (numeric greater than) test. Putting that aside for the moment, we now have the following fixed shell code snippet:

	if [ `echo \`cat food | grep . ...
Hold it.

Since readers should by now be painfully aware of the Useless Use of Cat Award, we might as well fix that immediately:

if [ `echo \`grep . food | wc -l\` | grep -v '^[ 	]*0' | wc -l` -gt 0 ]; then ...
Now we're ready to start dissecting this script.

Dissecting the Example Script

Let's start by resolving the innermost set of backticks. These say, look for non-blank lines in the file food, and count the number of lines.

Like we already saw on the Award Page, this can be shortened, because grep already knows how to report the number of matching lines. This can be repeated with the outer backticks, too:

	if [ `echo \`grep -c . food\` | grep -cv ...
Hold it again. This is one of the really classical examples of completely redundant backticks. The command
	echo `backticks`
produces exactly the same thing as just the program backticks, n'est-ce pas? (Strictly speaking, that is not true, because (a) the backticks will trim away all trailing newlines in the output, and (b) because the argument to echo is not quoted, any runs of whitespace -- including newlines -- will be replaced with single space characters in the output from echo, but none of that makes any significant difference here.)
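The two caveats in that parenthesis can be demonstrated with a stand-in for the program backticks. This is a sketch with made-up output; grep -c '^' is used here merely as a way to count lines.

```shell
# A stand-in for "the program backticks": two lines, one of them with a
# run of spaces, followed by an extra empty line.
backticks() { printf 'one   two\nthree\n\n'; }

direct_lines=$(backticks | grep -c '^')          # three lines, the last one empty
echoed_lines=$(echo `backticks` | grep -c '^')   # everything lands on one line
echoed_text=$(echo `backticks`)                  # runs of whitespace collapsed

echo "direct: $direct_lines lines, echoed: $echoed_lines line"
echo "echoed text: $echoed_text"
```

The echoed text should come out as "one two three" on a single line.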

Taking into account the fact that grep -c doesn't produce space-padded output (so we can simplify the regular expression for the second grep, which we adjusted above to cater for wc's space-padded output format), we now have

	if [ `grep -c . food | grep -cv '^0'` -gt 0 ]; then ...
Of course, the silly check against zero output lines is completely redundant, so we take it out:
	if [ `grep -c . food` -gt 0 ]; then ...
and this is already almost decent-looking.
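For the skeptical, a minimal sketch showing that grep -c gives the same count as piping grep to wc -l. The sample file is made up, with one empty line thrown in so the pattern has something to skip.

```shell
# Made-up sample file: two non-empty lines with an empty one in between.
tmp=$(mktemp)
printf 'kibble\n\nbiscuits\n' >"$tmp"

piped=$(grep . "$tmp" | wc -l)   # may come out space-padded
counted=$(grep -c . "$tmp")      # a plain number

# Leaving $piped unquoted lets the shell strip any padding from wc.
echo "piped:" $piped "counted: $counted"
rm -f "$tmp"
```

Both counts should be 2 here.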

However, we can simplify this even more if we understand what it does, so let's dissect it a little further. Remember that the open bracket is actually the name of a program, test? What the if keyword does is run a program and look at its exit code. If the exit code is zero (the conventional exit code for success on Unix), it takes the then branch. Otherwise, if there is an else branch, it takes that instead.
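A tiny sketch makes the rule visible. The exits_with function is a made-up stand-in for any program; it does nothing but exit with the status you hand it.

```shell
# A hypothetical stand-in for any program: it exits with the status given
# as its first argument.
exits_with() { return "$1"; }

if exits_with 0;  then zero=then;  else zero=else;  fi
if exits_with 1;  then one=then;   else one=else;   fi
if exits_with 42; then other=then; else other=else; fi

echo "status 0: $zero branch, status 1: $one branch, status 42: $other branch"
```

Only the zero status takes the then branch; any other status ends up in the else branch.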

As it quite conveniently happens, grep and all other well-behaving Unix programs return an exit code which is useful precisely for this -- if grep returns zero, it means there was a match, if it returns one, it means there wasn't (and if it's something else, it means there was some sort of error -- see the manual page for details). So in fact we can say

	if grep . food >/dev/null; then ...
(The redirection to /dev/null is necessary because we run grep simply for its side effect of setting the exit code. This would "work" without the redirection, but you'd end up having all matching lines in the file food copied to standard output. Not nice ...)

Incidentally, many newish implementations of grep have a -q option which means to not print anything, just set the exit code.
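The three kinds of exit code can be collected in a short sketch, assuming your grep has the -q option. The file contents and the words searched for are made up, and the exact value of the error status may vary, so only "greater than one" is relied upon.

```shell
# Made-up sample file with a single line.
tmp=$(mktemp)
printf 'kibble\n' >"$tmp"

match=0;   grep -q kibble "$tmp" || match=$?      # there is a match
nomatch=0; grep -q tuna "$tmp"   || nomatch=$?    # there is no match
error=0;   grep -q kibble "$tmp.missing" 2>/dev/null || error=$?   # no such file

echo "match: $match, no match: $nomatch, error: $error"
rm -f "$tmp"
```

If your grep lacks -q, replacing it with a redirection of standard output to /dev/null should give the same statuses.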

Conclusions?

Don't do that :-)

Seriously, the most important insight you should have from this is that exit codes from programs are useful and something you need to think about.

Generally, running programs for side effects they have is a tricky thing to do, and some schools of thought argue that you should not play too much with side effects. In the present case, the "side effect" of finding out whether there was a match in food is precisely why we're running grep, but if you're writing code that newbies need to understand, you should usually add a comment when you play around with side effects.

(Philosophical remark: Whether finding a match is a "side effect" of grep or not depends on how you define its primary purpose. The name comes from the ed command g/re/p, that is, globally search for a regular expression and print the matching lines, and so any use where you don't specifically use grep to print out the matches is a use of side effects. On the other hand, as the presence of the -q option sort of proves, you could argue that finding matches is the primary purpose of grep, and printing them is a side effect. Sort of ...)



$Id: award-example-backticks.prep,v 1.8 1999/08/26 08:32:35 era Exp $