Displaying more with less
=========================

by Wolfgang Friebel (Wolfgang.Friebel@desy.de)

Who does not know the situation: you have just downloaded a file from the
Internet that is supposed to contain a promising program. But before extracting
and installing it, you just want to look into the README or consult the
accompanying man page. Yes, you know how to use the tar and zip commands, but
how the heck do you extract a single file from an RPM archive? Yes, there are
tools like Midnight Commander that do the job quite well, at least
when running Linux. Or do you always remember the options for the man command
when you want to display a man page that is not found via the MANPATH?
For all these problems there are solutions of course, but in some cases that is
asking too much of casual UNIX users.

To browse files under UNIX you can use the excellent viewer less [1], the better
alternative to "more". By making use of the environment variable LESSOPEN, less
can be enhanced with external filters to become even more powerful.
Most Linux distributions already come preconfigured with a filter "lesspipe.sh"
that covers the most common situations.
I would like to present here an input filter for less that understands a
lot of the more common file formats and that is easily extended to include
new formats.
The input filter, which is also called "lesspipe.sh", is written in a ksh
compatible language (ksh, bash, zsh), as one of these shells is nearly always
installed on UNIX systems and uses comparatively few resources. An
implementation in perl, for example, would otherwise have been somewhat simpler
to code.

The input filter lesspipe.sh is based on two main ideas. The first is that the
recognition of the file format is not based on the file suffix. That method from
the DOS world is error prone, and keeping the suffix list up to date is a
tedious job.
UNIX comes with the "file" command [2], which recognizes lots of formats. Up to
date file descriptions are included in the tarball, so maintaining a list of
file formats is only a matter of obtaining a current version of the
"file" package.
The second idea is to be able to call lesspipe.sh with a hierarchy of file
names and to pull out the file at the bottom of the hierarchy. This
makes it possible to look at individual files contained in an archive which
itself could be part of a still bigger archive.
As lesspipe.sh accepts only a single argument, the file names in such a
hierarchy have to be separated by a nonblank character. As the colon is rarely
found in file names, it has been chosen as the separator character. At each
stage in extracting files from such a hierarchy the file type is determined
anew, so that each stage of the filtering is processed and displayed correctly.
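
The splitting of such a colon-separated argument can be sketched in a few lines
of POSIX shell. This is only an illustration of the idea, not the code used in
lesspipe.sh; the function name split_hierarchy is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not taken from lesspipe.sh): print each component
# of a colon-separated argument on its own line, outermost file first.
split_hierarchy() {
    rest=$1
    while :; do
        case "$rest" in
            *:*)
                # Emit everything before the first colon, keep the remainder.
                printf '%s\n' "${rest%%:*}"
                rest=${rest#*:}
                ;;
            *)
                # No colon left: this is the innermost component.
                printf '%s\n' "$rest"
                break
                ;;
        esac
    done
}

split_hierarchy 'file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man'
```

The call at the end prints the three components one per line. A trailing colon
shows up as an empty last component, which a filter could use as a signal to
skip further processing.
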
To give an example, here is how one could display the man page "file.man"
found in the RPM source archive file-xxx.spm.
The less command enhanced with the lesspipe.sh filter

less file-3.27-43.i386.spm

yields the following output

...
SuSE series: a
-rw-r--r--   1 root     root        12953 Feb  3 11:45 file-3.27.dif
-rw-r--r--   1 root     root       123541 Jul  6  1999 file-3.27.tar.gz
-rw-r--r--   1 root     root         3398 Mar 25 07:31 file.spec

then the command

less file-3.27-43.i386.spm:file-3.27.tar.gz

produces the output

...
-rw-rw-r-- christos/christos  8740 1999-02-14 18:16 file-3.27/file.c
-rw-rw-r-- christos/christos  4886 1999-02-14 18:16 file-3.27/file.h
-rw-rw-r-- christos/christos 13428 1999-02-14 18:16 file-3.27/file.man
...

The desired man page can finally be viewed with

less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man

The subcomponents of the argument to less are easily obtained by cut and paste
from the previous lines of output.
To display the nroff source instead, append another colon to the end of the
argument:

less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man:

Had the man page been compressed (e.g. as file.man.gz), it would still have
been uncompressed before display. To suppress the decompression of file.man.gz
as well, a second colon would have to be appended to the argument.
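
How a filter might tell these cases apart can be sketched by counting the
colons at the end of the argument. The function name count_trailing_colons is
hypothetical, and the sketch makes no claim about how lesspipe.sh itself does
this.

```shell
#!/bin/sh
# Hypothetical helper (not taken from lesspipe.sh): count the colons
# at the end of the argument.
#   0 = filter normally
#   1 = suppress the last filtering step (e.g. show the nroff source)
#   2 = additionally leave a compressed file as it is
count_trailing_colons() {
    arg=$1
    n=0
    # Strip one trailing colon per iteration until none is left.
    while [ "${arg%:}" != "$arg" ]; do
        arg=${arg%:}
        n=$((n + 1))
    done
    echo "$n"
}

count_trailing_colons 'file-3.27.tar.gz:file-3.27/file.man:'
```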

Extracting a single file from an archive is also possible, for example with

less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.c > file.c

The script is able to extract files up to a depth of 6, where applying a
decompression algorithm counts as a separate level. In a few rare cases the
file command does not recognize the correct format (especially with nroff).
In such cases the filtering can be suppressed by a trailing colon on the file
name.

The most recent additions to lesspipe.sh allow you to browse MS Word files
(using the very fast antiword command) and to look at the contents of DOS
formatted disks by accessing the proper device file.

To activate lesspipe.sh you have to define the environment variable LESSOPEN
in the following way:

LESSOPEN="|lesspipe.sh %s"; export LESSOPEN  (sh like shells)
setenv LESSOPEN "|lesspipe.sh %s"            (csh, tcsh)

If lesspipe.sh is not in your search path, or if a different lesspipe.sh is
found there first, then the full path to lesspipe.sh should be given
in the above commands.
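
For sh like shells, a guarded variant with a full path might look as follows in
a startup file. The path /usr/local/bin/lesspipe.sh and the helper name
lessopen_value are assumptions made for this sketch; adjust the path to
wherever the script is actually installed.

```shell
#!/bin/sh
# Hypothetical helper: build the LESSOPEN value for a given script path.
lessopen_value() {
    printf '|%s %%s\n' "$1"
}

# Assumed install location, not prescribed by lesspipe.sh.
LESSPIPE=/usr/local/bin/lesspipe.sh

# Only set LESSOPEN if the script exists and is executable, so that
# less keeps working unfiltered on hosts where it is missing.
if [ -x "$LESSPIPE" ]; then
    LESSOPEN=$(lessopen_value "$LESSPIPE")
    export LESSOPEN
fi
```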

Currently lesspipe.sh [3],[4] supports the following formats:

Compressed files
================
gzip and compress	uncompressed with gzip -c -d
bzip2			uncompressed with bzip2 -c -d
zip			listed with unzip -lv (extracting with unzip -avp)

Other file formats
==================
tar			using GNU tar tvf (extracting files with tar Oxf)
nroff			using groff -s -p -t -e -Tascii -mandoc
ar library		using ar vt (extracting with ar p)
nm shared lib		using nm
executable		using strings
directory		using ls -lAL
rpm			using rpm -qiv -p and rpm2cpio | cpio -i -tv
			(extracting with rpm2cpio and GNU cpio)
Debian			using dpkg -c (extracting with dpkg --fsys-tarfile)
html			using lynx -dump
Word			using antiword
pdf			using pdftotext
unmounted media		using tar or mdir (extracting with mtype)
rtf			using unrtf
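
As a rough illustration of how format recognition with the "file" command can
drive such a list, the sketch below maps a description string, as printed by
"file -b", to a short format tag. The function names and the patterns are
assumptions made for this example; they cover only a few of the formats above
and are not taken from lesspipe.sh.

```shell
#!/bin/sh
# Hypothetical helper: map a description string, as printed by `file -b`,
# to a short format tag. The patterns are illustrative, not exhaustive,
# and the wording of `file` output can differ between versions.
classify() {
    case "$1" in
        *"gzip compressed"*|*"compress'd"*) echo gzip ;;
        *"bzip2 compressed"*)               echo bzip2 ;;
        *"Zip archive"*)                    echo zip ;;
        *"tar archive"*)                    echo tar ;;
        *RPM*)                              echo rpm ;;
        *)                                  echo unknown ;;
    esac
}

# Hypothetical wrapper: ask `file` about a path and classify its answer.
# -b omits the file name from the output, -L follows symbolic links.
filetype() {
    classify "$(file -b -L "$1")"
}
```

A filter could then select the listing or extraction command from the tables
above with a case statement on the returned tag.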

-------------
[1] http://home.flash.net/~marknu/less
[2] ftp://ftp.astron.com/pub/file
[3] http://www.ifh.de/~friebel/unix/lesspipe.html
[4] ftp://ftp.ifh.de/pub/unix/utility/lesspipe.tar.gz
