Displaying more with less
=========================

by Wolfgang Friebel (Wolfgang.Friebel@desy.de)

Who does not know the situation: You have just downloaded a file from the
Internet that is supposed to contain a promising program. But before extracting
and installing it you just want to look into the README or consult the
accompanying man page. Yes, you know how to use the tar and zip commands, but
how the heck do you extract a single file from an RPM archive? Yes, there are
tools like Midnight Commander that do the job quite well, at least
when running Linux. Or do you always remember the options for the man command
when you want to display a man page not listed in the MANPATH?
For all these problems there are solutions of course, but in some cases they
ask too much of casual UNIX users.
To browse files under UNIX you can use the excellent viewer less [1], the better
alternative to "more". By making use of the environment variable LESSOPEN, less
can be enhanced by external filters to become even more powerful.
Most Linux distributions already come preconfigured with a filter "lesspipe.sh"
that covers the most common situations.
I would like to present here an input filter for less that understands a
lot of the more common file formats and is easy to extend to new formats.
The input filter, which is also called "lesspipe.sh", is written in a ksh
compatible language (ksh, bash, zsh), as one of these shells is nearly always
installed on UNIX systems and uses comparatively few resources. Otherwise an
implementation in perl, for example, would have been somewhat simpler to code.
The input filter lesspipe.sh is based on two main ideas. The recognition of the
file format is not based on the file suffix. This method from the DOS world is
error-prone, and keeping the suffix list up to date is a tedious job.
UNIX comes with the "file" command [2] that recognizes lots of formats.
Up-to-date file descriptions are included in its tarball, so maintaining a list
of file formats is only a matter of obtaining a current version of the
"file" package.
The second idea is to be able to call lesspipe.sh with a hierarchy of file
names and finally to pull out the file at the bottom of the hierarchy. This
makes it possible to look at individual files contained in an archive which
itself could be part of a still bigger archive.
As lesspipe.sh accepts only a single argument, the names in such a hierarchical
list have to be separated by a nonblank character. As the colon is rarely found
in file names, it has been chosen as the separator character. At each stage in
extracting files from such a hierarchy the file type is determined. This
guarantees correct processing and display at each stage of the filtering.
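
Determining the file type from content rather than from the suffix could be
sketched as follows. This is only an illustration and not the code used in
lesspipe.sh: the function name classify, the handler names it prints and the
description substrings it matches are assumptions; real output of the "file"
command varies between versions.

```shell
#!/bin/sh
# Hypothetical sketch: map a description printed by the "file" command to
# the name of a handler. The matched substrings are assumptions about
# typical "file" output, not a list taken from lesspipe.sh.
classify() {
  case $1 in
    *"gzip compressed"*)  echo gzip ;;
    *"bzip2 compressed"*) echo bzip2 ;;
    *"tar archive"*)      echo tar ;;
    *RPM*)                echo rpm ;;
    *)                    echo plain ;;
  esac
}

# With a real file one would pass the brief description, for example:
#   classify "$(file -b -- "$path")"
classify "POSIX tar archive (GNU)"    # prints: tar
```

Because the decision depends only on what "file" reports, the same check can
be repeated on every intermediate result of a hierarchy.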
To give an example, I show how one could display the man page "file.man"
found in the RPM source archive file-xxx.spm.
The less command enhanced with the lesspipe.sh filter

less file-3.27-43.i386.spm

yields the following output

...
SuSE series: a
-rw-r--r--   1 root     root        12953 Feb  3 11:45 file-3.27.dif
-rw-r--r--   1 root     root       123541 Jul  6  1999 file-3.27.tar.gz
-rw-r--r--   1 root     root         3398 Mar 25 07:31 file.spec

then the command

less file-3.27-43.i386.spm:file-3.27.tar.gz

produces the output

...
-rw-rw-r-- christos/christos  8740 1999-02-14 18:16 file-3.27/file.c
-rw-rw-r-- christos/christos  4886 1999-02-14 18:16 file-3.27/file.h
-rw-rw-r-- christos/christos 13428 1999-02-14 18:16 file-3.27/file.man
...

The desired man page can finally be viewed with

less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man

The subcomponents of the argument to less were easily obtained by cut and paste
using information contained in the previous lines of output.
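
How a single colon separated argument breaks down into its components can be
sketched as below. The helper name split_components is hypothetical and the
code is not taken from lesspipe.sh; it only illustrates the splitting, top of
the hierarchy first.

```shell
#!/bin/sh
# Hypothetical sketch: print each component of a colon separated argument
# on its own line. A trailing colon yields an empty last component.
split_components() {
  rest=$1
  while :; do
    case $rest in
      *:*) printf '%s\n' "${rest%%:*}"   # text before the first colon
           rest=${rest#*:} ;;            # continue with the remainder
      *)   printf '%s\n' "$rest"
           break ;;
    esac
  done
}

split_components "file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man"
```

For the argument above this prints the RPM archive, the tar file inside it
and the man page inside the tar file, one per line.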
If you wanted to display the nroff source instead, appending
another colon at the end of the argument would do the job:

less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man:

If the man page were additionally compressed (e.g. as file.man.gz), it would
still be uncompressed. To suppress the uncompressing of file.man.gz as well,
a second colon has to be appended to the argument.
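
The effect of trailing colons depends only on how many of them there are.
A minimal sketch for counting them, assuming a hypothetical helper name
count_trailing_colons that does not appear in lesspipe.sh:

```shell
#!/bin/sh
# Hypothetical sketch: count the colons at the very end of an argument.
# Following the text above, for a compressed man page 0 means full
# filtering, 1 means show the nroff source, 2 means do not uncompress.
count_trailing_colons() {
  arg=$1
  n=0
  # ${arg%:} removes one trailing colon, if there is one
  while [ "${arg%:}" != "$arg" ]; do
    arg=${arg%:}
    n=$((n + 1))
  done
  printf '%s\n' "$n"
}

count_trailing_colons "file.man.gz::"    # prints: 2
```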

Even extracting single files from an archive is possible, like with

less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.c > file.c

As less does not pass all bytes to STDOUT (e.g. it suppresses binary 0),
it is recommended to invoke lesspipe.sh directly:

lesspipe.sh file-3.27-43.i386.spm:file-3.27.tar.gz:: > file-3.27.tar.gz

Here the two colons after file-3.27.tar.gz are required to suppress the
unzipping of the resulting file and to extract the tar file instead of
interpreting it.

The script is able to extract files up to a depth of 6, where applying a
decompression algorithm counts as a separate level. In a few rare cases the
file command does not recognize the correct format (especially with nroff).
In such cases the filtering can be suppressed by a trailing colon on the file
name.
The most recent additions to lesspipe.sh allow you to browse M$ Word files
(using the very fast antiword command) and to look at the contents of DOS
formatted disks by accessing the proper device file.

To activate lesspipe.sh you have to define the environment variable LESSOPEN
in the following way:

LESSOPEN="|lesspipe.sh %s"; export LESSOPEN  (sh like shells)
setenv LESSOPEN "|lesspipe.sh %s"            (csh, tcsh)

If the wrong lesspipe.sh is in the UNIX search path or if lesspipe.sh is
not in your search path, then the full path to lesspipe.sh should be given
in the above commands.
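
For sh like shells, setting LESSOPEN with a full path could be wrapped in a
small function for a startup file, as sketched below. The function name
set_lessopen and the path /usr/local/bin/lesspipe.sh are assumptions; use
the location where lesspipe.sh is actually installed.

```shell
#!/bin/sh
# Hypothetical sketch: set LESSOPEN only if the given lesspipe.sh exists
# and is executable, so that less is not left with a broken filter.
set_lessopen() {
  if [ -x "$1" ]; then
    LESSOPEN="|$1 %s"
    export LESSOPEN
  else
    echo "lesspipe.sh not found or not executable: $1" >&2
    return 1
  fi
}

# Example call with an assumed install location:
#   set_lessopen /usr/local/bin/lesspipe.sh
```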

Recently, syntax highlighting was added through a perl script 'syntax' which
is derived from code2html [5]. That script comes with colorizing support for
the languages ada, asm, awk, c, c++, groff, html, xml, java, javascript, lisp,
m4, make, pascal, patch, perl, povray, python, ruby, shellscript and sql.
The choice of colors is just a first guess (proof of concept) and still needs
refinement.

ATTENTION: Syntax highlighting is only activated if the environment variable
LESS exists and contains the option -R or -r, or if less is called with one
of these options. This guarantees that colors are displayed instead of literal
escape sequences.
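
Whether LESS already contains one of these options can be checked roughly as
follows. This is a simplified sketch, not the test used by lesspipe.sh: the
function name less_shows_colors is hypothetical, only short options in the
LESS variable are inspected, and options given on the less command line are
not visible to it.

```shell
#!/bin/sh
# Hypothetical sketch: succeed if a short option word in $LESS contains
# r or R (e.g. "-R" or "-iMR"). Long options starting with "--" are skipped.
# The body runs in a subshell so that "set -f" does not leak to the caller.
less_shows_colors() (
  set -f                      # no filename expansion while splitting $LESS
  for opt in ${LESS-}; do
    case $opt in
      --*)     ;;             # ignore long options
      -*[rR]*) exit 0 ;;      # found -r or -R, possibly bundled
    esac
  done
  exit 1
)

if less_shows_colors; then
  echo "LESS enables raw control characters"
else
  echo "add -R to LESS to see colors"
fi
```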

As syntax highlighting is rather resource intensive, it can be switched off by
appending two colons after the file name. Conversely, to force syntax
highlighting, a colon followed by a suffix has to be appended to the file name
as follows (assuming this is a file with perl syntax):

less config_file:.pl

The following suffixes are recognized:
.ada .asm .inc .awk .c .h .cpp .cxx .groff .html .php .xml .java .js .lsp .m4
Makefile .pas .patch .diff .pm .pl .pod .pov .py .rb .sh .sql
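
How a forced suffix might be picked out of the argument and mapped to a
language is sketched below. The function name lang_for_arg is hypothetical,
only some of the suffixes are shown, and the pairing of suffixes with language
names is an assumption based on the two lists above, not taken from the
script.

```shell
#!/bin/sh
# Hypothetical sketch: take the text after the last colon as the forced
# suffix and print an assumed language name for it. Fails (status 1) if
# there is no colon or the suffix is not in this partial table.
lang_for_arg() {
  case $1 in
    *:*) suffix=${1##*:} ;;
    *)   return 1 ;;
  esac
  case $suffix in
    .c|.h)         echo c ;;
    .cpp|.cxx)     echo c++ ;;
    .pm|.pl|.pod)  echo perl ;;
    .py)           echo python ;;
    .rb)           echo ruby ;;
    .sh)           echo shellscript ;;
    Makefile)      echo make ;;
    *)             return 1 ;;
  esac
}

lang_for_arg "config_file:.pl"    # prints: perl
```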

Currently lesspipe.sh [3],[4] supports the following formats:

Compressed files
================
gzip, pack and compress	uncompressed with gzip -c -d
bzip2			uncompressed with bzip2 -c -d
zip			listed with unzip -lv (extracting with unzip -avp)

Other file formats
==================
tar			using GNU tar tvf (extracting files with tar Oxf)
nroff			using groff -s -p -t -e -Tascii -mandoc
ar library		using ar vt (extracting with ar p)
nm shared lib		using nm
executable		using strings
directory		using ls -lAL
rpm			using rpm -qiv -p and rpm2cpio | cpio -i -tv
			(extracting with rpm2cpio and GNU cpio)
Debian			using dpkg -c (extracting with dpkg --fsys-tarfile)
html			using lynx -dump or html2text -style pretty
Word			using antiword
pdf			using pdftotext
unmounted media		using tar or mdir (extracting with mtype)
rtf			using unrtf
dvi			using dvi2tty
ps			using pstotext or ps2ascii and gs
mp3			using mp3info
iso images		using isoinfo

-------------
[1] http://www.greenwoodsoftware.com/less/
[2] ftp://ftp.astron.com/pub/file
[3] http://www.ifh.de/~friebel/unix/lesspipe.html
[4] ftp://ftp.ifh.de/pub/unix/utility/lesspipe.tar.gz
[5] http://www.palfrader.org/code2html/
