NETLIB Index Format Guidelines

Introduction:

Netlib should be not just a warehouse but a library, and for that it must have adequate search tools. Each chapter (library) of netlib comes with an "index" file in a specified format, to promote searchability. The netlib find command merely uses Unix grep to perform a keyword search of all the index files, but more sophisticated search engines can be constructed.
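As a small step beyond a line-oriented grep, a search can return the whole paragraph around a match, so that a hit in a "for" line arrives together with its "file" line. The following is only a sketch: the function name, the case-insensitive matching, and the sample entries are assumptions made for illustration, not a description of the actual find command.

```python
import re

def search_paragraphs(keyword, index_text):
    """Return the index paragraphs that mention keyword, ignoring case."""
    needle = keyword.lower()
    # Paragraphs are separated by empty (or whitespace-only) lines.
    paragraphs = [p.strip("\n") for p in re.split(r"\n\s*\n", index_text)]
    return [p for p in paragraphs if p.strip() and needle in p.lower()]

# Hypothetical index text, not taken from any real netlib chapter.
SAMPLE_INDEX = (
    "file\tdemo/eig.f\n"
    "for\teigenvalues of a small symmetric matrix\n"
    "\n"
    "file\tdemo/fit.f\n"
    "for\tleast squares fit of a straight line\n"
)

for paragraph in search_paragraphs("Eigen", SAMPLE_INDEX):
    print(paragraph)
    print()
```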

For general help about netlib, see our home page.

Index file format:

An index file contains one paragraph per regular file or directory, each paragraph ending with an empty line. Each line of the paragraph consists of an attribute, white space, and the corresponding value. If no value is known or applicable, that line is omitted. If all the files in a directory share a value, that line is moved up in the hierarchy to the directory entry in the parent index. Long values (e.g. past column 80) should be continued by starting the next line with a comma.

The entry for a regular file starts with the attribute "file" and a full path name relative to netlib. Directories use the attribute "lib". (There is exactly one file or library described per paragraph.)
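The paragraph rules above can be sketched as a small reader. This is a minimal illustration, not netlib's own tooling: the function name and the sample entries are hypothetical, and joining a continuation line to the previous value with a single space is an assumption, since the text does not say how the pieces are rejoined.

```python
def parse_index(text):
    """Parse index text into a list of entries, one dict per paragraph.

    Each dict maps an attribute to the list of values given for it.
    """
    entries = []
    current = {}      # attributes of the paragraph being read
    last_key = None   # attribute that a continuation line would extend
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            # An empty line ends the current paragraph.
            if current:
                entries.append(current)
            current, last_key = {}, None
            continue
        if line.startswith(","):
            # Continuation line: extend the most recent value.
            # Joining with one space is an assumption of this sketch.
            if last_key is not None:
                current[last_key][-1] += " " + line[1:].strip()
            continue
        # Attribute, white space, value.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        current.setdefault(key, []).append(value)
        last_key = key
    if current:
        entries.append(current)
    return entries

# Hypothetical entries, used only to exercise the reader.
SAMPLE = (
    "file\tdemo/solve.f\n"
    "for\tsolves a small dense linear system; a made-up entry used only\n"
    ",\tto show the continuation convention\n"
    "prec\tdouble\n"
    "\n"
    "lib\tdemo/extras\n"
    "for\thypothetical sub-library\n"
)

for entry in parse_index(SAMPLE):
    print(entry)
```

Values are kept as lists because nothing in the format prevents an attribute from appearing more than once in a paragraph; a stricter reader might reject that instead.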

The attributes and allowed values or {intended meaning} are:

key   value
~~~   ~~~~~
file	path/name   {use lower case;  many systems confuse upper/lower}
lib	path/name
for	{what problem does it solve?  This should be intelligible out of
	context, since a common use of the index is keyword search. }
alg	{algorithm: what methods does it use?}
by	{name, email address, ...}
contact	{email address of group that might answer polite questions}
ref	{terse citation}
gams	classification codes  {"send gams from bib"}
size	999kB
prec
	single		{half;  may be written "single real" in some contexts}
	double		{full}
	single/double	{contains both precisions with #ifdef or other switch}
	complex
	doublecomplex
see	{related files}
title	{if not same as file}
master	{site from which all others mirror this file or directory}
editor	{name, email address    appointed by editor-in-chief}
rel	excellent	{widely tested code; firm theory; tractable problem}
	good		{good reputation, but intrinsically hard problem}
	ok		{some counterexamples, but as good as most alternatives}
	weak		{fails without warning; better methods known}
age	stable		{untouched in years; author still supports it}
	old		{untouched in years; no one supports it}
	research	{believed stable, but still relatively new}
	experimental	{known to need polishing}
	superseded	{Obsolete; kept for archival purposes}
kind	function	{library routine  (This is the default.)}
	command		{standalone program}
	data		{input data, measurements, sample output, etc.}
	text		{documentation}
lang	{Omit if the filename suffix matches one of the implicit rules:}
	.ada	Ada
	.awk	awk
	.bas	Basic
	.C	C++
	.c	C
	.f	Fortran-77
	.html	HTML (hypertext)
	.pdf	Adobe Acrobat Portable Document Format
	.ps	PostScript
	.r	Ratfor
	.sed	UNIX stream editor script
	.tex	TeX (commonly LaTeX)
	.bib	BibTeX input
	.bbl	BibTeX output
	.dvi	TeX device independent typeset output
	.ltx	LaTeX wrapper file, e.g. for printing bibliography
	.sok	spelling exception dictionary for .bib file
	.sub	bibnet substitution file for citesub
	.twx	title-word cross-reference file
	{see also crc/net/compressed for more suffixes}
keywords	{terms as would be drawn from a subject thesaurus}
		{see toms/index for examples}
instance_of	{file that this is derived from by some automatic means
	such as compression.   The practice is discouraged, but may be
	necessary in isolated cases.}
encoding	{encoding of the files.  In "mime-type" format.  For example,
	application/x-tar.  They are listed left to right in the order that
	they were encoded, separated by white space.  This may be elided
	if obvious from the filename, as in foo.tar.gz. }
,	continuation
#	{miscellaneous comments;  usually bad, because automated tools
	won't know what to do with them.  Use the proper attribute from this
	list, and put general information in the "readme" file.}
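The implicit "lang" rules in the list above amount to a case-sensitive suffix lookup. A minimal sketch follows; the function names are hypothetical, only part of the suffix table is reproduced, and the file paths are made up.

```python
import os

# A subset of the implicit suffix rules listed above.
# The lookup is case-sensitive: ".C" is C++ while ".c" is C.
IMPLICIT_LANG = {
    ".ada": "Ada",
    ".awk": "awk",
    ".bas": "Basic",
    ".C": "C++",
    ".c": "C",
    ".f": "Fortran-77",
    ".ps": "PostScript",
    ".r": "Ratfor",
    ".tex": "TeX",
}

def implicit_lang(path):
    """Return the language implied by the suffix, or None if no rule applies."""
    suffix = os.path.splitext(path)[1]
    return IMPLICIT_LANG.get(suffix)

def needs_lang_line(path, lang):
    """True when an explicit "lang" line is needed for this file."""
    return implicit_lang(path) != lang

print(implicit_lang("demo/solve.f"))           # Fortran-77
print(needs_lang_line("demo/solve.f", "Fortran-77"))  # False
print(needs_lang_line("demo/notes", "awk"))    # True
```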

In general, we do NOT put information in the index that can be derived automatically. For example, we do not put the revision date, because that can be obtained from netlib's checksum file or by an ftp list command. One exception is size; for files larger than 200kB, a size line is justified to warn people who might inadvertently download something much larger than they expected.

For several of the attributes (such as gams, see, lang) the value may be a comma-separated list.
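A reader of the index might split such values as follows. This is a sketch only: the set of list-valued attributes contains just the three examples named above (the full set is not specified here), and the function name and sample values are hypothetical.

```python
# Only the attributes named as examples in the text; an assumption,
# since the complete set of list-valued attributes is not given.
LIST_ATTRIBUTES = {"gams", "see", "lang"}

def split_value(attribute, value):
    """Split a list-valued attribute on commas; leave other values whole."""
    if attribute in LIST_ATTRIBUTES:
        return [item.strip() for item in value.split(",") if item.strip()]
    return [value]

print(split_value("see", "demo/eig.f, demo/fit.f"))
print(split_value("for", "fits lines, planes, and curves"))
```

Free-text attributes such as "for" are left alone, since a comma there is ordinary punctuation.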

The index is as flexible as we could think to make it while retaining a simple model. A "flat" list is unfortunate, but it appears that the author of a (somewhat) factorizable naming scheme can write a script to generate the index file more easily and with more certainty than outsiders can parse a more sophisticated index.
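A generating script of the kind mentioned above might format each entry along these lines. It is a sketch under stated assumptions: the function name and sample entry are hypothetical, 1 kB is taken to be 1000 bytes, and values are wrapped at 64 characters to leave room for the attribute and a tab within 80 columns.

```python
import textwrap

SIZE_THRESHOLD_KB = 200  # size is listed only for files larger than 200kB
WRAP_WIDTH = 64          # assumption: leaves room for attribute + tab in 80 columns

def format_entry(attrs, size_bytes=None):
    """Format one index paragraph from (attribute, value) pairs.

    The first pair should be the "file" or "lib" line.
    """
    pairs = list(attrs)
    # Assumption: 1 kB = 1000 bytes.
    if size_bytes is not None and size_bytes > SIZE_THRESHOLD_KB * 1000:
        pairs.append(("size", f"{round(size_bytes / 1000)}kB"))
    lines = []
    for key, value in pairs:
        if not value.strip():
            continue  # unknown or inapplicable values are omitted
        chunks = textwrap.wrap(value, width=WRAP_WIDTH)
        lines.append(f"{key}\t{chunks[0]}")
        # Long values continue on lines that start with a comma.
        lines.extend(f",\t{chunk}" for chunk in chunks[1:])
    # Each paragraph ends with an empty line.
    return "\n".join(lines) + "\n\n"

# Hypothetical entry, used only to show the output shape.
entry = format_entry(
    [
        ("file", "demo/big.dat"),
        ("for", "hypothetical test matrices for sparse eigenvalue solvers, "
                "listed here only to show how a long value is continued"),
        ("ref", ""),
    ],
    size_bytes=350_000,
)
print(entry, end="")
```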

Besides the index file, each chapter also has (in principle) the files

readme		general, unstructured information about the chapter;
changes		who did what to which files when;
index.html	same as index, but reformatted for World Wide Web clients
.depend		which symbols are defined in which files;
links.html	pointers off to outside resources; some netlib cross-links

The latter two are generated automatically and are not listed in the index.

There are two kinds of "test" subdirectories:

  ex	example "drivers"
  chk	self-testing, for use during installation or to check compilers, etc.

Finally, here are some attributes that are not yet in use in netlib, but may be in the future. See the RIG Proposed Standard RPS-0002 (1994), A Uniform Data Model for Reuse Libraries (UDM), available from the Reuse Library Interoperability Group via AdaNET at 800-444-1458. Their concept of "Asset" corresponds very roughly to netlib's concept of "lib"; their "Element" is our "file".

URN		"Universal Resource Name"  (when appropriate mechanism is
		someday agreed upon by the Internet community)
Abstract	an extended version of "for" and maybe other material
AcceptanceDate	date first version entered the collection
ComplianceToStandards	e.g. does the source assume only ANSI C, or also
		POSIX.1, or TCP/IP, or Winsock
Encrypted	true/false. By some other means you ask how and with what key.
DerivedFrom	(This might allow some of the searches made possible by Science
		Citation Index in normal publication.)

Netlib Hierarchy:

The organization of netlib is based on a logical ordering of packages and software into various libraries. Generally, most new contributions can be placed into a pre-existing library (e.g., linalg, c++). Larger contributions, especially those with several separable modules, can be placed into a sub-library within a top-level library. If the new package is large enough (or doesn't logically fit into a top-level library) a new top-level library will be created.

An example:

Top-level libraries                  Files and sub-libraries

                     |--- index
aicm ----------------|
.                    |--- smmp     -- This is a file, listed in
.                                     the aicm index as:
.                                     file aicm/smmp
.
.
.                    |--- cbdsqr.f
.                    |      .
.                    |      .
.                    |      .
lapack --------------|--- index
.                    |      .
.                    |      .
.                    |--- lawns/   -- This is a sub-library,
.                    |      .         listed as: lib lapack/lawns
.                    |      .         This library will also have
.                    |      .         an index file in it.
.                    |--- zupmtr.f
.
.
other libraries

The depth of nesting of libraries is limited only by what makes sense.
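Because a value shared by every file in a directory is moved up to the directory's "lib" entry, a tool that wants the complete attributes of one file has to walk down the hierarchy. The sketch below illustrates this with hypothetical names and data. Letting a more specific entry override an inherited value is an assumption, as is the presence of an entry for the top-level library itself.

```python
def effective_attributes(path, entries):
    """Merge attributes inherited from enclosing libraries with the entry's own.

    entries maps a path (library or file) to a dict of its attributes.
    """
    parts = path.split("/")
    merged = {}
    # Walk from the top-level library down to the entry itself, so that
    # more specific entries override inherited values (an assumption).
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        merged.update(entries.get(prefix, {}))
    return merged

# Hypothetical hierarchy: library "demo", sub-library "demo/sub", one file.
ENTRIES = {
    "demo": {"by": "A. Author", "age": "stable"},
    "demo/sub": {"age": "research"},
    "demo/sub/x.f": {"for": "made-up example routine"},
}

print(effective_attributes("demo/sub/x.f", ENTRIES))
```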

Change log:

15 May 1992 Eric Grosse First draft.

6 Apr 1994 Eric Grosse Changed the infrequently used attribute "name" to "title", which still was descriptive enough for the existing uses, but now is suggestive for the growing number of PostScript files.

19 Apr 1994 Eric Grosse Dropped the unused contact, obj, and hdr attributes. Added "keywords" attribute at Sloane's suggestion.

21 Apr 1994 Stan Green sgreen@cs.utk.edu Added "instance_of", "encoding", and hierarchy description.

23 Apr 1994 Reed Wade Changed "tab" to "white space". [I strongly encourage writers to use tabs, but readers should be prepared to accept white space instead. Similarly, writers should not use more than 80 characters per line, but readers should accept at least 512. - ehg]

12 Jun 1994 Reed Wade, Eric Grosse, Tom Rowan Changed "leading white space" to "comma" for continuation convention. "readme" and "changes" explicitly in index files, for Mosaic users. Numerical clarifications.

11 Jul 1994 Shirley Browne Removed .shar and .tar from "lang". Standardize on "doublecomplex".

22 Jul 1994 Shirley Browne Changed "keyword" to "attribute" and better described "keywords".

2 Sep 1994 Shirley Browne "contact"

21 Feb 1995 Eric Grosse started converting to HTML

12 Feb 1996 Eric Grosse added links.html


Please send comments to Eric Grosse, ehg@research.bell-labs.com.