\author{Generated from the Hypertext}\title{CERN Server User Guide} \maketitle \cleardoublepage \pagenumbering{roman} \setcounter{page}{1} \tableofcontents \cleardoublepage \pagenumbering{arabic} \setcounter{page}{1}


\chapter{{}
CERN httpd 3.0
Guide for Prereleases}

CERN WWW Server
\lbrack {\tt httpd}, HyperText Transfer Protocol Daemon\rbrack  is a generic,
full featured server for serving files using the HTTP protocol.
This is a TCP/IP based protocol running by convention on port 80. \par 

Files can be real or synthesized, produced by scripts generating
virtual documents.  It handles clickable images, fill-out forms,
searches, etc. \par 

CERN {\tt httpd} can also be run as a proxy server to allow people behind firewalls
to use the Web as if the firewall were not present.  A powerful
feature is the caching performed by the
proxy, which makes {\tt cern\_httpd} attractive as a proxy even to
those not behind a firewall. \par 


\begin{itemize}
\item  This documentation is also available in PostScript.

\item  Documentation for older versions is still available: \lbrack 2.14 or older\rbrack  \lbrack 2.15\rbrack  \lbrack 2.16\rbrack  \lbrack 2.17 \& 2.18\rbrack .

\item  If you upgrade, see also the release notes for \lbrack 2.15\rbrack  \lbrack 2.16\rbrack  \lbrack 2.17\rbrack  \lbrack 2.18\rbrack  \lbrack 3.0pre1-3\rbrack .

\item  {\bf Current VMS Version is 2.16beta.  See 
	distribution.} See also Foteos Macrides' fixes.  \par 
\end{itemize}


\par 
\section{In This Guide...}

\begin{DL}{allow this much space}

\item[Installation
] The steps necessary to install CERN server.

\item[Administration
] How to set up document protection, index search, clickable
images, server-side scripts, ...

\end{DL}

\par 


\section{About documents generated from hypertext}Paper manuals generated from hypertext
are made for convenience, for example
for reading when one has no computer
to turn to.  We have tried to make
the hypertext into fairly conventional
paper documents, but they may seem
a little strange in some ways.\par 
All the links have been removed.
Therefore, it is worth looking at
the table of contents to see what
there is in the manual.  Something
which is not explained in place may
be explained in detail elsewhere.\par 
We have tried to keep related matter
together, but sometimes you will
have to check the table
of contents to find it.\par 
Please remember that these are for
the most part "living documents".
That is, they are constantly changing
to reflect current knowledge. If
you see a statement such as "Product
xxx does not support this feature",
remember that it was the case when
the document was generated, and may
not be the same now.   So if in doubt,
check the online version. Of course,
the living document may be out of
date too, in which case it is helpful
to mail its author.


\chapter{{}
Installing CERN Server}

{\bf VMS note:}
There are special instructions if you are
installing under VMS. \par 

\par 

\section{Getting the Program}

The CERN server distribution is available from the {\tt info.cern.ch}
anonymous ftp account.
Often you don't need to compile the server yourself, because precompiled
binaries are available for many Unix platforms.
If there is no precompiled version for your platform, or if it doesn't
work (e.g. the name resolution doesn't work), you should get the
source code and compile it yourself.

\begin{itemize}
\item  Precompiled versions can be found under directory
     
     {\tt ftp://info.cern.ch/pub/www/bin}
     (in the subdirectory corresponding to your machine architecture). \par 

\item  Source code 
     {\tt ftp://info.cern.ch/pub/www/src/cern\_httpd.tar.Z}. \par 
     Compilation:
	\begin{itemize}
	\item  Uncompress and untar the distribution tar file:
\begin{verbatim}
        uncompress cern_httpd.tar.Z
        tar xvf cern_httpd.tar
\end{verbatim}


	\item  Go to the newly created {\tt WWW} directory, and give
	     the command {\tt ./BUILD}:
\begin{verbatim}
        cd WWW
        ./BUILD
\end{verbatim}


	\item  Executable {\tt httpd} appears in directory
	     {\tt .../WWW/Daemon/sun4} (if you have a Sun4
	     machine), or in another subdirectory corresponding to
	     your machine architecture.  The utility programs go to
	     the same directory
	     ({\tt htadm},
	      {\tt htimage},
	      {\tt cgiparse} and
	      {\tt cgiutils}).
	\end{itemize}
\end{itemize}


\par 

\section{Configuration File}

\begin{itemize}
\item 	{\tt httpd} requires a configuration file; the
	default configuration file is {\tt /etc/httpd.conf}.
	If this doesn't suit you, you can specify another location
	using the {\tt -r } option:
\begin{verbatim}
        httpd -r /other/place/httpd.conf
\end{verbatim}


\item 	Sample configuration
	files are available from
	\begin{itemize}
	\item  directory {\tt cern\_httpd/config} inside the
	     binary distribution, or
	\item  under {\tt WWW/server\_root} inside the source code
	     distribution.
	\item  If these are missing, you can get them from
	     
	     {\tt ftp://info.cern.ch/pub/www/src/server\_root.tar.Z}
	\end{itemize}
\end{itemize}

If you have all your documents in a single directory tree, say
{\tt /Public/Web}, the easiest way to make them available to
the world is to specify the following rule in your configuration file:
\begin{verbatim}
        Pass	/*	/Public/Web/*
\end{verbatim}

This maps every request onto the corresponding file under the directory
{\tt /Public/Web} and accepts it. \par 

The default welcome document (what you get with URL of form
{\tt http://your.host/}) is now {\tt Welcome.html} in
the directory {\tt /Public/Web}. \par 
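
As a minimal sketch, the rule above can be combined with the
{\tt Port} directive to give a complete configuration file for a
test server.  The port number and the directory are placeholders
chosen for illustration, not recommended values:
\begin{verbatim}
        # Hypothetical minimal httpd.conf for a test server
        # (lines starting with a hash sign are ignored)
        Port    8080
        Pass    /*      /Public/Web/*
\end{verbatim}

With such a file, a request for
{\tt http://your.host:8080/Welcome.html} would be mapped onto
{\tt /Public/Web/Welcome.html}, assuming that file exists and is
readable by the server. \par 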


\par 

\section{First Trying It Out In Verbose Mode}

It is easy to make mistakes in the configuration file, which can make
configuring {\tt httpd} feel tedious, but this doesn't have to be
so.  In the beginning, start {\tt httpd} by hand in verbose mode
to listen to some port, and look at what happens when you make a request
to that port with your browser. \par 

Typically test servers are run on a non-privileged port above 1024
(you don't have to be {\tt root} to bind to them), often 8001,
8080, or such.  The official HTTP port is 80. \par 

The server port is defined in the configuration file with the {\tt Port} directive,
but you can override it with the {\tt -p } command line option
while testing; e.g.
\begin{verbatim}
        httpd -v -r /home/you/httpd.conf -p 8080
\end{verbatim}

This will start {\tt httpd} in verbose mode, use configuration
file {\tt httpd.conf} in your home directory, and accept
connections to port 8080. \par 

You can now try to request a document from your server using a URL of
the form:
\begin{verbatim}
        http://your.host:8080/document.html
\end{verbatim}

where {\tt document.html} is relative to the directory that you
have exported in your configuration file. \par 

If you get an error message back, look at the verbose output to find out
what is going wrong; it is usually self-explanatory. \par 

And remember, you should always feel free to ask advice from
{\bf httpd@info.cern.ch}. \par 

\par 


\section{The Actual Installation of httpd}

Under Unix you can run the server either stand-alone or from the
Internet Daemon ({\tt inetd}).
A stand-alone server is typically started once at system-boot time.
It waits for incoming connections, and forks itself to serve a
request. {\bf This is much faster} than letting
{\tt inetd} spawn {\tt httpd} every time a request
comes.  {\bf We therefore recommend that you run CERN httpd in
stand-alone mode.} \par 


\subsection{Stand-alone Installation}

A stand-alone server is started from the bootstrap
command file (for example {\tt /etc/rc.local}) so that it runs
continuously like the {\tt sendmail} daemon, for example. \par 

This method has the advantage over using the {\tt inetd} that
the response time is reduced. \par 

Add a line starting {\tt httpd} to your system startup file
(usually {\tt /etc/rc.local} or {\tt /etc/rc}).  If you
have the configuration file in the default place,
{\tt /etc/httpd.conf}, and if it specifies the port to listen
to via the {\tt Port} directive, you don't need any command
line options:
\begin{verbatim}
    /usr/etc/httpd &
\end{verbatim}

{\tt httpd} will automatically put itself in the background, so there is really no
need for an ampersand at the end (as long as your configuration file
{\tt /etc/httpd.conf} really exists). \par 

Or a little more safely in case httpd is removed:
\begin{verbatim}
    if [ -f /usr/etc/httpd ]; then
        (/usr/etc/httpd && echo -n ' httpd' >/dev/console) &
    fi
\end{verbatim}


Naturally you can use any of the
command line options, if necessary. \par 

\par 


\section{Registering Your Server}

Once you have your {\tt httpd} up and running, and you have
documents to show the world, announce
your server, so that others can find it. \par 

\par 

\section{If It Doesn't Work...}

...first run it in verbose mode with the {\tt -v } option and
try to figure out what goes wrong.  See also the debugging chart and the FAQ.  If you can't figure out what's going
wrong, feel free to send mail to {\bf httpd@info.cern.ch}. \par 

\par 


\section{
{}
Installing httpd Under inetd}

This is how to set up {\tt inetd} to run {\tt httpd}
whenever a request comes in.  (These steps are the same for any daemon
under Unix: you will probably find a similar thing has been done for
the FTP daemon, {\tt ftpd}, for example.) \par 

\par 

\subsection{Step 1: Install httpd Binary}

Copy {\tt httpd} into a suitable directory such as
{\tt /usr/etc}.  Make it owned by {\tt root}, and make
it writable only by {\tt root}, for example by saying:
\begin{verbatim}
        chmod 755 httpd
\end{verbatim}


\par 

\subsection{Step 2: Add http Service to /etc/services}

Put "http" in the {\tt /etc/services} file, or use the name of
a specific service of your own if you want to use a special port
number.  Standard port number for HTTP is 80.
\begin{verbatim}
        http    80/tcp           # WWW server
\end{verbatim}


{\bf Exceptions:}
\begin{itemize}
\item  On a NeXT, see the instructions for using NetInfoManager.
\item  On any machine running NIS (yellow pages), see the special instructions.
\end{itemize}

\par 

\subsection{Step 3: Add a Line to /etc/inetd.conf}

Put a line in the internet daemon configuration file,
{\tt /etc/inetd.conf}:
\begin{verbatim}
    http  stream  tcp  nowait  root  /usr/etc/httpd  httpd
\end{verbatim}

The first word is the same as in the {\tt /etc/services} file. \par 

If you want to pass command line options or
parameters to {\tt httpd}, they are listed at the end
of the line, for example to set the rule file to something other than the
default {\tt /etc/httpd.conf}:
\begin{verbatim}
    http  stream  tcp  nowait  root  /usr/etc/httpd  httpd -r /my/own/rules
\end{verbatim}


{\bf Note:} For {\tt httpd} version 2.15 and later
we recommend that it is run as user {\tt root}.

Running {\tt httpd} as {\tt root} is safe, since it
automatically resets its user-id to {\tt nobody}. However, if
you decide to use access authorization features, and you need to serve
protected files, {\tt httpd} will have to be able to set its
user-id to some other uid as well.  In any case, {\tt httpd}
always sets its user-id to something other than {\tt root}
before serving the file to the client. \par 

{\bf Note:} {\tt /etc/inetd.conf} syntax varies from
system to system; for example, not all systems have the field
specifying the user name, in which case the default is
{\tt root}.  If in doubt, copy the format of other lines in
your existing {\tt inetd.conf}. \par 

{\bf Note:} There seems to be a limit of 4 arguments passed
across by {\tt inetd}, at least on the NeXT. \par 

\par 

\subsection{Step 4: Send HUP Signal to inetd }

When you have updated {\tt inetd.conf},
find out the process number of {\tt inetd}, and send a "HUP"
signal to it. \par 

For example on BSD unix do this:
\begin{verbatim}		
        > ps -aux | grep inetd | grep -v grep
        root    85   0.0  0.9 1.24M  304K ?  S  0:01 /usr/etc/inetd
        > kill -HUP 85
\end{verbatim}

For System V, use {\tt ps -el} instead of {\tt ps -aux}.

Be aware that on some systems your local file {\tt /etc/services} may not be
consulted by your system (see notes on debugging). \par 

\par 

\subsection{Test It!}

\par 


\subsection{{} Using NIS (Yellow Pages)}

If your machine is running Sun's "Network Information Service",
originally known as "yellow pages", read this.\par 

You must:
\begin{itemize}
\item 	First make an addition to the {\tt /etc/services}
	file just as for a normal unix system.
\item 	Then, change directory to {\tt /var/yp}
	and run {\tt make}.
\end{itemize}

This will load the {\tt /etc/services} file into the NIS
information system.\par 

Some people have found that they needed to reboot the system afterward
for the change to take effect. \par 

\par 


\subsection{{} Adding a Service on the NeXT}

The NeXT uses the "netinfo" database instead of the
{\tt /etc/services} file.  This is managed with the
{\tt /NextAdmin/NetInfoManager} application. Here's how to add
the service {\tt http}:
\begin{itemize}
\item 	Start NetInfoManager by double-clicking on its icon. \par 

\item 	If you are operating in a cluster, open either your local
	domain {\tt (/hostname)} or if you have authority, the
	whole cluster domain {\tt (/)}. If you're not in a
	cluster, just use the domain you are presented with. \par 

\item 	Select {\tt "services"} from the browser tree. \par 

\item 	Select {\tt "ftp"} from the list of services. \par 

\item 	Select {\tt "duplicate"} from the edit menu. \par 

\item 	Select {\tt "copy of ftp"} and double-click on its icon
	to get the property editor. \par 

\item 	Click on {\tt "name"} and then on the value {\tt "copy
	of ftp"}. Change this to {\tt "http"} by typing
	"http" in the window at the bottom, and hitting return.
\item 	Click on {\tt "port"}, and then on the value
	{\tt 21}. Change it to {\tt 80}. \par 

\item 	Use {\tt "Directory:Save"} menu
	{\tt (Command/s)} to save the result. You will have to
	give a root password or netinfo manager password. \par 
\end{itemize}
\par 


\section{{} Privileged Ports}

The TCP/IP port numbers below 1024 are special in that normal users
are not allowed to run servers on them.  This is a security feature, in
that if you connect to a service on one of these ports you can be
fairly sure that you have the real thing, and not a fake which some
hacker has put up for you. \par 

The normal port number for W3 servers is port 80.  This number has
been assigned to WWW by the Internet Assigned Numbers Authority, IANA.
\par 

When you run a server as a test from a non-privileged account, you
will normally test it on other ports, such as 2784, 5000, 8001 or
8080. \par 
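
The rule can be expressed as a small shell check.  This is only an
illustrative sketch: {\tt needs\_privilege} is a name invented for
this example, and the threshold of 1024 is the one stated above.
\begin{verbatim}
        # Report whether binding to a TCP port needs privilege.
        # needs_privilege is a name invented for this example.
        needs_privilege() {
            if [ "$1" -lt 1024 ]; then
                echo yes
            else
                echo no
            fi
        }

        needs_privilege 80      # prints "yes"
        needs_privilege 8080    # prints "no"
\end{verbatim}

A check like this could guard a test script so that it picks a high
port whenever it is not run by {\tt root}. \par 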

\par 

\subsection{Under Unix}

The Internet Daemon {\tt inetd} (running as root) can listen
for incoming connections on port 80 and pass them down to a process
with a safer uid for the server itself. However, the
{\tt httpd} versions 2.14 and later can be safely run as
{\tt root} since they automatically change their user-id to
{\tt nobody} or some other user-id depending on server setup.
\par 

\par 

\subsection{Under VMS }

Under UCX, the process running as a server needs BYPASS privilege to
listen to ports below 1024.  This might mean you have to install the
server.  With other TCP/IP packages, privilege of some sort is
similarly required. \par 

\par 


\section{{} Debugging httpd}

Suppose you think you have installed
{\tt httpd} but it doesn't work.
Here we assume you have
used port 80.  If you have a situation
not handled by this problem-solving
guide, please mail {\tt httpd@info.cern.ch}. \par 
\par 
Type
\begin{verbatim}
        www http://myhost.domain/
\end{verbatim}

What happens?
\par 

\subsection{Connection Refused}

The browser tries to connect to the daemon but gets this status in the
trace. \par 

This means that nobody was listening on that port number. Check that the
port numbers match between server and client.  Make sure you specify
the port number explicitly in the document address for
{\tt www}.\par 

If you are running the daemon standalone (as you should be), check
that it is actually running by taking a list of processes, and that it
is listening to the correct port (specified with {\tt -p }
{\it port\/} option), or try running it from the terminal with
{\tt -v} option as well.  The trace for the server should say
{\tt "socket, bind and listen all ok"}. If it does, and you
still get "{\tt connection refused}", then you must be talking
to the wrong host (or, conceivably, different ethernet adapters on the
same host).\par 

If you are running with the inet daemon, then check both the services
file {\tt (/etc/services)} or database (yellow pages, netinfo)
if your system uses it, and the {\tt /etc/inetd.conf} file.
Check that the service name matches between these two (e.g.
{\tt http}).\par 
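
One way to make this comparison is sketched below.  The helper
{\tt service\_port} is invented for this example; it assumes a
services file in the usual format, with the service name in the first
column and {\it port\/}{\tt /tcp} in the second.
\begin{verbatim}
        # Print the TCP port registered for a service name.
        # service_port is a helper invented for this example;
        # $1 is the service name, $2 a services-format file.
        service_port() {
            awk -v name="$1" '
                $1 == name && $2 ~ /\/tcp$/ {
                    split($2, part, "/")
                    print part[1]
                    exit
                }' "$2"
        }

        # Usage, assuming the standard file locations:
        #   service_port http /etc/services
        #   grep '^http' /etc/inetd.conf
\end{verbatim}

If the first command prints nothing, or the second finds no line, the
service name is missing from one of the two files. \par 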

Did you remember to kill -HUP the {\tt inetd} when you changed
the {\tt inetd.conf} file? \par 

{\em Be aware that on some systems your local file
{\tt /etc/services} will not be consulted.\/} For example, when
{\tt ypbind} is running on Suns, you should type
\begin{verbatim}
        ypwhich -m services
\end{verbatim}


and ask the administrator of the machine named to change its own
{\tt /etc/services}. \par 

Try running the daemon from a shell
window to see better what happens. \par 

\par 


\subsection{Cannot Connect To Information Server}

The usual cause of this is that the server is not running, or it's
running on a different port. \par 

There is more information you can get.  Use the "verbose" option on
the LineMode browser to find out what went wrong:
\begin{verbatim}
        www -v http://myhost.domain:80/
\end{verbatim}

\par 
What do you get? A load of trace messages. There are several cases.
\begin{itemize}

\item 	The browser can't look up the name of the host. If it can, it
	will display "Parsed address as" message. If not, try fixing
	your name server or {\tt /etc/hosts} file, or quoting
	the IP number of the host in decimal notation (like
	128.141.77.45) instead. \par 

\item 	The browser can get to the host but gets
	{\tt Connection refused} status back. \par 

\item 	Your browser gets an error number but prints "error message
	not translated".  This is because when it was compiled on your
	platform it didn't know what form the error message table
	took. Try the same thing from a Unix platform, for example. \par 

\item 	You get some network error like "network unreachable".
	Depending on whether the IP network is your responsibility or
	not, and your attitude to life, either fix it, try again in an
	hour's time, or complain to someone. \par 

\end{itemize}
\par 


\subsection{Unable To Access Document}

The typical cause of this is that the configuration file is incorrect, or
files are not readable by the user-id under which the server runs.
When you are running the server as {\tt root}, it will
automatically switch its user-id to {\tt nobody} just before serving the
document. This can be changed with the {\tt UserId}
configuration directive. \par 

\par 

\subsection{An Empty Document Is Displayed}

The document sent back is empty, but there is no error message.\par 

The {\tt inetd} has started a process to run your server but it
immediately failed.  Possibilities include:
\begin{itemize}

\item 	When running from {\tt inetd},
	the daemon may not be in the file specified, or may not be
	executable by the specified user (or, if a user id is not
	specified in your variety of {\tt inetd.conf},
	{\tt root}). \par 

\item 	For some reason the server crashes when it's trying to serve
	the request.  If you can, try to track down when this happens,
	and send mail to {\tt httpd@info.cern.ch}.
	Try running the daemon from a terminal
	window to see what happens. \par 

\item 	A script fails to produce any
	result, which may be due to the fact that there is no empty
	line after the header section output by the script, causing
	the server to read the entire generated document as the header
	section. \par 

\end{itemize}
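
The last case can be illustrated with a minimal script sketch.  The
function name {\tt emit\_document} and the {\tt Content-Type} header
line are assumptions made for this example; the point is only the
empty line that separates the header section from the document body.
\begin{verbatim}
        # Hypothetical script output: header section, empty line, body.
        emit_document() {
            echo "Content-Type: text/html"
            echo ""                  # empty line ends the header section
            echo "<H1>Hello</H1>"
        }

        emit_document
\end{verbatim}

If the second {\tt echo} were left out, the server would treat the
body as part of the header section and return an empty document. \par 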

\par 


\subsection{Document Address Invalid Or Access Not Authorized...}

...or some similar kind of error message.
This means either:
\begin{itemize}

\item You have been passed a bad document address. If you are following
a link, check with the author of the document which contained the
link.

\item The document has been moved. Check with the server administrator.
You should be able to find out who runs the server by going to the
welcome page (type "g /" with the line mode browser) and seeing a link
to information about the maintainers.

\end{itemize}

If you are the server administrator, and you can't understand why the
daemon refuses to deliver the file,
\begin{itemize}

\item  Check the configuration file (rule
file, by default {\tt /etc/httpd.conf}) if you have one. Work
out how the document name will be mapped successively by each line,
and what the result will be. \par 

\item  Run the daemon in debug mode from a
terminal session to get trace information. \par 

\end{itemize}

\par 


\subsection{Bad Output}

A document is displayed, but not the one you wanted. \par 

These are some ideas:
\begin{itemize}

\item  Try running the server from the terminal. \par 

\item  Check the HTML source the daemon produces with
\begin{verbatim}
        www -source http://my.host.domain/
\end{verbatim}


\item  Try telnetting to httpd and
simulating the client:
\begin{verbatim}
        > telnet my.host.domain 80
        Connected to my.host.domain on port 80
        Escape is ^[
        GET /document/name
\end{verbatim}


\end{itemize}

\par 
\par 


\subsection{{} Running Under Shell}

You don't have to run the daemon under the {\tt inetd} if it
doesn't work (and we recommend running it standalone anyway).  You can
run it from a shell session.\par 

Run {\tt httpd} from your terminal, with a different
port number like 8080:
\begin{verbatim}
        httpd -p 8080
\end{verbatim}


{\bf Note:} You must be {\tt root} (under VMS, have
some privilege) to run with a port number below 1024.  If you select a
port above 1024, then you can run as a normal user. This way, anyone
can publish files on the net. However, it isn't very reliable, as
your server will not automatically come back up if the machine is
rebooted. In the long term it is best to install it to be started from
the system startup file {\tt /etc/rc} or
{\tt /etc/rc.local}. \par 

You may not be able to use a port number which has been used by a
daemon process recently (port may still be bound), so you may have to
switch port number if you {\char94}C and restart {\tt httpd}.  When it
is running like this, you can also read the debugging messages (when
running with {\tt -v} option), and use a debugger on it if
necessary. (See also: telnetting to the
server). \par 

\par 

\subsubsection{Debugging using Trace}

If you can't understand why a server refuses to give back a document,
then run with the {\tt -v} option to turn on debugging
messages.  Use {\tt -v} as the very first command line option
(this way debugging is turned on right away).  You will see the daemon
setting up the rules for translating requests into local URLs, and you
will see its attempt to access the file (assuming you map requests onto
files).
\begin{verbatim}
        httpd -v -p 8080
\end{verbatim}


Try to access the document from a client using another terminal
window.  Look at the debugging output.  It will probably explain what
is happening.
If you still can't figure out the problem, mail your local guru help
desk or if desperate {\tt httpd@info.cern.ch}
{\bf enclosing} a copy of debugging output. \par 

\par 

\subsubsection{Even simpler}

For testing a daemon very simply,
without using a client, you can make
the terminal be the client.  With
{\tt httpd} try just running
it with the terminal and typing {\tt GET} {\it /document/url\/}
into its input:
\begin{verbatim}
        httpd -v
        GET /document/url
\end{verbatim}


\par 


\subsection{{} Telnetting to httpd}

Most implementations of telnet allow you to specify a port number.
Under unix this is often just a second parameter, under VMS a
{\tt /PORT} option. \par 

The HTTP
protocol is a telnet-style text protocol, so you can simulate a client just by
typing things in.  This will help you to see exactly what the server is sending
back, and lets you check that it really is the server, not the
browser, which has a problem. \par 

Here is a simple example (keyboard input is on the lines beginning
with {\tt >} and {\tt GET}):
\begin{verbatim}
        > telnet myhost.domain 80
        Connected to myhost.domain on port 80
        Escape is ^[
        GET /document/url
        ...document or error message...
\end{verbatim}


\par 


\chapter{{} Command Line of CERN httpd}

The command line syntax for {\tt httpd} allows a number of
options and an optional directory argument:
\begin{verbatim}
        httpd  [-opt -opt -opt ...] [directory]
\end{verbatim}


The directory argument, if present, indicates the directory to be
exported.  If not present, either a rule file is used, to export
combinations of directories, or else the default is to export the
{\tt /Public} directory tree. \par 

\par 

\section{Options}
\begin{DL}{allow this much space}

\item[ {\tt -r } {\it rulefile\/}
] Use {\it rulefile\/} as configuration file. {\bf This is the
     only necessary command line option} if you don't have the
     default configuration file, {\tt /etc/httpd.conf}. All the
     other options can be given as directives in the configuration file.

\item[ {\tt -p } {\it port\/}
] Listen to port {\it port\/}. Without this argument
     {\tt httpd} assumes that it has been run by 
     {\tt inetd}, and uses
     {\tt stdin} and {\tt stdout} as its communication
     channel. {\bf Note} that port numbers under 1024 are
     privileged.

\item[ {\tt -l } {\it logfile\/}
] Use {\it logfile\/} to log the requests.

\item[ {\tt -restart}
] Restart an already running {\tt httpd}.
     {\tt httpd} finds out the process number of the
     running server from
     {\tt PidFile}
     and sends it the {\tt HUP} signal (HangUP).  This will
     cause {\tt httpd} to reload its configuration files and
     reopen its log files. {\bf Important:} To find out the
     {\tt PidFile} {\tt httpd} will have to read the
     same configuration file as the running {\tt httpd} has, so
     you have to specify the same {\tt -r } options on the
     command line as for the actual {\tt httpd}.

\item[ {\tt -gc\_only}
] \lbrack only for proxies\rbrack 
     Do only garbage collection and then exit.  This can be used to
     run {\tt httpd} periodically by {\tt cron} to do
     garbage collection on a cache that is used by {\tt httpd}
     run from the {\tt inetd} daemon rather than standalone.
     When {\tt httpd} is not running standalone it cannot
     monitor the cache, nor perform automatic garbage collection.

\item[ {\tt -v}
] Verbose, turn on debugging messages.

\item[ {\tt -vv}
] Very Verbose, turn on even more verbose debugging messages.

\item[ {\tt -version}
] Print version number of {\tt httpd} and
     {\tt libwww} (the WWW Common Library).



\end{DL}
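
As an illustration of the {\tt -gc\_only} option described above, a
{\tt cron} entry along the following lines could run the garbage
collector once a night for a proxy started from {\tt inetd}.  The
schedule and the paths are assumptions made for this sketch; adapt
them to your installation.
\begin{verbatim}
        # Hypothetical crontab entry: collect garbage in the proxy
        # cache every night at 03:00
        0 3 * * * /usr/etc/httpd -r /etc/httpd.conf -gc_only
\end{verbatim}
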


\par 

\subsection{Directory Browsing}
You can also set these with the {\tt DirAccess}
configuration directive.
\begin{DL}{allow this much space}
\item[ {\tt -dy}
] Enable directory browsing.  Directories are returned as hypertext
     documents. See browsing
     directories.  {\em Default.\/}

\item[ {\tt -dn}
] Disable directory browsing. An attempt to access a directory will
     generate an error response.

\item[ {\tt -ds}
] Selective directory browsing; enabled only for directories
     containing a file named {\tt .www\_browsable}.
\end{DL}


\par 
\subsection{README Feature}
It is common practice to put a file named {\tt README} into a
directory containing instructions or notices to be read by anyone new
to the directory. {\tt httpd} will by default embed any
{\tt README} file in the hypertext version of a directory. \par 

You can also set these with the {\tt DirReadme}
configuration directive.
\begin{DL}{allow this much space}
\item[ {\tt -dt}
] For any browsable directory which contains a {\tt README}
     file, include the text of the {\tt README} file at the top
     of the document before the listing.  {\em Default.\/}

\item[ {\tt -db}
] As {\tt -dt} but put the {\tt README} at the
     bottom, after the listing.  The {\tt -db} and
     {\tt -dt} options may be combined with {\tt -dy} as
     {\tt -dyb}, {\tt -dty} etc.

\item[ {\tt -dr}
] Disables the {\tt README} inclusion feature.

\end{DL}

\par 

\section{Examples}
\begin{verbatim}
        httpd -r /usr/etc/httpd.conf -p 80
\end{verbatim}

This is a standalone server running on port 80. Configuration file is
{\tt /usr/etc/httpd.conf} instead of the default,
{\tt /etc/httpd.conf}. \par 

{\bf Note} that if the {\tt Port} directive is given in the
configuration file the {\tt -p } option is not necessary (it
can be used to override the value set in the configuration file). \par 

\begin{verbatim}
        httpd
\end{verbatim}

{\tt httpd} uses its default configuration file
{\tt /etc/httpd.conf}. If that file doesn't exist,
{\tt httpd} exports the {\tt /Public} directory tree.
This tree may contain soft links to other directory trees. \par 

If the configuration file {\tt /etc/httpd.conf} doesn't define
the port number to listen to,
this {\tt httpd} reads from its {\tt stdin} and
writes to its {\tt stdout}, so it must be run by
{\tt inetd}. \par 

\begin{verbatim}
        httpd -r /usr/local/lib/httpd.conf
\end{verbatim}

The same as before, but uses {\tt /usr/local/lib/httpd.conf} as
a rule file instead of the default {\tt /etc/httpd.conf}. \par 

\par 


\chapter{{} Configuration File of CERN httpd}

The configuration file (often referred to as the rule file)
defines how {\tt httpd} will translate a request into
a document name.  The directives controlling
{\tt httpd} features are also put into the
configuration file, as well as protection configuration.
This is essential to prevent unauthorized access to your
private documents. \par 


\section{Default Configuration File}

By default, the configuration file {\tt /etc/httpd.conf} is
loaded, unless specified otherwise with the {\tt -r} command line
option:
\begin{verbatim}
        httpd -p 80 -r /your/own/httpd.conf
\end{verbatim}


See also example configuration files. \par 


\section{Comments in Configuration File}

Each line consists of an operation code and one or two parameters,
referred to as the template and the result.  Lines starting with a
hash sign {\tt \#} are ignored, as are empty lines. \par 
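
As a minimal sketch of this layout, a small configuration file might look
like the following.  The port and the paths are illustrative assumptions
only. \par 
\begin{verbatim}
        # Comment lines and empty lines are ignored
        Port  80

        # Operation code, then template, then result
        Pass  /*  /Public/*
\end{verbatim}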


\par 
\section{Restarting the Server}

When you are running the server in standalone mode (not from
{\tt inetd}), and modify the configuration file, send the
{\tt HUP} signal to {\tt httpd} to make it re-read the
configuration file.  You can find the process id in the pid file written by {\tt httpd}, e.g.
\begin{verbatim}
        > cat /server_root/httpd-pid
        2846
        > kill -HUP 2846
        >
\end{verbatim}


You must specify the
configuration file as an {\bf absolute pathname} for the {\tt -r} option:
when the server is started in standalone mode it changes its current
directory to {\tt /}, so after startup it cannot reload
configuration files that were specified with relative filenames. \par 

To make restarting easier, {\tt httpd} has a {\tt -restart}
option, which automatically sends the {\tt HUP} signal to
another {\tt httpd} process. {\bf Important:} To find out the
{\tt PidFile}, {\tt httpd} has to read the same
configuration file as the running {\tt httpd}, so you have
to specify the same {\tt -r} options on the command line as
for the running {\tt httpd}, e.g.
\begin{verbatim}
        > httpd -r /usr/etc/httpd.conf -restart
        Restarting.. httpd
        Sending..... HUP signal to process 21379
        >
\end{verbatim}


\par 
\section{Exhaustive List of Configuration Directives}
\begin{itemize}

\item  General settings:
	\begin{itemize}
	\item  {\tt ServerRoot}
	\item  {\tt HostName}
	\item  {\tt Port}
	\item  {\tt PidFile}

	\item  {\tt UserId}
	\item  {\tt GroupId}

	\item  {\tt Enable}
	\item  {\tt Disable}

	\item  {\tt IdentityCheck}

	\item  {\tt Welcome}
	\item  {\tt AlwaysWelcome}

	\item  {\tt UserDir}

	\item  {\tt MetaDir}
	\item  {\tt MetaSuffix}

	\item  {\tt MaxContentLengthBuffer}
	\end{itemize}

\item  URL translation rules:
	\begin{itemize}
	\item  {\tt Map}
	\item  {\tt Pass}
	\item  {\tt Fail}
	\item  {\tt Redirect}
	\item  {\tt Protect}
	\item  {\tt DefProt}
	\item  {\tt Exec}
	\end{itemize}


\item  Filename suffix definitions:
	\begin{itemize}
	\item  {\tt AddType}
	\item  {\tt AddEncoding}
	\item  {\tt AddLanguage}
	\item  {\tt SuffixCaseSense}
	\end{itemize}


\item  Accessory scripts:
	\begin{itemize}
	\item  {\tt Search}
	\item  {\tt POST-Script}
	\item  {\tt PUT-Script}
	\item  {\tt DELETE-Script}
	\end{itemize}

\item  Directory listings:
	\begin{itemize}
	\item  {\tt DirAccess}
	\item  {\tt DirReadme}
	\item  {\tt DirShowIcons}
	\item  {\tt DirShowBrackets}
	\item  {\tt DirShowMinLength}
	\item  {\tt DirShowMaxLength}
	\item  {\tt DirShowDate}
	\item  {\tt DirShowSize}
	\item  {\tt DirShowBytes}
	\item  {\tt DirShowHidden}
	\item  {\tt DirShowOwner}
	\item  {\tt DirShowGroup}
	\item  {\tt DirShowMode}
	\item  {\tt DirShowDescription}
	\item  {\tt DirShowMaxDescrLength}
	\item  {\tt DirShowCase}
	\end{itemize}

\item  Icons in directory listings:
	\begin{itemize}
	\item  {\tt AddIcon}
	\item  {\tt AddBlankIcon}
	\item  {\tt AddUnknownIcon}
	\item  {\tt AddDirIcon}
	\item  {\tt AddParentIcon}
	\end{itemize}

\item  Logging:
	\begin{itemize}
	\item  {\tt AccessLog}
	\item  {\tt ErrorLog}
	\item  {\tt LogFormat}
	\item  {\tt LogTime}
	\item  {\tt NoLog}
	\item  {\tt CacheAccessLog}
	\end{itemize}

\item  Timeouts:
	\begin{itemize}
	\item  {\tt InputTimeOut}
	\item  {\tt OutputTimeOut}
	\item  {\tt ScriptTimeOut}
	\end{itemize}

\item  Proxy Caching:
	\begin{itemize}
	\item  {\tt Caching}
	\item  {\tt CacheRoot}
	\item  {\tt CacheSize}
	\item  {\tt NoCaching}
	\item  {\tt CacheOnly}
	\item  {\tt CacheClean}
	\item  {\tt CacheUnused}
	\item  {\tt CacheDefaultExpiry}
	\item  {\tt CacheLastModifiedFactor}
	\item  {\tt CacheTimeMargin}
	\item  {\tt CacheNoConnect}
	\item  {\tt CacheExpiryCheck}
	\item  {\tt Gc}
	\item  {\tt GcDailyGc}
	\item  {\tt GcMemUsage}
	\item  {\tt CacheLimit\_1}
	\item  {\tt CacheLimit\_2}
	\item  {\tt CacheLockTimeOut}
	\item  {\tt CacheAccessLog}
	\end{itemize}

\item  Going through many proxies:
	\begin{itemize}
	\item  {\tt http\_proxy}
	\item  {\tt ftp\_proxy}
	\item  {\tt gopher\_proxy}
	\item  {\tt wais\_proxy}
	\item  {\tt no\_proxy}
	\end{itemize}
\end{itemize}


\par 


\section{{}
General CERN httpd Configuration Directives}
\begin{itemize}
\item  {\tt ServerRoot}
\item  {\tt HostName}
\item  {\tt Port}
\item  {\tt PidFile}

\item  {\tt UserId}
\item  {\tt GroupId}

\item  {\tt Enable}
\item  {\tt Disable}

\item  {\tt IdentityCheck}

\item  {\tt Welcome}
\item  {\tt AlwaysWelcome}

\item  {\tt UserDir}

\item  {\tt MetaDir}
\item  {\tt MetaSuffix}

\item  {\tt MaxContentLengthBuffer}
\end{itemize}


\par 
\subsection{ServerRoot}

The server's ``home'' directory is specified with the {\tt ServerRoot}
directive.  If the server root is specified but no {\tt AddIcon} directive has been used in the
configuration file to set up icons, the default icon directory is
{\tt icons} under the server root.  The default icons that should
be present are:
\begin{itemize}
\item  {\tt blank.xbm} blank icon for aligning the header with listing
\item  {\tt directory.xbm} for directories
\item  {\tt back.xbm} for parent directory
\item  {\tt unknown.xbm} for unknown types
\item  {\tt binary.xbm} for binary files
\item  {\tt text.xbm} for text files
\item  {\tt image.xbm} for image files
\item  {\tt movie.xbm} for movies
\item  {\tt sound.xbm} for audio files
\item  {\tt tar.xbm} for tar files
\item  {\tt compressed.xbm} for compressed files
\end{itemize}
If these defaults do not suit you, you can define all icons from scratch.
As an example of the {\tt AddIcon} directives, the defaults would be
specified as follows:
\begin{verbatim}
    Pass  /httpd-internal-icons/*  /server_root/icons/*

    AddBlankIcon   /httpd-internal-icons/blank.xbm
    AddDirIcon     /httpd-internal-icons/directory.xbm  DIR
    AddParentIcon  /httpd-internal-icons/back.xbm       UP
    AddUnknownIcon /httpd-internal-icons/unknown.xbm
    AddIcon        /httpd-internal-icons/binary.xbm     BIN  binary
    AddIcon        /httpd-internal-icons/text.xbm       TXT  text/*
    AddIcon        /httpd-internal-icons/image.xbm      IMG  image/*
    AddIcon        /httpd-internal-icons/movie.xbm      MOV  video/*
    AddIcon        /httpd-internal-icons/sound.xbm      AU   audio/*
    AddIcon        /httpd-internal-icons/tar.xbm        TAR  multipart/*tar
    AddIcon        /httpd-internal-icons/compressed.xbm CMP  x-compress x-gzip
\end{verbatim}


\subsubsection{{} On Proxy Server}

On a proxy server the icon URLs {\bf must be full URLs},
because otherwise clients would resolve them relative to the remote
host. This means that in the above example all the
{\tt AddIcon*} directives have to read:
\begin{verbatim}
    AddIcon  http://your.server/httpd-internal-icons/...
\end{verbatim}

{\bf and} you also have to pass the full icon URL:
\begin{verbatim}
    Pass  http://your.server/httpd-internal-icons/*  /server_root/icons/*
\end{verbatim}

Since future smart browsers might notice that the icon server is the
same one as the proxy server, it may be best in this case also to
{\tt Pass} the partial URL as above:
\begin{verbatim}
    Pass  /httpd-internal-icons/*  /server_root/icons/*
\end{verbatim}


\par 
\subsection{HostName}

On some hosts the hostname lookup fails, producing only the name
without the domain part.  The full hostname is necessary when
{\tt httpd} generates references to itself (redirection
responses to clients).  If necessary, provide the full server hostname
with the {\tt HostName} directive:
\begin{verbatim}
        HostName  full.server.host.name
\end{verbatim}


You may also want to use this when the real host name is different from
what you want the clients to see (for example, when you have a DNS alias for the host). \par 


\par 
\subsection{Default Port Setting}

For a standalone server (one running continuously, listening on a
certain port, and forking a child to handle each request) the port to
listen on can be defined with the {\tt Port} configuration directive
instead of the {\tt -p}
{\it port\/} command line option.  Normally:
\begin{verbatim}
        Port 80
\end{verbatim}

The {\tt -p} {\it port\/} command line option still overrides
this setting. \par 


\par 
\subsection{PidFile}

{\tt httpd} re-reads its configuration file when it receives
a {\tt HUP} signal \lbrack HANGUP\rbrack , the signal number 1.  To make it
easy to find out the parent {\tt httpd} process id, it writes
it to a file. \par 

By default, if {\tt ServerRoot} is
specified, this is the file {\tt httpd-pid} under server root;
if not, it defaults to {\tt /tmp/httpd-pid}. \par 

The {\tt PidFile} directive can be used to set the process
id file name;  it can be either an absolute path, or a relative one.
Relative path is relative to {\tt ServerRoot}, or if not
defined, relative to {\tt /tmp}.

\subsubsection{Example}
\begin{verbatim}
        ServerRoot  /Web/serverroot
        PidFile     logs/httpd-pid
\end{verbatim}

would cause the process id to be written to
{\tt /Web/serverroot/logs/httpd-pid}. \par 


\par 
\subsection{Default User Id}

The {\tt UserId} directive sets the default user to run as instead
of {\tt nobody}.  This directive is only meaningful when
running the server as {\tt root}.
\begin{verbatim}
        UserId whoever
\end{verbatim}

\par 


\subsection{Default Group Id}

The {\tt GroupId} directive sets the default group to run under
instead of {\tt nogroup}.  This directive is only meaningful
when running the server as {\tt root}.
\begin{verbatim}
        GroupId whichever
\end{verbatim}


\par 
\subsection{Enabling and Disabling
HTTP Methods}

You can enable/disable methods that you do/don't want your server to
accept:
\begin{verbatim}
        Enable  METHOD
        Disable METHOD
\end{verbatim}

By default {\tt GET}, {\tt HEAD} and
{\tt POST} are enabled, and the rest are disabled. \par 

\subsubsection{Examples}
\begin{verbatim}
        Enable POST
        Disable DELETE
\end{verbatim}


\par 
\subsection{IdentityCheck}

If the {\tt IdentityCheck} configuration directive is turned
{\tt On}, {\tt httpd} will connect to the ident daemon
(RFC931) of the remote host and find out the remote login name of the
owner of the client socket.  This information is written to the access log file and put into the {\tt REMOTE\_IDENT}
CGI environment variable. \par 

Default setting is {\tt Off}:
\begin{verbatim}
        IdentityCheck Off
\end{verbatim}

and if you don't need this information you will save resources by
keeping it off.  Furthermore, this information does not provide any
additional security and should not be trusted for access control;
use it only for informational purposes, such as logging. \par 

\subsubsection{{}
WARNING
{}}

On some systems there is a kernel bug that causes all the connections
to the remote node to be broken if the remote ident request is not
answered (ident daemon not running, for example).  This is reported
for at least SunOS 4.1.1, NeXT 2.0a, ISC 3.0 with TCP 1.3, and AIX
3.2.2, and later are ok.  Sony News/OS 4.51, HP-UX 8-?? and Ultrix 4.3
still have this bug.  A fix for Ultrix is available (CSO-8919). \par 

\lbrack Thanks to Per-Steinar Iversen from Norway for pointing this out!\rbrack  \par 

If the operating system on your server host has this bug, {\bf do
not use IdentityCheck!} \par 


\par 
\subsection{Welcome}

The {\tt Welcome} directive specifies the default file name to use
when only a directory name is specified in the URL.  There may be several
{\tt Welcome} directives giving alternative welcome page names;
the one defined earliest has precedence. \par 

Default values are {\tt Welcome.html},
{\tt welcome.html} and {\tt index.html}.
{\tt index.html} is there only for compatibility with NCSA
server;  the word "Welcome" is more descriptive, and has precedence.
\par 

All default values will be overridden if {\tt Welcome}
directive is used. \par 

Default values could be defined as:
\begin{verbatim}
        Welcome Welcome.html
        Welcome welcome.html
        Welcome index.html
\end{verbatim}
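
For instance, a site that prefers its own page name could, as a sketch,
replace the defaults as follows.  The name {\tt home.html} is a
hypothetical choice; because any {\tt Welcome} directive overrides all
defaults, {\tt index.html} has to be listed again if it should still be
recognized. \par 
\begin{verbatim}
        # Defined first, so it has precedence
        Welcome home.html
        # Fallback; Welcome.html and welcome.html are no longer tried
        Welcome index.html
\end{verbatim}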


\par 
\subsection{AlwaysWelcome}

By default there is no difference between directory names with and without
a trailing slash when it comes to welcome pages.  The one without a
trailing slash will cause an automatic redirection to the one with a
trailing slash, which then gets mapped to the welcome page. \par 

If you want plain directory names to produce a
directory listing, and only names with a trailing slash to return the
welcome page, set the {\tt AlwaysWelcome}
directive to {\tt Off}:
\begin{verbatim}
        AlwaysWelcome Off
\end{verbatim}

Default value is {\tt On}. \par 
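
The effect of the two settings can be sketched as follows, assuming a
hypothetical directory {\tt /docs} that contains a welcome page and for
which directory listings are allowed: \par 
\begin{verbatim}
        # With the default (On):
        #   /docs   redirects to /docs/, which returns the welcome page
        #   /docs/  returns the welcome page
        #
        # With the setting below (Off):
        #   /docs   returns a directory listing
        #   /docs/  returns the welcome page
        AlwaysWelcome Off
\end{verbatim}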


\par 
\subsection{User-Supported Directories}

User-supported directories, URLs of the form {\bf /\~{}username}, are
enabled by the {\tt UserDir} directive:
\begin{verbatim}
        UserDir dir-name
\end{verbatim}

The {\it dir-name\/} argument is the directory in each user's home
directory to be exported, for example {\tt WWW}:
\begin{verbatim}
        UserDir WWW
\end{verbatim}
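
As a sketch of the translation this is expected to produce, assume a
hypothetical user {\tt joe} whose home directory is {\tt /home/joe};
both names are illustrative only. \par 
\begin{verbatim}
        UserDir WWW

        # Request:      /~joe/papers/draft.html
        # Served from:  /home/joe/WWW/papers/draft.html
\end{verbatim}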


\par 
\subsection{Meta-Information}

It is possible to tell {\tt httpd} to add meta-information to
the response. Meta-information is stored in a subdirectory, named by the
{\tt MetaDir} directive, of the directory containing the file
being retrieved:
\begin{verbatim}
        MetaDir  dir-name
\end{verbatim}


Within that subdirectory, meta-information is stored in a file with the same name as the actual
document, with the suffix specified by the
{\tt MetaSuffix} directive appended:
\begin{verbatim}
        MetaSuffix  .suffix
\end{verbatim}

Meta-information files contain RFC822-style headers. \par 

Default settings are:
\begin{verbatim}
        MetaDir    .web
        MetaSuffix .meta
\end{verbatim}

meaning that meta-information files are located in the
{\tt .web} subdirectory, and they end in {\tt .meta}
suffix, i.e. the metafile for file:
\begin{verbatim}
        /Web/Demo/file.html
\end{verbatim}

would be:
\begin{verbatim}
        /Web/Demo/.web/file.html.meta
\end{verbatim}
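
The same lookup with non-default settings can be sketched as follows;
the directory name and suffix used here are hypothetical. \par 
\begin{verbatim}
        MetaDir    .headers
        MetaSuffix .hdr

        # Document:  /Web/Demo/file.html
        # Metafile:  /Web/Demo/.headers/file.html.hdr
\end{verbatim}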


\par 

\subsection{MaxContentLengthBuffer}

{\tt httpd} normally gives a {\tt Content-Length} header line for
every document it returns.  When it is running as a proxy it buffers the document
received from the remote server before sending it to the client.  This
directive sets the size of this buffer; if it is
exceeded the document is returned without a {\tt Content-Length} header
field. \par 

Default setting is 50 kilobytes:
\begin{verbatim}
        MaxContentLengthBuffer 50 K
\end{verbatim}


\par 


\section{{}
Rules In The Configuration File}

Rules define the mapping between virtual URLs and physical file names.
Currently the following rules are understood:
\begin{itemize}
\item  {\tt Map}
	- Map URLs to actual files
\item  {\tt Pass}
	- Accept a request
\item  {\tt Fail}
	- Fail a request
\item  {\tt Redirect}
	- Redirect a request
\item  {\tt Protect}
	- Set up protection
\item  {\tt DefProt}
	- Default protection setup
\item  {\tt Exec}
	- Executable server scripts
\end{itemize}


\par 
\subsection{Mapping, Passing and Failing}

There are three main rules: {\tt Map}, {\tt Pass} and
{\tt Fail}. The server applies the top rule first, then
{\bf each successive rule} unless told otherwise by a
{\tt Pass} or a {\tt Fail} rule. \par 

\begin{DL}{allow this much space}
\item[ {\tt Map } {\it template result\/}
] If the address matches the {\it template\/}, use the {\it result\/}
     string from now on for future rules.

\item[ {\tt Pass } {\it template\/}
] If the address matches the {\it template\/}, use it as it is,
     processing no further rules.

\item[ {\tt Pass } {\it template result\/}
] If the address matches the {\it template\/}, use the {\it result\/}
     string as it is, processing no further rules.

\item[ {\tt Fail } {\it template\/}
] If the address matches the {\it template\/}, prohibit access,
     processing no further rules.
\end{DL}

The {\it template\/} string may contain wildcards (asterisks)
{\tt *}.  (Versions earlier than 3.0 support only a single
wildcard.)  The {\it result\/} string may have wildcards only if the
{\it template\/} has them.  In this case they expand to matched strings
in respective order. \par 

{\bf Whitespace, (literal) asterisks and backslashes} are allowed in
templates if they are preceded by a backslash. \par 
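
A short sketch of these template rules, using hypothetical paths: the
first rule uses two wildcards, which expand into the result in the same
order, and the second escapes a space in the template with a backslash. \par 
\begin{verbatim}
        # /users/joe/docs/a.html  becomes  /Web/joe/pub/a.html
        Map  /users/*/docs/*  /Web/*/pub/*

        # Matches addresses beginning with "/My Documents/"
        Map  /My\ Documents/*  /Web/mydocs/*
\end{verbatim}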

{\bf The tilde character} (see user-supported directories) just after
a slash (in other words in the beginning of a directory name) has to
be explicitly matched, i.e. wildcard does not match it. \par 

When matching,
\begin{itemize}

\item  Rules are scanned from the top of the file to the bottom.

\item  If a request matches a {\tt Map} template exactly, the
     result string is used instead of the original string and applied
     to successive rules.

\item  If the request matches a {\tt Map} {\it template\/} with a
     wildcard, then the text of the request which matches the wildcard
     is inserted in place of the wildcard in the {\it result\/} string
     to form the translated request. If the result string has no
     wildcard, it is used as it is.

\item  When a {\tt Map} substitution takes place, the rule scan
     continues with the next rule using the new string in place of the
     request.  This is not the case if a {\tt Pass} or
     {\tt Fail} is matched: they terminate the rule scan.

\end{itemize}
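
The scan described above can be traced on a small rule set.  This is a
sketch under assumed paths ({\tt /docs} and {\tt /Web} are hypothetical);
the closing {\tt Fail} rule is included so that the outcome for unmatched
requests does not depend on any default behaviour. \par 
\begin{verbatim}
        Map   /docs/*              /Web/docs/*
        Fail  /Web/docs/private/*
        Pass  /Web/*
        Fail  /*

        # Request /docs/guide.html
        #   Map rewrites it to /Web/docs/guide.html; the scan continues
        #   Fail /Web/docs/private/* does not match
        #   Pass /Web/* matches: /Web/docs/guide.html is served
        #
        # Request /docs/private/notes.html
        #   Map rewrites it to /Web/docs/private/notes.html
        #   Fail /Web/docs/private/* matches: access is prohibited
        #
        # Request /etc/passwd
        #   Map, the first Fail and Pass do not match
        #   Fail /* matches: access is prohibited
\end{verbatim}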


\par 
\subsection{Redirecting Requests Elsewhere}

When documents, or entire trees of documents, are moved from one
server to another, you can use the {\tt Redirect} rule to tell
{\tt httpd} to redirect the request to another server.  If the
client program is smart enough, the user won't even notice that the
document is retrieved from a different server.

\begin{DL}{allow this much space}
\item[ {\tt Redirect } {\it template  result\/}
] Document matching {\it template\/} is redirected to {\it result\/},
     which must be a {\bf full URL} (i.e. containing
     {\tt http:} and the host name).
\end{DL}

\subsubsection{Example}
\begin{verbatim}
        Redirect  /hypertext/WWW/*  http://www.cern.ch/WebDocs/*
\end{verbatim}

This redirects everything starting with {\tt /hypertext/WWW} to
host {\tt www.cern.ch} into virtual directory
{\tt /WebDocs}.  For example,
{\tt /hypertext/WWW/TheProject.html} would be redirected to
{\tt http://www.cern.ch/WebDocs/TheProject.html}. \par 


\par 
\subsection{Setting Up User Authentication and Document Protection}

Documents are protected by {\tt Protect} and
{\tt DefProt} rules.  Their syntax is the following:
\begin{DL}{allow this much space}

\item[ {\tt DefProt } {\it template \/} {\it setup-file\/} {\tt \lbrack }{\it uid.gid\/}{\tt \rbrack }
] Any document matching the {\it template\/} is associated with
     protection {\it setup-file.\/}  The documents are not yet taken to
     be protected, but they may become protected by an existing access
     control list file in the same directory as the requested file, or
     by later matching a {\tt Protect} rule.  If that
     {\tt Protect} rule doesn't specify {\it setup-file\/}, the
     one from the latest {\tt DefProt} rule is used.\par 

\item[ {\tt Protect } {\it template\/} {\tt \lbrack }{\it setup-file\/} {\tt \lbrack }{\it uid.gid\/}{\tt \rbrack \rbrack }
] Any document matching {\it template\/} is protected.  The type of
     protection is defined in finer detail in {\it setup-file.\/} \par 

     If {\it setup-file\/} is not specified, the one from the most
     recently matched {\tt DefProt} rule is used.  If none has
     matched, access to the file is forbidden.
\end{DL}

{\it setup-file\/} is always a full pathname for the
protection setup file which specifies
the actual protection parameters. \par 

Setup file can be omitted from {\tt Protect} rule, but it is
obligatory in {\tt DefProt} rule. If setup file is omitted it
is not possible to give the {\it uid.gid\/} part, either. \par 

{\it uid.gid\/} are the Unix user id and group id (either by name or by
number, separated by a dot) to which the server should change when
serving the request. These are only meaningful when the server is
running as {\tt root}. If they are missing they default to
{\it nobody.nogroup\/}.\par 

{\bf Note:} Uid and gid are inherited from
{\tt DefProt} rule to {\tt Protect} rule
{\bf only} when the {\it setup-file\/} is also inherited.
If {\it setup-file\/} is specified for {\tt Protect} rule but
{\it uid.gid\/} is not, they default to {\it nobody.nogroup\/}
regardless of the previous {\tt DefProt} rule. \par 

This is to avoid accidentally running the server under wrong user id
with wrong setup file.  This information should logically go into the
protection setup file, but for safety reasons it cannot be done,
because a non-trustworthy collaboration could specify it to be
{\tt root}.  This way only the main {\tt webmaster} can
control user and group ids. \par 
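
The inheritance rule in the note above can be sketched with three rules.
The templates, setup file paths and the {\tt www.staff} user and group
are hypothetical, and only the protection part of a configuration is
shown. \par 
\begin{verbatim}
        DefProt  /secret/*        /Web/admin/prot.setup   www.staff
        Protect  /secret/hot/*
        Protect  /secret/other/*  /Web/admin/other.setup

        # /secret/hot/...   : setup file and uid.gid are both inherited
        #                     from DefProt (prot.setup, www.staff)
        # /secret/other/... : own setup file and no uid.gid given, so
        #                     the server runs as nobody.nogroup
\end{verbatim}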


\par 
\subsection{Executable Server Scripts}

A document address is mapped into a script call by the {\tt Exec}
rule:
\begin{verbatim}
        Exec template script
\end{verbatim}


In both
{\it template\/} and {\it script\/} there {\bf must be a
{\tt *} wildcard that matches everything starting from the
script filename.} This enables {\tt httpd} to tell
which part is the script name and which part is the extra path information to be
passed to the script.\par 

\subsubsection{Example}
You want to map everything starting with {\tt /your/url/doit}
to execute the script {\tt /usr/etc/www/htbin/doit}.  You do
this by saying:
\begin{verbatim}
        Exec  /your/url/*  /usr/etc/www/htbin/*
\end{verbatim}


Here the asterisk matches the script name {\tt doit} (and everything
that follows it).  Usually people use some fixed keyword at the start
of the pathname in the URL to indicate that the document is actually a
script call.  Often this keyword is {\tt /htbin}.  That is,
usually your {\tt Exec} rule looks like this:
\begin{verbatim}
        Exec  /htbin/*  /usr/etc/www/htbin/*
\end{verbatim}

and all the URLs pointing to the scripts start with
{\tt /htbin}, for example {\tt /htbin/doit} in the
previous example. \par 
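
How the wildcard splits a request into script name and extra path
information can be sketched as follows; the trailing {\tt /extra/path}
is a hypothetical addition to the example above. \par 
\begin{verbatim}
        Exec  /htbin/*  /usr/etc/www/htbin/*

        # Request:          /htbin/doit/extra/path
        # Script executed:  /usr/etc/www/htbin/doit
        # Extra path info:  /extra/path  (passed to the script)
\end{verbatim}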


\par 

\subsubsection{Historical Note (HTBin Rule)}

CERN {\tt httpd} versions 2.13 and 2.14 had a hard-coded
handling of URL pathnames starting with {\tt /htbin} that mapped
them to scripts in a directory specified via the {\tt HTBin}
rule:
\begin{verbatim}
        HTBin /your/htbin/directory
\end{verbatim}

This is still handled automatically by {\tt httpd}, by
translating it to its equivalent {\tt Exec} form:
\begin{verbatim}
        Exec /htbin/*  /your/htbin/directory/*
\end{verbatim}

Always use {\tt Exec} instead; it is more general. \par 


\par 


\section{{}
Suffix Definitions for CERN httpd}

{\tt cern\_httpd} uses suffixes to discover the content-type,
content-encoding and content-language of a file.  Default values are
so extensive that {\tt httpd} knows the usual file types.  The
following configuration directives can be
used to add new suffix bindings and override existing defaults:
\begin{itemize}
\item  {\tt AddType}
	- Filename suffix mappings to MIME Content-Types
\item  {\tt AddEncoding}
	- Filename suffix mappings to MIME Content-Encodings
\item  {\tt AddLanguage}
	- Multilanguage support, suffix mappings to different Content-Languages
\item  {\tt SuffixCaseSense}
	- Set suffix case sensitivity
\end{itemize}


\par 
\subsection{Binding Suffixes to MIME Content-Types}

In addition to mapping rules, the rule file may
define the data types of files with particular suffixes.  CERN
{\tt httpd} has an extensive set of predefined
suffixes, so usually you don't need to specify any. \par 

The syntax is:
\begin{verbatim}
        AddType .suffix representation encoding [quality]
\end{verbatim}


The parameters are as follows:
\begin{DL}{allow this much space}

\item[{\it suffix\/}]
The last part of the filename.  There are two special cases.
{\tt *.*} matches all files which have not been matched by
any explicit suffixes but do contain a dot. {\tt *} by itself
matches any file which does not match any other suffix. \par 

\item[{\it representation\/}]

A MIME Content-Type style description of the representation actually
used in the file.  See the HTTP specification.  This need not be a real MIME
type; it is only used if it matches a type given by a client. \par 

\item[{\it encoding\/}]
A MIME content
transfer encoding type.  These are much more limited in variety than
representations: basically whether the file is ASCII (7bit or 8bit) or
binary. A few other encodings are allowed, possibly extending to
compression. \par 

\item[{\it quality\/}]
Optional. A floating point number between 0.0 and 1.0 which determines
the relative merits of files {\tt xxx.*} which differ in their
suffix only, when a link to {\tt xxx.multi} is being resolved.
Defaults to 1.0. \par 

\end{DL} 

\subsubsection{Examples}
\begin{verbatim}
        AddType .html text/html              8bit     1.0
        AddType .text text/plain             7bit     0.9
        AddType .ps   application/postscript 8bit     1.0
        AddType *.*   application/binary     binary   0.1
        AddType *     text/plain             7bit
\end{verbatim}


\par 

\subsubsection{Historical Note (Suffix Directive)}

{\tt AddType} was previously called {\tt Suffix}. The
old name is still understood, but may be misleading since suffixes are
also used to determine Content-Encoding and language.  Always use
{\tt AddType} instead. \par 


\par 
\subsection{Binding Suffixes to MIME Content-Encodings}

Suffixes are also used to determine the Content-Encoding of a
file ({\tt .Z} suffix for {\tt x-compress}, for
example).  The syntax is:
\begin{verbatim}
        AddEncoding .suffix  encoding
\end{verbatim}


\subsubsection{Example}
\begin{verbatim}
        AddEncoding .Z  x-compress
\end{verbatim}


\par 
\subsection{Multilanguage Support}

Multilanguage support is also built on using suffixes to determine the
language of a document.  A suffix is bound to a language by the
{\tt AddLanguage} rule ({\tt .en} suffix for English,
for example). The syntax is:
\begin{verbatim}
        AddLanguage .suffix  language
\end{verbatim}


\subsubsection{Examples}
\begin{verbatim}
        AddLanguage .en  en
        AddLanguage .uk  en_UK
\end{verbatim}


\par 
\subsection{Suffix Case Sensitivity}

Suffix case sensitivity is by default {\it off.\/}  You can make
suffixes case sensitive with {\tt SuffixCaseSense} directive:
\begin{verbatim}
        SuffixCaseSense On
\end{verbatim}


\par 


\section{{}
Accessory Scripts}

In addition to having a fully configurable CGI script interface to handle form
requests, CERN {\tt httpd} has a few special directives to
handle certain tasks always via CGI scripts:
\begin{itemize}
\item  keyword searches
\item  general {\tt POST}
\item  general {\tt PUT}
\item  general {\tt DELETE}
\end{itemize}


\par 
\subsection{Keyword Search Facility}

The server automatically calls a script
to perform a search
if the {\bf absolute pathname} of the search script is supplied
by a {\tt Search} directive in the configuration file:
\begin{verbatim}
        Search /search/script/pathname
\end{verbatim}


This script is called with the vital information in the following
CGI environment
variables:
\begin{DL}{allow this much space}
\item[ {\tt PATH\_INFO}
] contains the virtual URL of the file from which the query was
     issued. \par 

\item[ {\tt PATH\_TRANSLATED}
] contains the physical filename of the document corresponding
     to the virtual URL in {\tt PATH\_INFO}. \par 

\item[ {\tt QUERY\_STRING}
] contains the (URL-encoded) keywords, which are also available
     decoded as command line parameters, one in each of
     {\tt argv\lbrack 1\rbrack }, {\tt argv\lbrack 2\rbrack }, ... \par 
\end{DL}

Search script must conform to CGI/1.1 rules, that is, it has to start
its output with a MIME header {\bf followed by a blank
line}, after which comes the actual document.  MIME header
{\bf must} contain either a {\tt Location: } field,
or a {\tt Content-Type: } field, typically:
\begin{verbatim}
        Content-Type: text/html
\end{verbatim}

if the document is an HTML document. \par 


\par 
\subsection{General POST Method Handler Script}

{\tt POST} requests are handled by calling the script defined
by {\tt POST-Script} directive:
\begin{verbatim}
        POST-Script  /absolute/path/post-handler
\end{verbatim}

POST handler script is called in the normal CGI manner, and its output must be CGI
compliant. \par 

The POST handler handles only those {\tt POST}
requests that have not already matched
an {\tt Exec} rule (which causes
a specified script to be called). \par 


\par 
\subsection{General PUT Method Handler Script}

{\tt PUT} requests are handled by calling the script defined by
{\tt PUT-Script} configuration directive:
\begin{verbatim}
        PUT-Script  /absolute/path/put-handler
\end{verbatim}

PUT handler script is called in the normal CGI manner, and its output must be CGI
compliant. \par 

By default the {\tt PUT}
method is disabled; you must explicitly enable it in the configuration
file:
\begin{verbatim}
        Enable PUT
\end{verbatim}

This is to enhance security. \par 

{\tt PUT} can
be a very dangerous method because it allows files to be written back to
the server.  It is therefore not possible to use {\tt PUT} unless the access
authorization module is activated.  This means that you have to
have at least a {\tt DefProt}
rule specifying a default protection setup, which in turn defines
the {\tt PutMask} listing the users and
hosts allowed to perform the PUT operation. \par 
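
Putting these requirements together, a minimal sketch of the
configuration lines needed for {\tt PUT} might read as follows.  The
script and setup file paths are hypothetical, and the contents of the
setup file (where {\tt PutMask} is defined) are not shown here. \par 
\begin{verbatim}
        # PUT is disabled by default
        Enable      PUT
        # Script that handles PUT requests
        PUT-Script  /usr/etc/www/htbin/put-handler
        # Default protection setup; this file must define PutMask
        DefProt     /*  /Web/admin/prot.setup
\end{verbatim}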


\par 
\subsection{General DELETE Method Handler Script}

{\tt DELETE} requests are handled by calling the script defined by
{\tt DELETE-Script} configuration directive:
\begin{verbatim}
        DELETE-Script  /absolute/path/delete-handler
\end{verbatim}

DELETE handler script is called in the normal CGI manner, and its output must be CGI
compliant. \par 

By default the {\tt DELETE}
method is disabled; you must explicitly enable it in the configuration
file:
\begin{verbatim}
        Enable DELETE
\end{verbatim}

This is to enhance security. \par 

{\tt DELETE} can
be a very dangerous method because it allows files to be deleted from
the server.  It is therefore not possible to use {\tt DELETE} unless the access
authorization module is activated.  This means that you have to
have at least a {\tt DefProt}
rule specifying a default protection setup, which in turn defines
the {\tt DeleteMask} listing the users and
hosts allowed to perform the DELETE operation. \par 

\par 


\section{{} Directory Browsing}

By default references to directories which don't include a welcome page cause {\tt httpd} to
generate a hypertext view of the directory listing.  There are
numerous configuration directives controlling this feature:
\begin{itemize}
\item  {\tt DirAccess}
	- Enable/Selective/Disable directory listings
\item  {\tt DirReadme}
	- Configure/disable README-feature
\item  Controlling the appearance of directory listings:
	\begin{itemize}
	\item  {\tt DirShowIcons}
		- Show icons in directory listings
	\item  {\tt DirShowDate}
		- show last-modified date
	\item  {\tt DirShowSize}
		- show file sizes
	\item  {\tt DirShowBytes}
		- show byte count for small files
	\item  {\tt DirShowDescription}
		- show descriptions for files
	\item  {\tt DirShowMaxDescrLength}
		- maximum description length
	\item  {\tt DirShowBrackets}
		- use brackets around ALTernative text used instead of an icon
	\item  {\tt DirShowMinLength}
		- minimum width to reserve for filenames
	\item  {\tt DirShowMaxLength}
		- maximum width to reserve for filenames
	\item  {\tt DirShowHidden}
		- show also files starting with a dot (hidden Unix files)
	\item  {\tt DirShowOwner}
		- show owner of the file
	\item  {\tt DirShowGroup}
		- show group of the file
	\item  {\tt DirShowMode}
		- show permissions of the file
	\item  {\tt DirShowCase}
		- do sorting in a case-sensitive manner
	\end{itemize}
\item  Icons:
	\begin{itemize}
	\item  {\tt AddIcon}
		- bind icon URL to a MIME Content-Type or Content-Encoding
	\item  {\tt AddBlankIcon}
		- icon URL used in the heading of the listing to align it
	\item  {\tt AddUnknownIcon}
		- icon URL for unknown file types
	\item  {\tt AddDirIcon}
		- icon URL for directories
	\item  {\tt AddParentIcon}
		- icon URL for parent directory
	\end{itemize}
\end{itemize}


\par 
\subsection{Controlling Directory Browsing}
\begin{DL}{allow this much space}

\item[{\tt DirAccess on}]
	Enable directory browsing in all directories (which are not
	forbidden by rules).
	Synonymous with the {\tt -dy} command line option.
	{\it Default.\/}\par 

\item[{\tt DirAccess off}]
	Disable directory browsing.
	Synonymous with the {\tt -dn} command line option.
	\par 

\item[{\tt DirAccess selective}]
	Enable selective directory browsing - only directories
	containing the file {\tt .www\_browsable} are allowed.
	Synonymous with the {\tt -ds} command line option.
	\par 
\end{DL} \par 


\par 
\subsection{README Feature}
\begin{DL}{allow this much space}
\item[{\tt DirReadme top}]
	For any browsable directory containing a {\tt README}
	file, include the text at the top of the directory listing.
	Synonymous with the {\tt -dt} command line option.
	{\it Default.\/}
	\par 

\item[{\tt DirReadme bottom}]
	Same as previous, but contents of {\tt README} appear
	on the bottom.
	Synonymous with the {\tt -db} command line option.
	\par 

\item[{\tt DirReadme off}]
	Disables the {\tt README} inclusion feature.
	Synonymous with the {\tt -dr} command line option.
	\par 
\end{DL} \par 
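
As a sketch, the following combination limits listings to directories
that opt in and moves the {\tt README} text below the listing.  Both
values are taken from the descriptions above; the combination itself is
only an illustration. \par 
\begin{verbatim}
        # Only directories containing a .www_browsable file are listed
        DirAccess  selective
        # Show README contents after the listing instead of before it
        DirReadme  bottom
\end{verbatim}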


\par 
\subsection{Controlling The Look of Directory Listings}

The following {\tt On/Off} directives control how the directory
listings look.  The default is to show icons, use brackets around
ALTernative text, show last-modified date, size and description, allow
the filename field width to vary between 15-22 characters, and reserve
25 characters for the description. \par 

\begin{DL}{allow this much space}

\item[ {\tt DirShowIcons}
] Generate inlined image calls in front of each line.  Icons
     visualize the content-type of the file, and they are defined by
     {\tt AddIcon} configuration
     directive. {\em Default.\/} \par 

\item[ {\tt DirShowDate}
] Show last modification date. {\em Default.\/} \par 

\item[ {\tt DirShowSize}
] Show the size of files. {\em Default.\/} \par 

\item[ {\tt DirShowBytes}
] By default files smaller than 1K are shown as just 1K.  Setting
     this directive to {\tt On} will cause the exact byte count
     to appear. \par  

\item[ {\tt DirShowDescription}
] Show description if available. {\em Default.\/} \par 

     At the time of release of 2.17 there was no consensus about
     where the descriptions come from, and the mechanism is currently
     undocumented.  For HTML files the description is the TITLE element;
     for other files the description field is left empty. \par 

\item[ {\tt DirShowMaxDescrLength}
] The maximum number of characters to show in the description field. \par 

\item[ {\tt DirShowBrackets}
] Use brackets around ALTernative text used by browsers not capable
     of displaying images. {\em Default.\/} \par 

\item[ {\tt DirShowHidden}
] Show hidden Unix files (the ones starting with a dot). \par 

\item[ {\tt DirShowOwner}
] Show the owner of the file. \par 

\item[ {\tt DirShowGroup}
] Show the group of the file. \par 

\item[ {\tt DirShowMode}
] Show the permissions of files. \par 

\item[ {\tt DirShowCase}
] Sort entries in a case-sensitive manner, i.e. all capital letters
     before lower-case letters. \par 
\end{DL}
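
For instance, a plain listing that shows exact byte counts and file
owners but no icons might be configured as below.  This is a sketch
only; the particular choices are assumptions made for illustration. \par 
\begin{verbatim}
        DirShowIcons     Off
        DirShowBrackets  Off
        DirShowBytes     On
        DirShowOwner     On
\end{verbatim}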

\par 
\subsection{Filename Length}

There is a minimum and maximum width for the filename field.  Entries
longer than the maximum value will be truncated.  Default values are
15 and 25, and they can be changed with these directives:
\begin{DL}{allow this much space}
\item[ {\tt DirShowMinLength } {\it num\/}
] At least this amount of characters is always reserved for
     filenames.  If the longest filename in the directory is longer
     than {\it num\/} the field will be extended, but no more than the
     maximum limit (see next directive).\par 

\item[ {\tt DirShowMaxLength } {\it num\/}
] Filenames longer than {\it num\/} will be truncated to fit in length. \par 
\end{DL}

\subsubsection{Example}

The default values would be set by saying:
\begin{verbatim}
        DirShowMinLength  15
        DirShowMaxLength  25
\end{verbatim}


\par 


\section{ {}
Icons In The Directory Listings}

{\tt cern\_httpd} directory icons
are used, if enabled, for
both regular directory listings and FTP listings (when running as a
proxy). \par 
\begin{itemize}
\item  {\tt AddIcon}
	- bind icon URL to a MIME Content-Type or Content-Encoding
\item  {\tt AddBlankIcon}
	- icon URL used in the heading of the listing to align it
\item  {\tt AddUnknownIcon}
	- icon URL for unknown file types
\item  {\tt AddDirIcon}
	- icon URL for directories
\item  {\tt AddParentIcon}
	- icon URL for parent directory
\end{itemize}
These directives are specified in the configuration file. \par 


\par 
\subsection{AddIcon Directive}

The {\tt AddIcon} directive binds an icon to a MIME
Content-Type or Content-Encoding:
\begin{verbatim}
        AddIcon  icon-url  ALT-text  template
\end{verbatim}

\begin{DL}{allow this much space}
\item[ {\it icon-url\/}
] is the URL of the icon. \par 

\item[ {\it ALT-text\/}
] is the alternative text to use on character terminal browsers. \par 

\item[ {\it template\/}
] is either a Content-Type template or a Content-Encoding template.
     Content-Type template must always contain a slash, whereas
     Content-Encoding template never has it. \par 
\end{DL}

The following important remarks serve also as examples. \par 


\subsubsection{{}
CERN httpd as a Normal HTTP Server}

Understand that the {\it icon-url\/} is a virtual URL - one that will
be translated through the rules.  Therefore you must make sure that
your configuration rules allow the icon URLs to be passed, e.g.:
\begin{verbatim}
    AddIcon  /icons/UNKNOWN.gif  ???  */*
    AddIcon  /icons/TEXT.gif     TXT  text/*
    AddIcon  /icons/IMAGE.gif    IMG  image/*
    AddIcon  /icons/SOUND.gif    AU   audio/*
    AddIcon  /icons/MOVIE.gif    MOV  video/*
    AddIcon  /icons/PS.gif       PS   application/postscript
    Pass /icons/*  /absolute/icon/dir/*
    ...other rules...
\end{verbatim}


\subsubsection{{}
CERN httpd as a Proxy}

When using {\tt httpd} as a proxy the icon URL {\bf must
be} an absolute URL pointing to your server; otherwise clients
would translate it relative to the remote host. \par 

{\bf Furthermore,} you must have a mapping from this
absolute URL to your local file system, e.g.:
\begin{verbatim}
    AddIcon  http://your.server/icons/UNKNOWN.gif  ???  */*
    AddIcon  http://your.server/icons/TEXT.gif     TXT  text/*
    AddIcon  http://your.server/icons/IMAGE.gif    IMG  image/*
    AddIcon  http://your.server/icons/SOUND.gif    AU   audio/*
    AddIcon  http://your.server/icons/MOVIE.gif    MOV  video/*
    AddIcon  http://your.server/icons/PS.gif       PS   application/postscript

    Pass http://your.server/icons/*  /absolute/icon/dir/*
    Pass /icons/*                    /absolute/icon/dir/*
    Pass http:*
    Pass ftp:*
    Pass gopher:*
\end{verbatim}


{}
Both the full and partial icon URLs are {\tt Pass}'ed because
smart clients may be configured to connect to local
servers directly, instead of through the proxy, and in that case the proxy
server (which is then just a normal HTTP server from client's point
of view) will be requested for {\tt /icons/...} instead of
{\tt http://your.server/icons/...}.  The proxy server has no way
of knowing which will happen. \par 


\par 
\subsection{Icons in Gopher Listings}
There are special internal (to {\tt httpd}) MIME content types
that can be bound to icons for gopher listings (the names should be
self-explanatory):
\begin{itemize}
\item  {\tt application/x-gopher-index}
\item  {\tt application/x-gopher-cso}
\item  {\tt application/x-gopher-telnet}
\item  {\tt application/x-gopher-tn3270}
\item  {\tt application/x-gopher-duplicate}
\end{itemize}
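
These types are bound with {\tt AddIcon} like any other content type.
Gopher listings are generated when running as a proxy, so the sketch
below uses absolute icon URLs.  The icon file names and ALT texts are
made up for illustration; only the content types come from the list
above. \par 
\begin{verbatim}
    AddIcon  http://your.server/icons/INDEX.gif   IDX   application/x-gopher-index
    AddIcon  http://your.server/icons/CSO.gif     CSO   application/x-gopher-cso
    AddIcon  http://your.server/icons/TELNET.gif  TEL   application/x-gopher-telnet
    AddIcon  http://your.server/icons/TN3270.gif  3270  application/x-gopher-tn3270
    AddIcon  http://your.server/icons/DUP.gif     DUP   application/x-gopher-duplicate
\end{verbatim}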


\par 
\subsection{Special Icons}

{\tt httpd} needs some special icons:
\begin{DL}{allow this much space}
\item[ {\tt AddBlankIcon}
] Icon URL used in the heading of the listing to align it.
     This is typically a blank icon, but may contain some nice image
     that you wish to have on top of all your listings.  The only criterion
     is that it must be the same size as the other icons. \par 

\item[ {\tt AddUnknownIcon}
] Icon URL used for unknown file types, i.e. files for which no
     other icon binding applies.  If you have an exhaustive set of
     {\tt AddIcon} directives this need not be used. \par 

\item[ {\tt AddDirIcon}
] Icon URL for directories. \par 

\item[ {\tt AddParentIcon}
] Icon URL for parent directory. \par 
\end{DL}


\subsubsection{Example For a Regular HTTP Server}
{}
Remember to {\tt Pass} the icon URLs! \par 
\begin{verbatim}
        AddBlankIcon    /icons/BLANK.gif
        AddUnknownIcon  /icons/UNKNOWN.gif  ???
        AddDirIcon      /icons/DIR.gif      DIR
        AddParentIcon   /icons/PARENT.gif   UP

        Pass  /icons/*  /absolute/icon/dir/*
        ...other rules...
\end{verbatim}


\subsubsection{Example For a Proxy Server}
{}
Icon URLs {\bf must be absolute URLs}, and you must have
a mapping from the absolute form to local form, and remember to
{\tt Pass} them:
\begin{verbatim}
        AddBlankIcon    http://your.server/icons/BLANK.gif
        AddUnknownIcon  http://your.server/icons/UNKNOWN.gif  ???
        AddDirIcon      http://your.server/icons/DIR.gif      DIR
        AddParentIcon   http://your.server/icons/PARENT.gif   UP

        Pass http://your.server/icons/*  /absolute/icon/dir/*
        Pass /icons/*                    /absolute/icon/dir/*
        Pass  http:*
        Pass  ftp:*
        Pass  gopher:*
\end{verbatim}


\par 


\section{{}
Logging Control In CERN httpd}

{\tt cern\_httpd} logs all the incoming requests to an access
log file.  It also has an error log where internal server errors are
logged.
\begin{itemize}
\item  {\tt AccessLog}
	- Set access log file name
\item  {\tt ErrorLog}
	- Set error log file name
\item  {\tt LogFormat}
	- Set access log file format
\item  {\tt LogTime}
	- Set time zone for log files
\item  {\tt NoLog}
	- No log entries for listed hosts/domains
\item  {\tt CacheAccessLog}
	- Log cache accesses to a different log file
\end{itemize}


\par 
\subsection{Access Log File}

The access log file contains a log of all requests.  The name of the
log file is specified either by the {\tt -l }{\it logfile\/} command
line option or with the {\tt AccessLog} directive:
\begin{verbatim}
        AccessLog /absolute/path/logfile
\end{verbatim}


\par 
\subsection{Error Log File}

The error log records errors, which is useful when
working out why something doesn't work.  The error log file name is set by
the {\tt ErrorLog} directive:
\begin{verbatim}
        ErrorLog /absolute/path/errorlog
\end{verbatim}

If error log file is not specified, it defaults to access log file
name with {\tt .error} extension. If the filename extension
already exists, {\tt .error} will replace it. \par 
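
To illustrate the defaulting rule, consider two alternative settings with
no {\tt ErrorLog} directive given.  The file names are examples only, and
the resulting error log names in the comments are simply the rule above
applied by hand. \par 
\begin{verbatim}
        # Case 1: access log name has no extension
        AccessLog  /where/ever/httpd-log
        #   error log defaults to /where/ever/httpd-log.error

        # Case 2 (alternative): access log name already has an extension
        AccessLog  /where/ever/access.log
        #   error log defaults to /where/ever/access.error
\end{verbatim}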


\par 
\subsection{Log File Format}

Previously every server used to have its own logfile format which made
it difficult to write general statistics collectors.  Therefore there
is now a {\em common logfile format\/} (which will eventually become
the default).  Currently it is enabled by
\begin{verbatim}
        LogFormat  Common
\end{verbatim}

The old CERN {\tt httpd} format can be used by
\begin{verbatim}
        LogFormat  Old
\end{verbatim}


\par 
\subsection{Log Time Format}

Times in the log file are by default in local time.  This can be changed
to GMT with the {\tt LogTime} directive:
\begin{verbatim}
        LogTime  GMT
\end{verbatim}

Default is:
\begin{verbatim}
        LogTime  LocalTime
\end{verbatim}


\par 
\subsection{Suppressing Log Entries For Certain Hosts/Domains}

It's not always necessary to log accesses made
by local hosts.  The {\tt NoLog} directive can be used to
prevent log entries from being made for hosts matching a given IP number or
host name template:
\begin{verbatim}
        NoLog  template
\end{verbatim}

\subsubsection{Examples}
\begin{verbatim}
        NoLog 128.141.*.*
        NoLog *.cern.ch
        NoLog *.ch  *.fr  *.it
\end{verbatim}


\par 


\section{{}
Timeout Settings}

Something may go wrong with the connection to the client, causing
{\tt httpd} to hang indefinitely doing nothing.  This can be
avoided by setting timeouts on the different tasks that the server
performs.  All of these timeouts have reasonable default values
and they don't usually need to be changed. \par 

All the times for these directives are of form:
\begin{verbatim}
        45 secs
        10 mins
        2 mins 30 secs
        1 hour
\end{verbatim}


\par 
\subsection{InputTimeOut}

The {\tt InputTimeOut} directive specifies the time to wait for
the client to send the request (the MIME-header part of it, not the
message body). Default value is:
\begin{verbatim}
        InputTimeOut  2 mins
\end{verbatim}


\par 
\subsection{OutputTimeOut}

The {\tt OutputTimeOut} directive specifies the time to allow for
sending the response. Default value is:
\begin{verbatim}
        OutputTimeOut  20 mins
\end{verbatim}

If you are serving huge files for clients behind slow connections you
may want to increase this value if you hear of connections being cut
in the middle of transfer. \par 


\par 
\subsection{ScriptTimeOut}

The {\tt ScriptTimeOut} directive specifies the time to allow for
server scripts to finish.  If a script doesn't return in the time
specified {\tt httpd} will send {\tt TERM} and
{\tt KILL} signals to it (with 5 seconds in between to let
scripts do cleanup upon exit).
Default value is:
\begin{verbatim}
        ScriptTimeOut  5 mins
\end{verbatim}
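
Putting the three directives together, a server that delivers large files
over slow links but runs only short scripts might use something like the
following.  The values are illustrative assumptions, not recommendations;
they only show the time syntax in use. \par 
\begin{verbatim}
        InputTimeOut   2 mins
        OutputTimeOut  1 hour
        ScriptTimeOut  2 mins 30 secs
\end{verbatim}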


\par 


\section{{}
Proxy Caching}

When {\tt cern\_httpd} is run as a
proxy it can cache
the documents retrieved from remote hosts to make further requests
faster. \par 

\begin{itemize}
\item  {\tt Caching}
	- Turn caching on

\item  {\tt CacheRoot}
	- Set cache root directory for a proxy server

\item  {\tt CacheSize}
	- Specify cache size (in megabytes)

\item  {\tt NoCaching}
	- No caching for URLs matching a given mask

\item  {\tt CacheOnly}
	- Cache only if URL matches a given set of URLs

\item  {\tt CacheClean}
	- Remove everything older than this (in days)

\item  {\tt CacheUnused}
	- Remove if has been unused this long (in days)

\item  {\tt CacheDefaultExpiry}
	- Default expiry time if not given by remote server (in days)

\item  {\tt CacheLastModifiedFactor}
	- Factor used in approximating expiry date

\item  {\tt CacheTimeMargin}
	- Time accuracy between hosts

\item  {\tt CacheNoConnect}
	- Standalone cache mode - no external document retrievals

\item  {\tt CacheExpiryCheck}
	- Turn off expiry checking for standalone operation

\item  {\tt Gc}
	- Enable and disable garbage collection

\item  {\tt GcDailyGc}
	- Time for daily garbage collection

\item  {\tt GcTimeInterval}
	- Interval to do cache garbage collection (in hours)

\item  {\tt GcReqInterval}
	- Number of requests between garbage collections

\item  {\tt GcMemUsage}
	- Garbage collector memory usage directive

\item  {\tt CacheLimit\_1}
	- First cache file size limit (kilobytes)

\item  {\tt CacheLimit\_2}
	- Second cache file size limit (kilobytes)

\item  {\tt CacheLockTimeOut}
	- Break cache locks after this timeout

\item  {\tt CacheAccessLog}
	- Log cache accesses to a different log file
\end{itemize}


\par 
\subsection{Turning Caching On and Off}

Caching is normally turned on implicitly by specifying the
Cache Root Directory, but it can be
explicitly turned on and off with the {\tt Caching} directive:
\begin{verbatim}
        Caching On
\end{verbatim}


\par 

\subsection{Setting Cache Directory}

Caching is enabled on a server running as a gateway (proxy) by
{\tt CacheRoot} directive, which is used to set the absolute
path of the cache directory:
\begin{verbatim}
        CacheRoot /absolute/cache/directory
\end{verbatim}

\par 


\subsection{Cache Size}

The {\tt CacheSize} directive sets the maximum cache size in
megabytes.  The default value is 5MB, but it is preferable to have a
considerably larger cache, such as 50-100MB, to get the best results.  The
cache may, however, temporarily grow a few megabytes bigger than specified.

\subsubsection{Example}
\begin{verbatim}
        CacheSize 20 M
\end{verbatim}

sets cache size to 20 megabytes. \par 
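
A sketch of a basic cache setup combining the directives described so
far.  The directory is a placeholder and the size is simply a value from
the range suggested above. \par 
\begin{verbatim}
        Caching    On
        CacheRoot  /your/cache/root/dir
        CacheSize  100 M
\end{verbatim}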


\par 
\subsection{NoCaching}

URLs matching a template given by {\tt NoCaching} directive
will never be cached, e.g.:
\begin{verbatim}
        NoCaching http://really.useless.site/*
\end{verbatim}

From version 3.0 on templates can have any number of wildcard characters
{\tt *}. \par 
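
As a sketch of a template with more than one wildcard (version 3.0 or
later), the following would keep script output from any HTTP server out
of the cache.  The {\tt /cgi-bin/} path is only a common convention
assumed here; remote servers may keep their scripts elsewhere. \par 
\begin{verbatim}
        NoCaching  http://*/cgi-bin/*
\end{verbatim}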


\par 
\subsection{CacheOnly}

Only the URLs matching templates given by {\tt CacheOnly}
directives will be cached, e.g.:
\begin{verbatim}
        CacheOnly http://really.important.site/*
\end{verbatim}

From version 3.0 on templates can have any number of wildcard characters
{\tt *}. \par 


\par 
\subsection{Maximum Time to Keep Cache Files}

All cached documents matching a specified template and that are older
than specified by {\tt CacheClean} directive will be removed.
This value overrides expiry date in that no file can be stored longer
than this value specifies, regardless of expiry date.

\subsubsection{Examples}
\begin{verbatim}
        CacheClean http:*     1 month
        CacheClean ftp:*     14 days
        CacheClean gopher:*   5 days 12 hours
\end{verbatim}


\par 
\subsection{Maximum Time to Keep Unused Files}

Cache files matching a template and having been unused longer than
specified by {\tt CacheUnused} directive will be removed.

\subsubsection{Examples}
\begin{verbatim}
        CacheUnused *                      4 days 12 hours
        CacheUnused http://info.cern.ch/*  7 days
        CacheUnused ftp://some.server/*   14 days
\end{verbatim}

Note that the last matching specification will have precedence;
therefore HTTP files from {\tt info.cern.ch} will be kept
7 days, and {\bf not} 4.5 days. \par 


\par 
\subsection{Default Expiry Time}

Files for which the server gave neither {\tt Expires:} nor
{\tt Last-Modified:} header will be kept at most the time
specified by {\tt CacheDefaultExpiry} directive.
Default values are zero for HTTP (script replies shouldn't be cached),
and 1 day for FTP and Gopher. \par 

\subsubsection{Example}
\begin{verbatim}
        CacheDefaultExpiry ftp:*     1 month
        CacheDefaultExpiry gopher:*  10 days
\end{verbatim}


{} Setting a default expiry for HTTP will
almost always cause problems because there are currently many scripts
that don't give an expiry date, yet their output expires immediately.
Therefore, it is better to keep the default value for
{\tt http:} at zero. \par 


\par 
\subsection{CacheLastModifiedFactor}

Currently HTTP servers usually give only the
{\tt Last-Modified} time, but not the {\tt Expires} time.
{\tt Last-Modified} can often be successfully used to
approximate the expiry date.  {\tt CacheLastModifiedFactor} gives
the fraction of the time since last modification for which a document
is assumed to remain up-to-date. \par 

Default value is {\tt 0.1}, which means that e.g. file modified
20 days ago will expire in 2 days. \par 
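
Written out, with $f$ the factor, $t_{\rm now}$ the time the document is
retrieved and $t_{\rm mod}$ its {\tt Last-Modified} time (symbols
introduced here for illustration only), the approximation described above is
\[
  t_{\rm expires} \approx t_{\rm now} + f \, (t_{\rm now} - t_{\rm mod}),
\]
so for the default $f = 0.1$ and a file modified 20 days ago the remaining
lifetime is $0.1 \times 20 = 2$ days. \par 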


\subsubsection{Examples}
\begin{verbatim}
        CacheLastModifiedFactor  0.2
\end{verbatim}

would cause files modified 5 months ago to expire after one month. \par 

This feature can be turned off by specifying:
\begin{verbatim}
        CacheLastModifiedFactor  Off
\end{verbatim}


\par 
\subsection{CacheTimeMargin}

Sometimes inaccurate times on other hosts cause confusion in caching.
It often also makes sense not to cache documents that will expire in
a couple of minutes anyway.  {\tt CacheTimeMargin} defines this
time margin, by default:
\begin{verbatim}
        CacheTimeMargin  2 mins
\end{verbatim}

No document expiring in less than two minutes will be written to disk.
\par 


\par 
\subsection{CacheNoConnect}

This directive puts the proxy into standalone cache mode, i.e. only
documents found in the cache are returned, and requests for documents not
in the cache return an error instead of connecting to the outside world.
This is useful for demos and in other cases where there is no network
connection:
\begin{verbatim}
        CacheNoConnect On
\end{verbatim}

Default setting is naturally {\tt Off}. \par 

This directive is typically used with expiry checking also turned
{\tt Off}. \par 


\par 
\subsection{CacheExpiryCheck}

If (for demo-reasons etc) it's desired that the proxy always returns
documents from the cache, even if they have expired,
{\tt CacheExpiryCheck} can be turned off:
\begin{verbatim}
        CacheExpiryCheck  Off
\end{verbatim}

Default setting is {\tt On}, meaning that proxy never returns an
expired document. \par 

This is usually used in standalone cache
mode ({\tt CacheNoConnect} directive turned
{\tt On}). \par 
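
A sketch of a standalone demo setup combining the two directives.  It
assumes the cache has already been filled while the network was still
available. \par 
\begin{verbatim}
        # Never connect to remote servers; serve from the cache only
        CacheNoConnect    On
        # Return cached documents even if they have expired
        CacheExpiryCheck  Off
\end{verbatim}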


\par 
\subsection{Garbage Collection}

When caching is enabled garbage collection is also activated by
default.  This can be explicitly turned off with {\tt Gc}
directive:
\begin{verbatim}
        Gc  Off
\end{verbatim}


\par 
\subsection{When to Do Garbage Collection}

Garbage collection is launched right away when the cache size limit is
reached.  However, to keep the cache smaller it might be desirable to
remove expired files even if there is still cache space remaining.
It is possible to launch garbage collection at a certain time,
usually outside the busy hours:
\begin{verbatim}
        GcDailyGc      time
\end{verbatim}
 \par 

{\tt GcDailyGc} specifies the time to do daily garbage
collection, normally during the night.  Default value is 3:00.
Daily garbage collection can be disabled by specifying
{\tt Off}. \par 

\subsubsection{Example}
Default value would be specified as:
\begin{verbatim}
        GcDailyGc       3:00
\end{verbatim}

Another example: turning daily gc off:
\begin{verbatim}
        GcDailyGc       Off
\end{verbatim}


\par 
\subsection{Memory Usage of Garbage Collector}

The garbage collector performs its job best if it can read information
about the whole cache into memory at once.  This is not possible if
the machine doesn't have enough main memory. \par 

The {\tt GcMemUsage} directive advises the garbage collector about how
much memory to use.  You may think of this as the number of kilobytes
to use for gc data, but actual usage may vary greatly depending on dynamic
factors, like the directory structure of the cached files. \par 

Default is 500; if gc fails because memory runs out make this smaller.
If your machine has so much memory that it just can't run out, make
this very big. \par 

\subsubsection{Example}
\begin{verbatim}
        GcMemUsage 100
\end{verbatim}

if you have very little memory. \par 

\par 


\subsection{Cache File Sizes}

There are two limits controlling the size factor of a file when its
value is being calculated.  {\tt CacheLimit\_1} sets the lower
limit; under this all the files have equal size factor.
{\tt CacheLimit\_2} sets up higher limit; files bigger than this
get extremely bad size factor (meaning they get removed right away
because they are too big). \par 

Sizes are specified in kilobytes, and the default values are 200K and
4MB, respectively.

\subsubsection{Examples}
\begin{verbatim}
        CacheLimit_1 200 K
        CacheLimit_2 4000 K
\end{verbatim}

would set the same values as the defaults, 200K and 4MB. \par 

\par 


\subsection{Cache Lock Timeout}

During retrieval cache files are locked.  If something goes wrong a
lock file may be left hanging.  {\tt CacheLockTimeOut}
directive sets the amount of time after which lock can be broken.
Time is specified like all the other times in the configuration file, and
default value is 20 minutes, the same as default {\tt OutputTimeOut}.
{\bf CacheLockTimeOut should never be less than
OutputTimeOut!}

\subsubsection{Example}
\begin{verbatim}
        CacheLockTimeOut  30 mins
\end{verbatim}

would set lock timeout to half an hour. \par 
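
Since the lock timeout should never be shorter than the output timeout,
the two are best raised together.  A sketch with illustrative values: \par 
\begin{verbatim}
        OutputTimeOut     1 hour
        CacheLockTimeOut  1 hour
\end{verbatim}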


\par 
\subsection{CacheAccessLog}

Cache accesses can be logged to a different log file instead of the
normal access log.  The
{\tt CacheAccessLog} directive takes an absolute pathname of
the cache access log file:
\begin{verbatim}
        CacheAccessLog  /absolute/path/file.log
\end{verbatim}


\par 


\section{{}
Configuring Proxy To Connect To Another Proxy}

If an (inner) {\tt cern\_httpd} proxy needs to connect to the outside world via
another (outer) proxy server, you can use the same environment
variables that are used to redirect clients to a proxy to make the inner
proxy use the outer one:
\begin{itemize}
\item  {\tt http\_proxy}
\item  {\tt ftp\_proxy}
\item  {\tt gopher\_proxy}
\item  {\tt wais\_proxy}
\end{itemize}
E.g. your (inner) proxy server's startup script could look like this:
\begin{verbatim}
        #!/bin/sh
        http_proxy=http://outer.proxy.server:8082/
        export http_proxy
        /usr/etc/httpd -r /etc/inner-proxy.conf -p 8081
\end{verbatim}

This is a little ugly, so there are also the following directives in
the configuration file:
\begin{itemize}
\item  {\tt http\_proxy }  {\it http://outer.proxy.server/\/}
\item  {\tt ftp\_proxy }  {\it http://outer.proxy.server/\/}
\item  {\tt gopher\_proxy }  {\it http://outer.proxy.server/\/}
\item  {\tt wais\_proxy }  {\it http://outer.proxy.server/\/}
\end{itemize}


\par 

\subsection{no\_proxy}

In the same way that clients can specify a set of domains for which
the proxy should not be consulted, {\tt httpd} has a
{\tt no\_proxy} configuration directive to tell it that it
should not connect to another proxy for certain URLs:
\begin{verbatim}
        no_proxy  cern.ch,ncsa.uiuc.edu,some.host:8080
\end{verbatim}

{}
The argument string is a comma-separated list and should {\bf not contain
spaces!} \par 
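
Combined, the configuration file of an inner proxy might contain
something like the following sketch.  The outer proxy's host name and
port are the placeholders used in the startup script above, and the
local domain is an example only. \par 
\begin{verbatim}
        http_proxy    http://outer.proxy.server:8082/
        ftp_proxy     http://outer.proxy.server:8082/
        gopher_proxy  http://outer.proxy.server:8082/
        # Contact hosts in the local domain directly
        no_proxy      cern.ch
\end{verbatim}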


\par 


\chapter{{}
Configuration File Examples}

\begin{DL}{allow this much space}

\item[ {\tt httpd.conf}
] sample configuration file for running as a normal HTTP server.

\item[ {\tt prot.conf}
] sample configuration file for running as a normal HTTP server
     with access control.

\item[ {\tt proxy.conf}
] sample configuration file for running as a
     proxy
     {\bf without caching.}

\item[ {\tt caching.conf}
] sample configuration file for running as a
     proxy
     {\bf with caching.}

\end{DL}

\par 
\par 


\section{Normal HTTP Server Configuration}
\begin{verbatim}
#
#	Sample configuration file for cern_httpd for running it
#	as a normal HTTP server.
#
# See:
#	<http://info.cern.ch/hypertext/WWW/Daemon/User/Config/Overview.html>
#
# for more information.
#
# Written by:
#	Ari Luotonen  April 1994  <luotonen@dxcern.cern.ch>
#

#
#	Set this to point to the directory where you unpacked this
#	distribution, or wherever you want httpd to have its "home"
#
ServerRoot	/where/ever/server_root

#
#	The default port for HTTP is 80; if you are not root you have
#	to use a port above 1024; good defaults are 8000, 8001, 8080
#
Port	80

#
#	General setup; on some systems, like HP, nobody is defined so
#	that setuid() fails; in those cases use a different user id.
#
UserId	nobody
GroupId	nogroup

#
#	Logging; if you want logging uncomment these lines and specify
#	locations for your access and error logs
#
# AccessLog	/where/ever/httpd-log
# ErrorLog	/where/ever/httpd-errors
LogFormat	Common
LogTime		LocalTime

#
#	User-supported directories under ~/public_html
#
UserDir	public_html

#
#	Scripts; URLs starting with /cgi-bin/ will be understood as
#	script calls in the directory /your/script/directory
#
Exec	/cgi-bin/*	/your/script/directory/*

#
#	URL translation rules; If your documents are under /local/Web
#	then this single rule does the job:
#
Pass	/*	/local/Web/*

\end{verbatim}


\section{Normal HTTP Server With Access Control}
\begin{verbatim}
#
#	Sample configuration file for cern_httpd for running it
#	as a normal HTTP server WITH access control.
#
# See:
#	<http://info.cern.ch/hypertext/WWW/Daemon/User/Config/Overview.html>
#
# for more information.
#
# Written by:
#	Ari Luotonen  April 1994  <luotonen@dxcern.cern.ch>
#

#
#	Set this to point to the directory where you unpacked this
#	distribution, or wherever you want httpd to have its "home"
#
ServerRoot	/where/ever/server_root

#
#	The default port for HTTP is 80; if you are not root you have
#	to use a port above 1024; good defaults are 8000, 8001, 8080
#
Port	80

#
#	General setup; on some systems, like HP, nobody is defined so
#	that setuid() fails; in those cases use a different user id.
#
UserId	nobody
GroupId	nogroup

#
#	Logging; if you want logging uncomment these lines and specify
#	locations for your access and error logs
#
# AccessLog	/where/ever/httpd-log
# ErrorLog	/where/ever/httpd-errors
LogFormat	Common
LogTime		LocalTime

#
#	User-supported directories under ~/public_html
#
UserDir	public_html

#
#	Protection setup by usernames; specify groups in the group
#	file [if you need groups]; create and maintain password file
#	with the htadm program
#
Protection PROT-SETUP-USERS {
	UserId		nobody
	GroupId		nogroup
	ServerId	YourServersFancyName
	AuthType	Basic
	PasswdFile	/where/ever/passwd
	GroupFile	/where/ever/group
	GET-Mask	user, user, group, group, user
}

#
#	Protection setup by hosts; you can use both domain name
#	templates and IP number templates
#
Protection PROT-SETUP-HOSTS {
	UserId		nobody
	GroupId		nogroup
	ServerId	YourServersFancyName
	AuthType	Basic
	PasswdFile	/where/ever/passwd
	GroupFile	/where/ever/group
	GET-Mask	@(*.cern.ch, 128.141.*.*, *.ncsa.uiuc.edu)
}

Protect	/very/secret/URL/*  	PROT-SETUP-USERS
Protect	/another/secret/URL/*	PROT-SETUP-HOSTS

#
#	Scripts; URLs starting with /cgi-bin/ will be understood as
#	script calls in the directory /your/script/directory
#
Exec	/cgi-bin/*	/your/script/directory/*

#
#	URL translation rules; If your documents are under /local/Web
#	then this single rule does the job:
#
Pass	/*	/local/Web/*


\end{verbatim}


\section{Proxy Configuration With Caching}

The configuration {\bf without caching} is otherwise the
same; just leave out all the directives starting with
``{\tt Cache}'' or ``{\tt Gc}''.

\begin{verbatim}
#
#	Sample configuration file for cern_httpd for running it
#	as a proxy server WITH caching.
#
# See:
#	<http://info.cern.ch/hypertext/WWW/Daemon/User/Config/Overview.html>
#
# for more information.
#
# Written by:
#	Ari Luotonen  April 1994  <luotonen@dxcern.cern.ch>
#

#
#	Set this to point to the directory where you unpacked this
#	distribution, or wherever you want httpd to have its "home"
#
ServerRoot	/where/ever/server_root

#
#	Set the port for proxy to listen to
#
Port	8080

#
#	General setup; on some systems, like HP, nobody is defined so
#	that setuid() fails; in those cases use a different user id.
#
UserId	nobody
GroupId	nogroup

#
#	Logging; if you want logging uncomment these lines and specify
#	locations for your access and error logs
#
# AccessLog	/where/ever/proxy-log
# ErrorLog	/where/ever/proxy-errors
LogFormat	Common
LogTime		LocalTime

#
#	Proxy protections; if you want only certain domains to use
#	your proxy, uncomment these lines and specify the Mask
#	with hostname templates or IP number templates:
#
# Protection PROXY-PROT {
# 	ServerId	YourProxyName
# 	Mask		@(*.cern.ch, 128.141.*.*, *.ncsa.uiuc.edu)
# }
# Protect  *  PROXY-PROT

#
#	Pass the URLs that this proxy is willing to forward.
#
Pass	http:*
Pass	ftp:*
Pass	gopher:*
Pass	wais:*

#
#	Enable caching, specify cache root directory, and cache size
#	in megabytes
#
Caching		On
CacheRoot	/your/cache/root/dir
CacheSize	5

#
#	Specify absolute maximum for caching time
#
CacheClean	*	2 months

#
#	Specify the maximum time to be unused
#
CacheUnused	http:*		2 weeks
CacheUnused	ftp:*		1 week
CacheUnused	gopher:*	1 week

#
#	Specify default expiry times for ftp and gopher;
#	NEVER specify it for HTTP, otherwise documents generated by
#	scripts get cached which is usually a bad thing.
#
CacheDefaultExpiry	ftp:*		10 days
CacheDefaultExpiry	gopher:*	2 days

#
#	Garbage collection controls; daily garbage collection at 3am;
#
Gc		On
GcDailyGc	3:00


\end{verbatim}
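
The non-caching variant can also be derived mechanically from the sample
above.  The following Bourne shell function is only a sketch: the name
{\tt strip\_cache\_directives} and the file names in the usage comment are
made up for illustration, and it assumes that every directive starts its
own line, as in the sample.
\begin{verbatim}
# Drop the caching directives (Caching, Cache*, Gc*) from a proxy
# configuration read on standard input.  Comment lines start with '#'
# and therefore pass through unchanged.
strip_cache_directives() {
    grep -v -E '^[[:space:]]*(Caching|Cache|Gc)'
}

# Hypothetical usage:
#   strip_cache_directives < proxy-cache.conf > proxy.conf
\end{verbatim}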


\chapter{{}
CERN Server CGI/1.1 Script Support}

Server scripts are used to handle searches,
clickable images and forms, and to
produce synthesized documents on the fly.  See calendar and finger gateway for
examples. \par 

\par 

\section{In This Section...}
\begin{itemize}
\item  Using {\tt Exec} rule to allow scripts
\item  CGI Interface $--$ Script Input
\item  CGI Interface $--$ Script Output
\item  NPH-Scripts $--$ No Parsing of Headers
\item  Setting up a search script
\end{itemize}

\par 

\section{{} Important Note!}

CERN {\tt httpd} versions 2.15 and newer have
{\bf two} script interfaces.  One is the official
CGI,
Common Gateway Interface, which enables scripts to be shared
between different server implementations (NCSA server, Plexus, etc).
The other is the original, very easy-to-use interface that was
introduced in version 2.13. \par 

{\bf Use of CGI instead of the old interface is strongly
encouraged.}\par 

{\bf IMPORTANT:} If you have, or wish to write, scripts
that use the old interface, the script name has to end in the
{\tt .pp} suffix (short for "Pre-Parsed").  URLs referring to
these scripts should not contain this suffix.  This makes it
easier to upgrade to CGI scripts later: you only need to change
the script name in the file system, not the documents pointing to
it.  If you absolutely want to use the old interface (which is nice
for quick hacks that don't need to be portable), see the doc. \par 

\par 

\section{Setting Up httpd To Call Scripts}

The server recognizes a script request by
looking at the beginning of the URL pathname.  You specify these
special prefixes in the configuration file
({\tt /etc/httpd.conf}) with {\tt Exec} rules:
\begin{verbatim}
        Exec /url-prefix/*  /physical-path/*
\end{verbatim}

Where {\it /url-prefix/\/} is the special string that signifies a
script request, and {\it /physical-path/\/} is the absolute filesystem
pathname of the {\bf directory} that contains your scripts.
\par 

\subsection{Example}
\begin{verbatim}
        Exec  /htbin/*  /usr/etc/cgi-bin/*
\end{verbatim}

maps URL paths starting with {\tt /htbin} to
scripts in the directory {\tt /usr/etc/cgi-bin}.  That is,
requesting
\begin{verbatim}
        /htbin/myscript
\end{verbatim}

causes a call to script
\begin{verbatim}
        /usr/etc/cgi-bin/myscript
\end{verbatim}


\subsection{Historical Note}

In {\tt httpd} versions before 2.15 there was an
{\tt HTBin} directive:
\begin{verbatim}
        HTBin  /physical-path
\end{verbatim}

which is now obsolete, but understood by the server to mean
\begin{verbatim}
        Exec  /htbin/*  /physical-path/*
\end{verbatim}

Using the {\tt Exec} rule instead is recommended for its
generality. \par 

\par 


\section{Information Passed to CGI Scripts}

CGI scripts get their input mainly from environment
variables and, when the {\tt POST} method is used, from standard
input.  Search scripts also get their
keywords as command
line arguments. \par 

The most important environment variables are:
\begin{DL}{allow this much space}
\item[{\tt QUERY\_STRING}]
	The query part of URL, that is, everything that follows the
	question mark.  This string is URL-encoded, meaning that
	special characters like spaces and newlines are encoded into
	their hex notation (\%xx), and characters like {\tt + =
	\&} have a special meaning.
	The contents of this variable can be easily parsed using the
	{\tt cgiparse} program. \par 

\item[{\tt PATH\_INFO}]
	Extra path information given after the script name, for
	example with {\tt Exec} rule:
\begin{verbatim}
        Exec  /htbin/*  /usr/etc/cgi-bin/*
\end{verbatim}

        a URL with path
\begin{verbatim}
        /htbin/myscript/extra/pathinfo
\end{verbatim}

        will execute the script {\tt /usr/etc/cgi-bin/myscript}
	with {\tt PATH\_INFO} environment variable set to
	{\tt /extra/pathinfo}. \par 

\item[{\tt PATH\_TRANSLATED}]
	Extra pathinfo translated through the rule system. (This
	doesn't always make sense.) \par 
\end{DL}
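
As an illustration, the Bourne shell script below simply reports these
three variables back to the client.  It is a minimal sketch, not part of
the distribution; the function name {\tt show\_cgi\_env} is an assumption
made for this example.
\begin{verbatim}
#!/bin/sh
# Minimal CGI script: report the request information passed in by
# the server.  Variables the server did not set show as "(not set)".
show_cgi_env() {
    echo "Content-Type: text/plain"
    echo ""
    echo "QUERY_STRING    = ${QUERY_STRING-(not set)}"
    echo "PATH_INFO       = ${PATH_INFO-(not set)}"
    echo "PATH_TRANSLATED = ${PATH_TRANSLATED-(not set)}"
}

show_cgi_env
\end{verbatim}

Installed as {\tt /usr/etc/cgi-bin/myscript} under the {\tt Exec} rule
shown above, a request for {\tt /htbin/myscript/extra/pathinfo} should
report {\tt PATH\_INFO} as {\tt /extra/pathinfo}. \par 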


See also NCSA's
primer to writing CGI scripts. \par 

\par 


\section{Results From Scripts}

Scripts return their results either by writing a document to their
standard output, or by writing the location of the
result document (either a full URL or a local virtual path).


\par 
\subsection{Outputting a Document}

The script's output must begin with a {\tt Content-Type:} line giving
the document content type, followed by {\bf an empty line}.
The actual document follows the empty line.
Example:
\begin{verbatim}
        Content-Type: text/html

        <HEAD>
        <TITLE>Script test</TITLE>
        </HEAD>
        <BODY>
        <H1>My First Virtual Document</H1>
        ....
        </BODY>
\end{verbatim}


\par 
\subsection{Giving Document Location}

If the script wants to return an existing document (local or remote),
it can give a {\tt Location:} header followed by an empty line.
Example:
\begin{verbatim}
        Location: http://info.cern.ch/hypertext/WWW/TheProject.html

\end{verbatim}

This causes the server to send a redirection to the client, which then
retrieves that document.  If {\tt Location} starts with a slash
(that is, it is not a full URL), it is taken to be a virtual path for a document
on the same machine; the server passes this string straight through
the rule system and serves that document as if it had been requested
in the first place.  In this case the client performs no redirection;
the server does it "on the fly". \par 
Example:
\begin{verbatim}
        Location: /hypertext/WWW/TheProject.html
\end{verbatim}

Note that this is a {\bf virtual path}, so after
translations it might be, for example,
{\tt /Public/Web/TheProject.html}. \par 

{\bf Important:} Only {\bf full} URLs in the
{\tt Location} field can contain the {\it \#label\/} part of a URL,
because that part is meant only for the client side, and the server cannot
handle it in any way. \par 
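
A script that answers with a location instead of a document can be very
short.  The sketch below is one possible way to write it in Bourne shell;
the function name {\tt emit\_location} is an assumption made for this
example.
\begin{verbatim}
#!/bin/sh
# Answer with a Location: header instead of a document body.
# $1 is either a full URL (the client gets a redirection) or a
# virtual path starting with a slash (the server serves it itself).
emit_location() {
    echo "Location: $1"
    echo ""
}

emit_location "/hypertext/WWW/TheProject.html"
\end{verbatim}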


\par 
\subsection{NPH-Scripts (No-Parse-Headers)}

A script wishing to output the entire HTTP reply (including the status line
and all response headers) should have a name beginning with the
{\tt nph-} prefix.  This makes {\tt httpd} connect the
script's output stream directly to the requesting client, avoiding the
overhead of the server needlessly parsing the response headers. \par 

\subsubsection{Example Of NPH-Script Output}
\begin{verbatim}
        HTTP/1.0 200 Script results follow
        Server: MyScript/1.0 via CERN/3.0
        Content-Type: text/html

        <HEAD>
        <TITLE>Just testing...</TITLE>
        </HEAD>
        <BODY>
        <H1>Output From NPH-Script</H1>
        Yep, seems to work.
        </BODY>
\end{verbatim}


\par 
\section{Setting Up A Search Script}

There is a special {\tt Search} directive in the configuration
file giving the {\bf absolute} pathname of the script
performing the search:
\begin{verbatim}
        Search /absolute/path/search
\end{verbatim}

Every time a document is searched, this script is called with
\begin{DL}{allow this much space}
\item[Command line]
	containing the search keywords decoded, one in each of
	{\tt argv\lbrack 1\rbrack }, {\tt argv\lbrack 2\rbrack }, ...

\item[{\tt QUERY\_STRING}]
	containing the query string encoded, as it came in the URL
	after the question mark.

\item[{\tt PATH\_INFO}]
	Virtual path of the document that the search was issued from.

\item[{\tt PATH\_TRANSLATED}]
	Absolute filesystem path of the document.
\end{DL}

Search results are output in the usual way:
\begin{verbatim}
        Content-Type: text/html

        ...generated document...
\end{verbatim}
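
The skeleton of such a search script might look as follows in Bourne
shell.  This is a sketch under assumptions: the function name
{\tt search\_main} is invented, and the actual lookup is left out, so the
script only lists the keywords it was given.
\begin{verbatim}
#!/bin/sh
# Search script skeleton: the server passes the decoded keywords as
# command line arguments and the document's virtual path in PATH_INFO.
# A real script would look the keywords up in an index, and should
# escape them before echoing them into HTML.
search_main() {
    echo "Content-Type: text/html"
    echo ""
    echo "<HEAD><TITLE>Search results</TITLE></HEAD>"
    echo "<BODY>"
    echo "<H1>Search in ${PATH_INFO-unknown document}</H1>"
    echo "<UL>"
    for keyword in "$@"; do
        echo "<LI>keyword: $keyword"
    done
    echo "</UL>"
    echo "</BODY>"
}

search_main "$@"
\end{verbatim}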


\par 


\chapter{{}
cgiparse Manual}

{\tt cgiparse} handles {\tt QUERY\_STRING} environment
variable parsing for CGI scripts.  It comes with CERN server
distributions {\bf 2.15} and newer. \par 

If the {\tt QUERY\_STRING} environment variable is not set, it
reads {\tt CONTENT\_LENGTH} characters from its standard input.
\par 

\par 
\section{Command Line Options}

\subsection{Main Options}

\begin{DL}{allow this much space}
\item[ {\tt cgiparse -keywords}]
	Parse {\tt QUERY\_STRING} as search keywords.  Keywords
	are decoded and written to standard output, one per line. \par 

\item[ {\tt cgiparse -form}]
	Parse {\tt QUERY\_STRING} as form request.
	Outputs a string which, when {\tt eval}'ed by Bourne shell,
	will set shell variables beginning with {\tt FORM\_}
	appended with field name.  Field values are the contents of
	the variables. \par 

\item[ {\tt cgiparse -value } {\it fieldname\/}]
	Parse {\tt QUERY\_STRING} as form request.
	Prints only the value of field {\it fieldname\/}. \par 

\item[ {\tt cgiparse -read}]
	Just read {\tt CONTENT\_LENGTH} characters from
	{\tt stdin} and write them to {\tt stdout.} \par 

\item[ {\tt cgiparse -init}]
	If {\tt QUERY\_STRING} is not defined, read
	{\tt stdin} and output a string that, when
	{\tt eval}'ed by Bourne shell, sets
	{\tt QUERY\_STRING} to its correct value.  This can be
	used when the same script is used with both the {\tt GET}
	and {\tt POST} methods.  Typical use at the beginning of a
	Bourne shell script:
\begin{verbatim}
        eval `cgiparse -init`
\end{verbatim}

	After this command the {\tt QUERY\_STRING} environment
	variable will be set regardless of whether {\tt GET} or
	{\tt POST} method was used.  Therefore
	{\tt cgiparse} may be called multiple times in the same
	script (otherwise with {\tt POST} it could only be
	called once because after that the {\tt stdin} would be
	already read, and the next {\tt cgiparse} would hang).
\par 
\end{DL}

\par 

\subsection{Modifier Options}

\begin{DL}{allow this much space}
\item[ {\tt -sep } {\it separator\/}]
	Specify the string used to separate multiple values.  With \begin{itemize}
	\item  {\tt -value} default is newline
	\item  {\tt -form} default is "{\it , \/}"
	\end{itemize} \par 

\item[ {\tt -prefix } {\it prefix\/}]
	\begin{itemize}
	\item  Only with {\tt -form.}
		Specify the prefix to use when making up environment
		variable names.  Default is "{\it FORM\_\/}". \par 
	\end{itemize}

\item[ {\tt -count}]
	With \begin{itemize}
	\item  {\tt -keywords} outputs the number of keywords
	\item  {\tt -form} outputs the number of unique fields
		(multiple values are counted as one)
	\item  {\tt -value } {\it fieldname\/} gives the number of
		values of field {\it fieldname\/} (0 if there is no such
		field, 1 for a single value, 2 for two values, etc).
	\end{itemize} \par 

\item[ {\tt -}{\it number\/} , e.g. {\tt -2}]
	With \begin{itemize}
	\item  {\tt -keywords} gives {\it n\/}'th keyword
	\item  {\tt -form} gives all the values of {\it n\/}'th
		field
	\item  {\tt -value } {\it fieldname\/} gives {\it n\/}'th
		of the multiple values of field {\it fieldname\/}
		(first value is number 1).
	\end{itemize} \par 


\item[ {\tt -quiet}]
	Suppress all error messages.  (Non-zero exit status still
	indicates error.) \par 
\end{DL}

All options have one-character equivalents:
{\tt -k -f -v -r -i -s -p -c -q} \par 

\par 
\section{Exit Statuses}
\begin{itemize}
\item  {\tt 0  }	Success
\item  {\tt 1  }	Illegal command line
\item  {\tt 2  }	Environment variables not set correctly
\item  {\tt 3  }	Failed to get requested information (no such
			field, {\tt QUERY\_STRING} contains
			keywords when form field values requested,
			etc).
\end{itemize}

\par 
\section{Examples}

Note: In real life, of course, {\tt QUERY\_STRING} is already
set by the server. \par 

Here {\tt \$} is the Bourne shell prompt. \par 

\par 
\subsection{Keyword Search}
\begin{verbatim}
    $ QUERY_STRING="is+2%2B2+really+four%3F"
    $ export QUERY_STRING
    $ cgiparse -keywords
    is
    2+2
    really
    four?
    $
\end{verbatim}


\par 
\subsection{Parsing All Form Fields}
\begin{verbatim}
    $ QUERY_STRING="name1=value1&name2=Second+value%3F+That%27s+right%21"
    $ export QUERY_STRING
    $ cgiparse -form

    FORM_name1='value1'; FORM_name2='Second value? That'\''s right!'

    $ eval `cgiparse -form`
    $ set
    ...
    FORM_name1=value1
    FORM_name2=Second value? That's right!
    ...
    $
\end{verbatim}


\par 
\subsection{Extracting Only One Field Value}
\begin{verbatim}
    (QUERY_STRING as in the previous example.)
    $ cgiparse -value name1
    value1
    $ cgiparse -value name2
    Second value? That's right!
    $
\end{verbatim}
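
Inside a CGI script these commands are typically combined.  The following
Bourne shell sketch shows one way to do so; the form field {\tt name} and
the function name {\tt handle\_form} are assumptions made for this
example, and the output of {\tt cgiparse} is quoted before it is
{\tt eval}'ed so that the shell does not split or expand it first.
\begin{verbatim}
#!/bin/sh
# Form handler sketch built on cgiparse.
handle_form() {
    # Make QUERY_STRING available for both GET and POST requests.
    eval "`cgiparse -init`"
    # Set one FORM_<fieldname> shell variable per form field.
    eval "`cgiparse -form`"

    echo "Content-Type: text/plain"
    echo ""
    echo "Hello, ${FORM_name-stranger}!"
}

handle_form
\end{verbatim}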


\par 


\chapter{{}
cgiutils Manual}

The {\tt cgiutils} program is provided to make it easy for
NPH \lbrack No-Parse-Headers\rbrack  scripts to produce a full HTTP1 response header.
It can also be used to just calculate the {\tt Expires:}
header, given the time to live in a human-friendly way, like
\begin{verbatim}
        1 year 3 months 2 weeks 4 days 12 hours 30 mins 15 secs
\end{verbatim}


\section{Command Line Options}

\begin{DL}{allow this much space}

\item[ {\tt cgiutils -version}
] print the version information. \par 

\item[ {\tt -nodate}
] don't produce the {\tt Date:} header. \par 

\item[ {\tt -noel}
] don't print the empty line after headers \lbrack in case you want to
     output other MIME headers yourself after the initial header
     lines\rbrack . \par 

\item[ {\tt -status } {\it nnn\/}
] give full HTTP1 response, instead of just a set of HTTP headers,
     with HTTP status code {\it nnn\/}. \par 

\item[ {\tt -reason } {\it explanation\/}
] specify the reason line for HTTP1 response \lbrack can only be used with
     the {\tt -status } {\it nnn\/} option\rbrack . \par 

\item[ {\tt -ct } {\it type/subtype\/}
] specify the MIME content-type. \par 

\item[ {\tt -ce } {\it encoding\/}
] specify the content-encoding \lbrack e.g. {\tt x-compress},
     {\tt x-gzip}\rbrack . \par 

\item[ {\tt -dl } {\it language-code\/}
] specify the content-language code. \par 

\item[ {\tt -length } {\it nnn\/}
] specify the MIME content-length value. \par 

\item[ {\tt -expires} {\it time-spec\/}
] specify the time to live, like {\tt "2 days 12 hours"},
     and {\tt cgiutils} will compute the {\tt Expires:}
     field value \lbrack which is the actual expiry date and time in GMT and
     in format specified by HTTP spec\rbrack . \par 

\item[ {\tt -expires now}
] means immediate expiry.  Often this is exactly what the scripts
     should output. \par 

\item[ {\tt -uri } {\it URI\/}
] specify the {\it URI\/} for the returned document. \par 

\item[ {\tt -extra } {\it xxx: yyy\/}
] specify an extra header which cannot otherwise be specified for
     {\tt cgiutils}. \par 

\end{DL}

{} Make sure that you quote
the option arguments that are more than one word:
\begin{verbatim}
        cgiutils -expires "2 days 12 hours 30 mins"
\end{verbatim}


\section{Examples}

\begin{verbatim}
        cgiutils -status 200 -reason "Virtual doc follows" -expires now
  ==>
        HTTP/1.0 200 Virtual doc follows
        MIME-Version: 1.0
        Server: CERN/2.17beta
        Date: Tuesday, 05-Apr-94 03:43:46 GMT
        Expires: Tuesday, 05-Apr-94 03:43:46 GMT

\end{verbatim}

{} There is an empty line after
the output to mark the end of the MIME header section; if you don't
want this \lbrack you want to output some more headers yourself\rbrack , specify the
{\tt -noel} (NO-Empty-Line) option. \par 

Note also that {\tt cgiutils} automatically gives the
{\tt Server:} header because it is available in the CGI
environment.  The {\tt Date:} field is also automatically
generated unless the {\tt -nodate} option is specified. \par 

To get only the expires field don't specify the {\tt -status}
option.  If you don't want the empty line after the header line use
also the {\tt -noel} option:
\begin{verbatim}
        cgiutils -noel -expires "2 days"
  ==>
        Expires: Thursday, 07-Apr-94 03:44:02 GMT
\end{verbatim}
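
In an NPH script, {\tt cgiutils} would typically be called first, with
the document body written after it.  The Bourne shell sketch below
assumes a script installed under a name beginning with {\tt nph-}; the
function name {\tt nph\_main} and the document text are invented for
illustration.
\begin{verbatim}
#!/bin/sh
# NPH script sketch: cgiutils writes the status line, the headers and
# the empty line that ends them; the script then writes the document.
nph_main() {
    cgiutils -status 200 -reason "Virtual doc follows" \
             -ct text/html -expires now
    echo "<HEAD><TITLE>Status</TITLE></HEAD>"
    echo "<BODY><H1>Generated at `date`</H1></BODY>"
}

nph_main
\end{verbatim}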


\par 


\chapter{
{}
CERN Server Clickable Image Support}

CERN Server versions 2.14 and newer have a {\tt htimage}
program in the distribution, which is an {\tt /htbin} program
handling clicks on sensitive images.  For versions 2.15 and newer it
is a CGI program (uses the Common
Gateway Interface to communicate with {\tt httpd}).  See demo. \par 


\par 
\section{In This Section...}
\begin{itemize}
\item  {\tt htimage} installation
\item  Writing documents that contain clickable images
\item  Image configuration file
\item  Output of {\tt htimage}
\end{itemize}


\par 
\section{Installing htimage Binary}

After compiling {\tt htimage} you should move the executable
binary to the directory that holds your other server scripts, and
remember to set up an {\tt Exec} rule.  For example, if your scripts are in
{\tt /usr/etc/cgi-bin}, you could have an {\tt Exec}
rule like this:
\begin{verbatim}
        Exec  /htbin/*  /usr/etc/cgi-bin/*
\end{verbatim}

{\tt htimage} is often one of the most frequently used scripts, so
it is convenient to refer to it with as short a name as
possible, like {\tt /img}.  To do so, put a {\tt Map}
rule just before the {\tt Exec}:
\begin{verbatim}
        Map   /img/*    /htbin/htimage/*
        Exec  /htbin/*  /usr/etc/cgi-bin/*
\end{verbatim}


\par 
\section{Writing a Document With Clickable Images}

To create a clickable image in your HTML document, you'll need to:
\begin{itemize}
\item  specify {\tt ISMAP} in your inlined image call, and
\item  make that image an anchor, with an {\tt HREF}
to the script handling the request {\tt (htimage)} with
image configuration file name appended to it.
\end{itemize}

Each clickable image has to be described to {\tt htimage} via
an image configuration file.  These files are referred to by the extra path information in the URL
causing the call to {\tt htimage}:
\begin{verbatim}
        <A HREF="/htbin/htimage/image/config/file">
        <IMG SRC="Image.gif" ISMAP></A>
\end{verbatim}

The image configuration file can be given as:
\begin{itemize}
\item  either a virtual path that is translated through the rule system,
\item  or an absolute path in your filesystem.
\end{itemize}
{\tt htimage} will look for both of these (after all, it gets
both {\tt PATH\_INFO} and {\tt PATH\_TRANSLATED}
environment variables from {\tt httpd} anyway). \par 

You can even do some very smart mappings in the rule file to allow
very short references to {\tt htimage} and picture
configuration files.  Let's suppose all your image configuration files
are in directory {\tt /usr/etc/images}.  Then you can use the
following two rules in your server's configuration file (by default
{\tt /etc/httpd.conf}):
\begin{verbatim}
        Map   /img/*    /htbin/htimage/usr/etc/images/*
        Exec  /htbin/*  /usr/etc/cgi-bin/*
\end{verbatim}

In this case you can refer to your image mapper very easily; if you
have an image configuration file {\tt Dragons.conf} in
{\tt /usr/etc/images} directory, all you need to say in the
anchor is this:
\begin{verbatim}
        <A HREF="/img/Dragons.conf">
        <IMG SRC="Image.gif" ISMAP></A>
\end{verbatim}


\par 
\section{Image Configuration File}

There are four keywords:
\begin{DL}{allow this much space}

\item[{\tt default} {\em URL\/}]
{\em URL\/} that is used if the click is in none of the given shapes.
This should always be set! \par 

\item[{\tt circle} ({\em x\/},{\em y\/}) {\em r\/} {\em URL\/}]
Circle with center point {\em (x,y)\/} and radius {\em r\/}. \par 

\item[{\tt rectangle} ({\em x1\/},{\em y1\/}) ({\em x2\/},{\em y2\/}) {\em URL\/}]
Rectangle with (any) two opposite corners having coordinates {\em (x1,y1)\/}
and {\em (x2,y2)\/}. \par 

\item[{\tt polygon} ({\em x1\/},{\em y1\/}) ({\em x2\/},{\em y2\/}) ...
({\em xn\/},{\em yn\/}) {\em URL\/}]
Polygon having adjacent vertices {\em (xi,yi)\/}.  If the path given
is not closed (first and last coordinate pairs aren't the same),
{\tt htimage} closes it by connecting the last point back to
the first one. \par 

\end{DL}
These can be abbreviated as {\tt def, circ, rect, poly.} \par 

Shapes are checked in the order they appear in config file, and the
URL corresponding to the first match is returned.  If none match, the
{\tt default} URL is returned. \par 

{\em URL\/}s are
\begin{itemize}
\item  either full URLs (with access method, machine name and path), in
which case server sends a redirection to client,
\item  or a partial URL containing only pathname part of it (always starting
with a slash), in which case server considers that as the original request,
translates it through the rule system, access authorization and serves it
normally (faster than sending redirection).
\end{itemize}
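
For example, an image configuration file following the syntax above might
read as shown below.  The coordinates and the local paths are invented
for illustration; only the keyword syntax comes from the descriptions
above.
\begin{verbatim}
default    /maps/nohit.html
circle     (100,100) 40                 /maps/sun.html
rectangle  (10,10) (60,40)              /maps/legend.html
polygon    (200,20) (260,20) (230,80)   http://info.cern.ch/hypertext/WWW/TheProject.html
\end{verbatim}

With this file a click at (110,95) falls inside the circle, so the server
serves {\tt /maps/sun.html} directly.  A click inside the triangle
yields a redirection, because its URL is a full one, and a click outside
all three shapes gets the {\tt default} document. \par 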


\par 
\section{Output Produced by htimage}

{\tt htimage} prints a single {\tt Location:} field
to its {\tt stdout}, or an error message preceded by
{\tt Content-Type: text/html}, so in fact {\tt htimage}
behaves exactly like any other CGI/1.0 program (script), and is not
in any way handled specially by the server.  Therefore, you can
rename {\tt htimage} to whatever you prefer, just as we called it
{\tt /img} in the above example. \par 

Server understands this {\tt Location:} field, and either
directly sends that file to the client (non-full URL), or sends a
redirection to client causing it to fetch the document, maybe even
from another machine. \par 

Note that URLs returned by {\tt htimage} may well be other
script requests - there is no reason for being limited to just regular
documents. \par 

\par 


\chapter{{}
Protected CERN Server Setup}

Access can be restricted according to user name, internet address, or
both.  Access control can be tree-level, file-level, or both.\par 

\par 

\section{In This Section...}
\begin{itemize}
\item  Password File
\item  Group File
\item  Protect Directive in Configuration File
\item  Protection Setup File
\item  Protecting a Tree of Documents
\item  Protecting Individual Files
\item  Using Two-Level Protection
\item  Embedding the Protection Setup in the
     Configuration File Itself
\item  Access Control List File
\end{itemize}


\par 

\section{Password File}

If user-wise access control is used there has to be a password file
listing all the users and their encrypted passwords.  The password file
can be maintained with the {\tt htadm}
program, which is a part of the CERN {\tt httpd} distribution. \par 

{} Unix password files are understood
by CERN daemon (but not vice versa).  However, {\bf Unix users are
in no way connected to the WWW access authorization.} \par 

\par 


\section{Group File}

Group file contains declarations of groups containing users and other
groups, with possibly an IP address template. Group declarations as
viewed from top-level look like this:
\begin{verbatim}
        groupname: item, item, item
\end{verbatim}

The list of items is called a group definition.
Each {\tt item} can be a username, an already-defined
groupname, or a comma-separated list of user and group names in
parentheses. Any of these can be followed by an at sign {\tt @}
followed by either a single IP address template, or a comma-separated
list of IP address templates in parentheses. The following are valid
group declarations:
\begin{verbatim}
        authors: john, james
        trusted: authors, jim
        cern_people: @128.141.*.*
        hackers: marca@141.142.*.*, sanders@153.39.*.*,
                 (luotonen, timbl, hallam)@128.141.*.*,
                 cailliau@(128.141.201.162, 128.141.248.119)
        cern_hackers: hackers@128.141.*.*
\end{verbatim}

If an item contains only IP address template part all users from those
addresses are accepted (e.g. {\tt cern\_people} above). Note the
last two declarations: {\tt cern\_hackers} group is made up of
the {\tt hackers} group by restricting it further according to
IP address.\par 

A group definition can be continued on the next line after any comma in the
definition. Forward references in the group file are illegal (i.e. a
group name cannot be used before it is defined).\par 

Group definition syntax is valid not only in group file, but also in
\begin{itemize}
\item {\tt GetMask} in protection setup file, and
\item in last field in ACL entries.
\end{itemize}
\par 

\par 

\section{Server Configuration File}

Typically you protect a tree of documents with a {\tt protect} rule
in the rule file, and specify authorized persons and IP addresses in the
protection setup file or access control list file:
\begin{verbatim}
        Protect /very/secret/*  /WWW/httpd.setup
\end{verbatim}

If there are Unix file system protections set up so that there is no
world read-permission the daemon naturally has to run as the owner or
the group member of those files.\par 

However, if there are protected trees owned by different people this
doesn't work. In that case {\em the daemon has to run as
{\tt root}, and the user and group ids have to be specified in
the {\tt protect} rule,\/} e.g.:
\begin{verbatim}
        Protect /kevin/secret/*	  /WWW/httpd.setup1  kevin.www
        Protect /marcus/secret/*  /WWW/httpd.setup2  marcus.nogroup
\end{verbatim}


\par 


\section{Protection Setup File}

Each {\tt protect} rule has an associated protection setup
file. It specifies valid authentication schemes, password and group
files, and password server-id:
\begin{verbatim}
        AuthType      Basic
        ServerId      OurCollaboration
        PasswordFile  /WWW/Admin/passwd
        GroupFile     /WWW/Admin/group
\end{verbatim}

The password server id need not be a real machine name. Its only purpose
is to tell the browser which password file is in use
(different protection setups on the same machine can use different
password files, and that would otherwise confuse pseudo-intelligent
clients trying to figure out which password to send).\par 

{}
Same server-ids on different machines are considered
different by clients (otherwise this would be a security hole).\par 


\par 

\subsection{Protecting Entire Tree As One Entity}

If you want to control access only to entire trees of documents and
don't care to restrict access differently to individual files, it
suffices to give a {\tt GetMask} in setup file (and you
don't need any ACL files):
\begin{verbatim}
        GetMask    group, user, group@address, ...
\end{verbatim}

Group definition has the same syntax as in group file.\par 

\par 

\subsection{Protecting Individual Files Differently}

When each individual file needs to be protected separately you should
use an ACL (access control list) file in the same directory as the
protected files. After that no file in that directory can be accessed
unless there is a specific entry in ACL allowing it.\par 

In this case you don't need the {\tt GetMask} in setup
file.\par 

\par 

\subsection{Restricting Access Even Further}

There may be both {\tt GetMask} {\em and\/} an ACL, in
which case both conditions must be met.  This is typically used so
that {\tt GetMask} defines a general group of people allowed
to access the tree, and ACLs restrict access even further.\par 
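
As a sketch of such a two-level setup, assume the groups
{\tt cern\_people} and {\tt authors} from the group file example above,
plus a hypothetical group {\tt secretaries}.  The protection setup file
admits only clients at CERN addresses:
\begin{verbatim}
AuthType      Basic
ServerId      OurCollaboration
PasswordFile  /WWW/Admin/passwd
GroupFile     /WWW/Admin/group
GetMask       cern_people
\end{verbatim}

and the access control list file in the protected directory then narrows
access file by file:
\begin{verbatim}
minutes*.html : GET : secretaries
*.html        : GET : authors
\end{verbatim}

With these two files a member of {\tt authors} connecting from outside
128.141.*.* is refused, even though the ACL entry alone would admit
that user. \par 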


\par 
\section{Protection Setup Embedded
in the Configuration File}

Often it is not necessary to have the protection information in a
different file; as a new feature {\tt cern\_httpd} allows
protection setup to be "embedded" inside the configuration file itself.
\par 

Instead of writing the setup in a different file and referring to it
by the filename, you can use the {\tt Protection} directive to
define the protection setup and bind it to a name, and later refer to
this setup via that name. \par 

The previous example could be written into the main configuration as
follows:
\begin{verbatim}
    Protection  PROT-NAME  {
        UserId        marcus
        GroupId       nogroup
        AuthType      Basic
        ServerId      OurCollaboration
        PasswordFile  /WWW/Admin/passwd
        GroupFile     /WWW/Admin/group
        GetMask       group, user, group@address, ...
    }
    Protect  /private/URL/*      PROT-NAME
    Protect  /another/private/*  PROT-NAME
\end{verbatim}


{} Note that since the protection setup is in
the same file as the other configuration directives, it is also
possible to specify the {\tt UserId} and {\tt GroupId}
for the server to run as, without it being a security hole.  With an
external protection setup file this is not allowed, for security
reasons; that is why there is an extra field after the protection
setup filename specifying the user and group ids in that case:
\begin{verbatim}
        Protect /kevin/secret/*	  /WWW/httpd.setup1  kevin.www
        Protect /marcus/secret/*  /WWW/httpd.setup2  marcus.nogroup
\end{verbatim}


If you need a given protection setup only once there is no need to first
bind it to a name and then refer to it by that name, but rather just
combine the two:
\begin{verbatim}
    Protect  /private/URL/*  {
        UserId        marcus
        GroupId       nogroup
        AuthType      Basic
        ServerId      OurCollaboration
        PasswordFile  /WWW/Admin/passwd
        GroupFile     /WWW/Admin/group
        GetMask       group, user, group@address, ...
    }
\end{verbatim}


{} {\tt httpd} is not
very robust in parsing this particular directive; make sure you have a
space between the URL template and the curly brace, and that the
ending curly brace is alone on that line.  Also, comments are
{\bf not} allowed inside the protection setup definition.
\par 


\par 
\section{Access Control List File}

The ACL file is a file named {\tt .www\_acl} in the same directory
as the files whose access it controls. It typically looks
something like this:
\begin{verbatim}
        secret*.html : GET,POST : trusted_people
        minutes*.html: GET,POST : secretaries
        *.html : GET : willy,kenny
\end{verbatim}

It is worth noticing that all the templates are matched against (unlike
in the rule file, where translation of rules stops at {\tt pass} and
{\tt fail}). So in the previous example all the HTML files are
accessible to {\tt willy} and {\tt kenny}, even those
matching the two previous templates.\par 

The last field is just a list of users and groups (possibly at required
IP addresses), and in fact this field has the same syntax as the group file.\par 

Once the {\tt PUT} method is implemented, it can appear in the
middle field separated by a comma from {\tt GET}:
\begin{verbatim}
        *.html : GET,PUT : authors
\end{verbatim}


\par 

\par 


\section{{}
Manual Page For htadm}

The CERN {\tt httpd} password file can be maintained with the
{\tt htadm} program, which is a part of the CERN {\tt httpd}
distribution. \par 

\par 
\subsection{Command Line Options and Parameters}

\begin{DL}{allow this much space}

\item[ {\tt htadm -adduser } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password {\tt \lbrack }realname{\tt \rbrack \rbrack \rbrack }\/}
]	adds a user into the password file (fails if there is
	already a user by that name).\par 

\item[ {\tt htadm -deluser } {\it passwordfile {\tt \lbrack }username{\tt \rbrack }\/}
]	deletes a user from the password file (fails if there
	is no user by that name).\par 

\item[ {\tt htadm -passwd } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password{\tt \rbrack \rbrack }\/}
]	changes user's password (fails if there is no such user).\par 

\item[ {\tt htadm -check } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password{\tt \rbrack \rbrack }\/}
]	checks user's password (fails if there is no such user).
	Writes either {\tt Correct} or {\tt Incorrect}
	to standard output.
	Also indicates password correctness by a zero return value. \par 

\item[{\tt htadm -create } {\it passwordfile\/}
]	creates an empty password file. \par 

\end{DL}

If {\tt {\it password\/}} or even {\tt {\it username\/}}
is missing in any of the previous cases, it is prompted for
interactively.  {\tt {\it passwordfile\/}} must always be
specified. A missing real name is also prompted for when adding a new
user.\par 
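
As a rough sketch of a typical session, the commands below create a
password file, add a user, and then check that user's password.  The
path {\tt /where/ever/passwd} and the user name {\tt willy} are
placeholders chosen for illustration; since no password is given on the
command line, {\tt htadm} prompts for the missing values interactively:
\begin{verbatim}
        htadm -create  /where/ever/passwd
        htadm -adduser /where/ever/passwd willy
        htadm -check   /where/ever/passwd willy
\end{verbatim}

Because {\tt -check} also reports the result through its return value,
a calling script can test that value rather than parse the
{\tt Correct} or {\tt Incorrect} text. \par 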

\par 

{}
Do NOT use {\tt htadm} to add new users to the actual Unix
password file {\tt /etc/passwd}; entries written by
{\tt htadm} lack some fields that Unix requires. \par 

{} Passwords should not be longer
than 8 characters (this restriction comes from linemode clients that use the C
library function {\tt getpass()} to read the password $--$ there
is no other cause for it; the maximum hardcoded password
size is actually much larger, and if you only use GUI or other clients
that are able to read passwords this long, feel free to use them). \par 

{} {\tt htadm}
wipes the password from its command line as soon as possible, so it
is very unlikely that anyone can see somebody's password by looking at the process
listing on the machine (with {\tt ps}, for example).\par 

\par 


\chapter{{} Proxies}

A proxy is an HTTP server, typically running on a firewall machine,
that provides access to the outside world for people inside the
firewall.  {\tt cern\_httpd} can be
configured to run as a proxy.  Furthermore, it is able to perform
caching of documents, resulting in faster response times. \par 

I (Ari Luotonen, CERN) and Kevin Altis from Intel have written a joint
paper about proxies,
which will be presented at the
WWW94 Conference. \par 

\par 
\section{In This Section...}
\begin{itemize}
\item  Server setup
\item  Proxy protection
\item  Configuring proxy to use another proxy
\item  Caching
\item  Client setup
\end{itemize}

\par 
\section{Setting Up cern\_httpd To Run as a Proxy}

{\tt cern\_httpd} runs as a proxy if
its configuration file allows URLs starting with the corresponding access
method to be passed.  A typical proxy configuration file reads:
\begin{verbatim}
    pass http:*
    pass ftp:*
    pass gopher:*
    pass wais:*
\end{verbatim}


{\bf Note} that {\tt cern\_httpd} is capable of running as a
regular HTTP server at the same time; just add your normal rules after
these. \par 
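
A minimal sketch of such a combined configuration might look as
follows.  The local {\tt Pass} rule and the directory
{\tt /where/ever/Web} are assumptions made for illustration and are
not taken from a real setup:
\begin{verbatim}
    pass http:*
    pass ftp:*
    pass gopher:*
    pass wais:*
    Pass /*  /where/ever/Web/*
\end{verbatim}

Requests carrying a full URL match one of the first four rules and are
proxied; requests for local paths fall through to the last rule and
are served from the local directory. \par 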


{} The {\tt proxy\_xxx} environment
variables that are used to redirect clients to use a proxy also
affect the proxy server itself.  If this is not your intention make sure that those variables
are not set in {\tt httpd}'s environment. \par 
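
One way to make sure of this, sketched here for a Bourne-type shell,
is to clear the variables in the script that starts the server.  The
variable names are assumed to follow the {\tt proxy\_xxx} pattern for
the four protocols passed above, and the configuration file path is a
placeholder:
\begin{verbatim}
    unset http_proxy ftp_proxy gopher_proxy wais_proxy
    httpd -r /where/ever/httpd.conf
\end{verbatim}
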


\par 
\section{Proxy Protection}

{\tt cern\_httpd} 2.17 and newer provide a mechanism to protect
the proxy against unauthorized use (in fact, the machinery behind this
is the same that is used to set up document protection when running as
a regular HTTP server). \par 


\subsection{Enabling and Disabling HTTP Methods}

By default only {\tt HEAD}, {\tt GET} and
{\tt POST} methods are allowed to go through the proxy.  You
can enable more methods using the {\tt Enable} directive in the
configuration file:
\begin{verbatim}
    Enable PUT
    Enable DELETE
\end{verbatim}


The {\tt Disable} directive disables methods:
\begin{verbatim}
    Disable POST
\end{verbatim}


\subsection{Defining Allowed Hosts}

A protection setup is defined for the proxy as a single named entity.
Later, when protecting certain URLs, this name
is used to refer to the protection setup.  (The name can also be the
absolute pathname of the file that defines the protection, if one
wishes to store protection information in a different file.) \par 

Protection is defined as follows:
\begin{verbatim}
    Protection  protname  {
        Mask @(*.cern.ch, *.desy.de)
    }
\end{verbatim}


This defines a protection that allows all request methods from domains
{\tt cern.ch} and {\tt desy.de}, and none from
elsewhere.  This protection can be referred to by {\it protname\/}. \par 

You can also use IP number templates:
\begin{verbatim}
    Protection  protname  {
        Mask  @(128.141.*.*, 131.169.*.*)
    }
\end{verbatim}

{\bf Note} that IP number templates always have four parts
separated by dots. \par 

If the allowed methods differ by domain, e.g.
{\tt GET} should be allowed from both of these domains, but
{\tt POST} and {\tt PUT} only from {\tt cern.ch},
you can use the {\tt GetMask}, {\tt PostMask},
{\tt PutMask} and {\tt DeleteMask} directives instead:
\begin{verbatim}
    Protection  protname  {
        GetMask  @(*.cern.ch, *.desy.de)
        PostMask @*.cern.ch
        PutMask  @*.cern.ch
    }
\end{verbatim}

{\bf Note} that parentheses are necessary only if there is
more than one domain name template. \par 


\subsection{Actual Protection}

The {\tt Protect} rule actually associates protection with a
URL.  In case of proxy protection you would typically say:
\begin{verbatim}
    Protect  http:*   protname
    Protect  ftp:*    protname
    Protect  gopher:* protname
    Protect  news:*   protname
    Protect  wais:*   protname
\end{verbatim}

which would restrict all proxy use to the allowed hosts defined
previously in the protection setup {\it protname\/}.
{\bf Note} that {\it protname\/} must be defined before it
is referenced! \par 
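
Putting the pieces together, a proxy restricted to two domains could
be configured roughly as below.  The name {\tt PROXY-HOSTS} is a
hypothetical protection name.  The {\tt Protection} block comes first
because it must be defined before it is referenced, and the
{\tt Protect} rules precede the {\tt pass} rules so that they are seen
before a matching {\tt pass} ends rule translation:
\begin{verbatim}
    Protection  PROXY-HOSTS  {
        Mask  @(*.cern.ch, *.desy.de)
    }

    Protect  http:*    PROXY-HOSTS
    Protect  ftp:*     PROXY-HOSTS
    Protect  gopher:*  PROXY-HOSTS
    Protect  wais:*    PROXY-HOSTS

    pass  http:*
    pass  ftp:*
    pass  gopher:*
    pass  wais:*
\end{verbatim}
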


\par 
\section{Caching}

{\tt cern\_httpd} running as a proxy can also perform caching of
files retrieved from remote hosts.  See the configuration directives controlling this
feature. \par 
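
As an illustration only, a caching proxy could add lines like these to
its configuration.  The directory is a placeholder, the size and times
are arbitrary, and the unit suffix on {\tt CacheSize} is an assumption;
the exact argument syntax should be checked against the directive
reference for the version in use:
\begin{verbatim}
    CacheRoot    /where/ever/cache
    CacheSize    50 M
    CacheUnused  http:*  1 month
    CacheUnused  ftp:*   2 weeks
\end{verbatim}

The cache root directory should be dedicated to {\tt httpd}, since all
files in it are subject to garbage collection. \par 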


\par 


\chapter{{}
CERN Server FAQ}

If you have problems, first make sure you're using the newest version.
You can check this by looking in
{\tt ftp://info.cern.ch/pub/www/src}. \par 

When something goes wrong, run the server in verbose mode (the
{\tt  -v } flag) to see exactly what the problem is.  If you
usually run it from the inet daemon, start it standalone on some other
port (with the {\tt  -p } {\it port\/} flag), with otherwise the
same parameters as in {\tt /etc/inetd.conf}. \par 
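
For example, assuming the server normally reads
{\tt /where/ever/httpd.conf} (a placeholder path) and that port 8080 is
free on your machine, a debugging run could be started like this:
\begin{verbatim}
        httpd -v -p 8080 -r /where/ever/httpd.conf
\end{verbatim}
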


\par 
\section{My Scripts Get Served As Text Files...}

...or are completely inaccessible. \par 

It's important to understand that rules in the configuration file
({\tt Map}, {\tt Pass}, {\tt Exec},
{\tt Fail}, {\tt Protect}, {\tt DefProt} and
{\tt Redirect}) are translated from top to bottom, and the
first matching {\tt Pass}, {\tt Exec} or
{\tt Fail} will {\bf terminate} rule translation.
\par 

So, make sure that your {\tt Exec}
rule is before any general {\tt Map}pings. \par 
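
The following sketch shows an ordering that works.  The script
directory {\tt /where/ever/cgi-bin} and the document directory
{\tt /where/ever/Web} are placeholders:
\begin{verbatim}
        Exec   /htbin/*   /where/ever/cgi-bin/*
        Pass   /*         /where/ever/Web/*
\end{verbatim}

If the two lines were swapped, the general rule would match requests
under {\tt /htbin} first and terminate rule translation, so the
{\tt Exec} rule would never be reached. \par 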


\par 
\section{How do I...}
\begin{itemize}
\item  Set up access authorization?
\item  Write server-side scripts?
\item  Get the server to perform searches?
\item  Make clickable images?
\item  Handle forms?
\item  Set up a proxy?
\item  Set up proxy caching?
\end{itemize}


\par 
\section{Zombies}

There used to be one zombie when running {\tt cern\_httpd}
standalone; this was fixed in version 2.17beta.  If you still see zombies (more
than two that don't go away in a few minutes) it is a bug. \par 

\par 
\section{Inet daemon complains about looping...}

...and terminates WWW service. {\tt :-(} \par 
This is a hard-coded {\tt inetd} limitation on at least
SunOS-4.1.* and NeXT, which limits maximum allowed connections
from a given host to 40 per minute.  This can be exceeded by
scripts doing Web-roaming, or documents having masses of small
inlined images. \par 

There is a fix for at least SunOS {\tt inetd} (100178-08), and
in Solaris this is fixed.  You can also run {\tt httpd}
standalone (preferably with the {\tt -fork} command line
option). \par 

{\bf Most importantly,} you should stop running
{\tt httpd} from {\tt inetd} and rather run it standalone. This
is because running from {\tt inetd} is inefficient. \par 

\par 
\section{Server looks at funny directories and finds nothing}

From version 2.0 up to 2.15, you needed to have an explicit map to
the file system in your rule file, e.g.:
\begin{verbatim}
        Map    /*    file:/*
\end{verbatim}

but 2.15 doesn't have this limitation anymore. \par 

\par 
\section{But the document says rule file is no longer needed}

True, but it also says you must remember to give your Web directory as
a parameter to {\tt httpd}, e.g.
\begin{verbatim}
        httpd  /home/me/MyGloriousWeb
\end{verbatim}


\par 


\chapter{
{}
CERN httpd 2.15 Release Notes}


There is one single thing that needs to be done when
changing over from {\tt httpd} 2.14 to 2.15:
\begin{verbatim}
        Rename your old /htbin scripts to end in .pp suffix!
\end{verbatim}


\section{General Notes}

\begin{itemize}
\item  	Code tested under Purify $--$ all detected memory leaks and
	bugs fixed.
\item  	Forking code enhanced $--$ no longer crashes when running
	standalone.  Everybody should start running CERN
	httpd standalone instead of from inetd
\item  	Documentation redesigned, but still under construction
\item 	Contains Solaris port, but not VMS
\end{itemize}


\section{CGI/1.0, Common Gateway Interface}
\begin{itemize}
\item  	CGI/1.0 interface fully implemented
\item  	{\bf Old CERN httpd scripts will continue working if you rename
	them to end with .pp suffix.}  Links referencing these scripts do
	NOT need to be changed.  (This feature does not add any overhead to
	CGI/1.0 script calls.)
\item  	New product cgiparse for CGI/1.0 scripts to parse QUERY\_STRING
	env.var and to read CONTENT\_LENGTH characters from stdin
\item 	{\tt htimage} upgraded to CGI/1.0
\item 	The whole server-environment is propagated to CGI script, except
	for variables that are reserved for CGI/1.0.
\item  	Scripts are spawned by doing a fork() and exec() instead of
	system() $--$ more efficient and secure
\end{itemize}

\section{Firewall Gateway Modifications}
\begin{itemize}
\item 	Access authorization works thru firewalls
\item 	So does POST, therefore forms also
\item 	-disable/-enable command line options and Disable/Enable
	configuration directives for dis/enabling HTTP methods. GET,
	HEAD and POST are enabled by default.
\item  	Fix: text/html and text/plain not passed multiply to
	servers when running as gateway
\item  	Fix: */*, image/* etc not expanded by the gateway
\item 	Fix: try local search ONLY when accessing local files
\end{itemize} 

\section{Other New Features}
\begin{itemize}
\item  	When started standalone in non-verbose mode automatically
	disconnects from terminal session and goes background
\item 	User-supported directories enabling URLs starting with
	{\bf /\~username}
\item  	Redirection
\item 	Meta-information files to allow RFC-822-style headers to be
	appended to server response header section
\item  	New, common logfile format, localtime default, {\tt GMT}
	as an option
\item  	Ability to suppress logging for certain hosts/domains
	according to given hostname template or IP number mask,
	like {\tt *.cern.ch} or {\tt 128.141.*.*}
\item  	-setuid option to set server uid to authenticated uid (local)
\item 	Multilanguage support: same URL can be used to retrieve a
	document in different languages
\item 	AddLanguage, AddEncoding and AddType directives to
	configuration file (AddType replaces Suffix)
\item  	Better multiformat algorithm
\item 	HostName directive to configuration file for servers that want to give
	CGI/1.0 scripts a different hostname than the actual. Useful
	if machine has many aliases, or if httpd fails to get the full
	domainname.
\item  	Exec rule obsoleting HTBin directive $--$ now multiple script
	directories possible, with arbitrary mappings
\item  	Get-Mask, Post-Mask and Put-Mask for protection setup
	files. Get-Mask obsoletes Mask-Group
\item 	Groups All/Users and Anybody/Anyone/Anonymous automatically
	defined.  All means anybody that has been authenticated, and
	Anybody is just anybody
\item  	Server:
\item  	Last-Modified:
\item  	Content-Length:
\item 	Content-Language:
\item  	Content-Encoding:
\item 	Scripts can output also Uri: and Expires: headers (this will
	eventually be made more general)
\item 	HEAD works, also with stupid scripts that also output the body
\end{itemize}

\section{Enhancements, Fixes}
\begin{itemize}
\item 	The final explicit Map to filesystem in configuration file no
	longer required, because it was causing confusion
\item 	Assume Basic authentication scheme even if not explicitly
	mentioned in setup file
\item 	Get client DNS hostname, for the logfile among other things
\item 	Fail made the default when rules are translated to the end
	without coming across a Pass, Exec or Fail rule (this is
	to enhance security; it was too easy to forget the Fail * from
	the end of config file)
\item 	Made config (rule) file understand different ways of writing
	keywords, e.g.: UserDir, userdir, User-Dir, user\_dir,
	UserDirectory and so on
\item  	The eight misplaced server-side access authorization files
	moved away from libwww
\item  	Fix: directory indexing works with a trailing slash
\item  	Fix: HTSimplify() might have behaved unexpectedly on some
	systems (called strcpy() with overlapping args)
\end{itemize}

\par 


\chapter{
{}
CERN httpd 2.16beta Release Notes}

\begin{itemize}

\item 	If you are upgrading from 2.15beta, you need to make {\bf no
	changes}.
\item 	If you are upgrading from 2.14, there is one single thing that
	needs to be done:
\begin{verbatim}
        Rename your old /htbin scripts to end in .pp suffix!
\end{verbatim}

\end{itemize}

\section{Firewall Gateway (Proxy) Additions, Fixes}
\begin{itemize}
\item  	{\tt ftp} with binary files works
\item 	{\tt x-compress} and {\tt x-gzip} work correctly
	over proxy
\item  	Firewalling now works through arbitrary number of proxies;
	{\tt http\_proxy, ftp\_proxy, gopher\_proxy} and
	{\tt wais\_proxy} configuration directives cause proxy
	to connect to the outside world through another proxy.
	Environment variables with the same names have same effects, but
	config file is user-friendlier for this.
\item  	Now sends all the headers sent by client.
\item  	Proxy log file now gives byte count.
\item  	Proxy log file now gives correct status code also on error.
\end{itemize} 


\section{Firewall Gateway (Proxy) Caching}
\begin{itemize}
\item  	{\tt CacheRoot} directive specifies cache root
	directory, and turns on proxy caching.
	Cache root directory must be dedicated to {\tt httpd} -
	all files in there are subject to garbage collection.

\item  	Cache size (in megabytes) is specified by
	{\tt CacheSize} directive; cache size should be several
	megabytes, 50-100MB should give good results.
	Cache may, however, temporarily grow a few megabytes bigger
	than specified.  Also, space taken up by directories
	is not calculated in the current version.

\item  	{\tt http, ftp, gopher} with {\tt GET } method
	get cached.

\item  	However, not caching:
	\begin{itemize}
	\item  	HTTP0 responses (you never know if it failed; also
		confused HTTP1 servers sometimes output garbage in
		front of HTTP1 headers).

	\item  	Protected documents (request had
		{\tt Authorization:} field).

	\item  	Queries - they have too often side-effects. (POST
		should be {\bf always} used with forms, and all
		script responses should have {\tt Expires:}
		header when necessary.  Until then, we don't cache
		them.)
	\end{itemize}

\item  	Expiry date is extracted:
	\begin{itemize}
	\item 	From {\tt Expires:} header.

	\item 	If not present {\tt Last-Modified:} is used to
		approximate expires.  If a file hasn't changed in five
		months the chances are it won't change during the next
		week.  On the other hand, if a file has changed
		yesterday, it will probably change again pretty soon.
		I know this is heuristic but until all the servers
		give {\tt Expires:} this works much better than
		not using it, so no flames about it.

	\item  	If {\tt Last-Modified:} not given use the time
		given by {\tt CacheDefaultExpiry} directive,
		default 7 days.
	\end{itemize}

\item  	Format of cache files and directory structure under cache root
	is subject to change if necessary.
	No application should yet rely on any certain cache format.
	Eventually I can see clients accessing cache files directly,
	bypassing proxy server.

\item  	Caching system understands both time formats, also the one
	output by old NCSA httpds.

\item  	Cache files get locked during transfer.  Lock files time out
	if something goes wrong.  Timeout can be set by
	{\tt CacheLockTimeOut} directive (default 20 minutes).
	While the lock is in effect, further requests for the same file
	are retrieved from the remote host.

\item  	Garbage collection directives:
	\begin{itemize}
	\item 	{\tt GcMemoryUsage} to advise gc about how
		radical to be in memory use (more memory =$>$ smarter
		gc).
	\item 	{\tt GcTimeInterval}, how often to do gc.
	\item  	{\tt GcReqInterval}, after how many requests to
		do gc.
	\item  	(gc is also automatically started if cache size limit
		is reached.)
	\item  	{\tt CacheLimit\_1}, size in KB until which
		files are equally valuable despite their size (200K).
	\item  	{\tt CacheLimit\_2}, size in KB after which
		files get discarded because they are too big (4MB).
	\item  	{\tt CacheClean}, remove all files older than
		this (default 21 days).
	\item  	{\tt CacheUnused}, remove all files that have
		not been used in this long time (default 14 days).
	\end{itemize}

\item  	Garbage collector always removes all expired, too long unused,
	and too old files.
\item 	If cache size limit is reached some files need to be
	sacrificed; the current algorithm takes into account:
	\begin{itemize}
	\item 	Time remaining to unconditional removal;  if it expires
		tomorrow it might as well be removed today.
	\item 	Time last accessed;  if it hasn't been accessed in 5
		days, it probably won't be accessed anymore before it
		expires.
	\item 	Size;  huge files get removed more easily.
	\item 	Time it took to load it from the remote host;
		files that were time-consuming to transfer have much
		higher value.  This compensates for the size factor.
		Load delay is the single most significant value.
	\item 	Time it has already been in cache; ancient files
		get removed more easily than fresh ones.
	\end{itemize}
\end{itemize}


\section{Other New Features}
\begin{itemize}
\item  	Error log file.
\item  	{\tt Referer:} field ends up in error log when a
	request fails.
\item  	{\tt UserId} and {\tt GroupId} to set default
	uid and gid (used instead of nobody and nogroup).
\item  	Timeout for input and output; default time to wait for a
	request is 2 minutes, and to send response 20 minutes.
	Timeout causes a note to error log, and terminates child
	(no more hanging httpds).
	{\bf Note:} the one zombie is normal; don't report to me
	about it, I may do something about it some day, or maybe I
	won't.  Zombie doesn't take up any other system resources
	except the one process table entry.
\item  	Suffixes are no longer case-sensitive by default; this may be
	changed via the {\tt SuffixCaseSense} configuration
	directive.
\item  	Lou Montulli's news and proxy diffs added to the library.
\item  	Most command line options now also available as configuration
	directives:
	\begin{itemize}
	\item  {\tt DirAccess}
	\item  {\tt DirReadme}
	\item  {\tt AccessLog}
	\item  {\tt ErrorLog}
	\item  {\tt LogFormat}
	\item  {\tt LogTime}
	\end{itemize}
\item  	{\tt -vv} command line option for Very Verbose trace
	output.  Outputs also request headers as they came in.
	Otherwise like {\tt -v} flag.
\end{itemize}
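
A sketch of how some of these directives might appear in a
configuration file; the user, group, and file names are placeholders
chosen for illustration:
\begin{verbatim}
        UserId     www
        GroupId    www
        AccessLog  /where/ever/httpd-log
        ErrorLog   /where/ever/httpd-errors
\end{verbatim}
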

\section{Enhancements, Fixes}
\begin{itemize}

\item  	NPH-scripts now work from automatically backgrounded
	standalone server.
\item  	Fixed the many problems with
	{\tt Content-Transfer-Encoding}:
	\begin{itemize}
	\item 	Mosaic uses {\tt Content-Encoding}, although
		spec says {\tt Content-Transfer-Encoding};
		I now output both
	\item  	{\tt Content-Transfer-Encoding} sometimes
		didn't show up although it should have, fixed.
	\item 	{\tt Content-Transfer-Encoding} didn't come
		up correctly with ftp, fixed.
	\end{itemize}

\item  	Strange escaping fixed with directory indexing (legal
	characters got escaped randomly by a gcc-compiled version).
\item  	Timezone bug around midnight with the new logfile format
	fixed.  (New logfile format is not yet default, use
	{\tt -newlog} command line option, or
	{\tt LogFormat} directive in configuration file.)
\item  	Dashes for non-existent status codes and byte counts now show
	up correctly in the log.
\item  	Forking code once again enhanced - fixed a possible
	hanging situation.
\item  	Log time fixed to be the time of incoming request, not the
	time of request served.
\item  	Zombies now correctly waited away on HP (this was in fact
	fixed already in 2.15beta binaries distributed after February
	17th - {\bf note,} that this bug had no effect on any other
	platforms ).
\item  	Directory listings no longer have {\tt Content-Length:}
	(because it was wrong).
\item  	Now understands also the old Accept: syntax, with spaces as
	separators between actual content-type and its parameters.
	This will eventually be taken out.

\item  	{\tt htadm} now uses the same file creation mask as in
	the original password file.
\end{itemize}

\par 


\chapter{
{}
CERN httpd 2.17beta Release Notes}


\section{General New Features}
\begin{itemize}
\item  	{\tt PUT} and {\tt POST} can be configured to be
	handled by external CGI scripts; {\tt PUT-Script} and {\tt POST-Script} directives
\item  	BodyTimeOut for timing out scripts waiting for input that
	never comes from clients
\item  	{\tt IdentityCheck} directive to turn on RFC931 remote
	login name checking
\item  	{\tt REMOTE\_IDENT} for CGI giving remote login name;
	this was the only feature missing to be fully CGI/1.0 compliant
\item  	CGI/1.1 upgrade:
	\begin{itemize}
	\item  all the headers without a special meaning to CGI from CGI
	     scripts get passed to the client
	\item  Status: header to specify the HTTP status code and
	     message for client when not using NPH scripts
	\item  all HTTP request header lines which are not otherwise
	     available to the scripts get passed as HTTP\_XXX\_YYY
	     environment variables
	\end{itemize}
\item  	Understands conditional {\tt GET} request with
	{\tt If-Modified-Since} header
\item 	{\tt kill -HUP } {\it pid\/} causes {\tt httpd} to re-read
	its configuration file
\item  	{\tt PidFile}
	directive for specifying the file to write the process id
	\lbrack makes it easy to send the {\tt HUP} signal\rbrack 
\item  	{\tt ServerRoot}
	directive to specify a ``home directory'' for {\tt httpd}
\item  	Directory listings with icons; by default icons are in
	{\tt icons} subdirectory under {\tt ServerRoot}
\item  	The precompiled binaries are distributed in a {\tt tar}
	packet that contains a set of default icons; the easiest way
	to configure the icons is to just set the
	{\tt ServerRoot} to point to the binary distribution
	directory \lbrack its name is {\tt cern\_httpd}\rbrack 
\item  	Welcome directive to
	specify the name of the overview page of the directory;
	default values are {\tt Welcome.html},
	{\tt welcome.html} and, for compatibility with NCSA
	server, {\tt index.html}.  Use of {\tt Welcome}
	directive will override all the defaults.
\item  	{\tt AlwaysWelcome} directive to configure if
	{\tt /directory} and {\tt /directory/}
	are to be taken to mean the same thing, or should only
	{\tt /directory/} be mapped to the overview page and
	{\tt /directory} produce the directory listing.
\item  	/\~user causes an automatic redirection to /\~user/
\item  	Now gives also the {\tt Date:} header.
\item  	{\tt Port} directive to config file specifying the port
	number to listen to.
\end{itemize}
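
The fragment below sketches how several of the new directives could be
combined; all paths, the port number, and the welcome page name are
placeholders:
\begin{verbatim}
        ServerRoot  /where/ever/cern_httpd
        PidFile     /where/ever/httpd-pid
        Port        8080
        Welcome     home.html
\end{verbatim}

With a {\tt PidFile} in place, and assuming the file contains just the
process id, the configuration can be reloaded from a Bourne-type shell
with:
\begin{verbatim}
        kill -HUP `cat /where/ever/httpd-pid`
\end{verbatim}
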

\section{Access Authorization Enhancements / Proxy Protections}
\begin{itemize}
\item 	Now also domain name templates, like *.cern.ch, can be used in
	specifying allowed hosts, not only IP number masks
\item  	{\tt ACLOverRide} directive to allow ACLs to override
	the {\tt Mask}s set in the protection setup \lbrack without
	this feature ACLs cannot allow anything more than what the
	{\tt Mask}s allow, only restrict access further\rbrack .  This
	directive disables {\tt Mask} checking if an ACL file
	is present.
\item  	Since setting up protection seemed to be unnecessarily hard,
	it is now possible to give the protection setup in the main
	configuration file instead of having to use a different file;
	it is still ok to use a different file.
	\begin{itemize}
	\item  {\tt Protection} directive defines a protection
	setup and associates a name with it:
\begin{verbatim}
	Protection  prot-name  {
		AuthType    Basic
		ServerId    Test-Server
		PasswdFile  /where/ever/passwd
		GroupFile   /where/ever/group
		UserId      someuser
		GroupId     somegroup
		GET-Mask    list, of, users, and, groups
		POST-Mask   list, of, users, and, groups
		PUT-Mask    list, of, users, and, groups
	}
\end{verbatim}

	The content between the curly braces is the same as what used to
	go in the protection setup file. What's new is the possibility to
	specify the {\tt UserId} and {\tt GroupId} for
	the child process when serving the request in protected mode.
	This is not possible with external files for security reasons
	\lbrack it is not possible inside the external file, but it
	is possible if the ids are set when calling that file; see
	doc for more details\rbrack .

	\item  A single {\tt Mask} directive for cases when
	{\tt GET-Mask}, {\tt POST-Mask} and
	{\tt PUT-Mask} are the same.

	\item  In {\tt Protect} rule the {\it prot-name\/} is
	specified instead of the file name; what's more is that
	{\tt Protect} can now be used to protect also proxied
	URLs:
\begin{verbatim}
		Protect http:*   prot-name
		Protect ftp:*    prot-name
		Protect gopher:* prot-name
\end{verbatim}

	\end{itemize}
\end{itemize}


\section{Enhancements, Fixes}
\begin{itemize}
\item  	Incorporated Ian Dunkin's $<$imd1707@ggr.co.uk$>$ SOCKS
	modifications (thank you, Ian!); read the
	{\tt README-SOCKS} file in the source code distribution
	for more information.
\item  	{\tt SIGPIPE} causes a normal child to exit; proxy
	child will correctly stop writing to client socket but still
	writes to cache file \lbrack previously just kept on writing to the
	socket, too\rbrack 
\item  	401, 402, 403, 404 errors don't go to error log anymore
\item  	error log contains now the host name and request
\item  	no longer sends {\tt Content-Transfer-Encoding}, we
	agreed upon using {\tt Content-Encoding} for
	compression
\item  	fixed funny panic message from format module in verbose mode
	even though everything was ok \lbrack only aesthetic\rbrack 
\item  	now gives again ``not authorized'' rather than not found if
	trying to access a protected but nonexistent file; this way
	even filenames don't leak
\item  	all time specifications in configuration file have more
	readable forms:
\begin{verbatim}
        1 year
        2 months
        3 weeks 2 days
        5 days 20 hours 30 mins 2 secs
        20:30
        20:30:01
        2 weeks 20:30
\end{verbatim}

\item  	Case-sense bug with {\tt LogTime},
	{\tt LogFormat}, {\tt DirAccess} and
	{\tt DirReadme} fixed; now parameters really are handled
	in a case-insensitive manner.
\end{itemize}

\section{Proxy Additions, Fixes}
\begin{itemize}
\item  	Proxy protections, see above
\item  	Made proxy do smart guesses about the content of an unknown
	file while retrieving from the remote; this will end the
	problems of some files not being transferred to WinMosaic or Lynx.
	{\bf IMPORTANT: Everybody, remove the rule \lbrack if you have
	it\rbrack }:
\begin{verbatim}
        AddType  *.*  text/plain
\end{verbatim}

	because it would disable this smart feature.
\item  	Fixed a bug with unknown binary gopher files being truncated
\item  	Fixed the bug with trailing slashes in ftp directory listings
\item  	Fixed the bug with requests not being URL-encoded when
	forwarding the request
\item  	Fixed a bug with filenames in directory listings not being
	URL-encoded
\item  	Fixed stupid "mail-us" situation in certain situations when
	ftp load fails
\end{itemize} 


\section{Proxy Caching}
\begin{itemize}
\item  	Cache is refreshed using the conditional {\tt GET}
	method \lbrack use of {\tt If-Modified-Since} header\rbrack 
\item 	Standalone cache mode with {\tt CacheNoConnect}
	directive \lbrack causes an error rather than document fetch when
	the document is not in the cache\rbrack 
\item  	Possibility to disable garbage collection altogether
\item  	Possibility to disable expiry checking
\item  	Caching Off to explicitly turn off caching even if there are
	other caching directives specified
\item  	{\tt -gc\_only} command line option to do garbage
	collection as a {\tt cron} job for sites that run
	{\tt httpd} as a proxy from {\tt inetd}.
	However, since {\tt httpd} now re-reads its
	configuration files when it receives a {\tt HUP} signal,
	standalone operation is now even easier, and
	{\tt inetd} should no longer be much more convenient.
\item  	Host names are converted to all-lower-case to avoid doing
	multiple caching for a single site.
\item  	Files expiring immediately never get written to the cache; not
	even part of it.
\item  	By default HTTP-retrieved documents without an
	{\tt Expires:} and {\tt Last-Modified:} field
	never get cached \lbrack because they are usually generated by
	scripts and should never be cached\rbrack ; therefore I strongly
	advise against the use of {\tt CacheDefaultExpiry} for
	HTTP.
\item  	Caching control directives have changed to take a URL template
	as a first argument, and a more readable time format:
\begin{verbatim}
        CacheDefaultExpiry  ftp:*     2 weeks 4 days
        CacheDefaultExpiry  gopher:*  6 days
        CacheUnused         http:*    1 month
        CacheUnused         ftp:*     2 weeks
        CacheUnused         gopher:*  1 week 5 days 2 hours 1 min 30 secs
\end{verbatim}

\item  	Made the expiry date approximation configurable; by default
	documents with {\tt Last-Modified:} but without
	{\tt Expires:} expire after 10\% of the time that they
	have been unmodified.  {\tt CacheLastModifiedFactor}
	can be used to change this value, or turn this feature
	{\tt Off}. Default value is 0.1 \lbrack =10\%\rbrack .
\item  	Understands yet another date format:
\begin{verbatim}
        Thu, 10 Feb 1994 22:23:32 GMT
\end{verbatim}

	This date format is {\bf not} conforming to the
	spec, so use of it is discouraged!  This is only to make the
	proxy more robust.
\item  	{\tt NoCaching} directive to prevent certain URLs from
	being cached at all.
\item  	Time margin to get rid of problems with machine clocks having
	inaccurate times and confusing caching.
\item  	{\tt GcDailyGc} to specify a daily garbage collection
	time, by default 3:00. \lbrack Can be turned {\tt Off}, too.\rbrack 
\item  	Now possible to disable {\tt GcReqInterval} and
	{\tt GcTimeInterval} \lbrack by default disabled\rbrack .
\item  	Expired cache lock files get removed also during gc.
\item  	{\tt CacheAccessLog} to specify a different log file
	for cache accesses; also possible to make a separate log for
	each remote host.
\end{itemize}
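
To make the {\tt Last-Modified:} approximation concrete, write
$t_{\rm now}$ for the time of retrieval, $t_{\rm mod}$ for the
{\tt Last-Modified:} time, and $f$ for the factor (these symbols are
introduced here for illustration only, and the exact reference time
used by the server is an assumption).  The rule described above then
amounts roughly to
\[
  t_{\rm exp} = t_{\rm now} + f \, (t_{\rm now} - t_{\rm mod}) .
\]
With the default $f = 0.1$, a document last modified 50 days before it
was retrieved would be treated as fresh for about
$0.1 \times 50 = 5$ days, while one modified yesterday would expire
after a couple of hours. \par 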


\section{cgiutils}

A new product {\tt cgiutils} for producing HTTP1 replies from
CGI scripts, and for easily generating the {\tt Expires:}
header given the time to live, e.g. "2 weeks 4 hours 30 mins". \par 


\par 


\chapter{
{}
CERN httpd 2.18beta Release Notes}


\section{New Features}
\begin{itemize}
\item 	Long FTP directory listing with last modification dates and sizes
\end{itemize}

\section{Fixes}
\begin{itemize}
\item 	Fixed a bad bug with {\tt Port} directive $--$ server
	didn't fork but rather the parent process served which caused
	the service to eventually hang (this is the main reason for
	this release).
\item 	{\tt CLIENT\_CONTROL} removed from SOCKS mods since
	{\tt httpd} has now native proxy protection support.
\item 	No longer fails to sometimes create {\tt .gc\_info} file.
\end{itemize}

\par 


\chapter{{}
CERN httpd 3.0 PreRelease Notes}

\section{3.0 Prerelease 3}
\begin{itemize}
\item  No longer strips hyphens from content-types and content-encodings
     that are given in the configuration file (broken in pre1).
\item  GMT-to-localtime transformation works now on all platforms in
     caching (was broken on others than Sun).
\item  Binary-FTP works again (broken pre2).
\item  Unescaping bug fixed in news module (caused many articles to
     fail to be retrieved).
\item  News module now gives appropriate error responses for unavailable
     articles and non-existent news groups.
\item  FTP and HTTP modules now give better error responses.
\item  Fixed the cache access log to show the correct content-lengths.
\end{itemize}

\section{3.0 Prerelease 2}
\begin{itemize}
\item  Respects UserId and GroupId directives again.
\item  FTP module no longer prints messages to stderr in non-verbose mode.
\item  The \~{}username form is now understood in the ServerRoot, Search,
     PutScript, PostScript, DeleteScript, AccessLog, ErrorLog and
     CacheAccessLog directives.
\item  Opens cache access log only if caching is turned on.
\item  Binary distribution now contains a template configuration file
     that has all the configuration directives understood by httpd
     (thanks to Sean Gonzalez for it!).
\end{itemize}

\section{3.0 Prerelease 1}
\begin{itemize}

\item  If-Modified-Since GET request now works correctly with proxy
     (client can do conditional GET/proxy can do conditional GET plus
     all the combinations of these).

\item  {\tt Pragma: no-cache } supported; by sending this header
     to the proxy the client will force it to refresh its cache from
     remote server.  Pragma headers are also forwarded to the remote
     server.

\item  Server now resets its state correctly when it receives the HUP
     signal (directory listing icons used to stop working).

\item  {\tt -restart} option: {\tt httpd} will find out
     the process number of the running server and send a HUP signal to
     it to make it reload its configuration files; note that
     {\tt httpd} must still be given the same configuration file
     command line parameters ({\tt -r } options) as the running
     server (so that it can find the ServerRoot and PidFile).

\item  Now makes an appropriate entry in the error log when restarting.

\item  Made the common logfile format the default; the old format can
     still be used with the {\tt LogFormat} directive:
\begin{verbatim}
	LogFormat old
\end{verbatim}

\item  Multiple wild-card (asterisk) matching in configuration file
     works; it differs from typical regular expression
     matching in that each wildcard matches the {\em shortest\/}
     possible run of characters instead of the longest matching
     string; this is the best choice in most cases. Consider:
\begin{verbatim}
	Pass  http://*/*  /mirror/*/http/*
\end{verbatim}

     Clearly the first asterisk should match only the hostname,
     and {\bf not} the entire path up to the filename.

\item  Rules can now contain literal asterisks and whitespace: precede them
     with a backslash; as a result, the backslash itself also has to be
     escaped with another backslash.

\item  The tilde character after a slash has to be explicitly matched:
\begin{verbatim}
	Map	/*	/foo/bar/*
\end{verbatim}

     does {\em not\/} match user-supported directories, but:
\begin{verbatim}
	Map	/~*	/Webs/users/*
\end{verbatim}

     does match them.

\item  Fixed the problem that user-supported directories could not be
     mapped or {\tt Protect}'ed.

\item  Hostname matching made case-insensitive in access control and caching.

\item  Added suffixes {\tt .htm} and {\tt .htmls} to the
     default set of known suffixes.

\item  Fixed some of the mysterious caching problems (all that were
     reported to me and that I could reproduce).

\item  Made it possible to specify the various byte/kilo/mega sizes in
     cache configuration with letters after the number (so it's no
     longer necessary to remember if the default is kilobytes or
     megabytes):
\begin{verbatim}
	CacheSize	150 M
	CacheLimit_1	100 K
	CacheLimit_2	2 M
\end{verbatim}

     The numbers must still be whole numbers (no fractions).

\item  Content-Length given for {\em all\/} documents, including
     (non-nph-)script responses, generated directory listings, error
     responses, all the documents retrieved over another protocol by
     the proxy (FTP, Gopher, ...), including HTTP responses from
     servers that didn't give it originally.

\item  {\tt MaxContentLengthBuffer} directive to specify the
     maximum number of bytes the proxy will buffer in order to find out
     the content-length for the client.  Content-length is
     {\em always\/} calculated for the logs, but a limit is needed
     because the user might interrupt the connection if nothing seems
     to be happening while the proxy is buffering the entire file to
     find out the content-length before actually sending it
     to the client.

\item  Caching module now checks that it receives the correct
     content-length; if not, it discards the cached document.  This
     rules out caching a truncated document from a
     timed out connection in 99.99\% of the cases (the 0.01\% comes from
     the fact that Plexus sends a timeout error message concatenated to
     the document, and if it should so happen that this produces exactly
     the correct content-length then there is nothing that can be done
     about it; in practice this never happens).

\item  Made {\tt HEAD} work always, even on proxy with other
     protocols (FTP, Gopher...).

\item  PASV (Passive mode) in FTP now supported.  It is no longer
     necessary to allow incoming connections above 1024 on the
     firewall host just to make FTP work.  If PASV fails
     {\tt httpd} will retry PORT.

\item  Welcome messages from FTP servers get shown on top of the
     directory listings.

\item  Fixed a bug that gave old FTP files the wrong date in the listing.

\item  Gopher listings now have icons.

\item  Proxy now reports unknown host errors appropriately.

\item  Fixed encoding-decoding problems with directory listings.

\item  Added {\tt ScriptTimeOut} - scripts that do not finish in
     this amount of time will be killed by {\tt httpd}. Default
     value is 5 minutes.

\item  A /\~{}username URL with an invalid username no longer causes an
     infinite redirection loop.

\item  The two files missing in FTP listings are no longer missing (they
     weren't in 2.18beta, either).

\item  Fixed a possible error condition that might cause the server to
     stop responding, or even die.

\item  Server now resets its UserId and GroupId even when in gc-only
     mode (this solves problems with {\tt .cache\_info} files
     sometimes being unwritable to actual caching processes).

\item  CacheAccessLog is now opened during startup while running as root
     to avoid opening problems.  There is no longer logging to
     individual files according to remote hosts - all cache accesses
     are logged to this single file.

\item  {\tt CacheOnly} directive for specifying a set of URLs
     that should be cached (for cases when there are only a few sites
     that should be cached).

\item  Added {\tt DELETE-Script} directive for specifying the CGI
     script to handle {\tt DELETE} method.

\item  {\tt NoProxy} directive to allow the proxy to do direct
     access to some servers instead of connecting to another proxy
     server (contains a list of domain names).  This works exactly
     like the {\tt no\_proxy} environment variable on clients.
     (Thanks to Rainer Klute for the patch!)  This is only necessary
     when running multiple proxy servers that connect to each other.

\item  Fixed a bug that sometimes caused time directives to be parsed
     incorrectly (e.g. {\tt CacheDefaultExpiry}).

\item  Multilanguage addition to allow server to understand e.g. that
     British English is also English, and that the US citizens do
     understand it (thanks to Toshihiro Takada for the patch!).

\item  Removed:
	\begin{itemize}
	\item  {\tt GcReqInterval} and
	     {\tt GcTimeInterval} - not very good criteria for
	     starting garbage collection ({\tt GcDailyGc} is
	     better, giving the actual time to launch gc)
	\item  cache access logging to individual logfiles according to
	     remote host (wasted resources - a separate program is better
	     for collecting this information from a single log file).
	\item  {\tt -a} and {\tt -R} options (never used).
	\item  {\tt BodyTimeOut} replaced by {\tt ScriptTimeOut}
	\item  {\tt include}s from Makefiles (not supported by
	     all the {\tt make}s).
	\item  {\tt \#elif} preprocessor directive removed (wasn't
	     supported by all the HP preprocessors)
	\end{itemize}
\end{itemize}
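
The shortest-match rule for asterisks described in the list above can be
illustrated with a small Python sketch.  This is not the server's actual
code: the function {\tt apply\_rule}, its {\tt lazy} switch and the host
{\tt www.example.org} are made up for the illustration, and backslash
escapes and the special handling of the tilde are left out.  The sketch
relies on non-greedy regular expression groups, which pick the shortest
possible run for each asterisk from left to right.
\begin{verbatim}
import re

def template_regex(template, lazy=True):
    """Compile a rule template, turning every '*' into a capture group."""
    star = "(.*?)" if lazy else "(.*)"   # shortest vs. longest match
    literals = [re.escape(part) for part in template.split("*")]
    return re.compile(star.join(literals))

def apply_rule(template, result, url, lazy=True):
    """Return the mapped path, or None if the URL does not match."""
    match = template_regex(template, lazy).fullmatch(url)
    if match is None:
        return None
    pieces = result.split("*")
    if len(pieces) - 1 != len(match.groups()):
        raise ValueError("template and result need the same number of '*'")
    # Copy each captured run into the corresponding '*' of the result.
    out = pieces[0]
    for captured, literal in zip(match.groups(), pieces[1:]):
        out += captured + literal
    return out

url = "http://www.example.org/docs/guide/index.html"
print(apply_rule("http://*/*", "/mirror/*/http/*", url))
print(apply_rule("http://*/*", "/mirror/*/http/*", url, lazy=False))
\end{verbatim}
With shortest matching the first asterisk captures only the hostname, so
the first call prints
{\tt /mirror/www.example.org/http/docs/guide/index.html}.  With longest
matching the first asterisk swallows the path as well, and the second
call prints {\tt /mirror/www.example.org/docs/guide/http/index.html}.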

\par 


\end{document}