\author{Generated from the Hypertext}\title{CERN Server User Guide} \maketitle \cleardoublepage \pagenumbering{roman} \setcounter{page}{1} \tableofcontents \cleardoublepage \pagenumbering{arabic} \setcounter{page}{1}
\chapter{{}
CERN httpd 3.0
Guide for Prereleases}
The CERN WWW server
\lbrack {\tt httpd}, HyperText Transfer Protocol Daemon\rbrack is a generic,
full-featured server for serving files using the HTTP protocol.
This is a TCP/IP-based protocol running by convention on port 80. \par
Files can be real or synthesized, produced by scripts generating
virtual documents. The server also handles clickable images, fill-out forms,
and searches. \par
CERN {\tt httpd} can also be run as a proxy server to allow people behind firewalls
to use the Web as if the firewall were not present. A powerful
feature is caching performed by the
proxy, which makes {\tt cern\_httpd} attractive as a proxy even to
those not behind a firewall. \par
\begin{itemize}
\item This documentation is also available in PostScript.
\item Documentation for older versions is still available: \lbrack 2.14 or older\rbrack \lbrack 2.15\rbrack \lbrack 2.16\rbrack \lbrack 2.17 \& 2.18\rbrack .
\item If you upgrade, see also the release notes for \lbrack 2.15\rbrack \lbrack 2.16\rbrack \lbrack 2.17\rbrack \lbrack 2.18\rbrack \lbrack 3.0pre1-3\rbrack .
\item {\bf Current VMS Version is 2.16beta. See
distribution.} See also Foteos Macrides' fixes. \par
\end{itemize}
\par
\section{In This Guide...}
\begin{DL}{allow this much space}
\item[Installation
] The steps necessary to install CERN server.
\item[Administration
] How to set up document protection, index search, clickable
images, server-side scripts, ...
\end{DL}
\par
\section{About documents generated from hypertext}
Paper manuals generated from hypertext
are made for convenience, for example
for reading when one has no computer
to turn to. We have tried to make
the hypertext into fairly conventional
paper documents, but they may seem
a little strange in some ways.\par
All the links have been removed.
Therefore, it is worth looking at
the table of contents to see what
there is in the manual. Something
which is not explained in place may
be explained in detail elsewhere.\par
We have tried to keep related matter
together, but sometimes
you may have to check the table
of contents to find it.\par
Please remember that these are for
the most part "living documents".
That is, they are constantly changing
to reflect current knowledge. If
you see a statement such as "Product
xxx does not support this feature",
remember that it was the case when
the document was generated, and may
not be the same now. So if in doubt,
check the online version. Of course,
the living document may be out of
date too, in which case it is helpful
to mail its author.
\chapter{{}
Installing CERN Server}
{\bf VMS note:}
There are special instructions if you are
installing under VMS. \par
\par
\section{Getting the Program}
The CERN server distribution is available by anonymous ftp from
{\tt info.cern.ch}.
Often you don't need to compile the server yourself, because precompiled
binaries are available for many Unix platforms.
If there is no precompiled version for your platform, or if it doesn't
work (e.g. the name resolution doesn't work), you should get the
source code and compile it yourself.
\begin{itemize}
\item Precompiled versions can be found under the directory
{\tt ftp://info.cern.ch/pub/www/bin}
(in the subdirectory corresponding to your machine architecture). \par
\item The source code is in
{\tt ftp://info.cern.ch/pub/www/src/cern\_httpd.tar.Z}. \par
Compilation:
\begin{itemize}
\item Uncompress and untar the distribution tar file:
\begin{verbatim}
uncompress cern_httpd.tar.Z
tar xvf cern_httpd.tar
\end{verbatim}
\item Go to the newly created {\tt WWW} directory and run
{\tt ./BUILD}:
\begin{verbatim}
cd WWW
./BUILD
\end{verbatim}
\item The executable {\tt httpd} appears in the directory
{\tt .../WWW/Daemon/sun4} (if you have a Sun4
machine), or in another subdirectory corresponding to
your machine architecture. The utility programs go to
the same directory
({\tt htadm},
{\tt htimage},
{\tt cgiparse} and
{\tt cgiutils}).
\end{itemize}
\end{itemize}
\par
\section{Configuration File}
\begin{itemize}
\item {\tt httpd} requires a configuration file; the
default configuration file is {\tt /etc/httpd.conf}.
If this doesn't suit you, you can specify another location for
it using the {\tt -r } option:
\begin{verbatim}
httpd -r /other/place/httpd.conf
\end{verbatim}
\item Sample configuration
files are available from
\begin{itemize}
\item directory {\tt cern\_httpd/config} inside the
binary distribution, or
\item under {\tt WWW/server\_root} inside the source code
distribution.
\item If this is missing you can get them from
{\tt ftp://info.cern.ch/pub/www/src/server\_root.tar.Z}
\end{itemize}
\end{itemize}
If you have all your documents in a single directory tree, say
{\tt /Public/Web}, the easiest way to make them available to
the world is to specify the following rule in your configuration file:
\begin{verbatim}
Pass /* /Public/Web/*
\end{verbatim}
This maps every request onto the directory tree
{\tt /Public/Web} and accepts it, so a request for
{\tt /document.html} is served from {\tt /Public/Web/document.html}. \par
The default welcome document (what you get with URL of form
{\tt http://your.host/}) is now {\tt Welcome.html} in
the directory {\tt /Public/Web}. \par
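As a rough illustration of how such a rule rewrites a request, the
following shell sketch mimics the substitution for a template and a
result that each end in a single {\tt *}. The function name
{\tt map\_pass} and the sample document name are assumptions made for
this example only; the real matching is done inside {\tt httpd} and
may handle more cases than this.

```shell
# Sketch only: mimic "Pass template result" for patterns ending in one "*".
# map_pass is a hypothetical helper, not part of the httpd distribution.
map_pass() {
    template=$1 result=$2 path=$3
    prefix=${template%\*}              # text before the trailing "*"
    case $path in
        "$prefix"*)
            # Put whatever the "*" matched after the result's prefix.
            printf '%s\n' "${result%\*}${path#"$prefix"}" ;;
        *)
            return 1 ;;                # the rule does not apply
    esac
}

# Example (quote the patterns so the shell does not expand them):
#   map_pass '/*' '/Public/Web/*' /document.html
# prints /Public/Web/document.html
```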
\par
\section{First Trying It Out In Verbose Mode}
It is easy to make mistakes in the configuration file, which can make
configuring {\tt httpd} feel tedious; this doesn't have to be
so. In the beginning, start {\tt httpd} by hand in verbose mode
to listen to some port, and see what happens when you make a request
to that port with your browser. \par
Typically test servers are run on a non-privileged port above 1024
(you don't have to be {\tt root} to bind to them), often 8001,
8080, or such. The official HTTP port is 80. \par
The server port is defined in the configuration file with the {\tt Port} directive,
but you can override it with the {\tt -p } command line option
while testing; e.g.
\begin{verbatim}
httpd -v -r /home/you/httpd.conf -p 8080
\end{verbatim}
This will start {\tt httpd} in verbose mode, use configuration
file {\tt httpd.conf} in your home directory, and accept
connections to port 8080. \par
You can now try to request a document from your server using a URL of
the form:
\begin{verbatim}
http://your.host:8080/document.html
\end{verbatim}
where {\tt document.html} is relative to the directory that you
have exported in your configuration file. \par
If you get an error message back see the verbose output to find out
what is going wrong - it is usually self-explanatory. \par
And remember, you should always feel free to ask advice from
{\bf httpd@info.cern.ch}. \par
\par
\section{The Actual Installation of httpd}
In Unix you can run the server either as stand-alone, or from
Internet Daemon {\tt (inetd)}.
A stand-alone server is typically started once at system-boot time.
It waits for incoming connections, and forks itself to serve a
request. {\bf This is much faster} than letting
{\tt inetd} spawn {\tt httpd} every time a request
comes. {\bf We therefore recommend that you run CERN httpd in
stand-alone mode.} \par
\subsection{Stand-alone Installation}
A stand-alone server is started from the bootstrap
command file (for example {\tt /etc/rc.local}) so that it runs
continuously, like the {\tt sendmail} daemon. \par
This method has the advantage over using the {\tt inetd} that
the response time is reduced. \par
Add a line starting {\tt httpd} to your system startup file
(usually {\tt /etc/rc.local} or {\tt /etc/rc}). If you
have the configuration file in the default place,
{\tt /etc/httpd.conf}, and if it specifies the port to listen
to via the {\tt Port} directive, you don't need any command
line options:
\begin{verbatim}
/usr/etc/httpd &
\end{verbatim}
{\tt httpd} will automatically put itself in the background, so there is really no
need for an ampersand at the end (as long as your configuration file
{\tt /etc/httpd.conf} really exists). \par
Or, a little more safely, in case {\tt httpd} has been removed:
\begin{verbatim}
if [ -f /usr/etc/httpd ]; then
(/usr/etc/httpd && (echo -n ' httpd') ) >/dev/console &
fi
\end{verbatim}
Naturally you can use any of the
command line options, if necessary. \par
\par
\section{Registering Your Server}
Once you have your {\tt httpd} up and running, and you have
documents to show the world, announce
your server, so that others can find it. \par
\par
\section{If It Doesn't Work...}
...first run it in verbose mode with the {\tt -v } option and
try to figure out what goes wrong. See also the debugging chart and the FAQ. If you can't figure out what's going
wrong, feel free to send mail to {\bf httpd@info.cern.ch}. \par
\par
\section{
{}
Installing httpd Under inetd}
This is how to set up {\tt inetd} to run {\tt httpd}
whenever a request comes in. (These steps are the same for any daemon
under Unix: you will probably find that a similar thing has been done for
the FTP daemon, {\tt ftpd}, for example.) \par
\par
\subsection{Step 1: Install httpd Binary}
Copy {\tt httpd} into a suitable directory such as
{\tt /usr/etc}. Make it owned by {\tt root}, and make
it writable only by {\tt root}, for example by saying:
\begin{verbatim}
chmod 755 httpd
\end{verbatim}
\par
\subsection{Step 2: Add http Service to /etc/services}
Put "http" in the {\tt /etc/services} file, or use the name of
a specific service of your own if you want to use a special port
number. Standard port number for HTTP is 80.
\begin{verbatim}
http 80/tcp # WWW server
\end{verbatim}
{\bf Exceptions:}
\begin{itemize}
\item On a NeXT, see the section on using the NetInfoManager.
\item On any machine running NIS (yellow pages), see the special instructions.
\end{itemize}
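The addition to the services file described in this step can be
scripted. The following is a minimal sketch, assuming a plain services
file with one entry per line; the function name
{\tt add\_http\_service} is hypothetical, and the file to edit is passed
as an argument so that the sketch can be tried on a copy first.

```shell
# Sketch only: append the http entry to a services file unless one exists.
# add_http_service is a hypothetical helper; pass the file as an argument
# (for example a copy of /etc/services) and run as a user who may write it.
add_http_service() {
    services=$1
    # Look for a line whose first word is "http".
    if grep -q '^http[[:space:]]' "$services"; then
        echo "http is already listed in $services"
    else
        printf 'http\t80/tcp\t# WWW server\n' >> "$services"
        echo "added http 80/tcp to $services"
    fi
}
```

Running it twice on the same file leaves a single {\tt http} entry, so it
is safe to repeat.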
\par
\subsection{Step 3: Add a Line to /etc/inetd.conf}
Put a line in the internet daemon configuration file,
{\tt /etc/inetd.conf}:
\begin{verbatim}
http stream tcp nowait root /usr/etc/httpd httpd
\end{verbatim}
The first word is the same as in the {\tt /etc/services} file. \par
If you want to pass command line options or
parameters to {\tt httpd}, they are listed at the end
of the line, for example to set the rule file to something other than the
default {\tt /etc/httpd.conf}:
\begin{verbatim}
http stream tcp nowait root /usr/etc/httpd httpd -r /my/own/rules
\end{verbatim}
{\bf Note:} For {\tt httpd} version 2.15 and later
we recommend that it is run as user {\tt root}.
Running {\tt httpd} as {\tt root} is safe, since it
automatically resets its user-id to {\tt nobody}. However, if
you decide to use access authorization features, and you need to serve
protected files, {\tt httpd} will have to be able to set its
user-id to some other uid as well. In any case, {\tt httpd}
always sets its user-id to something other than {\tt root}
before serving the file to the client. \par
{\bf Note:} {\tt /etc/inetd.conf} syntax varies from
system to system; for example, not all systems have the field
specifying the user name, in which case the default is
{\tt root}. If in doubt, copy the format of other lines in
your existing {\tt inetd.conf}. \par
{\bf Note:} There seems to be a limit of 4 arguments passed
across by {\tt inetd}, at least on the NeXT. \par
\par
\subsection{Step 4: Send HUP Signal to inetd }
When you have updated {\tt inetd.conf},
find out the process number of {\tt inetd}, and send a "HUP"
signal to it. \par
For example on BSD unix do this:
\begin{verbatim}
> ps -aux | grep inetd | grep -v grep
root 85 0.0 0.9 1.24M 304K ? S 0:01 /usr/etc/inetd
> kill -HUP 85
\end{verbatim}
For System V, use {\tt ps -el} instead of {\tt ps -aux}.
Be aware that on some systems your local file {\tt /etc/services} may not be
consulted by your system (see notes on debugging). \par
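Picking the process number out of the {\tt ps} listing can be left to
a small filter. The sketch below assumes the BSD column layout shown in
the example above (process number in the second column, command in the
tenth); the function name {\tt inetd\_pid} is hypothetical, and the
column numbers will differ for {\tt ps -el}.

```shell
# Sketch only: read BSD-style "ps -aux" output on standard input and print
# the process number of the inetd daemon.
# inetd_pid is a hypothetical helper. It assumes the layout of the sample
# listing: process number in column 2, command name in column 10.
inetd_pid() {
    awk '$10 ~ /(^|\/)inetd$/ { print $2 }'
}

# Example:
#   ps -aux | inetd_pid
# The number printed is the one to give to "kill -HUP".
```

Matching on the command column also skips the {\tt grep inetd} process
itself, which is why no {\tt grep -v grep} is needed here.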
\par
\subsection{Test It!}
\par
\subsection{{} Using NIS (Yellow Pages)}
If your machine is running Sun's "Network Information Service",
originally known as "yellow pages", read this.\par
You must:
\begin{itemize}
\item First make an addition to the {\tt /etc/services}
file just as for a normal unix system.
\item Then, change directory to {\tt /var/yp}
and run {\tt make}.
\end{itemize}
This will load the {\tt /etc/services} file into the NIS
information system.\par
Some people have found that they needed to reboot the system afterward
for the change to take effect. \par
\par
\subsection{{} Adding a Service on the NeXT}
The NeXT uses the "netinfo" database instead of the
{\tt /etc/services} file. This is managed with the
{\tt /NextAdmin/NetInfoManager} application. Here's how to add
the service {\tt http}:
\begin{itemize}
\item Start the NetInfoManager by double-clicking on its icon. \par
\item If you are operating in a cluster, open either your local
domain {\tt (/hostname)} or if you have authority, the
whole cluster domain {\tt (/)}. If you're not in a
cluster, just use the domain you are presented with. \par
\item Select {\tt "services"} from the browser tree. \par
\item Select {\tt "ftp"} from the list of services. \par
\item Select {\tt "duplicate"} from the edit menu. \par
\item Select {\tt "copy of ftp"} and double-click on its icon
to get the property editor. \par
\item Click on {\tt "name"} and then on the value {\tt "copy
of ftp"}. Change this to {\tt "http"} by typing
"http" in the window at the bottom, and hitting return.
\item Click on {\tt "port"}, and then on the value
{\tt 21}. Change it to {\tt 80}. \par
\item Use {\tt "Directory:Save"} menu
{\tt (Command/s)} to save the result. You will have to
give a root password or netinfo manager password. \par
\end{itemize}
\par
\section{{} Privileged ports}
The TCP/IP port numbers below 1024 are special in that normal users
are not allowed to run servers on them. This is a security feature, in
that if you connect to a service on one of these ports you can be
fairly sure that you have the real thing, and not a fake which some
hacker has put up for you. \par
The normal port number for W3 servers is port 80. This number has
been assigned to WWW by the Internet Assigned Numbers Authority, IANA.
\par
When you run a server as a test from a non-privileged account, you
will normally test it on other ports, such as 2784, 5000, 8001 or
8080. \par
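The rule can be stated in a couple of lines of shell. This is only an
illustration of the boundary at 1024; the function name
{\tt port\_class} is hypothetical.

```shell
# Sketch only: report whether a TCP port number falls in the privileged range.
# port_class is a hypothetical helper for illustration.
port_class() {
    if [ "$1" -lt 1024 ]; then
        echo privileged        # only root may listen here, e.g. port 80
    else
        echo unprivileged      # any user may listen here, e.g. port 8080
    fi
}
```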
\par
\subsection{Under Unix}
The Internet Daemon {\tt inetd} (running as root) can listen
for incoming connections on port 80 and pass them down to a process
with a safer uid for the server itself. However, the
{\tt httpd} versions 2.14 and later can be safely run as
{\tt root} since they automatically change their user-id to
{\tt nobody} or some other user-id depending on server setup.
\par
\par
\subsection{Under VMS }
Under UCX, the process running as a server needs BYPASS privilege to
listen to ports below 1024. This might mean you have to install the
server. With other TCP/IP packages, privilege of some sort is
similarly required. \par
\par
\section{{} Debugging httpd}
Suppose you think you have installed
{\tt httpd} but it doesn't work.
Here we assume you have
used port 80. If you have a situation
not handled by this problem-solving
guide, please mail {\tt httpd@info.cern.ch}. \par
\par
Type
\begin{verbatim}
www http://myhost.domain/
\end{verbatim}
What happens?
\par
\subsection{Connection Refused}
The browser tries to connect to the daemon but gets this status in the
trace. \par
This means that nobody was listening on that port number. Check the
port numbers match between server and client. Make sure you specify
the port number explicitly in the document address for
{\tt www}.\par
If you are running the daemon standalone (as you should be), check
that it is actually running by taking a list of processes, and that it
is listening to the correct port (specified with {\tt -p }
{\it port\/} option), or try running it from the terminal with
{\tt -v} option as well. The trace for the server should say
{\tt "socket, bind and listen all ok".} If it does, and you
still get "{\tt connection refused}", then you must be talking
to the wrong host (or, conceivably, different ethernet adapters on the
same host).\par
If you are running with the inet daemon, then check both the services
file {\tt (/etc/services)} or database (yellow pages, netinfo)
if your system uses it, and the {\tt /etc/inetd.conf} file.
Check the service name matches between these two (e.g.
{\tt http}).\par
Did you remember to kill -HUP the {\tt inetd} when you changed
the {\tt inetd.conf} file? \par
{\em Be aware that on some systems your local file
{\tt /etc/services} will not be consulted.\/} For example, when
{\tt ypbind} is running on Suns, you should type
\begin{verbatim}
ypwhich -m services
\end{verbatim}
and ask the administrator of the machine named to change its own
{\tt /etc/services}. \par
Try running the daemon from a shell
window to see better what happens. \par
\par
\subsection{Cannot Connect To Information Server}
The usual cause of this is that the server is not running, or it's
running on a different port. \par
There is more information you can get. Use the "verbose" option on
the LineMode browser to find out what went wrong:
\begin{verbatim}
www -v http://myhost.domain:80/
\end{verbatim}
\par
What do you get? A load of trace messages. There are several cases.
\begin{itemize}
\item The browser can't look up the name of the host. If it can, it
will display "Parsed address as" message. If not, try fixing
your name server or {\tt /etc/hosts} file, or quoting
the IP number of the host in decimal notation (like
128.141.77.45) instead. \par
\item The browser can get to the host but gets
{\tt Connection refused} status back. \par
\item Your browser gets an error number but prints "error message
not translated". This is because when it was compiled on your
platform it didn't know what form the error message table
took. Try the same thing from a Unix platform, for example. \par
\item You get some network error like "network unreachable".
Depending on whether the IP network is your responsibility or
not, and your attitude to life, either fix it, try again in an
hour's time, or complain to someone. \par
\end{itemize}
\par
\subsection{Unable To Access Document}
The typical cause of this is that the configuration file is incorrect, or
files are not readable by the user-id under which the server runs.
When you are running the server as {\tt root}, it will
automatically switch its user-id to {\tt nobody} just before serving the
document. This can be changed with the {\tt UserId}
configuration directive. \par
\par
\subsection{An Empty Document Is Displayed}
The document sent back is empty, but there is no error message.\par
The {\tt inetd} has started a process to run your server but it
immediately failed. Possibilities include:
\begin{itemize}
\item When running from {\tt inetd},
the daemon may not be in the file specified, or may not be
executable by the specified user (or, if a user id is not
specified in your variety of {\tt inetd.conf},
{\tt root}). \par
\item For some reason the server crashes when it's trying to serve
the request. If you can, try to track down when this happens,
and send mail to {\tt httpd@info.cern.ch}.
Try running the daemon from a terminal
window to see what happens. \par
\item A script fails to produce any
result, which may be because there is no empty
line after the header section output by the script, causing
the server to read the entire generated document as the header
section. \par
\end{itemize}
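The last case above is easy to get wrong, so here is a minimal sketch
of script output with the empty line in place. The function name
{\tt hello\_document} and the document text are made up for the
example; the {\tt Content-Type} header is an assumption about what a
typical script would send.

```shell
# Sketch only: a script whose output has a header section, then an empty
# line, then the document. hello_document is a hypothetical example.
hello_document() {
    printf 'Content-Type: text/html\n'       # header section
    printf '\n'                               # the empty line ends the header section
    printf '<TITLE>Hello</TITLE>\n'           # everything after it is the document
    printf '<H1>Hello from a script</H1>\n'
}
```

If the second {\tt printf} is left out, the server has no way to tell
where the header section stops, which matches the symptom described
above.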
\par
\subsection{Document Address Invalid Or Access Not Authorized...}
...or some similar kind of error message.
This means either:
\begin{itemize}
\item You have been passed a bad document address. If you are following
a link, check with the author of the document which contained the
link.
\item The document has been moved. Check with the server administrator.
You should be able to find out who runs the server by going to the
welcome page (type "g /" with the line mode browser) and seeing a link
to information about the maintainers.
\end{itemize}
If you are the server administrator, and you can't understand why the
daemon refuses to deliver the file,
\begin{itemize}
\item Check the configuration file (rule
file, by default {\tt /etc/httpd.conf}) if you have one. Think
through the way the document name will be mapped successively by each line,
and what the result will be. \par
\item Run the daemon in debug mode from a
terminal session to get trace information. \par
\end{itemize}
\par
\subsection{Bad Output}
A document is displayed, but not the one you wanted. \par
These are some ideas:
\begin{itemize}
\item Try running the server from the terminal. \par
\item Check the HTML source the daemon produces with
\begin{verbatim}
www -source http://my.host.domain/
\end{verbatim}
\item Try telnetting to httpd and
simulating the client:
\begin{verbatim}
> telnet my.host.domain 80
Connected to my.host.domain on port 80
Escape is ^[
GET /document/name
\end{verbatim}
\end{itemize}
\par
\par
\subsection{{} Running Under Shell}
You don't have to run the daemon under the {\tt inetd} if it
doesn't work (and we recommend running it standalone anyway). You can
run it from a shell session.\par
Run {\tt httpd} from your terminal with a different
port number, such as 8080:
\begin{verbatim}
httpd -p 8080
\end{verbatim}
{\bf Note:} You must be {\tt root} (under VMS, have
some privilege) to run with a port number below 1024. If you select a
port above 1024, then you can run as a normal user. This way, anyone
can publish files on the net. However, it isn't very reliable, as
your server will not automatically come back up if the machine is
rebooted. In the long term it is best to install it to be started from
the system startup file {\tt /etc/rc} or
{\tt /etc/rc.local}. \par
You may not be able to use a port number which has been used by a
daemon process recently (port may still be bound), so you may have to
switch port number if you {\char94}C and restart {\tt httpd}. When it
is running like this, you can also read the debugging messages (when
running with {\tt -v} option), and use a debugger on it if
necessary. (See also: telnetting to the
server). \par
\par
\subsubsection{Debugging using Trace}
If you can't understand why a server refuses to give back a document,
then run with the {\tt -v} option to turn on debugging
messages. Use {\tt -v} as the very first command line option
(this way debugging is turned on right away). You will see the daemon
setting up the rules for translating requests into local URLs, and you
will see its attempt to access the file (assuming you map requests onto
files).
\begin{verbatim}
httpd -v -p 8080
\end{verbatim}
Try to access the document from a client using another terminal
window. Look at the debugging output. It will probably explain what
is happening.
If you still can't figure out the problem, mail your local guru or help
desk or, if desperate, {\tt httpd@info.cern.ch},
{\bf enclosing} a copy of the debugging output. \par
\par
\subsubsection{Even simpler}
For testing a daemon very simply,
without using a client, you can make
the terminal be the client. With
{\tt httpd} try just running
it with the terminal and typing {\tt GET} {\it /document/url\/}
into its input:
\begin{verbatim}
httpd -v
GET /document/url
\end{verbatim}
\par
\subsection{{} Telnetting to httpd}
Most implementations of telnet allow you to specify a port number.
Under unix this is often just a second parameter, under VMS a
{\tt /PORT} option. \par
HTTP is a line-oriented text protocol in the telnet style, so you can
simulate a client just by
typing things in. This lets you see exactly what the server is sending
back, and confirms that it really is the server and not the
browser which has a problem. \par
Here is a simple example (the keyboard input is the {\tt telnet} command and the {\tt GET} line):
\begin{verbatim}
> telnet myhost.domain 80
Connected to myhost.domain on port 80
Escape is ^[
GET /document/url
...document or error message...
\end{verbatim}
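The request typed in the session above is a single line. As a small
sketch, the hypothetical function {\tt simple\_request} below prints
that line for a given document name, ending it with a carriage return
and line feed as a telnet client would send it; it does not open the
connection itself.

```shell
# Sketch only: print the one-line request typed in the telnet session above.
# simple_request is a hypothetical helper; the line ends with CR LF.
simple_request() {
    printf 'GET %s\r\n' "$1"
}

# Example:
#   simple_request /document/url
# prints the line "GET /document/url"
```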
\par
\chapter{{} Command Line of CERN httpd}
The command line syntax for {\tt httpd} allows a number of
options and an optional directory argument:
\begin{verbatim}
httpd [-opt -opt -opt ...] [directory]
\end{verbatim}
The directory argument, if present, indicates the directory to be
exported. If not present, either a rule file is used, to export
combinations of directories, or else the default is to export the
{\tt /Public} directory tree. \par
\par
\section{Options}
\begin{DL}{allow this much space}
\item[ {\tt -r } {\it rulefile\/}
] Use {\it rulefile\/} as configuration file. {\bf This is the
only necessary command line option} if you don't have the
default configuration file, {\tt /etc/httpd.conf}. All the
other options can be given as directives in the configuration file.
\item[ {\tt -p } {\it port\/}
] Listen to port {\it port\/}. Without this argument
{\tt httpd} assumes that it has been run by
{\tt inetd}, and uses
{\tt stdin} and {\tt stdout} as its communication
channel. {\bf Note} that port numbers under 1024 are
privileged.
\item[ {\tt -l } {\it logfile\/}
] Use {\it logfile\/} to log the requests.
\item[ {\tt -restart}
] Restart an already running {\tt httpd}.
{\tt httpd} finds out the process number of the
running server from
{\tt PidFile}
and sends it the {\tt HUP} signal (HangUP). This will
cause {\tt httpd} to reload its configuration files and
reopen its log files. {\bf Important:} To find out the
{\tt PidFile} {\tt httpd} will have to read the
same configuration file as the running {\tt httpd} has, so
you have to specify the same {\tt -r } options on the
command line as for the actual {\tt httpd}.
\item[ {\tt -gc\_only}
] \lbrack only for proxies\rbrack
Do only garbage collection and then exit. This can be used to
run {\tt httpd} periodically by {\tt cron} to do
garbage collection on a cache that is used by {\tt httpd}
run from the {\tt inetd} daemon rather than standalone.
When {\tt httpd} is not running standalone it cannot
monitor the cache, nor perform automatic garbage collection.
\item[ {\tt -v}
] Verbose, turn on debugging messages.
\item[ {\tt -vv}
] Very Verbose, turn on even more verbose debugging messages.
\item[ {\tt -version}
] Print version number of {\tt httpd} and
{\tt libwww} (the WWW Common Library).
\end{DL}
\par
\subsection{Directory Browsing}
You can also set these with the {\tt DirAccess}
configuration directive.
\begin{DL}{allow this much space}
\item[ {\tt -dy}
] Enable directory browsing. Directories are returned as hypertext
documents. See browsing
directories. {\em Default.\/}
\item[ {\tt -dn}
] Disable directory browsing. An attempt to access a directory will
generate an error response.
\item[ {\tt -ds}
] Selective directory browsing; enabled only for directories
containing a file named {\tt .www\_browsable}
\end{DL}
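For selective browsing, the test applied to each directory can be
pictured as follows. This is a sketch of the rule as described for
{\tt -ds}, not the server's own code; the function name
{\tt is\_browsable} is hypothetical.

```shell
# Sketch only: decide whether a directory would be listed under -ds.
# is_browsable is a hypothetical helper mirroring the rule described above:
# the directory must contain a file named .www_browsable.
is_browsable() {
    [ -d "$1" ] && [ -f "$1/.www_browsable" ]
}

# Example: mark a directory as browsable, then check it.
#   touch /Public/Web/pub/.www_browsable
#   is_browsable /Public/Web/pub && echo "will be listed"
```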
\par
\subsection{README Feature}
It is common practice to put a file named {\tt README} into a
directory containing instructions or notices to be read by anyone new
to the directory. {\tt httpd} will by default embed any
{\tt README} file in the hypertext version of a directory. \par
You can also set these with the {\tt DirReadme}
configuration directive.
\begin{DL}{allow this much space}
\item[ {\tt -dt}
] For any browsable directory which contains a {\tt README}
file, include the text of the {\tt README} file at the top
of the document before the listing. {\em Default.\/}
\item[ {\tt -db}
] As {\tt -dt} but put the {\tt README} at the
bottom, after the listing. The {\tt -db} and
{\tt -dt} options may be combined with {\tt -dy} as
{\tt -dyb}, {\tt -dty} etc.
\item[ {\tt -dr}
] Disables the {\tt README} inclusion feature.
\end{DL}
\par
\section{Examples}
\begin{verbatim}
httpd -r /usr/etc/httpd.conf -p 80
\end{verbatim}
This is a standalone server running on port 80. Configuration file is
{\tt /usr/etc/httpd.conf} instead of the default,
{\tt /etc/httpd.conf}. \par
{\bf Note} that if the {\tt Port} directive is given in the
configuration file the {\tt -p } option is not necessary (it
can be used to override the value set in the configuration file). \par
\begin{verbatim}
httpd
\end{verbatim}
{\tt httpd} uses its default configuration file
{\tt /etc/httpd.conf}. If that file doesn't exist,
{\tt httpd} exports the {\tt /Public} directory tree.
This tree may contain soft links to other directory trees. \par
If the configuration file {\tt /etc/httpd.conf} doesn't define
the port number to listen to,
{\tt httpd} reads from its {\tt stdin} and
writes to its {\tt stdout}; that is, it expects to be run by
{\tt inetd}. \par
\begin{verbatim}
httpd -r /usr/local/lib/httpd.conf
\end{verbatim}
The same as before, but uses {\tt /usr/local/lib/httpd.conf} as
a rule file instead of the default {\tt /etc/httpd.conf}. \par
\par
\chapter{{} Configuration File of CERN httpd}
The configuration file (often referred to as the rule file)
defines how {\tt httpd} will translate a request into
a document name. The directives controlling
{\tt httpd} features are also put into the
configuration file, as well as protection configuration.
This is essential to prevent unauthorized access to your
private documents. \par
\section{Default Configuration File}
By default, the configuration file {\tt /etc/httpd.conf} is
loaded, unless specified otherwise with the {\tt -r} command line
option:
\begin{verbatim}
httpd -p 80 -r /your/own/httpd.conf
\end{verbatim}
See also example configuration files. \par
\section{Comments in Configuration File}
Each line consists of an operation code and one or two parameters,
referred to as the template and the result. Lines starting with a
hash sign {\tt \#} are ignored, as are empty lines. \par
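To see only the lines that take effect, comments and empty lines can
be filtered out. The sketch below follows the description above
literally (a hash sign in the first column, or a completely empty
line); the function name {\tt active\_lines} is hypothetical and is not
an {\tt httpd} option.

```shell
# Sketch only: print the lines of a configuration file that httpd would act
# on, dropping lines that start with "#" and empty lines.
# active_lines is a hypothetical helper, not an httpd option.
active_lines() {
    grep -v -e '^#' -e '^$' "$1"
}

# Example:
#   active_lines /etc/httpd.conf
```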
\par
\section{Restarting the Server}
When you are running the server in standalone mode (not from
{\tt inetd}), and modify the configuration file, send the
{\tt HUP} signal to {\tt httpd} to make it re-read the
configuration file. You can find out the process number from the pid file written by httpd, e.g.
\begin{verbatim}
> cat /server_root/httpd-pid
2846
> kill -HUP 2846
>
\end{verbatim}
{} You must specify the
configuration file as an {\bf absolute pathname} for the {\tt -r} option because
when the server is started in standalone mode it changes its current
directory to {\tt /} so after startup it cannot reload
configuration files that were specified with relative filenames. \par
To make restarting easier {\tt httpd } has a {\tt -restart
} option, which will automatically send the HUP signal to
another {\tt httpd} process. {\bf Important:} To find out the
{\tt PidFile} {\tt httpd} will have to read the same
configuration file as the running {\tt httpd} has, so you have
to specify the same {\tt -r } options on the command line as
for the actual {\tt httpd}, e.g.
\begin{verbatim}
> httpd -r /usr/etc/httpd.conf -restart
Restarting.. httpd
Sending..... HUP signal to process 21379
>
\end{verbatim}
\par
\section{Exhaustive List of Configuration Directives}
\begin{itemize}
\item General settings:
\begin{itemize}
\item {\tt ServerRoot}
\item {\tt HostName}
\item {\tt Port}
\item {\tt PidFile}
\item {\tt UserId}
\item {\tt GroupId}
\item {\tt Enable}
\item {\tt Disable}
\item {\tt IdentityCheck}
\item {\tt Welcome}
\item {\tt AlwaysWelcome}
\item {\tt UserDir}
\item {\tt MetaDir}
\item {\tt MetaSuffix}
\item {\tt MaxContentLengthBuffer}
\end{itemize}
\item URL translation rules:
\begin{itemize}
\item {\tt Map}
\item {\tt Pass}
\item {\tt Fail}
\item {\tt Redirect}
\item {\tt Protect}
\item {\tt DefProt}
\item {\tt Exec}
\end{itemize}
\item Filename suffix definitions:
\begin{itemize}
\item {\tt AddType}
\item {\tt AddEncoding}
\item {\tt AddLanguage}
\item {\tt SuffixCaseSense}
\end{itemize}
\item Accessory scripts:
\begin{itemize}
\item {\tt Search}
\item {\tt POST-Script}
\item {\tt PUT-Script}
\item {\tt DELETE-Script}
\end{itemize}
\item Directory listings:
\begin{itemize}
\item {\tt DirAccess}
\item {\tt DirReadme}
\item {\tt DirShowIcons}
\item {\tt DirShowBrackets}
\item {\tt DirShowMinLength}
\item {\tt DirShowMaxLength}
\item {\tt DirShowDate}
\item {\tt DirShowSize}
\item {\tt DirShowBytes}
\item {\tt DirShowHidden}
\item {\tt DirShowOwner}
\item {\tt DirShowGroup}
\item {\tt DirShowMode}
\item {\tt DirShowDescription}
\item {\tt DirShowMaxDescrLength}
\item {\tt DirShowCase}
\end{itemize}
\item Icons in directory listings:
\begin{itemize}
\item {\tt AddIcon}
\item {\tt AddBlankIcon}
\item {\tt AddUnknownIcon}
\item {\tt AddDirIcon}
\item {\tt AddParentIcon}
\end{itemize}
\item Logging:
\begin{itemize}
\item {\tt AccessLog}
\item {\tt ErrorLog}
\item {\tt LogFormat}
\item {\tt LogTime}
\item {\tt NoLog}
\item {\tt CacheAccessLog}
\end{itemize}
\item Timeouts:
\begin{itemize}
\item {\tt InputTimeOut}
\item {\tt OutputTimeOut}
\item {\tt ScriptTimeOut}
\end{itemize}
\item Proxy Caching:
\begin{itemize}
\item {\tt Caching}
\item {\tt CacheRoot}
\item {\tt CacheSize}
\item {\tt NoCaching}
\item {\tt CacheOnly}
\item {\tt CacheClean}
\item {\tt CacheUnused}
\item {\tt CacheDefaultExpiry}
\item {\tt CacheLastModifiedFactor}
\item {\tt CacheTimeMargin}
\item {\tt CacheNoConnect}
\item {\tt CacheExpiryCheck}
\item {\tt Gc}
\item {\tt GcDailyGc}
\item {\tt GcMemUsage}
\item {\tt CacheLimit\_1}
\item {\tt CacheLimit\_2}
\item {\tt CacheLockTimeOut}
\item {\tt CacheAccessLog}
\end{itemize}
\item Going through many proxies:
\begin{itemize}
\item {\tt http\_proxy}
\item {\tt ftp\_proxy}
\item {\tt gopher\_proxy}
\item {\tt wais\_proxy}
\item {\tt no\_proxy}
\end{itemize}
\end{itemize}
\par
\section{{}
General CERN httpd Configuration Directives}
\begin{itemize}
\item {\tt ServerRoot}
\item {\tt HostName}
\item {\tt Port}
\item {\tt PidFile}
\item {\tt UserId}
\item {\tt GroupId}
\item {\tt Enable}
\item {\tt Disable}
\item {\tt IdentityCheck}
\item {\tt Welcome}
\item {\tt AlwaysWelcome}
\item {\tt UserDir}
\item {\tt MetaDir}
\item {\tt MetaSuffix}
\item {\tt MaxContentLengthBuffer}
\end{itemize}
\par
\subsection{ServerRoot}
The server's ``home'' directory is specified with the {\tt ServerRoot}
directive. If a server root is specified, but no {\tt AddIcon} directive has been used in
the configuration file to set up icons, the default icon directory is
{\tt icons} under the server root. The default icons that should
be present are:
\begin{itemize}
\item {\tt blank.xbm} blank icon for aligning the header with listing
\item {\tt directory.xbm} for directories
\item {\tt back.xbm} for parent directory
\item {\tt unknown.xbm} for unknown types
\item {\tt binary.xbm} for binary files
\item {\tt text.xbm} for text files
\item {\tt image.xbm} for image files
\item {\tt movie.xbm} for movies
\item {\tt sound.xbm} for audio files
\item {\tt tar.xbm} for tar files
\item {\tt compressed.xbm} for compressed files
\end{itemize}
If these defaults do not suit you, you can define all icons from scratch.
As an example of the {\tt AddIcon} directive, the defaults would be
specified as follows:
\begin{verbatim}
Pass /httpd-internal-icons/* /server_root/icons/*
AddBlankIcon /httpd-internal-icons/blank.xbm
AddDirIcon /httpd-internal-icons/directory.xbm DIR
AddParentIcon /httpd-internal-icons/back.xbm UP
AddUnknownIcon /httpd-internal-icons/unknown.xbm
AddIcon /httpd-internal-icons/binary.xbm BIN binary
AddIcon /httpd-internal-icons/text.xbm TXT text/*
AddIcon /httpd-internal-icons/image.xbm IMG image/*
AddIcon /httpd-internal-icons/movie.xbm MOV video/*
AddIcon /httpd-internal-icons/sound.xbm AU audio/*
AddIcon /httpd-internal-icons/tar.xbm TAR multipart/*tar
AddIcon /httpd-internal-icons/compressed.xbm CMP x-compress x-gzip
\end{verbatim}
\subsubsection{{} On Proxy Server}
On a proxy server the icon URLs {\bf must be full URLs},
because otherwise clients would resolve them relative to the remote
host. This means that in the above example all the
{\tt AddIcon*} directives have to read:
\begin{verbatim}
AddIcon http://your.server/httpd-internal-icons/...
\end{verbatim}
{\bf and} you have to pass also the full icon URL:
\begin{verbatim}
Pass http://your.server/httpd-internal-icons/* /server_root/icons/*
\end{verbatim}
Since future smart browsers might notice that the icon server is the
same one as the proxy server it may be best in this case to also
{\tt Pass} the partial URL as above:
\begin{verbatim}
Pass /httpd-internal-icons/* /server_root/icons/*
\end{verbatim}
\par
\subsection{HostName}
On some hosts the hostname lookup fails, producing only the name
without the domain part. The full hostname is necessary when
{\tt httpd} generates references to itself (redirection
responses to clients). If necessary, provide the full server hostname
with the {\tt HostName} directive:
\begin{verbatim}
HostName full.server.host.name
\end{verbatim}
You may want to use this also when the real host name is different from
what you want the clients to see (you have a DNS alias for the host). \par
\par
\subsection{Default Port Setting}
For standalone server (the one running continuously, listening to a
certain port, and forking a child to handle the request) the port to
listen to can be defined via {\tt Port} configuration directive
instead of the {\tt -p }
{\it port\/} command line option. Normally:
\begin{verbatim}
Port 80
\end{verbatim}
The {\tt -p } {\it port\/} command line option still overrides
this default. \par
\par
\subsection{PidFile}
{\tt httpd} re-reads its configuration file when it receives
a {\tt HUP} signal \lbrack HANGUP\rbrack , the signal number 1. To make it
easy to find out the parent {\tt httpd} process id, it writes
it to a file. \par
By default, if {\tt ServerRoot} is
specified, this is the file {\tt httpd-pid} under server root;
if not, it defaults to {\tt /tmp/httpd-pid}. \par
The {\tt PidFile} directive can be used to set the process
id file name; it can be either an absolute path, or a relative one.
Relative path is relative to {\tt ServerRoot}, or if not
defined, relative to {\tt /tmp}.
\subsubsection{Example}
\begin{verbatim}
ServerRoot /Web/serverroot
PidFile logs/httpd-pid
\end{verbatim}
would cause the process id to be written to
{\tt /Web/serverroot/logs/httpd-pid}. \par
\par
\subsection{Default User Id}
The {\tt UserId} directive sets the default user to run as instead
of {\tt nobody}. This directive is only meaningful when
running the server as {\tt root}.
\begin{verbatim}
UserId whoever
\end{verbatim}
\par
\subsection{Default Group Id}
The {\tt GroupId} directive sets the default group to run under
instead of {\tt nogroup}. This directive is only meaningful
when running the server as {\tt root}.
\begin{verbatim}
GroupId whichever
\end{verbatim}
\par
\subsection{Enabling and Disabling
HTTP Methods}
You can enable/disable methods that you do/don't want your server to
accept:
\begin{verbatim}
Enable METHOD
Disable METHOD
\end{verbatim}
By default {\tt GET}, {\tt HEAD} and
{\tt POST} are enabled, and the rest are disabled. \par
\subsubsection{Examples}
\begin{verbatim}
Enable POST
Disable DELETE
\end{verbatim}
\par
\subsection{IdentityCheck}
If {\tt IdentityCheck} configuration directive is turned
{\tt On}, {\tt httpd} will connect to the ident daemon
(RFC931) of the remote host and find out the remote login name of the
owner of the client socket. This information is written to access log file, and put into the {\tt REMOTE\_IDENT }
CGI environment variable. \par
Default setting is {\tt Off}:
\begin{verbatim}
IdentityCheck Off
\end{verbatim}
and if you don't need this information you will save the resources by
keeping it off. Furthermore, this information does not provide any
more security and should not be trusted to be used in access control,
but rather just for informational purposes, such as logging. \par
\subsubsection{{}
WARNING
{}}
On some systems there is a kernel bug that causes all the connections
to the remote node to be broken if the remote ident request is not
answered (ident daemon not running, for example). This is reported
for at least SunOS 4.1.1, NeXT 2.0a, ISC 3.0 with TCP 1.3, and AIX
3.2.2, and later are ok. Sony News/OS 4.51, HP-UX 8-?? and Ultrix 4.3
still have this bug. A fix for Ultrix is available (CSO-8919). \par
\lbrack Thanks to Per-Steinar Iversen from Norway for pointing this out!\rbrack \par
If the operating system on your server host has this bug, {\bf do
not use IdentityCheck!} \par
\par
\subsection{Welcome}
{\tt Welcome} directive specifies the default file name to use
when only a directory name is specified in the URL. There may be many
{\tt Welcome} directives giving alternative welcome page names.
The one that was defined earlier will have precedence. \par
Default values are {\tt Welcome.html},
{\tt welcome.html} and {\tt index.html}.
{\tt index.html} is there only for compatibility with the NCSA
server; the word ``Welcome'' is more descriptive, and has precedence.
\par
All default values will be overridden if {\tt Welcome}
directive is used. \par
Default values could be defined as:
\begin{verbatim}
Welcome Welcome.html
Welcome welcome.html
Welcome index.html
\end{verbatim}
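As a sketch of overriding these defaults, assume a site that prefers a
page called {\tt home.html} (a hypothetical name used only for
illustration):
\begin{verbatim}
Welcome home.html
Welcome index.html
\end{verbatim}
With these two lines the built-in names {\tt Welcome.html} and
{\tt welcome.html} are no longer tried, and {\tt home.html} takes
precedence over {\tt index.html} because it is defined first. \par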
\par
\subsection{AlwaysWelcome}
By default there is no difference between directory names with and without
a trailing slash when it comes to welcome pages. The one without a
trailing slash will cause an automatic redirection to the one with a
trailing slash, which then gets mapped to the welcome page. \par
If it is desirable to have plain directory names to produce a
directory listing, and only the ones with a trailing slash cause the
welcome page to be returned, set the {\tt AlwaysWelcome}
directive to off:
\begin{verbatim}
AlwaysWelcome Off
\end{verbatim}
Default value is {\tt On}. \par
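The effect of turning the directive off can be sketched as follows.
{\tt /Demo} is a hypothetical directory URL, the comment lines are
annotations only, and the listing assumes that directory browsing is
enabled:
\begin{verbatim}
AlwaysWelcome Off
# /Demo   produces a directory listing
# /Demo/  returns the welcome page of the directory
\end{verbatim}
\par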
\par
\subsection{User-Supported Directories}
User-supported directories, URLs of the form {\bf /\~{}username}, are
enabled by the {\tt UserDir} directive:
\begin{verbatim}
UserDir dir-name
\end{verbatim}
The {\it dir-name\/} argument is the directory in each user's home
directory to be exported, for example {\tt WWW}:
\begin{verbatim}
UserDir WWW
\end{verbatim}
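As an illustration, assume a user {\tt jane} whose home directory is
{\tt /home/jane}; both the user name and the path are hypothetical.
With the setting above a request would be resolved roughly like this
(the comment lines are annotations only):
\begin{verbatim}
UserDir WWW
# Request: /~jane/plans.html
# File:    /home/jane/WWW/plans.html
\end{verbatim}
\par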
\par
\subsection{Meta-Information}
It is possible to tell {\tt httpd} to add meta-information to
response. Meta-information is stored in a directory specified by
{\tt MetaDir} directive, under the same directory as the file
being retrieved:
\begin{verbatim}
MetaDir dir-name
\end{verbatim}
Meta-information is stored in a file with the same name as the actual
document, but appended with a suffix specified via
{\tt MetaSuffix} directive:
\begin{verbatim}
MetaSuffix .suffix
\end{verbatim}
Meta-information files contain RFC822-style headers. \par
Default settings are:
\begin{verbatim}
MetaDir .web
MetaSuffix .meta
\end{verbatim}
meaning that meta-information files are located in the
{\tt .web} subdirectory, and they end in {\tt .meta}
suffix, i.e. the metafile for file:
\begin{verbatim}
/Web/Demo/file.html
\end{verbatim}
would be:
\begin{verbatim}
/Web/Demo/.web/file.html.meta
\end{verbatim}
\par
\subsection{MaxContentLengthBuffer}
{\tt httpd} normally gives a content-length header line for
every document it returns. When it is running as a proxy it buffers the document
received from the remote server before sending it to the client. This
directive can be used to set the size of this buffer; if it is
exceeded the document will be returned without a content-length header
field. \par
Default setting is 50 kilobytes:
\begin{verbatim}
MaxContentLengthBuffer 50 K
\end{verbatim}
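To tie the general directives of this section together, the following
sketch combines the example values used above into a single fragment.
The values are placeholders taken from the individual examples, not
recommendations:
\begin{verbatim}
ServerRoot /Web/serverroot
HostName full.server.host.name
Port 80
PidFile logs/httpd-pid
UserId whoever
GroupId whichever
IdentityCheck Off
Welcome Welcome.html
Welcome index.html
UserDir WWW
MaxContentLengthBuffer 50 K
\end{verbatim}
With this fragment the process id would be written to
{\tt /Web/serverroot/logs/httpd-pid}, because the relative
{\tt PidFile} path is resolved against {\tt ServerRoot}. \par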
\par
\section{{}
Rules In The Configuration File}
Rules define the mapping between virtual URLs and physical file names.
Currently the following rules are understood:
\begin{itemize}
\item {\tt Map}
- Map URLs to actual files
\item {\tt Pass}
- Accept a request
\item {\tt Fail}
- Fail a request
\item {\tt Redirect}
- Redirect a request
\item {\tt Protect}
- Set up protection
\item {\tt DefProt}
- Default protection setup
\item {\tt Exec}
- Executable server scripts
\end{itemize}
\par
\subsection{Mapping, Passing and Failing}
There are three main rules: {\tt Map,} {\tt Pass} and
{\tt Fail.} The server uses the top rule first, then
{\bf each successive rule} unless told otherwise by a
{\tt Pass} or a {\tt Fail} rule. \par
\begin{DL}{allow this much space}
\item[ {\tt Map } {\it template result\/}
] If the address matches the {\it template\/}, use the {\it result\/}
string from now on for future rules.
\item[ {\tt Pass } {\it template\/}
] If the address matches the {\it template\/}, use it as it is,
processing no further rules.
\item[ {\tt Pass } {\it template result\/}
] If the address matches the {\it template\/}, use the {\it result\/}
string as it is, processing no further rules.
\item[ {\tt Fail } {\it template\/}
] If the address matches the {\it template\/}, prohibit access,
processing no further rules.
\end{DL}
The {\it template\/} string may contain wildcards (asterisks)
{\tt *}. (Versions earlier than 3.0 support only a single
wildcard.) The {\it result\/} string may have wildcards only if the
{\it template\/} has them. In this case they expand to matched strings
in respective order. \par
{\bf Whitespace, (literal) asterisks and backslashes} are allowed in
templates if they are preceded by a backslash. \par
{\bf The tilde character} (see user-supported directories) just after
a slash (in other words in the beginning of a directory name) has to
be explicitly matched, i.e. wildcard does not match it. \par
When matching,
\begin{itemize}
\item Rules are scanned from the top of the file to the bottom.
\item If a request matches a {\tt Map} template exactly, the
result string is used instead of the original string and applied
to successive rules.
\item If the request matches a {\tt Map} {\it template\/} with a
wildcard, then the text of the request which matches the wildcard
is inserted in place of the wildcard in the {\it result\/} string
to form the translated request. If the result string has no
wildcard, it is used as it is.
\item When a {\tt Map} substitution takes place, the rule scan
continues with the next rule using the new string in place of the
request. This is not the case if a {\tt Pass} or
{\tt Fail} is matched: they terminate the rule scan.
\end{itemize}
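The scan described above can be traced with a small rule set. The
paths {\tt /demo} and {\tt /Web/Demo} are hypothetical and serve
only to illustrate the order of evaluation:
\begin{verbatim}
Map /demo/* /Web/Demo/*
Pass /Web/Demo/*
Fail /*
\end{verbatim}
A request for {\tt /demo/intro.html} matches the {\tt Map}
rule, so the wildcard text {\tt intro.html} is inserted into the
result and the scan continues with {\tt /Web/Demo/intro.html}. That
string matches the {\tt Pass} rule, which ends the scan and accepts
the request. A request for {\tt /secret/notes.html} matches neither
of the first two rules and is prohibited by the final {\tt Fail}
rule. \par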
\par
\subsection{Redirecting Requests Elsewhere}
When documents, or entire trees of documents, are moved from one
server to another, you can use the {\tt Redirect} rule to tell
{\tt httpd} to redirect the request to another server. If the
client program is smart enough, the user won't even notice that the
document is retrieved from a different server.
\begin{DL}{allow this much space}
\item[ {\tt Redirect } {\it template result\/}
] Document matching {\it template\/} is redirected to {\it result\/},
which must be a {\bf full URL} (i.e. containing
{\tt http:} and the host name).
\end{DL}
\subsubsection{Example}
\begin{verbatim}
Redirect /hypertext/WWW/* http://www.cern.ch/WebDocs/*
\end{verbatim}
This redirects everything starting with {\tt /hypertext/WWW} to
host {\tt www.cern.ch} into virtual directory
{\tt /WebDocs}. For example,
{\tt /hypertext/WWW/TheProject.html} would be redirected to
{\tt http://www.cern.ch/WebDocs/TheProject.html}. \par
\par
\subsection{Setting Up User Authentication and Document Protection}
Documents are protected by {\tt Protect} and
{\tt DefProt} rules. Their syntax is the following:
\begin{DL}{allow this much space}
\item[ {\tt DefProt } {\it template \/} {\it setup-file\/} {\tt \lbrack }{\it uid.gid\/}{\tt \rbrack }
] Any document matching the {\it template\/} is associated with
protection {\it setup-file.\/} The documents are not yet taken to
be protected, but they may become protected by an existing access
control list file in the same directory as the requested file, or
by later matching a {\tt Protect} rule. If that
{\tt Protect} rule doesn't specify {\it setup-file\/}, the
one from the latest {\tt DefProt} rule is used.\par
\item[ {\tt Protect } {\tt \lbrack }{\it template setup-file\/} {\tt \lbrack }{\it uid.gid\/}{\tt \rbrack \rbrack }
] Any document matching {\it template\/} is protected. The type of
protection is defined in finer detail in {\it setup-file.\/} \par
If {\it setup-file\/} is not specified, the one from the previous
matched {\tt DefProt} rule will be used. If none has
matched, access to the file is forbidden.
\end{DL}
{\it setup-file\/} is always a full pathname for the
protection setup file which specifies
the actual protection parameters. \par
Setup file can be omitted from {\tt Protect} rule, but it is
obligatory in {\tt DefProt} rule. If setup file is omitted it
is not possible to give the {\it uid.gid\/} part, either. \par
{\it uid.gid\/} are the Unix user id and group id (either by name or by
number, separated by a dot) to which the server should change when
serving the request. These are only meaningful when the server is
running as {\tt root.} If they are missing they default to
{\it nobody.nogroup\/}.\par
{\bf Note:} Uid and gid are inherited from
{\tt DefProt} rule to {\tt Protect} rule
{\bf only} when the {\it setup-file\/} is also inherited.
If {\it setup-file\/} is specified for {\tt Protect} rule but
{\it uid.gid\/} is not, they default to {\it nobody.nogroup\/}
regardless of the previous {\tt DefProt} rule. \par
This is to avoid accidentally running the server under wrong user id
with wrong setup file. This information should logically go into the
protection setup file, but for safety reasons it cannot be done,
because an untrustworthy collaborator could specify it to be
{\tt root}. This way only the main {\tt webmaster} can
control user and group ids. \par
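The inheritance rules can be sketched with the following fragment. The
templates, the setup file names and the {\it uid.gid\/} pair
{\tt webadmin.www} are all hypothetical:
\begin{verbatim}
DefProt /private/* /Web/setup/private.setup webadmin.www
Protect /private/staff/*
Protect /private/guests/* /Web/setup/guests.setup
\end{verbatim}
The first {\tt Protect} rule names no setup file, so it inherits
both {\tt /Web/setup/private.setup} and {\tt webadmin.www} from
the {\tt DefProt} rule. The second {\tt Protect} rule names its
own setup file but no {\it uid.gid\/}, so requests matching it are
served as {\it nobody.nogroup\/}. \par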
\par
\subsection{Executable Server Scripts}
Document address is mapped into a script call by {\tt Exec}
rule:
\begin{verbatim}
Exec template script
\end{verbatim}
{} In both
{\it template\/} and {\it script\/} there {\bf must be a
{\tt *} wildcard, that matches everything starting from the
script filename.} This is to enable {\tt httpd} to know
what is the script name and what is the extra path information to be
passed to the script.\par
\subsubsection{Example}
You want to map everything starting with {\tt /your/url/doit}
to execute the script {\tt /usr/etc/www/htbin/doit.} You do
this by saying:
\begin{verbatim}
Exec /your/url/* /usr/etc/www/htbin/*
\end{verbatim}
Here the asterisk matches the script name {\tt doit} (and everything
else that follows it). Usually people use some fixed keyword in front
of the pathname in URL to point out that the document is actually a
script call. Often this keyword is {\tt /htbin}. That is,
usually your {\tt Exec} rule looks like this:
\begin{verbatim}
Exec /htbin/* /usr/etc/www/htbin/*
\end{verbatim}
and all the URLs pointing to the scripts start with
{\tt /htbin}, for example {\tt /htbin/doit} in the
previous example. \par
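How the wildcard separates the script name from the extra path
information can be sketched as follows. The trailing
{\tt /extra/path} is a hypothetical addition to the URL, and the
comment lines are annotations only:
\begin{verbatim}
Exec /htbin/* /usr/etc/www/htbin/*
# Request:           /htbin/doit/extra/path
# Script executed:   /usr/etc/www/htbin/doit
# Extra path info:   /extra/path
\end{verbatim}
\par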
\par
\subsubsection{Historical Note (HTBin Rule)}
CERN {\tt httpd} versions 2.13 and 2.14 had a hard-coded
handling of URL pathnames starting {\tt /htbin} that mapped
them to scripts in a directory specified via {\tt HTBin}
rule:
\begin{verbatim}
HTBin /your/htbin/directory
\end{verbatim}
This is still handled automatically by {\tt httpd}, by
translating it to its equivalent {\tt Exec} form:
\begin{verbatim}
Exec /htbin/* /your/htbin/directory/*
\end{verbatim}
Always use {\tt Exec} instead $--$ it is more general. \par
\par
\section{{}
Suffix Definitions for CERN httpd}
{\tt cern\_httpd} uses suffixes to discover the content-type,
content-encoding and content-language of a file. Default values are
so extensive that {\tt httpd} knows the usual file types. The
following configuration directives can be
used to add new suffix bindings and override existing defaults:
\begin{itemize}
\item {\tt AddType}
- Filename suffix mappings to MIME Content-Types
\item {\tt AddEncoding}
- Filename suffix mappings to MIME Content-Encodings
\item {\tt AddLanguage}
- Multilanguage support, suffix mappings to different Content-Languages
\item {\tt SuffixCaseSense}
- Set suffix case sensitivity
\end{itemize}
\par
\subsection{Binding Suffixes to MIME Content-Types}
As well as any mapping lines in the rule file, the rule file may be
used to define the data types of files with particular suffixes. CERN
{\tt httpd} has an extensive set of predefined
suffixes, so usually you don't need to specify any. \par
The syntax is:
\begin{verbatim}
AddType .suffix representation encoding [quality]
\end{verbatim}
The parameters are as follows:
\begin{DL}{allow this much space}
\item[{\it suffix\/}]
The last part of the filename. There are two special cases.
{\tt *.*} matches to all files which have not been matched by
any explicit suffixes but do contain a dot. {\tt *} by itself
matches to any file which does not match any other suffix. \par
\item[{\it representation\/}]
A MIME Content-Type style description of the representation actually
used in the file. See the HTTP spec. This need not be a real MIME
type - it will only be used if it matches a type given by a client. \par
\item[{\it encoding\/}]
A MIME content
transfer encoding type. Much more limited in variety than
representations, basically whether the file is ASCII (7bit or 8bit) or
binary. A few other encodings are allowed, and maybe extension to
compression. \par
\item[{\it quality\/}]
Optional. A floating point number between 0.0 and 1.0 which determines
the relative merits of files {\tt xxx.*} which differ in their
suffix only, when a link to {\tt xxx.multi} is being resolved.
Defaults to 1.0. \par
\end{DL}
\subsubsection{Examples}
\begin{verbatim}
AddType .html text/html 8bit 1.0
AddType .text text/plain 7bit 0.9
AddType .ps application/postscript 8bit 1.0
AddType *.* application/binary binary 0.1
AddType * text/plain 7bit
\end{verbatim}
\par
\subsubsection{Historical Note (Suffix Directive)}
{\tt AddType} was previously called {\tt Suffix.} The
old name is still understood, but may be misleading since suffixes are
also used to determine Content-Encoding and language. Always use
{\tt AddType} instead. \par
\par
\subsection{Binding Suffixes to MIME Content-Encodings}
Suffixes are also used to determine the Content-Encoding of a
file ({\tt .Z} suffix for {\tt x-compress}, for
example). The syntax is:
\begin{verbatim}
AddEncoding .suffix encoding
\end{verbatim}
\subsubsection{Example}
\begin{verbatim}
AddEncoding .Z x-compress
\end{verbatim}
\par
\subsection{Multilanguage Support}
Multilanguage support is also built on using suffixes to determine the
language of a document. Suffix is bound to a language by
{\tt AddLanguage} rule ({\tt .en} suffix for English,
for example). Syntax is:
\begin{verbatim}
AddLanguage .suffix language
\end{verbatim}
\subsubsection{Examples}
\begin{verbatim}
AddLanguage .en en
AddLanguage .uk en_UK
\end{verbatim}
\par
\subsection{Suffix Case Sensitivity}
Suffix case sensitivity is by default {\it off.\/} You can make
suffixes case sensitive with {\tt SuffixCaseSense} directive:
\begin{verbatim}
SuffixCaseSense On
\end{verbatim}
\par
\section{{}
Accessory Scripts}
In addition to having a fully configurable CGI script interface to handle form
requests, CERN {\tt httpd} has a few special directives to
handle certain tasks always via CGI scripts:
\begin{itemize}
\item keyword searches
\item general {\tt POST}
\item general {\tt PUT}
\item general {\tt DELETE}
\end{itemize}
\par
\subsection{Keyword Search Facility}
The server automatically calls a script
to perform a search
if the {\bf absolute pathname} of the search script is supplied
by a {\tt Search} directive in the configuration file:
\begin{verbatim}
Search /search/script/pathname
\end{verbatim}
This script is called with the vital information in the following
CGI environment
variables:
\begin{DL}{allow this much space}
\item[ {\tt PATH\_INFO}
] contains the virtual URL of the file from which the query was
issued. \par
\item[ {\tt PATH\_TRANSLATED}
] contains the physical filename of the document corresponding
to the virtual URL in {\tt PATH\_INFO}. \par
\item[ {\tt QUERY\_STRING}
] contains the (URL-encoded) keywords, which are also available
decoded as command line parameters, one in each of
{\tt argv\lbrack 1\rbrack }, {\tt argv\lbrack 2\rbrack }, ... \par
\end{DL}
Search script must conform to CGI/1.1 rules, that is, it has to start
its output with a MIME header {\bf followed by a blank
line}, after which comes the actual document. MIME header
{\bf must} contain either a {\tt Location: } field,
or a {\tt Content-Type: } field, typically:
\begin{verbatim}
Content-Type: text/html
\end{verbatim}
if the document is an HTML document. \par
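A minimal sketch of what a search script might print, assuming it
returns an HTML page directly; the title and body text are invented for
illustration:
\begin{verbatim}
Content-Type: text/html

<TITLE>Search results</TITLE>
<H1>Search results</H1>
No documents matched the given keywords.
\end{verbatim}
The empty line after the header is what separates the MIME header from
the document itself. \par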
\par
\subsection{General POST Method Handler Script}
{\tt POST} requests are handled by calling the script defined
by {\tt POST-Script} directive:
\begin{verbatim}
POST-Script /absolute/path/post-handler
\end{verbatim}
POST handler script is called in the normal CGI manner, and its output must be CGI
compliant. \par
{} The POST handler handles only those {\tt POST}
requests that have not already matched
an {\tt Exec} rule (which causes
a specified script to be called). \par
\par
\subsection{General PUT Method Handler Script}
{\tt PUT} requests are handled by calling the script defined by
{\tt PUT-Script} configuration directive:
\begin{verbatim}
PUT-Script /absolute/path/put-handler
\end{verbatim}
PUT handler script is called in the normal CGI manner, and its output must be CGI
compliant. \par
{} By default {\tt PUT}
method is disabled; you must explicitly enable it in the configuration
file:
\begin{verbatim}
Enable PUT
\end{verbatim}
This is to enhance security. \par
{} Since {\tt PUT} can
be a very dangerous method because it allows files to be written back to
the server, it is not possible to use {\tt PUT} without access
authorization module being activated. This means that you have to
have at least a {\tt DefProt}
rule specifying a default protection setup, which then in turn defines
the {\tt PutMask} containing the list of allowed users and
hosts to perform PUT operation. \par
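Taken together, the requirements above suggest a fragment along the
following lines. The {\tt DefProt} template and the setup file path
are hypothetical, and the contents of the setup file (including its
{\tt PutMask}) are not shown here:
\begin{verbatim}
Enable PUT
PUT-Script /absolute/path/put-handler
DefProt /* /absolute/path/default.setup
\end{verbatim}
\par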
\par
\subsection{General DELETE Method Handler Script}
{\tt DELETE} requests are handled by calling the script defined by
{\tt DELETE-Script} configuration directive:
\begin{verbatim}
DELETE-Script /absolute/path/delete-handler
\end{verbatim}
DELETE handler script is called in the normal CGI manner, and its output must be CGI
compliant. \par
{} By default {\tt DELETE}
method is disabled; you must explicitly enable it in the configuration
file:
\begin{verbatim}
Enable DELETE
\end{verbatim}
This is to enhance security. \par
{} Since {\tt DELETE} can
be a very dangerous method because it allows files to be deleted from
the server, it is not possible to use {\tt DELETE} without access
authorization module being activated. This means that you have to
have at least a {\tt DefProt}
rule specifying a default protection setup, which then in turn defines
the {\tt DeleteMask} containing the list of allowed users and
hosts to perform DELETE operation. \par
\par
\section{{} Directory Browsing}
By default references to directories which don't include a welcome page cause {\tt httpd} to
generate a hypertext view of the directory listing. There are
numerous configuration directives controlling this feature:
\begin{itemize}
\item {\tt DirAccess}
- Enable/Selective/Disable directory listings
\item {\tt DirReadme}
- Configure/disable README-feature
\item Controlling the appearance of directory listings:
\begin{itemize}
\item {\tt DirShowIcons}
- Show icons in directory listings
\item {\tt DirShowDate}
- show last-modified date
\item {\tt DirShowSize}
- show file sizes
\item {\tt DirShowBytes}
- show byte count for small files
\item {\tt DirShowDescription}
- show descriptions for files
\item {\tt DirShowMaxDescrLength}
- maximum description length
\item {\tt DirShowBrackets}
- use brackets around ALTernative text used instead of an icon
\item {\tt DirShowMinLength}
- minimum width to reserve for filenames
\item {\tt DirShowMaxLength}
- maximum width to reserve for filenames
\item {\tt DirShowHidden}
- show also files starting with a dot (hidden Unix files)
\item {\tt DirShowOwner}
- show owner of the file
\item {\tt DirShowGroup}
- show group of the file
\item {\tt DirShowMode}
- show permissions of the file
\item {\tt DirShowCase}
- do sorting in a case-sensitive manner
\end{itemize}
\item Icons:
\begin{itemize}
\item {\tt AddIcon}
- bind icon URL to a MIME Content-Type or Content-Encoding
\item {\tt AddBlankIcon}
- icon URL used in the heading of the listing to align it
\item {\tt AddUnknownIcon}
- icon URL for unknown file types
\item {\tt AddDirIcon}
- icon URL for directories
\item {\tt AddParentIcon}
- icon URL for parent directory
\end{itemize}
\end{itemize}
\par
\subsection{Controlling Directory Browsing}
\begin{DL}{allow this much space}
\item[{\tt DirAccess on}]
Enable directory browsing in all directories (which are not
forbidden by rules).
Synonym with {\tt -dy} command line option.
{\it Default.\/}\par
\item[{\tt DirAccess off}]
Disable directory browsing.
Synonym with {\tt -dn} command line option.
\par
\item[{\tt DirAccess selective}]
Enable selective directory browsing - only directories
containing the file {\tt .www\_browsable} are allowed.
Synonym with {\tt -ds} command line option.
\par
\end{DL} \par
\par
\subsection{README Feature}
\begin{DL}{allow this much space}
\item[{\tt DirReadme top}]
For any browsable directory containing a {\tt README}
file, include the text at the top of the directory listing.
Synonym with {\tt -dt} command line option.
{\it Default.\/}
\par
\item[{\tt DirReadme bottom}]
Same as previous, but contents of {\tt README} appear
on the bottom.
Synonym with {\tt -db} command line option.
\par
\item[{\tt DirReadme off}]
Disables the {\tt README} inclusion feature.
Synonym with {\tt -dr} command line option.
\par
\end{DL} \par
\par
\subsection{Controlling The Look of Directory Listings}
The following {\tt On/Off} directives control what the directory
listings look like. The default is to show icons, use brackets around
ALTernative text, show last-modified date, size and description, allow
the filename field width to vary between 15 and 22 characters, and reserve
25 characters for the description. \par
\begin{DL}{allow this much space}
\item[ {\tt DirShowIcons}
] Generate inlined image calls in front of each line. Icons
visualize the content-type of the file, and they are defined by
{\tt AddIcon} configuration
directive. {\em Default.\/} \par
\item[ {\tt DirShowDate}
] Show last modification date. {\em Default.\/} \par
\item[ {\tt DirShowSize}
] Show the size of files. {\em Default.\/} \par
\item[ {\tt DirShowBytes}
] By default files smaller than 1K are shown as just 1K. Setting
this directive to {\tt On} will cause the exact byte count
to appear. \par
\item[ {\tt DirShowDescription}
] Show description if available. {\em Default.\/} \par
At the time of release of 2.17 there was no consensus about
where the descriptions come from, and the mechanism is currently
undocumented. For HTML files the description is the TITLE element;
for other files the description field is left empty. \par
\item[ {\tt DirShowMaxDescrLength}
] The maximum number of characters to show in the description field. \par
\item[ {\tt DirShowBrackets}
] Use brackets around ALTernative text used by browsers not capable
of displaying images. {\em Default.\/} \par
\item[ {\tt DirShowHidden}
] Show hidden Unix files (the ones starting with a dot). \par
\item[ {\tt DirShowOwner}
] Show the owner of the file. \par
\item[ {\tt DirShowGroup}
] Show the group of the file. \par
\item[ {\tt DirShowMode}
] Show the permissions of files. \par
\item[ {\tt DirShowCase}
] Sort entries in a case-sensitive manner, i.e. all capital letters
before lower-case letters. \par
\end{DL}
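As a sketch of how these switches combine, the fragment below turns on
exact byte counts and the owner, group and mode columns, and turns off
descriptions. It assumes each directive takes a literal {\tt On} or
{\tt Off} argument, as the introduction to this subsection states; the
particular combination is only an illustration, not a recommended setup.
\begin{verbatim}
DirShowIcons        On
DirShowBrackets     On
DirShowDate         On
DirShowSize         On
DirShowBytes        On
DirShowDescription  Off
DirShowHidden       Off
DirShowOwner        On
DirShowGroup        On
DirShowMode         On
\end{verbatim}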
\par
\subsection{Filename Length}
There is a minimum and maximum width for the filename field. Entries
longer than the maximum value will be truncated. Default values are
15 and 25, and they can be changed with these directives:
\begin{DL}{allow this much space}
\item[ {\tt DirShowMinLength } {\it num\/}
] At least this many characters are always reserved for
filenames. If the longest filename in the directory is longer
than {\it num\/} the field will be extended, but no more than the
maximum limit (see next directive).\par
\item[ {\tt DirShowMaxLength } {\it num\/}
] Filenames longer than {\it num\/} will be truncated to fit in length. \par
\end{DL}
\subsubsection{Example}
The default values would be set by saying:
\begin{verbatim}
DirShowMinLength 15
DirShowMaxLength 25
\end{verbatim}
\par
\section{ {}
Icons In The Directory Listings}
{\tt cern\_httpd} directory icons
are used, if enabled, for
both regular directory listings and FTP listings (when running as a
proxy). \par
\begin{itemize}
\item {\tt AddIcon}
- bind icon URL to a MIME Content-Type or Content-Encoding
\item {\tt AddBlankIcon}
- icon URL used in the heading of the listing to align it
\item {\tt AddUnknownIcon}
- icon URL for unknown file types
\item {\tt AddDirIcon}
- icon URL for directories
\item {\tt AddParentIcon}
- icon URL for parent directory
\end{itemize}
These directives are specified in the configuration file. \par
\par
\subsection{AddIcon Directive}
The {\tt AddIcon} directive binds an icon to a MIME
Content-Type or Content-Encoding:
\begin{verbatim}
AddIcon icon-url ALT-text template
\end{verbatim}
\begin{DL}{allow this much space}
\item[ {\it icon-url\/}
] is the URL of the icon. \par
\item[ {\it ALT-text\/}
] is the alternative text to use on character terminal browsers. \par
\item[ {\it template\/}
] is either a Content-Type template or a Content-Encoding template.
A Content-Type template must always contain a slash, whereas
a Content-Encoding template never does. \par
\end{DL}
The following important remarks serve also as examples. \par
\subsubsection{{}
CERN httpd as a Normal HTTP Server}
Understand that the {\it icon-url\/} is a virtual URL - one that will
be translated through the rules. Therefore you must make sure that
your configuration rules allow the icon URLs to be passed, e.g.:
\begin{verbatim}
AddIcon /icons/UNKNOWN.gif ??? */*
AddIcon /icons/TEXT.gif TXT text/*
AddIcon /icons/IMAGE.gif IMG image/*
AddIcon /icons/SOUND.gif AU audio/*
AddIcon /icons/MOVIE.gif MOV video/*
AddIcon /icons/PS.gif PS application/postscript
Pass /icons/* /absolute/icon/dir/*
...other rules...
\end{verbatim}
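The example above uses only Content-Type templates. A binding to a
Content-Encoding template looks the same except that the template has no
slash. In the sketch below the icon file name {\tt COMPRESSED.gif} and
the encoding names {\tt x-compress} and {\tt x-gzip} are assumptions
made for illustration; use the encoding names that your own
configuration actually defines.
\begin{verbatim}
AddIcon /icons/COMPRESSED.gif  CMP  x-compress
AddIcon /icons/COMPRESSED.gif  CMP  x-gzip
Pass    /icons/*               /absolute/icon/dir/*
\end{verbatim}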
\subsubsection{{}
CERN httpd as a Proxy}
When using {\tt httpd} as a proxy the icon URL {\bf must
be} an absolute URL pointing to your server; otherwise clients
would translate it relative to the remote host. \par
{\bf Furthermore,} you must have a mapping from this
absolute URL to your local file system, e.g.:
\begin{verbatim}
AddIcon http://your.server/icons/UNKNOWN.gif ??? */*
AddIcon http://your.server/icons/TEXT.gif TXT text/*
AddIcon http://your.server/icons/IMAGE.gif IMG image/*
AddIcon http://your.server/icons/SOUND.gif AU audio/*
AddIcon http://your.server/icons/MOVIE.gif MOV video/*
AddIcon http://your.server/icons/PS.gif PS application/postscript
Pass http://your.server/icons/* /absolute/icon/dir/*
Pass /icons/* /absolute/icon/dir/*
Pass http:*
Pass ftp:*
Pass gopher:*
\end{verbatim}
{}
Both the full and partial icon URLs are {\tt Pass}'ed because
smart clients may be configured to connect to local
servers directly instead of through the proxy. In that case the proxy
server (which is then just a normal HTTP server from the client's point
of view) will receive requests for {\tt /icons/...} instead of
{\tt http://your.server/icons/...}. The proxy server has no way
of knowing which will happen. \par
\par
\subsection{Icons in Gopher Listings}
There are special internal (to {\tt httpd}) MIME content types
that can be bound to icons for gopher listings (the names should be
self-explanatory):
\begin{itemize}
\item {\tt application/x-gopher-index}
\item {\tt application/x-gopher-cso}
\item {\tt application/x-gopher-telnet}
\item {\tt application/x-gopher-tn3270}
\item {\tt application/x-gopher-duplicate}
\end{itemize}
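These types are bound with ordinary {\tt AddIcon} directives. The
fragment below is a sketch for a proxy, so the icon URLs are absolute;
the icon file names and the ALT texts are hypothetical and should be
replaced by whatever icons you actually have installed.
\begin{verbatim}
AddIcon http://your.server/icons/INDEX.gif   IDX  application/x-gopher-index
AddIcon http://your.server/icons/CSO.gif     CSO  application/x-gopher-cso
AddIcon http://your.server/icons/TELNET.gif  TEL  application/x-gopher-telnet
AddIcon http://your.server/icons/TN3270.gif  3270 application/x-gopher-tn3270
AddIcon http://your.server/icons/DUP.gif     DUP  application/x-gopher-duplicate
Pass    http://your.server/icons/*  /absolute/icon/dir/*
Pass    /icons/*                    /absolute/icon/dir/*
\end{verbatim}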
\par
\subsection{Special Icons}
{\tt httpd} needs some special icons:
\begin{DL}{allow this much space}
\item[ {\tt AddBlankIcon}
] Icon URL used in the heading of the listing to align it.
This is typically a blank icon, but may contain some nice image
that you wish to have on top of all your listings. The only criterion
is that it must be the same size as the other icons. \par
\item[ {\tt AddUnknownIcon}
] Icon URL used for unknown file types, i.e. files for which no
other icon binding applies. If you have an exhaustive set of
{\tt AddIcon} directives this need not be used. \par
\item[ {\tt AddDirIcon}
] Icon URL for directories. \par
\item[ {\tt AddParentIcon}
] Icon URL for parent directory. \par
\end{DL}
\subsubsection{Example For a Regular HTTP Server}
{}
Remember to {\tt Pass} the icon URLs! \par
\begin{verbatim}
AddBlankIcon /icons/BLANK.gif
AddUnknownIcon /icons/UNKNOWN.gif ???
AddDirIcon /icons/DIR.gif DIR
AddParentIcon /icons/PARENT.gif UP
Pass /icons/* /absolute/icon/dir/*
...other rules...
\end{verbatim}
\subsubsection{Example For a Proxy Server}
{}
Icon URLs {\bf must be absolute URLs}, and you must have
a mapping from the absolute form to local form, and remember to
{\tt Pass} them:
\begin{verbatim}
AddBlankIcon http://your.server/icons/BLANK.gif
AddUnknownIcon http://your.server/icons/UNKNOWN.gif ???
AddDirIcon http://your.server/icons/DIR.gif DIR
AddParentIcon http://your.server/icons/PARENT.gif UP
Pass http://your.server/icons/* /absolute/icon/dir/*
Pass /icons/* /absolute/icon/dir/*
Pass http:*
Pass ftp:*
Pass gopher:*
\end{verbatim}
\par
\section{{}
Logging Control In CERN httpd}
{\tt cern\_httpd} logs all the incoming requests to an access
log file. It also has an error log where internal server errors are
logged.
\begin{itemize}
\item {\tt AccessLog}
- Set access log file name
\item {\tt ErrorLog}
- Set error log file name
\item {\tt LogFormat}
- Set access log file format
\item {\tt LogTime}
- Set time zone for log files
\item {\tt NoLog}
- No log entries for listed hosts/domains
\item {\tt CacheAccessLog}
- Log cache accesses to a different log file
\end{itemize}
\par
\subsection{Access Log File}
The access log file contains a log of all the requests. The name of the
log file is specified either by the {\tt -l }{\it logfile\/} command
line option, or with the {\tt AccessLog} directive:
\begin{verbatim}
AccessLog /absolute/path/logfile
\end{verbatim}
\par
\subsection{Error Log File}
The error log contains a log of errors that might prove useful when
figuring out why something doesn't work. The error log file name is set by
the {\tt ErrorLog} directive:
\begin{verbatim}
ErrorLog /absolute/path/errorlog
\end{verbatim}
If the error log file is not specified, it defaults to the access log file
name with an {\tt .error} extension. If the access log file name
already has an extension, {\tt .error} replaces it. \par
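The two cases of this default can be illustrated as follows. The file
names are made up for the example, and the mapping simply applies the
rule as stated above; it has not been checked against a particular
server version.
\begin{verbatim}
# AccessLog value                 error log used when ErrorLog is absent
# /absolute/path/httpd-log    ->  /absolute/path/httpd-log.error
# /absolute/path/access.log   ->  /absolute/path/access.error
\end{verbatim}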
\par
\subsection{Log File Format}
Previously every server used to have its own logfile format which made
it difficult to write general statistics collectors. Therefore there
is now a {\em common logfile format\/} (which will eventually become
the default). Currently it is enabled by
\begin{verbatim}
LogFormat Common
\end{verbatim}
The old CERN {\tt httpd} format can be used by
\begin{verbatim}
LogFormat Old
\end{verbatim}
\par
\subsection{Log Time Format}
Times in the log file are by default local time. That can be changed
to GMT with the {\tt LogTime} directive:
\begin{verbatim}
LogTime GMT
\end{verbatim}
Default is:
\begin{verbatim}
LogTime LocalTime
\end{verbatim}
\par
\subsection{Suppressing Log Entries For Certain Hosts/Domains}
It's not always necessary to collect log information about accesses made
by local hosts. The {\tt NoLog} directive can be used to
prevent log entries from being made for hosts matching a given IP number or
host name template:
\begin{verbatim}
NoLog template
\end{verbatim}
\subsubsection{Examples}
\begin{verbatim}
NoLog 128.141.*.*
NoLog *.cern.ch
NoLog *.ch *.fr *.it
\end{verbatim}
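Putting the logging directives of this section together, a
configuration might contain a block like the following. This is only a
sketch: the paths are the placeholders used above, and the choice of GMT
and of which hosts to exclude is arbitrary.
\begin{verbatim}
AccessLog  /absolute/path/logfile
ErrorLog   /absolute/path/errorlog
LogFormat  Common
LogTime    GMT
NoLog      128.141.*.* *.cern.ch
\end{verbatim}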
\par
\section{{}
Timeout Settings}
Something may go wrong with the connection to the client, causing
{\tt httpd} to hang indefinitely doing nothing. This can be
avoided by setting timeouts on different tasks that the server
performs. All of these timeouts have reasonable default values
and usually don't need to be changed. \par
All the times for these directives are of the form:
\begin{verbatim}
45 secs
10 mins
2 mins 30 secs
1 hour
\end{verbatim}
\par
\subsection{InputTimeOut}
The {\tt InputTimeOut} directive specifies the time to wait for
the client to send the request (the MIME-header part of it, not the
message body). The default value is:
\begin{verbatim}
InputTimeOut 2 mins
\end{verbatim}
\par
\subsection{OutputTimeOut}
The {\tt OutputTimeOut} directive specifies the time to allow for
sending the response. The default value is:
\begin{verbatim}
OutputTimeOut 20 mins
\end{verbatim}
If you are serving huge files for clients behind slow connections you
may want to increase this value if you hear of connections being cut
in the middle of transfer. \par
\par
\subsection{ScriptTimeOut}
The {\tt ScriptTimeOut} directive specifies the time to allow for
server scripts to finish. If a script doesn't return in the time
specified, {\tt httpd} will send {\tt TERM} and
{\tt KILL} signals to it (with 5 seconds in between to let
scripts do cleanup upon exit).
The default value is:
\begin{verbatim}
ScriptTimeOut 5 mins
\end{verbatim}
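As an illustration, a server that sends large files over slow links
might keep the input and script defaults and raise only the output
timeout. The value of one hour is an arbitrary choice for this sketch,
written in the time format shown at the start of this section.
\begin{verbatim}
InputTimeOut   2 mins
OutputTimeOut  1 hour
ScriptTimeOut  5 mins
\end{verbatim}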
\par
\section{{}
Proxy Caching}
When {\tt cern\_httpd} is run as a
proxy it can perform caching of
the documents retrieved from remote hosts to make further requests
faster. \par
\begin{itemize}
\item {\tt Caching}
- Turn caching on
\item {\tt CacheRoot}
- Set cache root directory for a proxy server
\item {\tt CacheSize}
- Specify cache size (in megabytes)
\item {\tt NoCaching}
- No caching for URLs matching a given mask
\item {\tt CacheOnly}
- Cache only if URL matches a given set of URLs
\item {\tt CacheClean}
- Remove everything older than this (in days)
\item {\tt CacheUnused}
- Remove if has been unused this long (in days)
\item {\tt CacheDefaultExpiry}
- Default expiry time if not given by remote server (in days)
\item {\tt CacheLastModifiedFactor}
- Factor used in approximating expiry date
\item {\tt CacheTimeMargin}
- Time accuracy between hosts
\item {\tt CacheNoConnect}
- Standalone cache mode - no external document retrievals
\item {\tt CacheExpiryCheck}
- Turn off expiry checking for standalone operation
\item {\tt Gc}
- Enable and disable garbage collection
\item {\tt GcDailyGc}
- Time for daily garbage collection
\item {\tt GcTimeInterval}
- Interval to do cache garbage collection (in hours)
\item {\tt GcReqInterval}
- Number of requests between garbage collections
\item {\tt GcMemUsage}
- Garbage collector memory usage directive
\item {\tt CacheLimit\_1}
- First cache file size limit (kilobytes)
\item {\tt CacheLimit\_2}
- Second cache file size limit (kilobytes)
\item {\tt CacheLockTimeOut}
- Break cache locks after this timeout
\item {\tt CacheAccessLog}
- Log cache accesses to a different log file
\end{itemize}
\par
\subsection{Turning Caching On and Off}
Caching is normally turned on implicitly by specifying the
Cache Root Directory, but it can be
explicitly turned on and off with the {\tt Caching} directive:
\begin{verbatim}
Caching On
\end{verbatim}
\par
\subsection{Setting Cache Directory}
Caching is enabled on a server running as a gateway (proxy) by
{\tt CacheRoot} directive, which is used to set the absolute
path of the cache directory:
\begin{verbatim}
CacheRoot /absolute/cache/directory
\end{verbatim}
\par
\subsection{Cache Size}
The {\tt CacheSize} directive sets the maximum cache size in
megabytes. The default value is 5MB, but it is preferable to have considerably
more cache, like 50-100MB, to get the best results. The cache may,
however, temporarily grow a few megabytes bigger than specified.
\subsubsection{Example}
\begin{verbatim}
CacheSize 20 M
\end{verbatim}
sets cache size to 20 megabytes. \par
\par
\subsection{NoCaching}
URLs matching a template given by a {\tt NoCaching} directive
will never be cached, e.g.:
\begin{verbatim}
NoCaching http://really.useless.site/*
\end{verbatim}
From version 3.0 on, templates can have any number of wildcard characters
{\tt *}. \par
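A template with more than one wildcard could look like the sketch
below. The host and path names are hypothetical; the first line is meant
to exclude every host under one domain, the second a {\tt tmp} subtree
at any depth on one host.
\begin{verbatim}
NoCaching http://*.useless.site/*
NoCaching http://really.useless.site/*/tmp/*
\end{verbatim}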
\par
\subsection{CacheOnly}
Only the URLs matching templates given by {\tt CacheOnly}
directives will be cached, e.g.:
\begin{verbatim}
CacheOnly http://really.important.site/*
\end{verbatim}
From version 3.0 on, templates can have any number of wildcard characters
{\tt *}. \par
\par
\subsection{Maximum Time to Keep Cache Files}
Cached documents that match a given template and are older
than the time specified by the {\tt CacheClean} directive will be removed.
This value overrides the expiry date: no file can be stored longer
than this value specifies, regardless of its expiry date.
\subsubsection{Examples}
\begin{verbatim}
CacheClean http:* 1 month
CacheClean ftp:* 14 days
CacheClean gopher:* 5 days 12 hours
\end{verbatim}
\par
\subsection{Maximum Time to Keep Unused Files}
Cache files that match a template and have been unused longer than
the time specified by the {\tt CacheUnused} directive will be removed.
\subsubsection{Examples}
\begin{verbatim}
CacheUnused * 4 days 12 hours
CacheUnused http://info.cern.ch/* 7 days
CacheUnused ftp://some.server/* 14 days
\end{verbatim}
Note that the last matching specification will have precedence;
therefore HTTP files from {\tt info.cern.ch} will be kept
7 days, and {\bf not} 4.5 days. \par
\par
\subsection{Default Expiry Time}
Files for which the server gave neither {\tt Expires:} nor
{\tt Last-Modified:} header will be kept at most the time
specified by {\tt CacheDefaultExpiry} directive.
Default values are zero for HTTP (script replies shouldn't be cached),
and 1 day for FTP and Gopher. \par
\subsubsection{Example}
\begin{verbatim}
CacheDefaultExpiry ftp:* 1 month
CacheDefaultExpiry gopher:* 10 days
\end{verbatim}
{} Setting a nonzero default expiry for HTTP will
almost always cause problems because there are currently many scripts
that don't give an expiry date, yet their output expires immediately.
Therefore, it is better to keep the default value for
{\tt http:} at zero. \par
\par
\subsection{CacheLastModifiedFactor}
Currently HTTP servers usually give only the
{\tt Last-Modified} time, but not the {\tt Expires} time.
{\tt Last-Modified} can often be successfully used to
approximate the expiry date. {\tt CacheLastModifiedFactor} gives
the fraction of the time since last modification that the document
is assumed to remain up-to-date. \par
The default value is {\tt 0.1}, which means that e.g. a file modified
20 days ago will expire in 2 days. \par
\subsubsection{Examples}
\begin{verbatim}
CacheLastModifiedFactor 0.2
\end{verbatim}
would cause files modified 5 months ago to expire after one month. \par
This feature can be turned off by specifying:
\begin{verbatim}
CacheLastModifiedFactor Off
\end{verbatim}
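The approximation described in this subsection can be written as a
formula. This is a sketch of the arithmetic as the text states it, not
a description of the server's exact internal computation; the symbols
$f$ (the factor), $t_{\rm now}$ (the time at which the estimate is
made), $t_{\rm mod}$ (the {\tt Last-Modified} time) and $t_{\rm exp}$
(the estimated expiry time) are introduced only for this illustration.
\[
t_{\rm exp} \approx t_{\rm now} + f \cdot (t_{\rm now} - t_{\rm mod})
\]
With $f = 0.1$ and a file modified 20 days ago the remaining lifetime is
$0.1 \times 20 = 2$ days; with $f = 0.2$ and a file modified 5 months
ago it is $0.2 \times 5 = 1$ month, matching the two examples above. \par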
\par
\subsection{CacheTimeMargin}
Sometimes inaccurate times on other hosts cause confusion in caching.
It often also makes sense not to cache documents that will expire in
a couple of minutes anyway. {\tt CacheTimeMargin} defines this
time margin, by default:
\begin{verbatim}
CacheTimeMargin 2 mins
\end{verbatim}
No document expiring in less than two minutes will be written to disk.
\par
\par
\subsection{CacheNoConnect}
This directive puts the proxy into standalone cache mode, i.e. only the
documents found in the cache are returned, and requests for documents not in
the cache return an error rather than causing a connection to the outside
world. This is useful for demo purposes and in other cases without a network
connection:
\begin{verbatim}
CacheNoConnect On
\end{verbatim}
Default setting is naturally {\tt Off}. \par
This directive is typically used with expiry checking also turned
{\tt Off}. \par
\par
\subsection{CacheExpiryCheck}
If (for demo-reasons etc) it's desired that the proxy always returns
documents from the cache, even if they have expired,
{\tt CacheExpiryCheck} can be turned off:
\begin{verbatim}
CacheExpiryCheck Off
\end{verbatim}
Default setting is {\tt On}, meaning that proxy never returns an
expired document. \par
This is usually used in standalone cache
mode ({\tt CacheNoConnect} directive turned
{\tt On}). \par
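A standalone demo setup would therefore typically combine the two
directives. The fragment below is a minimal sketch; the cache directory
is the placeholder path used earlier and is assumed to have been filled
while the proxy still had a network connection.
\begin{verbatim}
Caching          On
CacheRoot        /absolute/cache/directory
CacheNoConnect   On
CacheExpiryCheck Off
\end{verbatim}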
\par
\subsection{Garbage Collection}
When caching is enabled garbage collection is also activated by
default. This can be explicitly turned off with {\tt Gc}
directive:
\begin{verbatim}
Gc Off
\end{verbatim}
\par
\subsection{When to Do Garbage Collection}
Garbage collection is launched right away when the cache size limit is
reached. However, to keep the cache smaller it might be desirable to
remove expired files even if there is still cache space remaining.
It is possible to launch garbage collection at a certain time,
usually outside the busy hours:
\begin{verbatim}
GcDailyGc time
\end{verbatim}
\par
{\tt GcDailyGc} specifies the time to do daily garbage
collection, normally during the night. Default value is 3:00.
Daily garbage collection can be disabled by specifying
{\tt Off}. \par
\subsubsection{Example}
Default value would be specified as:
\begin{verbatim}
GcDailyGc 3:00
\end{verbatim}
Another example: turning daily gc off:
\begin{verbatim}
GcDailyGc Off
\end{verbatim}
\par
\subsection{Memory Usage of Garbage Collector}
The garbage collector performs its job best if it can read information
about the whole cache into memory at once. This is not possible if
the machine doesn't have enough main memory. \par
The {\tt GcMemUsage} directive advises the garbage collector about how
much memory to use. You may think of this as the number of kilobytes
to use for gc data, but actual usage may vary greatly according to dynamic
things, like the directory structure of cached files. \par
The default is 500; if gc fails because memory runs out, make this smaller.
If your machine has so much memory that it just can't run out, make
this very big. \par
\subsubsection{Example}
\begin{verbatim}
GcMemUsage 100
\end{verbatim}
if you have very little memory. \par
\par
\subsection{Cache File Sizes}
There are two limits controlling the size factor of a file when its
value is being calculated. {\tt CacheLimit\_1} sets the lower
limit; below this all files have an equal size factor.
{\tt CacheLimit\_2} sets the upper limit; files bigger than this
get an extremely bad size factor (meaning they get removed right away
because they are too big). \par
Sizes are specified in kilobytes, and the default values are 200K and
4MB, respectively.
\subsubsection{Examples}
\begin{verbatim}
CacheLimit_1 200 K
CacheLimit_2 4000 K
\end{verbatim}
would set the same values as the defaults, 200K and 4MB. \par
\par
\subsection{Cache Lock Timeout}
During retrieval cache files are locked. If something goes wrong a
lock file may be left hanging. The {\tt CacheLockTimeOut}
directive sets the amount of time after which a lock can be broken.
Time is specified like all the other times in the configuration file, and the
default value is 20 minutes, the same as the default {\tt OutputTimeOut}.
{\bf CacheLockTimeOut should never be less than
OutputTimeOut!}
\subsubsection{Example}
\begin{verbatim}
CacheLockTimeOut 30 mins
\end{verbatim}
would set lock timeout to half an hour. \par
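Because of the constraint stated above, the two timeouts are best
changed together. In the sketch below both values are arbitrary; the
only point is that {\tt CacheLockTimeOut} stays at or above
{\tt OutputTimeOut}.
\begin{verbatim}
OutputTimeOut    40 mins
CacheLockTimeOut 45 mins
\end{verbatim}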
\par
\subsection{CacheAccessLog}
Cache accesses can be logged to a different log file instead of the
normal access log. The
{\tt CacheAccessLog} directive takes an absolute pathname of
the cache access log file:
\begin{verbatim}
CacheAccessLog /absolute/path/file.log
\end{verbatim}
\par
\section{{}
Configuring Proxy To Connect To Another Proxy}
If there is a need to make an (inner) proxy {\tt cern\_httpd} connect to the outside world via
another (outer) proxy server, you can use the same environment
variables as are used to redirect clients to the proxy to make the inner
proxy use the outer one:
\begin{itemize}
\item {\tt http\_proxy}
\item {\tt ftp\_proxy}
\item {\tt gopher\_proxy}
\item {\tt wais\_proxy}
\end{itemize}
E.g. your (inner) proxy server's startup script could look like this:
\begin{verbatim}
#!/bin/sh
http_proxy=http://outer.proxy.server:8082/
export http_proxy
/usr/etc/httpd -r /etc/inner-proxy.conf -p 8081
\end{verbatim}
This is a little ugly, so there are also the following directives in
the configuration file:
\begin{itemize}
\item {\tt http\_proxy } {\it http://outer.proxy.server/\/}
\item {\tt ftp\_proxy } {\it http://outer.proxy.server/\/}
\item {\tt gopher\_proxy } {\it http://outer.proxy.server/\/}
\item {\tt wais\_proxy } {\it http://outer.proxy.server/\/}
\end{itemize}
\par
\subsection{no\_proxy}
In the same way that clients can specify a set of domains for which
the proxy should not be consulted, {\tt httpd} has a
{\tt no\_proxy} configuration directive to tell it that it
should not connect to another proxy for certain URLs:
\begin{verbatim}
no_proxy cern.ch,ncsa.uiuc.edu,some.host:8080
\end{verbatim}
{}
The argument string is a comma-separated list and should {\bf not contain
spaces!} \par
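Taken together, the chaining part of an inner proxy's configuration
file could look like the following sketch. The outer proxy's host name
and port are the placeholders from the startup script above, and the
{\tt no\_proxy} list is only an example.
\begin{verbatim}
http_proxy    http://outer.proxy.server:8082/
ftp_proxy     http://outer.proxy.server:8082/
gopher_proxy  http://outer.proxy.server:8082/
wais_proxy    http://outer.proxy.server:8082/
no_proxy      cern.ch,some.host:8080
\end{verbatim}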
\par
\chapter{{}
Configuration File Examples}
\begin{DL}{allow this much space}
\item[ {\tt httpd.conf}
] sample configuration file for running as a normal HTTP server.
\item[ {\tt prot.conf}
] sample configuration file for running as a normal HTTP server
with access control.
\item[ {\tt proxy.conf}
] sample configuration file for running as a
proxy
{\bf without caching.}
\item[ {\tt caching.conf}
] sample configuration file for running as a
proxy
{\bf with caching.}
\end{DL}
\par
\par
\section{Normal HTTP Server Configuration}
\begin{verbatim}
#
# Sample configuration file for cern_httpd for running it
# as a normal HTTP server.
#
# See:
#
#
# for more information.
#
# Written by:
# Ari Luotonen April 1994
#
#
# Set this to point to the directory where you unpacked this
# distribution, or wherever you want httpd to have its "home"
#
ServerRoot /where/ever/server_root
#
# The default port for HTTP is 80; if you are not root you have
# to use a port above 1024; good defaults are 8000, 8001, 8080
#
Port 80
#
# General setup; on some systems, like HP, nobody is defined so
# that setuid() fails; in those cases use a different user id.
#
UserId nobody
GroupId nogroup
#
# Logging; if you want logging uncomment these lines and specify
# locations for your access and error logs
#
# AccessLog /where/ever/httpd-log
# ErrorLog /where/ever/httpd-errors
LogFormat Common
LogTime LocalTime
#
# User-supported directories under ~/public_html
#
UserDir public_html
#
# Scripts; URLs starting with /cgi-bin/ will be understood as
# script calls in the directory /your/script/directory
#
Exec /cgi-bin/* /your/script/directory/*
#
# URL translation rules; If your documents are under /local/Web
# then this single rule does the job:
#
Pass /* /local/Web/*
\end{verbatim}
\section{Normal HTTP Server With Access Control}
\begin{verbatim}
#
# Sample configuration file for cern_httpd for running it
# as a normal HTTP server WITH access control.
#
# See:
#
#
# for more information.
#
# Written by:
# Ari Luotonen April 1994
#
#
# Set this to point to the directory where you unpacked this
# distribution, or wherever you want httpd to have its "home"
#
ServerRoot /where/ever/server_root
#
# The default port for HTTP is 80; if you are not root you have
# to use a port above 1024; good defaults are 8000, 8001, 8080
#
Port 80
#
# General setup; on some systems, like HP, nobody is defined so
# that setuid() fails; in those cases use a different user id.
#
UserId nobody
GroupId nogroup
#
# Logging; if you want logging uncomment these lines and specify
# locations for your access and error logs
#
# AccessLog /where/ever/httpd-log
# ErrorLog /where/ever/httpd-errors
LogFormat Common
LogTime LocalTime
#
# User-supported directories under ~/public_html
#
UserDir public_html
#
# Protection setup by usernames; specify groups in the group
# file [if you need groups]; create and maintain password file
# with the htadm program
#
Protection PROT-SETUP-USERS {
UserId nobody
GroupId nogroup
ServerId YourServersFancyName
AuthType Basic
PasswdFile /where/ever/passwd
GroupFile /where/ever/group
GET-Mask user, user, group, group, user
}
#
# Protection setup by hosts; you can use both domain name
# templates and IP number templates
#
Protection PROT-SETUP-HOSTS {
UserId nobody
GroupId nogroup
ServerId YourServersFancyName
AuthType Basic
PasswdFile /where/ever/passwd
GroupFile /where/ever/group
GET-Mask @(*.cern.ch, 128.141.*.*, *.ncsa.uiuc.edu)
}
Protect /very/secret/URL/* PROT-SETUP-USERS
Protect /another/secret/URL/* PROT-SETUP-HOSTS
#
# Scripts; URLs starting with /cgi-bin/ will be understood as
# script calls in the directory /your/script/directory
#
Exec /cgi-bin/* /your/script/directory/*
#
# URL translation rules; If your documents are under /local/Web
# then this single rule does the job:
#
Pass /* /local/Web/*
\end{verbatim}
\section{Proxy Configuration With Caching}
The configuration {\bf without caching} is otherwise the
same, just leave out all the directives starting with
"{\tt Cache}" or "{\tt Gc}".
\begin{verbatim}
#
# Sample configuration file for cern_httpd for running it
# as a proxy server WITH caching.
#
# See:
#
#
# for more information.
#
# Written by:
# Ari Luotonen April 1994
#
#
# Set this to point to the directory where you unpacked this
# distribution, or wherever you want httpd to have its "home"
#
ServerRoot /where/ever/server_root
#
# Set the port for proxy to listen to
#
Port 8080
#
# General setup; on some systems, like HP, nobody is defined so
# that setuid() fails; in those cases use a different user id.
#
UserId nobody
GroupId nogroup
#
# Logging; if you want logging uncomment these lines and specify
# locations for your access and error logs
#
# AccessLog /where/ever/proxy-log
# ErrorLog /where/ever/proxy-errors
LogFormat Common
LogTime LocalTime
#
# Proxy protections; if you want only certain domains to use
# your proxy, uncomment these lines and specify the Mask
# with hostname templates or IP number templates:
#
# Protection PROXY-PROT {
# ServerId YourProxyName
# Mask @(*.cern.ch, 128.141.*.*, *.ncsa.uiuc.edu)
# }
# Protect * PROXY-PROT
#
# Pass the URLs that this proxy is willing to forward.
#
Pass http:*
Pass ftp:*
Pass gopher:*
Pass wais:*
#
# Enable caching, specify cache root directory, and cache size
# in megabytes
#
Caching On
CacheRoot /your/cache/root/dir
CacheSize 5
#
# Specify absolute maximum for caching time
#
CacheClean * 2 months
#
# Specify the maximum time to be unused
#
CacheUnused http:* 2 weeks
CacheUnused ftp:* 1 week
CacheUnused gopher:* 1 week
#
# Specify default expiry times for ftp and gopher;
# NEVER specify it for HTTP, otherwise documents generated by
# scripts get cached which is usually a bad thing.
#
CacheDefaultExpiry ftp:* 10 days
CacheDefaultExpiry gopher:* 2 days
#
# Garbage collection controls; daily garbage collection at 3am;
#
Gc On
GcDailyGc 3:00
\end{verbatim}
\chapter{{}
CERN Server CGI/1.1 Script Support}
Server scripts are used to handle searches,
clickable images and forms, and to
produce synthesized documents on the fly. See calendar and finger gateway for
examples. \par
\par
\section{In This Section...}
\begin{itemize}
\item Using {\tt Exec} rule to allow scripts
\item CGI Interface $--$ Script Input
\item CGI Interface $--$ Script Output
\item NPH-Scripts $--$ No Parsing of Headers
\item Setting up a search script
\end{itemize}
\par
\section{{} Important Note!}
CERN {\tt httpd} versions 2.15 and newer have
{\bf two} script interfaces. One is the official
CGI,
Common Gateway Interface, which enables scripts to be shared
between different server implementations (NCSA server, Plexus, etc).
The other is the original, very easy-to-use interface that was
introduced in version 2.13. \par
{\bf Use of CGI instead of the old interface is strongly
encouraged.}\par
{\bf IMPORTANT:} If you have, or wish to write, scripts
that use the old interface, your script name has to end in
{\tt .pp} suffix (comes from "Pre-Parsed"). URLs referring to
these scripts should not contain this suffix. This is to make it
easier to later upgrade to CGI scripts, so you only need to change
the script name in the file system, and not the documents pointing to
it. If you absolutely want to use the old interface (which is nice
for quick hacks that don't need to be portable), see the doc. \par
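The naming convention can be illustrated as follows. The {\tt Exec}
rule and the script name {\tt myscript} are assumptions chosen for the
example; the point is only that the {\tt .pp} suffix appears in the
file system but not in the URL.
\begin{verbatim}
Exec /htbin/* /usr/etc/cgi-bin/*

# Old-interface script stored as:   /usr/etc/cgi-bin/myscript.pp
# URL used in documents:            /htbin/myscript
# After conversion to CGI, the file is renamed to:
#                                   /usr/etc/cgi-bin/myscript
# and the URL in documents stays the same.
\end{verbatim}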
\par
\section{Setting Up httpd To Call Scripts}
The server knows that a request is actually a script request by
looking at the beginning of the URL pathname. You can specify these
special strings in the configuration file
{\tt (/etc/httpd.conf)} by {\tt Exec} rules:
\begin{verbatim}
Exec /url-prefix/* /physical-path/*
\end{verbatim}
Where {\it /url-prefix/\/} is the special string that signifies a
script request, and {\it /physical-path/\/} is the absolute filesystem
pathname of the {\bf directory} that contains your scripts.
\par
\subsection{Example}
\begin{verbatim}
Exec /htbin/* /usr/etc/cgi-bin/*
\end{verbatim}
maps URL paths starting with {\tt /htbin/} to
scripts in the directory {\tt /usr/etc/cgi-bin}. I.e.
requesting
\begin{verbatim}
/htbin/myscript
\end{verbatim}
causes a call to the script
\begin{verbatim}
/usr/etc/cgi-bin/myscript
\end{verbatim}
\subsection{Historical Note}
In {\tt httpd} versions before 2.15 there was an
{\tt HTBin} directive:
\begin{verbatim}
HTBin /physical-path
\end{verbatim}
which is now obsolete, but understood by the server to mean
\begin{verbatim}
Exec /htbin/* /physical-path/*
\end{verbatim}
Use of {\tt Exec} rule instead is recommended for its
generality. \par
\par
\section{Information Passed to CGI Scripts}
CGI scripts get their input mainly from environment
variables and standard
input (when using {\tt POST} method). Search scripts get
keywords also as command
line arguments. \par
Most important environment variables are:
\begin{DL}{allow this much space}
\item[{\tt QUERY\_STRING}]
The query part of URL, that is, everything that follows the
question mark. This string is URL-encoded, meaning that
special characters like spaces and newlines are encoded into
their hex notation (\%xx), and characters like {\tt + =
\&} have a special meaning.
The contents of this variable can be easily parsed using the
{\tt cgiparse} program. \par
\item[{\tt PATH\_INFO}]
Extra path information given after the script name, for
example with {\tt Exec} rule:
\begin{verbatim}
Exec /htbin/* /usr/etc/cgi-bin/*
\end{verbatim}
a URL with path
\begin{verbatim}
/htbin/myscript/extra/pathinfo
\end{verbatim}
will execute the script {\tt /usr/etc/cgi-bin/myscript}
with {\tt PATH\_INFO} environment variable set to
{\tt /extra/pathinfo}. \par
\item[{\tt PATH\_TRANSLATED}]
Extra pathinfo translated through the rule system. (This
doesn't always make sense.) \par
\end{DL}
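As a worked illustration of the three variables, consider the
configuration and request below. The rules, the script name and the
query are assumptions made for the example, and the value shown for
{\tt PATH\_TRANSLATED} is what the {\tt Pass} rule given here would
yield if the extra path is translated as described above.
\begin{verbatim}
Exec /htbin/*  /usr/etc/cgi-bin/*
Pass /*        /local/Web/*

# Request:          /htbin/myscript/extra/pathinfo?first+second
# Script executed:  /usr/etc/cgi-bin/myscript
# QUERY_STRING      first+second
# PATH_INFO         /extra/pathinfo
# PATH_TRANSLATED   /local/Web/extra/pathinfo
\end{verbatim}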
See also NCSA's
primer to writing CGI scripts. \par
\par
\section{Results From Scripts}
Scripts return their results either by outputting a document to their
standard output, or by outputting the location of the
result document (either a full URL or a local virtual path).
\par
\subsection{Outputting a Document}
Script result must begin with a {\tt Content-Type:} line giving
the document content type, followed by {\bf an empty line}.
The actual document follows the empty line.
Example:
\begin{verbatim}
Content-Type: text/html

<TITLE>Script test</TITLE>
<H1>My First Virtual Document</H1>
....
\end{verbatim}
\par
\subsection{Giving Document Location}
If the script wants to return an existing document (local or remote),
it can give a {\tt Location:} header followed by an empty line:
Example:
\begin{verbatim}
Location: http://info.cern.ch/hypertext/WWW/TheProject.html
\end{verbatim}
This causes the server to send a redirection to the client, which then
retrieves that document. If {\tt Location} starts with a slash
(that is, it is not a full URL), it is taken to be a virtual path for a document
on the same machine; the server passes this string straight through
the rule system and serves that document as if it had been requested
in the first place. In this case the client does not perform a redirection;
the server does it ``on the fly''. \par
Example:
\begin{verbatim}
Location: /hypertext/WWW/TheProject.html
\end{verbatim}
Note that this is a {\bf virtual path}, so after
translations it might be, for example,
{\tt /Public/Web/TheProject.html}. \par
{\bf Important:} Only {\bf full} URLs in
{\tt Location} field can contain the {\it \#label\/} part of URL,
because that is meant only for the client-side, and the server cannot
possibly handle it in any way. \par
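The two forms of {\tt Location:} and the restriction on labels can be
sketched as a small shell helper. The name {\tt emit\_location} and its
error messages are invented for this example; only the header format
comes from the description above.
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: point the server at an existing document.
# $1 is either a full URL or a virtual path starting with a slash.
emit_location() {
    case "$1" in
        /*"#"*) echo "emit_location: no #label in virtual paths" >&2
                return 1 ;;
        /*)     ;;    # virtual path: served by the server itself
        *:*)    ;;    # full URL: the client gets a redirection
        *)      echo "emit_location: bad location: $1" >&2
                return 1 ;;
    esac
    echo "Location: $1"
    echo              # empty line ends the header
}

emit_location /hypertext/WWW/TheProject.html
\end{verbatim}
\par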
\par
\subsection{NPH-Scripts (No-Parse-Headers)}
A script wishing to output the entire HTTP reply (including the status line
and all response headers) should be given a name beginning with the
{\tt nph-} prefix. This makes {\tt httpd} connect the
script's output stream directly to the requesting client, avoiding the
overhead of the server needlessly parsing the response headers. \par
\subsubsection{Example Of NPH-Script Output}
\begin{verbatim}
HTTP/1.0 200 Script results follow
Server: MyScript/1.0 via CERN/3.0
Content-Type: text/html

<TITLE>Just testing...</TITLE>
<H1>Output From NPH-Script</H1>
Yep, seems to work.
\end{verbatim}
\par
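A shell script producing a reply of this kind could be sketched as
follows. It would be installed under a name beginning with
{\tt nph-}; the function name {\tt nph\_reply} is made up for the
example.
\begin{verbatim}
#!/bin/sh
# Sketch of an NPH script body: the script itself writes the
# status line, the headers and the empty line ending the header.
nph_reply() {
    echo "HTTP/1.0 200 Script results follow"
    echo "Content-Type: text/html"
    echo
    echo "<TITLE>Just testing...</TITLE>"
    echo "<H1>Output From NPH-Script</H1>"
}

nph_reply
\end{verbatim}
\par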
\section{Setting Up A Search Script}
There is a special {\tt Search} directive in the configuration
file giving the {\bf absolute} pathname of the script
that performs the search:
\begin{verbatim}
Search /absolute/path/search
\end{verbatim}
Every time a document is searched, this script is called with
\begin{DL}{allow this much space}
\item[Command line]
containing the search keywords decoded, one in each of
{\tt argv\lbrack 1\rbrack }, {\tt argv\lbrack 2\rbrack }, ...
\item[{\tt QUERY\_STRING}]
containing the query string encoded, as it came in the URL
after the question mark.
\item[{\tt PATH\_INFO}]
Virtual path of the document that the search was issued from.
\item[{\tt PATH\_TRANSLATED}]
Absolute filesystem path of the document.
\end{DL}
Search results are output in the usual way:
\begin{verbatim}
Content-Type: text/html

...generated document...
\end{verbatim}
\par
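A minimal search script might be sketched like this. It only echoes
the keywords back; the function name {\tt search\_reply} and the HTML
layout are assumptions, while the use of command line arguments and
{\tt PATH\_INFO} follows the list above.
\begin{verbatim}
#!/bin/sh
# Sketch of a search script: the server passes the decoded
# keywords as arguments and the document path in PATH_INFO.
search_reply() {
    echo "Content-Type: text/html"
    echo
    echo "<TITLE>Search results</TITLE>"
    echo "<H1>Search of ${PATH_INFO:-unknown document}</H1>"
    echo "<UL>"
    for keyword in "$@"; do
        # A real script would look the keyword up somewhere.
        echo "<LI>$keyword"
    done
    echo "</UL>"
}

search_reply "$@"
\end{verbatim}
A production script would also need to escape HTML special characters
in the keywords before echoing them. \par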
\chapter{{}
cgiparse Manual}
{\tt cgiparse} handles {\tt QUERY\_STRING} environment
variable parsing for CGI scripts. It comes with CERN server
distributions {\bf 2.15} and newer. \par
If the {\tt QUERY\_STRING} environment variable is not set, it
reads {\tt CONTENT\_LENGTH} characters from its standard input.
\par
\par
\section{Command Line Options}
\subsection{Main Options}
\begin{DL}{allow this much space}
\item[ {\tt cgiparse -keywords}]
Parse {\tt QUERY\_STRING} as search keywords. Keywords
are decoded and written to standard output, one per line. \par
\item[ {\tt cgiparse -form}]
Parse {\tt QUERY\_STRING} as form request.
Outputs a string which, when {\tt eval}'ed by the Bourne shell,
sets shell variables whose names are {\tt FORM\_}
followed by the field name. The field values are the contents of
the variables. \par
\item[ {\tt cgiparse -value } {\it fieldname\/}]
Parse {\tt QUERY\_STRING} as form request.
Prints only the value of field {\it fieldname\/}. \par
\item[ {\tt cgiparse -read}]
Just read {\tt CONTENT\_LENGTH} characters from
{\tt stdin} and write them to {\tt stdout.} \par
\item[ {\tt cgiparse -init}]
If {\tt QUERY\_STRING} is not defined, read
{\tt stdin} and output a string that, when
{\tt eval}'ed by the Bourne shell, sets
{\tt QUERY\_STRING} to its correct value. This can be
used when the same script is used with both the {\tt GET}
and {\tt POST} methods. Typical use at the beginning of a
Bourne shell script:
\begin{verbatim}
eval `cgiparse -init`
\end{verbatim}
After this command the {\tt QUERY\_STRING} environment
variable will be set regardless of whether {\tt GET} or
{\tt POST} method was used. Therefore
{\tt cgiparse} may be called multiple times in the same
script (otherwise with {\tt POST} it could only be
called once because after that the {\tt stdin} would be
already read, and the next {\tt cgiparse} would hang).
\par
\end{DL}
\par
\subsection{Modifier Options}
\begin{DL}{allow this much space}
\item[ {\tt -sep } {\it separator\/}]
Specify the string used to separate multiple values. With \begin{itemize}
\item {\tt -value} default is newline
\item {\tt -form} default is "{\it , \/}"
\end{itemize} \par
\item[ {\tt -prefix } {\it prefix\/}]
Only with {\tt -form}.
Specify the prefix to use when making up environment
variable names. Default is ``{\it FORM\_\/}''. \par
\item[ {\tt -count}]
With \begin{itemize}
\item {\tt -keywords} outputs the number of keywords
\item {\tt -form} outputs the number of unique fields
(multiple values are counted as one)
\item {\tt -value } {\it fieldname\/} gives the number of
values of field {\it fieldname\/} (0 if there is no such field,
1 if it has a single value, 2 if it has two values, etc.).
\end{itemize} \par
\item[ {\tt -}{\it number\/} , e.g. {\tt -2}]
With \begin{itemize}
\item {\tt -keywords} gives {\it n\/}'th keyword
\item {\tt -form} gives all the values of {\it n\/}'th
field
\item {\tt -value } {\it fieldname\/} gives {\it n\/}'th
of the multiple values of field {\it fieldname\/}
(first value is number 1).
\end{itemize} \par
\item[ {\tt -quiet}]
Suppress all error messages. (Non-zero exit status still
indicates error.) \par
\end{DL}
All options have one-character equivalents:
{\tt -k -f -v -r -i -s -p -c -q} \par
\par
\section{Exit Statuses}
\begin{itemize}
\item {\tt 0 } Success
\item {\tt 1 } Illegal command line
\item {\tt 2 } Environment variables not set correctly
\item {\tt 3 } Failed to get requested information (no such
field, {\tt QUERY\_STRING} contains
keywords when form field values requested,
etc).
\end{itemize}
\par
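A script can use these statuses to tell a missing field from a setup
problem. The following is only a sketch: the function name
{\tt get\_field} and its messages are invented, and placing
{\tt -quiet} after the main option is an assumption about the accepted
option order.
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: fetch one form field with cgiparse and
# turn the documented exit statuses into a message on stderr.
get_field() {
    value=`cgiparse -value "$1" -quiet`
    status=$?
    case $status in
        0) echo "$value" ;;
        1) echo "get_field: illegal cgiparse command line" >&2 ;;
        2) echo "get_field: CGI environment not set up" >&2 ;;
        3) echo "get_field: no value for field $1" >&2 ;;
        *) echo "get_field: unexpected status $status" >&2 ;;
    esac
    return $status
}
\end{verbatim}
A caller could then write {\tt name=`get\_field name1` || exit 1}. \par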
\section{Examples}
Note: In real life, of course, {\tt QUERY\_STRING} is already
set by the server. \par
Here {\tt \$} is the Bourne shell prompt. \par
\par
\subsection{Keyword Search}
\begin{verbatim}
$ QUERY_STRING="is+2%2B2+really+four%3F"
$ export QUERY_STRING
$ cgiparse -keywords
is
2+2
really
four?
$
\end{verbatim}
\par
\subsection{Parsing All Form Fields}
\begin{verbatim}
$ QUERY_STRING="name1=value1&name2=Second+value%3F+That%27s+right%21"
$ export QUERY_STRING
$ cgiparse -form
FORM_name1='value1'; FORM_name2='Second value? That'\''s right!'
$ eval `cgiparse -form`
$ set
...
FORM_name1=value1
FORM_name2=Second value? That's right!
...
$
\end{verbatim}
\par
\subsection{Extracting Only One Field Value}
Here {\tt QUERY\_STRING} is the same as in the previous example.
\begin{verbatim}
$ cgiparse -value name1
value1
$ cgiparse -value name2
Second value? That's right!
$
\end{verbatim}
\par
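Put together, a form handler that works with both {\tt GET} and
{\tt POST} might be sketched as below. The function name
{\tt handle\_form} and the field name {\tt name1} (borrowed from the
examples above) are illustrative only.
\begin{verbatim}
#!/bin/sh
# Sketch of a form handler built on cgiparse.
handle_form() {
    # Make sure QUERY_STRING is set even for POST requests.
    eval `cgiparse -init`
    # Fetch the FORM_* assignments; give up if cgiparse fails.
    settings=`cgiparse -form` || return 1
    eval "$settings"

    echo "Content-Type: text/html"
    echo
    echo "<TITLE>Form received</TITLE>"
    echo "name1 is: $FORM_name1"
}
\end{verbatim}
Capturing the output in a variable before the {\tt eval} keeps the
exit status of {\tt cgiparse} visible; a plain
{\tt eval `cgiparse -form`} would hide a failure. \par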
\chapter{{}
cgiutils Manual}
The {\tt cgiutils} program makes it easier for NPH
\lbrack No-Parse-Headers\rbrack scripts to produce a full HTTP1 response header.
It can also be used to just calculate the {\tt Expires:}
header, given the time to live in a human-friendly form such as
\begin{verbatim}
1 year 3 months 2 weeks 4 days 12 hours 30 mins 15 secs
\end{verbatim}
\section{Command Line Options}
\begin{DL}{allow this much space}
\item[ {\tt cgiutils -version}
] print the version information. \par
\item[ {\tt -nodate}
] don't produce the {\tt Date:} header. \par
\item[ {\tt -noel}
] don't print the empty line after headers \lbrack in case you want to
output other MIME headers yourself after the initial header
lines\rbrack . \par
\item[ {\tt -status } {\it nnn\/}
] give full HTTP1 response, instead of just a set of HTTP headers,
with HTTP status code {\it nnn\/}. \par
\item[ {\tt -reason } {\it explanation\/}
] specify the reason line for HTTP1 response \lbrack can only be used with
the {\tt -status } {\it nnn\/} option\rbrack . \par
\item[ {\tt -ct } {\it type/subtype\/}
] specify the MIME content-type. \par
\item[ {\tt -ce } {\it encoding\/}
] specify the content-encoding \lbrack e.g. {\tt x-compress},
{\tt x-gzip}\rbrack . \par
\item[ {\tt -dl } {\it language-code\/}
] specify the content-language code. \par
\item[ {\tt -length } {\it nnn\/}
] specify the MIME content-length value. \par
\item[ {\tt -expires} {\it time-spec\/}
] specify the time to live, like {\tt "2 days 12 hours"},
and {\tt cgiutils} will compute the {\tt Expires:}
field value \lbrack which is the actual expiry date and time in GMT and
in format specified by HTTP spec\rbrack . \par
\item[ {\tt -expires now}
] means immediate expiry. Often this is exactly what the scripts
should output. \par
\item[ {\tt -uri } {\it URI\/}
] specify the {\it URI\/} for the returned document. \par
\item[ {\tt -extra } {\it xxx: yyy\/}
] specify an extra header which cannot otherwise be specified for
{\tt cgiutils}. \par
\end{DL}
{} Make sure that you quote
the option arguments that are more than one word:
\begin{verbatim}
cgiutils -expires "2 days 12 hours 30 mins"
\end{verbatim}
\section{Examples}
\begin{verbatim}
cgiutils -status 200 -reason "Virtual doc follows" -expires now
==>
HTTP/1.0 200 Virtual doc follows
MIME-Version: 1.0
Server: CERN/2.17beta
Date: Tuesday, 05-Apr-94 03:43:46 GMT
Expires: Tuesday, 05-Apr-94 03:43:46 GMT
\end{verbatim}
{} There is an empty line after
the output to mark the end of the MIME header section; if you don't
want this \lbrack you want to output some more headers yourself\rbrack , specify the
{\tt -noel} (NO-Empty-Line) option. \par
Note also that {\tt cgiutils} automatically outputs the
{\tt Server:} header because it is available in the CGI
environment. The {\tt Date:} field is also automatically
generated unless the {\tt -nodate} option is specified. \par
To get only the {\tt Expires:} field, don't specify the {\tt -status}
option. If you don't want the empty line after the header line, use
the {\tt -noel} option as well:
\begin{verbatim}
cgiutils -noel -expires "2 days"
==>
Expires: Thursday, 07-Apr-94 03:44:02 GMT
\end{verbatim}
\par
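An NPH script could combine these options roughly as follows. This is
a sketch under assumptions: the function name
{\tt nph\_with\_cgiutils} and the {\tt X-Generated-By} header are
invented, and combining {\tt -ct} with {\tt -status} in one call is
assumed to be allowed.
\begin{verbatim}
#!/bin/sh
# Sketch of an NPH script that lets cgiutils write the header.
nph_with_cgiutils() {
    # -noel leaves the header section open so that the script
    # can add a header of its own.
    cgiutils -status 200 -reason "Virtual doc follows" \
             -ct text/html -expires now -noel
    echo "X-Generated-By: example-script"   # made-up header
    echo                                     # end of header
    echo "<TITLE>Virtual document</TITLE>"
}
\end{verbatim}
The same extra header could probably be passed with {\tt -extra}
instead, in which case {\tt -noel} and the two {\tt echo} lines for the
header would not be needed. \par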
\chapter{
{}
CERN Server Clickable Image Support}
CERN Server versions 2.14 and newer have an {\tt htimage}
program in the distribution, which is an {\tt /htbin} program
handling clicks on sensitive images. In versions 2.15 and newer it
is a CGI program (it uses the Common
Gateway Interface to communicate with {\tt httpd}). See demo. \par
\par
\section{In This Section...}
\begin{itemize}
\item {\tt htimage} installation
\item Writing documents that contain clickable images
\item Image configuration file
\item Output of {\tt htimage}
\end{itemize}
\par
\section{Installing htimage Binary}
After compiling {\tt htimage} you should move the executable
binary to the directory that holds your other server scripts, and
remember to set up an {\tt Exec} rule. For example, if your scripts are in
{\tt /usr/etc/cgi-bin}, you could have an {\tt Exec}
rule like this:
\begin{verbatim}
Exec /htbin/* /usr/etc/cgi-bin/*
\end{verbatim}
{\tt htimage} is often one of the most frequently used scripts, so
it is convenient to refer to it by as short a name as
possible, like {\tt /img}. To do so you could have a {\tt Map}
rule just before the {\tt Exec}:
\begin{verbatim}
Map /img/* /htbin/htimage/*
Exec /htbin/* /usr/etc/cgi-bin/*
\end{verbatim}
\par
\section{Writing a Document With Clickable Images}
To create a clickable image in your HTML document, you'll need to:
\begin{itemize}
\item specify {\tt ISMAP} in your inlined image call, and
\item make that image an anchor, with an {\tt HREF}
to the script handling the request {\tt (htimage)} with
image configuration file name appended to it.
\end{itemize}
Each clickable image has to be described to {\tt htimage} via
an image configuration file. These files are referred to by the extra path information in the URL
causing the call to {\tt htimage}:
\begin{verbatim}
<A HREF="/htbin/htimage/image/config/file"><IMG SRC="Image.gif" ISMAP></A>
\end{verbatim}
Image configuration file can be:
\begin{itemize}
\item either a virtual path, that is translated through rule system,
\item or an absolute path in your filesystem.
\end{itemize}
{\tt htimage} will look for both of these (after all, it gets
both the {\tt PATH\_INFO} and {\tt PATH\_TRANSLATED}
environment variables from {\tt httpd} anyway). \par
You can even do some very smart mappings in the rule file to allow
very short references to {\tt htimage} and picture
configuration files. Let's suppose all your image configuration files
are in directory {\tt /usr/etc/images}. Then you can use the
following two rules in your server's configuration file (by default
{\tt /etc/httpd.conf}):
\begin{verbatim}
Map /img/* /htbin/htimage/usr/etc/images/*
Exec /htbin/* /usr/etc/cgi-bin/*
\end{verbatim}
In this case you can refer to your image mapper very easily; if you
have an image configuration file {\tt Dragons.conf} in the
{\tt /usr/etc/images} directory, all you need to say in the
anchor is this:
\begin{verbatim}
<A HREF="/img/Dragons.conf"><IMG SRC="Dragons.gif" ISMAP></A>
\end{verbatim}
\par
\section{Image Configuration File}
There are four keywords:
\begin{DL}{allow this much space}
\item[{\tt default} {\em URL\/}]
The {\em URL\/} used if the click falls in none of the given shapes.
This should always be set! \par
\item[{\tt circle} ({\em x\/},{\em y\/}) {\em r\/} {\em URL\/}]
Circle with center point {\em (x,y)\/} and radius {\em r\/}. \par
\item[{\tt rectangle} ({\em x1\/},{\em y1\/}) ({\em x2\/},{\em y2\/}) {\em URL\/}]
Rectangle with (any) two opposite corners having coordinates {\em (x1,y1)\/}
and {\em (x2,y2)\/}. \par
\item[{\tt polygon} ({\em x1\/},{\em y1\/}) ({\em x2\/},{\em y2\/}) ...
({\em xn\/},{\em yn\/}) {\em URL\/}]
Polygon having adjacent vertices {\em (xi,yi)\/}. If the path given
is not closed (the first and last coordinate pairs aren't the same),
{\tt htimage} connects the first and last coordinate pairs itself,
in effect adding the first point again as the last one. \par
\end{DL}
These can be abbreviated as {\tt def, circ, rect, poly.} \par
Shapes are checked in the order they appear in config file, and the
URL corresponding to the first match is returned. If none match, the
{\tt default} URL is returned. \par
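The first-match rule can be illustrated with a small shell sketch for
circles and rectangles. It is not the code of {\tt htimage}: the
function names, the shapes and the URLs are invented, and treating
points on the boundary as inside the shape is an assumption.
\begin{verbatim}
#!/bin/sh
# Sketch of first-match shape lookup for a click at (x,y).
in_circle() {       # args: x y cx cy r
    dx=$(( $1 - $3 )); dy=$(( $2 - $4 ))
    [ $(( dx * dx + dy * dy )) -le $(( $5 * $5 )) ]
}
in_rectangle() {    # args: x y x1 y1 x2 y2, corners in any order
    lo_x=$3; hi_x=$5; [ "$3" -gt "$5" ] && { lo_x=$5; hi_x=$3; }
    lo_y=$4; hi_y=$6; [ "$4" -gt "$6" ] && { lo_y=$6; hi_y=$4; }
    [ "$1" -ge "$lo_x" ] && [ "$1" -le "$hi_x" ] &&
    [ "$2" -ge "$lo_y" ] && [ "$2" -le "$hi_y" ]
}
lookup() {          # args: x y; shapes are tried in order
    if   in_circle    "$1" "$2" 100 100 50;  then echo /circle.html
    elif in_rectangle "$1" "$2" 0 0 200 200; then echo /rect.html
    else                                          echo /default.html
    fi
}

lookup 100 100
\end{verbatim}
The click at (100,100) lies inside both shapes, but the circle is
listed first, so its URL is the one returned. \par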
{\em URL\/}s are
\begin{itemize}
\item either full URLs (with access method, machine name and path), in
which case the server sends a redirection to the client,
\item or partial URLs containing only the pathname part (always starting
with a slash), in which case the server treats the path as if it were the original request:
it translates it through the rule system, performs access authorization and serves it
normally (which is faster than sending a redirection).
\end{itemize}
\par
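As an illustration, an image configuration file using all four
keywords and both kinds of URL might look like this. All coordinates,
paths and the host name are made up for the example.
\begin{verbatim}
default   /maps/nowhere.html
circle    (100,100) 50 /maps/sun.html
rectangle (0,150) (200,250) http://www.example.org/ground.html
polygon   (210,10) (290,10) (250,80) /maps/kite.html
\end{verbatim}
Here a click in the rectangle results in a redirection to another
server, while the other three URLs are served locally through the
rule system. \par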
\section{Output Produced by htimage}
{\tt htimage} prints a single {\tt Location:} field
to its {\tt stdout}, or an error message preceded by
{\tt Content-Type: text/html}. In fact {\tt htimage}
behaves exactly like any other CGI/1.0 program (script), and is not
handled specially by the server in any way. Therefore, you can
rename {\tt htimage} to whatever you prefer, just as we called it
{\tt /img} in the above example. \par
The server understands this {\tt Location:} field, and either
sends that file directly to the client (non-full URL), or sends a
redirection to the client causing it to fetch the document, maybe even
from another machine. \par
Note that URLs returned by {\tt htimage} may well be other
script requests - there is no reason for being limited to just regular
documents. \par
\par
\chapter{{}
Protected CERN Server Setup}
Access can be restricted according to user name, internet address, or
both. Access control can be tree-level, file level, or both.\par
\par
\section{In This Section...}
\begin{itemize}
\item Password File
\item Group File
\item Protect Directive in Configuration File
\item Protection Setup File
\item Protecting a Tree of Documents
\item Protecting Individual Files
\item Using Two-Level Protection
\item Embedding the Protection Setup in the
Configuration File Itself
\item Access Control List File
\end{itemize}
\par
\section{Password File}
If user-wise access control is used there has to be a password file
listing all the users and their encrypted passwords. The password file
can be maintained with the {\tt htadm}
program, which is a part of the CERN {\tt httpd} distribution. \par
{} Unix password files are understood
by CERN daemon (but not vice versa). However, {\bf Unix users are
in no way connected to the WWW access authorization.} \par
\par
\section{Group File}
Group file contains declarations of groups containing users and other
groups, with possibly an IP address template. Group declarations as
viewed from top-level look like this:
\begin{verbatim}
groupname: item, item, item
\end{verbatim}
The list of items is called a group definition.
Each {\tt item} can be a username, an already-defined
groupname, or a comma-separated list of user and group names in
parentheses. Any of these can be followed by an at sign {\tt @}
followed by either a single IP address template, or a comma-separated
list of IP address templates in parentheses. The following are valid
group declarations:
\begin{verbatim}
authors: john, james
trusted: authors, jim
cern_people: @128.141.*.*
hackers: marca@141.142.*.*, sanders@153.39.*.*,
(luotonen, timbl, hallam)@128.141.*.*,
cailliau@(128.141.201.162, 128.141.248.119)
cern_hackers: hackers@128.141.*.*
\end{verbatim}
If an item contains only IP address template part all users from those
addresses are accepted (e.g. {\tt cern\_people} above). Note the
last two declarations: {\tt cern\_hackers} group is made up of
the {\tt hackers} group by restricting it further according to
IP address.\par
A group definition can be continued on the next line after any comma in the
definition. Forward references in the group file are illegal (i.e. a
group name cannot be used before it is defined).\par
Group definition syntax is valid not only in group file, but also in
\begin{itemize}
\item {\tt GetMask} in protection setup file, and
\item in last field in ACL entries.
\end{itemize}
\par
\par
\section{Server Configuration File}
Typically you protect a tree of documents with a {\tt Protect} rule
in the rule file, and specify authorized persons and IP addresses in the
protection setup file or access control list file:
\begin{verbatim}
Protect /very/secret/* /WWW/httpd.setup
\end{verbatim}
If Unix file system protections are set up so that there is no
world read permission, the daemon naturally has to run as the owner of
those files or as a member of their group.\par
However, if there are protected trees owned by different people this
doesn't work. In that case {\em the daemon has to run as
{\tt root}, and the user and group ids have to be specified in
the {\tt protect} rule,\/} e.g.:
\begin{verbatim}
Protect /kevin/secret/* /WWW/httpd.setup1 kevin.www
Protect /marcus/secret/* /WWW/httpd.setup2 marcus.nogroup
\end{verbatim}
\par
\section{Protection Setup File}
Each {\tt protect} rule has an associated protection setup
file. It specifies valid authentication schemes, password and group
files, and password server-id:
\begin{verbatim}
AuthType Basic
ServerId OurCollaboration
PasswordFile /WWW/Admin/passwd
GroupFile /WWW/Admin/group
\end{verbatim}
The password server id need not be a real machine name. Its only purpose
is to inform the browser about which password file is in use
(different protection setups on the same machine can use different
password files, and that would otherwise confuse pseudo-intelligent
clients trying to figure out which password to send).\par
{}
Same server-ids on different machines are considered
different by clients (otherwise this would be a security hole).\par
\par
\subsection{Protecting Entire Tree As One Entity}
If you want to control access only to entire trees of documents and
don't care to restrict access differently to individual files, it
suffices to give a {\tt GetMask} in setup file (and you
don't need any ACL files):
\begin{verbatim}
GetMask group, user, group@address, ...
\end{verbatim}
Group definition has the same syntax as in group file.\par
\par
\subsection{Protecting Individual Files Differently}
When each individual file needs to be protected separately you should
use an ACL (access control list) file in the same directory as the
protected files. After that no file in that directory can be accessed
unless there is a specific entry in ACL allowing it.\par
In this case you don't need the {\tt GetMask} in setup
file.\par
\par
\subsection{Restricting Access Even Further}
There may be both {\tt GetMask} {\em and\/} an ACL, in
which case both conditions must be met. This is typically used so
that {\tt GetMask} defines a general group of people allowed
to access the tree, and ACLs restrict access even further.\par
\par
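As a sketch of such a two-level setup, assume the group file declares
{\tt authors: john, james} and {\tt trusted: authors, jim}, as in the
group file example earlier. The protection setup file could then
contain:
\begin{verbatim}
AuthType     Basic
ServerId     OurCollaboration
PasswordFile /WWW/Admin/passwd
GroupFile    /WWW/Admin/group
GetMask      trusted
\end{verbatim}
and the ACL file in a protected directory:
\begin{verbatim}
*.html : GET : authors
\end{verbatim}
With these two hypothetical files {\tt jim} passes the
{\tt GetMask} but is not listed in the ACL, so only {\tt john} and
{\tt james} can retrieve the HTML files in that directory. \par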
\section{Protection Setup Embedded
in the Configuration File}
Often it is not necessary to have the protection information in a
different file; as a new feature {\tt cern\_httpd} allows
protection setup to be "embedded" inside the configuration file itself.
\par
Instead of writing the setup in a different file and referring to it
by the filename, you can use the {\tt Protection} directive to
define the protection setup and bind it to a name, and later refer to
this setup via that name. \par
The previous example could be written into the main configuration as
follows:
\begin{verbatim}
Protection PROT-NAME {
UserId marcus
GroupId nogroup
AuthType Basic
ServerId OurCollaboration
PasswordFile /WWW/Admin/passwd
GroupFile /WWW/Admin/group
GetMask group, user, group@address, ...
}
Protect /private/URL/* PROT-NAME
Protect /another/private/* PROT-NAME
\end{verbatim}
{} Note that since the protection setup is in
the same file as the other configuration directives, it is also
possible to specify the {\tt UserId} and {\tt GroupId}
for the server to run as, without it being a security hole. With an
external protection setup this is not allowed, for security
reasons; that is why there is an extra field after the protection
setup filename specifying the user and group ids in that case:
\begin{verbatim}
Protect /kevin/secret/* /WWW/httpd.setup1 kevin.www
Protect /marcus/secret/* /WWW/httpd.setup2 marcus.nogroup
\end{verbatim}
If you need a given protection setup only once there is no need to first
bind it to a name and then refer to it by that name, but rather just
combine the two:
\begin{verbatim}
Protect /private/URL/* {
UserId marcus
GroupId nogroup
AuthType Basic
ServerId OurCollaboration
PasswordFile /WWW/Admin/passwd
GroupFile /WWW/Admin/group
GetMask group, user, group@address, ...
}
\end{verbatim}
{} {\tt httpd} is not
very robust in parsing this particular directive; make sure you have a
space between the URL template and the curly brace, and that the
ending curly brace is alone on that line. Also, comments are
{\bf not} allowed inside the protection setup definition.
\par
\par
\section{Access Control List File}
The ACL file is a file named {\tt .www\_acl} in the same directory
as the files whose access it controls. It typically looks
something like this:
\begin{verbatim}
secret*.html : GET,POST : trusted_people
minutes*.html: GET,POST : secretaries
*.html : GET : willy,kenny
\end{verbatim}
It is worth noting that all the templates are matched against (unlike
in the rule file, where rule translation stops at {\tt Pass} and
{\tt Fail}). So in the previous example all the HTML files are
accessible to {\tt willy} and {\tt kenny}, even those
matching the two previous templates.\par
The last field is just a list of users and groups (possibly at required
IP addresses), and in fact this field has the same syntax as the group file.\par
Once the {\tt PUT} method is implemented it can appear in the
middle field, separated by a comma from {\tt GET}:
\begin{verbatim}
*.html : GET,PUT : authors
\end{verbatim}
\par
\par
\section{{}
Manual Page For htadm}
The CERN {\tt httpd} password file can be maintained with the
{\tt htadm} program, which is a part of the CERN {\tt httpd}
distribution. \par
\par
\subsection{Command Line Options and Parameters}
\begin{DL}{allow this much space}
\item[ {\tt htadm -adduser } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password {\tt \lbrack }realname{\tt \rbrack \rbrack \rbrack }\/}
] adds a user into the password file (fails if there is
already a user by that name).\par
\item[ {\tt htadm -deluser } {\it passwordfile {\tt \lbrack }username{\tt \rbrack }\/}
] deletes a user from the password file (fails if there
is no user by that name).\par
\item[ {\tt htadm -passwd } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password{\tt \rbrack \rbrack }\/}
] changes user's password (fails if there is no such user).\par
\item[ {\tt htadm -check } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password{\tt \rbrack \rbrack }\/}
] checks user's password (fails if there is no such user).
Writes either {\tt Correct} or {\tt Incorrect}
to standard output.
Also indicates password correctness by a zero return value. \par
\item[{\tt htadm -create } {\it passwordfile\/}
] creates an empty password file. \par
\end{DL}
If {\tt {\it password\/}} or even {\tt {\it username\/}}
is missing in any of the above commands, it is prompted for
interactively. {\tt {\it passwordfile\/}} must always be
specified. A missing real name is also prompted for when adding a new
user.\par
\par
{}
Do NOT use {\tt htadm} to add new users to the actual Unix
password file {\tt /etc/passwd}; entries written by
{\tt htadm} lack some fields that Unix requires. \par
{} Passwords should not be longer
than 8 characters (this is a restriction from linemode clients using the C
library function {\tt getpass()} to read the password $--$ there
is no other cause for this restriction; the maximum hardcoded password
size is actually much larger, and if you only use GUI or other clients
that are able to read such long passwords, feel free to use them). \par
{} {\tt htadm}
destroys the password from the command line as soon as possible, so that
somebody's password is very unlikely to be seen by looking at the process
listing on the machine (with {\tt ps}, for example).\par
\par
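Because {\tt htadm -check} reports a correct password with a zero
return value, it can be used directly in shell conditions. The helper
name {\tt password\_ok} below is hypothetical, and the password file
path is the one used in the examples of this chapter.
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: verify a user's password with htadm.
# Relies on the zero return value documented for -check.
password_ok() {     # args: passwordfile username password
    htadm -check "$1" "$2" "$3" >/dev/null
}

# Typical maintenance commands (interactive prompts not shown):
#   htadm -create  /WWW/Admin/passwd
#   htadm -adduser /WWW/Admin/passwd john
#   htadm -passwd  /WWW/Admin/passwd john
\end{verbatim}
A script could then say
{\tt if password\_ok /WWW/Admin/passwd john "\$pw"; then ...; fi}. \par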
\chapter{{} Proxies}
A proxy is an HTTP server, typically running on a firewall machine,
that provides access to the outside world for people inside the
firewall. {\tt cern\_httpd} can be
configured to run as a proxy. Furthermore, it is able to perform
caching of documents, resulting in faster response times. \par
I (Ari Luotonen, CERN) and Kevin Altis from Intel have written a joint
paper about proxies
which will be presented at the
WWW94 Conference. \par
\par
\section{In This Section...}
\begin{itemize}
\item Server setup
\item Proxy protection
\item Configuring proxy to use another proxy
\item Caching
\item Client setup
\end{itemize}
\par
\section{Setting Up cern\_httpd To Run as a Proxy}
{\tt cern\_httpd} runs as a proxy if
its configuration file allows URLs starting with the corresponding access
method to be passed. A typical proxy configuration file reads:
\begin{verbatim}
pass http:*
pass ftp:*
pass gopher:*
pass wais:*
\end{verbatim}
{\bf Note} that {\tt cern\_httpd} is capable of running as a
regular HTTP server at the same time; just add your normal rules after
these. \par
{} The {\tt proxy\_xxx} environment
variables that are used to redirect clients to use a proxy also
affect the proxy server itself. If this is not your intention make sure that those variables
are not set in {\tt httpd}'s environment. \par
\par
\section{Proxy Protection}
{\tt cern\_httpd} 2.17 and newer provide a mechanism to protect
the proxy against unauthorized use (in fact, the machinery behind this
is the same that is used to set up document protection when running as
a regular HTTP server). \par
\subsection{Enabling and Disabling HTTP Methods}
By default only {\tt HEAD}, {\tt GET} and
{\tt POST} methods are allowed to go through the proxy. You
can enable more methods using the {\tt Enable} directive in the
configuration file:
\begin{verbatim}
Enable PUT
Enable DELETE
\end{verbatim}
The {\tt Disable} directive disables methods:
\begin{verbatim}
Disable POST
\end{verbatim}
\subsection{Defining Allowed Hosts}
A certain protection setup is defined to the proxy as a single entity
that is given a name. Later, when protecting certain URLs this name
is used to refer to the protection setup. (The name can also be the
absolute pathname of the file that defines the protection, if one
wishes to store protection information in a different file.) \par
Protection is defined as follows:
\begin{verbatim}
Protection protname {
Mask @(*.cern.ch, *.desy.de)
}
\end{verbatim}
This defines a protection that allows all request methods from domains
{\tt cern.ch} and {\tt desy.de}, and none from
elsewhere. This protection can be referred to by {\it protname\/}. \par
You can also use IP number templates:
\begin{verbatim}
Protection protname {
Mask @(128.141.*.*, 131.169.*.*)
}
\end{verbatim}
{\bf Note} that IP number templates always have four parts
separated by dots. \par
If allowed methods are different according to domain, e.g.
{\tt GET} should be allowed from both of these domains, but
{\tt POST} and {\tt PUT} only from {\tt cern.ch},
you can use {\tt GetMask}, {\tt PostMask},
{\tt PutMask} and {\tt DeleteMask} directives instead:
\begin{verbatim}
Protection protname {
GetMask @(*.cern.ch, *.desy.de)
PostMask @*.cern.ch
PutMask @*.cern.ch
}
\end{verbatim}
{\bf Note} that parentheses are necessary only if there is
more than one domain name template. \par
\subsection{Actual Protection}
The {\tt Protect} rule actually associates protection with a
URL. In case of proxy protection you would typically say:
\begin{verbatim}
Protect http:* protname
Protect ftp:* protname
Protect gopher:* protname
Protect news:* protname
Protect wais:* protname
\end{verbatim}
which would restrict all proxy use to the allowed hosts defined
previously in the protection setup {\it protname\/}.
{\bf Note} that {\it protname\/} must be defined before it
is referenced! \par
\par
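Putting the pieces of this section together, a protected proxy
configuration might be sketched as follows. The protection name
{\tt PROXY-USERS} is made up, and placing the {\tt Protect} rules
before the {\tt Pass} rules is an assumption based on the fact that the
first matching {\tt Pass} terminates rule translation.
\begin{verbatim}
Protection PROXY-USERS {
Mask @(*.cern.ch, *.desy.de)
}
Protect http:*   PROXY-USERS
Protect ftp:*    PROXY-USERS
Protect gopher:* PROXY-USERS
Protect wais:*   PROXY-USERS
Pass    http:*
Pass    ftp:*
Pass    gopher:*
Pass    wais:*
\end{verbatim}
The {\tt Protection} block comes first so that the name is defined
before any {\tt Protect} rule refers to it. \par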
\section{Caching}
{\tt cern\_httpd} running as a proxy can also perform caching of
files retrieved from remote hosts. See the configuration directives controlling this
feature. \par
\par
\chapter{{}
CERN Server FAQ}
If you have problems, first make sure you're using the newest version.
You'll find that out by peeking into
ftp://info.cern.ch/pub/www/src. \par
When something goes wrong you should run the server in verbose mode (the
{\tt -v } flag) to see exactly what the problem is. If you
usually run it from the inet daemon, start it standalone on some other
port (with the {\tt -p } {\it port\/} flag), with otherwise the
same parameters as in {\tt /etc/inetd.conf}. \par
\par
\section{My Scripts Get Served As Text Files...}
...or are completely inaccessible. \par
It's important to understand that rules in the configuration file
({\tt Map}, {\tt Pass}, {\tt Exec},
{\tt Fail}, {\tt Protect}, {\tt DefProt} and
{\tt Redirect}) are translated from top to bottom, and the
first matching {\tt Pass}, {\tt Exec} or
{\tt Fail} will {\bf terminate} rule translation.
\par
So, make sure that your {\tt Exec}
rule is before any general {\tt Map}pings. \par
\par
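For example, a rule file with the script rule in a safe position might
read as follows. The directories are the ones used elsewhere in this
guide, but the combination is only an illustration.
\begin{verbatim}
Exec /htbin/*  /usr/etc/cgi-bin/*
Map  /*        /Public/Web/*
Pass /Public/Web/*
\end{verbatim}
If the {\tt Map} line came first, a request for
{\tt /htbin/myscript} would be rewritten to
{\tt /Public/Web/htbin/myscript} before the {\tt Exec} rule was ever
tried, and the script would be served as a file or not found at all.
\par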
\section{How do I...}
\begin{itemize}
\item Set up access authorization?
\item Write server-side scripts?
\item Get the server to perform searches?
\item Make clickable images?
\item Handle forms?
\item Set up a proxy?
\item Set up proxy caching?
\end{itemize}
\par
\section{Zombies}
There used to be one zombie when running {\tt cern\_httpd}
standalone; this was fixed in version 2.17beta. If you still see zombies (more
than two that don't go away in a few minutes) it is a bug. \par
\par
\section{Inet daemon complains about looping...}
...and terminates WWW service. {\tt :-(} \par
This is a hard-coded {\tt inetd} limitation on at least
SunOS-4.1.* and NeXT, which limits maximum allowed connections
from a given host to 40 per minute. This can be exceeded by
scripts doing Web-roaming, or documents having masses of small
inlined images. \par
There is a fix for at least SunOS {\tt inetd} (100178-08), and
in Solaris this is fixed. You can also run {\tt httpd}
standalone (preferably with the {\tt -fork} command line
option). \par
{\bf Most importantly,} you should stop running
{\tt httpd} from {\tt inetd} and rather run it standalone. This
is because running from {\tt inetd} is inefficient. \par
\par
\section{Server looks at funny directories and finds nothing}
Versions from 2.0 up to, but not including, 2.15 need an explicit map to
the file system in the rule file, e.g.:
\begin{verbatim}
Map /* file:/*
\end{verbatim}
but 2.15 doesn't have this limitation anymore. \par
\par
\section{But the document says rule file is no longer needed}
True, but it also says you must remember to give your Web directory as
a parameter to {\tt httpd}, e.g.
\begin{verbatim}
httpd /home/me/MyGloriousWeb
\end{verbatim}
\par
\chapter{
{}
CERN httpd 2.15 Release Notes}
There is one single thing that needs to be done when
changing over from {\tt httpd} 2.14 to 2.15:
\begin{verbatim}
Rename your old /htbin scripts to end in .pp suffix!
\end{verbatim}
\section{General Notes}
\begin{itemize}
\item Code tested under Purify $--$ all detected memory leaks and
bugs fixed.
\item Forking code enhanced $--$ no longer crashes when running
standalone. Everybody should start running CERN
httpd standalone instead of from inetd
\item Documentation redesigned, but still under construction
\item Contains Solaris port, but not VMS
\end{itemize}
\section{CGI/1.0, Common Gateway Interface}
\begin{itemize}
\item CGI/1.0 interface fully implemented
\item {\bf Old CERN httpd scripts will continue working if you rename
them to end with .pp suffix.} Links referencing these scripts do
NOT need to be changed. (This feature does not add any overhead to
CGI/1.0 script calls.)
\item New product cgiparse for CGI/1.0 scripts to parse QUERY\_STRING
env.var and to read CONTENT\_LENGTH characters from stdin
\item {\tt htimage} upgraded to CGI/1.0
\item The whole server-environment is propagated to CGI script, except
for variables that are reserved for CGI/1.0.
\item Scripts are spawned by doing a fork() and exec() instead of
system() $--$ more efficient and secure
\end{itemize}
\section{Firewall Gateway Modifications}
\begin{itemize}
\item Access authorization works through firewalls
\item So does POST, therefore forms also
\item -disable/-enable command line options and Disable/Enable
configuration directives for dis/enabling HTTP methods. GET,
HEAD and POST are enabled by default.
\item Fix: text/html and text/plain not passed multiply to
servers when running as gateway
\item Fix: */*, image/* etc not expanded by the gateway
\item Fix: try local search ONLY when accessing local files
\end{itemize}
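A minimal sketch of the configuration-file form of these options, assuming
that each directive takes a single method name as its argument:
\begin{verbatim}
# GET, HEAD and POST are enabled by default.
# Assumed syntax: one method name per directive.
Disable  POST
Enable   PUT
\end{verbatim}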
\section{Other New Features}
\begin{itemize}
\item When started standalone in non-verbose mode automatically
disconnects from terminal session and goes background
\item User-supported directories enabling URLs starting with
{\bf /\~{}username}
\item Redirection
\item Meta-information files to allow RFC-822-style headers to be
appended to server response header section
\item New, common logfile format, localtime default, {\tt GMT}
as an option
\item Ability to suppress logging for certain hosts/domains
according to given hostname template or IP number mask,
like {\tt *.cern.ch} or {\tt 128.141.*.*}
\item -setuid option to set server uid to authenticated uid (local)
\item Multilanguage support: same URL can be used to retrieve a
document in different languages
\item AddLanguage, AddEncoding and AddType directives to
configuration file (AddType replaces Suffix)
\item Better multiformat algorithm
\item HostName directive to configuration file for servers that want to give
CGI/1.0 scripts a different hostname than the actual. Useful
if machine has many aliases, or if httpd fails to get the full
domainname.
\item Exec rule obsoleting HTBin directive $--$ now multiple script
directories possible, with arbitrary mappings
\item Get-Mask, Post-Mask and Put-Mask for protection setup
files. Get-Mask obsoletes Mask-Group
\item Groups All/Users and Anybody/Anyone/Anonymous automatically
defined. All means anybody that has been authenticated, and
Anybody is just anybody
\item Server:
\item Last-Modified:
\item Content-Length:
\item Content-Language:
\item Content-Encoding:
\item Scripts can output also Uri: and Expires: headers (this will
eventually be made more general)
\item HEAD works, also with stupid scripts that also output the body
\end{itemize}
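As a rough illustration of how some of the directives above might appear
together in a configuration file, consider the fragment below. The
argument order (suffix first, then value) and the extra encoding and
quality fields on {\tt AddType} are assumptions, and the script
directory is a placeholder; check the configuration reference before
relying on any of it.
\begin{verbatim}
# Assumed argument forms; the paths are hypothetical.
AddType      .ps  application/postscript  8bit  1.0
AddEncoding  .Z   x-compress
AddLanguage  .fr  fr

# Exec replaces HTBin; several script directories
# can be mapped, each with its own rule.
Exec  /htbin/*    /usr/local/etc/httpd/htbin/*
Exec  /cgi-bin/*  /usr/local/etc/httpd/cgi-bin/*
\end{verbatim}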
\section{Enhancements, Fixes}
\begin{itemize}
\item The final explicit Map to filesystem in configuration file no
longer required, because it was causing confusion
\item Assume Basic authentication scheme even if not explicitly
mentioned in setup file
\item Get client DNS hostname, for the logfile among other things
\item Fail made the default when rules are translated to the end
without encountering a Pass, Exec or Fail rule (this is
to enhance security, it was too easy to forget the Fail * from
the end of config file)
\item Made config (rule) file understand different ways of writing
keywords, e.g.: UserDir, userdir, User-Dir, user\_dir,
UserDirectory and so on
\item The eight misplaced server-side access authorization files
moved away from libwww
\item Fix: directory indexing works with a trailing slash
\item Fix: HTSimplify() might have behaved unexpectedly on some
systems (called strcpy() with overlapping args)
\end{itemize}
\par
\chapter{
{}
CERN httpd 2.16beta Release Notes}
\begin{itemize}
\item If you are upgrading from 2.15beta, you need to make {\bf no
changes}.
\item If you are upgrading from 2.14, there is one single thing that
needs to be done:
\begin{verbatim}
Rename your old /htbin scripts to end in .pp suffix!
\end{verbatim}
\end{itemize}
\section{Firewall Gateway (Proxy) Additions, Fixes}
\begin{itemize}
\item {\tt ftp} with binary files works
\item {\tt x-compress} and {\tt x-gzip} work correctly
over proxy
\item Firewalling now works through arbitrary number of proxies;
{\tt http\_proxy, ftp\_proxy, gopher\_proxy} and
{\tt wais\_proxy} configuration directives cause proxy
to connect to the outside world through another proxy.
Environment variables with the same names have same effects, but
config file is user-friendlier for this.
\item Now sends all the headers sent by client.
\item Proxy log file now gives byte count.
\item Proxy log file now gives correct status code also on error.
\end{itemize}
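For example, an inner proxy could be pointed at an outer one roughly as
follows. The host name and port are invented for illustration:
\begin{verbatim}
# Hypothetical outer proxy; replace with a real host and port.
http_proxy    http://outer-proxy.example.org:8080/
ftp_proxy     http://outer-proxy.example.org:8080/
gopher_proxy  http://outer-proxy.example.org:8080/
\end{verbatim}
Any protocol left without such a directive is still fetched directly
from the remote host. \par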
\section{Firewall Gateway (Proxy) Caching}
\begin{itemize}
\item {\tt CacheRoot} directive specifies cache root
directory, and turns on proxy caching.
Cache root directory must be dedicated to {\tt httpd} -
all files in there are subject to garbage collection.
\item Cache size (in megabytes) is specified by
{\tt CacheSize} directive; cache size should be several
megabytes, 50-100MB should give good results.
Cache may, however, temporarily grow a few megabytes bigger
than specified. Also, space taken up by directories
is not calculated in the current version.
\item {\tt http, ftp, gopher} with {\tt GET } method
get cached.
\item However, not caching:
\begin{itemize}
\item HTTP0 responses (you never know if it failed; also
confused HTTP1 servers sometimes output garbage in
front of HTTP1 headers).
\item Protected documents (request had
{\tt Authorization:} field).
\item Queries - they too often have side effects. (POST
should be {\bf always} used with forms, and all
script responses should have {\tt Expires:}
header when necessary. Until then, we don't cache
them.)
\end{itemize}
\item Expiry date is extracted:
\begin{itemize}
\item From {\tt Expires:} header.
\item If not present, {\tt Last-Modified:} is used to
approximate the expiry date. If a file hasn't changed in five
months the chances are it won't change during the next
week. On the other hand, if a file has changed
yesterday, it will probably change again pretty soon.
I know this is heuristic but until all the servers
give {\tt Expires:} this works much better than
not using it, so no flames about it.
\item If {\tt Last-Modified:} is not given either, the time
given by the {\tt CacheDefaultExpiry} directive is used
(default 7 days).
\end{itemize}
\item Format of cache files and directory structure under cache root
is subject to change if necessary.
No application should yet rely on any certain cache format.
Eventually I can see clients accessing cache files directly,
bypassing proxy server.
\item Caching system understands both time formats, also the one
output by old NCSA httpds.
\item Cache files get locked during transfer. Lock files time out
if something goes wrong. Timeout can be set by
{\tt CacheLockTimeOut} directive (default 20 minutes).
While the lock is in effect, further requests to the same file
get retrieved from the remote host.
\item Garbage collection directives:
\begin{itemize}
\item {\tt GcMemoryUsage} to advise gc about how
radical to be in memory use (more memory =$>$ smarter
gc).
\item {\tt GcTimeInterval}, how often to do gc.
\item {\tt GcReqInterval}, after how many requests to
do gc.
\item (gc is also automatically started if cache size limit
is reached.)
\item {\tt CacheLimit\_1}, size in KB until which
files are equally valuable despite their size (200K).
\item {\tt CacheLimit\_2}, size in KB after which
files get discarded because they are too big (4MB).
\item {\tt CacheClean}, remove all files older than
this (default 21 days).
\item {\tt CacheUnused}, remove all files that have
not been used in this long time (default 14 days).
\end{itemize}
\item Garbage collector always removes all expired, too long unused,
and too old files.
\item If the cache size limit is reached, some files need to be
sacrificed; the current algorithm takes into account:
\begin{itemize}
\item Time remaining to unconditional removal; if it expires
tomorrow it might as well be removed today.
\item Time last accessed; if it hasn't been accessed in 5
days, it probably won't be accessed anymore before it
expires.
\item Size; huge files get removed more easily.
\item Time it took to load it from the remote host;
files that were time-consuming to transfer have much
higher value. This compensates the size factor.
Load delay is the single most significant value.
\item Time it has already been in cache; ancient files
get removed more easily than fresh ones.
\end{itemize}
\end{itemize}
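The smallest configuration that turns caching on would look something
like the fragment below. The directory name is a placeholder; whatever
directory is used must be dedicated to {\tt httpd}, as noted above.
\begin{verbatim}
# Hypothetical cache directory, used by nothing but httpd.
CacheRoot  /var/spool/httpd-cache
# Size in megabytes.
CacheSize  100
\end{verbatim}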
\section{Other New Features}
\begin{itemize}
\item Error log file.
\item {\tt Referer:} field ends up in error log when a
request fails.
\item {\tt UserId} and {\tt GroupId} to set default
uid and gid (used instead of nobody and nogroup).
\item Timeout for input and output; default time to wait for a
request is 2 minutes, and to send response 20 minutes.
Timeout causes a note to error log, and terminates child
(no more hanging httpds).
{\bf Note:} the one zombie is normal; don't report to me
about it, I may do something about it some day, or maybe I
won't. Zombie doesn't take up any other system resources
except the one process table entry.
\item Suffixes are no longer case-sensitive by default; this may be
changed via the {\tt SuffixCaseSense} configuration
directive.
\item Lou Montulli's news and proxy diffs added to the library.
\item Most command line options now also available as configuration
directives:
\begin{itemize}
\item {\tt DirAccess}
\item {\tt DirReadme}
\item {\tt AccessLog}
\item {\tt ErrorLog}
\item {\tt LogFormat}
\item {\tt LogTime}
\end{itemize}
\item {\tt -vv} command line option for Very Verbose trace
output. Outputs also request headers as they came in.
Otherwise like {\tt -v} flag.
\end{itemize}
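A sketch of how some of these directives might look in a configuration
file. The log file locations are placeholders, and the keyword values
given to {\tt LogFormat} and {\tt LogTime} are assumptions
rather than a complete list of what the server accepts:
\begin{verbatim}
# Hypothetical locations and assumed keyword values.
UserId     nobody
GroupId    nogroup
AccessLog  /var/log/httpd/access-log
ErrorLog   /var/log/httpd/error-log
LogFormat  Common
LogTime    GMT
\end{verbatim}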
\section{Enhancements, Fixes}
\begin{itemize}
\item NPH-scripts now work from automatically backgrounded
standalone server.
\item Fixed the many problems with
{\tt Content-Transfer-Encoding}:
\begin{itemize}
\item Mosaic uses {\tt Content-Encoding}, although
spec says {\tt Content-Transfer-Encoding};
I now output both
\item {\tt Content-Transfer-Encoding} sometimes
didn't show up although it should have, fixed.
\item {\tt Content-Transfer-Encoding} didn't come
up correctly with ftp, fixed.
\end{itemize}
\item Strange escaping fixed with directory indexing (legal
characters got escaped randomly by a gcc-compiled version).
\item Timezone bug around midnight with the new logfile format
fixed. (New logfile format is not yet default, use
{\tt -newlog} command line option, or
{\tt LogFormat} directive in configuration file.)
\item Dashes for non-existent status codes and byte counts now show
up correctly in the log.
\item Forking code once again enhanced - fixed a possible
hanging situation.
\item Log time fixed to be the time of incoming request, not the
time of request served.
\item Zombies now correctly waited away on HP (this was in fact
fixed already in 2.15beta binaries distributed after February
17th - {\bf note,} that this bug had no effect on any other
platforms ).
\item Directory listings no longer have {\tt Content-Length:}
(because it was wrong).
\item Now understands also the old Accept: syntax, with spaces as
separators between actual content-type and its parameters.
This will eventually be taken out.
\item {\tt htadm} now uses the same file creation mask as in
the original password file.
\end{itemize}
\par
\chapter{
{}
CERN httpd 2.17beta Release Notes}
\section{General New Features}
\begin{itemize}
\item {\tt PUT} and {\tt POST} can be configured to be
handled by external CGI scripts; {\tt PUT-Script} and {\tt POST-Script} directives
\item BodyTimeOut for timing out scripts waiting for input that
never comes from clients
\item {\tt IdentityCheck} directive to turn on RFC931 remote
login name checking
\item {\tt REMOTE\_IDENT} for CGI giving remote login name;
this was the only feature missing to be fully CGI/1.0 compliant
\item CGI/1.1 upgrade:
\begin{itemize}
\item all the headers without a special meaning to CGI from CGI
scripts get passed to the client
\item Status: header to specify the HTTP status code and
message for client when not using NPH scripts
\item all HTTP request header lines which are not otherwise
available to the scripts get passed as HTTP\_XXX\_YYY
environment variables
\end{itemize}
\item Understands conditional {\tt GET} request with
{\tt If-Modified-Since} header
\item {\tt kill -HUP } causes {\tt httpd} to re-read
its configuration file
\item {\tt PidFile}
directive for specifying the file to write the process id
\lbrack makes it easy to send the {\tt HUP} signal\rbrack
\item {\tt ServerRoot}
directive to specify a "home directory" for {\tt httpd}
\item Directory listings with icons; by default icons are in
{\tt icons} subdirectory under {\tt ServerRoot}
\item The precompiled binaries are distributed in a {\tt tar}
packet that contains a set of default icons; the easiest way
to configure the icons is to just set the
{\tt ServerRoot} to point to the binary distribution
directory \lbrack its name is {\tt cern\_httpd}\rbrack
\item Welcome directive to
specify the name of the overview page of the directory;
default values are {\tt Welcome.html},
{\tt welcome.html} and, for compatibility with NCSA
server, {\tt index.html}. Use of {\tt Welcome}
directive will override all the defaults.
\item {\tt AlwaysWelcome} directive to configure if
{\tt /directory} and {\tt /directory/}
are to be taken to mean the same thing, or should only
{\tt /directory/} be mapped to the overview page and
{\tt /directory} produce the directory listing.
\item /\~{}user causes an automatic redirection to /\~{}user/
\item Now gives also the {\tt Date:} header.
\item {\tt Port} directive to config file specifying the port
number to listen to.
\end{itemize}
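Put together, several of the new directives might be used as in the
following sketch. All paths and the port number are examples only, and
the {\tt On} keyword for {\tt AlwaysWelcome} and
{\tt IdentityCheck} is an assumption:
\begin{verbatim}
# Hypothetical values throughout.
ServerRoot     /usr/local/cern_httpd
PidFile        /usr/local/cern_httpd/httpd-pid
Port           8080
# Overrides the built-in defaults, so only index.html
# is treated as the overview page.
Welcome        index.html
AlwaysWelcome  On
IdentityCheck  On
\end{verbatim}
With the process id recorded in the {\tt PidFile}, the server can be
made to re-read its configuration by sending that process the
{\tt HUP} signal. \par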
\section{Access Authorization Enhancements / Proxy Protections}
\begin{itemize}
\item Now also domain name templates, like *.cern.ch, can be used in
specifying allowed hosts, not only IP number masks
\item {\tt ACLOverRide} directive to allow ACLs to override
the {\tt Mask}s set in the protection setup \lbrack without
this feature ACLs cannot allow anything more than what the
{\tt Mask}s allow, only restrict access further\rbrack . This
directive disables {\tt Mask} checking if an ACL file
is present.
\item Since setting up protection seemed to be unnecessarily hard,
it is now possible to give the protection setup in the main
configuration file instead of having to use a different file;
it is still ok to use a different file.
\begin{itemize}
\item {\tt Protection} directive defines a protection
setup and associates a name with it:
\begin{verbatim}
Protection prot-name {
AuthType Basic
ServerId Test-Server
PasswdFile /where/ever/passwd
GroupFile /where/ever/group
UserId someuser
GroupId somegroup
GET-Mask list, of, users, and, groups
POST-Mask list, of, users, and, groups
PUT-Mask list, of, users, and, groups
}
\end{verbatim}
The content between the curly braces is the same as what used to go
in the protection setup file. What's new is the possibility to
specify the {\tt UserId} and {\tt GroupId} for
the child process when serving the request in protected mode.
This is not possible with external files for security reasons
\lbrack it is not possible inside the external file, but it
is possible if the ids are set when calling that file; see
doc for more details\rbrack .
\item A single {\tt Mask} directive for cases when
{\tt GET-Mask}, {\tt POST-Mask} and
{\tt PUT-Mask} are the same.
\item In {\tt Protect} rule the {\it prot-name\/} is
specified instead of the file name; what's more is that
{\tt Protect} can now be used to protect also proxied
URLs:
\begin{verbatim}
Protect http:* prot-name
Protect ftp:* prot-name
Protect gopher:* prot-name
\end{verbatim}
\end{itemize}
\end{itemize}
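Combining the pieces above, a setup that uses a single
{\tt Mask} and protects both a local subtree and proxied HTTP access
could be sketched as follows. The protection name, the group
{\tt staff} and the URL template are invented for the example:
\begin{verbatim}
# Hypothetical names; staff is assumed to be defined
# in the group file.
Protection staff-only {
    AuthType    Basic
    ServerId    Test-Server
    PasswdFile  /where/ever/passwd
    GroupFile   /where/ever/group
    Mask        staff
}

Protect  /internal/*  staff-only
Protect  http:*       staff-only
\end{verbatim}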
\section{Enhancements, Fixes}
\begin{itemize}
\item Incorporated Ian Dunkin's $<$imd1707@ggr.co.uk$>$ SOCKS
modifications (thank you, Ian!); read the
{\tt README-SOCKS} file in the source code distribution
for more information.
\item {\tt SIGPIPE} causes a normal child to exit; proxy
child will correctly stop writing to client socket but still
writes to cache file \lbrack previously just kept on writing to the
socket, too\rbrack
\item 401, 402, 403, 404 errors don't go to error log anymore
\item error log contains now the host name and request
\item no longer sends {\tt Content-Transfer-Encoding}, we
agreed upon using {\tt Content-Encoding} for
compression
\item fixed funny panic message from format module in verbose mode
even though everything was ok \lbrack only aesthetic\rbrack
\item now gives again "not authorized" rather than not found if
trying to access a protected but nonexistent file; this way
even filenames don't leak
\item all time specifications in configuration file have more
readable forms:
\begin{verbatim}
1 year
2 months
3 weeks 2 days
5 days 20 hours 30 mins 2 secs
20:30
20:30:01
2 weeks 20:30
\end{verbatim}
\item Case-sense bug with {\tt LogTime},
{\tt LogFormat}, {\tt DirAccess} and
{\tt DirReadme} fixed; now parameters really are handled
in a case-insensitive manner.
\end{itemize}
\section{Proxy Additions, Fixes}
\begin{itemize}
\item Proxy protections, see above
\item Made proxy do smart guesses about the content of an unknown
file while retrieving from the remote; this will end the
problems of some files not being transferred to WinMosaic or Lynx.
{\bf IMPORTANT: Everybody, remove the rule \lbrack if you have
it\rbrack }:
\begin{verbatim}
AddType *.* text/plain
\end{verbatim}
because it would disable this smart feature.
\item Fixed a bug with unknown binary gopher files being truncated
\item Fixed the bug with trailing slashes in ftp directory listings
\item Fixed the bug with requests not being URL-encoded when
forwarding the request
\item Fixed a bug with filenames in directory listings not being
URL-encoded
\item Fixed stupid "mail-us" situation in certain situations when
ftp load fails
\end{itemize}
\section{Proxy Caching}
\begin{itemize}
\item Cache is refreshed using the conditional {\tt GET}
method \lbrack use of {\tt If-Modified-Since} header\rbrack
\item Standalone cache mode with {\tt CacheNoConnect}
directive \lbrack causes an error rather than document fetch when
the document is not in the cache\rbrack
\item Possibility to disable garbage collection altogether
\item Possibility to disable expiry checking
\item Caching Off to explicitly turn off caching even if there are
other caching directives specified
\item {\tt -gc\_only} command line option to do garbage
collection as a {\tt cron} job for sites that run
{\tt httpd} as a proxy from {\tt inetd}.
However, since {\tt httpd} now re-reads its
configuration files when it receives a {\tt HUP} signal,
standalone operation is now even easier, and running from
{\tt inetd} should no longer be much more convenient.
\item Host names are converted to all-lower-case to avoid doing
multiple caching for a single site.
\item Files expiring immediately never get written to the cache; not
even part of it.
\item By default HTTP-retrieved documents without an
{\tt Expires:} and {\tt Last-Modified:} field
never get cached \lbrack because they are usually generated by
scripts and should never be cached\rbrack ; therefore I strongly
advise against the use of {\tt CacheDefaultExpiry} for
HTTP.
\item Caching control directives have changed to take a URL template
as a first argument, and a more readable time format:
\begin{verbatim}
CacheDefaultExpiry ftp:* 2 weeks 4 days
CacheDefaultExpiry gopher:* 6 days
CacheUnused http:* 1 month
CacheUnused ftp:* 2 weeks
CacheUnused gopher:* 1 week 5 days 2 hours 1 min 30 secs
\end{verbatim}
\item Made the expiry date approximation configurable; by default
documents with {\tt Last-Modified:} but without
{\tt Expires:} expire after 10\% of the time that they
have been unmodified. {\tt CacheLastModifiedFactor}
can be used to change this value, or turn this feature
{\tt Off}. Default value is 0.1 \lbrack =10\%\rbrack .
\item Understands yet another date format:
\begin{verbatim}
Thu, 10 Feb 1994 22:23:32 GMT
\end{verbatim}
This date format is {\bf not} conforming to the
spec, so use of it is discouraged! This is only to make the
proxy more robust.
\item {\tt NoCaching} directive to prevent certain URLs from
being cached at all.
\item Time margin to get rid of problems with machine clocks having
inaccurate times and confusing caching.
\item {\tt GcDailyGc} to specify a daily garbage collection
time, by default 3:00. \lbrack Can be turned {\tt Off}, too.\rbrack
\item Now possible to disable {\tt GcReqInterval} and
{\tt GcTimeInterval} \lbrack by default disabled\rbrack .
\item Expired cache lock files get removed also during gc.
\item {\tt CacheAccessLog} to specify a different log file
for cache accesses; also possible to make a separate log for
each remote host.
\end{itemize}
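To make the expiry approximation concrete: with the default factor of
0.1, a document whose {\tt Last-Modified:} date is 50 days in the
past, and which carries no {\tt Expires:} header, is treated as
expiring 5 days after it is retrieved; with a factor of 0.2 the same
document would be kept for 10 days. A configuration along these lines
would select the larger factor. The URL template is a made-up example,
and the exact argument forms are assumptions:
\begin{verbatim}
# Hypothetical values; argument forms are assumed.
CacheLastModifiedFactor  0.2
GcDailyGc                4:30
NoCaching                http://volatile.example.org/*
CacheDefaultExpiry       ftp:*  1 week
\end{verbatim}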
\section{cgiutils}
A new product {\tt cgiutils} for producing HTTP1 replies from
CGI scripts, and for easily generating the {\tt Expires:}
header given the time to live, e.g. "2 weeks 4 hours 30 mins". \par
\par
\chapter{
{}
CERN httpd 2.18beta Release Notes}
\section{New Features}
\begin{itemize}
\item Long FTP directory listing with last modification dates and sizes
\end{itemize}
\section{Fixes}
\begin{itemize}
\item Fixed a bad bug with {\tt Port} directive $--$ server
didn't fork but rather the parent process served which caused
the service to eventually hang (this is the main reason for
this release).
\item {\tt CLIENT\_CONTROL} removed from SOCKS mods since
{\tt httpd} has now native proxy protection support.
\item No longer fails to sometimes create {\tt .gc\_info} file.
\end{itemize}
\par
\chapter{{}
CERN httpd 3.0 PreRelease Notes}
\section{3.0 Prerelease 3}
\begin{itemize}
\item No longer strips hyphens from content-types and content-encodings
that are given in the configuration file (broken in pre1).
\item GMT-to-localtime transformation works now on all platforms in
caching (was broken on others than Sun).
\item Binary-FTP works again (broken in pre2).
\item Unescaping bug fixed in news module (caused many articles to
fail to be retrieved).
\item News module now gives appropriate error responses for unavailable
articles and non-existent news groups.
\item FTP and HTTP modules now give better error responses.
\item Fixed the cache access log to show the correct content-lengths.
\end{itemize}
\section{3.0 Prerelease 2}
\begin{itemize}
\item Respects UserId and GroupId directives again.
\item FTP module no longer prints messages to stderr in non-verbose mode.
\item \~{}username form understood with ServerRoot, Search, PutScript,
PostScript, DeleteScript, AccessLog, ErrorLog, CacheAccessLog
directives.
\item Opens cache access log only if caching is turned on.
\item Binary distribution now contains a template configuration file
that has all the configuration directives understood by httpd
(thanks to Sean Gonzalez for it!).
\end{itemize}
\section{3.0 Prerelease 1}
\begin{itemize}
\item If-Modified-Since GET request now works correctly with proxy
(client can do conditional GET/proxy can do conditional GET plus
all the combinations of these).
\item {\tt Pragma: no-cache } supported; by sending this header
to the proxy the client will force it to refresh its cache from
remote server. Pragma headers are also forwarded to the remote
server.
\item Server now resets its state correctly when it receives the HUP
signal (directory listing icons used to stop working).
\item {\tt -restart} option - {\tt httpd} will find out
the actual server process number and send a HUP signal to it to
make it reload its configuration files; note that
{\tt httpd} must still have the same configuration file
command line parameters ({\tt -r } options) as the actual
server (so it finds out the ServerRoot and PidFile).
\item Now makes appropriate entry to error log when restarting.
\item Made common logfile format default, the old format can still be
used with the {\tt LogFormat} directive:
\begin{verbatim}
LogFormat old
\end{verbatim}
\item Multiple wild-card (asterisk) matching in configuration file
works; it is a bit different from typical regular expression
matching in that the wildcard matches the {\em shortest\/}
possible amount of characters instead of the longest matching
string; this is the best choice in most of the cases. Consider:
\begin{verbatim}
Pass http://*/* /mirror/*/http/*
\end{verbatim}
Clearly the first asterisk should rather match only the hostname,
and {\bf not} the entire path except the filename.
\item Rules can now have asterisks and whitespace in them: precede them
with a backslash; as a result also the backslash itself has to be
escaped with another backslash.
\item The tilde character after a slash has to be explicitly matched:
\begin{verbatim}
Map /* /foo/bar/*
\end{verbatim}
does {\em not\/} match user-supported directories, but:
\begin{verbatim}
Map /~* /Webs/users/*
\end{verbatim}
does match them.
\item Fixed the problem that user-supported directories could not be
mapped or {\tt Protect}'ed.
\item Hostname matching made case-insensitive in access control/caching
\item Added suffixes {\tt .htm} and {\tt .htmls} to the
default set of known suffixes.
\item Fixed some of the mysterious caching problems (all that were
reported to me and that I could reproduce).
\item Made it possible to specify the various byte/kilo/mega sizes in
cache configuration with letters after the number (so it's no
longer necessary to remember if the default is kilobytes or
megabytes):
\begin{verbatim}
CacheSize 150 M
CacheLimit_1 100 K
CacheLimit_2 2 M
\end{verbatim}
The numbers still have to be cardinals.
\item Content-Length given for {\em all\/} documents, including
(non-nph-)script responses, generated directory listings, error
responses, all the documents retrieved over another protocol by
the proxy (FTP, Gopher, ...), including HTTP responses from
servers that didn't give it originally.
\item {\tt MaxContentLengthBuffer} directive to specify the
maximum bytecount for the proxy to buffer in order to find out
the content-length for the client - content-length is
{\em always\/} calculated for the logs, but the user might
interrupt the connection if nothing seems to be happening, even
though it is the proxy that is just buffering the entire file in
order to find out the content-length before actually sending it
to the client.
\item Caching module now checks that it receives the correct
content-length; if not it discards the cached document. This
rules out the possibility to cache a truncated document from a
timed out connection in 99.99\% of the cases (0.01\% comes from the
fact that Plexus sends a timeout error message concatenated to
the document and if it should so happen that this produces exactly
the correct content-length then there is nothing that can be done
about it; in practice this never happens).
\item Made {\tt HEAD} work always, even on proxy with other
protocols (FTP, Gopher...).
\item PASV (Passive mode) in FTP now supported. It is no longer
necessary to allow incoming connections above 1024 on the
firewall host just to make FTP work. If PASV fails
{\tt httpd} will retry PORT.
\item Welcome messages from FTP servers get shown on top of the
directory listings.
\item Fixed bug with old FTP files getting the wrong date in the listing.
\item Gopher listings now have icons.
\item Proxy now reports unknown host errors appropriately.
\item Fixed encoding-decoding problems with directory listings.
\item Added {\tt ScriptTimeOut} - scripts that do not finish in
this amount of time will be killed by {\tt httpd}. Default
value is 5 minutes.
\item A /\~{}username URL with an invalid username no longer causes an
infinite redirection loop.
\item The two files missing in FTP listings are no longer missing (they
weren't in 2.18beta, either).
\item Fixed a possible error condition that might cause the server to
stop responding, or even die.
\item Server now resets its UserId and GroupId even when in gc-only
mode (this solves problems with {\tt .cache\_info} files
sometimes being unwritable to actual caching processes).
\item CacheAccessLog is now opened during startup while running as root
to avoid opening problems. There is no longer logging to
individual files according to remote hosts - all cache accesses
are logged to this single file.
\item {\tt CacheOnly} directive for specifying a set of URLs
that should be cached (for cases when there are only a few sites
that should be cached).
\item Added {\tt DELETE-Script} directive for specifying the CGI
script to handle {\tt DELETE} method.
\item {\tt NoProxy} directive to allow the proxy to do direct
access to some servers instead of connecting to another proxy
server (contains a list of domain names). This works exactly
like the {\tt no\_proxy} environment variable on clients.
(Thanks to Rainer Klute for the patch!) This is only necessary
when running multiple proxy servers that connect to each other.
\item Fixed a bug that sometimes caused time directives to be parsed
incorrectly (e.g. {\tt CacheDefaultExpiry}).
\item Multilanguage addition to allow server to understand e.g. that
British English is also English, and that the US citizens do
understand it (thanks to Toshihiro Takada for the patch!).
\item Removed:
\begin{itemize}
\item {\tt GcReqInterval} and
{\tt GcTimeInterval} - not very good criteria to
start doing garbage collection ({\tt GcDailyGc} is
better, giving the actual time to launch gc)
\item cache access logging to individual logfiles according to
remote host (wasted resources - a separate program is better
for collecting this information from a single log file).
\item {\tt -a} and {\tt -R} options (never used).
\item {\tt BodyTimeOut} replaced by {\tt ScriptTimeOut}
\item {\tt include}s from Makefiles (not supported by
all the {\tt make}s).
\item {\tt \#elif} preprocessor directive removed (wasn't
supported by all the HP preprocessors)
\end{itemize}
\end{itemize}
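The shortest-match behaviour of multiple wildcards can be traced on the
mirror rule quoted above. The request URL is only an example, and the
comment lines spell out what each asterisk is expected to match:
\begin{verbatim}
Pass  http://*/*  /mirror/*/http/*
# Request: http://info.cern.ch/hypertext/WWW/TheProject.html
#   first  * matches  info.cern.ch  (shortest possible)
#   second * matches  hypertext/WWW/TheProject.html
# Result:
#   /mirror/info.cern.ch/http/hypertext/WWW/TheProject.html
\end{verbatim}
A longest-match rule would instead have given the first asterisk
{\tt info.cern.ch/hypertext/WWW} and left only the filename for the
second. A couple of the other new directives might be used as sketched
below, with values chosen purely for illustration:
\begin{verbatim}
# Hypothetical values.
ScriptTimeOut  10 mins
CacheOnly      http://info.cern.ch/*
\end{verbatim}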
\par
\end{document}