\author{Generated from the Hypertext}\title{CERN Server User Guide}
\maketitle
\cleardoublepage \pagenumbering{roman} \setcounter{page}{1}
\tableofcontents
\cleardoublepage \pagenumbering{arabic} \setcounter{page}{1}
\chapter{{} CERN httpd 3.0 Guide for Prereleases}
CERN WWW Server \lbrack {\tt httpd}, HyperText Transfer Protocol Daemon\rbrack is a generic, full-featured server for serving files using the HTTP protocol. This is a TCP/IP-based protocol running by convention on port 80. \par
Files can be real or synthesized, produced by scripts generating virtual documents. The server also handles clickable images, fill-out forms, searches, etc. \par
CERN {\tt httpd} can also be run as a proxy server to allow people behind firewalls to use the Web as if the firewall were not present. A powerful feature is the caching performed by the proxy, which makes {\tt cern\_httpd} attractive as a proxy even to those not inside a firewall. \par
\begin{itemize}
\item This documentation is also available in PostScript.
\item Documentation for older versions is still available: \lbrack 2.14 or older\rbrack \lbrack 2.15\rbrack \lbrack 2.16\rbrack \lbrack 2.17 \& 2.18\rbrack .
\item If you upgrade, see also the release notes for \lbrack 2.15\rbrack \lbrack 2.16\rbrack \lbrack 2.17\rbrack \lbrack 2.18\rbrack \lbrack 3.0pre1-3\rbrack .
\item {\bf Current VMS Version is 2.16beta. See distribution.} See also Foteos Macrides' fixes. \par
\end{itemize} \par
\section{In This Guide...}
\begin{DL}{allow this much space}
\item[Installation ] The steps necessary to install CERN server.
\item[Administration ] How to set up document protection, index search, clickable images, server-side scripts, ...
\end{DL} \par
\section{About documents generated from hypertext}Paper manuals generated from hypertext are made for convenience, for example for reading when one has no computer to turn to.
We have tried to make the hypertext into fairly conventional paper documents, but they may seem a little strange in some ways.\par
All the links have been removed. Therefore, it is worth looking at the table of contents to see what there is in the manual. Something which is not explained in place may be explained in detail elsewhere.\par
We have tried to keep related matter together, but sometimes you will have to check the table of contents to find it.\par
Please remember that these are for the most part "living documents". That is, they are constantly changing to reflect current knowledge. If you see a statement such as "Product xxx does not support this feature", remember that it was the case when the document was generated, and may not be the same now. So if in doubt, check the online version. Of course, the living document may be out of date too, in which case it is helpful to mail its author.
\chapter{{} Installing CERN Server}
{\bf VMS note:} There are special instructions if you are installing under VMS. \par \par
\section{Getting the Program}
The CERN server distribution is available by anonymous FTP from {\tt info.cern.ch}. Often you don't need to compile the server yourself, because precompiled binaries are available for many Unix platforms. If there is no precompiled version for your platform, or if it doesn't work (e.g. the name resolution doesn't work), you should get the source code and compile it yourself.
\begin{itemize}
\item Precompiled versions can be found under directory {\tt ftp://info.cern.ch/pub/www/bin} (in the subdirectory corresponding to your machine architecture). \par
\item Source code {\tt ftp://info.cern.ch/pub/www/src/cern\_httpd.tar.Z}.
\par Compilation:
\begin{itemize}
\item Uncompress and untar the distribution tar file:
\begin{verbatim}
    uncompress cern_httpd.tar.Z
    tar xvf cern_httpd.tar
\end{verbatim}
\item Go to the newly created {\tt WWW} directory, and give the command {\tt ./BUILD}:
\begin{verbatim}
    cd WWW
    ./BUILD
\end{verbatim}
\item The executable {\tt httpd} appears in directory {\tt .../WWW/Daemon/sun4} (if you have a Sun4 machine), or in another subdirectory corresponding to your machine architecture. The utility programs ({\tt htadm}, {\tt htimage}, {\tt cgiparse} and {\tt cgiutils}) go to the same directory.
\end{itemize}
\end{itemize} \par
\section{Configuration File}
\begin{itemize}
\item {\tt httpd} requires a configuration file; the default configuration file is {\tt /etc/httpd.conf}. If this doesn't suit you, you can specify another location using the {\tt -r } option:
\begin{verbatim}
    httpd -r /other/place/httpd.conf
\end{verbatim}
\item Sample configuration files are available from
\begin{itemize}
\item directory {\tt cern\_httpd/config} inside the binary distribution, or
\item under {\tt WWW/server\_root} inside the source code distribution.
\item If these are missing you can get them from {\tt ftp://info.cern.ch/pub/www/src/server\_root.tar.Z}
\end{itemize}
\end{itemize}
If you have all your documents in a single directory tree, say {\tt /Public/Web}, the easiest way to make them available to the world is to specify the following rule in your configuration file:
\begin{verbatim}
    Pass    /*    /Public/Web/*
\end{verbatim}
This maps every request onto the directory tree {\tt /Public/Web} and accepts it. \par
The default welcome document (what you get with a URL of the form {\tt http://your.host/}) is now {\tt Welcome.html} in the directory {\tt /Public/Web}. \par \par
\section{First Trying It Out In Verbose Mode}
It is easy to make mistakes in the configuration file, which can make configuring {\tt httpd} feel tedious, but this doesn't have to be so.
In the beginning, start {\tt httpd} by hand in verbose mode to listen to some port, and see what happens when you make a request to that port with your browser. \par
Typically test servers are run on a non-privileged port above 1024 (you don't have to be {\tt root} to bind to them), often 8001, 8080, or such. The official HTTP port is 80. \par
The server port is defined in the configuration file with the {\tt Port} directive, but you can override it with the {\tt -p } command line option while testing; e.g.
\begin{verbatim}
    httpd -v -r /home/you/httpd.conf -p 8080
\end{verbatim}
This will start {\tt httpd} in verbose mode, use the configuration file {\tt httpd.conf} in your home directory, and accept connections to port 8080. \par
You can now try to request a document from your server using a URL of the form:
\begin{verbatim}
    http://your.host:8080/document.html
\end{verbatim}
where {\tt document.html} is relative to the directory that you have exported in your configuration file. \par
If you get an error message back, see the verbose output to find out what is going wrong; it is usually self-explanatory. \par
And remember, you should always feel free to ask advice from {\bf httpd@info.cern.ch}. \par \par
\section{The Actual Installation of httpd}
In Unix you can run the server either as stand-alone, or from the Internet Daemon ({\tt inetd}). A stand-alone server is typically started once at system boot time. It waits for incoming connections, and forks itself to serve a request. {\bf This is much faster} than letting {\tt inetd} spawn {\tt httpd} every time a request comes. {\bf We therefore recommend that you run CERN httpd in stand-alone mode.} \par
\subsection{Stand-alone Installation}
A stand-alone server is started from the bootstrap command file (for example {\tt /etc/rc.local}) so that it runs continuously like the {\tt sendmail} daemon, for example. \par
This method has the advantage over using {\tt inetd} that the response time is reduced.
\par Add a line starting {\tt httpd} to your system startup file (usually {\tt /etc/rc.local} or {\tt /etc/rc}). If you have the configuration file in the default place, {\tt /etc/httpd.conf}, and if it specifies the port to listen to via the {\tt Port} directive, you don't need any command line options:
\begin{verbatim}
    /usr/etc/httpd &
\end{verbatim}
{\tt httpd} will automatically go into the background, so there is really no need for an ampersand at the end (as long as your configuration file {\tt /etc/httpd.conf} really exists). \par
Or a little more safely in case httpd is removed:
\begin{verbatim}
    if [ -f /usr/etc/httpd ]; then
        (/usr/etc/httpd && (echo -n ' httpd') ) >/dev/console &
    fi
\end{verbatim}
Naturally you can use any of the command line options, if necessary. \par \par
\section{Registering Your Server}
Once you have your {\tt httpd} up and running, and you have documents to show the world, announce your server, so that others can find it. \par \par
\section{If It Doesn't Work...}
...first run it in verbose mode with the {\tt -v } option and try to figure out what goes wrong. See also the debugging chart and the FAQ. If you can't figure out what's going wrong, feel free to send mail to {\bf httpd@info.cern.ch} \par \par
\section{ {} Installing httpd Under inetd}
This is how to set up {\tt inetd} to run {\tt httpd} whenever a request comes in. (These steps are the same for any daemon under unix: you will probably find a similar thing has been done for the FTP daemon, {\tt ftpd}, for example.) \par \par
\subsection{Step 1: Install httpd Binary}
Copy {\tt httpd} into a suitable directory such as {\tt /usr/etc}. Make it owned by {\tt root}, and make it writable only by {\tt root}, for example by saying:
\begin{verbatim}
    chmod 755 httpd
\end{verbatim} \par
\subsection{Step 2: Add http Service to /etc/services}
Put "http" in the {\tt /etc/services} file, or use the name of a specific service of your own if you want to use a special port number.
The standard port number for HTTP is 80.
\begin{verbatim}
    http    80/tcp    # WWW server
\end{verbatim}
{\bf Exceptions:}
\begin{itemize}
\item On a NeXT, see using the NetInfoManager.
\item On any machine running NIS (yellow pages), see the special instructions.
\end{itemize} \par
\subsection{Step 3: Add a Line to /etc/inetd.conf}
Put a line in the internet daemon configuration file, {\tt /etc/inetd.conf}.
\begin{verbatim}
    http stream tcp nowait root /usr/etc/httpd httpd
\end{verbatim}
The first word is the same as in the {\tt /etc/services} file. \par
If you want to pass command line options or parameters to {\tt httpd}, they would be listed at the end of the line, for example to set the rule file to something other than the default {\tt /etc/httpd.conf}:
\begin{verbatim}
    http stream tcp nowait root /usr/etc/httpd httpd -r /my/own/rules
\end{verbatim}
{\bf Note:} For {\tt httpd} version 2.15 and later we recommend that it is run as user {\tt root}. Running {\tt httpd} as {\tt root} is safe, since it automatically resets its user-id to {\tt nobody}. However, if you decide to use access authorization features, and you need to serve protected files, {\tt httpd} will have to be able to set its user-id to some other uid as well. In any case, {\tt httpd} always sets its user-id to something other than {\tt root} before serving the file to the client. \par
{\bf Note:} {\tt /etc/inetd.conf} syntax varies from system to system; for example, not all systems have the field specifying the user name, in which case the default is {\tt root}. If in doubt, copy the format of other lines in your existing {\tt inetd.conf}. \par
{\bf Note:} There seems to be a limit of 4 arguments passed across by {\tt inetd}, at least on the NeXT. \par \par
\subsection{Step 4: Send HUP Signal to inetd }
When you have updated {\tt inetd.conf}, find out the process number of {\tt inetd}, and send a "HUP" signal to it.
\par For example on BSD unix do this:
\begin{verbatim}
    > ps -aux | grep inetd | grep -v grep
    root 85 0.0 0.9 1.24M 304K ? S 0:01 /usr/etc/inetd
    > kill -HUP 85
\end{verbatim}
For System V, use {\tt ps -el} instead of {\tt ps -aux}. Be aware that on some systems your local file {\tt /etc/services} may not be consulted by your system (see notes on debugging). \par \par
\subsection{Test It!} \par
\subsection{{} Using NIS (Yellow Pages)}
If your machine is running Sun's "Network Information Service", originally known as "yellow pages", read this.\par
You must:
\begin{itemize}
\item First make an addition to the {\tt /etc/services} file just as for a normal unix system.
\item Then, change directory to {\tt /var/yp} and run {\tt make}.
\end{itemize}
This will load the {\tt /etc/services} file into the NIS information system.\par
Some people have found that they needed to reboot the system afterward for the change to take effect. \par \par
\subsection{{} Adding a Service on the NeXT}
The NeXT uses the "netinfo" database instead of the {\tt /etc/services} file. This is managed with the {\tt /NextAdmin/NetInfoManager} application. Here's how to add the service {\tt http}:
\begin{itemize}
\item Start the NetInfoManager by double-clicking on its icon. \par
\item If you are operating in a cluster, open either your local domain ({\tt /hostname}) or, if you have authority, the whole cluster domain ({\tt /}). If you're not in a cluster, just use the domain you are presented with. \par
\item Select {\tt "services"} from the browser tree. \par
\item Select {\tt "ftp"} from the list of services. \par
\item Select {\tt "duplicate"} from the edit menu. \par
\item Select {\tt "copy of ftp"} and double-click on its icon to get the property editor. \par
\item Click on {\tt "name"} and then on the value {\tt "copy of ftp"}. Change this to {\tt "http"} by typing "http" in the window at the bottom, and hitting return.
\item Click on {\tt "port"}, and then on the value {\tt 21}.
Change it to {\tt 80}. \par
\item Use the {\tt "Directory:Save"} menu ({\tt Command/s}) to save the result. You will have to give a root password or netinfo manager password. \par
\end{itemize} \par
\section{{} Privileged ports}
The TCP/IP port numbers below 1024 are special in that normal users are not allowed to run servers on them. This is a security feature, in that if you connect to a service on one of these ports you can be fairly sure that you have the real thing, and not a fake which some hacker has put up for you. \par
The normal port number for W3 servers is port 80. This number has been assigned to WWW by the Internet Assigned Numbers Authority, IANA. \par
When you run a server as a test from a non-privileged account, you will normally test it on other ports, such as 2784, 5000, 8001 or 8080. \par \par
\subsection{Under Unix}
The Internet Daemon {\tt inetd} (running as root) can listen for incoming connections on port 80 and pass them down to a process with a safer uid for the server itself. However, {\tt httpd} versions 2.14 and later can be safely run as {\tt root} since they automatically change their user-id to {\tt nobody} or some other user-id depending on server setup. \par \par
\subsection{Under VMS }
Under UCX, the process running as a server needs BYPASS privilege to listen to ports below 1024. This might mean you have to install the server. With other TCP/IP packages, privilege of some sort is similarly required. \par \par
\section{{} Debugging httpd}
Suppose you think you have installed {\tt httpd} but it doesn't work. Here we assume you have used port 80. If you have a situation not handled by this problem-solving guide, please mail {\tt httpd@info.cern.ch}. \par \par
Type
\begin{verbatim}
    www http://myhost.domain/
\end{verbatim}
What happens? \par
\subsection{Connection Refused}
The browser tries to connect to the daemon but gets this status in the trace. \par
This means that nobody was listening on that port number.
Check that the port numbers match between server and client. Make sure you specify the port number explicitly in the document address for {\tt www}.\par
If you are running the daemon standalone (as you should be), check that it is actually running by taking a list of processes, and that it is listening to the correct port (specified with the {\tt -p } {\it port\/} option), or try running it from the terminal with the {\tt -v} option as well. The trace for the server should say {\tt "socket, bind and listen all ok"}. If it does, and you still get "{\tt connection refused}", then you must be talking to the wrong host (or, conceivably, different ethernet adapters on the same host).\par
If you are running with the inet daemon, then check both the services file ({\tt /etc/services}) or database (yellow pages, netinfo) if your system uses it, and the {\tt /etc/inetd.conf} file. Check that the service name matches between these two (e.g. {\tt http}).\par
Did you remember to {\tt kill -HUP} the {\tt inetd} when you changed the {\tt inetd.conf} file? \par
{\em Be aware that on some systems your local file {\tt /etc/services} will not be consulted.\/} E.g. when {\tt ypbind} is running on Suns, you should type
\begin{verbatim}
    ypwhich -m services
\end{verbatim}
and ask the administrator of the machine named to change its own {\tt /etc/services}. \par
Try running the daemon from a shell window to see better what happens. \par \par
\subsection{Cannot Connect To Information Server}
The usual cause of this is that the server is not running, or it's running on a different port. \par
There is more information you can get. Use the "verbose" option on the LineMode browser to find out what went wrong:
\begin{verbatim}
    www -v http://myhost.domain:80/
\end{verbatim} \par
What do you get? A load of trace messages. There are several cases.
\begin{itemize}
\item The browser can't look up the name of the host. If it can, it will display a "Parsed address as" message.
If not, try fixing your name server or {\tt /etc/hosts} file, or quoting the IP number of the host in decimal notation (like 128.141.77.45) instead. \par
\item The browser can get to the host but gets {\tt Connection refused} status back. \par
\item Your browser gets an error number but prints "error message not translated". This is because when it was compiled on your platform it didn't know what form the error message table took. Try the same thing from a unix platform, for example. \par
\item You get some network error like "network unreachable". Depending on whether the IP network is your responsibility or not, and your attitude to life, either fix it, try again in an hour's time, or complain to someone. \par
\end{itemize} \par
\subsection{Unable To Access Document}
The typical cause of this is that the configuration file is incorrect, or files are not readable by the user-id under which the server runs. When you are running the server as {\tt root}, it will automatically switch its user-id to {\tt nobody} just before serving the document. This can be changed with the {\tt UserId} configuration directive. \par \par
\subsection{An Empty Document Is Displayed}
The document sent back is empty, but there is no error message.\par
The {\tt inetd} has started a process to run your server but it immediately failed. Possibilities include:
\begin{itemize}
\item When running from {\tt inetd}, the daemon may not be in the file specified, or may not be executable by the specified user (or, if a user id is not specified in your variety of {\tt inetd.conf}, {\tt root}). \par
\item For some reason the server crashes when it's trying to serve the request. If you can, try to track down when this happens, and send mail to {\tt httpd@info.cern.ch}. Try running the daemon from a terminal window to see what happens.
\par
\item A script fails to produce any result, which may be because there is no empty line after the header section output by the script, causing the server to read the entire generated document as the header section. \par
\end{itemize} \par
\subsection{Document Address Invalid Or Access Not Authorized...}
...or some similar kind of error message. This means either:
\begin{itemize}
\item You have been passed a bad document address. If you are following a link, check with the author of the document which contained the link.
\item The document has been moved. Check with the server administrator. You should be able to find out who runs the server by going to the welcome page (type "g /" with the line mode browser) and seeing a link to information about the maintainers.
\end{itemize}
If you are the server administrator, and you can't understand why the daemon refuses to deliver the file,
\begin{itemize}
\item Check the configuration file (rule file, by default {\tt /etc/httpd.conf}) if you have one. Think through the way the document name will be mapped successively by each line, and what the result will be. \par
\item Run the daemon in debug mode from a terminal session to get trace information. \par
\end{itemize} \par
\subsection{Bad Output}
A document is displayed, but not the one you wanted. \par
These are some ideas:
\begin{itemize}
\item Try running the server from the terminal. \par
\item Check the HTML source the daemon produces with
\begin{verbatim}
    www -source http://my.host.domain/
\end{verbatim}
\item Try telnetting to httpd and simulating the client:
\begin{verbatim}
    > telnet my.host.domain 80
    Connected to my.host.domain on port 80
    Escape is ^[
    GET /document/name
\end{verbatim}
\end{itemize} \par \par
\subsection{{} Running Under Shell}
You don't have to run the daemon under the {\tt inetd} if it doesn't work (and we recommend running it standalone anyway).
You can run it from a shell session.\par
Run {\tt httpd} from your terminal, with a different port number like 8080:
\begin{verbatim}
    httpd -p 8080
\end{verbatim}
{\bf Note:} You must be {\tt root} (under VMS, have some privilege) to run with a port number below 1024. If you select a port above 1024, then you can run as a normal user. This way, anyone can publish files on the net. However, it isn't very reliable, as your server will not automatically come back up if the machine is rebooted. In the long term it is best to install it to be started from the system startup file {\tt /etc/rc} or {\tt /etc/rc.local}. \par
You may not be able to use a port number which has been used by a daemon process recently (the port may still be bound), so you may have to switch port number if you {\char94}C and restart {\tt httpd}. When it is running like this, you can also read the debugging messages (when running with the {\tt -v} option), and use a debugger on it if necessary. (See also: telnetting to the server). \par \par
\subsubsection{Debugging using Trace}
If you can't understand why a server refuses to give back a document, then run with the {\tt -v} option to turn on debugging messages. Use {\tt -v} as the very first command line option (this way debugging is turned on right away). You will see the daemon setting up the rules for translating requests into local URLs, and you will see its attempt to access the file (assuming you map requests onto files).
\begin{verbatim}
    httpd -v -p 8080
\end{verbatim}
Try to access the document from a client using another terminal window. Look at the debugging output. It will probably explain what is happening. If you still can't figure out the problem, mail your local guru or help desk or, if desperate, {\tt httpd@info.cern.ch}, {\bf enclosing} a copy of the debugging output. \par \par
\subsubsection{Even simpler}
For testing a daemon very simply, without using a client, you can make the terminal be the client.
With {\tt httpd}, try just running it from the terminal and typing {\tt GET} {\it /document/url\/} into its input:
\begin{verbatim}
    httpd -v
    GET /document/url
\end{verbatim} \par
\subsection{{} Telnetting to httpd}
Most implementations of telnet allow you to specify a port number. Under unix this is often just a second parameter, under VMS a {\tt /PORT} option. \par
The HTTP protocol is a telnet-style text protocol, so you can simulate it just by typing things in. This will help you to see exactly what the server is sending back, and it will confirm that it really is the server, not the browser, which has a problem. \par
Here is a simple example (the {\tt telnet} and {\tt GET} lines are keyboard input):
\begin{verbatim}
    > telnet myhost.domain 80
    Connected to myhost.domain on port 80
    Escape is ^[
    GET /document/url
    ...document or error message...
\end{verbatim} \par
\chapter{{} Command Line of CERN httpd}
The command line syntax for {\tt httpd} allows a number of options and an optional directory argument:
\begin{verbatim}
    httpd [-opt -opt -opt ...] [directory]
\end{verbatim}
The directory argument, if present, indicates the directory to be exported. If not present, either a rule file is used, to export combinations of directories, or else the default is to export the {\tt /Public} directory tree. \par \par
\section{Options}
\begin{DL}{allow this much space}
\item[ {\tt -r } {\it rulefile\/} ] Use {\it rulefile\/} as configuration file. {\bf This is the only necessary command line option} if you don't have the default configuration file, {\tt /etc/httpd.conf}. All the other options can be given as directives in the configuration file.
\item[ {\tt -p } {\it port\/} ] Listen to port {\it port\/}. Without this argument {\tt httpd} assumes that it has been run by {\tt inetd}, and uses {\tt stdin} and {\tt stdout} as its communication channel. {\bf Note} that port numbers under 1024 are privileged.
\item[ {\tt -l } {\it logfile\/} ] Use {\it logfile\/} to log the requests.
\item[ {\tt -restart} ] Restart an already running {\tt httpd}. {\tt httpd} finds out the process number of the running server from {\tt PidFile} and sends it the {\tt HUP} signal (HangUP). This will cause {\tt httpd} to reload its configuration files and reopen its log files. {\bf Important:} To find out the {\tt PidFile}, {\tt httpd} will have to read the same configuration file as the running {\tt httpd} has, so you have to specify the same {\tt -r } options on the command line as for the actual {\tt httpd}.
\item[ {\tt -gc\_only} ] \lbrack only for proxies\rbrack Do only garbage collection and then exit. This can be used to run {\tt httpd} periodically by {\tt cron} to do garbage collection on a cache that is used by {\tt httpd} run from the {\tt inetd} daemon rather than standalone. When {\tt httpd} is not running standalone it cannot monitor the cache, nor perform automatic garbage collection.
\item[ {\tt -v} ] Verbose, turn on debugging messages.
\item[ {\tt -vv} ] Very Verbose, turn on even more verbose debugging messages.
\item[ {\tt -version} ] Print version number of {\tt httpd} and {\tt libwww} (the WWW Common Library).
\end{DL} \par
\subsection{Directory Browsing}
You can set these also with the {\tt DirAccess} configuration directive.
\begin{DL}{allow this much space}
\item[ {\tt -dy} ] Enable directory browsing. Directories are returned as hypertext documents. See browsing directories. {\em Default.\/}
\item[ {\tt -dn} ] Disable directory browsing. An attempt to access a directory will generate an error response.
\item[ {\tt -ds} ] Selective directory browsing; enabled only for directories containing a file named {\tt .www\_browsable}
\end{DL} \par
\subsection{README Feature}
It is common practice to put a file named {\tt README} into a directory containing instructions or notices to be read by anyone new to the directory. {\tt httpd} will by default embed any {\tt README} file in the hypertext version of a directory. \par
You can set these also with the {\tt DirReadme} configuration directive.
\begin{DL}{allow this much space}
\item[ {\tt -dt} ] For any browsable directory which contains a {\tt README} file, include the text of the {\tt README} file at the top of the document before the listing. {\em Default.\/}
\item[ {\tt -db} ] As {\tt -dt} but put the {\tt README} at the bottom, after the listing.
The {\tt -db} and {\tt -dt} options may be combined with {\tt -dy} as {\tt -dyb}, {\tt -dty} etc.
\item[ {\tt -dr} ] Disables the {\tt README} inclusion feature.
\end{DL} \par
\section{Examples}
\begin{verbatim}
    httpd -r /usr/etc/httpd.conf -p 80
\end{verbatim}
This is a standalone server running on port 80. The configuration file is {\tt /usr/etc/httpd.conf} instead of the default, {\tt /etc/httpd.conf}. \par
{\bf Note} that if the {\tt Port} directive is given in the configuration file the {\tt -p } option is not necessary (it can be used to override the value set in the configuration file). \par
\begin{verbatim}
    httpd
\end{verbatim}
{\tt httpd} uses its default configuration file {\tt /etc/httpd.conf}. If that file doesn't exist, {\tt httpd} exports the {\tt /Public} directory tree. This tree may contain soft links to other directory trees. \par
If the configuration file {\tt /etc/httpd.conf} doesn't define the port number to listen to, {\tt httpd} reads from its {\tt stdin} and writes to its {\tt stdout}, on the assumption that it has been run by {\tt inetd}. \par
\begin{verbatim}
    httpd -r /usr/local/lib/httpd.conf
\end{verbatim}
The same as before, but uses {\tt /usr/local/lib/httpd.conf} as a rule file instead of the default {\tt /etc/httpd.conf}. \par \par
\chapter{{} Configuration File of CERN httpd}
The configuration file (often referred to as the rule file) defines how {\tt httpd} will translate a request into a document name. The directives controlling {\tt httpd} features are also put into the configuration file, as well as the protection configuration. The latter is essential to prevent unauthorized access to your private documents. \par
\section{Default Configuration File}
By default, the configuration file {\tt /etc/httpd.conf} is loaded, unless specified otherwise with the {\tt -r} command line option:
\begin{verbatim}
    httpd -p 80 -r /your/own/httpd.conf
\end{verbatim}
See also example configuration files.
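\par As a hedged illustration of what such a file might contain, the following minimal sketch combines the {\tt Port} directive and the {\tt Pass} rule mentioned elsewhere in this guide. The port number 8080 and the directory {\tt /Public/Web} are assumptions chosen for the example; substitute your own values.
\begin{verbatim}
    # Minimal illustrative configuration (all values are assumptions)
    Port    8080
    Pass    /*    /Public/Web/*
\end{verbatim}
With a file like this saved as {\tt /your/own/httpd.conf}, the command shown above would serve documents from {\tt /Public/Web}; a {\tt -p } option given on the command line overrides the {\tt Port} value in the file.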
\par
\section{Comments in Configuration File}
Each line consists of an operation code and one or two parameters, referred to as the template and the result. Lines starting with a hash sign {\tt \#} are ignored, as are empty lines. \par \par
\section{Restarting the Server}
When you are running the server in standalone mode (not from {\tt inetd}), and modify the configuration file, send the {\tt HUP} signal to {\tt httpd} to make it re-read the configuration file. You can find out the process number from the pid file written by httpd, e.g.
\begin{verbatim}
    > cat /server_root/httpd-pid
    2846
    > kill -HUP 2846
    >
\end{verbatim}
{} You must specify the configuration file as an {\bf absolute pathname} for the {\tt -r} option because when the server is started in standalone mode it changes its current directory to {\tt /}, so after startup it cannot reload configuration files that were specified with relative filenames. \par
To make restarting easier {\tt httpd } has a {\tt -restart } option, which will automatically send the HUP signal to another {\tt httpd} process. {\bf Important:} To find out the {\tt PidFile}, {\tt httpd} will have to read the same configuration file as the running {\tt httpd} has, so you have to specify the same {\tt -r } options on the command line as for the actual {\tt httpd}, e.g.
\begin{verbatim}
    > httpd -r /usr/etc/httpd.conf -restart
    Restarting.. httpd
    Sending..... HUP signal to process 21379
    >
\end{verbatim} \par
\section{Exhaustive List of Configuration Directives}
\begin{itemize}
\item General settings:
\begin{itemize}
\item {\tt ServerRoot} \item {\tt HostName} \item {\tt Port} \item {\tt PidFile} \item {\tt UserId} \item {\tt GroupId} \item {\tt Enable} \item {\tt Disable} \item {\tt IdentityCheck} \item {\tt Welcome} \item {\tt AlwaysWelcome} \item {\tt UserDir} \item {\tt MetaDir} \item {\tt MetaSuffix} \item {\tt MaxContentLengthBuffer}
\end{itemize}
\item URL translation rules:
\begin{itemize}
\item {\tt Map} \item {\tt Pass} \item {\tt Fail} \item {\tt Redirect} \item {\tt Protect} \item {\tt DefProt} \item {\tt Exec}
\end{itemize}
\item Filename suffix definitions:
\begin{itemize}
\item {\tt AddType} \item {\tt AddEncoding} \item {\tt AddLanguage} \item {\tt SuffixCaseSense}
\end{itemize}
\item Accessory scripts:
\begin{itemize}
\item {\tt Search} \item {\tt POST-Script} \item {\tt PUT-Script} \item {\tt DELETE-Script}
\end{itemize}
\item Directory listings:
\begin{itemize}
\item {\tt DirAccess} \item {\tt DirReadme} \item {\tt DirShowIcons} \item {\tt DirShowBrackets} \item {\tt DirShowMinLength} \item {\tt DirShowMaxLength} \item {\tt DirShowDate} \item {\tt DirShowSize} \item {\tt DirShowBytes} \item {\tt DirShowHidden} \item {\tt DirShowOwner} \item {\tt DirShowGroup} \item {\tt DirShowMode} \item {\tt DirShowDescription} \item {\tt DirShowMaxDescrLength} \item {\tt DirShowCase}
\end{itemize}
\item Icons in directory listings:
\begin{itemize}
\item {\tt AddIcon} \item {\tt AddBlankIcon} \item {\tt AddUnknownIcon} \item {\tt AddDirIcon} \item {\tt AddParentIcon}
\end{itemize}
\item Logging:
\begin{itemize}
\item {\tt AccessLog} \item {\tt ErrorLog} \item {\tt LogFormat} \item {\tt LogTime} \item {\tt NoLog} \item {\tt CacheAccessLog}
\end{itemize}
\item Timeouts:
\begin{itemize}
\item {\tt InputTimeOut} \item {\tt OutputTimeOut} \item {\tt ScriptTimeOut}
\end{itemize}
\item Proxy Caching:
\begin{itemize}
\item 
{\tt Caching} \item {\tt CacheRoot} \item {\tt CacheSize} \item {\tt NoCaching} \item {\tt CacheOnly} \item {\tt CacheClean} \item {\tt CacheUnused} \item {\tt CacheDefaultExpiry} \item {\tt CacheLastModifiedFactor} \item {\tt CacheTimeMargin} \item {\tt CacheNoConnect} \item {\tt CacheExpiryCheck} \item {\tt Gc} \item {\tt GcDailyGc} \item {\tt GcMemUsage} \item {\tt CacheLimit\_1} \item {\tt CacheLimit\_2} \item {\tt CacheLockTimeOut} \item {\tt CacheAccessLog} \end{itemize} \item Going through many proxies: \begin{itemize} \item {\tt http\_proxy} \item {\tt ftp\_proxy} \item {\tt gopher\_proxy} \item {\tt wais\_proxy} \item {\tt no\_proxy} \end{itemize} \end{itemize} \par
\section{{} General CERN httpd Configuration Directives} \begin{itemize} \item {\tt ServerRoot} \item {\tt HostName} \item {\tt Port} \item {\tt PidFile} \item {\tt UserId} \item {\tt GroupId} \item {\tt Enable} \item {\tt Disable} \item {\tt IdentityCheck} \item {\tt Welcome} \item {\tt AlwaysWelcome} \item {\tt UserDir} \item {\tt MetaDir} \item {\tt MetaSuffix} \item {\tt MaxContentLengthBuffer} \end{itemize} \par
\subsection{ServerRoot} The server's "home" directory is specified via the {\tt ServerRoot} directive. If a server root is specified, but no {\tt AddIcon} directive has been used in the configuration file to set up icons, the default icon directory is the {\tt icons} directory under the server root. The default icons that should be present are: \begin{itemize} \item {\tt blank.xbm} blank icon for aligning the header with listing \item {\tt directory.xbm} for directories \item {\tt back.xbm} for parent directory \item {\tt unknown.xbm} for unknown types \item {\tt binary.xbm} for binary files \item {\tt text.xbm} for text files \item {\tt image.xbm} for image files \item {\tt movie.xbm} for movies \item {\tt sound.xbm} for audio files \item {\tt tar.xbm} for tar files \item {\tt compressed.xbm} for compressed files \end{itemize} If these defaults don't suit you, you can define all the icons from scratch.
As an example of the {\tt AddIcon} directive, the defaults would be specified as follows:
\begin{verbatim}
Pass /httpd-internal-icons/* /server_root/icons/*
AddBlankIcon   /httpd-internal-icons/blank.xbm
AddDirIcon     /httpd-internal-icons/directory.xbm  DIR
AddParentIcon  /httpd-internal-icons/back.xbm       UP
AddUnknownIcon /httpd-internal-icons/unknown.xbm
AddIcon        /httpd-internal-icons/binary.xbm     BIN binary
AddIcon        /httpd-internal-icons/text.xbm       TXT text/*
AddIcon        /httpd-internal-icons/image.xbm      IMG image/*
AddIcon        /httpd-internal-icons/movie.xbm      MOV video/*
AddIcon        /httpd-internal-icons/sound.xbm      AU  audio/*
AddIcon        /httpd-internal-icons/tar.xbm        TAR multipart/*tar
AddIcon        /httpd-internal-icons/compressed.xbm CMP x-compress x-gzip
\end{verbatim}
\subsubsection{{} On Proxy Server} On a proxy server the icon URLs {\bf must be full URLs}, because otherwise clients would translate them relative to the remote host. This means that in the above example all the {\tt AddIcon*} directives have to read:
\begin{verbatim}
AddIcon http://your.server/httpd-internal-icons/...
\end{verbatim}
{\bf and} you also have to pass the full icon URL:
\begin{verbatim}
Pass http://your.server/httpd-internal-icons/* /server_root/icons/*
\end{verbatim}
Since future smart browsers might notice that the icon server is the same one as the proxy server, it may be best in this case to also {\tt Pass} the partial URL as above:
\begin{verbatim}
Pass /httpd-internal-icons/* /server_root/icons/*
\end{verbatim}
\par
\subsection{HostName} On some hosts the hostname lookup fails, producing only the name without the domain part. The full hostname is necessary when {\tt httpd} is generating references to itself (redirection responses to clients). If necessary, provide the full server hostname with the {\tt HostName} directive:
\begin{verbatim}
HostName full.server.host.name
\end{verbatim}
You may also want to use this when the real host name is different from what you want the clients to see (for example, when you have a DNS alias for the host).
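\par For the DNS alias case, a sketch might look as follows. Both names are made up for the example: suppose the machine is really called {\tt host1.your.domain} but clients should see the alias {\tt www.your.domain}.
\begin{verbatim}
# Hypothetical alias that clients should see
HostName www.your.domain
\end{verbatim}
References that {\tt httpd} generates to itself, such as redirection responses, would then be expected to carry the alias rather than the real host name.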
\par \par
\subsection{Default Port Setting} For a standalone server (one that runs continuously, listens to a certain port, and forks a child to handle each request) the port to listen to can be defined via the {\tt Port} configuration directive instead of the {\tt -p } {\it port\/} command line option. Normally:
\begin{verbatim}
Port 80
\end{verbatim}
The {\tt -p } {\it port\/} command line option still overrides this default. \par \par
\subsection{PidFile} {\tt httpd} re-reads its configuration file when it receives a {\tt HUP} signal \lbrack HANGUP\rbrack , signal number 1. To make it easy to find out the parent {\tt httpd} process id, it writes it to a file. \par By default, if {\tt ServerRoot} is specified, this is the file {\tt httpd-pid} under the server root; if not, it defaults to {\tt /tmp/httpd-pid}. \par The {\tt PidFile} directive can be used to set the process id file name; it can be either an absolute path or a relative one. A relative path is relative to {\tt ServerRoot}, or if that is not defined, relative to {\tt /tmp}.
\subsubsection{Example}
\begin{verbatim}
ServerRoot /Web/serverroot
PidFile    logs/httpd-pid
\end{verbatim}
would cause the process id to be written to {\tt /Web/serverroot/logs/httpd-pid}. \par \par
\subsection{Default User Id} The {\tt UserId} directive sets the default user to run as instead of {\tt nobody}. This directive is only meaningful when running the server as {\tt root}.
\begin{verbatim}
UserId whoever
\end{verbatim}
\par
\subsection{Default Group Id} The {\tt GroupId} directive sets the default group to run under instead of {\tt nogroup}. This directive is only meaningful when running the server as {\tt root}.
\begin{verbatim}
GroupId whichever
\end{verbatim}
\par
\subsection{Enabling and Disabling HTTP Methods} You can enable/disable methods that you do/don't want your server to accept:
\begin{verbatim}
Enable METHOD
Disable METHOD
\end{verbatim}
By default {\tt GET}, {\tt HEAD} and {\tt POST} are enabled, and the rest are disabled.
\par \subsubsection{Examples}
\begin{verbatim}
Enable POST
Disable DELETE
\end{verbatim}
\par
\subsection{IdentityCheck} If the {\tt IdentityCheck} configuration directive is turned {\tt On}, {\tt httpd} will connect to the ident daemon (RFC931) of the remote host and find out the remote login name of the owner of the client socket. This information is written to the access log file, and put into the {\tt REMOTE\_IDENT } CGI environment variable. \par The default setting is {\tt Off}:
\begin{verbatim}
IdentityCheck Off
\end{verbatim}
and if you don't need this information you will save resources by keeping it off. Furthermore, this information does not provide any additional security and should not be trusted for access control; use it only for informational purposes, such as logging. \par
\subsubsection{{} WARNING {}} On some systems there is a kernel bug that causes all the connections to the remote node to be broken if the remote ident request is not answered (ident daemon not running, for example). This has been reported for at least SunOS 4.1.1, NeXT 2.0a, ISC 3.0 with TCP 1.3, and AIX 3.2.2; later versions are ok. Sony News/OS 4.51, HP-UX 8-?? and Ultrix 4.3 still have this bug. A fix for Ultrix is available (CSO-8919). \par \lbrack Thanks to Per-Steinar Iversen from Norway for pointing this out!\rbrack \par If the operating system on your server host has this bug, {\bf do not use IdentityCheck!} \par \par
\subsection{Welcome} The {\tt Welcome} directive specifies the default file name to use when only a directory name is specified in the URL. There may be many {\tt Welcome} directives giving alternative welcome page names. A name defined earlier takes precedence over one defined later. \par Default values are {\tt Welcome.html}, {\tt welcome.html} and {\tt index.html}. {\tt index.html} is there only for compatibility with the NCSA server; the word "Welcome" is more descriptive, and has precedence. \par All default values will be overridden if the {\tt Welcome} directive is used.
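\par As a sketch of the precedence rule, suppose a site prefers a page called {\tt home.html} (a made-up name) and wants to fall back to {\tt index.html}:
\begin{verbatim}
# home.html is defined first, so it takes precedence
Welcome home.html
Welcome index.html
\end{verbatim}
With these two lines the built-in defaults no longer apply, so {\tt Welcome.html} and {\tt welcome.html} would not be used unless they are listed explicitly.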
\par Default values could be defined as:
\begin{verbatim}
Welcome Welcome.html
Welcome welcome.html
Welcome index.html
\end{verbatim}
\par
\subsection{AlwaysWelcome} By default there is no difference between directory names with and without a trailing slash when it comes to welcome pages. The one without a trailing slash will cause an automatic redirection to the one with a trailing slash, which then gets mapped to the welcome page. \par If you want plain directory names to produce a directory listing, and only the ones with a trailing slash to return the welcome page, set the {\tt AlwaysWelcome} directive to off:
\begin{verbatim}
AlwaysWelcome Off
\end{verbatim}
Default value is {\tt On}. \par \par
\subsection{User-Supported Directories} User-supported directories, URLs of the form {\bf /\~{}username}, are enabled by the {\tt UserDir} directive:
\begin{verbatim}
UserDir dir-name
\end{verbatim}
The {\it dir-name\/} argument is the directory in each user's home directory to be exported, for example {\tt WWW}:
\begin{verbatim}
UserDir WWW
\end{verbatim}
\par
\subsection{Meta-Information} It is possible to tell {\tt httpd} to add meta-information to the response. Meta-information is stored in a directory specified by the {\tt MetaDir} directive, under the same directory as the file being retrieved:
\begin{verbatim}
MetaDir dir-name
\end{verbatim}
Meta-information is stored in a file with the same name as the actual document, but with a suffix appended that is specified via the {\tt MetaSuffix} directive:
\begin{verbatim}
MetaSuffix .suffix
\end{verbatim}
Meta-information files contain RFC822-style headers. \par Default settings are:
\begin{verbatim}
MetaDir .web
MetaSuffix .meta
\end{verbatim}
meaning that meta-information files are located in the {\tt .web} subdirectory, and they end in the {\tt .meta} suffix, i.e.
the metafile for file:
\begin{verbatim}
/Web/Demo/file.html
\end{verbatim}
would be:
\begin{verbatim}
/Web/Demo/.web/file.html.meta
\end{verbatim}
\par
\subsection{MaxContentLengthBuffer} {\tt httpd} normally gives a Content-Length header line for every document it returns. When it is running as a proxy it buffers the document received from the remote server before sending it to the client. This directive can be used to set the size of this buffer - if it is exceeded the document will be returned without a Content-Length header field. \par Default setting is 50 kilobytes:
\begin{verbatim}
MaxContentLengthBuffer 50 K
\end{verbatim}
\par
\section{{} Rules In The Configuration File} Rules define the mapping between virtual URLs and physical file names. Currently the following rules are understood: \begin{itemize} \item {\tt Map} - Map URLs to actual files \item {\tt Pass} - Accept a request \item {\tt Fail} - Fail a request \item {\tt Redirect} - Redirect a request \item {\tt Protect} - Set up protection \item {\tt DefProt} - Default protection setup \item {\tt Exec} - Executable server scripts \end{itemize} \par
\subsection{Mapping, Passing and Failing} There are three main rules: {\tt Map}, {\tt Pass} and {\tt Fail}. The server uses the top rule first, then {\bf each successive rule} unless told otherwise by a {\tt Pass} or a {\tt Fail} rule. \par \begin{DL}{allow this much space} \item[ {\tt Map } {\it template result\/} ] If the address matches the {\it template\/}, use the {\it result\/} string from now on for future rules. \item[ {\tt Pass } {\it template\/} ] If the address matches the {\it template\/}, use it as it is, processing no further rules. \item[ {\tt Pass } {\it template result\/} ] If the address matches the {\it template\/}, use the {\it result\/} string as it is, processing no further rules. \item[ {\tt Fail } {\it template\/} ] If the address matches the {\it template\/}, prohibit access, processing no further rules.
\end{DL} The {\it template\/} string may contain wildcards (asterisks) {\tt *}. (Versions earlier than 3.0 support only a single wildcard.) The {\it result\/} string may have wildcards only if the {\it template\/} has them. In this case they expand to the matched strings in respective order. \par {\bf Whitespace, (literal) asterisks and backslashes} are allowed in templates if they are preceded by a backslash. \par {\bf The tilde character} (see user-supported directories) just after a slash (in other words at the beginning of a directory name) has to be explicitly matched, i.e. a wildcard does not match it. \par When matching, \begin{itemize} \item Rules are scanned from the top of the file to the bottom. \item If a request matches a {\tt Map} template exactly, the result string is used instead of the original string and applied to successive rules. \item If the request matches a {\tt Map} {\it template\/} with a wildcard, then the text of the request which matches the wildcard is inserted in place of the wildcard in the {\it result\/} string to form the translated request. If the result string has no wildcard, it is used as it is. \item When a {\tt Map} substitution takes place, the rule scan continues with the next rule using the new string in place of the request. This is not the case if a {\tt Pass} or {\tt Fail} is matched: they terminate the rule scan. \end{itemize} \par
\subsection{Redirecting Requests Elsewhere} When documents, or entire trees of documents, are moved from one server to another, you can use the {\tt Redirect} rule to tell {\tt httpd} to redirect the request to another server. If the client program is smart enough, the user won't even notice that the document is retrieved from a different server. \begin{DL}{allow this much space} \item[ {\tt Redirect } {\it template result\/} ] A document matching {\it template\/} is redirected to {\it result\/}, which must be a {\bf full URL} (i.e. containing {\tt http:} and the host name).
\end{DL} \subsubsection{Example}
\begin{verbatim}
Redirect /hypertext/WWW/* http://www.cern.ch/WebDocs/*
\end{verbatim}
This redirects everything starting with {\tt /hypertext/WWW} to host {\tt www.cern.ch} into virtual directory {\tt /WebDocs}. For example, {\tt /hypertext/WWW/TheProject.html} would be redirected to {\tt http://www.cern.ch/WebDocs/TheProject.html}. \par \par
\subsection{Setting Up User Authentication and Document Protection} Documents are protected by {\tt Protect} and {\tt DefProt} rules. Their syntax is the following: \begin{DL}{allow this much space} \item[ {\tt DefProt } {\it template \/} {\it setup-file\/} {\tt \lbrack }{\it uid.gid\/}{\tt \rbrack } ] Any document matching the {\it template\/} is associated with protection {\it setup-file.\/} The documents are not yet taken to be protected, but they may become protected by an existing access control list file in the same directory as the requested file, or by later matching a {\tt Protect} rule. If that {\tt Protect} rule doesn't specify {\it setup-file\/}, the one from the latest {\tt DefProt} rule is used.\par \item[ {\tt Protect } {\it template\/} {\tt \lbrack }{\it setup-file\/} {\tt \lbrack }{\it uid.gid\/}{\tt \rbrack \rbrack } ] Any document matching {\it template\/} is protected. The type of protection is defined in finer detail in {\it setup-file.\/} \par If {\it setup-file\/} is not specified, the one from the previous matched {\tt DefProt} rule will be used. If none has matched, access to the file is forbidden. \end{DL} {\it setup-file\/} is always a full pathname for the protection setup file which specifies the actual protection parameters. \par The setup file can be omitted from a {\tt Protect} rule, but it is obligatory in a {\tt DefProt} rule. If the setup file is omitted it is not possible to give the {\it uid.gid\/} part, either. \par {\it uid.gid\/} are the Unix user id and group id (either by name or by number, separated by a dot) to which the server should change when serving the request.
These are only meaningful when the server is running as {\tt root}. If they are missing they default to {\it nobody.nogroup\/}.\par {\bf Note:} Uid and gid are inherited from the {\tt DefProt} rule to the {\tt Protect} rule {\bf only} when the {\it setup-file\/} is also inherited. If {\it setup-file\/} is specified for the {\tt Protect} rule but {\it uid.gid\/} is not, they default to {\it nobody.nogroup\/} regardless of the previous {\tt DefProt} rule. \par This is to avoid accidentally running the server under the wrong user id with the wrong setup file. This information should logically go into the protection setup file, but for safety reasons it cannot be done, because a non-trustworthy collaboration could specify it to be {\tt root}. This way only the main {\tt webmaster} can control user and group ids. \par \par
\subsection{Executable Server Scripts} A document address is mapped into a script call by the {\tt Exec} rule:
\begin{verbatim}
Exec template script
\end{verbatim}
{} In both {\it template\/} and {\it script\/} there {\bf must be a {\tt *} wildcard that matches everything starting from the script filename.} This enables {\tt httpd} to know which part is the script name and which part is the extra path information to be passed to the script.\par
\subsubsection{Example} You want to map everything starting with {\tt /your/url/doit} to execute the script {\tt /usr/etc/www/htbin/doit}. You do this by saying:
\begin{verbatim}
Exec /your/url/* /usr/etc/www/htbin/*
\end{verbatim}
Here the asterisk matches the script name {\tt doit} (and everything else that follows it). Usually people use some fixed keyword in front of the pathname in the URL to point out that the document is actually a script call. Often this keyword is {\tt /htbin}. That is, usually your {\tt Exec} rule looks like this:
\begin{verbatim}
Exec /htbin/* /usr/etc/www/htbin/*
\end{verbatim}
and all the URLs pointing to the scripts start with {\tt /htbin}, for example {\tt /htbin/doit} in the previous example.
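\par To see how the {\tt Map}, {\tt Pass} and {\tt Fail} rules described earlier in this section interact, consider the following sketch. The directory {\tt /Web/docs} and the URL prefix {\tt /docs} are assumptions made for the example.
\begin{verbatim}
# Translate the public URL prefix to the (assumed) document tree
Map   /docs/*              /Web/docs/*
# Refuse anything in the private subtree of that document tree
Fail  /Web/docs/private/*
# Accept everything else in the document tree as it is
Pass  /Web/docs/*
\end{verbatim}
Following the matching procedure given for these rules, a request for {\tt /docs/guide.html} is rewritten by {\tt Map} to {\tt /Web/docs/guide.html}, does not match the {\tt Fail} template, and is accepted by {\tt Pass}. A request for {\tt /docs/private/notes.html} becomes {\tt /Web/docs/private/notes.html} and is refused by {\tt Fail}, because that rule comes before the {\tt Pass} rule and terminates the scan.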
\par \par \subsubsection{Historical Note (HTBin Rule)} CERN {\tt httpd} versions 2.13 and 2.14 had a hard-coded handling of URL pathnames starting {\tt /htbin} that mapped them to scripts in a directory specified via {\tt HTBin} rule: \begin{verbatim} HTBin /your/htbin/directory \end{verbatim} This is still handled automatically by {\tt httpd}, by translating it to its equivalent {\tt Exec} form: \begin{verbatim} Exec /htbin/* /your/htbin/directory/* \end{verbatim} Always use {\tt Exec} instead $--$ it is more general. \par \par \section{{} Suffix Definitions for CERN httpd} {\tt cern\_httpd} uses suffixes to discover the content-type, content-encoding and content-language of a file. Default values are so extensive that {\tt httpd} knows the usual file types. The following configuration directives can be used to add new suffix bindings and override existing defaults: \begin{itemize} \item {\tt AddType} - Filename suffix mappings to MIME Content-Types \item {\tt AddEncoding} - Filename suffix mappings to MIME Content-Encodings \item {\tt AddLanguage} - Multilanguage support, suffix mappings to different Content-Languages \item {\tt SuffixCaseSense} - Set suffix case sensitivity \end{itemize} \par \subsection{Binding Suffixes to MIME Content-Types} As well as any mapping lines in the rule file, the rule file may be used to define the data types of files with particular suffixes. CERN {\tt httpd} has an extensive set of predefined suffixes, so usually you don't need to specify any. \par The syntax is: \begin{verbatim} AddType .suffix representation encoding [quality] \end{verbatim} The parameters are as follows: \begin{DL}{allow this much space} \item[{\it suffix\/}] The last part of the filename. There are two special cases. {\tt *.*} matches to all files which have not been matched by any explicit suffixes but do contain a dot. {\tt *} by itself matches to any file which does not match any other suffix. 
\par \item[{\it representation\/}] A MIME Content-Type style description of the representation actually used in the file. See the HTTP spec. This need not be a real MIME type - it will only be used if it matches a type given by a client. \par \item[{\it encoding\/}] A MIME content transfer encoding type. Much more limited in variety than representations, basically whether the file is ASCII (7bit or 8bit) or binary. A few other encodings are allowed, and maybe extension to compression. \par \item[{\it quality\/}] Optional. A floating point number between 0.0 and 1.0 which determines the relative merits of files {\tt xxx.*} which differ in their suffix only, when a link to {\tt xxx.multi} is being resolved. Defaults to 1.0. \par \end{DL} \subsubsection{Examples}
\begin{verbatim}
AddType .html text/html              8bit   1.0
AddType .text text/plain             7bit   0.9
AddType .ps   application/postscript 8bit   1.0
AddType *.*   application/binary     binary 0.1
AddType *     text/plain             7bit
\end{verbatim}
\par
\subsubsection{Historical Note (Suffix Directive)} {\tt AddType} was previously called {\tt Suffix}. The old name is still understood, but may be misleading since suffixes are also used to determine Content-Encoding and language. Always use {\tt AddType} instead. \par \par
\subsection{Binding Suffixes to MIME Content-Encodings} Suffixes are also used to determine the Content-Encoding of a file ({\tt .Z} suffix for {\tt x-compress}, for example). Syntax is:
\begin{verbatim}
AddEncoding .suffix encoding
\end{verbatim}
\subsubsection{Example}
\begin{verbatim}
AddEncoding .Z x-compress
\end{verbatim}
\par
\subsection{Multilanguage Support} Multilanguage support is also built on using suffixes to determine the language of a document. A suffix is bound to a language by the {\tt AddLanguage} rule ({\tt .en} suffix for English, for example).
Syntax is:
\begin{verbatim}
AddLanguage .suffix language
\end{verbatim}
\subsubsection{Examples}
\begin{verbatim}
AddLanguage .en en
AddLanguage .uk en_UK
\end{verbatim}
\par
\subsection{Suffix Case Sensitivity} Suffix case sensitivity is by default {\it off.\/} You can make suffixes case sensitive with the {\tt SuffixCaseSense} directive:
\begin{verbatim}
SuffixCaseSense On
\end{verbatim}
\par
\section{{} Accessory Scripts} In addition to having a fully configurable CGI script interface to handle form requests, CERN {\tt httpd} has a few special directives to handle certain tasks always via CGI scripts: \begin{itemize} \item keyword searches \item general {\tt POST} \item general {\tt PUT} \item general {\tt DELETE} \end{itemize} \par
\subsection{Keyword Search Facility} The server automatically calls a script to perform a search if the {\bf absolute pathname} of the search script is supplied by a {\tt Search} directive in the configuration file:
\begin{verbatim}
Search /search/script/pathname
\end{verbatim}
This script is called with the vital information in the following CGI environment variables: \begin{DL}{allow this much space} \item[ {\tt PATH\_INFO} ] contains the virtual URL of the file from which the query was issued. \par \item[ {\tt PATH\_TRANSLATED} ] contains the physical filename of the document corresponding to the virtual URL in {\tt PATH\_INFO}. \par \item[ {\tt QUERY\_STRING} ] contains the (URL-encoded) keywords, which are also available decoded as command line parameters, one in each of {\tt argv\lbrack 1\rbrack }, {\tt argv\lbrack 2\rbrack }, ... \par \end{DL} The search script must conform to CGI/1.1 rules, that is, it has to start its output with a MIME header {\bf followed by a blank line}, after which comes the actual document. The MIME header {\bf must} contain either a {\tt Location: } field, or a {\tt Content-Type: } field, typically:
\begin{verbatim}
Content-Type: text/html
\end{verbatim}
if the document is an HTML document.
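\par As an illustration of this output format, a search script returning an HTML document might write something like the following to its standard output. The title and body text are made up for the example; only the {\tt Content-Type} line and the blank line after it are prescribed by the rules above.
\begin{verbatim}
Content-Type: text/html

<TITLE>Search results</TITLE>
<H1>Search results</H1>
Illustrative body text of the result document.
\end{verbatim}
The blank line is what separates the MIME header from the document itself, so it must not be left out.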
\par \par
\subsection{General POST Method Handler Script} {\tt POST} requests are handled by calling the script defined by the {\tt POST-Script} directive:
\begin{verbatim}
POST-Script /absolute/path/post-handler
\end{verbatim}
The POST handler script is called in the normal CGI manner, and its output must be CGI compliant. \par {} The POST handler handles only those {\tt POST} requests that haven't already matched an {\tt Exec} rule (which causes a specified script to be called). \par \par
\subsection{General PUT Method Handler Script} {\tt PUT} requests are handled by calling the script defined by the {\tt PUT-Script} configuration directive:
\begin{verbatim}
PUT-Script /absolute/path/put-handler
\end{verbatim}
The PUT handler script is called in the normal CGI manner, and its output must be CGI compliant. \par {} By default the {\tt PUT} method is disabled; you must explicitly enable it in the configuration file:
\begin{verbatim}
Enable PUT
\end{verbatim}
This is to enhance security. \par {} Since {\tt PUT} can be a very dangerous method because it allows files to be written back to the server, it is not possible to use {\tt PUT} without the access authorization module being activated. This means that you have to have at least a {\tt DefProt} rule specifying a default protection setup, which then in turn defines the {\tt PutMask} containing the list of users and hosts allowed to perform the PUT operation. \par \par
\subsection{General DELETE Method Handler Script} {\tt DELETE} requests are handled by calling the script defined by the {\tt DELETE-Script} configuration directive:
\begin{verbatim}
DELETE-Script /absolute/path/delete-handler
\end{verbatim}
The DELETE handler script is called in the normal CGI manner, and its output must be CGI compliant. \par {} By default the {\tt DELETE} method is disabled; you must explicitly enable it in the configuration file:
\begin{verbatim}
Enable DELETE
\end{verbatim}
This is to enhance security.
\par {} Since {\tt DELETE} can be a very dangerous method because it allows files to be deleted from the server, it is not possible to use {\tt DELETE} without access authorization module being activated. This means that you have to have at least a {\tt DefProt} rule specifying a default protection setup, which then in turn defines the {\tt DeleteMask} containing the list of allowed users and hosts to perform DELETE operation. \par \par \section{{} Directory Browsing} By default references to directories which don't include a welcome page cause {\tt httpd} to generate a hypertext view of the directory listing. There are numerous configuration directives controlling this feature: \begin{itemize} \item {\tt DirAccess} - Enable/Selective/Disable directory listings \item {\tt DirReadme} - Configure/disable README-feature \item Controlling the appearance of directory listings: \begin{itemize} \item {\tt DirShowIcons} - Show icons in directory listings \item {\tt DirShowDate} - show last-modified date \item {\tt DirShowSize} - show file sizes \item {\tt DirShowBytes} - show byte count for small files \item {\tt DirShowDescription} - show descriptions for files \item {\tt DirShowMaxDescrLength} - maximum description length \item {\tt DirShowBrackets} - use brackets around ALTernative text used instead of an icon \item {\tt DirShowMinLength} - minimum width to reserve for filenames \item {\tt DirShowMaxLength} - maximum width to reserve for filenames \item {\tt DirShowHidden} - show also files starting with a dot (hidden Unix files) \item {\tt DirShowOwner} - show owner of the file \item {\tt DirShowGroup} - show group of the file \item {\tt DirShowMode} - show permissions of the file \item {\tt DirShowCase} - do sorting in a case-sensitive manner \end{itemize} \item Icons: \begin{itemize} \item {\tt AddIcon} - bind icon URL to a MIME Content-Type or Content-Encoding \item {\tt AddBlankIcon} - icon URL used in the heading of the listing to align it \item {\tt 
AddUnknownIcon} - icon URL for unknown file types \item {\tt AddDirIcon} - icon URL for directories \item {\tt AddParentIcon} - icon URL for parent directory \end{itemize} \end{itemize} \par
\subsection{Controlling Directory Browsing} \begin{DL}{allow this much space} \item[{\tt DirAccess on}] Enable directory browsing in all directories (which are not forbidden by rules). Synonymous with the {\tt -dy} command line option. {\it Default.\/}\par \item[{\tt DirAccess off}] Disable directory browsing. Synonymous with the {\tt -dn} command line option. \par \item[{\tt DirAccess selective}] Enable selective directory browsing - only directories containing the file {\tt .www\_browsable} are allowed. Synonymous with the {\tt -ds} command line option. \par \end{DL} \par \par
\subsection{README Feature} \begin{DL}{allow this much space} \item[{\tt DirReadme top}] For any browsable directory containing a {\tt README} file, include the text at the top of the directory listing. Synonymous with the {\tt -dt} command line option. {\it Default.\/} \par \item[{\tt DirReadme bottom}] Same as previous, but the contents of {\tt README} appear at the bottom. Synonymous with the {\tt -db} command line option. \par \item[{\tt DirReadme off}] Disables the {\tt README} inclusion feature. Synonymous with the {\tt -dr} command line option. \par \end{DL} \par \par
\subsection{Controlling The Look of Directory Listings} The following {\tt On/Off} directives control how directory listings look. The default is to show icons, use brackets around ALTernative text, show last-modified date, size and description, and allow filename field width to vary between 15-22 characters, and reserve 25 characters for description. \par \begin{DL}{allow this much space} \item[ {\tt DirShowIcons} ] Generate inlined image calls in front of each line. Icons visualize the content-type of the file, and they are defined by the {\tt AddIcon} configuration directive. {\em Default.\/} \par \item[ {\tt DirShowDate} ] Show last modification date.
{\em Default.\/} \par \item[ {\tt DirShowSize} ] Show the size of files. {\em Default.\/} \par \item[ {\tt DirShowBytes} ] By default files smaller than 1K are shown as just 1K. Setting this directive to {\tt On} will cause the exact byte count to appear. \par \item[ {\tt DirShowDescription} ] Show description if available. {\em Default.\/} \par At the time of release of 2.17 there was no consensus about where the descriptions come from, and the mechanism is currently undocumented. For HTML files the description is the TITLE element; for other files the description field is left empty. \par \item[ {\tt DirShowMaxDescrLength} ] The maximum number of characters to show in the description field. \par \item[ {\tt DirShowBrackets} ] Use brackets around ALTernative text used by browsers not capable of displaying images. {\em Default.\/} \par \item[ {\tt DirShowHidden} ] Show hidden Unix files (the ones starting with a dot). \par \item[ {\tt DirShowOwner} ] Show the owner of the file. \par \item[ {\tt DirShowGroup} ] Show the group of the file. \par \item[ {\tt DirShowMode} ] Show the permissions of files. \par \item[ {\tt DirShowCase} ] Sort entries in a case-sensitive manner, i.e. all capital letters before lower-case letters. \par \end{DL} \par
\subsection{Filename Length} There is a minimum and maximum width for the filename field. Entries longer than the maximum value will be truncated. Default values are 15 and 25, and they can be changed with these directives: \begin{DL}{allow this much space} \item[ {\tt DirShowMinLength } {\it num\/} ] At least this number of characters is always reserved for filenames. If the longest filename in the directory is longer than {\it num\/} the field will be extended, but no more than the maximum limit (see next directive).\par \item[ {\tt DirShowMaxLength } {\it num\/} ] Filenames longer than {\it num\/} will be truncated to fit in length.
\par \end{DL} \subsubsection{Example} The default values would be set by saying:
\begin{verbatim}
DirShowMinLength 15
DirShowMaxLength 25
\end{verbatim}
\par \section{ {} Icons In The Directory Listings} {\tt cern\_httpd} directory icons are used, if enabled, for both regular directory listings and FTP listings (when running as a proxy). \par \begin{itemize} \item {\tt AddIcon} - bind icon URL to a MIME Content-Type or Content-Encoding \item {\tt AddBlankIcon} - icon URL used in the heading of the listing to align it \item {\tt AddUnknownIcon} - icon URL for unknown file types \item {\tt AddDirIcon} - icon URL for directories \item {\tt AddParentIcon} - icon URL for parent directory \end{itemize} These directives are specified in the configuration file. \par \par \subsection{AddIcon Directive} The {\tt AddIcon} directive binds an icon to a MIME Content-Type or Content-Encoding:
\begin{verbatim}
AddIcon icon-url ALT-text template
\end{verbatim}
\begin{DL}{allow this much space} \item[ {\it icon-url\/} ] is the URL of the icon. \par \item[ {\it ALT-text\/} ] is the alternative text to use on character terminal browsers. \par \item[ {\it template\/} ] is either a Content-Type template or a Content-Encoding template. A Content-Type template must always contain a slash, whereas a Content-Encoding template never contains one. \par \end{DL} The following important remarks serve also as examples. \par \subsubsection{{} CERN httpd as a Normal HTTP Server} Understand that the {\it icon-url\/} is a virtual URL - one that will be translated through the rules. Therefore you must make sure that your configuration rules allow the icon URLs to be passed, e.g.:
\begin{verbatim}
AddIcon /icons/UNKNOWN.gif ??? */*
AddIcon /icons/TEXT.gif TXT text/*
AddIcon /icons/IMAGE.gif IMG image/*
AddIcon /icons/SOUND.gif AU audio/*
AddIcon /icons/MOVIE.gif MOV video/*
AddIcon /icons/PS.gif PS application/postscript
Pass /icons/* /absolute/icon/dir/*
...other rules...
\end{verbatim}
\subsubsection{{} CERN httpd as a Proxy} When using {\tt httpd} as a proxy the icon URL {\bf must be} an absolute URL pointing to your server; otherwise clients would translate it relative to the remote host. \par {\bf Furthermore,} you must have a mapping from this absolute URL to your local file system, e.g.:
\begin{verbatim}
AddIcon http://your.server/icons/UNKNOWN.gif ??? */*
AddIcon http://your.server/icons/TEXT.gif TXT text/*
AddIcon http://your.server/icons/IMAGE.gif IMG image/*
AddIcon http://your.server/icons/SOUND.gif AU audio/*
AddIcon http://your.server/icons/MOVIE.gif MOV video/*
AddIcon http://your.server/icons/PS.gif PS application/postscript
Pass http://your.server/icons/* /absolute/icon/dir/*
Pass /icons/* /absolute/icon/dir/*
Pass http:*
Pass ftp:*
Pass gopher:*
\end{verbatim}
{} Both the full and partial icon URLs are {\tt Pass}'ed because smart clients may be configured to connect to local servers directly, instead of through the proxy. In that case the proxy server (which is then just a normal HTTP server from the client's point of view) will be asked for {\tt /icons/...} instead of {\tt http://your.server/icons/...}. The proxy server has no way of knowing which will happen. \par \par \subsection{Icons in Gopher Listings} There are special internal (to {\tt httpd}) MIME content types that can be bound to icons for gopher listings (the names should be self-explanatory): \begin{itemize} \item {\tt application/x-gopher-index} \item {\tt application/x-gopher-cso} \item {\tt application/x-gopher-telnet} \item {\tt application/x-gopher-tn3270} \item {\tt application/x-gopher-duplicate} \end{itemize} \par \subsection{Special Icons} {\tt httpd} needs some special icons: \begin{DL}{allow this much space} \item[ {\tt AddBlankIcon} ] Icon URL used in the heading of the listing to align it. This is typically a blank icon, but may contain some nice image that you wish to have on top of all your listings.
The only criterion is that it must be the same size as the other icons. \par \item[ {\tt AddUnknownIcon} ] Icon URL used for unknown file types, i.e. files for which no other icon binding applies. If you have an exhaustive set of {\tt AddIcon} directives this directive is not needed. \par \item[ {\tt AddDirIcon} ] Icon URL for directories. \par \item[ {\tt AddParentIcon} ] Icon URL for parent directory. \par \end{DL} \subsubsection{Example For a Regular HTTP Server} {} Remember to {\tt Pass} the icon URLs! \par
\begin{verbatim}
AddBlankIcon /icons/BLANK.gif
AddUnknownIcon /icons/UNKNOWN.gif ???
AddDirIcon /icons/DIR.gif DIR
AddParentIcon /icons/PARENT.gif UP
Pass /icons/* /absolute/icon/dir/*
...other rules...
\end{verbatim}
\subsubsection{Example For a Proxy Server} {} Icon URLs {\bf must be absolute URLs}, you must have a mapping from the absolute form to the local form, and you must remember to {\tt Pass} them:
\begin{verbatim}
AddBlankIcon http://your.server/icons/BLANK.gif
AddUnknownIcon http://your.server/icons/UNKNOWN.gif ???
AddDirIcon http://your.server/icons/DIR.gif DIR
AddParentIcon http://your.server/icons/PARENT.gif UP
Pass http://your.server/icons/* /absolute/icon/dir/*
Pass /icons/* /absolute/icon/dir/*
Pass http:*
Pass ftp:*
Pass gopher:*
\end{verbatim}
\par \section{{} Logging Control In CERN httpd} {\tt cern\_httpd} logs all incoming requests to an access log file. It also has an error log where internal server errors are logged. \begin{itemize} \item {\tt AccessLog} - Set access log file name \item {\tt ErrorLog} - Set error log file name \item {\tt LogFormat} - Set access log file format \item {\tt LogTime} - Set time zone for log files \item {\tt NoLog} - No log entries for listed hosts/domains \item {\tt CacheAccessLog} - Log cache accesses to a different log file \end{itemize} \par \subsection{Access Log File} The access log file contains a log of all requests.
The name of the log file is specified either by the {\tt -l }{\it logfile\/} command line option, or with the {\tt AccessLog} directive:
\begin{verbatim}
AccessLog /absolute/path/logfile
\end{verbatim}
\par \subsection{Error Log File} The error log contains a log of errors that may prove useful when working out why something doesn't work. The error log file name is set by the {\tt ErrorLog} directive:
\begin{verbatim}
ErrorLog /absolute/path/errorlog
\end{verbatim}
If the error log file is not specified, it defaults to the access log file name with an {\tt .error} extension. If the access log file name already has an extension, {\tt .error} will replace it. \par \par \subsection{Log File Format} Previously every server had its own logfile format, which made it difficult to write general statistics collectors. Therefore there is now a {\em common logfile format\/} (which will eventually become the default). Currently it is enabled by
\begin{verbatim}
LogFormat Common
\end{verbatim}
The old CERN {\tt httpd} format can be used by
\begin{verbatim}
LogFormat Old
\end{verbatim}
\par \subsection{Log Time Format} Times in the log file are by default in local time. That can be changed to GMT with the {\tt LogTime} directive:
\begin{verbatim}
LogTime GMT
\end{verbatim}
Default is:
\begin{verbatim}
LogTime LocalTime
\end{verbatim}
\par \subsection{Suppressing Log Entries For Certain Hosts/Domains} It's not always necessary to collect log information about accesses made by local hosts. The {\tt NoLog} directive can be used to prevent a log entry from being made for hosts matching a given IP number or host name template:
\begin{verbatim}
NoLog template
\end{verbatim}
\subsubsection{Examples}
\begin{verbatim}
NoLog 128.141.*.*
NoLog *.cern.ch
NoLog *.ch *.fr *.it
\end{verbatim}
\par \section{{} Timeout Settings} Something may go wrong with the connection to the client, causing {\tt httpd} to hang indefinitely doing nothing. This can be avoided by setting timeouts on different tasks that the server performs.
All of these timeouts have reasonable default values and usually don't need to be changed. \par Times for these directives are given in forms such as:
\begin{verbatim}
45 secs
10 mins
2 mins 30 secs
1 hour
\end{verbatim}
\par \subsection{InputTimeOut} The {\tt InputTimeOut} directive specifies the time to wait for the client to send the request (the MIME-header part of it, not the message body). Default value is:
\begin{verbatim}
InputTimeOut 2 mins
\end{verbatim}
\par \subsection{OutputTimeOut} The {\tt OutputTimeOut} directive specifies the time to allow for sending the response. Default value is:
\begin{verbatim}
OutputTimeOut 20 mins
\end{verbatim}
If you are serving huge files to clients behind slow connections, you may want to increase this value if you hear of connections being cut in the middle of a transfer. \par \par \subsection{ScriptTimeOut} The {\tt ScriptTimeOut} directive specifies the time to allow for server scripts to finish. If a script doesn't return in the time specified, {\tt httpd} will send {\tt TERM} and {\tt KILL} signals to it (with 5 seconds in between to let scripts do cleanup upon exit). Default value is:
\begin{verbatim}
ScriptTimeOut 5 mins
\end{verbatim}
\par \section{{} Proxy Caching} When {\tt cern\_httpd} is run as a proxy it can cache the documents retrieved from remote hosts to make further requests faster.
\par \begin{itemize} \item {\tt Caching} - Turn caching on \item {\tt CacheRoot} - Set cache root directory for a proxy server \item {\tt CacheSize} - Specify cache size (in megabytes) \item {\tt NoCaching} - No caching for URLs matching a given mask \item {\tt CacheOnly} - Cache only if URL matches a given set of URLs \item {\tt CacheClean} - Remove everything older than this (in days) \item {\tt CacheUnused} - Remove if has been unused this long (in days) \item {\tt CacheDefaultExpiry} - Default expiry time if not given by remote server (in days) \item {\tt CacheLastModifiedFactor} - Factor used in approximating expiry date \item {\tt CacheTimeMargin} - Time accuracy between hosts \item {\tt CacheNoConnect} - Standalone cache mode - no external document retrievals \item {\tt CacheExpiryCheck} - Turn off expiry checking for standalone operation \item {\tt Gc} - Enable and disable garbage collection \item {\tt GcDailyGc} - Time for daily garbage collection \item {\tt GcTimeInterval} - Interval to do cache garbage collection (in hours) \item {\tt GcReqInterval} - Number of requests between garbage collections \item {\tt GcMemUsage} - Garbage collector memory usage directive \item {\tt CacheLimit\_1} - First cache file size limit (kilobytes) \item {\tt CacheLimit\_2} - Second cache file size limit (kilobytes) \item {\tt CacheLockTimeOut} - Break cache locks after this timeout \item {\tt CacheAccessLog} - Log cache accesses to a different log file \end{itemize} \par \subsection{Turning Caching On and Off} Caching is normally turned on implicitly by specifying the cache root directory, but it can be turned on and off explicitly with the {\tt Caching} directive:
\begin{verbatim}
Caching On
\end{verbatim}
\par \subsection{Setting Cache Directory} Caching is enabled on a server running as a gateway (proxy) by the {\tt CacheRoot} directive, which sets the absolute path of the cache directory:
\begin{verbatim}
CacheRoot /absolute/cache/directory
\end{verbatim}
\par
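The interaction between the two directives above can be sketched as follows. This is a hedged illustration based only on the descriptions in this section, not a tested configuration: the directory path is a placeholder, and it is an assumption that an explicit {\tt Caching Off} takes effect regardless of where it appears relative to {\tt CacheRoot}.
\begin{verbatim}
# Cache directory stays configured, but caching is
# switched off explicitly (for example, temporarily)
CacheRoot /absolute/cache/directory
Caching Off
\end{verbatim}
Changing the last line to {\tt Caching On}, or removing it, should bring the cache back into use, since specifying {\tt CacheRoot} turns caching on implicitly. \par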
\subsection{Cache Size} The {\tt CacheSize} directive sets the maximum cache size in megabytes. The default value is 5MB, but a considerably larger cache, such as 50-100MB, gives the best results. The cache may, however, temporarily grow a few megabytes bigger than specified. \subsubsection{Example}
\begin{verbatim}
CacheSize 20 M
\end{verbatim}
sets cache size to 20 megabytes. \par \par \subsection{NoCaching} URLs matching a template given by a {\tt NoCaching} directive will never be cached, e.g.:
\begin{verbatim}
NoCaching http://really.useless.site/*
\end{verbatim}
From version 3.0 on, templates can have any number of wildcard characters {\tt *}. \par \par \subsection{CacheOnly} Only the URLs matching templates given by {\tt CacheOnly} directives will be cached, e.g.:
\begin{verbatim}
CacheOnly http://really.important.site/*
\end{verbatim}
From version 3.0 on, templates can have any number of wildcard characters {\tt *}. \par \par \subsection{Maximum Time to Keep Cache Files} All cached documents that match a specified template and are older than the time given by the {\tt CacheClean} directive will be removed. This value overrides the expiry date: no file can be stored longer than this value specifies, regardless of its expiry date. \subsubsection{Examples}
\begin{verbatim}
CacheClean http:* 1 month
CacheClean ftp:* 14 days
CacheClean gopher:* 5 days 12 hours
\end{verbatim}
\par \subsection{Maximum Time to Keep Unused Files} Cache files that match a template and have been unused longer than the time given by the {\tt CacheUnused} directive will be removed. \subsubsection{Examples}
\begin{verbatim}
CacheUnused * 4 days 12 hours
CacheUnused http://info.cern.ch/* 7 days
CacheUnused ftp://some.server/* 14 days
\end{verbatim}
Note that the last matching specification takes precedence; therefore HTTP files from {\tt info.cern.ch} will be kept 7 days, and {\bf not} 4.5 days.
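\par The precedence rule above means that the order of {\tt CacheUnused} lines matters. The following sketch reorders the example to show the likely pitfall; it assumes precedence works exactly as just described (the last matching line wins), and has not been verified against a running server.
\begin{verbatim}
# Pitfall: the catch-all template comes last, so it also
# matches info.cern.ch URLs and takes precedence
CacheUnused http://info.cern.ch/* 7 days
CacheUnused * 4 days 12 hours
\end{verbatim}
With this ordering, files from {\tt info.cern.ch} would be removed after 4.5 days of disuse rather than 7. Listing general templates first and specific ones last, as in the examples above, avoids this.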
\par \par \subsection{Default Expiry Time} Files for which the server gave neither an {\tt Expires:} nor a {\tt Last-Modified:} header will be kept at most the time specified by the {\tt CacheDefaultExpiry} directive. Default values are zero for HTTP (script replies shouldn't be cached), and 1 day for FTP and Gopher. \par \subsubsection{Example}
\begin{verbatim}
CacheDefaultExpiry ftp:* 1 month
CacheDefaultExpiry gopher:* 10 days
\end{verbatim}
{} Setting a default expiry for HTTP will almost always cause problems because there are currently many scripts that don't give an expiry date, yet their output expires immediately. Therefore, it is better to keep the default value for {\tt http:} at zero. \par \par \subsection{CacheLastModifiedFactor} Currently HTTP servers usually give only the {\tt Last-Modified} time, but not the {\tt Expires} time. {\tt Last-Modified} can often be used successfully to approximate the expiry date. {\tt CacheLastModifiedFactor} gives the fraction of the time since last modification for which the document is assumed to remain up to date. \par The default value is {\tt 0.1}, which means that e.g. a file modified 20 days ago will expire in 2 days. \par \subsubsection{Examples}
\begin{verbatim}
CacheLastModifiedFactor 0.2
\end{verbatim}
would cause files modified 5 months ago to expire after one month. \par This feature can be turned off by specifying:
\begin{verbatim}
CacheLastModifiedFactor Off
\end{verbatim}
\par \subsection{CacheTimeMargin} Sometimes inaccurate times on other hosts cause confusion in caching. It often also makes sense not to cache documents that will expire in a couple of minutes anyway. {\tt CacheTimeMargin} defines this time margin, by default:
\begin{verbatim}
CacheTimeMargin 2 mins
\end{verbatim}
No document expiring in less than two minutes will be written to disk. \par \par \subsection{CacheNoConnect} This directive puts the proxy into standalone cache mode, i.e.
only the documents found in the cache are returned, and requests for ones not in the cache return an error rather than causing a connection to the outside world. This is useful for demos and in other cases where there is no network connection:
\begin{verbatim}
CacheNoConnect On
\end{verbatim}
Default setting is naturally {\tt Off}. \par This directive is typically used with expiry checking also turned {\tt Off}. \par \par \subsection{CacheExpiryCheck} If (for demos etc.) it's desired that the proxy always returns documents from the cache, even if they have expired, {\tt CacheExpiryCheck} can be turned off:
\begin{verbatim}
CacheExpiryCheck Off
\end{verbatim}
Default setting is {\tt On}, meaning that the proxy never returns an expired document. \par This is usually used in standalone cache mode ({\tt CacheNoConnect} directive turned {\tt On}). \par \par \subsection{Garbage Collection} When caching is enabled garbage collection is also activated by default. This can be explicitly turned off with the {\tt Gc} directive:
\begin{verbatim}
Gc Off
\end{verbatim}
\par \subsection{When to Do Garbage Collection} Garbage collection is launched right away when the cache size limit is reached. However, to keep the cache smaller it might be desirable to remove expired files even if there is still cache space remaining. It is possible to launch garbage collection at a certain time, usually outside the busy hours:
\begin{verbatim}
GcDailyGc time
\end{verbatim}
\par {\tt GcDailyGc} specifies the time to do daily garbage collection, normally during the night. Default value is 3:00. Daily garbage collection can be disabled by specifying {\tt Off}. \par \subsubsection{Example} Default value would be specified as:
\begin{verbatim}
GcDailyGc 3:00
\end{verbatim}
Another example, turning daily gc off:
\begin{verbatim}
GcDailyGc Off
\end{verbatim}
\par \subsection{Memory Usage of Garbage Collector} The garbage collector performs its job best if it can read information about the whole cache into memory at once.
This is not possible if the machine doesn't have enough main memory. \par The {\tt GcMemUsage} directive advises the garbage collector about how much memory to use. You can think of the value as roughly the number of kilobytes to use for gc data, but actual usage may vary greatly according to dynamic things, like the directory structure of cached files. \par Default is 500; if gc fails because memory runs out, make this smaller. If your machine has so much memory that it just can't run out, make this very big. \par \subsubsection{Example}
\begin{verbatim}
GcMemUsage 100
\end{verbatim}
if you have very little memory. \par \par \subsection{Cache File Sizes} There are two limits controlling the size factor of a file when its value is being calculated. {\tt CacheLimit\_1} sets the lower limit; all files smaller than this have an equal size factor. {\tt CacheLimit\_2} sets the upper limit; files bigger than this get an extremely bad size factor (meaning they get removed right away because they are too big). \par Sizes are specified in kilobytes, and the default values are 200K and 4MB, respectively. \subsubsection{Examples}
\begin{verbatim}
CacheLimit_1 200 K
CacheLimit_2 4000 K
\end{verbatim}
would set the same values as the defaults, 200K and 4MB. \par \par \subsection{Cache Lock Timeout} During retrieval cache files are locked. If something goes wrong a lock file may be left hanging. The {\tt CacheLockTimeOut} directive sets the amount of time after which a lock can be broken. Time is specified like all the other times in the configuration file, and the default value is 20 minutes, the same as the default {\tt OutputTimeOut}. {\bf CacheLockTimeOut should never be less than OutputTimeOut!} \subsubsection{Example}
\begin{verbatim}
CacheLockTimeOut 30 mins
\end{verbatim}
would set lock timeout to half an hour. \par \par \subsection{CacheAccessLog} Cache accesses can be logged to a different log file instead of the normal access log.
The {\tt CacheAccessLog} directive takes an absolute pathname of the cache access log file:
\begin{verbatim}
CacheAccessLog /absolute/path/file.log
\end{verbatim}
\par \section{{} Configuring Proxy To Connect To Another Proxy} If an (inner) proxy {\tt cern\_httpd} needs to connect to the outside world via another (outer) proxy server, you can make the inner proxy use the outer one with the same environment variables that are used to redirect clients to a proxy: \begin{itemize} \item {\tt http\_proxy} \item {\tt ftp\_proxy} \item {\tt gopher\_proxy} \item {\tt wais\_proxy} \end{itemize} E.g. your (inner) proxy server's startup script could look like this:
\begin{verbatim}
#!/bin/sh
http_proxy=http://outer.proxy.server:8082/
export http_proxy
/usr/etc/httpd -r /etc/inner-proxy.conf -p 8081
\end{verbatim}
This is a little ugly, so there are also the following directives in the configuration file: \begin{itemize} \item {\tt http\_proxy } {\it http://outer.proxy.server/\/} \item {\tt ftp\_proxy } {\it http://outer.proxy.server/\/} \item {\tt gopher\_proxy } {\it http://outer.proxy.server/\/} \item {\tt wais\_proxy } {\it http://outer.proxy.server/\/} \end{itemize} \par \subsection{no\_proxy} In the same way that clients can specify a set of domains for which the proxy should not be consulted, {\tt httpd} has a {\tt no\_proxy} configuration directive to tell it that it should not connect to another proxy for certain URLs:
\begin{verbatim}
no_proxy cern.ch,ncsa.uiuc.edu,some.host:8080
\end{verbatim}
{} The argument string is a comma-separated list and should {\bf not contain spaces!} \par \par \chapter{{} Configuration File Examples} \begin{DL}{allow this much space} \item[ {\tt httpd.conf} ] sample configuration file for running as a normal HTTP server. \item[ {\tt prot.conf} ] sample configuration file for running as a normal HTTP server with access control.
\item[ {\tt proxy.conf} ] sample configuration file for running as a proxy {\bf without caching.} \item[ {\tt caching.conf} ] sample configuration file for running as a proxy {\bf with caching.} \end{DL} \par \par \section{Normal HTTP Server Configuration}
\begin{verbatim}
#
# Sample configuration file for cern_httpd for running it
# as a normal HTTP server.
#
# See:
#
#
# for more information.
#
# Written by:
# Ari Luotonen April 1994
#

#
# Set this to point to the directory where you unpacked this
# distribution, or wherever you want httpd to have its "home"
#
ServerRoot /where/ever/server_root

#
# The default port for HTTP is 80; if you are not root you have
# to use a port above 1024; good defaults are 8000, 8001, 8080
#
Port 80

#
# General setup; on some systems, like HP, nobody is defined so
# that setuid() fails; in those cases use a different user id.
#
UserId nobody
GroupId nogroup

#
# Logging; if you want logging uncomment these lines and specify
# locations for your access and error logs
#
# AccessLog /where/ever/httpd-log
# ErrorLog /where/ever/httpd-errors
LogFormat Common
LogTime LocalTime

#
# User-supported directories under ~/public_html
#
UserDir public_html

#
# Scripts; URLs starting with /cgi-bin/ will be understood as
# script calls in the directory /your/script/directory
#
Exec /cgi-bin/* /your/script/directory/*

#
# URL translation rules; If your documents are under /local/Web
# then this single rule does the job:
#
Pass /* /local/Web/*
\end{verbatim}
\section{Normal HTTP Server With Access Control}
\begin{verbatim}
#
# Sample configuration file for cern_httpd for running it
# as a normal HTTP server WITH access control.
#
# See:
#
#
# for more information.
#
# Written by:
# Ari Luotonen April 1994
#

#
# Set this to point to the directory where you unpacked this
# distribution, or wherever you want httpd to have its "home"
#
ServerRoot /where/ever/server_root

#
# The default port for HTTP is 80; if you are not root you have
# to use a port above 1024; good defaults are 8000, 8001, 8080
#
Port 80

#
# General setup; on some systems, like HP, nobody is defined so
# that setuid() fails; in those cases use a different user id.
#
UserId nobody
GroupId nogroup

#
# Logging; if you want logging uncomment these lines and specify
# locations for your access and error logs
#
# AccessLog /where/ever/httpd-log
# ErrorLog /where/ever/httpd-errors
LogFormat Common
LogTime LocalTime

#
# User-supported directories under ~/public_html
#
UserDir public_html

#
# Protection setup by usernames; specify groups in the group
# file [if you need groups]; create and maintain password file
# with the htadm program
#
Protection PROT-SETUP-USERS {
    UserId nobody
    GroupId nogroup
    ServerId YourServersFancyName
    AuthType Basic
    PasswdFile /where/ever/passwd
    GroupFile /where/ever/group
    GET-Mask user, user, group, group, user
}

#
# Protection setup by hosts; you can use both domain name
# templates and IP number templates
#
Protection PROT-SETUP-HOSTS {
    UserId nobody
    GroupId nogroup
    ServerId YourServersFancyName
    AuthType Basic
    PasswdFile /where/ever/passwd
    GroupFile /where/ever/group
    GET-Mask @(*.cern.ch, 128.141.*.*, *.ncsa.uiuc.edu)
}

Protect /very/secret/URL/* PROT-SETUP-USERS
Protect /another/secret/URL/* PROT-SETUP-HOSTS

#
# Scripts; URLs starting with /cgi-bin/ will be understood as
# script calls in the directory /your/script/directory
#
Exec /cgi-bin/* /your/script/directory/*

#
# URL translation rules; If your documents are under /local/Web
# then this single rule does the job:
#
Pass /* /local/Web/*
\end{verbatim}
\section{Proxy Configuration With Caching} The configuration {\bf without caching} is otherwise the same; just leave out all the
directives starting with "{\tt Cache}" or "{\tt Gc}".
\begin{verbatim}
#
# Sample configuration file for cern_httpd for running it
# as a proxy server WITH caching.
#
# See:
#
#
# for more information.
#
# Written by:
# Ari Luotonen April 1994
#

#
# Set this to point to the directory where you unpacked this
# distribution, or wherever you want httpd to have its "home"
#
ServerRoot /where/ever/server_root

#
# Set the port for proxy to listen to
#
Port 8080

#
# General setup; on some systems, like HP, nobody is defined so
# that setuid() fails; in those cases use a different user id.
#
UserId nobody
GroupId nogroup

#
# Logging; if you want logging uncomment these lines and specify
# locations for your access and error logs
#
# AccessLog /where/ever/proxy-log
# ErrorLog /where/ever/proxy-errors
LogFormat Common
LogTime LocalTime

#
# Proxy protections; if you want only certain domains to use
# your proxy, uncomment these lines and specify the Mask
# with hostname templates or IP number templates:
#
# Protection PROXY-PROT {
#     ServerId YourProxyName
#     Mask @(*.cern.ch, 128.141.*.*, *.ncsa.uiuc.edu)
# }
# Protect * PROXY-PROT

#
# Pass the URLs that this proxy is willing to forward.
#
Pass http:*
Pass ftp:*
Pass gopher:*
Pass wais:*

#
# Enable caching, specify cache root directory, and cache size
# in megabytes
#
Caching On
CacheRoot /your/cache/root/dir
CacheSize 5

#
# Specify absolute maximum for caching time
#
CacheClean * 2 months

#
# Specify the maximum time to be unused
#
CacheUnused http:* 2 weeks
CacheUnused ftp:* 1 week
CacheUnused gopher:* 1 week

#
# Specify default expiry times for ftp and gopher;
# NEVER specify it for HTTP, otherwise documents generated by
# scripts get cached which is usually a bad thing.
#
CacheDefaultExpiry ftp:* 10 days
CacheDefaultExpiry gopher:* 2 days

#
# Garbage collection controls; daily garbage collection at 3am;
#
Gc On
GcDailyGc 3:00
\end{verbatim}
\chapter{{} CERN Server CGI/1.1 Script Support} Server scripts are used to handle searches, clickable images and forms, and to produce synthesized documents on the fly. See calendar and finger gateway for examples. \par \par \section{In This Section...} \begin{itemize} \item Using {\tt Exec} rule to allow scripts \item CGI Interface $--$ Script Input \item CGI Interface $--$ Script Output \item NPH-Scripts $--$ No Parsing of Headers \item Setting up a search script \end{itemize} \par \section{{} Important Note!} CERN {\tt httpd} versions 2.15 and newer have {\bf two} script interfaces. One is the official CGI, Common Gateway Interface, which enables scripts to be shared between different server implementations (NCSA server, Plexus, etc). The other is the original, very easy-to-use interface that was introduced in version 2.13. \par {\bf Use of CGI instead of the old interface is strongly encouraged.}\par {\bf IMPORTANT:} If you have, or wish to write, scripts that use the old interface, your script name has to end in the {\tt .pp} suffix (which comes from "Pre-Parsed"). URLs referring to these scripts should not contain this suffix. This makes it easier to upgrade to CGI scripts later: you only need to change the script name in the file system, and not the documents pointing to it. If you absolutely want to use the old interface (which is nice for quick hacks that don't need to be portable), see the doc. \par \par \section{Setting Up httpd To Call Scripts} The server knows that a request is actually a script request by looking at the beginning of the URL pathname.
You can specify these special strings in the configuration file {\tt (/etc/httpd.conf)} by {\tt Exec} rules:
\begin{verbatim}
Exec /url-prefix/* /physical-path/*
\end{verbatim}
Here {\it /url-prefix/\/} is the special string that signifies a script request, and {\it /physical-path/\/} is the absolute filesystem pathname of the {\bf directory} that contains your scripts. \par \subsection{Example}
\begin{verbatim}
Exec /htbin/* /usr/etc/cgi-bin/*
\end{verbatim}
maps URL paths starting with {\tt /htbin} to scripts in the directory {\tt /usr/etc/cgi-bin}. I.e. requesting
\begin{verbatim}
/htbin/myscript
\end{verbatim}
causes a call to script
\begin{verbatim}
/usr/etc/cgi-bin/myscript
\end{verbatim}
\subsection{Historical Note} In {\tt httpd} versions before 2.15 there was an {\tt HTBin} directive:
\begin{verbatim}
HTBin /physical-path
\end{verbatim}
which is now obsolete, but understood by the server to mean
\begin{verbatim}
Exec /htbin/* /physical-path/*
\end{verbatim}
Use of the {\tt Exec} rule instead is recommended for its generality. \par \par \section{Information Passed to CGI Scripts} CGI scripts get their input mainly from environment variables and standard input (when using the {\tt POST} method). Search scripts also get keywords as command line arguments. \par The most important environment variables are: \begin{DL}{allow this much space} \item[{\tt QUERY\_STRING}] The query part of the URL, that is, everything that follows the question mark. This string is URL-encoded, meaning that special characters like spaces and newlines are encoded into their hex notation (\%xx), and characters like {\tt + = \&} have a special meaning. The contents of this variable can be easily parsed using the {\tt cgiparse} program.
\par \item[{\tt PATH\_INFO}] Extra path information given after the script name, for example with {\tt Exec} rule:
\begin{verbatim}
Exec /htbin/* /usr/etc/cgi-bin/*
\end{verbatim}
a URL with path
\begin{verbatim}
/htbin/myscript/extra/pathinfo
\end{verbatim}
will execute the script {\tt /usr/etc/cgi-bin/myscript} with the {\tt PATH\_INFO} environment variable set to {\tt /extra/pathinfo}. \par \item[{\tt PATH\_TRANSLATED}] Extra pathinfo translated through the rule system. (This doesn't always make sense.) \par \end{DL} See also NCSA's primer to writing CGI scripts. \par \par \section{Results From Scripts} Scripts return their results either by outputting a document to their standard output, or by outputting the location of the result document (either a full URL or a local virtual path). \par \subsection{Outputting a Document} The script result must begin with a {\tt Content-Type:} line giving the document content type, followed by {\bf an empty line}. The actual document follows the empty line. Example:
\begin{verbatim}
Content-Type: text/html

<HEAD>
<TITLE>Script test</TITLE>
</HEAD>
<BODY>
<H1>My First Virtual Document</H1>
....
</BODY>
\end{verbatim}
\par \subsection{Giving Document Location} If the script wants to return an existing document (local or remote), it can give a {\tt Location:} header followed by an empty line. Example:
\begin{verbatim}
Location: http://info.cern.ch/hypertext/WWW/TheProject.html

\end{verbatim}
This causes the server to send a redirection to the client, which then retrieves that document. If {\tt Location} starts with a slash (is not a full URL), it is taken to be a virtual path for a document on the same machine, and the server passes this string right away through the rule system and serves that document as if it had been requested in the first place. In this case clients don't do the redirection, but the server does it "on the fly".
\par Example:
\begin{verbatim}
Location: /hypertext/WWW/TheProject.html

\end{verbatim}
Understand that this is a {\bf virtual path}, so after translations it might be, for example, {\tt /Public/Web/TheProject.html}. \par {\bf Important:} Only {\bf full} URLs in the {\tt Location} field can contain the {\it \#label\/} part of a URL, because that is meant only for the client side, and the server cannot possibly handle it in any way. \par \par \subsection{NPH-Scripts (No-Parse-Headers)} A script wishing to output the entire HTTP reply (including the status line and all response headers) should have a name beginning with the {\tt nph-} prefix. This makes {\tt httpd} connect the script's output stream directly to the requesting client, avoiding the overhead of the server needlessly parsing the response headers. \par \subsubsection{Example Of NPH-Script Output}
\begin{verbatim}
HTTP/1.0 200 Script results follow
Server: MyScript/1.0 via CERN/3.0
Content-Type: text/html

<HEAD>
<TITLE>Just testing...</TITLE>
</HEAD>
<BODY>
<H1>Output From NPH-Script</H1>
Yep, seems to work.
</BODY>
\end{verbatim}
\par \section{Setting Up A Search Script} There is a special {\tt Search} directive in the configuration file giving the {\bf absolute} pathname of the script performing the search:
\begin{verbatim}
Search /absolute/path/search
\end{verbatim}
Every time a document is searched, this script is called with \begin{DL}{allow this much space} \item[Command line] containing the search keywords decoded, one in each of {\tt argv\lbrack 1\rbrack }, {\tt argv\lbrack 2\rbrack }, ... \item[{\tt QUERY\_STRING}] containing the query string encoded, as it came in the URL after the question mark. \item[{\tt PATH\_INFO}] Virtual path of the document that the search was issued from. \item[{\tt PATH\_TRANSLATED}] Absolute filesystem path of the document. \end{DL} Search results are output in the usual way:
\begin{verbatim}
Content-Type: text/html

...generated document...
\end{verbatim}
\par \chapter{{} cgiparse Manual} {\tt cgiparse} handles {\tt QUERY\_STRING} environment variable parsing for CGI scripts. It comes with CERN server distributions {\bf 2.15} and newer. \par If the {\tt QUERY\_STRING} environment variable is not set, it reads {\tt CONTENT\_LENGTH} characters from its standard input. \par \par \section{Command Line Options} \subsection{Main Options} \begin{DL}{allow this much space} \item[ {\tt cgiparse -keywords}] Parse {\tt QUERY\_STRING} as search keywords. Keywords are decoded and written to standard output, one per line. \par \item[ {\tt cgiparse -form}] Parse {\tt QUERY\_STRING} as a form request. Outputs a string which, when {\tt eval}'ed by the Bourne shell, sets shell variables whose names are {\tt FORM\_} followed by the field name. Field values are the contents of the variables. \par \item[ {\tt cgiparse -value } {\it fieldname\/}] Parse {\tt QUERY\_STRING} as a form request. Prints only the value of field {\it fieldname\/}.
\par \item[ {\tt cgiparse -read}] Just read {\tt CONTENT\_LENGTH} characters from {\tt stdin} and write them to {\tt stdout.} \par \item[ {\tt cgiparse -init}] If {\tt QUERY\_STRING} is not defined, read {\tt stdin} and output a string that when {\tt eval}'d by Bourne shell it will set {\tt QUERY\_STRING} to its correct value. This can be used when the same script is used with both {\tt GET} and {\tt POST} method. Typical use in the beginning of Bourne shell script: \begin{verbatim} eval `cgiparse -init` \end{verbatim} After this command the {\tt QUERY\_STRING} environment variable will be set regardless of whether {\tt GET} or {\tt POST} method was used. Therefore {\tt cgiparse} may be called multiple times in the same script (otherwise with {\tt POST} it could only be called once because after that the {\tt stdin} would be already read, and the next {\tt cgiparse} would hang). \par \end{DL} \par \subsection{Modifier Options} \begin{DL}{allow this much space} \item[ {\tt -sep } {\it separator\/}] Specify the string used to separate multiple values. With \begin{itemize} \item {\tt -value} default is newline \item {\tt -form} default is "{\it , \/}" \end{itemize} \par \item[ {\tt -prefix } {\it prefix\/}] \begin{itemize} \item Only with {\tt -form.} Specify the prefix to use when making up environment variable names. Default is "{\it FORM\_\/}". \par \end{itemize} \item[ {\tt -count}] With \begin{itemize} \item {\tt -keywords} outputs the number of keywords \item {\tt -form} outputs the number of unique fields (multiple values are counted as one) \item {\tt -value } {\it fieldname\/} gives the number of values of field {\it fieldname\/} (no such field is zero, one field gives 1, one multiple 2, etc). \end{itemize} \par \item[ {\tt -}{\it number\/} , e.g. 
{\tt -2}] With \begin{itemize} \item {\tt -keywords} gives {\it n\/}'th keyword \item {\tt -form} gives all the values of {\it n\/}'th field \item {\tt -value } {\it fieldname\/} gives {\it n\/}'th of the multiple values of field {\it fieldname\/} (first value is number 1). \end{itemize} \par \item[ {\tt -quiet}] Suppress all error messages. (Non-zero exit status still indicates error.) \par \end{DL} All options have one-character equivalents: {\tt -k -f -v -r -i -s -p -c -q} \par \par \section{Exit Statuses} \begin{itemize} \item {\tt 0 } Success \item {\tt 1 } Illegal command line \item {\tt 2 } Environment variables not set correctly \item {\tt 3 } Failed to get requested information (no such field, {\tt QUERY\_STRING} contains keywords when form field values requested, etc). \end{itemize} \par \section{Examples} Note: In real life, of course, {\tt QUERY\_STRING} is already set by the server. \par Here {\tt \$} is the Bourne shell prompt. \par \par \subsection{Keyword Search}
\begin{verbatim}
$ QUERY_STRING="is+2%2B2+really+four%3F"
$ export QUERY_STRING
$ cgiparse -keywords
is
2+2
really
four?
$
\end{verbatim}
\par \subsection{Parsing All Form Fields}
\begin{verbatim}
$ QUERY_STRING="name1=value1&name2=Second+value%3F+That%27s+right%21"
$ export QUERY_STRING
$ cgiparse -form
FORM_name1='value1'; FORM_name2='Second value? That'\''s right!'
$ eval `cgiparse -form`
$ set
...
FORM_name1=value1
FORM_name2=Second value? That's right!
...
$
\end{verbatim}
\par \subsection{Extracting Only One Field Value}
\begin{verbatim}
QUERY_STRING as in previous example.
$ cgiparse -value name1
value1
$ cgiparse -value name2
Second value? That's right!
$
\end{verbatim}
\par \chapter{{} cgiutils Manual} The {\tt cgiutils} program makes it easier for NPH \lbrack No-Parse-Headers\rbrack scripts to produce a full HTTP1 response header.
It can also be used to just calculate the {\tt Expires:} header, given the time to live in a human-friendly way, like
\begin{verbatim}
1 year 3 months 2 weeks 4 days 12 hours 30 mins 15 secs
\end{verbatim}
\section{Command Line Options} \begin{DL}{allow this much space} \item[ {\tt cgiutils -version} ] print the version information. \par \item[ {\tt -nodate} ] don't produce the {\tt Date:} header. \par \item[ {\tt -noel} ] don't print the empty line after headers \lbrack in case you want to output other MIME headers yourself after the initial header lines\rbrack . \par \item[ {\tt -status } {\it nnn\/} ] give full HTTP1 response, instead of just a set of HTTP headers, with HTTP status code {\it nnn\/}. \par \item[ {\tt -reason } {\it explanation\/} ] specify the reason line for HTTP1 response \lbrack can only be used with the {\tt -status } {\it nnn\/} option\rbrack . \par \item[ {\tt -ct } {\it type/subtype\/} ] specify the MIME content-type. \par \item[ {\tt -ce } {\it encoding\/} ] specify the content-encoding \lbrack e.g. {\tt x-compress}, {\tt x-gzip}\rbrack . \par \item[ {\tt -dl } {\it language-code\/} ] specify the content-language code. \par \item[ {\tt -length } {\it nnn\/} ] specify the MIME content-length value. \par \item[ {\tt -expires} {\it time-spec\/} ] specify the time to live, like {\tt "2 days 12 hours"}, and {\tt cgiutils} will compute the {\tt Expires:} field value \lbrack which is the actual expiry date and time in GMT and in the format specified by the HTTP spec\rbrack . \par \item[ {\tt -expires now} ] means immediate expiry. Often this is exactly what the scripts should output. \par \item[ {\tt -uri } {\it URI\/} ] specify the {\it URI\/} for the returned document. \par \item[ {\tt -extra } {\it xxx: yyy\/} ] specify an extra header which cannot otherwise be specified for {\tt cgiutils}.
\par \end{DL} {} Make sure that you quote the option arguments that are more than one word: \begin{verbatim} cgiutils -expires "2 days 12 hours 30 mins" \end{verbatim} \section{Examples} \begin{verbatim} cgiutils -status 200 -reason "Virtual doc follows" -expires now ==> HTTP/1.0 200 Virtual doc follows MIME-Version: 1.0 Server: CERN/2.17beta Date: Tuesday, 05-Apr-94 03:43:46 GMT Expires: Tuesday, 05-Apr-94 03:43:46 GMT \end{verbatim} {} There is an empty line after the output to mark the end of the MIME header section; if you don't want this \lbrack you want to output some more headers yourself\rbrack , specify the {\tt -noel} (NO-Empty-Line) option. \par Note also that {\tt cgiutils} gives automatically the {\tt Server:} header because it is available in the CGI environment. The {\tt Date:} field is also automatically generated unless {\tt -nodate} option is specified. \par To get only the expires field don't specify the {\tt -status} option. If you don't want the empty line after the header line use also the {\tt -noel} option: \begin{verbatim} cgiutils -noel -expires "2 days" ==> Expires: Thursday, 07-Apr-94 03:44:02 GMT \end{verbatim} \par \chapter{ {} CERN Server Clickable Image Support} CERN Server versions 2.14 and newer have a {\tt htimage} program in the distribution, which is an {\tt /htbin} program handling clicks on sensitive images. For versions 2.15 and newer it is a CGI program (uses the Common Gateway Interface to communicate with {\tt httpd}). See demo. \par \par \section{In This Section...} \begin{itemize} \item {\tt htimage} installation \item Writing documents that contain clickable images \item Image configuration file \item Output of {\tt htimage} \end{itemize} \par \section{Installing htimage Binary} After compiling {\tt htimage} you should move the executable binary to the same directory as your other server scripts are, and remember to set up an exec rule. 
For example if your scripts are in {\tt /usr/etc/cgi-bin}, you could have an {\tt Exec} rule like this: \begin{verbatim} Exec /htbin/* /usr/etc/cgi-bin/* \end{verbatim} Often {\tt htimage} is one of the most often used scripts, and it would therefore be nice to refer to it with as short a name as possible, like {\tt /img}, so you could have a {\tt Map} rule just before the {\tt Exec}: \begin{verbatim} Map /img/* /htbin/htimage/* Exec /htbin/* /usr/etc/cgi-bin/* \end{verbatim} \par \section{Writing a Document With Clickable Images} To create a clickable image in your HTML document, you'll need to: \begin{itemize} \item specify {\tt ISMAP} in your inlined image call, and \item make that image an anchor, with an {\tt HREF} to the script handling the request {\tt (htimage)} with image configuration file name appended to it. \end{itemize} Each clickable image has to be described to {\tt htimage} via an image configuration file. These files are referred to by the extra path information in the URL causing the call to {\tt htimage}: \begin{verbatim} \end{verbatim} Image configuration file can be: \begin{itemize} \item either a virtual path, that is translated through rule system, \item or an absolute path in your filesystem. \end{itemize} {\tt htimage} will look for both of these (afterall, it gets both {\tt PATH\_INFO} and {\tt PATH\_TRANSLATED} environment variables from {\tt httpd} anyway). \par You can even do some very smart mappings in the rule file to allow very short references to {\tt htimage} and picture configuration files. Let's suppose all your image configuration files are in directory {\tt /usr/etc/images}. 
Then you can use the following two rules in your server's configuration file (by default {\tt /etc/httpd.conf}): \begin{verbatim} Map /img/* /htbin/htimage/usr/etc/images/* Exec /htbin/* /usr/etc/cgi-bin/* \end{verbatim} In this case you can refer to your image mapper very easily; if you have an image configuration file {\tt Dragons.conf} in {\tt /usr/etc/images} directory, all you need to say in the anchor is this: \begin{verbatim} \end{verbatim} \par \section{Image Configuration File} There are four keywords: \begin{DL}{allow this much space} \item[{\tt default} {\em URL\/}] {\em URL\/} which is used if click is in none of the given shapes. This should always be set! \par \item[{\tt circle} ({\em x\/},{\em y\/}) {\em r\/} {\em URL\/}] Circle with center point {\em (x,y)\/} and radius {\em r\/}. \par \item[{\tt rectangle} ({\em x1\/},{\em y1\/}) ({\em x2\/},{\em y2\/}) {\em URL\/}] Rectangle with (any) two opposite corners having coordinates {\em (x1,y1)\/} and {\em (x2,y2)\/}. \par \item[{\tt polygon} ({\em x1\/},{\em y1\/}) ({\em x2\/},{\em y2\/}) ... ({\em xn\/},{\em yn\/}) {\em URL\/}] Polygon having adjacent vertices {\em (xi,yi)\/}. If the path given is not closed (first and last coordinate pairs aren't the same) the first and last coordinate pairs will be connected by {\tt htimage.} So first point is added also as the last one if necessary. \par \end{DL} These can be abbreviated as {\tt def, circ, rect, poly.} \par Shapes are checked in the order they appear in config file, and the URL corresponding to the first match is returned. If none match, the {\tt default} URL is returned. 
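\par The matching order described above can be sketched in Python. This is only an illustration, not the {\tt htimage} source: the function names, the in-memory list of shapes, the sample URLs, and the even-odd ray-casting test used for polygons are all assumptions made for the example.

```python
"""Sketch of first-match shape lookup for a clickable image (not htimage itself)."""


def in_circle(px, py, cx, cy, r):
    # Inside (or on) the circle when the squared distance is within r squared.
    return (px - cx) ** 2 + (py - cy) ** 2 <= r ** 2


def in_rectangle(px, py, x1, y1, x2, y2):
    # Any two opposite corners may be given, so normalise with min/max.
    return min(x1, x2) <= px <= max(x1, x2) and min(y1, y2) <= py <= max(y1, y2)


def in_polygon(px, py, points):
    # Even-odd ray casting (an assumed algorithm). The edge from the last
    # vertex back to the first closes an open path implicitly.
    inside = False
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def lookup(shapes, default_url, px, py):
    # shapes: list of (kind, args, url) in config-file order; first match wins.
    for kind, args, url in shapes:
        if kind == "circle" and in_circle(px, py, *args):
            return url
        if kind == "rectangle" and in_rectangle(px, py, *args):
            return url
        if kind == "polygon" and in_polygon(px, py, args):
            return url
    return default_url


# Hypothetical configuration, corresponding to:
#   default   /none.html
#   circle    (50,50) 10              /circle.html
#   rectangle (0,0) (100,100)         /rect.html
#   polygon   (200,0) (300,0) (250,100) /poly.html
SHAPES = [
    ("circle", (50, 50, 10), "/circle.html"),
    ("rectangle", (0, 0, 100, 100), "/rect.html"),
    ("polygon", [(200, 0), (300, 0), (250, 100)], "/poly.html"),
]
```

A click at (52,52) lies inside both the circle and the rectangle; because the circle is listed first, {\tt lookup(SHAPES, "/none.html", 52, 52)} selects the circle's URL. Listing small shapes before the larger shapes that contain them is therefore the natural ordering.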
\par {\em URL\/}s are \begin{itemize} \item either full URLs (with access method, machine name and path), in which case server sends a redirection to client, \item or a partial URL containing only pathname part of it (always starting with a slash), in which case server considers that as the original request, translates it through the rule system, access authorization and serves it normally (faster than sending redirection). \end{itemize} \par \section{Output Produced by htimage} {\tt htimage} prints a single {\tt Location:} field to its {\tt stdout}, or an error message with preceding {\tt Content-Type: text/html} so in fact {\tt htimage} behaves exactly as any other CGI/1.0 program (script), and is not in any way handled specially by the server. Therefore, you can rename {\tt htimage} to whatever you prefer, like we called it {\tt /img} in the above example. \par Server understands this {\tt Location:} field, and either directly sends that file to the client (non-full URL), or sends a redirection to client causing it to fetch the document, maybe even from another machine. \par Note that URLs returned by {\tt htimage} may well be other script requests - there is no reason for being limited to just regular documents. \par \par \chapter{{} Protected CERN Server Setup} Access can be restricted according to user name, internet address, or both. Access control can be tree-level, file level, or both.\par \par \section{In This Section...} \begin{itemize} \item Password File \item Group File \item Protect Directive in Configuration File \item Protection Setup File \item Protecting a Tree of Documents \item Protecting Individual Files \item Using Two-Level Protection \item Embedding the Protection Setup in the Configuration File Itself \item Access Control List File \end{itemize} \par \section{Password File} If user-wise access control is used there has to be a password file listing all the users and their encrypted passwords. 
The password file can be maintained with the {\tt htadm} program, which is a part of the CERN {\tt httpd} distribution. \par {} Unix password files are understood by the CERN daemon (but not vice versa). However, {\bf Unix users are in no way connected to the WWW access authorization.} \par \par \section{Group File} The group file contains declarations of groups containing users and other groups, possibly with an IP address template. Group declarations as viewed from top-level look like this:
\begin{verbatim}
groupname: item, item, item
\end{verbatim}
The list of items is called a group definition. Each {\tt item} can be a username, an already-defined groupname, or a comma-separated list of user and group names in parentheses. Any of these can be followed by an at sign {\tt @} followed by either a single IP address template, or a comma-separated list of IP address templates in parentheses. The following are valid group declarations:
\begin{verbatim}
authors: john, james
trusted: authors, jim
cern_people: @128.141.*.*
hackers: marca@141.142.*.*, sanders@153.39.*.*,
         (luotonen, timbl, hallam)@128.141.*.*,
         cailliau@(128.141.201.162, 128.141.248.119)
cern_hackers: hackers@128.141.*.*
\end{verbatim}
If an item contains only the IP address template part, all users from those addresses are accepted (e.g. {\tt cern\_people} above). Note the last two declarations: the {\tt cern\_hackers} group is made up of the {\tt hackers} group by restricting it further according to IP address.\par A group definition can be continued on the next line after any comma in the definition. Forward references in the group file are illegal (i.e. a group name cannot be used before it is defined).\par Group definition syntax is valid not only in the group file, but also in \begin{itemize} \item {\tt GetMask} in protection setup file, and \item in last field in ACL entries.
\end{itemize} \par \par \section{Server Configuration File} Typically you protect a tree of documents with a {\tt protect} rule in the rule file, and specify authorized persons and IP addresses in the protection setup file or access control list file:
\begin{verbatim}
Protect /very/secret/* /WWW/httpd.setup
\end{verbatim}
If Unix file system protections are set up so that there is no world read-permission, the daemon naturally has to run as the owner of those files or as a member of their group.\par However, if there are protected trees owned by different people this doesn't work. In that case {\em the daemon has to run as {\tt root}, and the user and group ids have to be specified in the {\tt protect} rule,\/} e.g.:
\begin{verbatim}
Protect /kevin/secret/* /WWW/httpd.setup1 kevin.www
Protect /marcus/secret/* /WWW/httpd.setup2 marcus.nogroup
\end{verbatim}
\par \section{Protection Setup File} Each {\tt protect} rule has an associated protection setup file. It specifies valid authentication schemes, password and group files, and password server-id:
\begin{verbatim}
AuthType Basic
ServerId OurCollaboration
PasswordFile /WWW/Admin/passwd
GroupFile /WWW/Admin/group
\end{verbatim}
The password server id need not be a real machine name. Its only purpose is to inform the browser about which password file is in use (different protection setups on the same machine can use different password files, and that would otherwise confuse pseudo-intelligent clients trying to figure out which password to send).\par {} Identical server-ids on different machines are considered different by clients (otherwise this would be a security hole).\par \par \subsection{Protecting Entire Tree As One Entity} If you want to control access only to entire trees of documents and don't care to restrict access differently to individual files, it suffices to give a {\tt GetMask} in the setup file (and you don't need any ACL files):
\begin{verbatim}
GetMask group, user, group@address, ...
\end{verbatim} Group definition has the same syntax as in group file.\par \par \subsection{Protecting Individual Files Differently} When each individual file needs to be protected separately you should use an ACL (access control list) file in the same directory as the protected files. After that no file in that directory can be accessed unless there is a specific entry in ACL allowing it.\par In this case you don't need the {\tt GetMask} in setup file.\par \par \subsection{Restricting Access Even Further} There may be both {\tt GetMask} {\em and\/} an ACL, in which case both conditions must be met. This is typically used so that {\tt GetMask} defines a general group of people allowed to access the tree, and ACLs restrict access even further.\par \par \section{Protection Setup Embedded in the Configuration File} Often it is not necessary to have the protection information in a different file; as a new feature {\tt cern\_httpd} allows protection setup to be "embedded" inside the configuration file itself. \par Instead of writing the setup in a different file and referring to it by the filename, you can use the {\tt Protection} directive to define the protection setup and bind it to a name, and later refer to this setup via that name. \par The previous example could be written into the main configuration as follows: \begin{verbatim} Protection PROT-NAME { UserId marcus GroupId nogroup AuthType Basic ServerId OurCollaboration PasswordFile /WWW/Admin/passwd GroupFile /WWW/Admin/group GetMask group, user, group@address, ... } Protect /private/URL/* PROT-NAME Protect /another/private/* PROT-NAME \end{verbatim} {} Note that since the protection setup is in the same file as the other configuration directives, it is also possible to specify the {\tt UserId} and {\tt GroupId} for the server to run as, without it being a security hole. 
With an external protection setup file this is not allowed, for security reasons; that is why there is an extra field after the protection setup filename specifying the user and group ids in that case:
\begin{verbatim}
Protect /kevin/secret/* /WWW/httpd.setup1 kevin.www
Protect /marcus/secret/* /WWW/httpd.setup2 marcus.nogroup
\end{verbatim}
If you need a given protection setup only once there is no need to first bind it to a name and then refer to it by that name; just combine the two:
\begin{verbatim}
Protect /private/URL/* {
    UserId marcus
    GroupId nogroup
    AuthType Basic
    ServerId OurCollaboration
    PasswordFile /WWW/Admin/passwd
    GroupFile /WWW/Admin/group
    GetMask group, user, group@address, ...
}
\end{verbatim}
{} {\tt httpd} is not very robust in parsing this particular directive; make sure you have a space between the URL template and the curly brace, and that the ending curly brace is alone on its line. Also, comments are {\bf not} allowed inside the protection setup definition. \par \par \section{Access Control List File} The ACL file is a file named {\tt .www\_acl} in the same directory as the files whose access it controls. It typically looks something like this:
\begin{verbatim}
secret*.html : GET,POST : trusted_people
minutes*.html: GET,POST : secretaries
*.html : GET : willy,kenny
\end{verbatim}
It is worth noticing that all the templates are matched against (unlike in the rule file, where translation of rules stops at {\tt pass} and {\tt fail}).
So in the previous example all the HTML files are accessible to {\tt willy} and {\tt kenny}, even those matching the two previous templates.\par The last field is just a list of users and groups (possibly at required IP addresses), and in fact this field has the same syntax as the group file.\par Once the {\tt PUT} method is implemented, it can appear in the middle field, separated by a comma from {\tt GET}:
\begin{verbatim}
*.html : GET,PUT : authors
\end{verbatim}
\par \par \section{{} Manual Page For htadm} The CERN {\tt httpd} password file can be maintained with the {\tt htadm} program, which is a part of the CERN {\tt httpd} distribution. \par \par \subsection{Command Line Options and Parameters} \begin{DL}{allow this much space} \item[ {\tt htadm -adduser } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password {\tt \lbrack }realname{\tt \rbrack \rbrack \rbrack }\/} ] adds a user into the password file (fails if there is already a user by that name).\par \item[ {\tt htadm -deluser } {\it passwordfile {\tt \lbrack }username{\tt \rbrack }\/} ] deletes a user from the password file (fails if there is no user by that name).\par \item[ {\tt htadm -passwd } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password{\tt \rbrack \rbrack }\/} ] changes user's password (fails if there is no such user).\par \item[ {\tt htadm -check } {\it passwordfile {\tt \lbrack }username {\tt \lbrack }password{\tt \rbrack \rbrack }\/} ] checks user's password (fails if there is no such user). Writes either {\tt Correct} or {\tt Incorrect} to standard output. Also indicates password correctness by a zero return value. \par \item[{\tt htadm -create } {\it passwordfile\/} ] creates an empty password file. \par \end{DL} If {\tt {\it password\/}} or even {\tt {\it username\/}} is missing in any of the previous cases, it is prompted for interactively. {\tt {\it passwordfile\/}} must always be specified.
A missing real name is also prompted for when adding a new user.\par \par {} Do NOT use {\tt htadm} to add new users to the actual Unix password file {\tt /etc/passwd}; entries written by {\tt htadm} lack some fields that Unix requires. \par {} Passwords should not be longer than 8 characters (this is a restriction from linemode clients using the C library function {\tt getpass()} to read the password $--$ there is no other cause for this restriction; the maximum hardcoded password size is actually much larger, and if you only use GUI or other clients that are able to read passwords this long, feel free to use them). \par {} {\tt htadm} destroys the password from the command line as soon as possible, so that it is very unlikely that somebody's password can be seen by looking at the process listing on the machine (with {\tt ps}, for example).\par \par \chapter{{} Proxies} A proxy is an HTTP server typically running on a firewall machine, providing access to the outside world for people inside the firewall. {\tt cern\_httpd} can be configured to run as a proxy. Furthermore, it is able to perform caching of documents, resulting in faster response times. \par I (Ari Luotonen, CERN) and Kevin Altis from Intel have written a joint paper about proxies which will be presented in the WWW94 Conference. \par \par \section{In This Section...} \begin{itemize} \item Server setup \item Proxy protection \item Configuring proxy to use another proxy \item Caching \item Client setup \end{itemize} \par \section{Setting Up cern\_httpd To Run as a Proxy} {\tt cern\_httpd} runs as a proxy if its configuration file allows URLs starting with the corresponding access method to be passed. A typical proxy configuration file reads:
\begin{verbatim}
pass http:*
pass ftp:*
pass gopher:*
pass wais:*
\end{verbatim}
{\bf Note} that {\tt cern\_httpd} is capable of running as a regular HTTP server at the same time; just add your normal rules after these.
\par {} The {\tt proxy\_xxx} environment variables that are used to redirect clients to use a proxy also affect the proxy server itself. If this is not your intention make sure that those variables are not set in {\tt httpd}'s environment. \par \par \section{Proxy Protection} {\tt cern\_httpd} 2.17 and newer provide a mechanism to protect the proxy against unauthorized use (in fact, the machinery behind this is the same that is used to set up document protection when running as a regular HTTP server). \par \subsection{Enabling and Disabling HTTP Methods} By default only {\tt HEAD}, {\tt GET} and {\tt POST} methods are allowed to go through the proxy. You can enable more methods using the {\tt Enable} directive in the configuration file: \begin{verbatim} Enable PUT Enable DELETE \end{verbatim} The {\tt Disable} directive disables methods: \begin{verbatim} Disable POST \end{verbatim} \subsection{Defining Allowed Hosts} A certain protection setup is defined to the proxy as a single entity that is given a name. Later, when protecting certain URLs this name is used to refer to the protection setup. (The name can also be the absolute pathname of the file that defines the protection, if one wishes to store protection information in a different file.) \par Protection is defined as follows: \begin{verbatim} Protection protname { Mask @(*.cern.ch, *.desy.de) } \end{verbatim} This defines a protection that allows all request methods from domains {\tt cern.ch} and {\tt desy.de}, and none from elsewhere. This protection can be referred to by {\it protname\/}. \par You can also use IP number templates: \begin{verbatim} Protection protname { Mask @(128.141.*.*, 131.169.*.*) } \end{verbatim} {\bf Note} that IP number templates always have four parts separated by dots. \par If allowed methods are different according to domain, e.g. 
{\tt GET} should be allowed from both of these domains, but {\tt POST} and {\tt PUT} only from {\tt cern.ch}, you can use {\tt GetMask}, {\tt PostMask}, {\tt PutMask} and {\tt DeleteMask} directives instead:
\begin{verbatim}
Protection protname {
    GetMask @(*.cern.ch, *.desy.de)
    PostMask @*.cern.ch
    PutMask @*.cern.ch
}
\end{verbatim}
{\bf Note} that parentheses are necessary only if there is more than one domain name template. \par \subsection{Actual Protection} The {\tt Protect} rule actually associates protection with a URL. In case of proxy protection you would typically say:
\begin{verbatim}
Protect http:* protname
Protect ftp:* protname
Protect gopher:* protname
Protect news:* protname
Protect wais:* protname
\end{verbatim}
which would restrict all proxy use to the allowed hosts defined previously in the protection setup {\it protname\/}. {\bf Note} that {\it protname\/} must be defined before it is referenced! \par \par \section{Caching} {\tt cern\_httpd} running as a proxy can also perform caching of files retrieved from remote hosts. See the configuration directives controlling this feature. \par \par \chapter{{} CERN Server FAQ} If you have problems, first make sure you're using the newest version. You can check this by looking in ftp://info.cern.ch/pub/www/src. \par When something goes wrong you should run the server in verbose mode (the {\tt -v } flag) to see exactly what the problem is. If you usually run it from the inet daemon, start it standalone on some other port (with the {\tt -p } {\it port\/} flag) with otherwise the same parameters as in {\tt /etc/inetd.conf}. \par \par \section{My Scripts Get Served As Text Files...} ...or are completely inaccessible. \par It's important to understand that rules in the configuration file ({\tt Map}, {\tt Pass}, {\tt Exec}, {\tt Fail}, {\tt Protect}, {\tt DefProt} and {\tt Redirect}) are translated from top to bottom, and the first matching {\tt Pass}, {\tt Exec} or {\tt Fail} will {\bf terminate} rule translation.
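\par The top-to-bottom translation just described can be sketched in Python. This is a simplified model and not the server's implementation: the rule tuples, the function names, the sample paths under {\tt /Public/Web}, and the restriction to templates with a single trailing {\tt *} are assumptions for illustration, and {\tt Protect}, {\tt DefProt} and {\tt Redirect} are left out.

```python
def match(template, path):
    # Assumed template form: a literal prefix followed by one trailing "*".
    # Returns the text matched by "*", or None when the path does not match.
    if template.endswith("*"):
        prefix = template[:-1]
        return path[len(prefix):] if path.startswith(prefix) else None
    return "" if path == template else None


def substitute(result, wildcard):
    # Put the matched wildcard text where the result template has "*".
    return result.replace("*", wildcard, 1)


def translate(rules, path):
    # rules: list of (directive, template, result) in configuration-file order.
    for directive, template, result in rules:
        wildcard = match(template, path)
        if wildcard is None:
            continue
        if directive == "Map":
            # Map rewrites the path and translation continues.
            path = substitute(result, wildcard)
        elif directive == "Pass":
            # Pass terminates translation; a missing result keeps the path.
            return ("serve", substitute(result, wildcard) if result else path)
        elif directive == "Exec":
            # Exec terminates translation and names the script to run.
            return ("exec", substitute(result, wildcard))
        elif directive == "Fail":
            return ("fail", path)
    # Modelled on the 2.15 behaviour: no terminating rule means Fail.
    return ("fail", path)


# Hypothetical rule files: the same rules in two different orders.
EXEC_FIRST = [
    ("Exec", "/htbin/*", "/usr/etc/cgi-bin/*"),
    ("Map", "/*", "/Public/Web/*"),
    ("Pass", "/Public/Web/*", None),
]
MAP_FIRST = [
    ("Map", "/*", "/Public/Web/*"),
    ("Exec", "/htbin/*", "/usr/etc/cgi-bin/*"),
    ("Pass", "/Public/Web/*", None),
]
```

With {\tt EXEC\_FIRST}, a request for {\tt /htbin/myscript} reaches the {\tt Exec} rule and the script is run. With {\tt MAP\_FIRST}, the general {\tt Map} rewrites the path before {\tt Exec} is tried, so {\tt Exec} no longer matches and the request is served as an ordinary file.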
\par So, make sure that your {\tt Exec} rule is before any general {\tt Map}pings. \par \par \section{How do I...} \begin{itemize} \item Set up access authorization? \item Write server-side scripts? \item Get the server to perform searches? \item Make clickable images? \item Handle forms? \item Set up a proxy \item Set up proxy caching \end{itemize} \par \section{Zombies} There used to be one zombie when running {\tt cern\_httpd} standalone; this was fixed in version 2.17beta. If you still see zombies (more than two that don't go away in a few minutes) it is a bug. \par \par \section{Inet daemon complains about looping...} ...and terminates WWW service. {\tt :-(} \par This is a hard-coded {\tt inetd} limitation on at least SunOS-4.1.* and NeXT, which limits maximum allowed connections from a given host to 40 per minute. This can be exceeded by scripts doing Web-roaming, or documents having masses of small inlined images. \par There is a fix for at least SunOS {\tt inetd} (100178-08), and in Solaris this is fixed. You can also run {\tt httpd} standalone (preferably with the {\tt -fork} command line option). \par {\bf Most importantly,} you should stop running {\tt httpd} from {\tt inetd} and rather run it standalone. This is because running from {\tt inetd} is inefficient. \par \par \section{Server looks at funny directories and finds nothing} From version 2.0 until 2.15, you need to have an explicit map to file system in your rule file, e.g.: \begin{verbatim} Map /* file:/* \end{verbatim} but 2.15 doesn't have this limitation anymore. \par \par \section{But the document says rule file is no longer needed} True, but it also says you must remember to give your Web directory as a parameter to {\tt httpd,} e.g. 
\begin{verbatim}
httpd /home/me/MyGloriousWeb
\end{verbatim}
\par \chapter{ {} CERN httpd 2.15 Release Notes} There is one single thing that needs to be done when changing over from {\tt httpd} 2.14 to 2.15:
\begin{verbatim}
Rename your old /htbin scripts to end in .pp suffix!
\end{verbatim}
\section{General Notes} \begin{itemize} \item Code tested under Purify $--$ all detected memory leaks and bugs fixed. \item Forking code enhanced $--$ no longer crashes when running standalone. Everybody should start running CERN httpd standalone instead of from inetd \item Documentation redesigned, but still under construction \item Contains Solaris port, but not VMS \end{itemize} \section{CGI/1.0, Common Gateway Interface} \begin{itemize} \item CGI/1.0 interface fully implemented \item {\bf Old CERN httpd scripts will continue working if you rename them to end with .pp suffix.} Links referencing these scripts do NOT need to be changed. (This feature does not add any overhead to CGI/1.0 script calls.) \item New product cgiparse for CGI/1.0 scripts to parse the QUERY\_STRING environment variable and to read CONTENT\_LENGTH characters from stdin \item {\tt htimage} upgraded to CGI/1.0 \item The whole server environment is propagated to the CGI script, except for variables that are reserved for CGI/1.0. \item Scripts are spawned by doing a fork() and exec() instead of system() $--$ more efficient and secure \end{itemize} \section{Firewall Gateway Modifications} \begin{itemize} \item Access authorization works through firewalls \item So does POST, therefore forms also \item -disable/-enable command line options and Disable/Enable configuration directives for disabling and enabling HTTP methods. GET, HEAD and POST are enabled by default.
\item Fix: text/html and text/plain not passed multiply to servers when running as gateway \item Fix: */*, image/* etc not expanded by the gateway \item Fix: try local search ONLY when accessing local files \end{itemize} \section{Other New Features} \begin{itemize} \item When started standalone in non-verbose mode automatically disconnects from terminal session and goes background \item User-supported directories enabling URLs starting with {\bf /\~{}username} \item Redirection \item Meta-information files to allow RFC-822-style headers to be appended to server response header section \item New, common logfile format, localtime default, {\tt GMT} as an option \item Ability to suppress logging for certain hosts/domains according to given hostname template or IP number mask, like {\tt *.cern.ch} or {\tt 128.141.*.*} \item -setuid option to set server uid to authenticated uid (local) \item Multilanguage support: same URL can be used to retrieve a document in different languages \item AddLanguage, AddEncoding and AddType directives to configuration file (AddType replaces Suffix) \item Better multiformat algorithm \item HostName directive to configuration file for servers that want to give CGI/1.0 scripts a different hostname than the actual one. Useful if the machine has many aliases, or if httpd fails to get the full domain name. \item Exec rule obsoleting HTBin directive $--$ now multiple script directories possible, with arbitrary mappings \item Get-Mask, Post-Mask and Put-Mask for protection setup files. Get-Mask obsoletes Mask-Group \item Groups All/Users and Anybody/Anyone/Anonymous automatically defined.
All means anybody that has been authenticated, and Anybody is just anybody \item Server: \item Last-Modified: \item Content-Length: \item Content-Language: \item Content-Encoding: \item Scripts can also output Uri: and Expires: headers (this will eventually be made more general) \item HEAD works, also with stupid scripts that also output the body \end{itemize} \section{Enhancements, Fixes} \begin{itemize} \item The final explicit Map to filesystem in configuration file no longer required, because it was causing confusion \item Assume Basic authentication scheme even if not explicitly mentioned in setup file \item Get client DNS hostname, for the logfile among other things \item Fail made the default when rules are translated to the end without coming across a Pass, Exec or Fail rule (this is to enhance security, it was too easy to forget the Fail * from the end of config file) \item Made config (rule) file understand different ways of writing keywords, e.g.: UserDir, userdir, User-Dir, user\_dir, UserDirectory and so on \item The eight misplaced server-side access authorization files moved away from libwww \item Fix: directory indexing works with a trailing slash \item Fix: HTSimplify() might have behaved unexpectedly on some systems (called strcpy() with overlapping args) \end{itemize} \par \chapter{ {} CERN httpd 2.16beta Release Notes} \begin{itemize} \item If you are upgrading from 2.15beta, you need to make {\bf no changes}. \item If you are upgrading from 2.14, only one thing needs to be done: \begin{verbatim} Rename your old /htbin scripts to end in .pp suffix!
\end{verbatim} \end{itemize} \section{Firewall Gateway (Proxy) Additions, Fixes} \begin{itemize} \item {\tt ftp} with binary files works \item {\tt x-compress} and {\tt x-gzip} work correctly over proxy \item Firewalling now works through an arbitrary number of proxies; {\tt http\_proxy, ftp\_proxy, gopher\_proxy} and {\tt wais\_proxy} configuration directives cause the proxy to connect to the outside world through another proxy. Environment variables with the same names have the same effect, but the config file is more user-friendly for this. \item Now sends all the headers sent by client. \item Proxy log file now gives byte count. \item Proxy log file now gives correct status code also on error. \end{itemize} \section{Firewall Gateway (Proxy) Caching} \begin{itemize} \item {\tt CacheRoot} directive specifies cache root directory, and turns on proxy caching. Cache root directory must be dedicated to {\tt httpd} - all files in there are subject to garbage collection. \item Cache size (in megabytes) is specified by {\tt CacheSize} directive; cache size should be several megabytes, 50-100MB should give good results. Cache may, however, temporarily grow a few megabytes bigger than specified. Also, space taken up by directories is not calculated in the current version. \item {\tt http, ftp, gopher} with {\tt GET } method get cached. \item However, the following are not cached: \begin{itemize} \item HTTP0 responses (you never know if it failed; also confused HTTP1 servers sometimes output garbage in front of HTTP1 headers). \item Protected documents (request had {\tt Authorization:} field). \item Queries - they too often have side effects. (POST should {\bf always} be used with forms, and all script responses should have {\tt Expires:} header when necessary. Until then, we don't cache them.) \end{itemize} \item Expiry date is extracted: \begin{itemize} \item From {\tt Expires:} header. \item If not present, {\tt Last-Modified:} is used to approximate the expiry date.
If a file hasn't changed in five months the chances are it won't change during the next week. On the other hand, if a file has changed yesterday, it will probably change again pretty soon. I know this is heuristic but until all the servers give {\tt Expires:} this works much better than not using it, so no flames about it. \item If {\tt Last-Modified:} is not given, use the time given by {\tt CacheDefaultExpiry} directive, default 7 days. \end{itemize} \item Format of cache files and directory structure under cache root is subject to change if necessary. No application should yet rely on any certain cache format. Eventually I can see clients accessing cache files directly, bypassing proxy server. \item Caching system understands both time formats, also the one output by old NCSA httpds. \item Cache files get locked during transfer. Lock files time out if something goes wrong. Timeout can be set by {\tt CacheLockTimeOut} directive (default 20 minutes). While the lock is in effect, further requests to the same file get retrieved from the remote host. \item Garbage collection directives: \begin{itemize} \item {\tt GcMemoryUsage} to advise gc about how radical to be in memory use (more memory =$>$ smarter gc). \item {\tt GcTimeInterval}, how often to do gc. \item {\tt GcReqInterval}, after how many requests to do gc. \item (gc is also automatically started if cache size limit is reached.) \item {\tt CacheLimit\_1}, size in KB until which files are equally valuable despite their size (200K). \item {\tt CacheLimit\_2}, size in KB after which files get discarded because they are too big (4MB). \item {\tt CacheClean}, remove all files older than this (default 21 days). \item {\tt CacheUnused}, remove all files that have not been used in this long time (default 14 days). \end{itemize} \item Garbage collector always removes all expired, too long unused, and too old files.
\item If cache size limit is reached some files need to be sacrificed; the current algorithm takes into account: \begin{itemize} \item Time remaining to unconditional removal; if it expires tomorrow it might as well be removed today. \item Time last accessed; if it hasn't been accessed in 5 days, it probably won't be accessed anymore before it expires. \item Size; huge files get removed more easily. \item Time it took to load it from the remote host; files that were time-consuming to transfer have much higher value. This compensates for the size factor. Load delay is the single most significant value. \item Time it has already been in cache; ancient files get removed more easily than fresh ones. \end{itemize} \end{itemize} \section{Other New Features} \begin{itemize} \item Error log file. \item {\tt Referer:} field ends up in error log when a request fails. \item {\tt UserId} and {\tt GroupId} to set default uid and gid (used instead of nobody and nogroup). \item Timeout for input and output; default time to wait for a request is 2 minutes, and to send response 20 minutes. Timeout causes a note to error log, and terminates child (no more hanging httpds). {\bf Note:} the one zombie is normal; don't report to me about it, I may do something about it some day, or maybe I won't. Zombie doesn't take up any other system resources except the one process table entry. \item Suffixes are no longer case-sensitive by default; this may be changed via the {\tt SuffixCaseSense} configuration directive. \item Lou Montulli's news and proxy diffs added to the library. \item Most command line options now also available as configuration directives: \begin{itemize} \item {\tt DirAccess} \item {\tt DirReadme} \item {\tt AccessLog} \item {\tt ErrorLog} \item {\tt LogFormat} \item {\tt LogTime} \end{itemize} \item {\tt -vv} command line option for Very Verbose trace output. Outputs also request headers as they came in. Otherwise like {\tt -v} flag.
\end{itemize} \section{Enhancements, Fixes} \begin{itemize} \item NPH-scripts now work from automatically backgrounded standalone server. \item Fixed the many problems with {\tt Content-Transfer-Encoding}: \begin{itemize} \item Mosaic uses {\tt Content-Encoding}, although spec says {\tt Content-Transfer-Encoding}; I now output both \item {\tt Content-Transfer-Encoding} sometimes didn't show up although it should have, fixed. \item {\tt Content-Transfer-Encoding} didn't come up correctly with ftp, fixed. \end{itemize} \item Strange escaping fixed with directory indexing (legal characters got escaped randomly by a gcc-compiled version). \item Timezone bug around midnight with the new logfile format fixed. (New logfile format is not yet default, use {\tt -newlog} command line option, or {\tt LogFormat} directive in configuration file.) \item Dashes for non-existent status codes and byte counts now show up correctly in the log. \item Forking code once again enhanced - fixed a possible hanging situation. \item Log time fixed to be the time of incoming request, not the time of request served. \item Zombies now correctly waited away on HP (this was in fact fixed already in 2.15beta binaries distributed after February 17th - {\bf note,} that this bug had no effect on any other platforms ). \item Directory listings no longer have {\tt Content-Length:} (because it was wrong). \item Now understands also the old Accept: syntax, with spaces as separators between actual content-type and its parameters. This will eventually be taken out. \item {\tt htadm} now uses the same file creation mask as in the original password file. 
\end{itemize} \par \chapter{ {} CERN httpd 2.17beta Release Notes} \section{General New Features} \begin{itemize} \item {\tt PUT} and {\tt POST} can be configured to be handled by external CGI scripts; {\tt PUT-Script} and {\tt POST-Script} directives \item BodyTimeOut for timing out scripts waiting for input that never comes from clients \item {\tt IdentityCheck} directive to turn on RFC931 remote login name checking \item {\tt REMOTE\_IDENT} for CGI giving remote login name; this was the only feature missing to be fully CGI/1.0 compliant \item CGI/1.1 upgrade: \begin{itemize} \item all the headers without a special meaning to CGI from CGI scripts get passed to the client \item Status: header to specify the HTTP status code and message for client when not using NPH scripts \item all HTTP request header lines which are not otherwise available to the scripts get passed as HTTP\_XXX\_YYY environment variables \end{itemize} \item Understands conditional {\tt GET} request with {\tt If-Modified-Since} header \item {\tt kill -HUP } causes {\tt httpd} to re-read its configuration file \item {\tt PidFile} directive for specifying the file to write the process id \lbrack makes it easy to send the {\tt HUP} signal\rbrack \item {\tt ServerRoot} directive to specify a "home directory" for {\tt httpd} \item Directory listings with icons; by default icons are in {\tt icons} subdirectory under {\tt ServerRoot} \item The precompiled binaries are distributed in a {\tt tar} packet that contains a set of default icons; the easiest way to configure the icons is to just set the {\tt ServerRoot} to point to the binary distribution directory \lbrack its name is {\tt cern\_httpd}\rbrack \item Welcome directive to specify the name of the overview page of the directory; default values are {\tt Welcome.html}, {\tt welcome.html} and, for compatibility with NCSA server, {\tt index.html}. Use of {\tt Welcome} directive will override all the defaults.
\item {\tt AlwaysWelcome} directive to configure whether {\tt /directory} and {\tt /directory/} are to be taken to mean the same thing, or whether only {\tt /directory/} should be mapped to the overview page and {\tt /directory} produce the directory listing. \item /\~{}user causes an automatic redirection to /\~{}user/ \item Now gives also the {\tt Date:} header. \item {\tt Port} directive to config file specifying the port number to listen to. \end{itemize} \section{Access Authorization Enhancements / Proxy Protections} \begin{itemize} \item Now also domain name templates, like *.cern.ch, can be used in specifying allowed hosts, not only IP number masks \item {\tt ACLOverRide} directive to allow ACLs to override the {\tt Mask}s set in the protection setup \lbrack without this feature ACLs cannot allow anything more than what the {\tt Mask}s allow, only restrict access further\rbrack . This directive disables {\tt Mask} checking if an ACL file is present. \item Since setting up protection seemed to be unnecessarily hard, it is now possible to give the protection setup in the main configuration file instead of having to use a different file; it is still ok to use a different file. \begin{itemize} \item {\tt Protection} directive defines a protection setup and associates a name with it:
\begin{verbatim}
Protection prot-name {
    AuthType   Basic
    ServerId   Test-Server
    PasswdFile /where/ever/passwd
    GroupFile  /where/ever/group
    UserId     someuser
    GroupId    somegroup
    GET-Mask   list, of, users, and, groups
    POST-Mask  list, of, users, and, groups
    PUT-Mask   list, of, users, and, groups
}
\end{verbatim}
The content between the curly braces is the same as what used to go in the protection setup file. What's new is the possibility to specify the {\tt UserId} and {\tt GroupId} for the child process when serving the request in protected mode.
This is not possible with external files for security reasons \lbrack it is not possible inside the external file, but it is possible if the ids are set when calling that file; see doc for more details\rbrack . \item A single {\tt Mask} directive for cases when {\tt GET-Mask}, {\tt POST-Mask} and {\tt PUT-Mask} are the same. \item In {\tt Protect} rule the {\it prot-name\/} is specified instead of the file name; what's more, {\tt Protect} can now be used to protect also proxied URLs:
\begin{verbatim}
Protect http:*   prot-name
Protect ftp:*    prot-name
Protect gopher:* prot-name
\end{verbatim}
\end{itemize} \end{itemize} \section{Enhancements, Fixes} \begin{itemize} \item Incorporated Ian Dunkin's $<$imd1707@ggr.co.uk$>$ SOCKS modifications (thank you, Ian!); read the {\tt README-SOCKS} file in the source code distribution for more information. \item {\tt SIGPIPE} causes a normal child to exit; proxy child will correctly stop writing to client socket but still writes to cache file \lbrack previously just kept on writing to the socket, too\rbrack \item 401, 402, 403, 404 errors don't go to error log anymore \item error log contains now the host name and request \item no longer sends {\tt Content-Transfer-Encoding}, we agreed upon using {\tt Content-Encoding} for compression \item fixed funny panic message from format module in verbose mode even though everything was ok \lbrack only aesthetic\rbrack \item now gives again "not authorized" rather than not found if trying to access a protected but nonexistent file; this way even filenames don't leak \item all time specifications in configuration file have more readable forms: \begin{verbatim} 1 year 2 months 3 weeks 2 days 5 days 20 hours 30 mins 2 secs 20:30 20:30:01 2 weeks 20:30 \end{verbatim} \item Case-sense bug with {\tt LogTime}, {\tt LogFormat}, {\tt DirAccess} and {\tt DirReadme} fixed; now parameters really are handled in a case-insensitive manner.
\end{itemize} \section{Proxy Additions, Fixes} \begin{itemize} \item Proxy protections, see above \item Made proxy do smart guesses about the content of an unknown file while retrieving from the remote; this will end the problems of some files not being transferred to WinMosaic or Lynx. {\bf IMPORTANT: Everybody, remove the rule \lbrack if you have it\rbrack }: \begin{verbatim} AddType *.* text/plain \end{verbatim} because it would disable this smart feature. \item Fixed a bug with unknown binary gopher files being truncated \item Fixed the bug with trailing slashes in ftp directory listings \item Fixed the bug with requests not being URL-encoded when forwarding the request \item Fixed a bug with filenames in directory listings not being URL-encoded \item Fixed the stupid "mail-us" situation that occurred in some cases when an ftp load failed \end{itemize} \section{Proxy Caching} \begin{itemize} \item Cache is refreshed using the conditional {\tt GET} method \lbrack use of {\tt If-Modified-Since} header\rbrack \item Standalone cache mode with {\tt CacheNoConnect} directive \lbrack causes an error rather than document fetch when the document is not in the cache\rbrack \item Possibility to disable garbage collection altogether \item Possibility to disable expiry checking \item Caching Off to explicitly turn off caching even if there are other caching directives specified \item {\tt -gc\_only} command line option to do garbage collection as a {\tt cron} job for sites that run {\tt httpd} as a proxy from {\tt inetd}. However, since {\tt httpd} now re-reads its configuration files when it receives a {\tt HUP} signal, standalone operation is now even easier, and {\tt inetd} should no longer be much more convenient. \item Host names are converted to all-lower-case to avoid doing multiple caching for a single site. \item Files expiring immediately never get written to the cache; not even part of it.
\item By default HTTP-retrieved documents without an {\tt Expires:} and {\tt Last-Modified:} field never get cached \lbrack because they are usually generated by scripts and should never be cached\rbrack ; therefore I strongly advise against the use of {\tt CacheDefaultExpiry} for HTTP. \item Caching control directives have changed to take a URL template as a first argument, and a more readable time format:
\begin{verbatim}
CacheDefaultExpiry ftp:*    2 weeks 4 days
CacheDefaultExpiry gopher:* 6 days
CacheUnused        http:*   1 month
CacheUnused        ftp:*    2 weeks
CacheUnused        gopher:* 1 week 5 days 2 hours 1 min 30 secs
\end{verbatim}
\item Made the expiry date approximation configurable; by default documents with {\tt Last-Modified:} but without {\tt Expires:} expire after 10\% of the time that they have been unmodified. {\tt CacheLastModifiedfactor} can be used to change this value, or turn this feature {\tt Off}. Default value is 0.1 \lbrack =10\%\rbrack . \item Understands yet another date format: \begin{verbatim} Thu, 10 Feb 1994 22:23:32 GMT \end{verbatim} This date format does {\bf not} conform to the spec, so use of it is discouraged! This is only to make the proxy more robust. \item {\tt NoCaching} directive to prevent certain URLs from being cached at all. \item Time margin to get rid of problems with machine clocks having inaccurate times and confusing caching. \item {\tt GcDailyGc} to specify a daily garbage collection time, by default 3:00. \lbrack Can be turned {\tt Off}, too.\rbrack \item Now possible to disable {\tt GcReqInterval} and {\tt GcTimeInterval} \lbrack by default disabled\rbrack . \item Expired cache lock files get removed also during gc. \item {\tt CacheAccessLog} to specify a different log file for cache accesses; also possible to make a separate log for each remote host.
\end{itemize} \section{cgiutils} A new product {\tt cgiutils} for producing HTTP1 replies from CGI scripts, and for easily generating the {\tt Expires:} header given the time to live, e.g. "2 weeks 4 hours 30 mins". \par \par \chapter{ {} CERN httpd 2.18beta Release Notes} \section{New Features} \begin{itemize} \item Long FTP directory listing with last modification dates and sizes \end{itemize} \section{Fixes} \begin{itemize} \item Fixed a bad bug with {\tt Port} directive $--$ server didn't fork but rather the parent process served which caused the service to eventually hang (this is the main reason for this release). \item {\tt CLIENT\_CONTROL} removed from SOCKS mods since {\tt httpd} has now native proxy protection support. \item No longer sometimes fails to create {\tt .gc\_info} file. \end{itemize} \par \chapter{{} CERN httpd 3.0 PreRelease Notes} \section{3.0 Prerelease 3} \begin{itemize} \item No longer strips hyphens from content-types and content-encodings that are given in the configuration file (broken in pre1). \item GMT-to-localtime transformation works now on all platforms in caching (was broken on others than Sun). \item Binary-FTP works again (broken in pre2). \item Unescaping bug fixed in news module (caused many articles to fail to be retrieved). \item News module now gives appropriate error responses for unavailable articles and non-existent news groups. \item FTP and HTTP modules now give better error responses. \item Fixed the cache access log to show the correct content-lengths. \end{itemize} \section{3.0 Prerelease 2} \begin{itemize} \item Respects UserId and GroupId directives again. \item FTP module no longer prints messages to stderr in non-verbose mode. \item \~{}username form understood with ServerRoot, Search, PutScript, PostScript, DeleteScript, AccessLog, ErrorLog, CacheAccessLog directives. \item Opens cache access log only if caching is turned on.
\item Binary distribution now contains a template configuration file that has all the configuration directives understood by httpd (thanks to Sean Gonzalez for it!). \end{itemize} \section{3.0 Prerelease 1} \begin{itemize} \item If-Modified-Since GET request now works correctly with proxy (client can do conditional GET/proxy can do conditional GET plus all the combinations of these). \item {\tt Pragma: no-cache } supported; by sending this header to the proxy the client will force it to refresh its cache from remote server. Pragma headers are also forwarded to the remote server. \item Server now resets its state correctly when it receives the HUP signal (directory listing icons used to stop working). \item {\tt -restart} option - {\tt httpd} will find out the actual server process number and send a HUP signal to it to make it reload its configuration files; note that {\tt httpd} must still have the same configuration file command line parameters ({\tt -r } options) as the actual server (so it finds out the ServerRoot and PidFile). \item Now makes appropriate entry to error log when restarting. \item Made common logfile format default, the old format can still be used with the {\tt LogFormat} directive: \begin{verbatim} LogFormat old \end{verbatim} \item Multiple wild-card (asterisk) matching in configuration file works; it is a bit different from typical regular expression matching in that the wildcard matches the {\em shortest\/} possible amount of characters instead of the longest matching string; this is the best choice in most cases. Consider: \begin{verbatim} Pass http://*/* /mirror/*/http/* \end{verbatim} Clearly the first asterisk should match only the hostname, and {\bf not} the entire path except the filename. \item Rules can now have asterisks and whitespace in them: precede them with a backslash; as a result also the backslash itself has to be escaped with another backslash.
\item The tilde character after a slash has to be explicitly matched: \begin{verbatim} Map /* /foo/bar/* \end{verbatim} does {\em not\/} match user-supported directories, but: \begin{verbatim} Map /~* /Webs/users/* \end{verbatim} does match them. \item Fixed the problem that user-supported directories could not be mapped or {\tt Protect}'ed. \item Hostname matching made case-insensitive in access control/caching \item Added suffixes {\tt .htm} and {\tt .htmls} to the default set of known suffixes. \item Fixed some of the mysterious caching problems (all that were reported to me and that I could reproduce). \item Made it possible to specify the various byte/kilo/mega sizes in cache configuration with letters after the number (so it's no longer necessary to remember if the default is kilobytes or megabytes):
\begin{verbatim}
CacheSize    150 M
CacheLimit_1 100 K
CacheLimit_2 2 M
\end{verbatim}
The numbers still have to be whole numbers. \item Content-Length given for {\em all\/} documents, including (non-nph-)script responses, generated directory listings, error responses, all the documents retrieved over another protocol by the proxy (FTP, Gopher, ...), including HTTP responses from servers that didn't give it originally. \item {\tt MaxContentLengthBuffer} directive to specify the maximum bytecount for the proxy to buffer in order to find out the content-length for the client - content-length is {\em always\/} calculated for the logs, but the user might interrupt the connection if nothing seems to be happening, even though it is the proxy that is just buffering the entire file in order to find out the content-length before actually sending it to the client. \item Caching module now checks that it receives the correct content-length; if not it discards the cached document.
This rules out the possibility of caching a truncated document from a timed out connection in 99.99\% of the cases (0.01\% comes from the fact that Plexus sends a timeout error message concatenated to the document, and if it so happens that this produces exactly the correct content-length then there is nothing that can be done about it; in practice this never happens). \item Made {\tt HEAD} work always, even on proxy with other protocols (FTP, Gopher...). \item PASV (Passive mode) in FTP now supported. It is no longer necessary to allow incoming connections above 1024 on the firewall host just to make FTP work. If PASV fails {\tt httpd} will retry PORT. \item Welcome messages from FTP servers get shown on top of the directory listings. \item Fixed bug with old FTP files getting the wrong date in the listing. \item Gopher listings now have icons. \item Proxy now reports unknown host errors appropriately. \item Fixed encoding-decoding problems with directory listings. \item Added {\tt ScriptTimeOut} - scripts that do not finish in this amount of time will be killed by {\tt httpd}. Default value is 5 minutes. \item A /\~{}username URL with an invalid username no longer causes an infinite redirection loop. \item The two files missing in FTP listings are no longer missing (they weren't in 2.18beta, either). \item Fixed a possible error condition that might cause the server to stop responding, or even die. \item Server now resets its UserId and GroupId even when in gc-only mode (this solves problems with {\tt .cache\_info} files sometimes being unwritable to actual caching processes). \item CacheAccessLog is now opened during startup while running as root to avoid opening problems. There is no longer logging to individual files according to remote hosts - all cache accesses are logged to this single file. \item {\tt CacheOnly} directive for specifying a set of URLs that should be cached (for cases when there are only a few sites that should be cached).
\item Added {\tt DELETE-Script} directive for specifying the CGI script to handle {\tt DELETE} method. \item {\tt NoProxy} directive to allow the proxy to do direct access to some servers instead of connecting to another proxy server (contains a list of domain names). This works exactly like the {\tt no\_proxy} environment variable on clients. (Thanks to Rainer Klute for the patch!) This is only necessary when running multiple proxy servers that connect to each other. \item Fixed a bug that sometimes caused time directives to be parsed incorrectly (e.g. {\tt CacheDefaultExpiry}). \item Multilanguage addition to allow server to understand e.g. that British English is also English, and that the US citizens do understand it (thanks to Toshihiro Takada for the patch!). \item Removed: \begin{itemize} \item {\tt GcReqInterval} and {\tt GcTimeInterval} - not very good criteria to start doing garbage collection ({\tt GcDailyGc} is better, giving the actual time to launch gc) \item cache access logging to individual logfiles according to remote host (wasted resources - a separate program is better for collecting this information from a single log file). \item {\tt -a} and {\tt -R} options (never used). \item {\tt BodyTimeOut} replaced by {\tt ScriptTimeOut} \item {\tt include}s from Makefiles (not supported by all the {\tt make}s). \item {\tt \#elif} preprocessor directive (wasn't supported by all the HP preprocessors) \end{itemize} \end{itemize} \par \end{document}