LPI Linux Certification/Implementing A Web Server

= 208.1 Implementing a Web server =

Detailed Objectives (208.1)
(LPIC-2 Version 4.5)

Weight: 4

Description: Candidates should be able to install and configure a web server. This objective includes monitoring the server's load and performance, restricting client user access, configuring support for scripting languages as modules and setting up client user authentication. Also included is configuring server options to restrict usage of resources. Candidates should be able to configure a web server to use virtual hosts and customize file access.

Key Knowledge Areas:
 * Apache 2.4 configuration files, terms and utilities.
 * Apache log files configuration and content.
 * Access restriction methods and files.
 * mod_perl and PHP configuration.
 * Client user authentication files and utilities.
 * Configuration of maximum requests, minimum and maximum servers and clients.
 * Apache 2.4 virtual host implementation (with and without dedicated IP addresses).
 * Using redirect statements in Apache’s configuration files to customize file access.

Terms and Utilities:
 * access logs and error logs

Overview
Apache is the most used web server on the Internet, and the "poster child" for successful open source development. While a web server itself doesn't need to be particularly fancy (many programming languages have tutorials on how to write an HTTP server), Apache's "secret of success" is its flexibility and robustness. Apache can be easily extended by various modules; mod_php and mod_perl will be featured in this section.

Installation and Configuration
The Apache HTTP server in its most recent version (2.2 as of writing this) can be downloaded as source code from the Apache HTTP Server Website, or as a precompiled binary package from the repository of your favorite Linux distribution.

For the rest of this section we will refer to the Apache documentation for file names. This documentation is usually installed along with the Apache binary. If we cannot reach the local documentation, there is still the official documentation on the Apache Website. We will use a virtual network with Slackware 13.0 inside VirtualBox, which is free (as in cost) and available Free (as in Freedom) with small restrictions. Distribution specific summaries for Debian Lenny and CentOS 5.4, a clone of Red Hat Enterprise Linux, follow below.

If we want to compile Apache from source, we use the usual ./configure, make, make install steps. For further details please refer to the documentation page.

The web server binary httpd itself is usually located in /usr/sbin. We can use the binary directly to start and stop the web server through command line options, but a better idea is to use the control script apachectl to interface with httpd. apachectl can control the web server process (start and stop) in a convenient way; it also sets up the environment and checks the configuration file in the background. Back in the days of the transition from Apache 1.3 to the Apache 2.x series the control script was called apache2ctl to tell it apart from the Apache 1.3 script (then apachectl).

It is unfortunate that the LPI still refers to apache2ctl while the Apache source code produces apachectl.

 [root@lpislack ~]# apachectl
 Usage: /usr/sbin/httpd [-D name] [-d directory] [-f file]
                        [-C "directive"] [-c "directive"]
                        [-k start|restart|graceful|graceful-stop|stop]
                        [-v] [-V] [-h] [-l] [-L] [-t] [-S]
 Options:
   -D name            : define a name for use in <IfDefine name> directives
   -d directory       : specify an alternate initial ServerRoot
   -f file            : specify an alternate ServerConfigFile
   -C "directive"     : process directive before reading config files
   -c "directive"     : process directive after reading config files
   -e level           : show startup errors of level (see LogLevel)
   -E file            : log startup errors to file
   -v                 : show version number
   -V                 : show compile settings
   -h                 : list available command line options (this page)
   -l                 : list compiled in modules
   -L                 : list available configuration directives
   -t -D DUMP_VHOSTS  : show parsed settings (currently only vhost settings)
   -S                 : a synonym for -t -D DUMP_VHOSTS
   -t -D DUMP_MODULES : show all loaded modules
   -M                 : a synonym for -t -D DUMP_MODULES
   -t                 : run syntax check for config files

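Hmmm. This does not look like the output of a control script, and there is a reason: if apachectl encounters parameters it does not understand, it passes them directly to httpd. An empty parameter list is such a case, so apachectl invokes httpd without any parameter, and httpd prints its own usage message.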

apachectl can start, stop and restart the web server, but even more useful are graceful and graceful-stop, which restart or stop the web server without aborting currently open connections. configtest does the same as httpd -t: it tests the Apache configuration file. The options status and fullstatus need the mod_status module to display a lot of useful status information about our HTTP server.
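The two options combine naturally: test the configuration first and reload only if the test passes. The following is a minimal sketch, not part of the original setup. The function name safe_reload is hypothetical, and the control script name is a parameter because it differs between distributions.

```shell
#!/bin/sh
# Reload Apache only if the configuration passes the syntax check.
# safe_reload is a hypothetical helper; the control script defaults to
# apachectl and can be overridden (for example with apache2ctl).
safe_reload() {
    ctl=${1:-apachectl}
    if "$ctl" configtest; then
        # graceful lets open connections finish before workers restart
        "$ctl" graceful
    else
        echo "configuration test failed, server left untouched" >&2
        return 1
    fi
}
```

Called as root without arguments, safe_reload runs apachectl configtest followed by apachectl graceful. A broken configuration file then never reaches the running server.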

The logs for your Apache instance go to /var/log/httpd. The two most important log files are access_log, which logs every access to the web server, and error_log, which only records errors. Tools like Awstats and Webalizer use the access_log to generate their reports.

A snippet of access.log (taken from the Debian Lenny machine) shows the IP 192.168.10.21 accessing / on the website, which is the “welcome” page of the web server (more on this later), then requesting /favicon.ico and /login.html, which both result in a “404”, which means “Not Found”.
 192.168.10.21 - - [02/Jun/2009:17:06:01 -0400] "GET / HTTP/1.1" 200 56 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10"
 192.168.10.21 - - [02/Jun/2009:17:17:12 -0400] "GET /favicon.ico HTTP/1.1" 404 300 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10"
 192.168.10.21 - - [05/Jun/2009:16:41:39 -0400] "GET / HTTP/1.1" 200 56 "-" "Mozilla/5.0 (compatible; Konqueror/3.5; Linux 2.6.27.7-smp) KHTML/3.5.10 (like Gecko)"
 192.168.10.21 - - [05/Jun/2009:16:41:39 -0400] "GET /favicon.ico HTTP/1.1" 404 300 "-" "Mozilla/5.0 (compatible; Konqueror/3.5; Linux 2.6.27.7-smp) KHTML/3.5.10 (like Gecko)"
 192.168.10.21 - - [05/Jun/2009:16:41:50 -0400] "GET /login.html HTTP/1.1" 404 299 "-" "Mozilla/5.0 (compatible; Konqueror/3.5; Linux 2.6.27.7-smp) KHTML/3.5.10 (like Gecko)"
This snippet from error.log shows the same errors but in greater detail:
 [Fri Jun 05 13:41:10 2009] [notice] mod_python: using mutex_directory /tmp
 [Fri Jun 05 13:41:11 2009] [notice] Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny3 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_perl/2.0.4 Perl/v5.10.0 configured -- resuming normal operations
 [Fri Jun 05 16:41:39 2009] [error] [client 192.168.10.21] File does not exist: /var/www/favicon.ico
 [Fri Jun 05 16:41:50 2009] [error] [client 192.168.10.21] File does not exist: /var/www/login.html
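For a quick look at such an access log without a full reporting tool, the status codes can be tallied with awk. This is only a sketch and the function name count_status is hypothetical. It assumes the log format shown in this section, where the status code is the ninth whitespace-separated field.

```shell
#!/bin/sh
# Count the requests per HTTP status code in an access log.
# Assumption: the status code is field 9 when the line is split on
# whitespace, as in the access log lines shown in this section.
count_status() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}

# Usage (the path depends on the distribution):
#   count_status /var/log/httpd/access_log
```

Each output line holds a status code followed by the number of requests that ended with it. A sudden rise in 404 entries usually points to a broken link or a missing file.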

The configuration of Apache takes place in httpd.conf. This lengthy, but well documented, configuration file is in part structured similar to an HTML page. To strip out any comments you can easily use grep.
 root@lpislack:~# grep -v ^# /etc/httpd/httpd.conf | grep -v ^$ | grep -v "^   #"
 ServerRoot "/usr"
 Listen 80
 LoadModule auth_basic_module lib/httpd/modules/mod_auth_basic.so
 LoadModule auth_digest_module lib/httpd/modules/mod_auth_digest.so
 ...
 LoadModule log_config_module lib/httpd/modules/mod_log_config.so
 LoadModule userdir_module lib/httpd/modules/mod_userdir.so
 LoadModule alias_module lib/httpd/modules/mod_alias.so
 LoadModule rewrite_module lib/httpd/modules/mod_rewrite.so
 User apache
 Group apache
 ServerAdmin webadmin@your.site
 DocumentRoot "/srv/httpd/htdocs"
 <Directory />
     Options FollowSymLinks
     AllowOverride None
     Order deny,allow
     Deny from all
 </Directory>
 <Directory "/srv/httpd/htdocs">
     Options Indexes FollowSymLinks
     AllowOverride None
     Order allow,deny
     Allow from all
 </Directory>
 DirectoryIndex index.html
 ErrorLog "/var/log/httpd/error_log"
 LogLevel warn
 LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
 CustomLog "/var/log/httpd/access_log" common
 ScriptAlias /cgi-bin/ "/srv/httpd/cgi-bin/"
 <Directory "/srv/httpd/cgi-bin">
     AllowOverride None
     Options None
     Order allow,deny
     Allow from all
 </Directory>
 DefaultType text/plain
 TypesConfig /etc/httpd/mime.types
 root@lpislack:~#
This slightly stripped down httpd.conf is taken from a Slackware 13.0 system. There are two terms to know when talking about httpd.conf: "directives" and "containers". "Directives" are the configuration options (and their values) themselves, while "containers" are directories or collections of files. Any directive inside a container is only valid inside this container; directives outside the containers have global effect for the whole site. On the other hand, there are directives that are only valid inside a container.
 * ServerRoot "/usr": This is a tricky one. All relative paths start from here; the absolute ones are, as implied by the name, absolute.
 * Listen 80: The TCP port httpd listens on for incoming connection requests. If our machine has more than one network address, we can bind httpd to one (or more) IP address/port combinations here as well.
 * LoadModule auth_basic_module lib/httpd/modules/mod_auth_basic.so: Loads the module auth_basic_module located in lib/httpd/modules/mod_auth_basic.so relative to the ServerRoot, so the whole path to this module is /usr/lib/httpd/modules/mod_auth_basic.so.
 * User apache: The user account httpd runs as. This had better be a restricted account. One (the first) httpd process has to run as root if it wants to claim port 80.
 * Group apache: The group of the user httpd runs as.
 * ServerAdmin webadmin@your.site: The e-mail address of the administrator responsible for running httpd. This shows up when errors occur.
 * DocumentRoot "/srv/httpd/htdocs": This is the directory where the actual HTML documents live on your hard drive!
 * <Directory />: This is a container object. All directives inside are only valid for this directory "/" and all of its subdirectories.
 * Options FollowSymLinks: Potential security risk! Does what its name suggests.
 * AllowOverride None: You can override most directives with a .htaccess file. This is a security risk, and the use of .htaccess is denied by this directive.
 * Order deny,allow: Controls the access to files and directories. First look who is not allowed, then look who is allowed. The last control that matches wins; if none matches or both match, the default (the last one named, here allow) is used!
 * Deny from all: Denies all hosts access to all files in this container.
 * <Directory "/srv/httpd/htdocs">: Container for the DocumentRoot directory. Note the Order and the Allow directives. Here we want access from all hosts. If no index.html exists in this directory, the contents of the directory itself are shown. Options Indexes allows this; without it, Apache generates an error message instead of listing the directory's contents.
 * DirectoryIndex index.html: The file with this name is presented to the client when a web browser accesses a directory and not a specific HTML page.
 * ErrorLog: Sets the log file for error messages.
 * LogLevel: Sets the verbosity of the error messages.
 * LogFormat: Sets the format of the entries in the custom log file (usually access_log).
 * CustomLog: Sets name and location of the custom log file.
 * ScriptAlias: Directory for CGI scripts.
 * DefaultType text/plain: Apache uses this MIME type for documents it provides to the web browser if it cannot determine the type in any other way.
 * TypesConfig /etc/httpd/mime.types: List of MIME types to use for different types of file names.
Access restrictions methods and files
Access to files and directories on the web server can be restricted based on the client machine (hostname, domain, IP address, or network) or based on user name and password. While access can be restricted by these methods, all content transmitted in both directions is still not encrypted! To secure the communication and ensure the identity of the web server, the SSL/TLS protocol will be used in the next chapter of this book.

Container
The behaviour of the Apache web server can be finely tuned in the Apache configuration (or the .htaccess file) on a per directory (<Directory> container), per file (<Files> container), or per URL (<Location> container) basis. The directives inside a <Directory> (or <Location>) container are valid for the directory itself and all its subdirectories. Most of these directives can be overridden by external configuration files, usually .htaccess. This is highly discouraged for security and sanity reasons. Some possible values for AllowOverride are:
 * None: no use of external configuration changes allowed (safest)
 * All: all directives can be changed (most insecure)
 * : some changes are allowed (not secure at all)
 * AuthConfig: mainly authentication related directives can be changed
 * : mainly  directives can be overwritten

Machine Restrictions
Order sets the sequence in which the access restrictions are evaluated, where the last matching rule wins. The last rule is also the default rule if neither rule matches or both match. The possible Allow/Deny restrictions are hostname (host.domain.example), domain (domain2.example), IP address (192.168.10.3) and network (192.168.10 or 192.168.10.0/24). Take the following directives inside a <Directory> container:
 Options Indexes FollowSymLinks
 AllowOverride None
 Order allow,deny
 Allow from all
 Deny from example.com
All access from everywhere is allowed, but the domain example.com is denied.
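The "last rule wins and is also the default" logic is easy to get wrong, so here it is spelled out as a small decision function. This is an illustrative sketch only. The function access_decision is hypothetical and does not query Apache; it merely reproduces the evaluation rule for a client that does or does not match the Allow and Deny lists.

```shell
#!/bin/sh
# access_decision ORDER ALLOW_MATCHES DENY_MATCHES
# ORDER is "allow,deny" or "deny,allow"; the two flags are "yes" or "no".
# Hypothetical helper that mirrors how the Order directive resolves a request.
access_decision() {
    order=$1
    allow=$2
    deny=$3
    case $order in
        allow,deny)
            # Deny is evaluated last, so it wins ties and is the default.
            if [ "$deny" = yes ]; then echo denied
            elif [ "$allow" = yes ]; then echo allowed
            else echo denied
            fi ;;
        deny,allow)
            # Allow is evaluated last, so it wins ties and is the default.
            if [ "$allow" = yes ]; then echo allowed
            elif [ "$deny" = yes ]; then echo denied
            else echo allowed
            fi ;;
        *)
            echo "unknown order: $order" >&2
            return 1 ;;
    esac
}
```

For the configuration above, a client from example.com matches both "Allow from all" and "Deny from example.com". access_decision allow,deny yes yes therefore yields denied, while any other client (yes, no) is allowed.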

User Based Restrictions
User based access restrictions are insecure on different levels:
 * passwords are not encrypted (danger: snooping)
 * every content up- and download is clear text (danger: snooping)
 * there is no guarantee about the identity of the server (danger: fraud/phishing)

One big part of Apache's flexibility is its capability to talk to different back ends for user authentication. The simplest are plain text files, which are OK for smaller numbers of users but do not scale to more than (about) 150 people.

Usernames, passwords and groups are stored in text files usually called htpasswd and htgroup (often with a leading dot). These names are defined in httpd.conf or .htaccess by the directives AuthUserFile and AuthGroupFile. In Apache 2.2 and later, AuthUserFile is provided by the mod_authn_file module and AuthGroupFile by mod_authz_groupfile. We create or change a username and password with the htpasswd utility. The -c option creates a new password file; if such a file already exists it will be destroyed without warning! htpasswd requires two parameters: password file and username.
 root@lpislack# htpasswd -c /etc/httpd/htpasswd newuser
 ...
One other important thing to keep in mind is that the only safe place for the password file and group file is outside the DocumentRoot, where these files can't accidentally or maliciously be downloaded by unauthorized visitors.
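Because -c silently replaces an existing password file, it helps to pass it only when the file is really missing. The wrapper below is a minimal sketch under that assumption. The function name htpasswd_add is hypothetical; htpasswd itself is called only with the -c option and arguments shown in this section.

```shell
#!/bin/sh
# Add or update a user in a password file, passing -c only when the file
# does not exist yet, so an existing file is never truncated by accident.
# htpasswd_add is a hypothetical helper around the real htpasswd utility.
htpasswd_add() {
    file=$1
    user=$2
    if [ -e "$file" ]; then
        htpasswd "$file" "$user"
    else
        htpasswd -c "$file" "$user"
    fi
}

# Usage:
#   htpasswd_add /etc/httpd/htpasswd newuser
```

htpasswd still prompts for the password interactively; the wrapper only decides whether the file has to be created first.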

Going back to reality, overriding the httpd.conf directives with a .htaccess file and placing the .htpasswd and .htgroup files inside the document directories is often done if the web site administrator does not have full access to the Apache configuration, e. g. in shared hosting environments. To protect these files one can restrict the access to them in a <FilesMatch> container spelled out in httpd.conf:
 <FilesMatch "^\.ht">
     Order allow,deny
     Deny from all
     Satisfy All
 </FilesMatch>

Example 1
This example shows the preferred, but sadly not always possible, configuration. The restricted directory is private1 inside the DocumentRoot, which my web browser reaches under /private1 on the server.

The following directives go into a <Directory> container for this directory in httpd.conf:
 AuthType Basic
 AuthName "Private1! Restricted Access!"
 require valid-user
 AuthUserFile /etc/httpd/htpasswd
After fiddling with the configuration file we probably should restart the httpd server process.
 root@lpislack:/etc/httpd# /etc/rc.d/rc.httpd restart

Create the password file /etc/httpd/htpasswd:
 root@lpislack:/etc/httpd# htpasswd -c /etc/httpd/htpasswd firstuser
 New password:
 Re-type new password:
 Adding password for user firstuser
 root@lpislack:/etc/httpd# cat htpasswd
 firstuser:2km7TAXpj3scw
 root@lpislack:/etc/httpd#
The protected page is now only accessible by authorized users.

The source code of the password protected page index.html:
 root@lpislack:/srv/www/htdocs/private1# cat index.html
 This is private!

Example 2
This example shows a commonly used configuration. It is not the best, but sometimes the only possible setup. We can do much better (much safer) if we can locate the password file outside the DocumentRoot. The restricted directory here is private2 inside the DocumentRoot, which the web browser reaches under /private2 on the server.

The only change to httpd.conf is to allow AuthConfig overrides for this directory. In fact, if we can change httpd.conf, we could do the right thing in the first place (see Example 1).
 AllowOverride AuthConfig

Set up the external configuration file .htaccess in /srv/httpd/htdocs/private2:
 AuthType Basic
 AuthName "Private2! Restricted Access!"
 require valid-user
 AuthUserFile /srv/httpd/htdocs/private2/.htpasswd

Restart httpd:
 root@lpislack:/etc/httpd# /etc/rc.d/rc.httpd restart

Create the password file with the user seconduser:
 root@lpislack:/etc/httpd# htpasswd -c /srv/www/htdocs/private2/.htpasswd seconduser
 New password:
 Re-type new password:
 Adding password for user seconduser
 root@lpislack:/etc/httpd# cat /srv/www/htdocs/private2/.htpasswd
 seconduser:2l.jKENGUwyQ6

Modules and CGI
Flexibility and easy extensibility are two important reasons for Apache's success. They are achieved in part by the CGI (Common Gateway Interface) concept and the ability to extend an already compiled Apache instance with modules. CGI programs (often called "CGI scripts") are executable programs that can be written in any language, be it bash, Perl, PHP, BASIC, assembler or Ada. They run on the server, which uses up hardware resources of the server (RAM and CPU time) but does not impact the client. The client receives what looks like any static HTML page, although the HTML page was dynamically created by the CGI program. httpd takes the output of the CGI program and passes it unchanged and unchecked to the client web browser (e.g., the HTTP headers have to be crafted by the CGI program). CGI programs can also take user input (via POST or GET requests).

Example
This shell script outputs "don't try this at home" in ugly blinking letters and prints the content of /etc/passwd to show how dangerous CGI programming can be!
 #!/bin/sh
 echo "Content-type: text/html"
 echo ""
 echo "<html>"
 echo "<body>"
 echo "<blink>DON'T TRY THIS AT HOME!</blink>"
 cat /etc/passwd
 echo "</body>"
 echo "</html>"
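A less alarming sketch shows how a CGI program receives user input: the web server hands over the request method and the query string through the environment variables REQUEST_METHOD and QUERY_STRING. The function name cgi_response is hypothetical. The response is sent as text/plain on purpose, so that user input is never interpreted as markup by the browser.

```shell
#!/bin/sh
# Minimal CGI sketch: report the request method and query string as plain text.
# The web server passes both to the script through environment variables.
cgi_response() {
    # Header first, then one empty line, then the body.
    echo "Content-type: text/plain"
    echo ""
    printf 'method: %s\n' "${REQUEST_METHOD:-unknown}"
    printf 'query:  %s\n' "${QUERY_STRING:-}"
}

cgi_response
```

Saved as an executable file in the cgi-bin directory and requested with ?name=lpi appended to its URL, the script would answer with the method GET and the query name=lpi.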

Modules
Modules, on the other hand, can extend the abilities of httpd with features that are not part of the main Apache source code. (Some modules can be compiled into httpd directly.) Modules can be switched on and off (e. g. for security reasons) with a simple change in httpd.conf. Most modules need additional configuration directives in httpd.conf, usually by importing configuration files.

For security reasons we will only enable modules actually needed by our web site.

mod_php
One very useful example is mod_php. If PHP code is executed as a simple CGI script, every script starts the PHP parsing engine, the HTML text is generated, and then the PHP parsing engine is shut down.

mod_php starts the PHP engine as a module for the Apache process, and the PHP engine persists over multiple requests. This drastically reduces the overhead of using PHP for dynamic web site creation. As an added bonus we can use PHP code directly in our HTML source code. This code also runs on the server side and is replaced by its output before the complete HTML page is sent to the client. If we use a database, this connection can also persist if we use mod_php.

The PHP language itself is configured by php.ini, but this file usually doesn't need to be changed.

To enable mod_php on Slackware 13.0 we only need to uncomment the line
 Include /etc/httpd/mod_php.conf
in httpd.conf. This will include this already set up configuration directly into our httpd.conf.



We now change mod_php.conf to
 AddType application/x-httpd-php .php .html .htm
to use PHP code inside HTML documents. This can be convenient, but increases the workload on high traffic websites considerably, because every requested HTML page is shoved through the PHP interpreter. Another thing we can do to make our lives a bit easier is adding index.php to the DirectoryIndex directive.

Now we restart httpd.

Example
To check if it works we create a small PHP status page (titled "Status for PHP") somewhere below the DocumentRoot. Once the test has succeeded, remove this file (or at least deny read access), because it will blast our entire web server configuration to the whole internet, where every creep of the planet is just milliseconds away from us. (Try searching for such status pages in Google...)

mod_perl
While Slackware 13.0 comes with Perl as an installable package, the minimal test CGI script printenv in the cgi-bin directory needs a little help. First we need to mark it as executable with
 root@lpislack:/srv/www/htdocs# chmod a+x ../cgi-bin/printenv
and then change the interpreter path in the first line so that it points to the installed Perl binary. Now we can navigate our web browser to http://lpislack.vbox.privat/cgi-bin/printenv and see if it works:
 DOCUMENT_ROOT="/srv/httpd/htdocs"
 GATEWAY_INTERFACE="CGI/1.1"
 HTTP_ACCEPT="text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
 HTTP_ACCEPT_CHARSET="iso-8859-1, utf-8, utf-16, *;q=0.1"
 HTTP_ACCEPT_ENCODING="deflate, gzip, x-gzip, identity, *;q=0"
 HTTP_ACCEPT_LANGUAGE="de-DE,de;q=0.9,en;q=0.8"
 HTTP_CACHE_CONTROL="no-cache"
 HTTP_CONNECTION="Keep-Alive, TE"
 HTTP_HOST="lpislack.vbox.privat"
 HTTP_TE="deflate, gzip, chunked, identity, trailers"
 HTTP_USER_AGENT="Opera/9.80 (X11; Linux i686; U; de) Presto/2.2.15 Version/10.10"
 PATH="/bin:/usr/bin:/sbin:/usr/sbin"
 QUERY_STRING=""
 REMOTE_ADDR="192.168.10.21"
 REMOTE_PORT="40206"
 REQUEST_METHOD="GET"
 REQUEST_URI="/cgi-bin/printenv"
 SCRIPT_FILENAME="/srv/httpd/cgi-bin/printenv"
 SCRIPT_NAME="/cgi-bin/printenv"
 SERVER_ADDR="172.25.28.4"
 SERVER_ADMIN="you@example.com"
 SERVER_NAME="lpislack.vbox.privat"
 SERVER_PORT="80"
 SERVER_PROTOCOL="HTTP/1.1"
 SERVER_SIGNATURE=""
 SERVER_SOFTWARE="Apache/2.2.14 (Unix) DAV/2 PHP/5.2.12"
 UNIQUE_ID="S3ibiawZHAQAAAq2Hf0AAAAD"
We do not use mod_perl at this time but run CGI scripts written in Perl as we would run any other executable.

So mod_perl (from http://perl.apache.org) does for the Perl language the same as mod_php does for PHP: it adds native language support directly into the Apache web server and so reduces load and speeds up response time.

Sadly there is no prebuilt mod_perl package for Slackware 13.0, but http://slackbuilds.org has at http://slackbuilds.org/repository/13.0/network/mod_perl/ a tried and true build script for everyone who can read the instructions. (As a side note, SlackBuilds are the preferred method to build Slackware packages from source.)


 * This situation demonstrates the use of modules: Functionality that is not included in Apache can be added by external modules without recompiling Apache. If there is a bugfix for Apache and we have to upgrade, mod_perl will still work fine as a module. If mod_perl were compiled into Apache, we would have to get the source code, fit it to our setup, compile and install it. With every update we would need to go through the same process, just to keep using Perl.

After building and installing the package we simply need to include its configuration file in httpd.conf and restart the Apache server.

 # mod_perl mode
 <Files ~ "\.pl$">
     SetHandler perl-script
     PerlResponseHandler ModPerl::Registry
     PerlOptions +ParseHeaders
     Options +ExecCGI
 </Files>
Perl files can live everywhere in the DocumentRoot and their name has to end in ".pl". Let's go back to the printenv example. If we call it again, it will still be executed as CGI, but if we copy it to the DocumentRoot and rename it printenv.pl it will be run by mod_perl, as we can clearly see by the MOD_PERL and MOD_PERL_API_VERSION lines in the output below:
 DOCUMENT_ROOT="/srv/httpd/htdocs"
 GATEWAY_INTERFACE="CGI/1.1"
 HTTP_ACCEPT="text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
 HTTP_ACCEPT_CHARSET="iso-8859-1, utf-8, utf-16, *;q=0.1"
 HTTP_ACCEPT_ENCODING="deflate, gzip, x-gzip, identity, *;q=0"
 HTTP_ACCEPT_LANGUAGE="de-DE,de;q=0.9,en;q=0.8"
 HTTP_CONNECTION="Keep-Alive, TE"
 HTTP_HOST="lpislack.vbox.privat"
 HTTP_TE="deflate, gzip, chunked, identity, trailers"
 HTTP_USER_AGENT="Opera/9.80 (X11; Linux i686; U; de) Presto/2.2.15 Version/10.10"
 MOD_PERL="mod_perl/2.0.4"
 MOD_PERL_API_VERSION="2"
 PATH="/bin:/usr/bin:/sbin:/usr/sbin"
 QUERY_STRING=""
 REMOTE_ADDR="192.168.10.21"
 REMOTE_PORT="45519"
 REQUEST_METHOD="GET"
 REQUEST_URI="/printenv.pl"
 SCRIPT_FILENAME="/srv/httpd/htdocs/printenv.pl"
 SCRIPT_NAME="/printenv.pl"
 SERVER_ADDR="172.25.28.4"
 SERVER_ADMIN="webadmin@lpislack.vbox.privat"
 SERVER_NAME="lpislack.vbox.privat"
 SERVER_PORT="80"
 SERVER_PROTOCOL="HTTP/1.1"
 SERVER_SIGNATURE=""
 SERVER_SOFTWARE="Apache/2.2.14 (Unix) DAV/2 PHP/5.2.12 mod_perl/2.0.4 Perl/v5.10.0"
 UNIQUE_ID="S3i3VqwZHAQAAAxfFDUAAAAA"

Restrict Resource Usage
Apache is capable of serving pretty busy web sites. One mechanism to provide quick response times under heavy load is to have waiting processes ready to jump into action at any given time. So unlike most other programs, Apache spawns multiple processes when it is started. The number of processes is adjusted to the number of connections by creating and destroying child processes as needed.

One control process listens for new requests, usually on TCP port 80, while every client is connected to its very own child process that serves requests for the whole lifetime of this connection. StartServers determines the number of processes to begin with when Apache is started. But this is of little meaning, because MinSpareServers sets the minimum number of idle Apache processes waiting to serve new connections. If there are fewer spare servers left, they are created at a rate of one per second. If that is not enough, the rate of process creation is doubled every second up to 32 new processes per second. If this is not sufficient, we sure as hell have other problems. On the other hand, if there are more idle servers than MaxSpareServers, the unneeded processes are shut down one by one.
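The doubling is easier to judge with numbers. The following sketch simply does the arithmetic described above for a sustained shortage of spare servers. The function name spawn_schedule is hypothetical and nothing here talks to a running Apache.

```shell
#!/bin/sh
# Print how many child processes are created in each second of a sustained
# shortage: the rate starts at 1 and doubles until it reaches the cap of 32.
spawn_schedule() {
    seconds=$1
    rate=1
    total=0
    s=1
    while [ "$s" -le "$seconds" ]; do
        total=$((total + rate))
        echo "second $s: +$rate (total $total)"
        if [ "$rate" -lt 32 ]; then
            rate=$((rate * 2))
        fi
        s=$((s + 1))
    done
}

spawn_schedule 8
```

After six seconds the cap of 32 new processes per second is reached and 63 processes have been created in total; from then on the total grows by 32 every second.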

MaxClients limits the absolute number of simultaneously running server processes, and with that the maximum number of simultaneous client connections. The default upper bound is 256; it can only be raised together with the ServerLimit directive. If there are more connection requests than Apache processes to serve them, the requests are first moved to a backlog, and only if this backlog is filled up too are the requests rejected.

The lifetime of an Apache (child) process can be limited by the absolute number of connections it will serve, as defined by MaxRequestsPerChild. This can mitigate problems caused by memory leaks on less stable platforms, by buggy modules or by badly written CGIs. If set to 0, child processes can live indefinitely, unless they are terminated because of too many spare servers.

Redhat/CentOS
Installation
 # yum install httpd
The web server binary is called httpd. The control script is called apachectl. The access and error log files are located in /var/log/httpd and called access_log and error_log.

Debian
Installation
 # aptitude install apache2
The web server binary is called apache2. The control script is called apache2ctl, which is not the same as apachectl by another name. The access and error log files are located in /var/log/apache2 and called access.log and error.log. (Note the dot “.” instead of the underscore “_”.)