Vicompress HTTP Proxy
Copyright (c) 2003-2008 ViSolve

Table of Contents

1. What is Vicompress?

2. Compiling and Installing Vicompress

3. Running Vicompress

4. Configuring Vicompress

5. Compression

6. Caching

7. Load Balancing and Failover

8. Log Statistics

9. Log Files

 

 

1. What is Vicompress?

Vicompress is an HTTP web accelerator. It speeds up download response times by caching frequently requested pages, and by compressing text pages for smaller downloads. Vicompress can be used in two different setups, one for Internet Service Providers (ISPs), and one for individual websites:


Setup for ISPs

  Client Browser --+
  Client Browser --+--> ISP (Vicompress) --> Internet
  Client Browser --+


Setup for website

  Client Browser --+                               +--> Webserver
  Client Browser --+--> Internet --> Vicompress ---+--> Webserver
  Client Browser --+                               +--> Webserver



What features does Vicompress support?

  • In-Memory compression
    Vicompress supports in-memory compression of text pages, such as HTML, JavaScript, stylesheets, PDF documents, and Microsoft Word documents. Image files and other file types are not compressed. Because text pages are compressed, clients download them faster, especially over slow connections.
  • In-Memory caching
    Vicompress supports in-memory caching of static data, such as images, stylesheets, and HTML files. If a web page is in the cache, Vicompress will respond to the client directly, rather than contacting the web server. For ISPs, this results in a faster response to clients. For single websites, this reduces the load on the backend web server.
  • Load Balancing and Failover
    For websites, Vicompress supports load balancing over multiple backend web servers. If a backend web server goes down, Vicompress will internally mark the server as down and avoid sending requests to that server. Vicompress will then periodically check the down servers to see if they have come back up.
  • Sticky Sessions
    Vicompress supports sticky sessions, where all traffic from the same client browser goes to the same backend webserver.
  • DNS lookup caching
    When requesting a webpage, the DNS lookup time (looking up the IP address of the website's hostname) can often be slow, sometimes up to 20 or 30 seconds. Vicompress will cache the DNS lookups, thereby improving response time.
  • Log files
    Vicompress logs every HTTP request to a log file. It supports the Apache Combined Log Format, as well as the Squid Access Log format.
  • Log statistics
    Vicompress provides tools to generate log statistics in HTML format. The HTML reports include statistics such as bandwidth, compression, and caching for a given period (hour, day, month).
  • Scales to multiple processors
    Vicompress uses multiple threads for the CPU intensive gzip compression. Vicompress will automatically determine how many processors your system has, and will spawn one compression thread per processor.

What features does Vicompress not support?

  • IPv6 Addresses
    Vicompress does not support IPv6 addresses, only IPv4 addresses.

 

2. Compiling and Installing Vicompress

Requirements

  • A POSIX compatible Unix system. However, only Linux 2.4, Linux 2.6, and Cygwin have been tested.
  • The GNU gcc compiler. Other ANSI-C compilers may work, but will probably require Makefile modifications to compile.
  • The pthread and zlib libraries.
  • GNU make.

Compiling from the source

Extract the vicompress source code, and change into the src directory.

# gunzip vicompress-1.0.x.tar.gz
# tar -xvf vicompress-1.0.x.tar
# cd vicompress-1.0.x/src/

Run the configure script, passing the directory to install vicompress into. The default directory is /usr/local/vicompress.

# ./configure /usr/local/vicompress

If you're using a C compiler other than gcc, you will need to edit the compiler flags in the Makefile. The default flags are:

CC=gcc
LIBS= -lpthread -lz
CFLAGS= -O1 -Wall
LDFLAGS=

Compile the source code.

# make
# make install

The make install command will copy the vicompress runtime files into the install directory (/usr/local/vicompress).

The following files will be installed:

Vicompress Files

/usr/local/vicompress/LICENSE
    The License
/usr/local/vicompress/README.html
    The HTML documentation
/usr/local/vicompress/bin/tune_kernel.sh
    Script to tune Linux kernel parameters, for performance
/usr/local/vicompress/bin/update_log_stats
    Program to generate/update log statistics every hour
/usr/local/vicompress/bin/update_log_stats.sh
    Script to cleanly start/stop the update_log_stats program
/usr/local/vicompress/bin/vicompress
    Main vicompress server
/usr/local/vicompress/bin/vicompress.sh
    Script to cleanly start/stop the vicompress server
/usr/local/vicompress/etc/vicompress.conf
    Configuration file
/usr/local/vicompress/log/
    Directory where log files are stored
/usr/local/vicompress/logstats/
    Directory where HTML log statistics are written to
/usr/local/vicompress/logstats/template.html
    Template HTML file used when generating HTML statistics reports
/usr/local/vicompress/logstats/verticalbarN.png
    Image files used in HTML statistics reports
/usr/local/vicompress/logstats/visolvelogo.png
    Image file used in HTML statistics reports

Installing from an RPM

ViSolve also distributes Vicompress as a pre-compiled binary RPM file (RedHat Package Manager) for Linux x86 based systems. You can install Vicompress from the binary RPM by running the rpm install command:

# rpm -i Vicompress-1.0.x-1.i386.rpm

To remove the Vicompress package, use the rpm erase command. All the Vicompress files will be removed, except for log files.

# rpm -e Vicompress

 

3. Running Vicompress

Vicompress can be run in one of two setups:

For ISPs: a forward HTTP proxy that forwards requests to the server given in the URL.
For websites: a reverse HTTP proxy that forwards requests to a set of backend web servers.

If you are running Vicompress in the first setup, you can start the server with the default settings. For the website setup, however, you must add the IP address and port of every backend webserver to the configuration file /usr/local/vicompress/etc/vicompress.conf, one entry per line:

webserver <IP address> <port>

For example:

webserver 192.168.10.2 80
webserver 192.168.10.3 80

See Configuring Vicompress for more details about the configuration file.

To start and stop the server, use the script:

/usr/local/vicompress/bin/vicompress.sh

It can take one of the following arguments:

start     start the vicompress server
stop      stop a running vicompress server process
status    print whether or not vicompress is running
debug     toggle debug messages in the error log (see Log Files)

To start Vicompress, simply use the "start" argument:

# cd /usr/local/vicompress/bin
# ./vicompress.sh start

The server is automatically run in the background. The default listening port for Vicompress is 80.

 

4. Configuring Vicompress

Vicompress uses the configuration file

/usr/local/vicompress/etc/vicompress.conf

One option is specified per line. Blank lines are ignored. Lines beginning with a hash (#) are ignored.
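The file format described above is simple enough to read with a few lines of code. The following is a minimal sketch, not Vicompress's own parser: it assumes that an option name and its parameters are separated by whitespace (as in the examples in this document) and that leading whitespace before a hash is allowed. The function name parse_vicompress_conf is hypothetical.

```python
def parse_vicompress_conf(text):
    """Parse vicompress.conf-style text into a list of (option, [params]).

    A list is returned rather than a dict because the webserver option
    may appear on several lines.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines beginning with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        name, *params = line.split()
        entries.append((name, params))
    return entries


sample = """\
# two backend web servers
webserver 192.168.10.2 80
webserver 192.168.10.3 80

listenport 80
enable_compression yes
"""

for name, params in parse_vicompress_conf(sample):
    print(name, params)
```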

Configuration Summary

webserver            <IP address> <port>
listenip             <IP address>
listenport           <port>
outgoingip           <IP address>
enable_sessions      <yes|no>
enable_compression   <yes|no>
enable_caching       <yes|no>
cache_memory         <size in megabytes>
max_cacheditem_size  <size in kilobytes>
cache_expires        <hours>
enable_dns_caching   <yes|no>
dns_expires          <hours>
user                 <username>
hostheader           <hostname>
accesslog            <path to logfile>
errorlog             <path to logfile>
rotatesize           <size in megabytes>
logformat            <apache|squid>

Each option is explained in detail below as follows:

  • Option name and parameters
  • Example parameters
  • Description

webserver <IP address> <port>
webserver 192.168.10.2 80

Specify a backend web server to forward requests to. For ISPs, no webserver entries should be specified. In this scenario, Vicompress will act as a forward HTTP proxy, and will connect to the origin server specified in the HTTP request. For websites, one or more webserver entries should be specified. Multiple webserver entries may be specified, each on a separate line. In this scenario, Vicompress will act as a reverse HTTP proxy, and will distribute the requests among the backend webserver entries specified. See the Load Balancing and Failover section for more details.

listenip <IP address>
listenip 192.168.0.1

Specify the IP address for Vicompress to listen on. The default value is all interfaces, 0.0.0.0.

listenport <port>
listenport 80

Specify the port for Vicompress to listen on. The default port is 80, the standard HTTP port. Only servers started by root can bind to ports less than 1024.

outgoingip <IP address>
outgoingip 192.168.0.1

Specify the IP address for Vicompress to bind to when making outgoing connections. The default value is any interface, 0.0.0.0.

enable_sessions <yes|no>
enable_sessions yes

Enable or disable sticky sessions. The default value is yes. This option is only used when two or more webservers are specified in the Vicompress configuration. When enabled, Vicompress will use HTTP cookies to ensure that a client is sent to the same backend web server for the duration of its session. An HTTP session lasts until the client browser is closed. See the Load Balancing and Failover section for more details.

enable_compression <yes|no>
enable_compression yes

Enable or disable gzip compression. The default value is yes. When enabled, Vicompress will gzip HTML and text pages before sending the response to the client.

enable_caching <yes|no>
enable_caching yes

Enable or disable caching of pages. The default value is yes. When enabled, Vicompress will cache static pages and images in memory.

cache_memory <size in megabytes>
cache_memory 50

Specify the size of the in-memory cache, in megabytes. The default value is 50. Note that Vicompress will also use around 30 MB for basic operation and compression. The total memory (cache_memory + 30 MB) should not exceed the amount of RAM available. If cache_memory is set to 0, caching is disabled.

max_cacheditem_size <size in kilobytes>
max_cacheditem_size 512

Web pages larger than this size will not be cached. A high hit rate depends on caching many small pages rather than a few large ones, and this option keeps large pages out of the cache. The default value is 512 kilobytes.

cache_expires <hours>
cache_expires 240

When a web page is cached, it remains in the cache based on its age. The expiration time is set to half of the item's age. For example, a page that is 4 days old will be cached for 2 days. This option is used to place an upper limit on the expiration time. The default value is 240 hours (10 days). After 10 days, the page is removed from the cache.
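The rule above amounts to taking half of the page's age and capping it at cache_expires. A minimal sketch of that arithmetic, with a hypothetical function name and all times in hours:

```python
def cache_lifetime_hours(age_hours, cache_expires_hours=240):
    """Half of the item's age, capped by the cache_expires setting."""
    return min(age_hours / 2, cache_expires_hours)


# A page that is 4 days (96 hours) old is cached for 2 days (48 hours).
print(cache_lifetime_hours(4 * 24))

# A page that is 30 days old would get 360 hours, so the 240 hour cap applies.
print(cache_lifetime_hours(30 * 24))
```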

enable_dns_caching <yes|no>
enable_dns_caching yes

Enable or disable caching of DNS lookups. The default value is yes. When enabled, Vicompress will store the hostname-to-IP address mappings in memory.

dns_expires <hours>
dns_expires 48

Specify the amount of time a cached DNS mapping is valid. The default is 48 hours.

user <username>
user nobody

The user to run the server as. Vicompress will switch to this user after binding to the listening port (usually port 80). Vicompress is generally started as root, since only root can bind to ports less than 1024.

However, it is unsafe to run a server program as root. A buffer overflow error can give clients root access to the server machine. Therefore, Vicompress will switch to a non-root user after binding to port 80. That user is specified by the "user" option given above.

The default value is the user who started the server.

hostheader <hostname>
hostheader mydomain.com

This option only applies when accelerating a single webserver (when the webserver option is enabled). Specify the hostname to use in the HTTP Host header, when sending the HTTP request to the webserver. By default, Vicompress will just send the same HTTP Host header it receives from the client browser.

accesslog <path to logfile>
accesslog /usr/local/vicompress/log/accesslog

Specify the file path where the access log should be stored. The file must be writable by the username given in the "user" option. If the accesslog is not specified, no access log file will be created or written to.

errorlog <path to logfile>
errorlog /usr/local/vicompress/log/errorlog

Specify the file path where the error log should be stored. The file must be writable by the username given in the "user" option. If the errorlog is not specified, no error log file will be created or written to.

rotatesize <size in megabytes>
rotatesize 1024

Rotate the log files when they reach the specified size, in megabytes. The default value is 1024. When rotation occurs, the current log file at <accesslog> is moved to <accesslog>.1 and a blank log file is created at <accesslog>. See Log Files for further details about log file rotation.

logformat <apache|squid>
logformat squid

Specify the format of the accesslog file. The supported formats are the Apache Combined Log Format, and the Squid Access Log Format. The default value is the Squid format. See the Log Files and Log Statistics sections for further details.

 

5. Compression

Vicompress can compress text pages, such as HTML, JavaScript, stylesheets, PDF documents, and Microsoft Word documents, before sending them back to the client. This results in faster download times, especially over slow modem connections. Both static and dynamic pages can be compressed, such as output from PHP or CGI scripts. Images and other binary file types are not compressed. Vicompress checks the HTTP header Accept-Encoding to determine whether the client's browser supports gzip encoding.
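The Accept-Encoding check can be illustrated as follows. This is a sketch of the general HTTP technique, not Vicompress's actual code: the function names are hypothetical, and the handling of q-values (treating "gzip;q=0" as a refusal) is an assumption that the source does not describe.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        coding, _, params = item.strip().partition(";")
        if coding.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def maybe_compress(body, accept_encoding):
    """Return (body, content_encoding); content_encoding is 'gzip' or None."""
    if client_accepts_gzip(accept_encoding or ""):
        return gzip.compress(body), "gzip"
    return body, None


page = b"<html>" + b"<p>hello world</p>" * 200 + b"</html>"
compressed, encoding = maybe_compress(page, "gzip, deflate")
print(len(page), len(compressed), encoding)
```

A reply compressed this way would be sent with the header Content-Encoding: gzip so that the browser knows to decompress it.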

Compression related options:
enable_compression

 

6. Caching

Vicompress can cache data in memory, such as HTML pages and images. When a browser requests an item found in the cache, Vicompress will send the response directly, rather than contacting the destination webserver. For ISPs, this results in a faster response time for clients. For single websites, this reduces load on the backend webserver.

Vicompress will not cache web pages that are generated dynamically, such as through ASP, PHP, or CGI scripts. Vicompress uses the HTTP headers Last-Modified and Content-Length to determine whether a response is dynamically generated or not. In addition, Vicompress will not cache pages that are password protected (pages that require the HTTP header Authorization).
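The text names the headers involved but not the exact test. The sketch below assumes one plausible reading: a reply is treated as static only if it carries both Last-Modified and Content-Length, and it is never cached when the request carried Authorization. The function name and the precise rule are assumptions, not taken from the Vicompress source.

```python
def looks_cacheable(request_headers, reply_headers):
    """Guess whether a reply is static and safe to cache.

    Assumed rule: both Last-Modified and Content-Length must be present
    on the reply, and the request must not carry Authorization.
    Header names are compared case-insensitively.
    """
    request_names = {name.lower() for name in request_headers}
    reply_names = {name.lower() for name in reply_headers}
    if "authorization" in request_names:
        return False
    return "last-modified" in reply_names and "content-length" in reply_names


static_reply = {
    "Last-Modified": "Tue, 10 May 2005 17:18:41 GMT",
    "Content-Length": "1852",
}
dynamic_reply = {"Content-Type": "text/html"}  # typical script output

print(looks_cacheable({}, static_reply))
print(looks_cacheable({}, dynamic_reply))
```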

Items will remain in the cache based on their age (the Last-Modified header). The expiration time is set to half of the item's age. For example, a web page that was last modified 8 days ago will remain in the Vicompress cache for 4 days. In addition, the cache_expires option sets an upper limit on the time an item can remain in the cache. When the in-memory cache becomes full, items that have not been recently accessed are removed to make room for new items in the cache.
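Removing items that have not been recently accessed is commonly done with a least-recently-used (LRU) policy. The sketch below shows that policy with a byte budget and a per-item size limit, standing in for cache_memory and max_cacheditem_size. It assumes strict LRU ordering; the source does not say which eviction algorithm Vicompress uses, and the class name is hypothetical.

```python
from collections import OrderedDict


class MemoryCache:
    def __init__(self, capacity_bytes, max_item_bytes):
        self.capacity = capacity_bytes
        self.max_item = max_item_bytes
        self.used = 0
        self.items = OrderedDict()  # url -> body, least recently used first

    def get(self, url):
        body = self.items.get(url)
        if body is not None:
            self.items.move_to_end(url)  # mark as recently used
        return body

    def put(self, url, body):
        """Store body under url; return False if it is too large to cache."""
        if len(body) > self.max_item or len(body) > self.capacity:
            return False
        if url in self.items:
            self.used -= len(self.items.pop(url))
        # Evict least recently used items until the new body fits.
        while self.used + len(body) > self.capacity:
            _, evicted = self.items.popitem(last=False)
            self.used -= len(evicted)
        self.items[url] = body
        self.used += len(body)
        return True


cache = MemoryCache(capacity_bytes=10, max_item_bytes=6)
cache.put("/a", b"aaaa")
cache.put("/b", b"bbbb")
cache.get("/a")            # "/a" is now the most recently used item
cache.put("/c", b"cccc")   # no room left, so "/b" is evicted
print(list(cache.items))
```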

Users can view the list of URLs in the memory cache by logging into the Vicompress machine and requesting the following special URL from Vicompress:

http://<hostname>/_viewcache_

Vicompress will return a plain text list of the URLs in the cache, one per line. The list is returned only to HTTP clients on the same machine as Vicompress. Outside clients cannot access the cached URL list.

Caching related options:
enable_caching
cache_memory
max_cacheditem_size
cache_expires

 

7. Load Balancing and Failover

Load Balancing

When one or more webserver entries are specified, Vicompress will act as a reverse HTTP proxy, and will distribute requests among the backend webservers. Vicompress uses a simple round-robin algorithm for load distribution.

Failover

If Vicompress fails to connect to a backend web server, that web server is marked as down, and will be skipped for future requests. Clients that had previous sessions with that web server will be forwarded to a new backend web server. Vicompress will try to re-connect to a down web server every 3 minutes. If the connection succeeds, the web server is marked as up again. If all backend web servers are down when a request arrives, Vicompress will simply choose among the down web servers, in round-robin fashion.
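The round-robin selection and failover behaviour described above can be sketched as follows. The class and method names are hypothetical and the structure is only one way to implement the described behaviour. The clock is injectable so that the 3 minute retry interval can be exercised without waiting.

```python
import time


class BackendPool:
    RETRY_SECONDS = 180  # down servers are retried every 3 minutes

    def __init__(self, backends, clock=time.monotonic):
        self.backends = list(backends)
        self.down_since = {}  # backend -> time it was marked down
        self.next_index = 0
        self.clock = clock

    def mark_down(self, backend):
        self.down_since[backend] = self.clock()

    def mark_up(self, backend):
        self.down_since.pop(backend, None)

    def due_for_retry(self):
        """Down backends whose retry interval has elapsed."""
        now = self.clock()
        return [b for b, t in self.down_since.items()
                if now - t >= self.RETRY_SECONDS]

    def pick(self):
        """Next backend in round-robin order, skipping down backends."""
        count = len(self.backends)
        for _ in range(count):
            backend = self.backends[self.next_index]
            self.next_index = (self.next_index + 1) % count
            if backend not in self.down_since:
                return backend
        # Every backend is down: choose among them in round-robin fashion.
        backend = self.backends[self.next_index]
        self.next_index = (self.next_index + 1) % count
        return backend


pool = BackendPool([("192.168.10.2", 80), ("192.168.10.3", 80)])
pool.mark_down(("192.168.10.3", 80))
print(pool.pick(), pool.pick())  # both requests go to the remaining server
```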

Sessions

Many web applications keep session information for each client, such as shopping cart items. Session information may be stored on a central database, or may be stored locally on individual web servers. If your website stores session information on individual web servers, then a client's requests cannot be distributed across multiple web servers. To force a client to use the same backend web server throughout a session, Vicompress provides the enable_sessions option. When enabled, Vicompress will send the client a cookie to indicate which backend web server to use:

Set-Cookie: vicompressid=1

For the duration of the session, the client browser will send the vicompress Cookie for every request:

Cookie: vicompressid=1

Vicompress will forward the requests to the backend webserver specified by the cookie. When the client browser is closed, the browser discards the vicompress Cookie, and the session is ended. Note that if sessions are enabled, client connections may not be evenly distributed across the multiple backend web servers.
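A sketch of how a proxy might map the vicompressid cookie back to a backend. It assumes that the cookie value is a 1-based index into the webserver entries in configuration-file order; the source shows the value 1 but does not say how ids are assigned. The function names are hypothetical.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "vicompressid"


def backend_from_cookie(cookie_header, backends):
    """Return the backend named by the session cookie, or None.

    Assumption: the cookie value is a 1-based index into backends.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    morsel = jar.get(COOKIE_NAME)
    if morsel is None or not morsel.value.isdigit():
        return None
    index = int(morsel.value) - 1
    return backends[index] if 0 <= index < len(backends) else None


def session_cookie_for(backend, backends):
    """Build the Set-Cookie line for a backend (no expiry: a session cookie)."""
    return "Set-Cookie: %s=%d" % (COOKIE_NAME, backends.index(backend) + 1)


servers = [("192.168.10.2", 80), ("192.168.10.3", 80)]
print(session_cookie_for(servers[0], servers))
print(backend_from_cookie("vicompressid=2", servers))
```

When the function returns None (no cookie, or a cookie that names an unknown server), the request would fall back to normal round-robin selection.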

Load Balancing related options:
webserver
enable_sessions

 

8. Log Statistics

Vicompress includes a tool to generate statistics about bandwidth, caching, and compression. To generate the log statistics, use the script:

/usr/local/vicompress/bin/update_log_stats.sh

This script takes one of three possible arguments:

start <dir>   Run a daemon to generate/update the log statistics every hour. Store the log statistics in the given directory <dir>. If the <dir> argument is not given, the default directory is /usr/local/vicompress/logstats.
stop          Stop the update_log_stats program.
status        Print whether or not the update_log_stats program is running.

To generate the log statistics, run the following command:

# cd /usr/local/vicompress/bin
# ./update_log_stats.sh start /usr/local/vicompress/logstats

The update_log_stats program will run in the background. Every hour, it will parse the accesslog and write an HTML statistics report to
/usr/local/vicompress/logstats/YYYYMMstats.html

where YYYY is the year, and MM is the month. For example:

/usr/local/vicompress/logstats/200501stats.html (Jan 2005)
/usr/local/vicompress/logstats/200502stats.html (Feb 2005)
/usr/local/vicompress/logstats/200503stats.html (Mar 2005)

This report will show the bandwidth saved with caching and compression for

  • The entire month
  • Each day of the month
  • Each day of the week
  • Each hour of the day

In addition, an HTML date index file will be created containing hyperlinks to all the monthly statistics.
/usr/local/vicompress/logstats/statsindex.html

 

 
9. Log Files

Vicompress produces two log files: an access log and error log.

Access Log

The accesslog stores information about each client request on a single line. It is used for gathering website statistics. The path of the accesslog is determined by the configuration option accesslog:

accesslog /usr/local/vicompress/log/accesslog

The log format is determined by the logformat option:

logformat <apache|squid>

The Apache Combined Log Format is described at http://httpd.apache.org/docs/logs.html . A summary of the format is given below. Note that Vicompress uses the "ident" field (2nd field) to store cache and compression information.

clientip             The IP address of the client.
hit and compression  Either "hit" or "miss", followed by the content length before compression.
username             The username sent for authentication, or "-" if not given.
date                 The date of the response [day/month/year:hour:min:sec +/-timezone]
firstline            The first line of the HTTP request (method url version).
replycode            The server HTTP reply status code.
contentlength        The length of the server reply body, in bytes.
referer              The URL which referred the user to this website.
useragent            The platform and version of the client browser.

Here is a sample Apache Log entry:

15.13.130.10 miss15923 - [21/Aug/2003:17:26:45 -0700] "GET /index.html HTTP/1.0" 200 1852 "http://www.google.com/" "Mozilla 4.0 (IE 6.0 compatible)"
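A log analysis script can split such an entry with a regular expression. The sketch below is written against the sample entry above; the pattern and the function name are illustrative, and it assumes that contentlength is always numeric and that it reflects the size after compression (which is how the sample reads: 15923 bytes before, 1852 bytes sent).

```python
import re

APACHE_LINE = re.compile(
    r'(?P<clientip>\S+) (?P<hit>hit|miss)(?P<precompressed>\d+) '
    r'(?P<username>\S+) \[(?P<date>[^\]]+)\] "(?P<firstline>[^"]*)" '
    r'(?P<replycode>\d{3}) (?P<contentlength>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
)


def parse_apache_entry(line):
    """Return a dict of fields, or None if the line does not match."""
    match = APACHE_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["cache_hit"] = entry.pop("hit") == "hit"
    for key in ("precompressed", "replycode", "contentlength"):
        entry[key] = int(entry[key])
    return entry


sample_apache = (
    '15.13.130.10 miss15923 - [21/Aug/2003:17:26:45 -0700] '
    '"GET /index.html HTTP/1.0" 200 1852 "http://www.google.com/" '
    '"Mozilla 4.0 (IE 6.0 compatible)"'
)
entry = parse_apache_entry(sample_apache)
print(entry["cache_hit"], entry["precompressed"], entry["contentlength"])
```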

The Squid Access Log Format is described at http://www.squid-cache.org/Doc/FAQ/FAQ-6.html . The Squid format contains information similar to the Apache format. The Squid format contains additional information about cache hits, but does not store the Referer or User-Agent headers. Note that Vicompress uses the "ident" field (9th field) to store compression statistics.

date           The date of the response, the number of seconds since 1970.
duration       The duration of the response, in milliseconds.
client         The IP address of the client.
hit status     TCP_HIT if the request is a cache hit, else TCP_MISS.
replycode      The server HTTP reply status code.
contentlength  The length of the server reply body, in bytes.
method         The HTTP request method (GET, POST, etc).
URL            The requested URL.
compression    The content length of the reply before compression.
peerstatus     NONE if the request is a cache hit, else DIRECT.
peerhost       The IP address of the backend web server, or "-" if a cache hit.
contenttype    The content type of the HTTP reply, or "-" if not given.

Here is a sample Squid log entry:

1112387949.000 529 15.13.130.249 TCP_MISS/200 1031 GET http://www.amazon.com/somefile.jpg 8523 DIRECT/15.0.110.12 text/html
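In the sample entry the twelve fields appear as ten whitespace-separated tokens, because hit status/replycode and peerstatus/peerhost are each joined by a slash. The sketch below splits an entry on that basis. The function name is hypothetical, and treating a non-numeric compression field as "unknown" is an assumption.

```python
def parse_squid_entry(line):
    """Return a dict of fields, or None if the token count is unexpected."""
    tokens = line.split()
    if len(tokens) != 10:
        return None
    (date, duration, client, status, length,
     method, url, before, peer, contenttype) = tokens
    hit_status, _, replycode = status.partition("/")
    peerstatus, _, peerhost = peer.partition("/")
    return {
        "date": float(date),                # seconds since 1970
        "duration_ms": int(duration),
        "client": client,
        "hit_status": hit_status,
        "replycode": int(replycode),
        "contentlength": int(length),
        "method": method,
        "url": url,
        # Content length before compression; None if not a number.
        "precompressed": int(before) if before.isdigit() else None,
        "peerstatus": peerstatus,
        "peerhost": peerhost,
        "contenttype": contenttype,
    }


sample_squid = (
    "1112387949.000 529 15.13.130.249 TCP_MISS/200 1031 GET "
    "http://www.amazon.com/somefile.jpg 8523 DIRECT/15.0.110.12 text/html"
)
squid_entry = parse_squid_entry(sample_squid)
print(squid_entry["hit_status"], squid_entry["precompressed"],
      squid_entry["contentlength"])
```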

Log Rotation

The accesslog file can grow quickly under heavy load. Therefore, Vicompress will automatically rotate log files once they reach a certain size. This size is given by the configuration option:

rotatesize <size in megabytes>

The default size is 1024, or about 1 gigabyte.

When rotation occurs, Vicompress will execute the following:

mv accesslog.8 accesslog.9
mv accesslog.7 accesslog.8
mv accesslog.6 accesslog.7
mv accesslog.5 accesslog.6
mv accesslog.4 accesslog.5
mv accesslog.3 accesslog.4
mv accesslog.2 accesslog.3
mv accesslog.1 accesslog.2
mv accesslog   accesslog.1

The current logfile (accesslog) is moved to accesslog.1. The previous logfile at accesslog.1 is moved to accesslog.2, and so forth. The last logfile at accesslog.9 is deleted.

For errorlog rotation, the same commands occur, except that the errorlog file is moved, instead of the accesslog file.
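The same rotation can be reproduced in a script, for example to rotate a log by hand. This is a sketch of the scheme described above, not the code Vicompress runs; the function name and the keep parameter are hypothetical. It skips files that do not exist yet and recreates an empty log file at the end, as described for the rotatesize option.

```python
import os
import tempfile


def rotate(logpath, keep=9):
    """Shift logpath.N to logpath.N+1 and logpath to logpath.1.

    The oldest file (logpath.<keep>) is deleted first.
    """
    oldest = "%s.%d" % (logpath, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 0, -1):
        source = "%s.%d" % (logpath, n)
        if os.path.exists(source):
            os.rename(source, "%s.%d" % (logpath, n + 1))
    if os.path.exists(logpath):
        os.rename(logpath, logpath + ".1")
    open(logpath, "w").close()  # start a blank log file


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "accesslog")
    for text in ("first\n", "second\n"):
        with open(path, "w") as handle:
            handle.write(text)
        rotate(path)
    print(sorted(os.listdir(tmp)))
```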

Error Log

The error log stores error and debugging messages from Vicompress. The path of the errorlog is determined by the configuration option errorlog:

errorlog /usr/local/vicompress/log/errorlog

By default, Vicompress will write only basic startup and shutdown messages to the error log, prefixed by the current date. The messages are shown below:

Vicompress started
Vicompress shutting down
Maximum concurrent clients is <number>
Cache size is <number> MB

Users can enable additional debug messages at runtime to troubleshoot problems with Vicompress. Debugging is toggled (enabled/disabled) by running the command below. Note that debugging is initially disabled when Vicompress is started.

# cd /usr/local/vicompress/bin
# ./vicompress.sh debug

Each debug message in the errorlog contains the date, a message, and the IP address:port of the client connection. Here is a sample entry:

[Tue May 10 17:18:41 2005] [127.0.0.1:52689] New client accepted.
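When searching a large error log, it can help to split each debug entry into its parts. The sketch below is based only on the sample entry above; the pattern and function name are illustrative. Startup and shutdown messages carry no client address, so they will not match and the function returns None for them.

```python
import re
from datetime import datetime

DEBUG_LINE = re.compile(
    r"\[(?P<date>[^\]]+)\] \[(?P<ip>[\d.]+):(?P<port>\d+)\] (?P<message>.*)"
)


def parse_debug_entry(line):
    """Return date, client (ip, port), and message, or None if no match."""
    match = DEBUG_LINE.match(line)
    if match is None:
        return None
    return {
        # Date format as in the sample: "Tue May 10 17:18:41 2005"
        "date": datetime.strptime(match.group("date"), "%a %b %d %H:%M:%S %Y"),
        "client": (match.group("ip"), int(match.group("port"))),
        "message": match.group("message"),
    }


debug_entry = parse_debug_entry(
    "[Tue May 10 17:18:41 2005] [127.0.0.1:52689] New client accepted."
)
print(debug_entry["client"], debug_entry["message"])
```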

Below is the complete list of debugging messages:

New client accepted.
Read http request from client: status=<status message>, url=<urlpath> <HTTP request>
Reading server reply from <IP address>:<port>
Read http reply from server <IP address>:<port>: status=<status message> <HTTP reply>
Writing server reply
Writing and caching server reply
Writing and compressing server reply
Writing, caching, and compressing server reply
Writing error reply to client
Writing cache url list reply to client
Writing cached reply to client
Writing direct reply to client
Wrote server reply: status=<status message>
Wrote cached reply: status=<status message>
Wrote direct reply: status=<status message>
Wrote error reply: status=<status message>
Wrote cache url list reply: status=<status message>
Keeping client connection alive
Closing connection

For the <HTTP request> and <HTTP reply>, Vicompress will print out the full HTTP request and reply headers. For <status message>, Vicompress will print out one of the messages below:

Success
Bad HTTP Request from client
Bad HTTP Reply from server
Client closed prematurely
Server closed prematurely
Connect failed
Error reading from client
Error writing to client
Error reading from server
Error writing to server
DNS lookup failed