Note: My latest patch has support for persistent connections, which can potentially make WWWOFFLE significantly more efficient.
WWWOFFLE is a simple proxy server written by Andrew M. Bishop with special
features for use with dial-up internet connections.
The official WWWOFFLE homepage by the author can be found at http://www.gedanken.demon.co.uk/wwwoffle/.
I've been using WWWOFFLE on Linux for a number of years now, and I've found
it to be very useful in keeping my phone costs down. I have a permanent connection
to the Internet now, but I still use WWWOFFLE to share the internet connection
on my home LAN, to filter out ads and other undesirable stuff, and to keep
backup copies of the webpages I've visited.
I've come to know WWWOFFLE so intimately now that I find it hard to switch
to another proxy, even though there are plenty of alternatives.
As with almost any piece of software, WWWOFFLE has its quirks. Because
WWWOFFLE is licensed under the GPL and the source is freely available, I was
able to fix most of the features that I found defective or lacking myself.
The WWWOFFLE source is nicely structured and quite readable, and I found
hacking WWWOFFLE quite enjoyable, even addictive to a certain degree.
Over time I have made numerous changes to the WWWOFFLE source: fixing bugs, adding a few features, and sometimes just modifying the code to suit my personal taste. Even though I like the overall structure of WWWOFFLE, I disagree with a number of the implementation details, and I've changed most of the things I didn't like.
Out of respect for the free-software community, I'm making my modifications available via this webpage as a patch file, wwwoffle-2.9d-par.diff.gz. (Previous patches are still available from this page.)
To use this patch file, first untar a fresh copy of the original WWWOFFLE
version 2.9d source and cd into the source directory wwwoffle-2.9d.
Then apply the patch using the command
    gzip -cd <path_to_patch>/wwwoffle-2.9d-par.diff.gz | patch -p2 -N -E
Then run ./configure, make and make install
just as you would with the original source code. Note: The current patch breaks
IPv6 functionality. Because IPv6 is enabled by default in version 2.9, you will
need to configure using the --without-ipv6 option.
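The patching step above can be wrapped in a small shell function. This is only a sketch: the function name apply_par_patch and its two arguments are made up for illustration, and the patch path has to be absolute because the function changes into the source directory before running patch.

```shell
# Hypothetical helper (not part of WWWOFFLE): apply a gzipped patch inside
# a freshly unpacked source directory, using the patch flags quoted above.
#   $1 = source directory
#   $2 = absolute path to the .diff.gz file
apply_par_patch() {
    src_dir=$1
    patch_file=$2
    # Run in a subshell so the caller's working directory is left unchanged.
    (
        cd "$src_dir" || exit 1
        gzip -cd "$patch_file" | patch -p2 -N -E
    )
}
```

For example: apply_par_patch wwwoffle-2.9d <path_to_patch>/wwwoffle-2.9d-par.diff.gz, followed by ./configure --without-ipv6, make and make install inside the source directory.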
Important: Before you can start wwwoffled, you must make some changes to the CensorHeader section of the configuration file and convert the U* files in the cache to a new database file. The details can be found in the file README.par in the source directory after applying the patch (you can also download this file separately).
Here's a summary of the most important modifications I've made (in reverse chronological order):
Support for persistent ("keep-alive") connections, both to the client and to remote servers.
This can make WWWOFFLE significantly more efficient.
To use it, set allow-keep-alive = yes. Of course support must
also be enabled in the browser, but most browsers have this by default.
See README.par for more details.
New option keep-cache-if-header-matches. This option is similar to
keep-cache-if-not-found, but examines the header lines (instead of the status
code) of the reply from the remote server to determine whether the new page is
less desirable than the already cached one. I use this new option to prevent
a certain site that requires a subscription from overwriting a good copy of
an article with a login page. Unfortunately, this new option is not as generally
useful as I had hoped, because only a few sites of this type give
clues in the reply header which can be used to determine which replies contain
the full article text and which contain the login page.
New option insert-file. The insert-file option can be
used to add the content of a local file at the end of a webpage, without
modifications. One of the more powerful applications of this feature is to add
your own Javascript to webpages; this allows you to implement most of the
possibilities of Greasemonkey, a popular Firefox extension, even on browsers
that do not support Greasemonkey (but do support Javascript).
I use disable-script=yes to disable the website's
Javascript and use the insert-file option to add my own Javascript
to remove any remaining ads that are otherwise difficult to get rid of.
Here is an example of an include file that I
use to get rid of some of the more annoying ads on
linuxtoday.com.
URLs are now stored in a single database file, urlhashtable, instead of
the many small U* files. This reduces the number of files in the WWWOFFLE cache
by almost 50%. Having many small files is quite inefficient. For example, on my
system each U* file occupies 4K of disk space. With 78587 U* files that is 307MB
of occupied disk space. My urlhashtable file uses less than 7MB!
(The file will appear to be much larger, but actually occupies only a very modest
amount of disk space. See README.par for
details.)
Seeing that the wwwoffle-ls2 utility uses a
separate lookup table to make pattern matching of URLs significantly more
efficient gave me the idea of implementing this directly in WWWOFFLE. My
implementation uses a single file which is mmapped to an area of address space
that is shared between all WWWOFFLE processes.
As a result, operation with "use-url = yes" is now much faster (by as much as a factor of
4 on my system). I also believe that storing the contents of webpages while
online has become somewhat faster (because only one file per webpage needs to be
written instead of two), but I haven't actually done any measurements to verify
this. Reading webpages stored in the cache will probably not be
significantly faster.
New option replacement-meta-refresh-time.
See README.par for more information.
New option always-use-etag. Certain server farms spoil the usefulness of ETags by
issuing different ETags for the same content, causing WWWOFFLE to needlessly
download the same page or image repeatedly, even though the content hasn't
changed at all. Setting always-use-etag = no can remedy this problem by
forcing WWWOFFLE to base conditional requests on the Last-Modified time only, if
it can be considered a strong validator.
There is also an option validate-with-etag, but I still
use Marc Boucher's implementation of this feature.
New option keep-cache-if-not-found. This
is useful for preventing old cached versions of pages being overwritten by
error messages from a web server. An implementation of this feature is also
available as a separate small patch file from this page.
New option cache-control-no-cache that can be used
in the offline section of the configuration file. This option works similarly
to "pragma-no-cache" and can be used to reduce the number of outgoing requests
that are generated when you hit the reload button of your web browser while
offline.
New option session-cookies-only. When enabled,
WWWOFFLE strips the expires field from Set-Cookie: server headers.
Most browsers will not store such cookies permanently and forget them in between
sessions.
A more detailed list of the changes I've made can be found in the README.par file in the source directory after patching. At the end of this file you can also find my email address should you wish to contact me.
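As a quick check of the disk-space figures quoted above for the U* files, the arithmetic can be reproduced in the shell. This sketch assumes that "4K" means 4096 bytes and that "MB" means 2^20 bytes; the variable names are arbitrary.

```shell
# Figures taken from the text: 78587 U* files, each occupying one 4K block.
u_files=78587
block_kib=4

# Total space in KiB, then rounded to the nearest MiB.
total_kib=$((u_files * block_kib))
total_mib=$(( (total_kib + 512) / 1024 ))

echo "U* files occupy about ${total_mib} MiB"
```

Under those assumptions the result is 307, which agrees with the 307MB mentioned for the U* files.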
PS: If you use pdnsd, you might also be interested in my pdnsd webpage.
Have fun.
Paul Rombouts.
