Automatic proxy HTTP server configuration in web browsers

You've come to this page because you've asked questions similar to the following:

How does automatic proxy HTTP server configuration in web browsers work? How can I support it?

This is the Frequently Given Answer to such questions.

Automatic proxy HTTP server configuration involves three things:

a proxy auto-configuration (PAC) script
publication of the PAC script
providing web browsers with the location of the PAC script, either manually or via the Web Proxy Auto-Discovery (WPAD) protocol

Proxy Auto-Configuration files

Most web browsers may be configured manually, with a single, fixed, proxy HTTP server. However, Netscape Navigator version 2.0 and later and Microsoft's Internet Explorer version 3.0a and later may also instead be configured to use Proxy Auto-Configuration (PAC) files.

PAC files are files that contain the text of a single JavaScript function, FindProxyForURL(). In theory, every time that a web object is about to be fetched, the web browser invokes the JavaScript function with two arguments: the URL of the object and the hostname derived from that URL. The result of the function is a string comprising a semi-colon-separated sequence of one or more instructions that determine whence the web browser is to fetch the object:

Instruction	Meaning
`DIRECT`	Fetch the object directly from the content HTTP server denoted by its URL
`PROXY` name:port	Fetch the object via the proxy HTTP server at the given location (name and port)
`SOCKS` name:port	Fetch the object via the SOCKS server at the given location (name and port)

The Netscape 2.0 documentation for PAC scripts describes in detail the JavaScript facilities that are available for use in the FindProxyForURL() function.
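
As a minimal sketch of what such a file can contain, the following FindProxyForURL() fetches an organization's own hosts directly and sends everything else via two proxies in turn, falling back to a direct fetch. The host names, the port number, and the hostIsInDomain() helper are placeholders invented for illustration. The sketch deliberately uses only core JavaScript rather than the helper functions that web browsers supply, so that it can also be exercised outside of a web browser.

```javascript
// Hypothetical helper: true if host is the domain itself or a subdomain of it.
function hostIsInDomain(host, domain) {
    var suffix = "." + domain;
    return host === domain ||
        (host.length > suffix.length &&
         host.substring(host.length - suffix.length) === suffix);
}

function FindProxyForURL(url, host) {
    // Host names are case-insensitive; also drop any trailing dot.
    var h = host.toLowerCase().replace(/\.$/, "");

    // Unqualified names and the (placeholder) local domain are fetched directly.
    if (h.indexOf(".") === -1 || hostIsInDomain(h, "example.com")) {
        return "DIRECT";
    }

    // Everything else: first proxy, then second proxy, then a direct fetch.
    return "PROXY proxy1.example.com:3128; PROXY proxy2.example.com:3128; DIRECT";
}
```

The returned string is the semi-colon-separated instruction sequence described above; the web browser is expected to try the instructions in order.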

Proxy caching in Microsoft's Internet Explorer

In theory, the FindProxyForURL() function is invoked every time that an object is about to be fetched by the web browser.

In practice, however, Microsoft's Internet Explorer has what Microsoft terms an "Automatic Proxy Result Cache". Whenever a proxy HTTP server (located using the results of a call to the FindProxyForURL() function or otherwise) is successfully contacted to fetch an object, the APR cache is updated to contain that <hostname,server> pair. If, when about to call the FindProxyForURL() function, Internet Explorer finds the host already listed in the APR cache, it uses the proxy HTTP server listed in the APR cache entry instead of calling the FindProxyForURL() function again for the same host. (The intent of the APR cache is to attempt to reduce the number of times that the JavaScript function has to be run, and thus reduce the overhead of fetching objects.)

Because Internet Explorer's APR cache is indexed by hostname, this means that it is impossible for a PAC script to reliably yield multiple different results according to any part of a URL in addition to the hostname. It is impossible, for example, to provide different proxy configurations according to the path portions of URLs on a single host.

Because Internet Explorer's APR cache caches the proxy HTTP server rather than the full results of the FindProxyForURL() function, this means that fallback from one proxy HTTP server to another does not occur in the event of a problem, even if the FindProxyForURL() function returned a list of several proxy HTTP servers.
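
Both consequences can be seen in a toy model of a cache that is indexed by hostname. This is only an illustrative sketch, not Internet Explorer's actual code: the function names and proxy names are invented, and the model simply assumes that the first proxy in each result is contacted successfully.

```javascript
// Toy model of a per-host proxy result cache. NOT Internet Explorer's code.
function makeCachingResolver(findProxyForURL) {
    var cache = {};   // hostname -> the one proxy that was (assumed to be) used successfully
    return function (url, host) {
        if (Object.prototype.hasOwnProperty.call(cache, host)) {
            return cache[host];          // the PAC function is not consulted again
        }
        // Only the first instruction is remembered; the fallbacks are discarded.
        var first = findProxyForURL(url, host).split(";")[0].trim();
        cache[host] = first;
        return first;
    };
}

// A hypothetical PAC function that tries to distinguish paths on a single host.
function pathSensitive(url, host) {
    return url.indexOf("/downloads/") !== -1
        ? "PROXY bulk.example.com:3128; DIRECT"
        : "PROXY fast.example.com:3128; DIRECT";
}

var resolve = makeCachingResolver(pathSensitive);
var first = resolve("http://www.example.org/index.html", "www.example.org");
var second = resolve("http://www.example.org/downloads/big.iso", "www.example.org");
// first and second are the same proxy: the second URL never reaches pathSensitive().
```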

Microsoft's KnowledgeBase article #271361 summarizes these problems and describes how to turn Internet Explorer's APR cache off.

Microsoft's Internet Explorer also caches information about "bad" proxy HTTP servers for 30 minutes. This has no direct bearing upon PAC scripts, except that it often causes confusion when people are setting up a proxy HTTP server and creating a PAC script at the same time, and a problem with the proxy HTTP server, causing it to be cached as "bad" for 30 minutes, is misdiagnosed as a problem with the PAC script.

Examples of PAC scripts

This article from Microsoft Internet Developer contains a few examples of PAC scripts, as does the Microsoft Internet Explorer 6 documentation.

John R. LoVerso has created a PAC script that recognizes the URLs of many advertisement publishing services and redirects them, effectively removing banner advertisements by stopping the web browser from even trying to contact the advertisement publishing service.

"Bruce" has created a similar, shorter and less comprehensive, PAC script for the same thing.

Publishing PAC scripts for web browsers to download

Web browsers download PAC scripts once either at program startup or (as is the case with Mozilla) when the web browser component is first invoked; and again when explicitly instructed to "re-load" them by the user. (Users thus have to manually re-load PAC scripts when their view of proxy HTTP services changes, such as when they reconnect a machine via a different ISP for example.)

Officially, web browsers obtain PAC scripts via HTTP, requiring that PAC scripts be published by content HTTP servers that are directly reachable by the web browsers. (In practice, Mozilla, for one, is also happy to read PAC scripts directly from a file on the machine or to use other protocols. Microsoft's Internet Explorer versions 5 and later are similarly permissive, although version 4 is strict about the HTTP requirement.)

PAC scripts are not required to have any particular names. However, they are officially required to have the application/x-ns-proxy-autoconfig MIME type. (In practice, Mozilla, for one, is lax about the MIME type, and will have no problems with PAC scripts that have other MIME types such as text/plain. Netscape Navigator is reported to be somewhat stricter about the MIME type, however.)

Since it is easiest with most content HTTP server softwares to configure MIME types to be automatically determined from filename extensions, the common convention is for PAC scripts to have names that end in .pac, and for the content HTTP server software to associate the .pac extension with the application/x-ns-proxy-autoconfig MIME type.

- For apache the web server administrator publishes the file with a name that ends in .pac, such as proxy.pac, and configures apache to deduce the application/x-ns-proxy-autoconfig MIME type from the .pac filename extension with
  - the following directive in the httpd.conf file:
    AddType application/x-ns-proxy-autoconfig .pac
    or
  - the following line in the mime.types file:
    application/x-ns-proxy-autoconfig pac
- For Netscape's web server the web server administrator publishes the file with a name that ends in .pac, such as proxy.pac, and configures Netscape server to deduce the application/x-ns-proxy-autoconfig MIME type from the .pac filename extension with the following line in the mime.types file:
  application/x-ns-proxy-autoconfig pac
- For Microsoft's Internet Information Server the web server administrator publishes the file with a name that ends in .pac, such as proxy.pac, and configures IIS to deduce the application/x-ns-proxy-autoconfig MIME type from the .pac filename extension by adding the association between the two to the "MIME Map" section of the "File Types" section of the "HTTP Headers" section of the web site properties.
- For httpd in The Internet Utilities the web server administrator publishes the file with a name that ends in .pac, such as proxy.pac, and configures httpd to deduce the application/x-ns-proxy-autoconfig MIME type from the .pac filename extension by adding the association between the two via the CONTENTTYPE_EXT_PAC environment variable that the server inherits:
  setenv CONTENTTYPE_EXT_PAC application/x-ns-proxy-autoconfig

Of course, the .pac filename extension is only a convention. With some content HTTP server softwares the application/x-ns-proxy-autoconfig MIME type can be directly associated with the file, and the name can be anything one chooses.

- For httpd in Dan Bernstein's publicfile the web server administrator can simply publish the file using publicfile's mechanism for encoding the MIME type directly into the filename, publishing the file with the name (for example) auto-proxy.application=x-ns-proxy-autoconfig.
- For httpd in The Internet Utilities the web server administrator publishes the file with any name that he/she likes, having set the application/x-ns-proxy-autoconfig MIME type of the file via its extended attributes.
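
Whichever server software is used, what matters to the web browser is the Content-Type header that accompanies the script. As a rough sketch of that requirement, here is a request handler of the shape that Node.js's http.createServer() accepts. Node.js is not one of the servers discussed here; the /proxy.pac path, the script text, and the names servePac, PAC_MIME_TYPE and PAC_TEXT are assumptions made for illustration.

```javascript
var PAC_MIME_TYPE = "application/x-ns-proxy-autoconfig";

// Placeholder script text; a real deployment would publish its own PAC script.
var PAC_TEXT = 'function FindProxyForURL(url, host) { return "DIRECT"; }\n';

// Handler with the (request, response) signature that http.createServer() expects.
function servePac(request, response) {
    if (request.url === "/proxy.pac") {
        // The MIME type is attached directly, whatever the file is called.
        response.writeHead(200, { "Content-Type": PAC_MIME_TYPE });
        response.end(PAC_TEXT);
    } else {
        response.writeHead(404, { "Content-Type": "text/plain" });
        response.end("Not found\n");
    }
}

// To serve it for real (port 8080 is an arbitrary choice):
//   require("http").createServer(servePac).listen(8080);
```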

The effects of problems with the content HTTP service publishing the PAC script

The content HTTP server publishing a PAC script (assuming that the web browser is using HTTP to obtain it) should be reliable, directly accessible, and continuously available whilst web browsers configured to download the PAC script from it may be operating. This is because web browsers do not have good failure modes if they are unable to download PAC scripts:

If they are unable to download a PAC script, Netscape Navigator and Mozilla display an error message, after a timeout. They will then apply their URL "search path" rules to the URL for the PAC script (attempting, for example, to contact http://www.proxy.com./ if the PAC script is named proxy.pac), which is a highly undesirable security loophole (since it allows the owners of http://www.proxy.com./ to supply spoof PAC scripts).
If it is unable to download a PAC script, Microsoft's Internet Explorer displays an error message, after a 60 second timeout, and provides the option of continuing without using a proxy HTTP server auto-configuration.

One example of a situation where such problems will become visible is attempting to use a web browser to view a local HTML document whilst the machine is disconnected from the network.

The `squid` FAQ document's recommendation for dealing with this is wrong

The example of providing redundant content HTTP servers that is given in § 5.4 of the squid FAQ document is wrong. It relies upon the notion of "multiple CNAMEs". That notion is contrary to the DNS paradigm. It only ever "worked" in one particular DNS server software (ISC's BIND), because of a bug that we were warned many years ago would be fixed and which has now been fixed, and it has never worked with any other DNS server softwares. Client-side aliases in the DNS are one-to-one mappings. To provide one-to-many mappings, use multiple A resource records.

Avoiding promiscuous proxy HTTP service

Best practice is not to provide any sort of promiscuous proxy service to the whole of Internet. This includes proxy HTTP service. Best practice is to have all of one's proxy servers (proxy HTTP servers, proxy DNS servers, and so forth) listening on IP addresses that are not reachable by the rest of Internet.

As such, one might find oneself in the situation where a "roaming" user has left xyr web browser configured to use one's PAC script (served up by one's content HTTP server, of course, and which thus may be publicly reachable), which is directing it to use a proxy HTTP server that the user, not being "internally" connected to one's organisation, has no actual access to.

One way to avoid this problem is to employ "split horizon" HTTP service, with one version of the PAC script, containing the real proxy information, being published to "internal" users, and another version of the PAC script, containing a FindProxyForURL() function that always returns "DIRECT", being published to the rest of Internet.

Another way to avoid this problem is to use Web Proxy Auto-Discovery via DHCP, so that "roaming" users are only configured to use one's PAC script when they have actually obtained a lease for an IP address off one's DHCP server and are thus "internally" connected to one's organization.

Configuring web browsers manually with the locations of PAC scripts

The simplest way to configure web browsers to download and use PAC scripts is manually. The service provider publishes the PAC script on a suitable content HTTP server with the appropriate MIME type, and informs web browser users of its URL. Web browser users then enter that URL into their web browser configuration settings.

For Netscape Navigator and Mozilla the user enters the URL in the "Automatic proxy configuration URL" entry field of the "Proxies" page of the browser preferences notebook.
For Microsoft's Internet Explorer the user enters the URL as the "Automatic Configuration Script" in the "Automatic Configuration" dialog box in the "Advanced" tab of the browser options notebook.

Web Proxy Auto-Discovery protocol

Microsoft's Internet Explorer supports two mechanisms for automatically configuring it to download PAC scripts, under the banner of the Web Proxy Auto-Discovery (WPAD) protocol, which is described in detail here. With both mechanisms, Internet Explorer automatically determines the URL of the PAC script, without the user having to enter it manually.

WPAD mechanism 1: "DNS based"

The "DNS based" WPAD mechanism simply constructs a series of "well-known" URLs, starting with the machine's full primary domain name sans the initial label and proceeding to progressively shorter suffixes thereof until only a single label is left, as follows:

The scheme is http://.
The domain name is wpad.current-suffix.
The path is /wpad.dat.

Sadly, there's a bug in at least one implementation of "DNS based" WPAD. The list of URLs in the example below is the one that the specification dictates. The algorithm in the specification says to stop when it strips the "example." off the front of "example.com." and finds that it has reached just "com.". The algorithm as implemented in several softwares fails to stop there, as Duane Wessels, who controls http://wpad.com./wpad.dat, readily attests. He publishes his server request statistics for WWW browsers attempting to download that file. (He also publishes letters sent to him that make their senders look very foolish.)

So, for example, if the machine's full primary domain name were workstation.division.country.example.com., the URLs would be

http://wpad.division.country.example.com./wpad.dat
http://wpad.country.example.com./wpad.dat
http://wpad.example.com./wpad.dat

The web browser attempts to download a PAC script from each "well-known" URL in turn until it either succeeds or runs out of URLs.
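
The construction of that list can be sketched as a short function. This is an illustration of the rule as it is described here, not code taken from any web browser; the function name wpadCandidateUrls is invented.

```javascript
// Build the "well-known" WPAD URLs for a machine's full primary domain name.
function wpadCandidateUrls(fqdn) {
    // Remove the trailing dot (if any) and split the name into labels.
    var labels = fqdn.replace(/\.$/, "").split(".");
    var urls = [];
    // Start after the initial label; stop before the suffix shrinks to one label.
    for (var i = 1; i < labels.length - 1; i++) {
        var suffix = labels.slice(i).join(".");
        urls.push("http://wpad." + suffix + "./wpad.dat");
    }
    return urls;
}

var urls = wpadCandidateUrls("workstation.division.country.example.com.");
// urls holds the three URLs listed above, longest suffix first.
```

Note that the loop bound is what implements the stopping rule: no URL is generated for the single-label suffix "com.".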

It is thus necessary for the proxy server administrator to do the following:

1. Create a DNS mapping from one of the domain names to a content HTTP server.

   For example: The administrator creates the appropriate DNS resource record sets to map wpad.example.com. to the IP address 10.0.0.80.
2. Ensure that the content HTTP server recognizes the host name.

   For example: The administrator ensures that the content HTTP server recognises the virtual host name wpad.example.com..

   Note: Some versions of web browsers are reported to use the raw IP address of the content HTTP server as the virtual host name, rather than the domain name which was used to look that IP address up.
3. Publish the PAC script as /wpad.dat in that (virtual) host.

   For example: The administrator saves the PAC script as wpad.dat in the root directory of the (virtual) host.

   For example: The administrator creates a server-side alias on that (virtual) host from /wpad.dat to the real location of the PAC script. An administrator may choose to do this in order to need only to maintain one actual physical PAC script. How an administrator does this will vary according to what content HTTP server software is being used:
   - For apache the administrator uses the following directive in the httpd.conf file:
     Redirect permanent /wpad.dat /proxy.pac
   - For httpd in Dan Bernstein's publicfile the administrator sets up a symbolic link:
     ln -s proxy.pac ./0/wpad.dat
4. Ensure that the wpad.dat script file is published with the correct MIME type. Again, how this is done depends upon what content HTTP server software is being used:
   - For apache the web server administrator configures apache to deduce the application/x-ns-proxy-autoconfig MIME type from the .dat filename extension with
     - the following directive in the httpd.conf file:
       AddType application/x-ns-proxy-autoconfig .dat
       or
     - the following line in the mime.types file:
       application/x-ns-proxy-autoconfig dat
   - For Netscape's web server the web server administrator configures Netscape server to deduce the application/x-ns-proxy-autoconfig MIME type from the .dat filename extension with the following line in the mime.types file:
     application/x-ns-proxy-autoconfig dat
   - For Microsoft's Internet Information Server the web server administrator configures IIS to deduce the application/x-ns-proxy-autoconfig MIME type from the .dat filename extension by adding the association between the two to the "MIME Map" section of the "File Types" section of the "HTTP Headers" section of the web site properties.
   - For httpd in The Internet Utilities the web server administrator sets the MIME type of /wpad.dat to be application/x-ns-proxy-autoconfig via its extended attributes.

   Note that there is no way to set the correct MIME type for /wpad.dat with httpd in Dan Bernstein's publicfile.

WPAD mechanism 2: "DHCP based"

The "DHCP based" WPAD mechanism simply passes the URL of the PAC script as option number 252 in the DHCP lease granted to the machine. The web browser obtains the URL from the lease, and simply downloads the PAC script from there.

It is thus necessary for the proxy server administrator to ensure that the DHCP Server is configured to hand out option 252 in the leases that it grants, containing the URL of the PAC script.

One caveat: Microsoft's Internet Explorer version 6.01 expects the string in option 252 to be NUL-terminated. As such, it unconditionally strips off the final octet of the string before using it. Earlier versions of Microsoft's Internet Explorer do not do this. To satisfy all versions, simply explicitly include a NUL as the last octet of the string.
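
To make the octet layout concrete, here is a small sketch that encodes option 252 in the standard DHCP option form (one octet of option code, one octet of length, then the value), with the explicit NUL as the last octet of the string. The function name encodeWpadOption and the example URL are invented for illustration; how the resulting value is actually entered depends upon the DHCP server software in use.

```javascript
// Encode DHCP option 252 (the PAC script URL) as an array of octets,
// including an explicit NUL as the final octet of the string.
function encodeWpadOption(url) {
    var value = [];
    for (var i = 0; i < url.length; i++) {
        var c = url.charCodeAt(i);
        if (c < 0x21 || c > 0x7E) {
            throw new RangeError("URL must consist of printable ASCII characters");
        }
        value.push(c);
    }
    value.push(0);                       // the explicit NUL terminator
    if (value.length > 255) {
        throw new RangeError("URL is too long for a single DHCP option");
    }
    // Option code, length of the value (NUL included), then the value itself.
    return [252, value.length].concat(value);
}

var option = encodeWpadOption("http://wpad.example.com./wpad.dat");
```

A version that strips the final octet is then left with the intact URL, and a version that does not strip it sees a NUL-terminated string.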

Security considerations

Web browsers have been a major source of security headaches over the years. Many of these headaches have been attributable to the simple bad design of having a web browser download from somewhere else on the network a program, whose code is written by and whose actions are thus determined by someone else, and then automatically run it on the local machine.

Unfortunately, PAC scripts, being JavaScript programs that are downloaded and automatically run by web browsers (every time that they wish to fetch web objects), employ exactly that bad design. Essentially: By employing a PAC script, a web browser user is running a program, written by a third party and downloaded from a web site on the network, on xyr machine under the aegis of xyr user account, allowing it to do everything that xe can do.

This is a shame. Most PAC scripts are little more than long lists of if statements ("If the URL matches this pattern, return this result."), and are little more than glorified sequential lookup tables. A far better design would have had PAC scripts be only data, not executable code, comprising just the lookup table and not the code to search it. (The access control rules database in UCSPI-TCP is a good example of a simple ruleset database design that could have been followed.) The possibilities for malicious use would have been far fewer.

Given the combination of this bad design and the quirky "search path" behaviour of Netscape Navigator, when it fails to locate a PAC script at the URL that it is given, and the automated search behaviour of "DNS based" Web Proxy Auto-Discovery, allowing anyone who can set up a content HTTP server for a suitable hostname to give Microsoft's Internet Explorer an arbitrary JavaScript program to run (and which will be run even if JavaScript is otherwise turned off); PAC scripts are a security nightmare.

Microsoft keeps trying to fix the security problems with "DNS based" WPAD and missing. (One of its purported fixes actually makes the problem worse.) In part, this is because it keeps fixing the wrong thing. Microsoft, the problem is in Internet Explorer, and that is what needs fixing. Stop fixing the wrong components. This isn't a DNS Client or a DNS Server flaw. It's a web browser flaw. Fix your web browser.

Some security advice:

Only configure web browsers to use PAC scripts published by entities that you trust.

For example: You have a contractual agreement with your own ISP. You have no contractual agreement with some randomly picked ISP from the rest of Internet. Both may publish malicious PAC scripts that do you harm if you configure your web browser to automatically download and run them. But you have far more comeback against the former than you have against the latter.
Don't enable "DHCP based" Web Proxy Auto-Discovery unless you trust all of the DHCP servers on the network you are attaching to.

For example: An attacker can always set up a bogus DHCP server, that hands out, in its leases, the URL of a malicious PAC script.
These are a selection of the people that you probably didn't know until now you have to trust when you use "DNS based" Web Proxy Auto-Discovery (data current as of November 2009):
- If you are in the United Kingdom …
  - … and are a company: Est. Adhemar Bebiano, of Rio de Janeiro, Brazil, who controls http://wpad.co.uk./wpad.dat
  - … and are an organization: Mr Tim Slack, of Stanton Hill, Nottinghamshire, who controls http://wpad.org.uk./wpad.dat
- If you are in Mexico …
  - … and are a company: Gabriel Almagar Guerza, of San Javier, Nueva Leon, who controls http://wpad.com.mx./wpad.dat
- If you are in New Zealand …
  - … and are a company: Beau Butler, of Auckland, who controls http://wpad.co.nz./wpad.dat
  - … and are an organization: Beau Butler, of Auckland, who controls http://wpad.org.nz./wpad.dat
- If you are in Brazil …
  - … and are a company: Marcos Baptista Moraes Machado, who controls http://wpad.com.br./wpad.dat
It should be mentioned that you also have to trust whomever these people happen to pay for their WWW hosting and their DNS hosting.

And if you are unlucky enough to be using softwares that, as pointed out earlier, don't even implement WPAD correctly even as specified, you also get to trust whoever owns http://wpad.TLD./wpad.dat for your enclosing top-level domain. Bad luck if your domain name is a subdomain of one of the top-level domains where someone is already publishing a maleficent PAC script that diverts WWW browsers, as was the case in November 2008 for 19 TLDs and in May 2009 for 42 TLDs.
Don't enable "DNS based" Web Proxy Auto-Discovery unless you trust all of the content HTTP servers that could possibly be contacted.

The WPAD "DNS based" search algorithm has the usual flaw, shared by many mechanisms that attempt to stop "at the administrative boundary", of assuming that that boundary is universally in the same place when it patently isn't, by being defined to stop when it reaches a single-label domain name. A better algorithm (sadly, not deployed by any web browsers) would at the very least stop when it hits a domain name suffix with a non-empty SOA resource record set, implying a "zone" apex.

In addition, as if that weren't bad enough, the WPAD "DNS based" search algorithm is vulnerable to DNS Client search path effects as well. Without a trailing dot, domain names in URLs are not fully qualified, and WWW browsers generally don't put the trailing dots in URLs when they perform WPAD "DNS based" searches. So whilst the WWW browser is performing its own search algorithm, the DNS Client is performing a second search algorithm under the covers.

The upshot of this is that whilst the WWW browser may look for (observe the lack of trailing dots):
1. http://wpad.division.example.co.uk/wpad.dat
2. http://wpad.example.co.uk/wpad.dat
3. http://wpad.co.uk/wpad.dat
what actually happens, if the DNS Client has a search path with the domain suffix example.com., is that the WPAD search algorithm and the DNS Client search algorithm combine to yield this actual search path:
1. http://wpad.division.example.co.uk.example.com./wpad.dat
2. http://wpad.division.example.co.uk.com./wpad.dat
3. http://wpad.division.example.co.uk./wpad.dat
4. http://wpad.example.co.uk.example.com./wpad.dat
5. http://wpad.example.co.uk.com./wpad.dat
6. http://wpad.example.co.uk./wpad.dat
7. http://wpad.co.uk.example.com./wpad.dat
8. http://wpad.co.uk.com./wpad.dat
9. http://wpad.co.uk./wpad.dat
http://wpad.division.example.co.uk.com./wpad.dat, http://wpad.example.co.uk.com./wpad.dat, and http://wpad.co.uk.com./wpad.dat, are controlled by CentralNic Ltd of London.
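
The interaction of the two search algorithms can be sketched as a pair of nested loops. This is a simplified model for illustration only: the function names are invented, the machine name workstation.division.example.co.uk is an assumed one that yields the three WPAD names above, and the DNS Client is modelled as trying its configured suffix, then each parent of that suffix, then the name on its own.

```javascript
// Names that the WPAD search generates, without trailing dots.
function wpadNames(machineName) {
    var labels = machineName.split(".");
    var names = [];
    for (var i = 1; i < labels.length - 1; i++) {
        names.push("wpad." + labels.slice(i).join("."));
    }
    return names;
}

// Suffixes that this model of a DNS Client appends: the configured suffix,
// each of its parents, and finally nothing at all.
function dnsClientSuffixes(searchSuffix) {
    var labels = searchSuffix.replace(/\.$/, "").split(".");
    var suffixes = [];
    for (var i = 0; i < labels.length; i++) {
        suffixes.push(labels.slice(i).join(".") + ".");
    }
    suffixes.push("");
    return suffixes;
}

// Every WPAD name is tried with every DNS Client suffix in turn.
function combinedSearch(machineName, searchSuffix) {
    var urls = [];
    var suffixes = dnsClientSuffixes(searchSuffix);
    wpadNames(machineName).forEach(function (name) {
        suffixes.forEach(function (suffix) {
            urls.push("http://" + name + "." + suffix + "/wpad.dat");
        });
    });
    return urls;
}

var combined = combinedSearch("workstation.division.example.co.uk", "example.com.");
// combined holds the nine URLs of the second list, in the same order.
```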

Sadly, Microsoft's idea of how to fix this was a Half-Baked Idea. Instead of fixing the lookup algorithm in the WWW browser, where the problem actually lies, it changed its DNS server and its DNS client, neither of which is the locus of the problem:
- In December 2007, it published temporary and service fixes that prevented its DNS server from publishing information about domain names beginning with the label "wpad.". The DNS server gained a "Global Block List" which caused it to provide negative answers for domain names beginning with several labels, of which "wpad." is one.
- It also issued a security advisory telling people how to configure Microsoft's DNS client so that it didn't do so much searching using its own search paths under the covers.
Neither fixes the problem. The irony is that the former actually makes the situation worse. The DNS client adjustment doesn't stop http://wpad.co.uk./wpad.dat from being downloaded and used, because that name results from the search path in Internet Explorer, not from the search path in the DNS client. And the DNS server Global Block List, that makes the DNS server report "wpad.example.co.uk." as not existing, forces WWW browsers to always fall back to wpad.co.uk.. The fix that stops someone within an organization from using WPAD to subvert the organization's WWW browsers instead makes all of those WWW browsers always go outside of the organization, to those nice people in Brazil.

The correct fix is of course to fix Internet Explorer, which is where the flaw actually lies:
- Internet Explorer should always use fully qualified domain names, properly terminated with dots, to locate the content HTTP servers, preventing the DNS client from doing any behind-the-scenes searching of its own at all, however it is configured. The WPAD algorithm is already using a search path. A second one in the DNS client is not needed, and using fully-qualified domain names is the correct, obvious, well-known, and documented, means of achieving this.
  
  So Internet Explorer should, effectively, be searching the following (the fully-qualified domain names should be passed to DNS lookup, even if they aren't used in the HTTP transaction itself):
  1. http://wpad.division.example.co.uk./wpad.dat
  2. http://wpad.example.co.uk./wpad.dat
  3. http://wpad.co.uk./wpad.dat
- Internet Explorer should perform SOA DNS lookups on each suffix that it appends, and stop after it has reached a domain name that has one.
  
  So in the above example, since division.example.co.uk. doesn't have a SOA resource record but example.co.uk. does, the search list becomes:
  1. http://wpad.division.example.co.uk./wpad.dat
  2. http://wpad.example.co.uk./wpad.dat
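
A sketch of that corrected search, under stated assumptions: the function name is invented, the machine name is an assumed one, and the hasSoa argument is a hypothetical predicate standing in for a real SOA lookup (it is supplied by the caller here so that the example needs no network access).

```javascript
// Generate fully-qualified WPAD URLs, stopping after the first suffix
// that has a SOA resource record set (a zone apex).
function wpadUrlsStoppingAtZoneApex(fqdn, hasSoa) {
    var labels = fqdn.replace(/\.$/, "").split(".");
    var urls = [];
    for (var i = 1; i < labels.length - 1; i++) {
        var suffix = labels.slice(i).join(".") + ".";   // always dot-terminated
        urls.push("http://wpad." + suffix + "/wpad.dat");
        if (hasSoa(suffix)) {
            break;   // zone apex reached: search no further
        }
    }
    return urls;
}

// Stand-in for SOA lookups: in this example only these names are zone apexes.
function exampleHasSoa(name) {
    return name === "example.co.uk." || name === "co.uk." || name === "uk.";
}

var safeUrls = wpadUrlsStoppingAtZoneApex("workstation.division.example.co.uk.", exampleHasSoa);
// safeUrls holds the two URLs of the list above; wpad.co.uk. is never reached.
```
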
Microsoft would do well to learn from Stuart Cheshire, pioneer of Zero Configuration, which also tries to automatically discover things given just a domain name to start with, much like WPAD, and which uses fully-qualified domain names to avoid these very problems, as he explains.

© Copyright 2004,2009 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.