You've come to this page because you've asked questions similar to the following:
How does automatic proxy HTTP server configuration in web browsers work? How can I support it?
This is the Frequently Given Answer to such questions.
Automatic proxy HTTP server configuration involves three things:
Most web browsers may be configured manually, with a single, fixed, proxy HTTP server. However, Netscape Navigator version 2.0 and later and Microsoft's Internet Explorer version 3.0a and later may also instead be configured to use Proxy Auto-Configuration (PAC) files.
PAC files contain the text of a single JavaScript function, FindProxyForURL(). In theory, every time that a web object
is about to be fetched, the web browser invokes the JavaScript function
with two arguments: the URL of the object and the hostname
derived from that URL. The result of the function is a string comprising
a semicolon-separated sequence of one or more instructions that determine
whence the web browser is to fetch the object:
Instruction | Meaning |
---|---|
DIRECT | Fetch the object directly from the content HTTP server denoted by its URL |
PROXY name:port | Fetch the object via the proxy HTTP server at the given location (name and port) |
SOCKS name:port | Fetch the object via the SOCKS server at the given location (name and port) |
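As an illustration, here is a minimal sketch of such a function. The names proxy.example.com and backup.example.com, the port 3128, and the treatment of example.com as the "local" domain are all assumptions made up for this example. It deliberately uses only plain JavaScript string operations, rather than any of the helper functions that web browsers supply to PAC scripts.

```javascript
// Sketch of a PAC script. All host names and ports here are hypothetical.
function FindProxyForURL(url, host) {
    var name = host.toLowerCase();

    // Unqualified names, and names in the local domain: fetch directly.
    if (name.indexOf(".") < 0 || hasSuffix(name, ".example.com")) {
        return "DIRECT";
    }

    // Everything else: a sequence of instructions, separated by semicolons.
    return "PROXY proxy.example.com:3128; PROXY backup.example.com:3128; DIRECT";
}

// True if s ends with suffix (written out longhand for old script engines).
function hasSuffix(s, suffix) {
    return s.length >= suffix.length &&
           s.substring(s.length - suffix.length) === suffix;
}
```

The function receives the hostname ready-made as its second argument, so simple rules need not parse the URL at all.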
The
Netscape 2.0 documentation for PAC scripts
describes in detail the JavaScript facilities that are available for use
in the FindProxyForURL()
function.
In theory, the FindProxyForURL()
function is invoked every
time that an object is about to be fetched by the web browser.
In practice, however, Microsoft's Internet Explorer has what Microsoft
terms an "Automatic Proxy Result Cache". Whenever a proxy HTTP server
(located using the results of a call to the FindProxyForURL()
function or otherwise) is successfully contacted to fetch an object, the
APR cache is updated to contain that <hostname,server>
pair. If, when about to call the FindProxyForURL()
function,
Internet Explorer finds the host already listed in the APR cache, it uses
the proxy HTTP server listed in the APR cache entry instead of calling the
FindProxyForURL()
function again for the same host.
(The intent of the APR cache is to attempt to reduce the number of times
that the JavaScript function has to be run, and thus reduce the overhead
of fetching objects.)
Because Internet Explorer's APR cache is indexed by hostname, this means that it is impossible for a PAC script to reliably yield multiple different results according to any part of a URL in addition to the hostname. It is impossible, for example, to provide different proxy configurations according to the path portions of URLs on a single host.
Because Internet Explorer's APR cache caches the proxy HTTP server rather
than the full results of the FindProxyForURL()
function, this
means that fallback from one proxy HTTP server to another does not occur
in the event of a problem, even if the FindProxyForURL()
function returned a list of several proxy HTTP servers.
Microsoft's KnowledgeBase article #271361 summarizes these problems and describes how to turn Internet Explorer's APR cache off.
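Both consequences can be seen in a toy model of a cache that is indexed by hostname. The following sketch is not Internet Explorer's code. It merely mimics the behaviour described above, on the simplifying assumption that the first proxy HTTP server in each result is always contacted successfully. The function names and the example.com and example.org hosts are hypothetical.

```javascript
// Toy model of a hostname-indexed proxy cache. This is NOT Internet
// Explorer's implementation; it only imitates the described behaviour.
function makeChooser(findProxyForURL) {
    var cache = {};   // hostname -> the single instruction that was used

    return function chooseProxy(url, host) {
        if (Object.prototype.hasOwnProperty.call(cache, host)) {
            return cache[host];          // the script is not consulted again
        }
        var instructions = findProxyForURL(url, host).split(";");
        var first = instructions[0].trim();
        cache[host] = first;             // only one server is remembered,
        return first;                    // so the fallback list is lost
    };
}

// A hypothetical script that tries to vary its answer by URL path.
function pathSensitive(url, host) {
    return url.indexOf("/big/") >= 0
        ? "PROXY bulk.example.com:3128; DIRECT"
        : "PROXY fast.example.com:3128; DIRECT";
}

var choose = makeChooser(pathSensitive);
var a = choose("http://www.example.org/index.html", "www.example.org");
var b = choose("http://www.example.org/big/file.iso", "www.example.org");
// a and b are identical: the second request is answered from the cache,
// even though the script would have returned something different for it.
```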
Microsoft's Internet Explorer also caches information about "bad" proxy HTTP servers for 30 minutes. This has no direct bearing upon PAC scripts, except that it often causes confusion when people are setting up a proxy HTTP server and creating a PAC script at the same time, and a problem with the proxy HTTP server, causing it to be cached as "bad" for 30 minutes, is misdiagnosed as a problem with the PAC script.
This article from Microsoft Internet Developer contains a few examples of PAC scripts, as does the Microsoft Internet Explorer 6 documentation.
John R. LoVerso has created a PAC script that recognizes the URLs of many advertisement publishing services and redirects them, effectively removing banner advertisements by stopping the web browser from even trying to contact the advertisement publishing service.
"Bruce" has created a similar, shorter and less comprehensive, PAC script for the same thing.
Web browsers download PAC scripts once, either at program startup or (as is the case with Mozilla) when the web browser component is first invoked, and again when explicitly instructed to "re-load" them by the user. (Users thus have to manually re-load PAC scripts when their view of proxy HTTP services changes, such as when they reconnect a machine via a different ISP.)
Officially, web browsers obtain PAC scripts via HTTP, requiring that PAC scripts be published by content HTTP servers that are directly reachable by the web browsers. (In practice, Mozilla, for one, is also happy to read PAC scripts directly from a file on the machine or to use other protocols. Microsoft's Internet Explorer versions 5 and later are similarly permissive, although version 4 is strict about the HTTP requirement.)
PAC scripts are not required to have any particular names. However, they
are officially required to have the application/x-ns-proxy-autoconfig
MIME type. (In practice, Mozilla, for one, is lax about the MIME type, and
will have no problems with PAC scripts that have other MIME types such as
text/plain. Netscape Navigator is reported to be somewhat stricter about
the MIME type, however.)
Since it is easiest with most content HTTP server softwares to configure MIME
types to be automatically determined from filename extensions, the common
convention is for PAC scripts to have names that end in .pac, and
for the content HTTP server software to associate the .pac
extension with the application/x-ns-proxy-autoconfig
MIME type.
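What such an association amounts to can be sketched in a few lines. This is an illustration of the idea only, not the code of any particular content HTTP server. The table contents other than the .pac entry, and the fallback type for unknown extensions, are assumptions.

```javascript
// Hypothetical extension-to-type table of the kind that a content HTTP
// server consults when choosing the Content-Type for a file.
var TYPES_BY_EXTENSION = {
    "pac": "application/x-ns-proxy-autoconfig",
    "html": "text/html",
    "txt": "text/plain"
};

// Deduce the MIME type from the filename extension.
function contentTypeFor(filename) {
    var dot = filename.lastIndexOf(".");
    var ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase();
    return Object.prototype.hasOwnProperty.call(TYPES_BY_EXTENSION, ext)
        ? TYPES_BY_EXTENSION[ext]
        : "application/octet-stream";   // assumed default for unknown types
}
```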
For Apache the web server administrator publishes the
file with a name that ends in .pac, such as
proxy.pac, and configures Apache to deduce the
application/x-ns-proxy-autoconfig
MIME type from the
.pac filename extension, either with the following directive in its configuration file:
AddType application/x-ns-proxy-autoconfig .pac
or with the following line in its mime.types file:
application/x-ns-proxy-autoconfig pac
For Netscape's web server the web server
administrator publishes the file with a name that ends in .pac,
such as proxy.pac, and configures Netscape server to deduce the
application/x-ns-proxy-autoconfig
MIME type from the
.pac filename extension with the following line in the
mime.types file:
application/x-ns-proxy-autoconfig pac
For Microsoft's Internet Information Server the web
server administrator publishes the file with a name that ends in
.pac, such as proxy.pac, and configures IIS to deduce
the application/x-ns-proxy-autoconfig
MIME type from the
.pac filename extension by adding the association between the two
to
the "MIME Map" section of the "File Types"
section of the "HTTP Headers" section of the web site properties.
For httpd in
The Internet Utilities the web
server administrator publishes the file with a name that ends in
.pac, such as proxy.pac, and configures
httpd to deduce the application/x-ns-proxy-autoconfig
MIME type from the .pac filename extension by adding the
association between the two via the CONTENTTYPE_EXT_PAC
environment variable that the server inherits:
setenv CONTENTTYPE_EXT_PAC application/x-ns-proxy-autoconfig
Of course, the .pac filename extension is only a convention. With
some content HTTP server softwares the
application/x-ns-proxy-autoconfig
MIME type can be directly
associated with the file, and the name can be anything one chooses.
For httpd in Dan Bernstein's publicfile the web server administrator can simply publish the file using publicfile's mechanism for encoding the MIME type directly into the filename, publishing the file with the name (for example) auto-proxy.application=x-ns-proxy-autoconfig.
For httpd in
The Internet Utilities the web
server administrator publishes the file with any name that he/she likes, having
set the application/x-ns-proxy-autoconfig
MIME type of the file
via its extended attributes.
The content HTTP server publishing a PAC script (assuming that the web browser is using HTTP to obtain it) should be reliable, directly accessible, and continuously available whilst web browsers configured to download the PAC script from it may be operating. This is because web browsers do not have good failure modes if they are unable to download PAC scripts:
If they are unable to download a PAC script, Netscape Navigator and Mozilla display
an error message, after a timeout. They then apply their URL "search path"
rules to the URL for the PAC script (attempting, for example, to contact
http://www.proxy.com./ if the PAC script is named proxy.pac), which is a
highly undesirable security loophole (since it allows the owners of
http://www.proxy.com./ to supply spoof PAC scripts).
If it is unable to download a PAC script, Microsoft's Internet Explorer displays an error message, after a 60 second timeout, and provides the option of continuing without using a proxy HTTP server auto-configuration.
One example of a situation where such problems will become visible is attempting to use a web browser to view a local HTML document whilst the machine is disconnected from the network.
The example of providing redundant content HTTP servers that is given in
§ 5.4 of the squid FAQ document is wrong.
It relies upon the notion of "multiple CNAMEs",
that
is contrary to the DNS paradigm,
that only ever "worked" in one particular DNS server software (ISC's BIND)
because of a bug
that
we were warned many years ago would be fixed
and which has now been fixed,
and that
has never worked with any other DNS server softwares.
Client-side aliases in the DNS are one-to-one mappings. To provide one-to-many
mappings, use multiple A
resource records.
Best practice is not to provide any sort of promiscuous proxy service to the whole of Internet. This includes proxy HTTP service. Best practice is to have all of one's proxy servers (proxy HTTP servers, proxy DNS servers, and so forth) listening on IP addresses that are not reachable by the rest of Internet.
As such, one might find oneself in the situation where a "roaming" user has left xyr web browser configured to use one's PAC script (served up by one's content HTTP server, of course, and which thus may be publicly reachable), which is directing it to use a proxy HTTP server that the user, not being "internally" connected to one's organisation, has no actual access to.
One way to avoid this problem is to employ "split horizon" HTTP service, with
one version of the PAC script, containing the real proxy information, being
published to "internal" users, and another version of the PAC script, containing
a FindProxyForURL() function that always returns "DIRECT", being published to
the rest of Internet.
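The selection between the two versions can be sketched as follows. This is only an outline of the idea, not a recipe for any particular content HTTP server. The 10.0.0.0/8 "internal" address range, the proxy.example.com name and port, and the function names are all assumptions made for the sake of the example.

```javascript
// Sketch: choose which PAC script text to publish, by client address.
// The address range and the proxy name below are hypothetical.
var INTERNAL_PAC =
    'function FindProxyForURL(url, host) {\n' +
    '    return "PROXY proxy.example.com:3128";\n' +
    '}\n';
var EXTERNAL_PAC =
    'function FindProxyForURL(url, host) {\n' +
    '    return "DIRECT";\n' +
    '}\n';

// True for dotted-quad addresses in the (assumed) internal 10.0.0.0/8 range.
function isInternal(clientAddress) {
    return /^10\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(clientAddress);
}

// "Internal" clients get the real proxy information; everyone else gets
// a script that always returns "DIRECT".
function pacScriptFor(clientAddress) {
    return isInternal(clientAddress) ? INTERNAL_PAC : EXTERNAL_PAC;
}
```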
Another way to avoid this problem is to use Web Proxy Auto-Discovery via DHCP, so that "roaming" users are only configured to use one's PAC script when they have actually obtained a lease for an IP address off one's DHCP server and are thus "internally" connected to one's organization.
The simplest way to configure web browsers to download and use PAC scripts is manually. The service provider publishes the PAC script on a suitable content HTTP server with the appropriate MIME type, and informs web browser users of its URL. Web browser users then enter that URL into their web browser configuration settings.
For Netscape Navigator and Mozilla the user enters the URL in the "Automatic proxy configuration URL" entry field of the "Proxies" page of the browser preferences notebook.
For Microsoft's Internet Explorer the user enters the URL as the "Automatic Configuration Script" in the "Automatic Configuration" dialog box in the "Advanced" tab of the browser options notebook.
Microsoft's Internet Explorer supports two mechanisms for automatically configuring it to download PAC scripts, under the banner of the Web Proxy Auto-Discovery (WPAD) protocol, which is described in detail here. With both mechanisms, Internet Explorer automatically determines the URL of the PAC script, without the user having to enter it manually.
The "DNS based" WPAD mechanism simply constructs a series of "well-known" URLs, starting with the machine's full primary domain name sans the initial label and proceeding to progressively shorter suffixes thereof until only a single label is left, as follows:
So, for example, if the machine's full primary domain name were workstation.division.country.example.com., the URLs would be
The web browser attempts to download a PAC script from each "well-known" URL in turn until it either succeeds or runs out of URLs.
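The construction of the "well-known" URLs can be sketched as follows. This is an illustration of the search as described above, not code taken from any web browser, and the function name is hypothetical. It assumes that the last suffix tried is the single remaining label, and it omits the trailing dots, as web browsers generally do.

```javascript
// Sketch of the "DNS based" WPAD URL search list for a machine name.
function wpadUrls(machineName) {
    // Drop any trailing dot, then split the name into its labels.
    var labels = machineName.replace(/\.$/, "").split(".");
    var urls = [];
    // Skip the initial label; finish with the single-label suffix.
    for (var i = 1; i < labels.length; i++) {
        urls.push("http://wpad." + labels.slice(i).join(".") + "/wpad.dat");
    }
    return urls;
}

var urls = wpadUrls("workstation.division.country.example.com.");
// urls is, in order:
//   http://wpad.division.country.example.com/wpad.dat
//   http://wpad.country.example.com/wpad.dat
//   http://wpad.example.com/wpad.dat
//   http://wpad.com/wpad.dat
```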
It is thus necessary for the proxy server administrator to do the following:
Create a DNS mapping from one of the domain names to a content HTTP server.
For example: The administrator creates the appropriate DNS resource record sets to map wpad.example.com. to the IP address 10.0.0.80.
Ensure that the content HTTP server recognizes the host name.
For example: The administrator ensures that the content HTTP server recognises the virtual host name wpad.example.com..
Note: Some versions of web browsers are reported to use the raw IP address of the content HTTP server as the virtual host name, rather than the domain name which was used to look that IP address up.
Publish the PAC script as /wpad.dat in that (virtual) host.
For example: The administrator saves the PAC script as wpad.dat in the root directory of the (virtual) host.
For example: The administrator creates a server-side alias on that (virtual) host from /wpad.dat to the real location of the PAC script. An administrator may choose to do this in order to need only to maintain one actual physical PAC script. How an administrator does this will vary according to what content HTTP server software is being used:
For apache the administrator uses the following directive in the httpd.conf file:
Redirect permanent /wpad.dat /proxy.pac
For httpd in Dan Bernstein's publicfile the administrator sets up a symbolic link:
ln -s proxy.pac ./0/wpad.dat
Ensure that the wpad.dat script file is published with the correct MIME type. Again, how this is done depends upon what content HTTP server software is being used:
For Apache the web server administrator configures
Apache to deduce the application/x-ns-proxy-autoconfig
MIME type
from the .dat filename extension, either with the following directive in its configuration file:
AddType application/x-ns-proxy-autoconfig .dat
or with the following line in its mime.types file:
application/x-ns-proxy-autoconfig dat
For Netscape's web server the web server
administrator configures Netscape server to deduce the
application/x-ns-proxy-autoconfig
MIME type from the
.dat filename extension with the following line in the
mime.types file:
application/x-ns-proxy-autoconfig dat
For Microsoft's Internet Information Server the web
server administrator configures IIS to deduce the
application/x-ns-proxy-autoconfig
MIME type from the
.dat filename extension by adding the association between the two
to
the "MIME Map" section of the "File Types"
section of the "HTTP Headers" section of the web site properties.
For httpd in
The Internet Utilities the web
server administrator sets the MIME type of /wpad.dat to be
application/x-ns-proxy-autoconfig
via its extended attributes.
Note that there is no way to set the correct MIME type for /wpad.dat with httpd in Dan Bernstein's publicfile.
The "DHCP based" WPAD mechanism simply passes the URL of the PAC script as option number 252 in the DHCP lease granted to the machine. The web browser obtains the URL from the lease, and simply downloads the PAC script from there.
It is thus necessary for the proxy server administrator to ensure that the DHCP Server is configured to hand out option 252 in the leases that it grants, containing the URL of the PAC script.
One caveat: Microsoft's Internet Explorer version 6.01 expects the string in option 252 to be NUL-terminated. As such, it unconditionally strips off the final octet of the string before using it. Earlier versions of Microsoft's Internet Explorer do not do this. To satisfy all versions, simply explicitly include a NUL as the last octet of the string.
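To make the caveat concrete, here is a sketch of how the octets of such an option could be assembled. It relies on the standard DHCP option layout of one code octet, one length octet, and then the data. The function name is hypothetical, and how the octets are actually handed to a particular DHCP server is outside the scope of the sketch.

```javascript
// Sketch: the octets of DHCP option 252 for a given PAC script URL,
// with an explicit NUL as the last octet of the string.
function wpadOption(url) {
    var data = [];
    for (var i = 0; i < url.length; i++) {
        var c = url.charCodeAt(i);
        if (c < 0x21 || c > 0x7e) {
            throw new Error("URL must be printable ASCII");
        }
        data.push(c);
    }
    data.push(0);                        // the explicit trailing NUL
    if (data.length > 255) {
        throw new Error("URL is too long for a single option");
    }
    return [252, data.length].concat(data);   // code, length, data
}
```

A version that strips the final octet is left with exactly the URL, and a version that does not strip it sees a string that ends at the NUL.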
Web browsers have been a major source of security headaches over the years. Many of these headaches have been attributable to the simple bad design of having a web browser download from somewhere else on the network a program, whose code is written by and whose actions are thus determined by someone else, and then automatically run it on the local machine.
Unfortunately, PAC scripts, being JavaScript programs that are downloaded and automatically run by web browsers (every time that they wish to fetch web objects), employ exactly that bad design. Essentially: By employing a PAC script, a web browser user is running a program, written by a third party and downloaded from a web site on the network, on xyr machine under the aegis of xyr user account, allowing it to do everything that xe can do.
This is a shame. Most PAC scripts are little more than long lists of
if
statements ("If the URL matches this pattern, return this
result."), and are little more than glorified sequential lookup tables. A far
better design would have had PAC scripts be only data, not executable
code, comprising just the lookup table and not the code to search it.
(The access control rules
database in UCSPI-TCP are a good example of a simple ruleset database
design that could have been followed.) The possibilities for malicious use
would have been far fewer.
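A data-only design of that kind might look something like the following sketch. The rule format (an ordered list of host name suffixes and results) is invented here purely for illustration, as are the names in it; it is not the UCSPI-TCP format nor any deployed one. The point is that only the table would be downloaded, whilst the code that searches it would be a fixed part of the web browser.

```javascript
// Hypothetical data-only format: an ordered list of [host suffix, result].
// This table is all that would be downloaded.
var RULES = [
    [".internal.example.com", "DIRECT"],
    [".example.com", "PROXY proxy.example.com:3128"],
    ["", "DIRECT"]                       // the empty suffix matches everything
];

// The search code, which would be part of the web browser itself.
function lookup(rules, host) {
    var name = "." + host.toLowerCase();
    for (var i = 0; i < rules.length; i++) {
        var suffix = rules[i][0];
        if (name.substring(name.length - suffix.length) === suffix) {
            return rules[i][1];          // first matching rule wins
        }
    }
    return "DIRECT";                     // assumed default if nothing matches
}
```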
Given the combination of this bad design with the quirky "search path" behaviour of Netscape Navigator when it fails to locate a PAC script at the URL that it is given, and with the automated search behaviour of "DNS based" Web Proxy Auto-Discovery, which allows anyone who can set up a content HTTP server for a suitable hostname to give Microsoft's Internet Explorer an arbitrary JavaScript program to run (and which will be run even if JavaScript is otherwise turned off), PAC scripts are a security nightmare.
Microsoft keeps trying to fix the security problems with "DNS based" WPAD and missing. (One of its purported fixes actually makes the problem worse.) In part, this is because it keeps fixing the wrong thing. Microsoft, the problem is in Internet Explorer, and that is what needs fixing. Stop fixing the wrong components. This isn't a DNS Client or a DNS Server flaw. It's a web browser flaw. Fix your web browser.
Some security advice:
Only configure web browsers to use PAC scripts published by entities that you trust.
For example: You have a contractual agreement with your own ISP. You have no contractual agreement with some randomly picked ISP from the rest of Internet. Both may publish malicious PAC scripts that do you harm if you configure your web browser to automatically download and run them. But you have far more comeback against the former than you have against the latter.
Don't enable "DHCP based" Web Proxy Auto-Discovery unless you trust all of the DHCP servers on the network you are attaching to.
For example: An attacker can always set up a bogus DHCP server, that hands out, in its leases, the URL of a malicious PAC script.
Don't enable "DNS based" Web Proxy Auto-Discovery unless you trust all of the content HTTP servers that could possibly be contacted.
The WPAD "DNS based" search algorithm has the usual flaw, shared by many mechanisms that attempt to stop "at the administrative boundary", of assuming that that boundary is universally in the same place when it patently isn't, by being defined to stop when it reaches a single-label domain name. A better algorithm (sadly, not deployed by any web browsers) would at the very least stop when it hits a domain name suffix with a non-empty SOA resource record set, implying a "zone" apex.
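Such a bounded search can be sketched as follows. No web browser is known to work this way, so this is purely an illustration. The function name is hypothetical, and hasSoa stands for whatever DNS lookup the caller supplies to report whether a domain name has a non-empty SOA resource record set. The sketch works with fully-qualified domain names throughout.

```javascript
// Sketch: generate WPAD domain names, stopping at the first suffix that
// has a SOA resource record set (taken to be a "zone" apex).
// hasSoa is a caller-supplied predicate; it is hypothetical here.
function boundedWpadNames(machineName, hasSoa) {
    var labels = machineName.replace(/\.$/, "").split(".");
    var names = [];
    for (var i = 1; i < labels.length; i++) {
        var suffix = labels.slice(i).join(".") + ".";   // fully qualified
        names.push("wpad." + suffix);
        if (hasSoa(suffix)) {
            break;                       // apex reached: search no further
        }
    }
    return names;
}

// Pretend DNS data for the example: only example.co.uk. is a zone apex.
function pretendHasSoa(name) {
    return name === "example.co.uk.";
}

var names = boundedWpadNames("workstation.division.example.co.uk.", pretendHasSoa);
// names is ["wpad.division.example.co.uk.", "wpad.example.co.uk."]
```

With this pretend data the search never reaches co.uk. or uk. at all.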
In addition, as if that weren't bad enough, the WPAD "DNS based" search algorithm is vulnerable to DNS Client search path effects as well. Without a trailing dot, domain names in URLs are not fully qualified, and WWW browsers generally don't put the trailing dots in URLs when they perform WPAD "DNS based" searches. So whilst the WWW browser is performing its own search algorithm, the DNS Client is performing a second search algorithm under the covers.
The upshot of this is that whilst the WWW browser may look for (observe the lack of trailing dots):
what actually happens, if the DNS Client has a search path with the domain suffix example.com., is that the actual search path used, with the combination of the WPAD search algorithm and the DNS Client search algorithm, is:
http://wpad.division.example.co.uk.com./wpad.dat, http://wpad.example.co.uk.com./wpad.dat, and http://wpad.co.uk.com./wpad.dat are controlled by CentralNic Ltd of London.
Sadly, Microsoft's idea of how to fix this was a Half-Baked Idea. Instead of fixing the lookup algorithm in the WWW browser, where the problem actually lies, it changed its DNS server and its DNS client, neither of which is the locus of the problem:
In December 2007, it published temporary and service fixes that prevented its DNS server from publishing information about domain names beginning with the label "wpad.". The DNS server gained a "Global Block List" which caused it to provide negative answers for domain names beginning with several labels, of which "wpad." is one.
It also issued a security advisory telling people how to configure Microsoft's DNS client so that it didn't do so much searching using its own search paths under the covers.
Neither fixes the problem. The irony is that the former actually makes
the situation worse. The DNS client adjustment doesn't stop
http://wpad.co.uk./wpad.dat from being downloaded and used,
because that name results from the search path in Internet Explorer,
not from the search path in the DNS client. And the DNS server
Global Block List, which makes the DNS server report
"wpad.example.co.uk." as not existing, forces WWW browsers to
always fall back to wpad.co.uk.. The fix that stops
someone within an organization from using WPAD to subvert the
organization's WWW browsers instead makes all of those WWW browsers
always go outside of the organization, to those nice people in
Brazil.
The correct fix is of course to fix Internet Explorer, which is where the flaw actually lies:
Internet Explorer should always use fully qualified domain names, properly terminated with dots, to locate the content HTTP servers, preventing the DNS client from doing any behind-the-scenes searching of its own at all, however it is configured. The WPAD algorithm is already using a search path. A second one in the DNS client is not needed, and using fully-qualified domain names is the correct, obvious, well-known, and documented, means of achieving this.
So Internet Explorer should, effectively, be searching as follows (the fully-qualified domain names should be passed to DNS lookup, even if they aren't used in the HTTP transaction itself):
Internet Explorer should perform SOA
DNS lookups on
each suffix that it appends, and stop after it has reached a
domain name that has one.
So in the above example, since division.example.co.uk.
doesn't have a SOA
resource record but
example.co.uk. does, the search list becomes:
Microsoft would do well to learn from Stuart Cheshire, pioneer of Zero Configuration, which also tries to automatically discover things given just a domain name to start with, much like WPAD, and which uses fully-qualified domain names to avoid these very problems, as he explains.