Accepting the Q factor
http://www.w3.org/2000/09/xmldsig# is the namespace of the XML Signature schema, one of the many, many schemas you end up fetching if you do XML Schema based completion for WS-SecurityPolicy (2005), which is part of our WSDL policy editor in the Eclipse plugins for Sonic ESB Workbench. Why is this one special? Here is the reason.
If you access http://www.w3.org/2000/09/xmldsig# from Mozilla Firefox, you get back the schema at http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd (through an HTTP redirect with response code 303). But if you use Java's java.net.URL.openConnection() (which basically goes through HttpURLConnection), you get an HTML page instead of the schema (XML), which our schema loader does not particularly appreciate.
It took me a while to understand why the same URL behaves differently. I captured the headers sent by my code with Eclipse's TCP/IP Monitor, and the ones sent by Firefox with LiveHTTPHeaders.
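The Java side of this can be reproduced with a few lines. This is only a sketch: the class name DefaultFetchExample and the isXml helper are made up for illustration, and what the server returns today may differ from what is described here.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

public class DefaultFetchExample {

    // Hypothetical helper: true when a Content-Type value names a plain XML media type.
    static boolean isXml(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=iso-8859-1" before comparing.
        String type = contentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
        return type.equals("text/xml") || type.equals("application/xml");
    }

    public static void main(String[] args) throws IOException {
        // No request headers are set here, so the JDK defaults are used.
        URLConnection conn = new URL("http://www.w3.org/2000/09/xmldsig#").openConnection();
        // Reading the content type sends the request (redirects are followed by default).
        String contentType = conn.getContentType();
        System.out.println("Final URL:    " + conn.getURL());
        System.out.println("Content-Type: " + contentType);
        System.out.println("Looks like XML? " + isXml(contentType));
    }
}
```

A schema loader could use a check like isXml to fail with a clear message instead of trying to parse an HTML page.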
This is what Firefox sends -
GET /2000/09/xmldsig HTTP/1.1
Host: www.w3.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
and this is what it receives -
HTTP/1.x 303 See Other
Date: Mon, 26 Feb 2007 11:50:20 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.5
WWW-Authenticate: Basic realm="W3CACL"
Location: http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd
Keep-Alive: timeout=2, max=99
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
But this is what HttpURLConnection sends for the same request -
GET /2000/09/xmldsig HTTP/1.1
User-Agent: Java/1.4.2_12
Host: www.w3.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
and this is what it receives -
HTTP/1.1 303 See Other
Date: Mon, 26 Feb 2007 11:56:47 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.5
WWW-Authenticate: Basic realm="W3CACL"
Location: http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/Overview.html
Keep-Alive: timeout=2, max=99
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
Notice the difference in the Location header:
Firefox : Location: http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd
Java URLConnection : Location: http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/Overview.html
The problem turns out to be the Accept header that Java's URLConnection sets by default (or, I guess, that the Sun HttpURLConnection implementation sets):
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Note that there is no text/xml here, unlike in the Firefox header. There is a */*, but its q value (.2) is lower than that of text/html, which has no explicit q and so defaults to 1. The nice server at www.w3.org uses this to pick the output that is best accepted by the user agent. I guess since they are the standards organization they should do this :-). Fixing the Accept header fixes this behaviour. Is there something I am missing in my understanding of how URLConnection works?
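For reference, here is a minimal sketch of fixing the Accept header, using HttpURLConnection.setRequestProperty. The class name, the method name and the exact Accept value are my own choices for illustration; any value that ranks XML types above text/html should have the same effect, assuming the server still negotiates this way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class XmlAcceptExample {

    // Assumed Accept value: prefer XML, accept anything else only at low priority.
    static final String XML_ACCEPT = "text/xml, application/xml, */*;q=0.1";

    static HttpURLConnection openPreferringXml(String address) throws IOException {
        URL url = new URL(address);
        // openConnection() only creates the connection object; nothing is sent yet.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Replace the default Accept header before the request goes out.
        conn.setRequestProperty("Accept", XML_ACCEPT);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openPreferringXml("http://www.w3.org/2000/09/xmldsig#");
        // getInputStream() sends the request; redirects are followed by default.
        try (InputStream in = conn.getInputStream()) {
            System.out.println("Final URL:    " + conn.getURL());
            System.out.println("Content-Type: " + conn.getContentType());
        }
    }
}
```

The header has to be set before the connection is opened; calling setRequestProperty after connecting throws an IllegalStateException.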