Recently I found out how to write a Python script that accesses a webpage protected by basic HTTP authentication. You know the kind: you try to go to the page in your browser and a username and password dialog pops up. To download one of these pages with a script, you have to provide the username and password in the HTTP headers of the request.
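Concretely, the credentials travel in an `Authorization` header whose value is `Basic ` followed by the base64 encoding of `username:password`. Here is a minimal sketch in Python 3, where urllib2's classes live in `urllib.request`, that builds this header by hand on a `Request`. The helper name `basic_auth_header`, the URL and the credentials are made up for illustration.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Hypothetical helper: build the value of a Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


# Attach the header up front so it is sent with the very first request.
# Nothing is sent here; the Request is only built, not opened.
req = urllib.request.Request("http://example.com/protected/")
req.add_header("Authorization", basic_auth_header("Aladdin", "open sesame"))
print(req.get_header("Authorization"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

This differs slightly from the handler-based approach: `HTTPBasicAuthHandler` waits for the server's 401 challenge and only then resends the request with credentials.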
Here's how I did it in Python:
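import urllib2

# The handler sends the credentials when the server answers with a 401 challenge.
auth_handler = urllib2.HTTPBasicAuthHandler()
# Arguments: realm (as sent in the server's WWW-Authenticate header), host or URL, username, password.
auth_handler.add_password("Realm here", "host", "username", "password")
# build_opener installs the default handlers plus our auth handler.
opener = urllib2.build_opener(auth_handler)
handle = opener.open("URL you want to download")
page = handle.read()  # "page" rather than "file", which shadows a built-in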
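urllib2 only exists in Python 2; in Python 3 the same classes live in `urllib.request` and the exceptions in `urllib.error`. Below is a minimal, self-contained sketch of the same technique in Python 3. To keep it runnable offline it starts a tiny local server that demands basic auth; the credentials, realm, server class and `fetch` helper are all hypothetical. It uses `HTTPPasswordMgrWithDefaultRealm`, which accepts `None` as the realm when you don't know it.

```python
import base64
import http.server
import threading
import urllib.error
import urllib.request

# Hypothetical credentials and realm, used only by this demo.
USER, PASSWORD, REALM = "alice", "s3cret", "Demo Realm"
EXPECTED = "Basic " + base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()


class ProtectedPage(http.server.BaseHTTPRequestHandler):
    """Local test server: returns the page only with the right credentials."""

    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"secret page"
            self.send_response(200)
        else:
            # The WWW-Authenticate challenge is what makes the auth handler retry.
            body = b""
            self.send_response(401)
            self.send_header("WWW-Authenticate", f'Basic realm="{REALM}"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch(url, user, password):
    """Python 3 version of the urllib2 snippet above."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)  # realm=None: match any realm
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore proxy env vars for the local demo
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    with opener.open(url) as handle:
        return handle.read()


server = http.server.HTTPServer(("127.0.0.1", 0), ProtectedPage)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"
print(fetch(url, USER, PASSWORD))  # b'secret page'
```

Called with a wrong password, `fetch` raises `urllib.error.HTTPError` with code 401, because the handler gives up once its retry with those credentials is also rejected.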
This is very similar to the urllib2 authentication example given in the Python Library Reference.
Here's some code in PHP that seems to do this.
Here's a description of how urllib2 works, taken from the module's docstring (the urllib2.__doc__ variable):
urllib2
An extensible library for opening URLs using a variety of protocols.
The simplest way to use this module is to call the urlopen function, which accepts a string containing a URL or a Request object (described below). It opens the URL and returns the results as a file-like object; the returned object has some extra methods described below.
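As a quick illustration of the string-or-Request point, here is a Python 3 sketch (`urllib.request.urlopen` replaces `urllib2.urlopen`). It opens a `data:` URL, which `urllib.request` handles without any network access since Python 3.4; the URL itself is just an example.

```python
import urllib.request

# A data: URL carries its own content, so this needs no network access.
url = "data:text/plain,hello%20world"

# 1. Pass a plain string.
with urllib.request.urlopen(url) as f:
    from_string = f.read()

# 2. Pass a Request object wrapping the same URL.
with urllib.request.urlopen(urllib.request.Request(url)) as f:
    from_request = f.read()
    content_type = f.headers["Content-Type"]

print(from_string, from_request, content_type)
```

Either way you get back the same file-like object with `read()` and a `headers` attribute.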
The OpenerDirector manages a collection of Handler objects that do all the actual work. Each Handler implements a particular protocol or option. The OpenerDirector is a composite object that invokes the Handlers needed to open the requested URL. For example, the HTTPHandler performs HTTP GET and POST requests and deals with non-error returns. The HTTPRedirectHandler automatically deals with HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler deals with digest authentication.
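To make the handler idea concrete: OpenerDirector dispatches by method name, so a handler with a method called `<scheme>_open` gets to open URLs of that scheme. Below is a Python 3 sketch (`urllib.request` is urllib2's successor) with a made-up `echo:` scheme that simply returns the URL's host part as the body. `EchoHandler` and the `echo:` scheme are hypothetical; only `BaseHandler`, `build_opener` and `addinfourl` are real library names.

```python
import email.message
import io
import urllib.request
import urllib.response


class EchoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up 'echo:' scheme.

    OpenerDirector registers this method for the "echo" protocol because
    its name has the form <scheme>_open.
    """

    def echo_open(self, req):
        # For "echo://hello", req.host is "hello"; echo it back as the body.
        body = req.host.encode()
        headers = email.message.Message()
        headers["Content-Type"] = "text/plain"
        return urllib.response.addinfourl(io.BytesIO(body), headers, req.full_url, code=200)


# build_opener adds the default handlers (HTTP, FTP, file, ...) plus ours;
# it accepts the class and instantiates it.
opener = urllib.request.build_opener(EchoHandler)
with opener.open("echo://hello") as response:
    text = response.read()
print(text)  # b'hello'
```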
urlopen(url, data=None) -- basic usage is the same as the original urllib. Pass the url and, optionally, data to POST to an HTTP URL, and get a file-like object back. One difference is that you can also pass a Request instance instead of a URL. Raises a URLError (subclass of IOError); for HTTP errors, raises an HTTPError, which can also be treated as a valid response.
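A sketch of the `data` argument in Python 3, where it must be bytes: form fields are URL-encoded with `urllib.parse.urlencode` and then encoded, and supplying `data` turns the request into a POST. The local echo server, its path and the field names are hypothetical scaffolding so the example runs offline. It calls `opener.open`, which takes the same `(url, data)` arguments as `urlopen`, on an opener that ignores proxy settings.

```python
import http.server
import threading
import urllib.parse
import urllib.request


class EchoPost(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server that echoes the method and request body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = b"POST " + self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = http.server.HTTPServer(("127.0.0.1", 0), EchoPost)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/form"

# data must be bytes; supplying it makes urllib send a POST.
data = urllib.parse.urlencode({"name": "guido", "lang": "python"}).encode("ascii")
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url, data) as response:
    reply = response.read()
print(reply)  # b'POST name=guido&lang=python'
```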
build_opener -- function that creates a new OpenerDirector instance. Will install the default handlers. Accepts one or more Handlers as arguments, either instances or Handler classes that it will instantiate. If one of the arguments is a subclass of a default handler, the argument will be installed instead of the default.
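The subclass-replaces-default rule can be checked directly. This Python 3 sketch defines a hypothetical `NoRedirectHandler`, a subclass of the real `HTTPRedirectHandler`, whose `redirect_request` returns `None`; urllib then reports the 3xx response as an `HTTPError` instead of following it. Inspecting `opener.handlers` shows the subclass installed in place of the default.

```python
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: refuse to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx response then surfaces as an HTTPError


# Pass the class; build_opener instantiates it and skips the default redirect handler.
opener = urllib.request.build_opener(NoRedirectHandler)
redirect_handlers = [
    h for h in opener.handlers if isinstance(h, urllib.request.HTTPRedirectHandler)
]
print([type(h).__name__ for h in redirect_handlers])  # ['NoRedirectHandler']
```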
install_opener -- installs a new opener as the default opener.
objects of interest:
OpenerDirector --
Request -- an object that encapsulates the state of a request. The state can be as simple as the URL. It can also include extra HTTP headers, e.g. a User-Agent.
BaseHandler --
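For example, with Python 3's `urllib.request.Request`, extra headers can be passed as a dict when the Request is built or added afterwards. The URL and header values here are placeholders. Note that `Request` stores header names with `str.capitalize()`, so `User-Agent` is looked up as `User-agent`.

```python
import urllib.request

# Hypothetical URL and header values; nothing is sent until the Request is opened.
req = urllib.request.Request(
    "http://example.com/",
    headers={"User-Agent": "my-script/0.1"},
)
req.add_header("Accept", "text/html")

# Request normalizes header names with str.capitalize().
print(req.get_header("User-agent"))  # my-script/0.1
print(req.header_items())
print(req.get_method())  # GET (no data attached)
```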
exceptions:
URLError-- a subclass of IOError, individual protocols have their own specific subclass
HTTPError-- also a valid HTTP response, so you can treat an HTTP error as an exceptional event or valid response
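To illustrate the last point: in Python 3 `HTTPError` lives in `urllib.error`, and the exception object carries the status code, headers and body, so you can read it like a normal response. The sketch runs against a hypothetical local server that answers every GET with a 404; the `get_status_and_body` helper is also made up.

```python
import http.server
import threading
import urllib.error
import urllib.request


class NotFound(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server: every GET returns 404 with a short body."""

    def do_GET(self):
        body = b"no such page"
        self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def get_status_and_body(opener, url):
    """Return (status, body) whether or not the server reported an error."""
    try:
        with opener.open(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # An HTTPError is also a response object: it has a code and a readable body.
        return err.code, err.read()


server = http.server.HTTPServer(("127.0.0.1", 0), NotFound)
threading.Thread(target=server.serve_forever, daemon=True).start()
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # skip env proxies
print(get_status_and_body(opener, f"http://127.0.0.1:{server.server_port}/missing"))
# (404, b'no such page')
```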
internals:
BaseHandler and parent
_call_chain conventions
Example usage:
import urllib2
# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password('realm', 'host', 'username', 'password')
proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)
# install it
urllib2.install_opener(opener)
f = urllib2.urlopen('http://www.python.org/')
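For Python 3 readers, the docstring example translates almost line for line to `urllib.request`. The proxy address, realm, host and credentials are the docstring's own placeholders. This sketch wraps the setup in a hypothetical `make_opener` function and leaves the final `urlopen` call commented out, since it needs network access (and a real proxy).

```python
import urllib.request


def make_opener():
    # Set up authentication info (placeholder realm, host and credentials).
    authinfo = urllib.request.HTTPBasicAuthHandler()
    authinfo.add_password("realm", "host", "username", "password")

    # Placeholder proxy address from the original docstring.
    proxy_support = urllib.request.ProxyHandler({"http": "http://ahad-haam:3128"})

    # Build a new opener that adds authentication and caching FTP handlers;
    # CacheFTPHandler is passed as a class and instantiated by build_opener.
    return urllib.request.build_opener(
        proxy_support, authinfo, urllib.request.CacheFTPHandler
    )


opener = make_opener()
# Install it so that plain urlopen() calls go through this opener from now on.
urllib.request.install_opener(opener)
# f = urllib.request.urlopen("http://www.python.org/")
```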