Open a network object denoted by a URL for reading. If the URL does not
have a scheme identifier, or if it has file: as its scheme
identifier, this opens a local file (without universal newlines);
otherwise it opens a socket to a server somewhere on the network. If the
connection cannot be made, an IOError exception is raised. If all
went well, a file-like object is returned. It supports the following
methods: read(), readline(), readlines(), fileno(),
close(), info(), getcode() and geturl(). It also
has proper support for the iterator protocol. One caveat: the
read() method, if the size argument is omitted or negative, may not
read until the end of the data stream; there is no good way to determine
that the entire stream from a socket has been read in the general case.
Except for the info(), getcode() and geturl() methods,
these methods have the same interface as for file objects — see section
File Objects in this manual. (It is not a built-in file object,
however, so it can’t be used at those few places where a true built-in file
object is required.)
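A minimal sketch of opening a local file through a file: URL and iterating over the returned object like a file. The helper names file_url() and read_lines() are hypothetical. Because this page documents Python 2's urllib.urlopen(), the sketch imports from urllib first and falls back to urllib.request so that it also runs under Python 3; it builds an explicit file: URL instead of passing a bare path, because Python 3's urlopen() rejects URLs without a scheme.

```python
import os
import tempfile

try:  # Python 2, as documented on this page
    from urllib import urlopen, pathname2url
except ImportError:  # Python 3 moved both functions into urllib.request
    from urllib.request import urlopen, pathname2url


def file_url(path):
    """Build a file: URL for a local path (hypothetical helper)."""
    return 'file:' + pathname2url(os.path.abspath(path))


def read_lines(url):
    """Open url with urlopen() and return its lines as a list of bytes."""
    response = urlopen(url)
    try:
        # The returned object supports the iterator protocol, like a file.
        return [line for line in response]
    finally:
        response.close()


# Demo: write a small temporary file and read it back through a file: URL.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'first line\nsecond line\n')
lines = read_lines(file_url(path))
os.remove(path)
```

Iterating the response yields one line at a time, exactly as iterating an open file would.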
The info() method returns an instance of the class
mimetools.Message containing meta-information associated with the
URL. When the method is HTTP, these headers are those returned by the server
at the head of the retrieved HTML page (including Content-Length and
Content-Type). When the method is FTP, a Content-Length header will be
present if (as is now usual) the server passed back a file length in response
to the FTP retrieval request. A Content-Type header will be present if the
MIME type can be guessed. When the method is local-file, returned headers
will include a Date representing the file’s last-modified time, a
Content-Length giving file size, and a Content-Type containing a guess at the
file’s type. See also the description of the mimetools module.
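The local-file case can be checked directly. The sketch below (hypothetical helper local_file_headers()) opens a temporary .txt file and reads two of the headers that info() reports for it. It tries the Python 2 import first and falls back to urllib.request on Python 3; the header lookups are case-insensitive on both versions.

```python
import os
import tempfile

try:  # Python 2, as documented on this page
    from urllib import urlopen, pathname2url
except ImportError:  # Python 3 location
    from urllib.request import urlopen, pathname2url


def local_file_headers(path):
    """Return (Content-Type, Content-Length) that info() reports for a local file."""
    response = urlopen('file:' + pathname2url(os.path.abspath(path)))
    try:
        headers = response.info()  # a mimetools.Message on Python 2
        # Header names are matched case-insensitively.
        return headers.get('Content-Type'), int(headers.get('Content-Length'))
    finally:
        response.close()


fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'hello\n')  # 6 bytes
content_type, length = local_file_headers(path)
os.remove(path)
```

The Content-Type here is a guess based on the .txt extension, which is why it comes back as text/plain.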
The geturl() method returns the real URL of the page. In some cases, the
HTTP server redirects a client to another URL. The urlopen() function
handles this transparently, but in some cases the caller needs to know which URL
the client was redirected to. The geturl() method can be used to get at
this redirected URL.
The getcode() method returns the HTTP status code that was sent with the
response, or None if the URL is not an HTTP URL.
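A sketch that combines geturl() and getcode() to report what happened to a request. The helper summarize() and its result keys are hypothetical. Since getcode() is None for non-HTTP URLs, a change of URL is only treated as a redirect when there is an HTTP status code; the comparison is a plain string comparison, which assumes the requested URL is spelled the way the final URL would be. The demo uses a file: URL so that it runs without network access.

```python
import os
import tempfile

try:  # Python 2, as documented on this page
    from urllib import urlopen, pathname2url
except ImportError:  # Python 3 location
    from urllib.request import urlopen, pathname2url


def summarize(response, requested_url):
    """Report the status code and final URL of an already-opened response."""
    code = response.getcode()      # None for non-HTTP URLs such as file:
    final_url = response.geturl()  # URL actually retrieved, after redirects
    return {
        'is_http': code is not None,
        'code': code,
        'final_url': final_url,
        # Only an HTTP response can have been redirected by a server.
        'redirected': code is not None and final_url != requested_url,
    }


fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
url = 'file:' + pathname2url(os.path.abspath(path))
response = urlopen(url)
summary = summarize(response, url)
response.close()
os.remove(path)
```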
If the URL uses the http: scheme identifier, the optional data
argument may be given to specify a POST request (normally the request type
is GET). The data argument must be in standard
application/x-www-form-urlencoded format; see the urlencode()
function below.
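A minimal sketch of building a POST body with urlencode(). The helpers encode_form() and post_form() are hypothetical, and the URL in the commented-out call is a placeholder, so no request is actually sent. Passing the fields as a list of pairs keeps their order stable; the .encode('ascii') call yields a str on Python 2 and bytes on Python 3, the type urlopen() expects for data on each version.

```python
try:  # Python 2, as documented on this page
    from urllib import urlopen, urlencode
except ImportError:  # Python 3 locations
    from urllib.request import urlopen
    from urllib.parse import urlencode


def encode_form(fields):
    """Encode (name, value) pairs as application/x-www-form-urlencoded data."""
    return urlencode(fields).encode('ascii')


def post_form(url, fields):
    """Send fields to url; supplying data turns the request into a POST."""
    return urlopen(url, encode_form(fields))


body = encode_form([('name', 'Guido van Rossum'), ('lang', 'Python & C')])
# Needs network access; the URL below is only a placeholder.
# response = post_form('http://www.example.com/cgi-bin/form', [('q', 'x')])
```

Spaces become + and the & inside a value is escaped as %26, so it cannot be mistaken for a field separator.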
The urlopen() function works transparently with proxies which do not
require authentication. In a Unix or Windows environment, set the
http_proxy or ftp_proxy environment variable to a URL that
identifies the proxy server before starting the Python interpreter. For example
(the '%' is the command prompt):
% http_proxy="http://www.someproxy.com:3128"
% export http_proxy
% python
...
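The same setting can be made from inside Python and inspected with getproxies(), which exists as urllib.getproxies() in Python 2 and urllib.request.getproxies() in Python 3. This sketch assumes that setting os.environ before the first urlopen() call is enough, because CPython's urllib reads the proxy settings when urlopen() first builds its opener; the proxy URL is the example one used above.

```python
import os

try:  # Python 2, as documented on this page
    from urllib import getproxies
except ImportError:  # Python 3 location
    from urllib.request import getproxies

# Equivalent to the shell commands above, done before any urlopen() call.
os.environ['http_proxy'] = 'http://www.someproxy.com:3128'

# Mapping of scheme name -> proxy URL that urlopen() would use.
proxies = getproxies()
```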
The no_proxy environment variable can be used to specify hosts which
shouldn’t be reached via proxy; if set, it should be a comma-separated list
of hostname suffixes, optionally with :port appended, for example
cern.ch,ncsa.uiuc.edu,some.host:8080.
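The suffix rule can be sketched as a small function. This is written from scratch to illustrate the description above and is not urllib's own code: the name bypasses_proxy() is hypothetical, matching whole domain labels (so that cern.ch does not match notcern.ch) is an assumption, and IPv6 literals are not handled.

```python
import re

NO_PROXY = 'cern.ch,ncsa.uiuc.edu,some.host:8080'


def bypasses_proxy(host, no_proxy):
    """Return True if host ("name" or "name:port") matches a no_proxy entry."""
    hostonly = host.split(':', 1)[0]  # drop any :port suffix
    for entry in no_proxy.split(','):
        entry = entry.strip()
        if not entry:
            continue
        # Match the entry as a suffix on a label boundary: "cern.ch"
        # matches "cern.ch" and "www.cern.ch" but not "notcern.ch".
        pattern = r'(.+\.)?' + re.escape(entry) + '$'
        # Entries without a port match the bare host name; entries with
        # ":port" only match when the host carries the same port.
        if re.match(pattern, hostonly, re.I) or re.match(pattern, host, re.I):
            return True
    return False
```

For example, bypasses_proxy('www.cern.ch', NO_PROXY) is true, while 'some.host:9090' is still sent through the proxy because only port 8080 is listed for that host.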
In a Windows environment, if no proxy environment variables are set, proxy
settings are obtained from the registry’s Internet Settings section.
In a Mac OS X environment, urlopen() will retrieve proxy information
from the OS X System Configuration Framework, which can be managed with
the Network System Preferences panel.
Alternatively, the optional proxies argument may be used to explicitly specify
proxies. It must be a dictionary mapping scheme names to proxy URLs, where an
empty dictionary causes no proxies to be used, and None (the default value)
causes environmental proxy settings to be used as discussed above. For
example:
import urllib
some_url = 'http://www.python.org/'  # placeholder: any URL to fetch
# Use http://www.someproxy.com:3128 for HTTP proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
filehandle = urllib.urlopen(some_url, proxies=proxies)
# Don't use any proxies
filehandle = urllib.urlopen(some_url, proxies={})
# Use proxies from environment - both versions are equivalent
filehandle = urllib.urlopen(some_url, proxies=None)
filehandle = urllib.urlopen(some_url)
Proxies which require authentication for use are not currently supported;
this is considered an implementation limitation.
The context parameter may be set to a ssl.SSLContext instance to
configure the SSL settings that are used if urlopen() makes an HTTPS
connection.
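A sketch of passing an explicit ssl.SSLContext. The helpers make_context() and fetch() are hypothetical; ssl.create_default_context() is a real function (Python 2.7.9+ and 3.4+) that returns a context with certificate verification and hostname checking turned on, and its optional cafile argument can add an extra CA bundle. The fetch itself is commented out because it needs network access.

```python
import ssl

try:  # Python 2.7.9+, as documented on this page
    from urllib import urlopen
except ImportError:  # Python 3 location
    from urllib.request import urlopen


def make_context(cafile=None):
    """Build a verifying SSL context, optionally trusting an extra CA bundle."""
    # create_default_context() requires a valid certificate and checks
    # that it matches the host name.
    return ssl.create_default_context(cafile=cafile)


def fetch(url, context):
    """Fetch an https: URL using the given SSL settings."""
    response = urlopen(url, context=context)
    try:
        return response.read()
    finally:
        response.close()


ctx = make_context()
# data = fetch('https://www.python.org/', ctx)  # needs network access
```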
Changed in version 2.3: Added support for the proxies argument.
Changed in version 2.6: Added getcode() to returned object and support for the
no_proxy environment variable.
Changed in version 2.7.9: The context parameter was added. All the necessary certificate and
hostname checks are done by default.