Python Programming/Internet

From Wikibooks, the open-content textbooks collection

Previous: Databases Index Next: Networks

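The urllib module, which is bundled with Python, can be used for web interaction. It provides a file-like interface for web URLs. The examples in this chapter use the Python 2 names; in Python 3 the same functions are found in urllib.request and urllib.parse.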

Getting page text as a string

The following example reads the contents of a web page into a string:

import urllib
pageText = urllib.urlopen("http://www.spam.org/eggs.html").read()
print pageText
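
On Python 3, urlopen lives in urllib.request and read() returns bytes rather than a string. A minimal sketch of the same idea follows. The helper name fetch_text is hypothetical and not part of the original example, and falling back to UTF-8 when the server names no charset is an assumption:

```python
from urllib.request import urlopen


def fetch_text(url, default_encoding="utf-8"):
    """Return the body of `url` decoded to a str (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()  # bytes on Python 3
        # Use the charset from the Content-Type header when one is given;
        # falling back to UTF-8 is an assumption made for this sketch.
        charset = response.headers.get_content_charset() or default_encoding
    return raw.decode(charset)


# Example call (needs network access, so it is left commented out):
# print(fetch_text("http://www.spam.org/eggs.html"))
```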

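The GET and POST methods can be used, too. Parameters appended to the URL after a question mark are sent with GET; parameters passed to urlopen as a second argument are sent with POST.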

import urllib
params = urllib.urlencode({"plato":1, "socrates":10, "sophokles":4, "arkhimedes":11})
 
# Using GET method
pageText = urllib.urlopen("http://international-philosophy.com/greece?%s" % params).read()
print pageText
 
# Using POST method
pageText = urllib.urlopen("http://international-philosophy.com/greece", params).read()
print pageText
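
The two calls differ only in where the encoded parameters go. A rough Python 3 sketch of that difference is given here; it builds the requests without sending them. The helper names build_get_request and build_post_request are hypothetical, and note that on Python 3 the POST body has to be bytes:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_request(base_url, params):
    """GET: the encoded parameters are appended to the URL after '?'."""
    return Request(base_url + "?" + urlencode(params))


def build_post_request(base_url, params):
    """POST: the encoded parameters travel in the request body as bytes."""
    return Request(base_url, data=urlencode(params).encode("ascii"))


params = {"plato": 1, "socrates": 10}
get_req = build_get_request("http://international-philosophy.com/greece", params)
post_req = build_post_request("http://international-philosophy.com/greece", params)

# A Request with no data is a GET; one with data is a POST.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)

# Passing either object to urllib.request.urlopen() would send it.
```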

Downloading files

To save the content of a page on the internet directly to a file, you can read() it and save it as a string to a file object, or you can use the urlretrieve function:

import urllib
urllib.urlretrieve("http://upload.wikimedia.org/wikibooks/en/9/91/Python_Programming.pdf", "pythonbook.pdf")

This downloads the file from the given URL and saves it as "pythonbook.pdf" in the current working directory.
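
The first option mentioned above, reading the response and writing it to a file object, might look like this on Python 3. This is a sketch, not a replacement for urlretrieve. The helper name download is hypothetical, and copying in chunks is a choice made here so that large files need not fit in memory:

```python
import shutil
from urllib.request import urlopen


def download(url, filename):
    """Copy the body of `url` into `filename` (hypothetical helper)."""
    # Open the target in binary mode: the response yields bytes.
    with urlopen(url) as response, open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file)  # copies chunk by chunk


# Example call (needs network access, so it is left commented out):
# download("http://upload.wikimedia.org/wikibooks/en/9/91/Python_Programming.pdf",
#          "pythonbook.pdf")
```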

Other functions

The urllib module includes other functions that may be helpful when writing programs that use the internet:

>>> plain_text = "This isn't suitable for putting in a URL"
>>> print urllib.quote(plain_text)
This%20isn%27t%20suitable%20for%20putting%20in%20a%20URL
>>> print urllib.quote_plus(plain_text)
This+isn%27t+suitable+for+putting+in+a+URL

The urlencode function, described above, converts a dictionary of key-value pairs into a query string to pass to a URL. The quote and quote_plus functions encode ordinary strings; quote_plus uses plus signs for spaces, which is the form used when submitting data for form fields. The unquote and unquote_plus functions do the reverse, converting URL-encoded text back to plain text.
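
On Python 3 these helpers live in urllib.parse. A short sketch of the round trip, reusing the string from the session above (the "spam" strings are made up for illustration):

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

plain_text = "This isn't suitable for putting in a URL"

quoted = quote(plain_text)            # spaces become %20
quoted_plus = quote_plus(plain_text)  # spaces become +

# Each unquote function reverses its matching quote function.
assert unquote(quoted) == plain_text
assert unquote_plus(quoted_plus) == plain_text

# unquote() leaves plus signs alone, so use unquote_plus() for form data.
print(unquote("spam+eggs"))       # spam+eggs
print(unquote_plus("spam+eggs"))  # spam eggs

# urlencode() applies quote_plus() to each key and value by default.
print(urlencode({"title": "spam & eggs"}))  # title=spam+%26+eggs
```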
