Extract list of URLs in a web page



"""Extract list of URLs in a web page

This program is part of "Dive Into Python", a free Python book for
experienced programmers.  Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim"
__version__ = "$Revision: 1.2 $"
__date__ = "$Date: 2004/05/05 21:57:19 $"
__copyright__ = "Copyright (c) 2001 Mark Pilgrim"
__license__ = "Python"

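# sgmllib exists only in Python 2 (it was removed in Python 3.0).
from sgmllib import SGMLParser

class URLLister(SGMLParser):
  def reset(self):
    # SGMLParser.__init__ calls reset(), so urls is set up on every new parser.
    SGMLParser.reset(self)
    self.urls = []

  def start_a(self, attrs):
    # SGMLParser calls start_a for every <a> tag; attrs is a list of
    # (name, value) tuples. Keep the value of each href attribute.
    href = [v for k, v in attrs if k=='href']
    if href:
      self.urls.extend(href)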

if __name__ == "__main__":
  import urllib
  usock = urllib.urlopen("http://diveintopython.org/")
  parser = URLLister()
  parser.feed(usock.read())
  parser.close()
  usock.close()
  for url in parser.urls: print url
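URLLister never registers start_a anywhere: SGMLParser finds it by name, looking up a method called start_<tagname> for each start tag it parses and passing the tag's attributes as a list of (name, value) tuples. Because sgmllib is not available in Python 3, the sketch below imitates that lookup on top of html.parser.HTMLParser. TagDispatchParser is a hypothetical helper, not part of any library, and it only covers start tags (sgmllib also dispatched end_<tag> and do_<tag> methods). With it as the base class, the reset and start_a methods from the listing run with only the base-class call changed.

```python
from html.parser import HTMLParser


class TagDispatchParser(HTMLParser):
    """Hypothetical shim: call start_<tag>(attrs) for each start tag,
    roughly as sgmllib.SGMLParser did in Python 2 (end tags are ignored)."""

    def handle_starttag(self, tag, attrs):
        # Look up a handler by name, e.g. start_a for <a>; skip unknown tags.
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)


class URLLister(TagDispatchParser):
    def reset(self):
        # HTMLParser.__init__ also calls reset(), so urls exists from the start.
        super().reset()
        self.urls = []

    def start_a(self, attrs):
        # attrs is a list of (name, value) tuples, as in the sgmllib version.
        href = [v for k, v in attrs if k == "href"]
        if href:
            self.urls.extend(href)


parser = URLLister()
parser.feed('<p><a href="/one">1</a> <a name="x">anchor</a> <A HREF="two.html">2</A></p>')
parser.close()
print(parser.urls)  # ['/one', 'two.html']
```

HTMLParser lowercases tag and attribute names before calling handle_starttag, so the uppercase `<A HREF=...>` in the sample still reaches start_a, while the `<a name="x">` anchor contributes nothing because it has no href.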

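In Python 3 the more direct approach is to override HTMLParser.handle_starttag itself and to fetch the page with urllib.request.urlopen, which replaces Python 2's urllib.urlopen. This is a minimal sketch rather than the book's code: list_urls and fetch_urls are illustrative names, the decoding step assumes UTF-8 when the server declares no charset, and href attributes written without a value (which HTMLParser reports as None) are skipped.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class URLLister(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so urls exists on every new parser.
        super().reset()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            # attrs is a list of (name, value) pairs; a bare `href` has value None.
            self.urls.extend(v for k, v in attrs if k == "href" and v is not None)


def list_urls(html):
    """Return the href values of all <a> tags in an HTML string."""
    parser = URLLister()
    parser.feed(html)
    parser.close()
    return parser.urls


def fetch_urls(address):
    """Download a page and return its link targets (needs network access)."""
    with urlopen(address) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return list_urls(html)


if __name__ == "__main__":
    sample = '<p><a href="/index.html">Home</a> <a href>bare</a> <a href="http://example.com/">Out</a></p>'
    for url in list_urls(sample):
        print(url)
```

To mirror the original script, call fetch_urls("http://diveintopython.org/") and print each result; because that call needs network access, the `__main__` block parses an inline string instead. Keeping parsing (list_urls) separate from downloading (fetch_urls) also makes the parser easy to test without a connection.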