Web Data Handling



Make a Request

import urllib.request  # urllib.request sends requests to a server and receives its responses. It can be used to fetch HTML, JSON, or XML data from an API.

webData = urllib.request.urlopen("http://www.google.com")  # Opens a connection to google.com and returns an http.client.HTTPResponse object

Read

.read() returns the response body as bytes (for this request, the page's HTML). Call .decode() on the result to get a string.
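As a minimal sketch of reading and decoding a response, the helper below wraps urlopen in a with block so the connection is closed automatically. The function name fetch_text, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration, not part of the original notes.

```python
import urllib.request

def fetch_text(url, timeout=10):
    """Fetch a URL and return its body decoded to a string (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # the body as bytes
        # Use the charset declared in the Content-Type header, if any;
        # falling back to UTF-8 is an assumption of this sketch.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling fetch_text("http://www.google.com") should return the page's HTML as a string, provided the network is reachable.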

Get Code

.getcode() returns the HTTP status code of the response, for example 200 for a successful request. The same value is available as the .status attribute.
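Note that urlopen raises an exception for error responses such as 404, so the status code of a failed request has to be read from the exception. The sketch below shows one way to handle both cases. The name get_status and the choice to return None when the server cannot be reached are assumptions for illustration.

```python
import urllib.error
import urllib.request

def get_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except urllib.error.URLError:
        # No usable response: DNS failure, refused connection, unknown scheme, ...
        return None
```

HTTPError is a subclass of URLError, so it has to be caught first.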

HTML Parsing

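from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    # Override handle_starttag, handle_data, etc. here to act on the parsed content.
    def error(self, message):
        # Older Python versions could call error() on malformed markup;
        # this no-op override silences it.
        pass

parser = MyHTMLParser()
# open() raises an exception on failure, so no mode check is needed;
# the with block closes the file automatically.
with open("check.html") as f:
    contents = f.read()
    parser.feed(contents)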
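A parser subclass only does something useful once it overrides one of the handler methods. As an illustrative sketch, the class below collects the href of every link it sees. The class name LinkCollector and the sample markup are made up for this example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<p><a href="/docs">Docs</a> and <a href="https://example.com">Example</a></p>')
collector.close()
print(collector.links)  # prints ['/docs', 'https://example.com']
```

The same pattern works with handle_data and handle_endtag when the text content or the document structure matters.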

JSON Parsing

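import json
# response holds JSON text (str or bytes), for example the body returned by .read() on a JSON API response
json_data = json.loads(response)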
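json.loads turns JSON text into ordinary Python dicts, lists, strings, and numbers, and raises json.JSONDecodeError when the text is not valid JSON. A small sketch, in which the helper name parse_json and the sample payload are assumptions for illustration:

```python
import json

def parse_json(text):
    """Return the decoded object, or None if text is not valid JSON (hypothetical helper)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# Sample payload made up for this example.
response = '{"name": "check", "tags": ["html", "xml"], "count": 2}'
json_data = parse_json(response)

print(json_data["name"])       # prints check
print(json_data["tags"][1])    # prints xml
print(parse_json("not json"))  # prints None
```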

XML Parsing

from xml.dom import minidom
doc = minidom.parse("check.xml")  # returns a Document object for the parsed file
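Once a document is parsed, its elements can be queried through the DOM API. The sketch below uses minidom.parseString on an inline string so that it runs without a file. The XML content and the variable names are invented for illustration.

```python
from xml.dom import minidom

# Sample XML made up for this example; minidom.parse("check.xml") would
# produce the same kind of Document object from a file.
xml_text = "<catalog><item id='1'>Pen</item><item id='2'>Ink</item></catalog>"
doc = minidom.parseString(xml_text)

items = []
for node in doc.getElementsByTagName("item"):
    # getAttribute reads an attribute; firstChild is the text node inside the element.
    items.append((node.getAttribute("id"), node.firstChild.data))

print(items)  # prints [('1', 'Pen'), ('2', 'Ink')]
```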