Randoom, a Michael Friis production

Posted 30 May 2010 @ 3pm

Categories: Python, Scraping, Uncategorized


Screen scraping flight data from Amadeus checkmytrip.com

checkmytrip.com lets you input a flight booking reference and your surname in return for a flight itinerary. This is useful for building all sorts of services for travellers. Unfortunately, Amadeus doesn’t offer an API, nor are their URLs RESTful. Using Python, mechanize, html5lib and BeautifulSoup, you can get at the data pretty easily, though.
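Since the itinerary sits behind a form rather than at a stable URL, a scraper has to replay the form submission, which is what mechanize takes care of. As a rough sketch of what such a submission amounts to, assuming the form is sent as an ordinary urlencoded POST, the standard library can build an equivalent request. The action URL is a hypothetical placeholder and `build_retrieve_request` is an illustrative name; only the two visible fields used in the script are included, while a real form may also carry hidden fields and cookies that mechanize picks up automatically.

```python
from urllib.request import Request
from urllib.parse import urlencode

# Hypothetical endpoint: the real form's action URL is not known here.
FORM_ACTION = "http://www.checkmytrip.com/hypothetical-retrieve-endpoint"


def build_retrieve_request(booking_ref: str, last_name: str) -> Request:
    """Build (but do not send) a urlencoded POST with the two visible fields."""
    fields = {
        "REC_LOC": booking_ref,                 # booking reference
        "DIRECT_RETRIEVE_LASTNAME": last_name,  # passenger surname
    }
    body = urlencode(fields).encode("ascii")
    return Request(
        FORM_ACTION,
        data=body,  # attaching a body makes urllib treat this as a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    req = build_retrieve_request("BOOKREF", "LASTNAME")
    print(req.get_method(), req.data)
```

Passing the request to `urllib.request.urlopen` would send it, but tracking hidden fields and session cookies by hand is exactly the bookkeeping mechanize saves you.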

It is somewhat unclear whether Amadeus approves of people scraping their site; there is a related debate here (check the comments).

I’m not a very good Python programmer (yet!) and the script below could probably be improved quite a lot:

import mechanize
from bs4 import BeautifulSoup  # BeautifulSoup 4, using html5lib as its parser

# Open the start page and fill in the booking retrieval form
# (the third form on the page, hence the zero-based nr=2)
br = mechanize.Browser()
br.open("http://www.checkmytrip.com")
br.select_form(nr=2)
br["REC_LOC"] = "BOOKREF"                    # booking reference
br["DIRECT_RETRIEVE_LASTNAME"] = "LASTNAME"  # passenger surname
response = br.submit()
html = response.read()

# html5lib copes with the messy markup; BeautifulSoup makes it easy to navigate
soup = BeautifulSoup(html, "html5lib")

# Each flight segment lives in its own div
for div in soup.find_all("div", {"class": "divtableFlightConf"}):
    table = div.table
    rows = table.find_all("tr")  # recursive search, like the old findChildren
    datecell = rows[2].find_all("td")[1].get_text(strip=True)
    # Times and airport codes sit in a table nested inside the fourth row
    timetable = rows[3].table.find_all("tr")[0].td.table
    times = timetable.find_all("td", {"class": "nowrap"})
    dtime = times[0].get_text(strip=True)
    atime = times[1].get_text(strip=True)
    airports = timetable.find_all("input", {"name": "AIRPORT_CODE"})
    aairport = airports[0]["value"].strip()
    dairport = airports[1]["value"].strip()
    flight = table.find("td", {"id": "segAirline_0_0"}).get_text(strip=True)
    print("--")
    print("date: " + datecell)
    print("departuretime: " + dtime)
    print("arrivaltime: " + atime)
    print("departureairport: " + dairport)
    print("arrivalairport: " + aairport)
    print("flight: " + flight)
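If installing html5lib and BeautifulSoup is not an option, a rougher version of the same extraction can be sketched with Python's built-in `html.parser` module. This swaps in a different, event-driven technique rather than the tree navigation used above, and it is only a sketch: it runs against a hypothetical fragment that imitates the `divtableFlightConf`, `nowrap`, `AIRPORT_CODE` and `segAirline_0_0` names from the script, not against real checkmytrip markup. `FlightParser` and `SAMPLE` are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical fragment: it only mimics the class, id and name attributes the
# script relies on; it is not real checkmytrip output.
SAMPLE = """
<div class="divtableFlightConf"><table>
  <tr><td class="nowrap"> 10:00 </td><td class="nowrap">11:30</td></tr>
  <tr><td><input type="hidden" name="AIRPORT_CODE" value=" AAA "></td>
      <td><input type="hidden" name="AIRPORT_CODE" value="BBB"></td></tr>
  <tr><td id="segAirline_0_0"> XX 123 </td></tr>
</table></div>
"""


class FlightParser(HTMLParser):
    """Collect times, airport codes and flight number for each flight div.

    Limitation: it does not track where a div ends, so matching cells that
    appear after the last flight div are attributed to that flight.
    """

    def __init__(self):
        super().__init__()
        self.flights = []     # one dict per divtableFlightConf div
        self._capture = None  # field the current <td> text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "divtableFlightConf" in classes:
            self.flights.append({"times": [], "airports": [], "flight": ""})
        elif not self.flights:
            return  # ignore anything before the first flight div
        elif tag == "td" and "nowrap" in classes:
            self._capture = "times"
        elif tag == "td" and attrs.get("id") == "segAirline_0_0":
            self._capture = "flight"
        elif tag == "input" and attrs.get("name") == "AIRPORT_CODE":
            value = (attrs.get("value") or "").strip()
            self.flights[-1]["airports"].append(value)

    def handle_endtag(self, tag):
        if tag == "td":
            self._capture = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._capture is None:
            return
        current = self.flights[-1]
        if self._capture == "times":
            current["times"].append(text)
        else:
            current["flight"] += text


if __name__ == "__main__":
    parser = FlightParser()
    parser.feed(SAMPLE)
    parser.close()
    for flight in parser.flights:
        print(flight)
```

Unlike the script, this sketch picks cells out by attribute rather than by row position, so it skips the date cell and lists the airport codes in document order without labelling them departure or arrival. `html.parser` also only tokenizes the page and does not repair missing or mismatched tags the way html5lib's browser-style parsing does, which is the main reason to prefer the original approach on a messy real-world page.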

1 Comment

Posted by Ari, 31 May 2010 @ 11am

Great!

Thanks for the help :)

