Python parsing with Beautiful Soup

mubi201 · Mar 31, 2013

I have a question regarding HTML parsing with BeautifulSoup. The website I am trying to parse is this one: http://www.auc.nl/news-events/events-and-lectures/events-and-lectures.html?page=1&pageSize=40

At first I needed to write a function that would give me all h3 tags and all p tags. I did that as follows:

\[code\]
from bs4 import BeautifulSoup
import urllib2

# urlopen takes no file-mode argument, so the "r" is dropped
website = urllib2.urlopen("http://www.auc.nl/news-events/events-and-lectures/events-and-lectures.html")

def parseUsingSoup2(content):
    # Build the soup from the content passed in, then collect the tags
    soup = BeautifulSoup(content)
    list1 = soup.find_all('h3')
    list2 = soup.find_all('p')
    return list1 + list2

parseUsingSoup2(website)
\[/code\]

The next part of the problem asks for a list of events (there is only one event on the website, though), each as a 4-tuple: the time slot, the title, the type and the description.

I don't really know how to start with that. My first attempt was this:

\[code\]
def GeneratingListofEvents(content):
    event = {}
    fields = ['time', 'title', 'feature', 'description']
    for item in fields:
        pass  # unfinished: each field still has to be filled in from the HTML
\[/code\]

However, I have no idea if this is heading in the right direction, and I haven't managed to retrieve, for instance, the time from the HTML document without typing it in manually.

Thank you in advance.
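One possible way to approach the second part, as a minimal sketch rather than a definitive answer: find the element that wraps each event, then pull the four fields out of that element and collect them into a tuple. The markup in `SAMPLE_HTML` is invented for illustration, and so are the class names `event`, `time` and `type` and the helper names `text_or_none` and `generate_list_of_events`. The real page's structure is not known here and would have to be checked in the page source first. The sketch is written for Python 3 and uses the built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's tags and class names are NOT known
# here and must be checked in the page source before adapting this.
SAMPLE_HTML = """
<ul class="events">
  <li class="event">
    <span class="time">Mon 1 Jan, 10:00-11:00</span>
    <h3>Sample lecture title</h3>
    <span class="type">Lecture</span>
    <p>A short description of the sample lecture.</p>
  </li>
</ul>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def generate_list_of_events(html):
    """Return a list of (time, title, type, description) tuples, one per event."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    # Assumption: every event sits inside its own <li class="event"> element.
    for item in soup.find_all("li", class_="event"):
        events.append((
            text_or_none(item.find("span", class_="time")),
            text_or_none(item.find("h3")),
            text_or_none(item.find("span", class_="type")),
            text_or_none(item.find("p")),
        ))
    return events


events = generate_list_of_events(SAMPLE_HTML)
print(events)
```

Searching inside each event element, rather than collecting all h3 and p tags of the whole page into one list, keeps the time, title, type and description of one event together. A field that is missing from the page comes back as `None` instead of raising an error.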
