Using XPath to parse HTML from NFL.com

NodCom · Oct 5, 2012

http://www.nfl.com/widget/gc/2011/tabs/cat-post-boxscore?gameId=2012093000

I am looking to scrape the data from pages like the link above (that is, game-level NFL data).

NFL.com has a handy JSON API that makes a lot of this data accessible, but only for games from 2010 onward. For earlier games, I am going to have to parse the HTML of pages similar to the one above.

I've been trying to scrape this using XPath. However, I have found it difficult to differentiate between the table headers, which are table rows of class "thd2", and the data, which are table rows of class "tbdy1".

If anyone knows how to loop through these tables, extract both the headers and the data, and get them into an array, I'd like to see your approach!

\[code\]
$curl = curl_init('http://www.nfl.com/widget/gc/2011/tabs/cat-post-boxscore?gameId=2012093000');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
$html = curl_exec($curl);
curl_close($curl);

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$tables = $xpath->query('//table[1]/tbody/td');
var_dump($tables);
\[/code\]
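One possible way to separate the two kinds of rows is to select every `tr`, check its `class` attribute, and start a new section each time a "thd2" row appears. The following is only a sketch under assumptions: the sample markup is invented for illustration (the real page may be structured differently), and the function name `parseBoxscoreSections` is hypothetical. Note that `DOMDocument::loadHTML` does not add `<tbody>` elements that are missing from the source, so a query such as `//table[1]/tbody/td` can return nothing for two reasons: the `tbody` may not exist, and `td` is a child of `tr`, not of `tbody`.

```php
<?php
// Hypothetical sample markup for illustration; not NFL.com's actual HTML.
$html = <<<HTML
<table>
  <tr class="thd2"><td>Passing</td><td>CP/AT</td><td>YDS</td></tr>
  <tr class="tbdy1"><td>A. Player</td><td>20/30</td><td>250</td></tr>
  <tr class="thd2"><td>Rushing</td><td>ATT</td><td>YDS</td></tr>
  <tr class="tbdy1"><td>B. Player</td><td>15</td><td>80</td></tr>
  <tr class="tbdy1"><td>C. Player</td><td>4</td><td>12</td></tr>
</table>
HTML;

// Groups "tbdy1" data rows under the most recent "thd2" header row.
function parseBoxscoreSections(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of using @
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $sections = [];
    $current = -1;                      // index of the section being filled
    // '//tr' matches rows whether or not a <tbody> wrapper is present.
    foreach ($xpath->query('//tr') as $tr) {
        // Pad with spaces so a class is matched as a whole token.
        $class = ' ' . $tr->getAttribute('class') . ' ';
        $cells = [];
        foreach ($xpath->query('./td | ./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if (strpos($class, ' thd2 ') !== false) {
            // A header row opens a new section.
            $sections[] = ['headers' => $cells, 'rows' => []];
            $current = count($sections) - 1;
        } elseif (strpos($class, ' tbdy1 ') !== false && $current >= 0) {
            // A data row belongs to the section opened by the last header.
            $sections[$current]['rows'][] = $cells;
        }
    }
    return $sections;
}

$sections = parseBoxscoreSections($html);
print_r($sections);
```

To run this against the live page, pass the `$html` string returned by `curl_exec` to the function instead of the sample. If the header and data rows need to be fetched separately, `//tr[contains(concat(' ', normalize-space(@class), ' '), ' thd2 ')]` is the usual XPath 1.0 idiom for matching a single class name.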
