What are the advantages and disadvantages of the following libraries?

- PHP Simple HTML DOM Parser
- QP (QueryPath)
- phpQuery

Of the above, I've used QP, which failed to parse invalid HTML, and Simple HTML DOM Parser, which does a good job but tends to leak memory because of its object model. You can keep that under control by calling `$object->clear(); unset($object);` when you don't need an object anymore.

Are there any other scrapers? What are your experiences with them? I'm going to make this a community wiki; maybe we'll build a useful list of libraries for scraping.

I did some tests based on Byron's answer:

```php
<?php
include("lib/simplehtmldom/simple_html_dom.php");
include("lib/phpQuery/phpQuery/phpQuery.php");

echo "<pre>";
$html = file_get_contents("http://stackoverflow.com/search?q=favorite+programmer+cartoon");
$data['pq'] = $data['dom'] = $data['simple_dom'] = array();

// 1) Built-in DOM extension + XPath ("@" silences warnings about invalid markup)
$timer_start = microtime(true);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach ($x->query("//a") as $node) {
    $data['dom'][] = $node->getAttribute("href");
}
foreach ($x->query("//img") as $node) {
    $data['dom'][] = $node->getAttribute("src");
}
foreach ($x->query("//input") as $node) {
    $data['dom'][] = $node->getAttribute("name");
}
$dom_time = microtime(true) - $timer_start;
echo "dom: \t\t $dom_time . Got " . count($data['dom']) . " items \n";

// 2) phpQuery: iterating a result set yields plain DOMElement nodes (not phpQuery
//    objects), so read attributes with getAttribute() rather than $node->href
$timer_start = microtime(true);
$doc = phpQuery::newDocument($html);
foreach ($doc->find("a") as $node) {
    $data['pq'][] = $node->getAttribute("href");
}
foreach ($doc->find("img") as $node) {
    $data['pq'][] = $node->getAttribute("src");
}
foreach ($doc->find("input") as $node) {
    $data['pq'][] = $node->getAttribute("name");
}
$time = microtime(true) - $timer_start;
echo "PQ: \t\t $time . Got " . count($data['pq']) . " items \n";

// 3) Simple HTML DOM Parser: attributes are exposed as properties on each node
$timer_start = microtime(true);
$simple_dom = new simple_html_dom();
$simple_dom->load($html);
foreach ($simple_dom->find("a") as $node) {
    $data['simple_dom'][] = $node->href;
}
foreach ($simple_dom->find("img") as $node) {
    $data['simple_dom'][] = $node->src;
}
foreach ($simple_dom->find("input") as $node) {
    $data['simple_dom'][] = $node->name;
}
$simple_dom_time = microtime(true) - $timer_start;
echo "simple_dom: \t $simple_dom_time . Got " . count($data['simple_dom']) . " items \n";

// Release Simple HTML DOM's node tree, as described in the memory note above
$simple_dom->clear();
unset($simple_dom);
echo "</pre>";
```

and got:

```
dom:         0.00359296798706 . Got 115 items
PQ:          0.010568857193 . Got 115 items
simple_dom:  0.0770139694214 . Got 115 items
```

In this single run, the built-in DOM extension was about 3 times faster than phpQuery and about 21 times faster than Simple HTML DOM Parser.
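Since QP choked on invalid HTML and the benchmark silences the DOM extension's warnings with `@`, here is a minimal sketch (not part of the original tests) of how PHP's built-in DOM extension copes with malformed markup. It assumes the bundled `dom` extension is enabled and buffers libxml's parse errors with `libxml_use_internal_errors(true)` instead of suppressing them. The helper name `extract_attributes` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: collect one attribute from every element matched by an XPath query.
// Requires PHP's bundled dom extension (DOMDocument / DOMXPath).
function extract_attributes(string $html, string $xpath, string $attr): array
{
    // Buffer libxml parse errors instead of emitting warnings (an alternative to "@").
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the buffered parse errors and restore the caller's libxml setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ((new DOMXPath($dom))->query($xpath) as $node) {
        $values[] = $node->getAttribute($attr);
    }
    return $values;
}

// Deliberately broken markup: unclosed <p> and <a>, an unquoted attribute, a stray </div>.
$html = '<p>Links: <a href=first.html>one <a href="second.html">two</p></div><img src="x.png">';

print_r(extract_attributes($html, '//a', 'href'));
print_r(extract_attributes($html, '//img', 'src'));
```

Buffering the errors (rather than using `@`) leaves room to log or count what the parser had to repair, and restoring the previous setting avoids changing libxml's global error mode for the rest of the script.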