RSS feeds and image extraction in depth

jukosch

I have spent some time trying to solve this problem, and this is as far as I've got. Basically, I'm trying to pull images from RSS feeds. I use Magpie to process the feeds, as shown below. This snippet is within a class:
[code]
/*
 * Get the URLs of all <img> tags found in $str.
 */
function getImagesUrl($str) {
    $a = array();
    $pos = 0;
    $init = 1;
    while ($init) {
        // strpos() returns false when there is no further match;
        // use !== because a match at offset 0 is also falsy
        $pos = strpos($str, "img", $pos);
        if ($pos !== false) {
            $topos = strpos($str, ">", $pos);
            if ($topos === false) {
                break; // no closing ">" left, stop instead of looping forever
            }
            $imagetag = substr($str, $pos, ($topos - $pos));
            $url = $this->getImageUrl($imagetag);
            $pos = $topos;
            array_push($a, $url);
        } else {
            $init = 0;
        }
    }
    return $a;
}

/*
 * Get the full URL inside the src attribute of an <img> tag.
 */
function getImageUrl($image) {
    $p = strpos($image, 'src="');
    if ($p === false) {
        return ''; // this tag has no src="..."
    }
    $p += 5; // skip past src="
    $tp = strpos($image, '"', $p); // closing quote of the src value
    if ($tp === false) {
        return '';
    }
    $str = substr($image, $p, ($tp - $p));
    return $str;
}
[/code]
Using the above functions, I call them this way. So far this outputs the data pasted below:
[code]
$rss = @fetch_rss($rsso->url);
if ($rss) {
    $items = $rss->items;
    foreach ($items as $item) {
        if (isset($item['title']) && isset($item['description'])) {
            $hash = md5($this->es($item['title']) . $this->es($item['description']));
            $content = $item['content'];
            foreach ($content as $c) {
                // get the images in the content
                $arr = $this->getImagesUrl($c);
                print_r($arr);
            }
        }
    }
}
[/code]
Here is an example of the output:
[code]
Array
(
    [0] => http://api.tweetmeme.com/imagebutton.gif?url=http://mashable.com/2010/09/25/trailmeme/
    [1] => http://cdn.mashable.com/wp-content/plugins/wp-digg-this/i/gbuzz-feed.png
    [2] => http://mashable.com/wp-content/plugins/wp-digg-this/i/fb.jpg
    [3] => http://mashable.com/wp-content/plugins/wp-digg-this/i/diggme.png
    [4] => http://ec.mashable.com/wp-content/uploads/2009/01/bizspark2.gif
    [5] => http://cdn.mashable.com/wp-content/uploads/2010/09/web.png
    [6] => http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png
    [7] => http://cdn.mashable.com/wp-content/uploads/2009/02/bizspark.jpg
    [8] => http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/0/di
    [9] =>
    [10] => http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/1/di
    [11] =>
    [12] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:D7DqB2pKExk
    [13] =>
    [14] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:V_sGLiPBpWU
    [15] =>
    [16] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:F7zBnMyn0Lo
    [17] =>
    [18] => http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs
    [19] =>
    [20] => http://feeds.feedburner.com/~ff/Mashable?d=_e0tkf89iUM
    [21] =>
    [22] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:gIN9vFwOqvQ
    [23] =>
    [24] => http://feeds.feedburner.com/~ff/Mashable?d=yIl2AUoC8zA
    [25] =>
    [26] => http://feeds.feedburner.com/~ff/Mashable?d=P0ZAIrC63Ok
    [27] =>
    [28] => http://feeds.feedburner.com/~ff/Mashable?d=I9og5sOYxJI
    [29] =>
    [30] => http://feeds.feedburner.com/~ff/Mashable?d=CC-BsrAYo0A
    [31] =>
    [32] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:_cyp7NeR2Rw
    [33] =>
    [34] => http://feeds.feedburner.com/~r/Mashable/~4/0N_mvMwPHYk
)
[/code]
Is there a way I can filter out only the real image URLs? For example, I would like to strip out URLs that don't end in an image extension such as jpg, png or gif.
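One way to handle the first requirement is sketched below, assuming "has an image extension" means the URL's path (not its query string) ends in .jpg, .jpeg, .png or .gif. `hasImageExtension()` and `filterByExtension()` are hypothetical helper names, not Magpie functions; the check relies only on PHP's built-in `parse_url()` and `pathinfo()`. As a side effect it also drops the empty entries seen in the output.

```php
<?php
// Hypothetical helpers (not part of Magpie): keep only URLs whose path
// ends in an allowed image extension.

function hasImageExtension($url, array $allowed = array('jpg', 'jpeg', 'png', 'gif'))
{
    // parse_url() separates the path from the query string, so
    // "imagebutton.gif?url=http://..." is still recognised as a .gif.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return false; // empty entries, malformed URLs, URLs without a path
    }
    // PATHINFO_EXTENSION returns the part after the last dot, or '' if none.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

function filterByExtension(array $urls)
{
    // array_filter() keeps the original keys; array_values() renumbers them.
    return array_values(array_filter($urls, 'hasImageExtension'));
}

// Example with a few entries taken from the output above.
$urls = array(
    'http://cdn.mashable.com/wp-content/uploads/2010/09/web.png',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs',
    'http://api.tweetmeme.com/imagebutton.gif?url=http://mashable.com/2010/09/25/trailmeme/',
);
$filtered = filterByExtension($urls);
print_r($filtered);
```

Checking the parsed path rather than the end of the whole string matters for entries like [0], where `.gif` is followed by `?url=...`; the feedburner and doubleclick URLs have no extension in their path and are dropped.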
Secondly, I would like to discard URLs containing words such as bizspark, digg, facebook, tweet, twitter, etc. Has anybody found an easier way of doing this? Please help me out.
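For the second requirement, and as a possibly simpler alternative to the strpos()-based parsing, here is a sketch that lets PHP's `DOMDocument` locate the `<img>` tags and then discards any URL containing a blacklisted word. It assumes the DOM extension is enabled (it is in most PHP builds); `extractImageUrls()` and `removeBlacklisted()` are hypothetical names, and the blacklist is just the example words from the question.

```php
<?php
// Hypothetical helpers: parse <img> tags with DOMDocument instead of strpos(),
// then drop URLs that contain any blacklisted word.

function extractImageUrls($html)
{
    $urls = array();
    if (trim($html) === '') {
        return $urls; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Feed HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        // getAttribute() returns '' when the attribute is missing
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

function removeBlacklisted(array $urls, array $blacklist)
{
    $kept = array();
    foreach ($urls as $url) {
        $blocked = false;
        foreach ($blacklist as $word) {
            // case-insensitive substring match anywhere in the URL
            if (stripos($url, $word) !== false) {
                $blocked = true;
                break;
            }
        }
        if (!$blocked) {
            $kept[] = $url;
        }
    }
    return $kept;
}

// Example: three of the images from the output, wrapped in a small HTML fragment.
$html = '<p><img src="http://cdn.mashable.com/wp-content/uploads/2010/09/web.png" />'
      . '<img src="http://mashable.com/wp-content/plugins/wp-digg-this/i/diggme.png" />'
      . '<img src="http://ec.mashable.com/wp-content/uploads/2009/01/bizspark2.gif" /></p>';
$blacklist = array('bizspark', 'digg', 'facebook', 'tweet', 'twitter');
$result = removeBlacklisted(extractImageUrls($html), $blacklist);
print_r($result);
```

The blacklist test is a plain substring match, so "digg" also removes anything under `wp-digg-this/` and "tweet" catches the tweetmeme button, while "facebook" would not catch `fb.jpg`; adding "fb" would, at the risk of false positives. Because only tags with a non-empty `src` are collected, this approach should also avoid the empty entries in the original output.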