Parsing HTML and replacing strings

robftw23 · Sep 13, 2012

I have a large quantity of partial HTML stored in a CMS database. I'm looking for a way to go through the HTML, find any `<a></a>` tags that don't have a title, and add a title to them based on the contents of the tags.

So if I had `<a href="somepage">some text</a>` I'd like to modify the tag to look like `<a title="some text" href="somepage">some text</a>`.

Some tags already have a title, and some anchor tags have nothing between them.

So far I've made some progress with PHP and regex, but I can't get the contents of the anchors: it just displays either a 1 or a 0.

```php
<?php
$file = "test.txt";
$handle = fopen($file, "r");
$theData = fread($handle, filesize($file));
fclose($handle);
$line = explode("\r\n", $theData);

$regex  = '/^.*<a ((?!title).)*$/'; // finds all lines that don't contain an anchor with a title
$regex2 = '/<a .*><\/a>/';          // finds all lines that have nothing between the anchors
$regex3 = '/<a.*?>(.+?)<\/a>/';     // finds the contents of the anchors

foreach ($line as $lines) {
    if (!preg_match($regex2, $lines) && preg_match($regex, $lines)) {
        $tags = $lines;
        $contents = preg_match($regex3, $tags);
        $replaced = str_replace("<a ", "<a title=\"$contents\" ", $lines);
        echo $replaced . "\r\n";
    } else {
        echo $lines . "\r\n";
    }
}
?>
```

I understand regex is probably not the best way to parse HTML, so any help or alternative suggestions would be greatly appreciated.
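The 1 or 0 is the return value of `preg_match()`: it returns the number of matches (0 or 1), and the captured group is only available through its optional third argument (`$matches`). A minimal sketch that stays with regex but handles all three cases from the question (no title, existing title, empty anchor) in one pass with `preg_replace_callback()`. The function name `addTitles` is hypothetical, and the pattern assumes attribute values never contain a `>` character:

```php
<?php
// Regex sketch: add a title attribute to <a> tags that lack one,
// using the tag's own text. Assumes no ">" inside attribute values.

function addTitles(string $html): string
{
    return preg_replace_callback(
        // $m[1] = attributes of the opening tag, $m[2] = content up to the first </a>
        '/<a\b([^>]*)>(.*?)<\/a>/is',
        function (array $m): string {
            // Leave anchors that already have a title attribute alone.
            if (preg_match('/(?:^|\s)title\s*=/i', $m[1])) {
                return $m[0];
            }
            // Drop nested markup (e.g. <b>) and surrounding whitespace.
            $text = trim(strip_tags($m[2]));
            // Leave empty anchors such as <a name="top"></a> alone.
            if ($text === '') {
                return $m[0];
            }
            // Escape quotes for the attribute without re-encoding existing entities.
            $title = htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
            return '<a title="' . $title . '"' . $m[1] . '>' . $m[2] . '</a>';
        },
        $html
    );
}

echo addTitles('<p><a href="somepage">some text</a></p>'), "\n";
```

To run it over the file from the question, something like `echo addTitles(file_get_contents($file));` processes the whole file at once rather than line by line, so anchors that span several lines are covered too (the `s` modifier lets `.` match newlines).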
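As an alternative to regex, PHP's bundled DOM extension can parse the fragment and edit the `<a>` elements directly. The sketch below assumes each CMS record is passed in as a string; `addTitlesDom` is a hypothetical name, and wrapping the fragment in a `<div>` is just one way to extract it again after `loadHTML()` adds its own `<html>` and `<body>` elements:

```php
<?php
// DOM sketch: let libxml parse the fragment, then edit <a> elements directly.

function addTitlesDom(string $fragment): string
{
    $doc = new DOMDocument();
    // CMS fragments are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it can be pulled back out afterwards.
    // The meta tag tells libxml the input is UTF-8 (it otherwise assumes ISO-8859-1).
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $fragment . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        // textContent is the anchor's text with nested tags removed and entities decoded.
        $text = trim($a->textContent);
        // Skip anchors that already have a title, and empty ones like <a name="top"></a>.
        if ($a->hasAttribute('title') || $text === '') {
            continue;
        }
        // The serializer escapes quotes and ampersands in the attribute value.
        $a->setAttribute('title', $text);
    }

    // The wrapper is the first <div> in document order; serialize only its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $html = '';
    foreach ($wrapper->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

echo addTitlesDom('<p>See <a href="somepage">some text</a>.</p>'), "\n";
```

`setAttribute()` appends the title after the existing attributes, so the output reads `href="..." title="..."` rather than the order shown in the question; the attribute order makes no difference to browsers. Badly unbalanced fragments (a stray `</div>`, for instance) may be restructured by libxml, so it is worth comparing a sample of records before writing results back to the database.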
