PHP parser: Determine if a string found via regex is inside an anchor tag

robLOL

New Member
Edited. I know HTML should not be parsed with regex. I am asking for help. How can I find an arbitrary string in a mix of tags and text and then determine if it is inside an anchor?

I have an interactive glossary in my WordPress site. Part of its functionality is searching the content of a post for a glossary term (a text string). If found, the term is wrapped in a link to a custom taxonomy entry that contains the definition. I like how it works, but one hitch is that if the term is already part of a link, the glossary parser hijacks the existing link by inserting a link within the link.

The parser is purely regex based; there isn't any DOM parsing. I know that HTML should not be parsed with regex, but currently the function is just searching for a specific text string. It's not trying to do anything with tags at all.

Is there a relatively fast (in terms of processing) and reliable way I can check whether the found string is inside an anchor tag? Obviously this will not always be the case, since the word could seemingly be inside any tag. When it is inside an anchor, the glossary parser should not add a link.
I know this feature would use a DOM parser, but I'm unsure where to go from here.

The parser:

\[code\]
function glossary_parse($content){
    //Run the glossary parser
    if (((!is_page() && get_option('glossaryOnlySingle') == 0) OR
         (!is_page() && get_option('glossaryOnlySingle') == 1 && is_single()) OR
         (is_page() && get_option('glossaryOnPages') == 1))){
        $glossary_index = get_children(array(
            'post_type'   => 'glossary',
            'post_status' => 'publish',
        ));
        $current_title = get_the_title();
        if ($glossary_index){
            // Used to build a unique placeholder tag (<a123...>) per glossary term
            $timestamp = time();
            foreach($glossary_index as $glossary_item){
                $timestamp++;
                $glossary_title = $glossary_item->post_title;
                // Do not link a glossary entry to itself
                if ($current_title == $glossary_title) {
                    continue;
                }
                // The lookahead requires an even number of double quotes after the match,
                // which skips terms inside quoted attribute values
                $glossary_search = '/\b'.$glossary_title.'s*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';
                $glossary_replace = '<a'.$timestamp.'>$0</a'.$timestamp.'>';
                if (get_option('glossaryFirstOnly') == 1) {
                    $content_temp = preg_replace($glossary_search, $glossary_replace, $content, 1);
                } else {
                    $content_temp = preg_replace($glossary_search, $glossary_replace, $content);
                }
                $content_temp = rtrim($content_temp);

                // Second pass: turn the placeholder tags into real links
                $link_search = '/<a'.$timestamp.'>('.$glossary_item->post_title.'[A-Za-z]*?)<\/a'.$timestamp.'>/i';
                if (get_option('glossaryTooltip') == 1) {
                    $link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '" onmouseover="tooltip.show(\'' . addslashes($glossary_item->post_excerpt) . '\');" onmouseout="tooltip.hide();">$1</a>';
                } else {
                    $link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '">$1</a>';
                }
                $content_temp = preg_replace($link_search, $link_replace, $content_temp);
                $content = $content_temp;
            }
        }
    }
    return $content;
}
\[/code\]
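One possible direction, as a minimal sketch rather than a drop-in fix: load the content with PHP's `DOMDocument`, use an XPath query to select only the text nodes that have no `<a>` ancestor, and run the term regex on those nodes alone. The function name `glossary_link_outside_anchors`, its parameters, and the `glossary-root` wrapper id are hypothetical and not part of the plugin above. The sketch assumes the content is a UTF-8 HTML fragment, and it leaves out the tooltip attributes for brevity.

```php
<?php
// Hypothetical helper: links $term to $href, skipping text already inside <a>.
function glossary_link_outside_anchors(string $html, string $term, string $href, bool $firstOnly = false): string
{
    $doc = new DOMDocument();
    // Silence libxml warnings about tags it does not know (e.g. HTML5 elements).
    $previous = libxml_use_internal_errors(true);
    // The XML prolog hints UTF-8; the wrapper div lets us extract the fragment later.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="glossary-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="glossary-root"]')->item(0);

    // preg_quote() keeps regex metacharacters in the term from breaking the pattern.
    $pattern = '/\b(' . preg_quote($term, '/') . 's?)\b/iu';
    $linked  = false;

    // Snapshot the matching text nodes first, because the loop modifies the tree.
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root));
    foreach ($textNodes as $textNode) {
        if ($firstOnly && $linked) {
            break;
        }
        // Odd indexes of $parts hold the matched terms, even indexes the text between them.
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1 && !($firstOnly && $linked)) {
                $a = $doc->createElement('a');
                $a->setAttribute('class', 'glossaryLink');
                $a->setAttribute('href', $href);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
                $linked = true;
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize only the children of the wrapper, not the html/body libxml added.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Inside the `foreach` over `$glossary_index`, a call such as `glossary_link_outside_anchors($content, $glossary_title, get_permalink($glossary_item), get_option('glossaryFirstOnly') == 1)` could stand in for the two `preg_replace` passes. Because matching only ever sees text nodes, terms inside attribute values are skipped as well, so the quote-counting lookahead would no longer be needed. Parsing the whole post once per glossary term may be slow on long posts; loading the document once and looping over the terms inside would be the obvious refinement.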
 