HTML parser using Perl

lightyear

I'm trying to parse an HTML file with a Perl script. I want to extract all the text inside `<p>` tags. When I view the page source, the data is written in this format:

> `<p>` Metrics are all virtualization specific and are prioritized and grouped as follows: `</p>`

Here is my code:

```perl
use HTML::TagParser();
use URI::Fetch;
# ...

    my @list = $html->getElementsByTagName( "p" );
    foreach my $elem ( @list ) {
        my $tagname = $elem->tagName;
        my $attr    = $elem->attributes;
        my $text    = $elem->innerText;
        push (@array, "$text");
        foreach $_ (@array) {
            # print "$_\n";
            print $html_fh "$_\n";
            chomp ($_);
            push (@array1, "$_");
        }
    }
}
$end = $#array1 + 1;
print "Elements in the array: $end\n";
close $html_fh;
```

The problem I'm facing is that the generated output is 4.60 MB, and a lot of the array elements are just repeated sentences. How can I avoid this repetition? Is there a more efficient way to extract only the lines I'm interested in? Can anybody help me with this issue?
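The repetition most likely comes from the inner `foreach $_ (@array)` loop. It runs inside the outer loop, so each time a new paragraph is pushed onto `@array`, every earlier paragraph is printed and pushed onto `@array1` again: N paragraphs produce 1 + 2 + ... + N lines. The core-Perl sketch here reproduces that pattern and shows one way to keep each paragraph once, using a `%seen` hash. The sub names `nested_copy` and `unique_in_order` and the sample strings are hypothetical stand-ins for the `innerText` values returned by `getElementsByTagName("p")`, not code from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: mirrors the loop structure in the question.
# Every new paragraph is pushed onto @array, then ALL of @array is
# copied to @array1 again, so earlier paragraphs are re-emitted.
sub nested_copy {
    my @paragraphs = @_;
    my (@array, @array1);
    foreach my $text (@paragraphs) {
        push @array, $text;
        push @array1, $_ for @array;    # re-copies every earlier paragraph
    }
    return @array1;
}

# Hypothetical helper: a single pass over the paragraphs; %seen drops
# exact repeats while keeping first-seen order.
sub unique_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# With HTML::TagParser, the input list could be built with something like:
#   my @paragraphs = map { $_->innerText } $html->getElementsByTagName('p');
my @paragraphs = ('Metrics are all virtualization specific', 'CPU', 'Memory', 'CPU');

my @copied = nested_copy(@paragraphs);
my @unique = unique_in_order(@paragraphs);
print scalar(@copied), " lines from the nested loop\n";    # 10
print scalar(@unique), " unique paragraphs\n";             # 3
```

Moving the `print $html_fh` and `push @array1` out of the inner loop, so that each `$text` is handled exactly once, removes the quadratic growth on its own. The `%seen` filter only matters if the same paragraph text really occurs more than once in the page.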