HTML parser using Perl

lightyear · Dec 9, 2012

I'm trying to parse an HTML file with a Perl script and extract all the text inside \[code\]p\[/code\] tags. In the page source, the data is written in this format:

\[quote\] \[code\]<p>\[/code\] Metrics are all virtualization specific and are prioritized and grouped as follows: \[code\]</p>\[/code\]\[/quote\]

Here is my code:

\[code\]
use HTML::TagParser();
use URI::Fetch;
//
my @list = $html->getElementsByTagName( "p" );
foreach my $elem ( @list ) {
    my $tagname = $elem->tagName;
    my $attr = $elem->attributes;
    my $text = $elem->innerText;
    push (@array,"$text");
    foreach $_ (@array) {
        # print "$_\n";
        print $html_fh "$_\n";
        chomp ($_);
        push (@array1, "$_");
    }
}
}
$end = $#array1+1;
print "Elements in the array: $end\n";
close $html_fh;
\[/code\]

The problem is that the generated output is 4.60 MB and many of the array elements are just repeated sentences. How can I avoid this repetition? Is there a more efficient way to extract only the lines I'm interested in? Can anybody help me out with this issue?
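Judging only from the code as posted, one likely cause of the repetition is that the inner `foreach $_ (@array)` loop sits inside the outer loop over `@list`. Each new paragraph therefore re-prints everything collected so far and pushes it onto `@array1` again, so the output grows roughly with the square of the number of paragraphs. Below is a minimal sketch of one way around this, not a definitive fix: handle each text once, skip any text already seen by tracking it in a hash, and print after the collection step. The helper name `unique_texts` and the sample strings are made up for illustration; in the real script the input would be the `$elem->innerText` values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TagParser): returns the distinct,
# non-empty strings from its arguments, in first-seen order.
sub unique_texts {
    my @texts = @_;          # work on a copy so the caller's data is untouched
    my %seen;                # text => number of times encountered so far
    my @unique;
    for my $text (@texts) {
        next unless defined $text;
        $text =~ s/^\s+|\s+$//g;     # trim leading and trailing whitespace
        next if $text eq '';         # ignore empty paragraphs
        next if $seen{$text}++;      # skip anything already collected
        push @unique, $text;
    }
    return @unique;
}

# Stand-in data for illustration only; in the original script these would be
# gathered with: push @paragraphs, $elem->innerText;
my @paragraphs = (
    "Metrics are all virtualization specific and are prioritized and grouped as follows:",
    "Sample paragraph",
    "Metrics are all virtualization specific and are prioritized and grouped as follows:",
    "  Sample paragraph\n",
);

my @unique = unique_texts(@paragraphs);

# Print once, after collecting, instead of inside the collecting loop.
print "$_\n" for @unique;
print "Elements in the array: ", scalar(@unique), "\n";
```

In the posted script, the same idea would mean keeping only a single `push` inside the loop over `@list` and moving the printing to `$html_fh` out of that loop, so each paragraph is written once.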
