Parsing HTML in Python 2.7 with regex - don't really understand it

Christo

New Member
Sorry if this is a basic question, but I really need help with Python.

\[code\]['<a href="http://stackoverflow.com/questions/14045250/needs to be cut out">Foo to BAR</a>', '<a href="http://stackoverflow.com/questions/14045250/this also needs to be cut out">BAR to Foo</a>']\[/code\]

I have this list, and from each element I need to cut out two things: what's inside the href attribute (the part after the last slash, as in the desired output below) and the text inside the \[code\]<a>\[/code\] tag. Basically, I want to get a list that looks like:

\[code\][["needs to be cut out", "Foo to BAR"], ["this also needs to be cut out", "BAR to Foo"]]\[/code\]

The href attribute can contain a lot of special characters, for example:

\[code\]<a href="http://stackoverflow.com/questions/14045250/?a=p.stops&direction_id=23600&interval=1&t=wml&l=en">\[/code\]

I think an HTML parser is more trouble than it's worth here, since I don't need to walk the object tree, only pull a few URLs and words from the webpage. But I don't really understand how to build regexes, and the ones I've written seem to be completely wrong. Could somebody help me with this?
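For the sample data in the question, one possible regex approach might look like the sketch below. It rests on a few assumptions that are not stated in the post: the wanted piece of the URL is everything after the last slash in href, href is the first attribute of the tag and is double-quoted, and the link text contains no nested tags. The names `LINK_RE` and `extract_pairs` are made up for illustration.

```python
import re

links = [
    '<a href="http://stackoverflow.com/questions/14045250/needs to be cut out">Foo to BAR</a>',
    '<a href="http://stackoverflow.com/questions/14045250/this also needs to be cut out">BAR to Foo</a>',
]

# Hypothetical pattern, assuming href comes first and uses double quotes.
#   [^"]*/     greedy: consumes the URL up to and including its last "/"
#   ([^"/]*)   group 1: the rest of the URL, up to the closing quote
#   [^>]*>     skips any further attributes and closes the opening tag
#   (.*?)</a>  group 2: the link text, non-greedy up to the first </a>
LINK_RE = re.compile(r'<a\s+href="[^"]*/([^"/]*)"[^>]*>(.*?)</a>',
                     re.IGNORECASE | re.DOTALL)


def extract_pairs(items):
    """Return [url_tail, link_text] for each string that contains a link."""
    pairs = []
    for item in items:
        match = LINK_RE.search(item)
        if match:  # strings without a matching <a> tag are skipped
            pairs.append([match.group(1), match.group(2)])
    return pairs


print(extract_pairs(links))
```

Because the character class `[^"/]` only excludes quotes and slashes, symbols such as `?`, `&`, `=` and `.` in the query string are captured unchanged. If the real pages have attributes in a different order, single-quoted values, or nested markup inside the link, this pattern will miss or mangle them, and a parser such as the standard library's `HTMLParser` would likely be the safer choice.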
 