Parsing Hyperlinks from a webpage

EremAVapSmerm · Sep 26, 2012

I have written the following code to parse hyperlinks from a given page:

[code]
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Download the raw HTML of the page; WebClient is disposable.
string html;
using (WebClient web = new WebClient())
{
    html = web.DownloadString("http://www.msdn.com");
}

// Split on "<a " and ">" and keep only the fragments that contain "href".
string[] separators = new string[] { "<a ", ">" };
List<string> hyperlinks = html.Split(separators, StringSplitOptions.None)
    .Where(s => s.Contains("href"))
    .ToList();
[/code]

The split still has to be tweaked to return the URLs exactly, since each kept fragment is the raw attribute text of a tag rather than the URL itself.

My question: is there a data structure or class, something along the lines of XmlReader, that can read HTML strings efficiently?

Any suggestions for improving the code above would also be helpful. Thanks for your time.
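If the goal is only the URL values, a regular expression that captures each `href` attribute of an `<a>` tag avoids most of the split tweaking. Below is a minimal sketch, assuming C# 9 top-level statements; `ExtractHrefs` is a hypothetical helper name, and a regex is a swapped-in technique, not an HTML parser. It does not understand comments, scripts or malformed markup, so it can both miss links and pick up commented-out ones, and it returns values exactly as written (entities such as `&amp;` are not decoded). For real-world pages a dedicated HTML parser (HtmlAgilityPack is a commonly used one for .NET) is the more robust answer to the XmlReader question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the href values of all <a> tags, in document order.
static List<string> ExtractHrefs(string html)
{
    // "<a" followed by whitespace (so <abbr>, <area> are skipped), then any
    // attributes up to "href=", whose value is double-quoted (group 1),
    // single-quoted (group 2) or unquoted (group 3). [^>] keeps each match
    // inside a single tag.
    const string pattern =
        @"<a\s[^>]*?\bhref\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))";

    var links = new List<string>();
    foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
    {
        // Exactly one of the three alternatives matched; take its value.
        string value = m.Groups[1].Success ? m.Groups[1].Value
                     : m.Groups[2].Success ? m.Groups[2].Value
                     : m.Groups[3].Value;
        links.Add(value);
    }
    return links;
}

// Usage on a small inline sample instead of a live download.
string sample = "<p><a href=\"http://www.msdn.com/a\">A</a> " +
                "<A class='x' HREF='/b'>B</A> " +
                "<a name=top>no link</a> <a href=c.html>C</a></p>";

List<string> links = ExtractHrefs(sample);
foreach (string link in links)
    Console.WriteLine(link);
```

Relative values such as `/b` can then be resolved against the page address with `new Uri(baseUri, value)` if absolute URLs are needed.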
