ooklbs1978
New Member
For the past few days I have been trying to scrape a website, but so far with no luck.

The situation is as follows: the website I am trying to scrape requires data from a previously submitted form. I have identified the variables that the web app requires and have looked at the HTTP headers sent by the original web app.

Since I have pretty much zero knowledge of ASP.NET, I thought I'd just ask whether I am missing something here.

I have tried different methods (cURL, file_get_contents and the Snoopy class). Here's my code for the cURL method:

\[code\]
<?php
$url = 'http://www.urltowebsite.com/Default.aspx';

$fields = array(
    '__VIEWSTATE'       => 'averylongvar',
    '__EVENTVALIDATION' => 'anotherverylongvar',
    'A few'             => 'other variables'
);
$fields_string = http_build_query($fields);

$curl = curl_init($url);
curl_setopt_array(
    $curl,
    array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => 0, // Not supported in PHP
        CURLOPT_SSL_VERIFYHOST => 0, // at this time.
        CURLOPT_HTTPHEADER     => array(
            'Content-type: application/x-www-form-urlencoded; charset=utf-8',
            'Set-Cookie: ASP.NET_SessionId='.uniqid().'; path: /; HttpOnly'
        ),
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $fields_string,
        CURLOPT_FOLLOWLOCATION => 1
    )
);

$response = curl_exec($curl);
curl_close($curl);

echo $response;
?>
\[/code\]

These are the request and response headers I captured from the original web app:
- Request URL:http://www.urltowebsite.com/default.aspx
- Request Method:POST
- Status Code: 200 OK
- Accept:application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
- Content-Type:application/x-www-form-urlencoded
- User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5
- A lot of form fields
- Cache-Control:private
- Content-Length:30168
- Content-Type:text/html; charset=utf-8
- Date:Thu, 09 Sep 2010 17:22:29 GMT
- Server:Microsoft-IIS/6.0
- X-Aspnet-Version:2.0.50727
- X-Powered-By:ASP.NET
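
One thing that may be worth trying: ASP.NET WebForms pages usually issue __VIEWSTATE and __EVENTVALIDATION per page load, so hard-coded values copied from an earlier visit are often rejected. Likewise, the session cookie is normally set by the server and sent back in a Cookie request header, rather than made up by the client. Below is a minimal sketch of that flow, assuming the target page behaves like a typical WebForms page. The function names extract_hidden_fields and fetch_with_postback are hypothetical helpers for illustration, not part of cURL or any library.

```php
<?php
// Hypothetical helper: collect every hidden <input> (such as __VIEWSTATE
// and __EVENTVALIDATION) from a page into a name => value array.
function extract_hidden_fields(string $html): array
{
    $fields = [];
    if ($html === '') {
        return $fields;
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; suppress the parser warnings.
    @$doc->loadHTML($html);

    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Hypothetical helper: GET the page first, then POST the fresh hidden
// fields back together with the caller's own form values.
function fetch_with_postback(string $url, array $extraFields): string
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // An empty string enables cURL's in-memory cookie engine, so the
        // ASP.NET_SessionId set by the server is sent back automatically.
        CURLOPT_COOKIEFILE     => '',
    ]);

    // Step 1: plain GET to obtain a session cookie and fresh hidden fields.
    $page = curl_exec($curl);
    if ($page === false) {
        throw new RuntimeException('GET failed: ' . curl_error($curl));
    }

    // Step 2: POST on the same handle, so the cookies carry over.
    $fields = array_merge(extract_hidden_fields($page), $extraFields);
    curl_setopt_array($curl, [
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($fields),
    ]);

    $response = curl_exec($curl);
    if ($response === false) {
        throw new RuntimeException('POST failed: ' . curl_error($curl));
    }
    return $response;
}
```

It would be called as something like echo fetch_with_postback('http://www.urltowebsite.com/Default.aspx', array('A few' => 'other variables')); whether this is enough depends on the site, since some pages also expect the name of the submit button or an __EVENTTARGET value among the posted fields.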