PHP cURL + POST + multipart/form-data
I am trying to scrape this website, https://www.machinemart.co.uk/. I need to add a product to the cart to get certain data. The site uses a POST request to add products to the cart.
URL of the product that I am trying to add: https://www.machinemart.co.uk/p/clarke-amf-panel-for-kc6-and-kc10/
Here is an example of the request headers and body:
Request Headers:
Host: www.machinemart.co.uk
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.machinemart.co.uk/p/clarke-amf-panel-for-kc6-and-kc10/
Content-Type: multipart/form-data; boundary=---------------------------209892343219764726031397980914
Content-Length: 1224
Cookie: ASP.NET_SessionId=f3p1y0ib1za3vch4exbue4l3;
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
-------------------------------------
Request Body:
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="__RequestVerificationToken"

2MpklOgIiis94EbGlIIoF5N_bzOrFegTpV_YEHTlZysKZrGxeAwBaFg5S4xtnGi8Jth5CEGRn9ETlK_g55jb6k9DcHGO-RR-LXug2roZEQg1
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="ProductId"

4aaad9a5-ad65-4842-a24a-5f455b263933
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="ProductSku"

010629550
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="Quantity"

1
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="SubmitButton"

Home delivery
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="ufprt"

DA81438F81A7BE767B068EED46F4A4CAC24A05FC23BEAFE8B1A4B536FA6EC79AA5C17510979DB132CAC8C33C62E03A07E766C55C45DAE114A63B816F7CADEE9AB165197FBCF088E0FEBAAD9E6D8145291AB9984B8764A82C56C33D9D20394A22D1E148BF3EF97DC02EC48E5C4C491B3368B66B0A750BA6815B049A13F590BC8D6A6F05D3B96F81E0308742BD37D92E81
-----------------------------209892343219764726031397980914--
I am able to get the needed data for each product (ID, SKU, token, ufprt), and I also tried to get the session ID. This is the code:
$data = '---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="__RequestVerificationToken"\r\n\r\n'.$request_dataToken
    .'\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="ProductId"\r\n\r\n'.$request_dataProductId
    .'\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="ProductSku"\r\n\r\n'.$request_dataProductSku
    .'\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="Quantity"\r\n\r\n1'
    .'\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="SubmitButton"\r\n\r\nHome delivery'
    .'\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="ufprt"\r\n\r\n'.$request_dataUFPRT
    .'\r\n---------------------------17064761399835087311752471201';
//Get the session id
$session_id = '';
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, $request_url);
curl_setopt($curl_handle, CURLOPT_POST, TRUE);
curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $data);
curl_setopt($curl_handle, CURLOPT_HEADER, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($curl_handle);
// get cookies
$cookies = array();
preg_match_all('/Set-Cookie:(\s{0,}.*)$/im', $response, $cookies);
curl_close($curl_handle);
foreach ($cookies[1] as $cookie) {
    if (preg_match('/Id=([a-z0-9]+);/', $cookie, $out)) {
        $session_id = $out[1];
    }
}
//Try to add the product with the session id
$curl_handle = curl_init();
curl_setopt($curl_handle,CURLOPT_COOKIE,'ASP.NET_SessionId='.$session_id);
curl_setopt($curl_handle, CURLOPT_URL, $request_url);
curl_setopt($curl_handle, CURLOPT_POST, TRUE);
curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $data);
// curl_setopt($curl_handle, CURLOPT_HEADER, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($curl_handle);
curl_close($curl_handle);
// Get the cart page -- the cart actually comes back empty
$curl_handle = curl_init();
curl_setopt($curl_handle,CURLOPT_COOKIE,'ASP.NET_SessionId='.$session_id);
curl_setopt($curl_handle, CURLOPT_URL, $request_urlCart);
//curl_setopt($curl_handle, CURLOPT_POST, TRUE);
//curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $data);
//curl_setopt($curl_handle, CURLOPT_HEADER, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($curl_handle);
curl_close($curl_handle);
print $response;
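The cookie handling above forwards only ASP.NET_SessionId. The __RequestVerificationToken field suggests the site uses ASP.NET MVC's anti-forgery check, which validates the form token against a matching cookie; that is an assumption about this site, but if it holds, every cookie set while loading the product page has to be sent back with the POST. The following minimal sketch collects all Set-Cookie values from a raw header block (as returned when CURLOPT_HEADER is enabled) and joins them into a value for CURLOPT_COOKIE. The function names parse_set_cookies and cookie_header are hypothetical, and the cookie values are made-up sample data.

```php
<?php
// Hypothetical helper: collect every cookie set in a raw HTTP header block
// into a name => value map. Attributes such as path or HttpOnly are dropped.
function parse_set_cookies(string $rawHeaders): array
{
    $cookies = [];
    // One match per "Set-Cookie:" header line (case-insensitive, multiline).
    preg_match_all('/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/im', $rawHeaders, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // A later header for the same name (e.g. after a redirect) wins.
        $cookies[$m[1]] = $m[2];
    }
    return $cookies;
}

// Hypothetical helper: build "name1=value1; name2=value2" for CURLOPT_COOKIE.
function cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Made-up sample headers, shaped like a response fetched with CURLOPT_HEADER on.
$raw = "HTTP/1.1 200 OK\r\n"
     . "Set-Cookie: ASP.NET_SessionId=f3p1y0ib1za3vch4exbue4l3; path=/; HttpOnly\r\n"
     . "Set-Cookie: __RequestVerificationToken=abc123; path=/; HttpOnly\r\n"
     . "Content-Type: text/html\r\n\r\n";

echo cookie_header(parse_set_cookies($raw)), "\n";
// prints ASP.NET_SessionId=f3p1y0ib1za3vch4exbue4l3; __RequestVerificationToken=abc123
```

cURL's built-in cookie engine (CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE) does this bookkeeping automatically, which is the route the answer takes.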
Here is an example of the contents of the $data variable:
---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="__RequestVerificationToken"\r\n\r\nx9RTuiKcC0IJR0OwNFicu6XxPXoOt5dtgaXVEdQxpwRGGv52fdv7IJ9zcjkz1HnYaLX5yaz2e0pXCZFLi_judYwQLT--vCg2_xUMzsaT5Rc1\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="ProductId"\r\n\r\n4aaad9a5-ad65-4842-a24a-5f455b263933\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="ProductSku"\r\n\r\n010629550\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="Quantity"\r\n\r\n1\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="SubmitButton"\r\n\r\nHome delivery\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name="ufprt"\r\n\r\nF5CA6BAC0C5C12E3B885CE69FE5E0D24480EA23E895AD3DA72BFDF6832B56CD8A70F1183BFA03F61AD353FA86DCDD71CA105A86A0274A27152E68A66449191BD8167B6E06A2982B326BBC1E47C7C9AB3984A7BB17ECB9E153496542F7DE8B00D97FEFAE8B6120A6C3B87CAA74E875E68BE894586468FD0704B11346A6E1BC902BC538D64CA23DD87068DCA52CC5AC19F\r\n---------------------------17064761399835087311752471201
I tried adding headers to the request and sending the parameters as an array, but it still does not work. What am I doing wrong?
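The $data output above is a single line containing literal \r\n sequences: PHP does not interpret escape sequences inside single-quoted strings, so the body never contains real CRLF line breaks. Two further details differ from the browser's request. No Content-Type: multipart/form-data header naming the boundary is sent (with a string in CURLOPT_POSTFIELDS and no such header, cURL labels the body application/x-www-form-urlencoded). And under the multipart rules of RFC 2046, which RFC 7578 applies to form data, each part starts with "--" plus the boundary, and the body ends with "--" plus the boundary plus "--". A minimal sketch of a builder that gets these details right; the function name build_multipart and the boundary format are illustrative assumptions:

```php
<?php
// Hypothetical helper: build a multipart/form-data body by hand.
// Returns [body, contentTypeHeader]. Field names are assumed not to contain quotes.
function build_multipart(array $fields, ?string $boundary = null): array
{
    // Any string that does not occur in the field values works as a boundary.
    $boundary = $boundary ?? '----PhpFormBoundary' . bin2hex(random_bytes(8));
    $eol = "\r\n"; // double quotes: PHP turns \r\n into a real CRLF here

    $body = '';
    foreach ($fields as $name => $value) {
        $body .= '--' . $boundary . $eol                                   // part delimiter
               . 'Content-Disposition: form-data; name="' . $name . '"' . $eol
               . $eol                                                      // blank line ends the part headers
               . $value . $eol;
    }
    $body .= '--' . $boundary . '--' . $eol; // closing delimiter has a trailing "--"

    return [$body, 'Content-Type: multipart/form-data; boundary=' . $boundary];
}

[$body, $header] = build_multipart(
    ['Quantity' => '1', 'SubmitButton' => 'Home delivery'],
    'XYZ' // fixed boundary so the output is predictable
);
echo $header, "\n";
// prints Content-Type: multipart/form-data; boundary=XYZ
```

With cURL, $body would go to CURLOPT_POSTFIELDS and [$header] to CURLOPT_HTTPHEADER. Alternatively, when CURLOPT_POSTFIELDS receives a PHP array instead of a string, cURL builds the multipart body and the Content-Type header itself.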
Here is a solution, although I have to say that the code needs some extra work to be more robust (I have marked the error-prone parts with comments in the code).
1. Put the URL of any product page (not a product listing page) into the code below. There may also be a way to do this from the product listing, but this example works through the product page.
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // disables certificate checks
curl_setopt($ch, CURLOPT_URL, "https://www.machinemart.co.uk/p/clarke-ctj2qlp-2-tonne-quick-lift-low-profile/"); // type your URL here as explained above
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Error prone: $_COOKIE['PHPSESSID'] and $_SERVER['DOCUMENT_ROOT'] are only set when this runs
// inside a PHP web session, and the directory must exist and be writable.
// Steps 1 and 3 must use the same cookie file.
curl_setopt($ch, CURLOPT_COOKIEJAR, $_SERVER['DOCUMENT_ROOT'].'/extra path here/'.$_COOKIE['PHPSESSID'].'.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, $_SERVER['DOCUMENT_ROOT'].'/extra path here/'.$_COOKIE['PHPSESSID'].'.txt');
$data = curl_exec($ch);
curl_close($ch);
The previous fragment fetches the page and saves the cookies the site sets to the cookie file.
2. Next, extract the form data with PHP's DOM extension. All the fields required for the cart functionality are inside a form element. In my example I assume that the second form element is the product form, but you must check carefully which one is correct.
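Step 1 works because cURL's cookie engine writes the cookies it receives to the CURLOPT_COOKIEJAR file when the handle is freed and loads them again from CURLOPT_COOKIEFILE, so the page that supplies the form tokens and the later POST share one server session. Deriving the file name from $_COOKIE['PHPSESSID'] only works when the script runs inside a PHP web session. A minimal sketch that uses a temporary file from tempnam() instead and returns the shared options for curl_setopt_array(); the function name scraper_options is hypothetical:

```php
<?php
// Hypothetical helper: options shared by every request in one scraping session.
// The same $cookieFile is used for writing (jar) and reading (file), so cookies
// set by the first request are sent again by later handles.
function scraper_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0',
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
    ];
}

// A fresh, empty file in the system temp directory; no PHP session needed.
$cookieFile = tempnam(sys_get_temp_dir(), 'cookies');

$opts = scraper_options(
    'https://www.machinemart.co.uk/p/clarke-ctj2qlp-2-tonne-quick-lift-low-profile/',
    $cookieFile
);

$ch = curl_init();
$ok = curl_setopt_array($ch, $opts); // true if every option was accepted
// Network part, not executed in this sketch:
// $data = curl_exec($ch);
// curl_close($ch);
```

Certificate verification is left at cURL's default (enabled) here, unlike the answer's CURLOPT_SSL_VERIFYPEER false.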
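libxml_use_internal_errors(true); // real-world HTML is rarely valid; suppress parser warnings
$siteData = new DOMDocument();
$siteData->loadHTML($data);
$forms = $siteData->getElementsByTagName("form");
// Error prone: assumes the product form is the second <form> on the page (index 1).
$inputs = $forms->item(1)->getElementsByTagName("input");
$search = array();
for ($i = 0; $i < $inputs->length; $i++) {
    // Skip inputs with class "greyBtn"; every other input's name/value pair is posted.
    if ($inputs->item($i)->getAttribute("class") != "greyBtn") {
        $search[$inputs->item($i)->getAttribute("name")] = $inputs->item($i)->getAttribute("value");
    }
}
// Error prone: assumes the form's action attribute is a root-relative path.
$submitURL = "https://www.machinemart.co.uk".$forms->item(1)->getAttribute("action");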
This code collects the names and values of the form's inputs into an array named $search, and builds $submitURL, the form's action URL, which the cURL request in step 3 posts to.
3. Call cURL again, using $submitURL as the target URL and the $search array as the POST parameters.
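Indexing the form list with item(1) breaks as soon as the page gains or loses a form. A more defensive variant, sketched below under the assumption that the add-to-cart form is the one containing an input named ProductSku (a field name taken from the question's request body), selects the form with DOMXPath and resolves its action against the site root. The function name extract_cart_form is hypothetical, and the HTML is made-up test markup, not the site's real page:

```php
<?php
// Hypothetical helper: find the form that contains <input name="$markerField">
// and return [actionUrl, fields], or null if no such form exists.
// $markerField is assumed not to contain double quotes.
function extract_cart_form(string $html, string $baseUrl, string $markerField = 'ProductSku'): ?array
{
    libxml_use_internal_errors(true); // suppress warnings about invalid HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $forms = $xpath->query(sprintf('//form[.//input[@name="%s"]]', $markerField));
    if ($forms === false || $forms->length === 0) {
        return null;
    }
    $form = $forms->item(0);

    $fields = [];
    foreach ($xpath->query('.//input[@name]', $form) as $input) {
        // Skip inputs with class "greyBtn", as the answer does.
        if ($input->getAttribute('class') === 'greyBtn') {
            continue;
        }
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }

    // Keep absolute actions; otherwise assume a root-relative path like "/p/...".
    $action = $form->getAttribute('action');
    $url = preg_match('#^https?://#i', $action) ? $action : rtrim($baseUrl, '/') . $action;

    return [$url, $fields];
}

// Made-up markup: a search form first, then the product form.
$html = '<html><body>
<form action="/search"><input name="q" value=""></form>
<form action="/p/example/" method="post">
  <input type="hidden" name="ProductId" value="4aaad9a5-ad65-4842-a24a-5f455b263933">
  <input type="hidden" name="ProductSku" value="010629550">
  <input name="Quantity" value="1">
  <input type="submit" class="greyBtn" name="SubmitButton" value="Other">
  <input type="submit" name="SubmitButton" value="Home delivery">
</form></body></html>';

[$submitURL, $search] = extract_cart_form($html, 'https://www.machinemart.co.uk');
```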
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // disables certificate checks
curl_setopt($ch, CURLOPT_URL, $submitURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($search));
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded'));
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Must be the same cookie file as in step 1, otherwise the session started there is lost.
curl_setopt($ch, CURLOPT_COOKIEJAR, $_SERVER['DOCUMENT_ROOT'].'/extra path here/'.$_COOKIE['PHPSESSID'].'.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, $_SERVER['DOCUMENT_ROOT'].'/extra path here/'.$_COOKIE['PHPSESSID'].'.txt');
$data = curl_exec($ch);
curl_close($ch);
echo $data;
$data now holds the resulting page. If you echo it (it looks ugly, since the CSS is missing), you will see that the product is in the cart.
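Unlike the question's code, step 3 does not send multipart/form-data at all: http_build_query() turns $search into an application/x-www-form-urlencoded string, matching the Content-Type header the answer sets, and the answer reports that the site accepts it. A small illustration using field names from the question's request body (the values are sample data):

```php
<?php
// Field names from the question's request body; values are sample data.
$search = [
    'ProductId'    => '4aaad9a5-ad65-4842-a24a-5f455b263933',
    'ProductSku'   => '010629550',
    'Quantity'     => '1',
    'SubmitButton' => 'Home delivery',
];

// URL-encodes keys and values and joins the pairs with "&";
// with the default RFC 1738 encoding, spaces become "+".
$body = http_build_query($search);
echo $body, "\n";
// prints ProductId=4aaad9a5-ad65-4842-a24a-5f455b263933&ProductSku=010629550&Quantity=1&SubmitButton=Home+delivery
```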
Link: http://www.djcxy.com/p/48796.html