How to Emulate a Get Request Exactly Like a Web Browser

How can I emulate a get request exactly like a web browser?

Are you sure the curl module honors ini_set('user_agent',...)? There is an option CURLOPT_USERAGENT described at http://docs.php.net/function.curl-setopt.

Could there also be a cookie tested by the server? That you can handle by using CURLOPT_COOKIE, CURLOPT_COOKIEFILE and/or CURLOPT_COOKIEJAR.

Edit: Since the request uses HTTPS, there might also be an error in verifying the certificate; see CURLOPT_SSL_VERIFYPEER.

$url = "https://new.aol.com/productsweb/subflows/ScreenNameFlow/AjaxSNAction.do?s=username&f=firstname&l=lastname";
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';

$ch = curl_init();
// Disabling peer verification is for debugging only; it removes protection against man-in-the-middle attacks.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);         // log request and response details to stderr
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, $agent);     // send a browser-like User-Agent
curl_setopt($ch, CURLOPT_URL, $url);
$result = curl_exec($ch);
var_dump($result);
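For comparison, here is a rough Python sketch of the same two ideas (a custom User-Agent plus cookie handling), using only the standard library. The function name build_browser_like_opener is made up for illustration, and the in-memory CookieJar is only a loose analogue of CURLOPT_COOKIEJAR, which writes cookies to a file.

```python
import http.cookiejar
import urllib.request

# User-Agent string copied from the PHP example above.
USER_AGENT = (
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; "
    ".NET CLR 1.0.3705; .NET CLR 1.1.4322)"
)


def build_browser_like_opener(user_agent=USER_AGENT):
    """Return (opener, jar): an opener that sends user_agent and stores cookies in jar."""
    jar = http.cookiejar.CookieJar()  # in-memory cookie store
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


# Hypothetical usage (performs a real network request):
# opener, jar = build_browser_like_opener()
# body = opener.open("https://example.com/").read()
```

Cookies set by one response are sent back on later requests made through the same opener. If the cookies need to survive between runs, http.cookiejar.MozillaCookieJar can load and save them from a file instead.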

How to use curl to get a GET request exactly same as using Chrome?

If you need to set the User-Agent header in the curl request, you can use the -H option, like this:

curl -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36" http://stackoverflow.com/questions/28760694/how-to-use-curl-to-get-a-get-request-exactly-same-as-using-chrome

Updated the user-agent string from the newest Chrome on 02-22-2021.


Using a proxy tool like Charles Proxy really helps make short work of something like what you are asking. Here is what I do, using this SO page as an example (as of July 2015 using Charles version 3.10):

  1. Get Charles Proxy running
  2. Make web request using browser
  3. Find desired request in Charles Proxy
  4. Right click on request in Charles Proxy
  5. Select 'Copy cURL Request'

[Screenshot: Copy cURL Request example in Charles 3.10.2]

You now have a cURL request you can run in a terminal that will mirror the request your browser made. Here is what my request to this page looked like (with the cookie header removed):

curl -H "Host: stackoverflow.com" -H "Cache-Control: max-age=0" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36" -H "HTTPS: 1" -H "DNT: 1" -H "Referer: https://www.google.com/" -H "Accept-Language: en-US,en;q=0.8,en-GB;q=0.6,es;q=0.4" -H "If-Modified-Since: Thu, 23 Jul 2015 20:31:28 GMT" --compressed http://stackoverflow.com/questions/28760694/how-to-use-curl-to-get-a-get-request-exactly-same-as-using-chrome
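If you want to replay such a copied command from a script rather than a terminal, one option is to pull the -H values out into a dictionary. The following Python sketch assumes the simple form shown above, where each header is passed as a separate -H "Name: value" pair; the helper name headers_from_curl is hypothetical, and it does not handle every curl option or the escaping used by Windows cmd.

```python
import shlex


def headers_from_curl(command):
    """Extract the -H/--header values of a curl command line into a dict."""
    tokens = shlex.split(command)  # honours the shell-style quoting
    headers = {}
    # Walk over (flag, value) pairs of neighbouring tokens.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            name, _, header_value = value.partition(":")
            headers[name.strip()] = header_value.strip()
    return headers


example = (
    'curl -H "Host: stackoverflow.com" -H "DNT: 1" '
    '-H "Referer: https://www.google.com/" --compressed http://stackoverflow.com/'
)
print(headers_from_curl(example))
```

The resulting dictionary can be passed to whatever HTTP client you use, for example as the headers argument of requests.get.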

How can I emulate a web browser http request from code?

You should use Fiddler to capture the request that you want to simulate.
Look at the Raw tab under Inspectors.
This is an example of a request to the Fiddler site from Chrome:

GET http://fiddler2.com/ HTTP/1.1
Host: fiddler2.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36
Referer: https://www.google.be/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,nl;q=0.6

You can then set each one of these headers in your webrequest (see http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx).

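// Declare the variable as HttpWebRequest; the WebRequest base class has no UserAgent property.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.test.com");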
request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36";
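The same "copy each header from the capture" step can be sketched in Python with the standard library. The helper request_from_raw below is hypothetical; it assumes the raw text has the shape shown in the Fiddler capture above (a request line with an absolute URL, then one header per line). It skips Connection and Accept-Encoding on purpose: urllib manages the connection itself and does not decompress gzip responses.

```python
import urllib.request

# Raw request copied from the Fiddler capture above.
RAW_REQUEST = """\
GET http://fiddler2.com/ HTTP/1.1
Host: fiddler2.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36
Referer: https://www.google.be/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,nl;q=0.6
"""

# Headers left to urllib (see the note above).
SKIPPED = {"connection", "accept-encoding"}


def request_from_raw(raw):
    """Build a urllib Request from a raw request line plus header lines."""
    lines = raw.strip().splitlines()
    method, url, _version = lines[0].split()
    request = urllib.request.Request(url, method=method)
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() not in SKIPPED:
            request.add_header(name.strip(), value.strip())
    return request


request = request_from_raw(RAW_REQUEST)
print(request.get_method(), request.full_url)
print(request.header_items())
# To actually send it: urllib.request.urlopen(request).read()
```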

How to use Python requests to fake a browser visit a.k.a and generate User Agent?

Provide a User-Agent header:

import requests

url = 'http://www.ichangtou.com/#company:data_000008.html'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.content)

FYI, here is a list of User-Agent strings for different browsers:

  • List of all Browsers

As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents:

fake-useragent

Up to date simple useragent faker with real world database

Demo:

>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'
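If you would rather not add a dependency, the basic idea of rotating user agents can be sketched with a small hand-maintained pool. This is not how fake-useragent works internally; it is only an illustration, and the pool below simply reuses strings that appear on this page (the name random_headers is made up).

```python
import random

# User-Agent strings taken from the examples on this page.
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36",
]


def random_headers(rng=random):
    """Return a headers dict whose User-Agent is picked at random from the pool."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


print(random_headers())
```

The returned dictionary can be passed as the headers argument of requests.get, exactly as in the first example of this answer.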

How to simulate a full browser's request to an HTML document?

Browsers do use multiple connections in order to speed up downloading (resources are fetched in parallel). However, they limit the number of connections to the same host, which is one of the reasons content-delivery networks exist.

The order of CSS and script files in the head does matter, as scripts block parallel downloading (unless the script is deferred).

Browsers also parse HTML while they receive it (again, to speed things up). This is why a script in the head that tries to manipulate DOM elements that have not loaded yet will produce an error.

But all of these are browser implementation details that may not be important for your task.
Your best option is to look at the source code of a headless browser to see what is going on.
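As a rough illustration of the "parallel downloads, but capped per host" behaviour described above, here is a Python sketch using one thread pool per host. The cap of 6 is an assumed value (browsers choose their own limits), the function names are made up, and the sketch does not model script blocking or incremental HTML parsing.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
import urllib.request

MAX_PER_HOST = 6  # assumed cap; real browsers choose their own limits


def fetch(url):
    """Download one resource and return its bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, fetch=fetch, max_per_host=MAX_PER_HOST):
    """Fetch all URLs in parallel, with at most max_per_host at once per host."""
    pools = {}    # host -> thread pool acting as that host's connection limit
    futures = {}  # url -> pending result
    for url in urls:
        host = urlsplit(url).netloc
        if host not in pools:
            pools[host] = ThreadPoolExecutor(max_workers=max_per_host)
        futures[url] = pools[host].submit(fetch, url)
    try:
        return {url: future.result() for url, future in futures.items()}
    finally:
        for pool in pools.values():
            pool.shutdown()
```

Passing a different fetch function (for example, one that only records the URL) makes it easy to try the scheduling logic without touching the network.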

How to simulate exact browser request using PHP script?

Use Postman. It can generate the PHP code from your request automatically.

Step 1:
Copy the request as cURL (cmd) using your browser (I use Chrome).

Step 2:
Click Import in Postman. Select Raw text, paste the copied request, and import it.

Step 3:
Select "Code".

Step 4:
Select your preferred language.

It will generate your code automatically.

How can I emulate a popular web browser when I visit this website?/Why am I getting a 403 error?

You can use a URLConnection and set the User-Agent:

URL server = new URL("http://www.ace-spades.com/play");
URLConnection connection = server.openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2");

Basically, you could subclass JEditorPane and override getStream(URL page) to add the User-Agent string.

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

import javax.swing.JEditorPane;

public class UserAgentEditorPane extends JEditorPane {

    private static final long serialVersionUID = 1L;

    private String userAgent;

    public UserAgentEditorPane(URL url, String userAgent) throws IOException {
        super(url);
        this.userAgent = userAgent;
    }

    @Override
    protected InputStream getStream(URL page) throws IOException {
        URLConnection conn = page.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        setContentType(conn.getContentType());
        return conn.getInputStream();
    }

}

PHP: How to get website with cURL and act like a real browser?

There is no difference between a cURL request and the request that a browser makes, apart from the HTTP headers it sends and the fact that a browser runs JavaScript on the client.

The only thing that identifies an HTTP client is its headers -- typically the user agent string -- and seeing as you have set the user agent to exactly the same as the browser, there must be other checks in place.

By default, cURL sends only a generic Accept: */* header, whereas browsers send a detailed Accept header (along with others such as Accept-Language and Accept-Encoding) to describe their capabilities. I expect the web server is checking something like this.

[Screenshot: Copy HTTP request as cURL]

Take a look at the screenshot above of Chrome Developer Tools. It allows you to copy the whole request as a cURL request, including all the headers that were sent from Chrome, for testing in the terminal.

Try to match all the headers exactly from within your PHP, and I'm sure the web server will not be able to identify you as a script.
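A quick way to check how close you are is to compare the headers the browser sent with the ones your script sends. The sketch below is in Python rather than PHP, and the helper name differing_headers and the sample values are made up for illustration; header names are compared case-insensitively.

```python
def differing_headers(browser_headers, script_headers):
    """Return the browser header names the script omits or sends with another value."""
    sent = {name.lower(): value for name, value in script_headers.items()}
    return sorted(
        name
        for name, value in browser_headers.items()
        if sent.get(name.lower()) != value
    )


# Illustrative values only.
browser = {
    "User-Agent": "Mozilla/5.0 (example)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.8",
}
script = {
    "user-agent": "Mozilla/5.0 (example)",
    "Accept": "*/*",
}
print(differing_headers(browser, script))
```

Any name in the result is a header the server could use to tell the script apart from the browser.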


