Session cookies not passing properly while scraping
Try using Selenium to fetch the cookies from the site first, as @furas suggested in the comments. You can then pass those cookies, together with your headers, when issuing GET requests to get the required response. The following approach worked for me:
import time
import requests
from selenium import webdriver

def get_cookies():
    # Load the page in a real browser so the site can set its cookies
    with webdriver.Chrome() as driver:
        driver.get('https://www.walmart.com/store/5939-bellevue-wa/search?query=tillamook')
        time.sleep(10)  # give the page time to finish loading
        driver_cookies = driver.get_cookies()
        # Convert Selenium's list of dicts into a {name: value} mapping
        cookie = {c['name']: c['value'] for c in driver_cookies}
    return cookie

headers = {
    'accept': '*/*',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
    'referer': 'https://www.walmart.com/store/5939-bellevue-wa/search?query=tillamook',
}

params = {
    'query': 'tillamook',
    'stores': '5939',
    'cat_id': '0',
    'ps': '24',
    'offset': '24',
    'prg': 'desktop',
    'zipcode': '98006',
    'stateOrProvinceCode': 'WA',
}

with requests.Session() as session:
    r = session.get(
        'https://www.walmart.com/store/electrode/api/search',
        headers=headers,
        params=params,
        cookies=get_cookies()
    )
    print(r.status_code)  # Returns 200
    print(r.json())
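If you save the browser cookies to avoid launching Chrome on every run, it may help to drop expired ones before reusing them. This is only a sketch: it assumes the list-of-dicts shape that driver.get_cookies() returns, where an optional expiry key holds seconds since the epoch. The helper name live_cookies and the sample data are made up for illustration.

```python
import time

def live_cookies(driver_cookies, now=None):
    """Return {name: value} for cookies that have not expired yet."""
    now = time.time() if now is None else now
    return {
        c["name"]: c["value"]
        for c in driver_cookies
        # Session cookies carry no "expiry" key, so they are always kept.
        if c.get("expiry") is None or c["expiry"] > now
    }

# Hypothetical sample in the same shape as driver.get_cookies()
sample = [
    {"name": "auth", "value": "abc", "domain": ".example.com", "expiry": 2_000_000_000},
    {"name": "old", "value": "zzz", "domain": ".example.com", "expiry": 1_000},
    {"name": "sess", "value": "xyz", "domain": "www.example.com"},
]

print(live_cookies(sample, now=1_700_000_000))
```

The resulting dictionary can be passed as cookies= in the same way as the return value of get_cookies().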
Python requests - Session not capturing response cookies
You can try this:
import requests

with requests.Session() as s:
    # The GET request lets the server set its cookies on the session
    s.get('https://www.website.co.uk/login')
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
The initial GET request sets the required cookies on the session, so they are sent automatically with the POST that follows.
Python requests Session does not have cookies enabled
The problem seems to be JavaScript support. Looking at the page source on first entry, we see this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>SafeMLS® Error</title>
<link id="logincss" rel="stylesheet" href="https://cdn.clareitysecurity.net/css/login.css" />
<script src="https://cdn.clareitysecurity.net/js/remember.min.js" type="text/javascript"></script>
</head>
<body>
<script type="text/javascript" src="https://cdn.clareitysecurity.net/sys/maxebrd/googletrack.js"></script>
<!--
LocalAddr: 172.16.17.42
LocalName: clt-web-pt01-a.safemls.net
ServerName: idp.maxebrd.safemls.net
-->
<script type="text/javascript">
if (isCookieEnabled() == false) {
alert("Error. Your browser does not have cookies enabled. This login page will not function without cookie support.");
document.location.href = "/idp/nocookies.jsp";
} else {
document.location.href = "https://maxebrdi.paragonrels.com/";
}
</script>
</body>
</html>
Since requests does not execute JavaScript, we have to manually do whatever the page needs in order to load correctly.
We see that the script redirects us to "https://maxebrdi.paragonrels.com/", which probably sets the correct cookies for us to use on the login page. Fortunately, requests.Session() stores those cookies and follows HTTP redirects for us by default.
import requests

headers = {
    "Accept": "*/*",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"
}

with requests.Session() as session:
    username = "username"
    password = "password"
    data = {
        "j_username": username,
        "password": password,
        "j_password": password,
        "j_logintype": "sso"  # seems to be constant
    }
    # first set cookies
    session.get("https://maxebrdi.paragonrels.com/", headers=headers)
    # then do login
    result = session.post("https://idp.maxebrd.safemls.net/idp/Authn/UserPassword", headers=headers, data=data)
    print(result.text)
This returns the "No User Found" message (since the password is incorrect).
I suggest you use a JavaScript-disabling extension, clear the page's cookies, and revisit the site so you can see the webpage just as requests does. Also keep an eye on the "Network" tab to see what requests your browser makes, and replicate them in your script.
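To find out where a page like the one above would send a browser, you can pull the redirect targets out of the inline script yourself. The sketch below is a rough heuristic, not a JavaScript parser: it only catches plain string assignments to document.location or window.location, and the name js_redirect_targets is made up for illustration.

```python
import re

# Matches assignments such as: document.location.href = "https://example.com/";
# Comparisons (==) are not matched because a quote must follow the single "=".
REDIRECT_RE = re.compile(
    r"""(?:document|window)\.location(?:\.href)?\s*=\s*["']([^"']+)["']"""
)

def js_redirect_targets(html):
    """Return every URL that the page's inline JavaScript assigns to location."""
    return REDIRECT_RE.findall(html)

# Excerpt of the script block shown in the page source above
sample_html = '''
<script type="text/javascript">
if (isCookieEnabled() == false) {
document.location.href = "/idp/nocookies.jsp";
} else {
document.location.href = "https://maxebrdi.paragonrels.com/";
}
</script>
'''

print(js_redirect_targets(sample_html))
```

Which of the targets to request still has to be decided by reading the script; here the else branch is the one a cookie-enabled browser follows.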
Meet cookie error when crawl website that use php session
I am not entirely sure what you are trying to obtain from that website, but I will try to help.
The first page of results can be obtained through this URL:
https://db.aa419.org/fakebankslist.php?psearch=essa&Submit=GO&start=1
The value 1 for the start key is the index of the first result that appears on the page. The start value advances by 20 from one page to the next, so to view the second page you need to change '1' to '21':
https://db.aa419.org/fakebankslist.php?psearch=essa&Submit=GO&start=21
The second thing is that your requests should be made using the GET method.
You wrote: "I checked the response of page 3 and found it's some random page irrelevant to my query words 'sites'".
I believe this is caused by the website's broken search engine.
I hope this code helps:
import requests

# crawl pages 1-5
s = requests.Session()
for i in range(0, 5):
    url = 'https://db.aa419.org/fakebankslist.php?psearch=essa&Submit=GO&start=' + str(1 + i*20)
    response = s.get(url)
    cookies = s.cookies  # the session updates its cookies after each page
    print('For page ', i+1, 'with results from', 1+i*20, 'to', i*20+20, ', cookies are:', str(cookies))
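Building the query string by hand makes it easy to lose a separator between parameters. As a sketch of an alternative, the standard library's urlencode can assemble it; the helper name page_url and its step default of 20 are assumptions taken from the start values used above.

```python
from urllib.parse import urlencode

BASE_URL = "https://db.aa419.org/fakebankslist.php"

def page_url(page, query="essa", step=20):
    """Build the search URL for a 1-based page number."""
    params = {
        "psearch": query,
        "Submit": "GO",
        # Page 1 starts at result 1, page 2 at result 21, and so on.
        "start": 1 + (page - 1) * step,
    }
    return BASE_URL + "?" + urlencode(params)

for page in range(1, 4):
    print(page_url(page))
```

With requests you can also skip the URL building entirely and pass the same dictionary as params= to s.get().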
Cookie authentication error using Python requests
Try including the cookies separately from the headers, like this:
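import json
import requests

es_headers = {
    'kbn-version': "5.5.0",
    'Content-Type': "application/json",
}
session = requests.Session()
# The key is the cookie's name and the value is its value;
# do not pass the whole "Cookie: ..." header line as a single entry.
session.cookies.update({'session_2': "eyJhbGciOi....(long string)"})
# url and body are the ones from your question
r = session.post(url, timeout=15, data=json.dumps(body), headers=es_headers)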
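To check what the session will actually send, you can prepare a request without sending it and look at its Cookie header. This is a sketch with placeholder values: the helper name cookie_header_for, the example.com URL, and the token are made up for illustration.

```python
import requests

def cookie_header_for(cookies, url="https://example.com/api"):
    """Return the Cookie header requests would send to `url`, without sending anything."""
    session = requests.Session()
    session.cookies.update(cookies)
    # prepare_request merges the session's cookies into the outgoing headers
    prepared = session.prepare_request(requests.Request("POST", url, json={}))
    return prepared.headers.get("Cookie")

# Cookie name as the key, cookie value as the value
print(cookie_header_for({"session_2": "token-value"}))
```

Running the same helper on a dictionary keyed by 'Cookie' shows that requests then treats "Cookie" as the cookie's name, which is not what the server expects.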
Hope this helps.
Python Requests - Cookies error
Cookie values must be str objects, but binascii.hexlify() returns a bytes object:
>>> import binascii
>>> x = 1
>>> binascii.hexlify(str(x).encode('ascii')+b'-admin')
b'312d61646d696e'
Decode that first:
cookies = {
    'PHPSESSID': binascii.hexlify(b'%d-admin' % x).decode('ascii')
}
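The same fix can be wrapped in a small helper so the cookie value is always a str. This is a sketch; the function names are made up for illustration, and the second variant uses bytes.hex(), which returns a str directly and so needs no decode step.

```python
import binascii

def session_id_for(user_id):
    """Hex-encode '<user_id>-admin' and return it as str, as cookie values require."""
    raw = b"%d-admin" % user_id  # bytes formatting works on Python 3.5+
    return binascii.hexlify(raw).decode("ascii")

def session_id_via_hex(user_id):
    """Same result using bytes.hex(), which already returns a str."""
    return (b"%d-admin" % user_id).hex()

cookies = {"PHPSESSID": session_id_for(1)}
print(cookies)
```

Either variant can be used to build the cookies dictionary passed to requests.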