Add params to given URL in Python
There are a couple of quirks with the urllib and urlparse modules. Here's a working example:
try:
    # Python 2
    import urlparse
    from urllib import urlencode
except ImportError:
    # Python 3
    import urllib.parse as urlparse
    from urllib.parse import urlencode
url = "http://stackoverflow.com/search?q=question"
params = {'lang':'en','tag':'python'}
url_parts = list(urlparse.urlparse(url))
query = dict(urlparse.parse_qsl(url_parts[4]))
query.update(params)
url_parts[4] = urlencode(query)
print(urlparse.urlunparse(url_parts))
ParseResult, the result of urlparse(), is read-only and we need to convert it to a list before we can attempt to modify its data.
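As an alternative to the list conversion, here is a minimal Python 3 sketch that relies on ParseResult being a named tuple, so its _replace() method can return a modified copy. The helper name add_params is hypothetical and not part of the answer above.

```python
from urllib.parse import urlparse, parse_qsl, urlencode

def add_params(url, params):
    """Hypothetical helper: merge params into the query string of url."""
    parts = urlparse(url)
    # dict() keeps only the last value if a name is repeated in the query
    query = dict(parse_qsl(parts.query))
    query.update(params)
    # ParseResult is a named tuple, so _replace returns a modified copy
    return parts._replace(query=urlencode(query)).geturl()

print(add_params("http://stackoverflow.com/search?q=question",
                 {"lang": "en", "tag": "python"}))
```

Note that going through dict() drops repeated parameter names, exactly as the list-based version does.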
How to add custom parameters to an URL query string with Python?
You can use urlsplit() and urlunsplit() to break apart and rebuild a URL, then use urlencode() on the parsed query string:
try:
    # Python 2
    from urllib import urlencode
    from urlparse import parse_qs, urlsplit, urlunsplit
except ImportError:
    # Python 3
    from urllib.parse import urlencode, parse_qs, urlsplit, urlunsplit

def set_query_parameter(url, param_name, param_value):
    """Given a URL, set or replace a query parameter and return the
    modified URL.

    >>> set_query_parameter('http://example.com?foo=bar&biz=baz', 'foo', 'stuff')
    'http://example.com?foo=stuff&biz=baz'
    """
    scheme, netloc, path, query_string, fragment = urlsplit(url)
    # parse_qs maps each parameter name to a list of values
    query_params = parse_qs(query_string)
    query_params[param_name] = [param_value]
    # doseq=True expands each list back into repeated name=value pairs
    new_query_string = urlencode(query_params, doseq=True)
    return urlunsplit((scheme, netloc, path, new_query_string, fragment))
Use it as follows:
>>> set_query_parameter("/scr.cgi?q=1&ln=0", "SOMESTRING", 1)
'/scr.cgi?q=1&ln=0&SOMESTRING=1'
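The doseq=True argument matters because parse_qs() stores every value in a list. A small Python 3 sketch of the difference, using a made-up query string:

```python
from urllib.parse import parse_qs, urlencode

params = parse_qs("tag=python&tag=url&lang=en")
print(params)  # every value is a list, even for names that occur once

# With doseq=True each list element becomes its own name=value pair
print(urlencode(params, doseq=True))

# Without doseq the whole list is stringified and encoded as a single value
print(urlencode(params))
```

This is why set_query_parameter wraps the new value in a list before encoding.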
Python requests call with URL using parameters
You will need to URL-encode the URL you are sending to the API.
The reason is that the server interprets the ampersands as separators between parameters of the URL https://extraction.import.io/query/extractor/XXX?
This is why everything after the first ampersand gets stripped from the url parameter, leaving:
http://www.example.co.uk/items.php?sortby=Price_LH
Try the following, using urllib.quote(row_dict['url']):
import requests
import json
import urllib  # Python 2; on Python 3 use urllib.parse.quote instead of urllib.quote

# auth_key is assumed to be defined elsewhere and to hold your API key
row_dict = {
    'url': u'http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35',
    'crawler_id': u'zzz'}
url_call = 'https://extraction.import.io/query/extractor/{0}?_apikey={1}&url={2}'.format(
    row_dict['crawler_id'], auth_key, urllib.quote(row_dict['url']))
r = requests.get(url_call)
rr = json.loads(r.content)
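For Python 3, the same idea can be sketched with the standard library only. The helper name build_extractor_url and the key PLACEHOLDER_KEY are assumptions made for illustration; the sketch builds the call URL and does not send a request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_extractor_url(crawler_id, api_key, target_url):
    """Hypothetical helper: nest target_url safely inside the API call URL."""
    base = "https://extraction.import.io/query/extractor/{0}".format(crawler_id)
    # urlencode percent-encodes &, =, ? and / inside each value
    return base + "?" + urlencode({"_apikey": api_key, "url": target_url})

target = "http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35"
call = build_extractor_url("zzz", "PLACEHOLDER_KEY", target)
print(call)

# Decoding the query again recovers the nested URL in full
print(parse_qs(urlsplit(call).query)["url"][0] == target)
```

Because the inner ampersands are encoded as %26, the only literal & left in the call URL is the one separating _apikey from url.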
Adding Parameter to URL to Iterate in Python
You can use a for loop with the range function to create a list of ids:
#!/usr/bin/env python3
import requests

base_url = "https://example.com/api/"
headers = {"Accept": "application/json", "Authorization": "Bearer 123456"}
ids = [1, 2, 3, 4]  # or range(1, 5)
for id in ids:
    # append each id to the base URL and send one request per id
    response = requests.request("POST", base_url + str(id), headers=headers)
    print(response.text)
(working example: https://replit.com/@LukeStorry/67988932)
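If you want to inspect the URLs before sending anything, a rough sketch is to build them first. The helper urls_for_ids is hypothetical, and it assumes the base URL ends with a slash, as in the example above.

```python
from urllib.parse import urljoin

def urls_for_ids(base_url, ids):
    """Hypothetical helper: one URL per id, appended to base_url."""
    return [urljoin(base_url, str(item_id)) for item_id in ids]

for url in urls_for_ids("https://example.com/api/", range(1, 5)):
    print(url)
```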
Python: How to only URL Encode a specific URL Parameter?
Split the string on the &q= part and encode only the last piece:
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
# everything after &q= is the value to encode, including its leading slash
encoded = parse.quote_plus(url.split("&q=")[1])
encoded_url = f"{url.split('&q=')[0]}&q={encoded}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22
Note that this differs from the requested output, which has a URL-encoded space (%20) at the end.
EDIT
A comment shows a different need for the encoding, so the code needs to change a bit. The code below only encodes the part after &q=. Basically, first split the URL from the parameters, then iterate through the parameters to find the q= parameter, and encode that part. Do some f-string and join magic and you get a URL that has the q parameter encoded. Note that this might have issues if an & is present in the part that needs to be encoded.
from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?", 1)
newparameters = []
for parameter in parameters.split("&"):
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22&utm_source=test1&cpc=123&gclid=abc123
EDIT 2
Trying to solve the edge case where there's an & character in the string to be encoded, as this messes up the string.split("&").
I tried using urllib.parse.parse_qs(), but it has the same issue with the & character. Docs for reference.
This question is a nice example of how edge cases can mess up simple logic and make it overly complicated.
RFC 3986 also doesn't specify any limitations on the names of query parameters; otherwise those could have been used to narrow down possible errors even more.
updated code
from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?", 1)
# addition to handle & in the query string.
# it reduces errors, but it can still mess up if there's a = in the part to be encoded.
split_parameters = []
for parameter in parameters.split("&"):
    if "=" not in parameter and split_parameters:
        # no = means this piece belongs to the previous value, so glue it back on
        split_parameters[-1] += f"&{parameter}"
    else:
        split_parameters.append(parameter)
newparameters = []
for parameter in split_parameters:
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123
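When you control how the URL is assembled, the ambiguity can be avoided by encoding each value before it is joined into the query string. A minimal sketch, assuming the parameter values are available separately (the pairs list below is made up from the example URL):

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

base = "https://www.exmple.com/test"
# Raw, unencoded values; the q value contains & and " characters
pairs = [
    ("test1", "abc"),
    ("q", '/"TEST"/&"TE&eeST"'),
    ("utm_source", "test1"),
]

# urlencode escapes every value, so only real separators remain as literal &
url = f"{base}?{urlencode(pairs)}"
print(url)

# Parsing the result gives back the original pairs unchanged
print(parse_qsl(urlsplit(url).query) == pairs)
```

This only helps when the raw values are known up front; it does not repair a URL that was already joined without encoding.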
Retrieving parameters from a URL
This is not specific to Django, but applies to Python in general. For a Django-specific answer, see this one from @jball037.
Python 2:
import urlparse
url = 'https://www.example.com/some_path?some_key=some_value'
parsed = urlparse.urlparse(url)
captured_value = urlparse.parse_qs(parsed.query)['some_key'][0]
print captured_value
Python 3:
from urllib.parse import urlparse
from urllib.parse import parse_qs
url = 'https://www.example.com/some_path?some_key=some_value'
parsed_url = urlparse(url)
captured_value = parse_qs(parsed_url.query)['some_key'][0]
print(captured_value)
parse_qs returns a dictionary that maps each parameter name to a list of values. The [0] gets the first item of that list, so the output of each script is some_value.
Here's the 'parse_qs' documentation for Python 3
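Indexing with ['some_key'] raises a KeyError when the parameter is absent. A small Python 3 sketch that returns a default instead; the helper name get_param is hypothetical:

```python
from urllib.parse import urlparse, parse_qs

def get_param(url, name, default=None):
    """Hypothetical helper: first value of a query parameter, or default."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else default

url = "https://www.example.com/some_path?some_key=some_value"
print(get_param(url, "some_key"))
print(get_param(url, "missing", default="n/a"))
```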
Python : how to set parameters for python request
The parameters that you pass into requests are specific to the URL you are making the request to. Each parameter exists for a reason, and its meaning can usually be found in the API documentation.
In this case (as provided by @chillie), they represent:
cat_id - Category on Walmart Search (e.g. 0 (default) is all departments, 976759_976787 is 'Cookies', etc.). Either a query or a cat_id parameter is required.
ps - Determines the number of items per page. There are scenarios where Walmart overrides the ps value. By default Walmart returns 40 results.
offset - The offset value is often incremented by x on each API call (e.g. offset = x+1000, offset = x+2000, offset = x+3000, etc.) until all pages are retrieved.
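To illustrate how these three parameters fit together, here is a rough sketch that generates one query string per page. The helper page_queries is hypothetical, and advancing offset by the page size on every call is an assumption, not something the API description above confirms.

```python
from urllib.parse import urlencode

def page_queries(cat_id, ps, pages):
    """Hypothetical helper: yield one query string per page of results."""
    for page in range(pages):
        # Assumption: offset advances by the page size on each call
        yield urlencode({"cat_id": cat_id, "ps": ps, "offset": page * ps})

for query in page_queries("976759_976787", 40, 3):
    print(query)
```

With requests, the same dictionary could be passed through the params argument instead of being encoded by hand.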