Add Params to Given Url in Python

Add params to given URL in Python

There are a couple of quirks with the urllib and urlparse modules. Here's a working example:

try:
    import urlparse
    from urllib import urlencode
except ImportError:  # For Python 3
    import urllib.parse as urlparse
    from urllib.parse import urlencode

url = "http://stackoverflow.com/search?q=question"
params = {'lang':'en','tag':'python'}

url_parts = list(urlparse.urlparse(url))
query = dict(urlparse.parse_qsl(url_parts[4]))
query.update(params)

url_parts[4] = urlencode(query)

print(urlparse.urlunparse(url_parts))

ParseResult, the result of urlparse(), is a named tuple and therefore immutable, so we need to convert it to a list before we can modify its data.
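As an alternative to the list conversion, the named tuple returned by urlparse() also has a _replace() method that returns a modified copy. A minimal sketch for Python 3, where the helper name add_params is only illustrative and not part of the original answer:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_params(url, params):
    """Return url with params merged into its query string (illustrative helper)."""
    parts = urlparse(url)
    # parse_qsl keeps the existing pairs; dict() collapses repeated keys.
    query = dict(parse_qsl(parts.query))
    query.update(params)
    # _replace() returns a new ParseResult; the original stays untouched.
    return urlunparse(parts._replace(query=urlencode(query)))


result = add_params("http://stackoverflow.com/search?q=question",
                    {"lang": "en", "tag": "python"})
print(result)
# prints http://stackoverflow.com/search?q=question&lang=en&tag=python
```

Like the list-based version, this collapses repeated keys, so it only suits URLs where each parameter appears once.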

How to add custom parameters to an URL query string with Python?

You can use urlsplit() and urlunsplit() to break apart and rebuild a URL, then use urlencode() on the parsed query string:

try:  # Python 3
    from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit
except ImportError:  # Python 2
    from urllib import urlencode
    from urlparse import parse_qs, urlsplit, urlunsplit

def set_query_parameter(url, param_name, param_value):
    """Given a URL, set or replace a query parameter and return the
    modified URL.

    >>> set_query_parameter('http://example.com?foo=bar&biz=baz', 'foo', 'stuff')
    'http://example.com?foo=stuff&biz=baz'

    """
    scheme, netloc, path, query_string, fragment = urlsplit(url)
    query_params = parse_qs(query_string)

    # parse_qs maps each name to a list of values, so store a one-item list
    query_params[param_name] = [param_value]
    # doseq=True writes each list item back as its own name=value pair
    new_query_string = urlencode(query_params, doseq=True)

    return urlunsplit((scheme, netloc, path, new_query_string, fragment))

Use it as follows:

>>> set_query_parameter("/scr.cgi?q=1&ln=0", "SOMESTRING", 1)
'/scr.cgi?q=1&ln=0&SOMESTRING=1'
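The doseq=True argument matters because parse_qs() maps every name to a list of values. A small sketch of the difference in Python 3, using a made-up query string:

```python
from urllib.parse import parse_qs, urlencode

query_params = parse_qs("tag=python&tag=url&q=1")
print(query_params)  # {'tag': ['python', 'url'], 'q': ['1']}

# With doseq=True every list item becomes its own name=value pair.
rebuilt = urlencode(query_params, doseq=True)
print(rebuilt)  # tag=python&tag=url&q=1

# Without doseq the whole list is stringified and percent-encoded as one value.
mangled = urlencode(query_params)
print(mangled)
```

The last line prints a query string in which the list brackets and quotes have been encoded into the value, which is almost never what you want.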

Python requests call with URL using parameters

You will need to URL-encode the URL you are sending to the API.

The reason is that the server interprets the ampersands as separators between parameters of the API URL https://extraction.import.io/query/extractor/XXX? rather than as part of the value of its url parameter.

This is why everything after the first ampersand gets stripped from the URL, leaving only:

http://www.example.co.uk/items.php?sortby=Price_LH

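Try the following using urllib.quote(row_dict['url']) (in Python 3 the function is urllib.parse.quote):

import json
import requests
from urllib.parse import quote  # Python 2: from urllib import quote

auth_key = 'YOUR_API_KEY'  # placeholder; not defined in the original snippet

row_dict = {
    'url': u'http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35',
    'crawler_id': u'zzz'}
# quote() percent-encodes ?, & and = so the whole target URL travels as one value
url_call = 'https://extraction.import.io/query/extractor/{0}?_apikey={1}&url={2}'.format(
    row_dict['crawler_id'], auth_key, quote(row_dict['url']))
r = requests.get(url_call)
rr = json.loads(r.content)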
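The same encoding can be done for the whole query string at once with urlencode(), which quotes every value. The sketch below only builds the URL and makes no request; the API key is a placeholder and the variable names are illustrative:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder values; the extractor id and API key are not real.
crawler_id = "zzz"
auth_key = "YOUR_API_KEY"
target_url = ("http://www.example.co.uk/items.php"
              "?sortby=Price_LH&per_page=96&size=1%2C12&page=35")

# urlencode() percent-encodes each value, including the & and = inside target_url.
query = urlencode({"_apikey": auth_key, "url": target_url})
url_call = "https://extraction.import.io/query/extractor/{0}?{1}".format(crawler_id, query)
print(url_call)

# Round trip: decoding the query string gives back the full target URL.
recovered = parse_qs(urlsplit(url_call).query)["url"][0]
print(recovered == target_url)  # True
```

The requests library can also do this encoding itself when the values are passed as a dictionary through its params argument instead of being formatted into the URL by hand.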

Adding Parameter to URL to Iterate in Python

You can use a for loop, with the range function to create a list of ids:

#!/usr/bin/env python3
import requests

base_url = "https://example.com/api/"
headers = {"Accept": "application/json", "Authorization": "Bearer 123456"}
ids = [1, 2, 3, 4]  # or range(1, 5)
for item_id in ids:  # named item_id to avoid shadowing the built-in id()
    response = requests.request("POST", base_url + str(item_id), headers=headers)
    print(response.text)

(working example: https://replit.com/@LukeStorry/67988932)
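If the ids may contain characters that are not URL-safe, or if the API expects the id as a query parameter rather than as part of the path, the URLs can be built first and requested afterwards. A rough sketch without any network calls; build_urls and the parameter name "id" are assumptions for illustration only:

```python
from urllib.parse import quote, urlencode

base_url = "https://example.com/api/"


def build_urls(ids, as_query=False):
    """Yield one URL per id (hypothetical helper)."""
    for item_id in ids:
        if as_query:
            # Hypothetical parameter name "id"; the real name depends on the API.
            yield base_url + "?" + urlencode({"id": item_id})
        else:
            # quote() protects ids that contain characters such as spaces or slashes.
            yield base_url + quote(str(item_id), safe="")


path_urls = list(build_urls(range(1, 4)))
query_urls = list(build_urls(["a b"], as_query=True))
print(path_urls)
print(query_urls)  # ['https://example.com/api/?id=a+b']
```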

Python: How to only URL Encode a specific URL Parameter?

Split the string on the &q=/ part and encode only the last piece:

from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
encoded = parse.quote_plus(url.split("&q=/")[1])
encoded_url = f"{url.split('&q=/')[0]}&q=/{encoded}"
print(encoded_url)

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/%22TEST%22%2F%22TEST%22

Note that this differs from the requested output, which has a URL-encoded space (%20) at the end.


EDIT

A comment pointed out a different encoding requirement, so the code needs to change a bit. The code below encodes only the value of the q parameter. It first splits the URL into the base URL and the query string, then iterates over the parameters to find the one that starts with q= and encodes its value. Finally, it joins the parameters back together with an f-string and str.join(). Note that this breaks if an & is present in the value that needs to be encoded.

from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?", 1)
newparameters = []
for parameter in parameters.split("&"):
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22&utm_source=test1&cpc=123&gclid=abc123

EDIT 2

This tries to solve the edge case where the string to be encoded contains an & character, which breaks string.split("&").

I tried using urllib.parse.parse_qs(), but it has the same issue with the & character. Docs for reference.

This question is a nice example of how edge cases can break simple logic and make it overly complicated.

RFC 3986 also does not specify any limitations on the names of query parameters; otherwise those could have been used to narrow down possible errors even more.

updated code

from urllib import parse

url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?", 1)

# addition to handle & in the querystring.
# it reduces errors, but it can still mess up if there's a = in the part to be encoded.
split_parameters = []
for parameter in parameters.split("&"):
    if "=" not in parameter and split_parameters:
        # add this part to the previous entry in split_parameters
        split_parameters[-1] += f"&{parameter}"
    else:
        split_parameters.append(parameter)

newparameters = []
for parameter in split_parameters:
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)

output

https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123
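When the raw parameter values are still available before the URL is assembled, the ambiguity can be avoided altogether by encoding them up front. A minimal sketch, assuming the values exist as a dictionary (which the question does not state); unlike the code above, it encodes every value, which is a no-op for plain values such as abc:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "https://www.exmple.com/test"
# Raw, unencoded values, assumed to be available before the URL is built.
params = {
    "test1": "abc",
    "q": '/"TEST"/&"TE&eeST"',
    "utm_source": "test1",
}

encoded_url = f"{base_url}?{urlencode(params)}"
print(encoded_url)
# prints https://www.exmple.com/test?test1=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1

# Round trip: the & inside q survives because it was encoded as %26.
decoded_q = parse_qs(urlsplit(encoded_url).query)["q"]
print(decoded_q == [params["q"]])  # True
```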

Retrieving parameters from a URL

This is not specific to Django; it applies to Python in general. For a Django-specific answer, see this one from @jball037

Python 2:

import urlparse

url = 'https://www.example.com/some_path?some_key=some_value'
parsed = urlparse.urlparse(url)
captured_value = urlparse.parse_qs(parsed.query)['some_key'][0]

print captured_value

Python 3:

from urllib.parse import urlparse
from urllib.parse import parse_qs

url = 'https://www.example.com/some_path?some_key=some_value'
parsed_url = urlparse(url)
captured_value = parse_qs(parsed_url.query)['some_key'][0]

print(captured_value)

parse_qs returns a dictionary that maps each key to a list of values. The [0] gets the first item of that list, so the output of each script is some_value

Here's the 'parse_qs' documentation for Python 3
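To illustrate why the values are lists, here is a short Python 3 sketch with a made-up query string in which one key is repeated and another is missing:

```python
from urllib.parse import parse_qs, parse_qsl

query = "some_key=some_value&tag=a&tag=b"

parsed = parse_qs(query)
print(parsed)         # {'some_key': ['some_value'], 'tag': ['a', 'b']}
print(parsed["tag"])  # ['a', 'b']

# .get() with a default avoids a KeyError when the key is absent.
missing = parsed.get("missing", [None])[0]
print(missing)        # None

# parse_qsl returns the pairs in order instead of grouping them by key.
pairs = parse_qsl(query)
print(pairs)          # [('some_key', 'some_value'), ('tag', 'a'), ('tag', 'b')]
```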

Python : how to set parameters for python request

The parameters you pass to requests are specific to the URL you are making the request to. Each parameter exists for a reason, and its meaning can usually be found in the API documentation.

In this case (as provided by @chillie), they represent:

cat_id - Category on Walmart Search (e.g. 0 (default) is all departments, 976759_976787 is 'Cookies', etc.). Either a query or a cat_id parameter is required.

ps - Determines the number of items per page. There are scenarios where Walmart overrides the ps value. By default Walmart returns 40 results.

Offset - The offset value is often incremented by a fixed amount on each API call (e.g. offset = x+1000, offset = x+2000, offset = x+3000, etc.) until all pages are retrieved.
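As a rough sketch of how such parameters fit together, the snippet below builds one URL per page without sending any request. The endpoint is hypothetical, the lowercase spelling offset is a guess, and so is the assumption that offset counts items rather than pages; check the API documentation for the real behavior:

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show how the query string is assembled.
SEARCH_URL = "https://api.example.com/search"


def page_urls(cat_id, page_size=40, pages=3):
    """Yield one URL per page, advancing offset by page_size each time."""
    for page in range(pages):
        params = {"cat_id": cat_id, "ps": page_size, "offset": page * page_size}
        yield SEARCH_URL + "?" + urlencode(params)


urls = list(page_urls("976759_976787"))
for url in urls:
    print(url)
# first line printed:
# https://api.example.com/search?cat_id=976759_976787&ps=40&offset=0
```

With requests, the same dictionary could be passed through the params argument instead of being encoded by hand.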


