Google Search from a Python App

Google search from python app

Jeremy Banks is right. If you write dictionary[str(lineToRead)].append(str(i)) without first initializing a value for dictionary[str(lineToRead)], you will get a KeyError.

It looks like you have an additional bug. The value of lineToRead will always be mouse, since you have already looped through and closed your input file before searching for anything. Most likely you want to loop through every word in inputFile (i.e. cat, dog, bird, mouse).

To fix this, we can write the following (assuming you want to keep a list of query strings as values in the dictionary for each search term):

for line in inputFile.read().splitlines():  # loop through each line in the input file
    lineToRead = line
    dictionary[str(lineToRead)] = []  # initialize to an empty list
    for i in gs.top_urls():
        print i  # check to make sure this is printing out URLs
        compare2 = i
        if compare in compare2:  # compare the two URLs
            dictionary[str(lineToRead)].append(str(i))  # append the matching URL under this search term
inputFile.close()

You can delete the for loop you wrote for 'testing' the inputFile.
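
Alternatively, you can skip the explicit initialization by using collections.defaultdict, which creates the empty list the first time a key is touched. A minimal, standalone sketch of that pattern (the terms and URLs here are just placeholders, not tied to the code above):

from collections import defaultdict

# defaultdict(list) returns an empty list for any unseen key,
# so .append() never raises a KeyError.
dictionary = defaultdict(list)

matches = [("cat", "http://a.example"), ("cat", "http://b.example"), ("dog", "http://c.example")]
for term, url in matches:
    dictionary[term].append(url)

print(dict(dictionary))
# {'cat': ['http://a.example', 'http://b.example'], 'dog': ['http://c.example']}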

Automate google play search items in a list

I've written a little demo that may help you to achieve your goal. I used requests and Beautiful Soup. It's not exactly what you wanted but it can be adapted easily.

import requests
import bs4

company_name = "airbnb"
def get_company(company_name):
    r = requests.get("https://play.google.com/store/search?q=" + company_name)
    soup = bs4.BeautifulSoup(r.text, "html.parser")
    subtitles = soup.findAll("a", {'class': "subtitle"})
    dev_urls = []
    for title in subtitles:
        try:
            text = title.attrs["title"].lower()
            # Sometimes there is a subtitle without any text on Google Play;
            # catch the error and skip it.
        except KeyError:
            continue
        if company_name in text:
            url = "https://play.google.com" + title.attrs["href"]
            dev_urls.append(url)
    return dev_urls

def get_company_apps_url(dev_url):
    r = requests.get(dev_url)
    soup = bs4.BeautifulSoup(r.text, "html.parser")
    titles = soup.findAll("a", {"class": "title"})
    return ["https://play.google.com" + title.attrs["href"] for title in titles]

def get_app_category(app_url):
    r = requests.get(app_url)
    soup = bs4.BeautifulSoup(r.text, "html.parser")
    developer_name = soup.find("span", {"itemprop": "name"}).text
    app_name = soup.find("div", {"class": "id-app-title"}).text
    category = soup.find("span", {"itemprop": "genre"}).text
    return (developer_name, app_name, category)

dev_urls = get_company("airbnb")
apps_urls = get_company_apps_url(dev_urls[0])
get_app_category(apps_urls[0])

>>> get_company("airbnb")
['https://play.google.com/store/apps/developer?id=Airbnb,+Inc']
>>> get_company_apps_url("https://play.google.com/store/apps/developer?id=Airbnb,+Inc")
['https://play.google.com/store/apps/details?id=com.airbnb.android']
>>> get_app_category("https://play.google.com/store/apps/details?id=com.airbnb.android")
('Airbnb, Inc', 'Airbnb', 'Travel & Local')

My script with Google:

dev_urls = get_company("google")
apps_urls = get_company_apps_url(dev_urls[0])
for app in apps_urls:
    print(get_app_category(app))

('Google Inc.', 'Google Duo', 'Communication')
('Google Inc.', 'Google Translate', 'Tools')
('Google Inc.', 'Google Photos', 'Photography')
('Google Inc.', 'Google Earth', 'Travel & Local')
('Google Inc.', 'Google Play Games', 'Entertainment')
('Google Inc.', 'Google Calendar', 'Productivity')
('Google Inc.', 'YouTube', 'Media & Video')
('Google Inc.', 'Chrome Browser - Google', 'Communication')
('Google Inc.', 'Google Cast', 'Tools')
('Google Inc.', 'Google Sheets', 'Productivity')

How can I get Google search results with Python 3?

The maximum number of results that can be obtained from this API is 8 per query.

You get that by adding "&rsz=large" to the URL:

url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s&rsz=large' % query    

There is another useful argument, "start=", which lets you move through the result set.
So basically you can loop, asking for the 1st block of 8 results, then the second block, and so on (start=1, start=8, and so on).

url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&%s&start=%d' % (query, i)    
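
For illustration only, a paging loop over that URL pattern would look roughly like the sketch below. It assumes the JSON shape this old AJAX API used to return (responseData.results); since the service is shut down, it will not actually return results today:

import requests

query = "q=stack+overflow"  # example query string, already URL-encoded

urls = []
for start in range(0, 32, 8):  # first four blocks of 8 results
    url = ('http://ajax.googleapis.com/ajax/services/search/web'
           '?v=1.0&rsz=large&%s&start=%d' % (query, start))
    data = requests.get(url).json()
    # 'responseData'/'results' was the shape of the old API's JSON;
    # treat this purely as an illustration of the paging loop.
    block = (data.get('responseData') or {}).get('results') or []
    urls.extend(result['url'] for result in block)

print(urls)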

In any case, please note that this API is deprecated (https://developers.google.com/web-search/docs/).

How to scrape all App Store apps on a Google Play Search

I'm getting the following output from the URL:

from bs4 import BeautifulSoup
import requests

url='https://play.google.com/store/search?q=weather%20app'
req=requests.get(url)

soup = BeautifulSoup(req.content, 'html.parser')

cards= soup.find_all("div",class_="vU6FJ p63iDd")

for card in cards:
    app_name = card.find("div", class_="WsMG1c nnK0zc").text
    company = card.find("div", class_="KoLSrc").text
    print("Name: " + app_name)
    print("Company: " + company)

Output:

Name: Weather app
Company: Accurate Weather Forecast & Weather Radar Map
Name: AccuWeather: Weather Radar
Company: AccuWeather
Name: Weather Forecast - Accurate Local Weather & Widget
Company: Weather Forecast & Widget & Radar
Name: 1Weather Forecasts & Radar
Company: OneLouder Apps
Name: MyRadar Weather Radar
Company: ACME AtronOmatic LLC
Name: Weather data & microclimate : Weather Underground
Company: Weather Underground
Name: Weather & Widget - Weawow
Company: weawow weather app
Name: Weather forecast
Company: smart-pro android apps
Name: The Secret World of Weather: How to Read Signs in Every Cloud, Breeze, Hill, Street, Plant, Animal, and Dewdrop
Company: Tristan Gooley
Name: The Weather Machine: A Journey Inside the Forecast
Company: Andrew Blum
Name: The Mobile Mind Shift: Engineer Your Business to Win in the Mobile Moment
Company: Julie Ask
Name: Together: The Healing Power of Human Connection in a Sometimes Lonely World
Company: Vivek H. Murthy
Name: The Meadow
Company: James Galvin
Name: The Ancient Egyptian Culture Revealed, 2nd edition
Company: Moustafa Gadalla
Name: The Ancient Egyptian Culture Revealed, 2nd edition
Company: Moustafa Gadalla
Name: Chaos Theory
Company: Introbooks Team
Name: Survival Training: Killer Tips for Toughness and Secret Smart Survival Skills
Company: Wesley Jones
Name: Kiasunomics 2: Economic Insights for Everyday Life
Company: Ang Swee Hoon
Name: Summary of We Are The Weather by Jonathan Safran Foer
Company: QuickRead
Name: Learn Swift by Building Applications: Explore Swift programming through iOS app development
Company: Emil Atanasov
Name: Weather Hazard Warning Application in Car-to-X Communication: Concepts, Implementations, and Evaluations
Company: Attila Jaeger
Name: Mobile App Development with Ionic, Revised Edition: Cross-Platform Apps with Ionic, Angular, and Cordova
Company: Chris Griffith
Name: Good Application Makes a Good Roof Better: A Simplified Guide: Installing Laminated Asphalt Shingles for Maximum Life & Weather Protection
Company: ARMA Asphalt Roofing Manufacturers Association
Name: The Secret World of Weather: How to Read Signs in Every Cloud, Breeze, Hill, Street, Plant, Animal, and Dewdrop
Company: Tristan Gooley
Name: The Weather Machine: A Journey Inside the Forecast
Company: Andrew Blum
Name: Space Physics and Aeronomy, Space Weather Effects and Applications
Company: Book 5
Name: How to Build Android Apps with Kotlin: A hands-on guide to developing, testing, and publishing your first apps with Android
Company: Alex Forrester
Name: Android 6 for Programmers: An App-Driven Approach, Edition 3
Company: Paul J. Deitel

Google Search Web Scraping with Python

You can always directly scrape Google results. To do this, you can use the URL https://google.com/search?q=<Query>; this will return the top 10 search results.

Then you can use lxml, for example, to parse the page. Depending on what you use, you can query the resulting node tree either via a CSS selector (.r a) or via an XPath selector (//h3[@class="r"]/a).

In some cases the resulting URL will redirect to Google. Usually it contains a query parameter q which holds the actual request URL.

Example code using lxml and requests:

from urllib.parse import urlencode, urlparse, parse_qs

from lxml.html import fromstring
from requests import get

raw = get("https://www.google.com/search?q=StackOverflow").text
page = fromstring(raw)

for result in page.cssselect(".r a"):
    url = result.get("href")
    if url.startswith("/url?"):
        url = parse_qs(urlparse(url).query)['q']
        print(url[0])

A note on Google banning your IP: in my experience, Google only bans you if you start spamming it with search requests. It will respond with a 503 if it thinks you are a bot.
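
If you want to play it safe, a simple mitigation is to throttle your requests and back off when you do get a 503. A rough sketch (the delay and retry values are arbitrary, purely illustrative):

import time
import requests

def fetch_results_page(query, delay=5.0, retries=3):
    # Fetch the Google results page for `query`, backing off on a 503.
    # The delay/retry numbers are placeholders, not tuned values.
    for attempt in range(retries):
        resp = requests.get("https://www.google.com/search", params={"q": query})
        if resp.status_code == 503:
            # Google thinks we look like a bot; wait longer before retrying.
            time.sleep(delay * (attempt + 1))
            continue
        resp.raise_for_status()
        return resp.text
    raise RuntimeError("Still rate-limited after %d attempts" % retries)

html = fetch_results_page("StackOverflow")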

how to create a django app that performs a google search

Following @berkeeb's answer, I changed the code in this way.

I created a forms.py file with:

from django import forms

class SearchForm(forms.Form):
    search = forms.CharField(required=True, max_length=255, label="search")

in my template (home.html) I used:

<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">

<div class="container text-center">
  <h1>my search engine</h1>
  <a href="{% url 'about' %}">About page</a>
  <br>
  <br>
  <form action="{% url 'search' %}">
    <label for="search_text">Your search: </label>
    <input id="search_text" type="text" name="search_text">
    <input type="submit" value="Search">
  </form>
</div>

and finally in the search function I wrote:

def search(request):
    form = SearchForm(request.GET)
    search_text = form.data["search_text"]  # now you can access the input
    urls = searchWeb(num=5, stop=5, query_string=search_text)

    threads = [threading.Thread(target=getSavePage, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return render(request, "engine/search.html", {"search": urls})

Basically, I had to remove the form validation part, because I kept receiving an "unknown" status.
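
For reference, the usual validated pattern would look roughly like the sketch below; it differs only in calling is_valid() and reading cleaned_data, and it assumes the template input name matches the form field (the CharField is named "search" in forms.py, while the template input is named "search_text", which is one reason validation can fail):

def search(request):
    form = SearchForm(request.GET)
    if form.is_valid():
        # cleaned_data is only available after is_valid(); the key must match
        # the form field name ("search"), and the template's <input name=...>
        # must use the same name for the value to be bound.
        search_text = form.cleaned_data["search"]
        urls = searchWeb(num=5, stop=5, query_string=search_text)
        return render(request, "engine/search.html", {"search": urls})
    # Re-render the home page with the bound form so its errors are visible.
    return render(request, "engine/home.html", {"form": form})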


