Access to Table Objects on Webpage Using Python Selenium

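To scrape the Virk, Post, By, Web and Mail columns, wait until the cells of each column are visible and collect their text. The <td> cells of each column can be located through their headers attribute: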

Code Block:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
# executable_path is the Selenium 3 form; Selenium 4 takes service=Service(path) instead
driver = webdriver.Chrome(executable_path=r'C:\WebDriver\ChromeDriver\chromedriver.exe', options=options)
driver.get("https://www.danskeark.dk/find-arkitekt?display_view=block_3&field_company_region=All")
# One list per column, each located through the headers attribute of its <td> cells
virks = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@headers='view-title-1-table-column']")))]
posts = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@headers='view-field-company-zip-table-column']")))]
bys = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@headers='view-field-company-town-table-column']")))]
webs = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@headers='view-field-company-website-1-table-column']")))]
mails = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@headers='view-field-company-email-1-table-column']")))]
# zip() pairs the i-th entry of every column into one row
for i, j, k, l, m in zip(virks, posts, bys, webs, mails):
    print(f"Virk:{i} Post:{j} By:{k} Web:{l} Mail:{m}")
driver.quit()

Console Output:

Virk:& Wainø IVS Post:2400 By:København NV Web: Mail:tomas@ogwaino.dk
Virk:1:1 landskab ApS Post:2500 By:Valby Web:www.1til1landskab.dk Mail:info@1til1landskab.dk
Virk:2r arkitekter ApS Post:2100 By:København Ø Web:www.2r-arkitekter.dk Mail:rs@a2rk.dk
Virk:3XN A/S Post:1437 By:København K Web:www.3xn.dk Mail:3xn@3xn.dk
Virk:A-PLAN Arkitekter Post:4800 By:Nykøbing F Web:www.a-plan.dk Mail:ka@a-plan.dk
Virk:a-tjek skive aps Post:7860 By:Spøttrup Web:www.a-tjek.dk Mail:skive@a-tjek.dk
Virk:A.BECH ApS Post:2200 By:København N Web: Mail:andreas@a-bech.com
Virk:A.R.K. Rådgivning v/Steen Stougård Post:8240 By:Risskov Web: Mail:risark@webspeed.dk
Virk:A1 Tegnestue ApS Post:5210 By:Odense NV Web:www.a1tegnestue.dk Mail:ja@a1tegnestue.dk
Virk:Aaberg Arkitekter Post:2450 By:København SV Web:www.aabergarkitekter.dk Mail:jens@aabergarkitekter.dk
Virk:Aagaard Landskab Post:2800 By:Kongens Lyngby Web:www.aagaardlandskab.dk Mail:afa@aagaardlandskab.dk
Virk:AART architects DK A/S Post:8000 By:Aarhus C Web:www.aart.dk Mail:aart@aart.dk
Virk:Abildskov Arkitekter Post:2720 By:Vanløse Web: Mail:jan.abildskov@petersen.dk
Virk:Abrahamsen - Arkitekt & Bygherrerådgivning Post:4600 By:Køge Web:www.aogb.dk Mail:ha@aogb.dk
Virk:Adam Trier Jacobsen Arkitekt & Designer ApS Post:2920 By:Charlottenlund Web: Mail:arkitekt@AdamTrier.dk
Virk:ADEPT ApS Post:2200 By:København N Web:www.adept.dk Mail:mail@adept.dk
Virk:Adham Architects I/S Post:9490 By:Pandrup Web:www.adhamarchitects.dk Mail:Info@adhamarchitects.dk
Virk:AG5 A/S Post:1304 By:København K Web:www.ag5.dk Mail:info@ag5.dk
Virk:AI A/S Post:1432 By:København K Web:www.ai.dk Mail:ai@ai.dk
Virk:Aim-Byliv Post:2000 By:Frederiksberg Web:www.aim-byliv.dk Mail:asal@aim-byliv.dk
Virk:Aj Arkitekten v/ Jan Ravn Post:8920 By:Randers NV Web:www.ajark.dk Mail:jr@ajark.dk
Virk:AK83 Arkitekter A/S Post:2635 By:Ishøj Web:www.ak83.dk Mail:ak83@ak83.dk
Virk:Albjerg & Buchardt Arkitekter ApS Post:1159 By:København K Web:www.abarkitekter.dk Mail:nanna@abarkitekter.dk
Virk:Alex Poulsen Arkitekter A/S Post:2200 By:København N Web:www.alexpoulsen.dk Mail:info@alexpoulsen.dk
Virk:Alex Rosendal`s Tegnestue Post:2100 By:København Ø Web:www.ar-tegnestue.dk Mail:alex.rosendal@mail.tele.dk
Virk:AMAS ARKITEKTER Post:2830 By:Virum Web:amasark.dk Mail:amasark@gmail.com
Virk:Anders Barslund Post:2830 By:Virum Web:andersbarslund.com Mail:hello@andersbarslund.com
Virk:Anders Brix arkitekt maa mdd professor Post:2800 By:Kongens Lyngby Web:www.andersbrix.dk Mail:ab@andersbrix.dk
Virk:Anders Jørgensen Arkitekter A/S Post:1169 By:København K Web:www.ajas.eu Mail:anders@ajas.eu
Virk:Andersen & Sigurdsson Arkitekter Post:1850 By:Frederiksberg C Web:www.a-s.dk Mail:halli@a-s.dk
Virk:Anette Meldgaard arkitekt maa Post:2800 By:Kongens Lyngby Web:www.anettemeldgaard.dk Mail:ama@anettemeldgaard.dk
Virk:Animulas Post:8220 By:Brabrand Web:animulas.com Mail:concierge@animulas.com
Virk:Anna Mette Exner Arkitektur ApS Post:8220 By:Brabrand Web:www.exnerarkitektur.dk Mail:am@exnerarkitektur.dk
Virk:Anne Stausholm Landskabsarkitekter Post:4000 By:Roskilde Web:www.annestausholm.dk Mail:afs@annestausholm.dk
Virk:aNNeKS ApS Post:4200 By:Slagelse Web:www.anneks.org Mail:mso@anneks.org
Virk:ANNOARK ApS Post:3660 By:Stenløse Web:www.annoark.dk Mail:anmo@annoark.dk
Virk:ANS Arkitektfirma Post:8643 By:Ans By Web:www.ansarkitektfirma.dk Mail:ans-arkitektfirma@post.tele.dk
Virk:AQDO, Anne Qvist Design Office Post:8000 By:Aarhus C Web:www.aqdo.dk Mail:aq@aqdo.dk
Virk:Ar-Kon ApS Post:8380 By:Trige Web:www.ar-kon.dk Mail:post@ar-kon.dk
Virk:arch wiberg Post:3050 By:Humlebæk Web:archwiberg.dk Mail:pw@archwiberg.dk
Virk:Archifield Arkitekterne ApS Post:5600 By:Faaborg Web:www.archifield.dk Mail:info@archifield.dk
Virk:Archinet KS Post:8700 By:Horsens Web: Mail:archinet@archinet.dk
Virk:Architect Mads Max Ibenfeldt Post:3150 By:Hellebæk Web:madsibenfeldt.com Mail:madsmaxibenfeldt@gmail.com
Virk:Archtrojborg Post:8320 By:Mårslet Web:arch-trojborg.dk Mail:archtrojborg@gmail.com
Virk:ArcHus Arkitektfirma ApS Post:8560 By:Kolind Web:www.new-world.dk Mail:mail@new-world.dk
Virk:ARCnordic A/S Post:3400 By:Hillerød Web:www.arcnordic.dk Mail:mail@arcnordic.dk
Virk:Arcvision ApS Post:8660 By:Skanderborg Web:www.arcvision.dk Mail:britta@arcvision.dk
Virk:Arde ApS Post:7400 By:Herning Web:www.arde.dk Mail:mail@arde.dk
Virk:Ardess ApS Post:8000 By:Aarhus C Web:www.ardess.dk Mail:ps@ardess.dk
Virk:Arends Arkitekter IVS Post:2820 By:Gentofte Web:www.arends.dk Mail:pa@arends.dk
Virk:ARK+ Post:7100 By:Vejle Web:www.ark-plus.dk Mail:arkplus.nordic@gmail.com
Virk:Arkikon ApS Post:8500 By:Grenaa Web:www.arkikon.dk Mail:info@arkikon.dk
Virk:Arkimentor ApS Post:6040 By:Egtved Web:www.arkimentor.dk Mail:hsn@arkimentor.dk
Virk:Arkiplus Post:4180 By:Sorø Web:www.arkiplus.dk Mail:info@arkiplus.dk
Virk:Arkitekt Bjarne Korsgaard Post:2830 By:Virum Web: Mail:Bjarne.Korsgaard@gmail.com
Virk:arkitekt Daniel Nielsen Post:2000 By:Frederiksberg Web:www.arkitektdn.dk Mail:daniel@arkitektdn.dk
Virk:Arkitekt Esben Colding Broe Post:7700 By:Thisted Web: Mail:esbenark@gmail.com
Virk:Arkitekt Jarl ApS Post:7100 By:Vejle Web: Mail:arkitektjarl@outlook.dk
Virk:Arkitekt Jesper Brask ApS Post:3400 By:Hillerød Web:brask-leonhardt.dk Mail:jb@brask-leonhardt.dk
Virk:Arkitekt Kristine Jensens Tegnestue Post:8000 By:Aarhus C Web:www.kristinejensen.dk Mail:kj@kristinejensen.dk
Virk:Arkitekt Lars Remfeldt ApS Post:2791 By:Dragør Web: Mail:remfeldt@mail.tele.dk
Virk:Arkitekt Lise Juel ApS Post:3100 By:Hornbæk Web:www.lisejuel.dk Mail:lj@lisejuel.dk
Virk:Arkitekt MAA Anker Ravn Knudsen Post:6630 By:Rødding Web:www.ankerravnknudsen.dk Mail:tegnestue@ankerravnknudsen.dk
Virk:Arkitekt MAA Birthe Just Post:2820 By:Gentofte Web:www.bj-ark.dk Mail:mail@bj-ark.dk
Virk:Arkitekt MAA Boe Fischer Post:5230 By:Odense M Web:www.boefischer.dk Mail:arkboe@gmail.com
Virk:Arkitekt MAA Christoffer Storm Post:2610 By:Rødovre Web: Mail:cstorm@mail.dk
Virk:Arkitekt MAA Finn Strabo Post:3150 By:Hellebæk Web:www.strabo.dk Mail:strabo@mail.tele.dk
Virk:Arkitekt MAA Jan Harboe Post:1455 By:København K Web:www.janharboe.dk Mail:janharboe@janharboe.dk
Virk:Arkitekt MAA Jens Høg Post:4621 By:Gadstrup Web: Mail:jenshogtegnestue@gmail.com
Virk:Arkitekt MAA Jens Lind Post:2100 By:København Ø Web:www.jens-lind.dk Mail:jens@jenslind.dk
Virk:Arkitekt MAA Jens Stensgaard Post:8660 By:Skanderborg Web: Mail:jens@stensgaard.dk
Virk:Arkitekt MAA John Kronborg Christensen Post:6430 By:Nordborg Web: Mail:johnkron@post7.tele.dk
Virk:Arkitekt MAA Keld Wohlert Post:2680 By:Solrød Strand Web: Mail:vw@tegnestuen-wohlert.dk
Virk:Arkitekt MAA Knud Erik Møller Post:9800 By:Hjørring Web:www.kem-arkitekter.dk Mail:kem@kem-arkitekter.dk
Virk:Arkitekt MAA Mathilde Petri Post:2830 By:Virum Web: Mail:mp@mathildepetri.dk
Virk:Arkitekt MAA Morten Kjelstrup Post:1054 By:København K Web: Mail:mk@morten-kjelstrup.dk
Virk:Arkitekt MAA Niels Vestergaard Post:8960 By:Randers SØ Web: Mail:nv.tegnestue@mail.dk
Virk:Arkitekt maa Pierre Devriel Post:4500 By:Nykøbing Sj Web:www.tegnestuen-nordkyst.com Mail:pierred@live.dk
Virk:Arkitekt MAA Steffen M.Søndergaards Tegnestue ApS Post:6600 By:Vejen Web: Mail:SMS.ARK@HOTMAIL.COM
Virk:Arkitekt maa Steffen Søby aps Post:5700 By:Svendborg Web:www.soeby.dk Mail:steffen@soeby.dk
Virk:Arkitekt MAA Tom Sjørup Post:3670 By:Veksø Sjælland Web: Mail:tomstegnestue@outlook.dk
Virk:Arkitekt MAA Torben Baltsen Post:2720 By:Vanløse Web:www.torbenbaltsen.dk Mail:arkitekt@torbenbaltsen.dk
Virk:Arkitekt MAA Tummas Niclasen Post:2830 By:Virum Web:www.niclasen.eu Mail:arkitekt@niclasen.eu
Virk:Arkitekt Michael Kornbeck Post:1432 By:København K Web:www.kornbeckbonde.dk Mail:mk@kornbeckbonde.dk
Virk:Arkitekt Stefan Vesti Brorsen Post:2300 By:København S Web:asvb.dk Mail:s@asvb.dk
Virk:arkitekt thomas riis aps Post:3905 By:Nuussuaq Web: Mail:thomas@riis.gl
Virk:Arkitekt Thomas Thomsen DanskeArk Post:7000 By:Fredericia Web:www.tt-arkitekt.dk Mail:birger@tt-arkitekt.dk
Virk:Arkitektanpartsselskabet Ole Fabricius Post:6760 By:Ribe Web:www.ole-fabricius.dk Mail:arkitekt@ole-fabricius.dk
Virk:Arkitekter Syd ApS Post:6270 By:Tønder Web:www.arkitektersyd.dk Mail:hc@arkitektersyd.dk
Virk:Arkitekterne Bahn v/Erik Bahn Post:4070 By:Kirke Hyllinge Web:www.arkitekterne-bahn.dk Mail:ark.bahn@mail.tele.dk
Virk:Arkitekterne Bahn v/Leif Bahn Post:4300 By:Holbæk Web: Mail:lbahn.ark@mail.tele.dk
Virk:Arkitekterne Fuglehuset Post:4320 By:Lejre Web: Mail:hanne@engvang.dk
Virk:Arkitekterne Holst v/Michael Holst Post:1202 By:København K Web: Mail:mh@arkitekterne-holst.dk
Virk:Arkitekterne KØGE A/S Post:4600 By:Køge Web:www.arkk.dk Mail:gc@arkk.dk
Virk:Arkitekterne Vejen A/S Post:6100 By:Haderslev Web:www.arkitekternevejen.dk Mail:dion@arkitekternevejen.dk
Virk:Arkitektfirma A/S Hune & Elkjær Post:8000 By:Aarhus C Web:www.h-e.dk Mail:ark@h-e.dk
Virk:ARKITEKTFIRMA BYDER ApS Post:2820 By:Gentofte Web:www.byder.dk Mail:post@byder.dk
Virk:Arkitektfirma Christen Justesen A/S Post:9990 By:Skagen Web:www.christenjustesen.dk Mail:arkitekt@christenjustesen.dk
Virk:Arkitektfirma Claus Jensen ApS Post:8000 By:Aarhus C Web:www.cj-arkitekter.dk Mail:claus@cj-arkitekter.dk
Virk:Arkitektfirma Knud Erik Holst MAA / DA Post:4200 By:Slagelse Web:www.arkitekt-holst.dk Mail:maa@arkitekt-holst.dk
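
Note that zip() stops at the shortest list, so a column that comes back one cell short would silently drop rows. As a rough sketch of one way to guard against that and keep the result, the five lists can be combined with itertools.zip_longest and written out as CSV. The helper names rows_from_columns and to_csv_text are hypothetical, and the two sample rows are copied from the console output above rather than scraped live:

```python
import csv
import io
from itertools import zip_longest


def rows_from_columns(*columns, fill=""):
    """Combine parallel column lists into rows, padding short columns with `fill`."""
    return [list(row) for row in zip_longest(*columns, fillvalue=fill)]


def to_csv_text(header, rows):
    """Render a header plus rows as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Sample values taken from the console output above, standing in for the scraped lists
virks = ["Arde ApS", "Archinet KS"]
posts = ["7400", "8700"]
bys = ["Herning", "Horsens"]
webs = ["www.arde.dk", ""]
mails = ["mail@arde.dk", "archinet@archinet.dk"]

rows = rows_from_columns(virks, posts, bys, webs, mails)
csv_text = to_csv_text(["Virk", "Post", "By", "Web", "Mail"], rows)
print(csv_text)
```

An empty Web cell simply becomes an empty CSV field, which matches how the original loop prints "Web:" with nothing after it.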

How to extract data from both th and td tags using Selenium in Python?

You can extract data from all 7 columns by matching both the <th> and the <td> cells, using * or name() in the XPath. The XPath would be something like below.

rows = driver.find_elements(By.XPATH, "//table/tbody/tr")

# All child elements of a row. The leading dot makes the XPath relative to the row element.
cols = row.find_elements(By.XPATH, "./*")

Or

# Only the child elements of a row whose tag name is "th" or "td".
cols = row.find_elements(By.XPATH, "./*[name()='th' or name()='td']")

Try it like this:

from selenium.webdriver.common.by import By

# Get the rows
rows = driver.find_elements(By.XPATH, "//table/tbody/tr")

# Iterate over the rows
for row in rows:
    # Get all the columns for each row.
    # cols = row.find_elements(By.XPATH, "./*")
    cols = row.find_elements(By.XPATH, "./*[name()='th' or name()='td']")
    temp = []  # Temporary list
    for col in cols:
        temp.append(col.text)
    print(temp)

Console Output:

['']
['State', 'Filed week ended', 'Initial Claims', 'Reflecting Week Ended', 'Continued Claims', 'Covered Employment', 'Insured Unemployment Rate']
['Alabama', '01/04/2020', '4,578', '12/28/2019', '18,523', '1,923,741', '0.96']
['Alabama', '01/11/2020', '3,629', '01/04/2020', '21,143', '1,923,741', '1.10']
['Alabama', '01/18/2020', '2,483', '01/11/2020', '17,402', '1,923,741', '0.90']
...
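
The printed lists are still raw strings, with an empty first row and the header as the second row. A minimal sketch of turning them into records follows; rows_to_records is a hypothetical helper, and the sample rows are copied from the output above:

```python
def rows_to_records(rows):
    """Skip empty rows, treat the first remaining row as the header,
    and map every later row to a dict keyed by column name."""
    non_empty = [row for row in rows if any(cell.strip() for cell in row)]
    if not non_empty:
        return []
    header, *body = non_empty
    return [dict(zip(header, row)) for row in body]


# Rows as printed by the loop above
rows = [
    [''],
    ['State', 'Filed week ended', 'Initial Claims', 'Reflecting Week Ended',
     'Continued Claims', 'Covered Employment', 'Insured Unemployment Rate'],
    ['Alabama', '01/04/2020', '4,578', '12/28/2019', '18,523', '1,923,741', '0.96'],
    ['Alabama', '01/11/2020', '3,629', '01/04/2020', '21,143', '1,923,741', '1.10'],
]

records = rows_to_records(rows)
# Thousands separators have to be removed before converting to int
initial_claims = [int(r['Initial Claims'].replace(',', '')) for r in records]
print(records[0]['State'], initial_claims)
```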

Python Selenium: How do I print the values from a website in a text file?

To grab the six values from the TULSASPCA website and write them to a text file, induce WebDriverWait for visibility_of_all_elements_located(), collect the data-to attribute of each element with a list comprehension, build a DataFrame from that list, and export it to a text file without the index, using the following Locator Strategies:

Code Block:

driver.get("https://tulsaspca.org/")
driver.execute_script("window.scrollTo(0, 250)")
# read into a DataFrame
df = pd.DataFrame([my_elem.get_attribute("data-to") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='number']")))])
# Exporting as TEXT file excluding the Index
df.to_csv("C:\\Data_Files\\output_files\\new_text_marks.txt", index=False)
driver.quit()

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

PS: You may like to drop the first row from the DataFrame
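
If pandas is not needed for anything else, the same list could be written to a text file with the standard library alone. This is only a sketch: write_values is a hypothetical helper, the values are placeholders rather than the real counters from the page, and the file goes to the system temp directory instead of the C:\Data_Files path used above:

```python
import tempfile
from pathlib import Path


def write_values(values, path):
    """Write one value per line and return the path that was written."""
    path = Path(path)
    path.write_text("\n".join(str(v) for v in values) + "\n", encoding="utf-8")
    return path


# Placeholder values standing in for the scraped data-to attributes
placeholder_values = ["10", "20", "30"]

out = write_values(placeholder_values, Path(tempfile.gettempdir()) / "new_text_marks.txt")
print(out.read_text(encoding="utf-8"))
```

Written this way, the file contains only the values, with no header line to drop afterwards.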

How to print the hidden product names using Selenium and Python

The product names are contained within the alt attribute of the <img> elements.



Solution

To extract the product names with a list comprehension, induce WebDriverWait for visibility_of_all_elements_located(), read the alt attribute of each matched <img> element, and use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dcomputers-intl-ship&field-keywords=&ref=nb_sb_noss&crid=JKZZMSQ4Q71E&sprefix=%2Ccomputers-intl-ship%2C114")
    print([my_elem.get_attribute("alt") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span[data-component-type='s-product-image']>a>div>img")))])
  • Using XPATH:

    driver.get("https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dcomputers-intl-ship&field-keywords=&ref=nb_sb_noss&crid=JKZZMSQ4Q71E&sprefix=%2Ccomputers-intl-ship%2C114")
    print([my_elem.get_attribute("alt") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@data-component-type='s-product-image']/a/div/img")))])
  • Console Output:

    ['Apple Pencil (2nd Generation)', 'Sceptre 24" Professional Thin 75Hz 1080p LED Monitor 2x HDMI VGA Build-in Speakers, Machine Black (E248W-19203R Series)', 'Roku Streaming Stick 4K 2021 | Streaming Device 4K/HDR/Dolby Vision with Roku Voice Remote and TV Controls', 'Original HP 67XL Black High-yield Ink Cartridge | Works with HP DeskJet 1255, 2700, 4100 Series, HP ENVY 6000, 6400 Series...', 'Seagate Portable 2TB External Hard Drive Portable HDD – USB 3.0 for PC, Mac, PlayStation, & Xbox - 1-Year Rescue Service (...', 'Logitech MK270 Wireless Keyboard and Mouse Combo for Windows, 2.4 GHz Wireless, Compact Mouse, 8 Multimedia and Shortcut K...', 'Original HP 67 Black/Tri-color Ink Cartridges (2-pack) | Works with HP DeskJet 1255, 2700, 4100 Series, HP ENVY 6000, 6400...', 'HP 24mh FHD Monitor - Computer Monitor with 23.8-Inch IPS Display (1080p) - Built-In Speakers and VESA Mounting - Height/T...', 'iPhone Charger, TAKAGI Lightning Cable 3PACK 6FT Nylon Braided USB Charging Cable High Speed Data Sync Transfer Cord Compa...', 'Original HP 63XL Black High-yield Ink Cartridge | Works with HP DeskJet 1112, 2130, 3630 Series; HP ENVY 4510, 4520 Serie...', 'Roku Express 4K+ 2021 | Streaming Media Player HD/4K/HDR with Smooth Wireless Streaming and Roku Voice Remote with TV Cont...', 'Logitech C920x HD Pro Webcam, Full HD 1080p/30fps Video Calling, Clear Stereo Audio, HD Light Correction, Works with Skyp...']
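
To see why the alt attribute is the thing to read, here is a browser-free sketch that pulls alt values out of a small HTML fragment with the standard library's html.parser. The fragment is an assumption modelled on the locators above, not markup copied from the live page, and AltCollector is a hypothetical class:

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collect the non-empty alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # skip missing or empty alt attributes
                self.alts.append(alt)


# Assumed markup shaped like span[data-component-type='s-product-image']>a>div>img
sample_html = """
<span data-component-type="s-product-image"><a><div>
  <img src="a.jpg" alt="Apple Pencil (2nd Generation)">
</div></a></span>
<span data-component-type="s-product-image"><a><div>
  <img src="b.jpg" alt="">
</div></a></span>
"""

collector = AltCollector()
collector.feed(sample_html)
product_names = collector.alts
print(product_names)
```

The same guard against empty values may be worth applying to the Selenium list comprehension, since get_attribute("alt") can return None or an empty string for decorative images.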

How to get text from a html using Selenium and Python which has two elements with the same classname where I need to extract both

As per the HTML:

<div class='message-in'> cool text here </div>
<div class='message-in'> bad text here </div>

The following line of code:

texto = navegador.find_element_by_class_name('message-in').text

will always identify the first matching element, extract the text and assign it to texto. So when you try to print texto, the text of the very first element i.e. cool text here is printed.



Solution

You can get all the elements with the same class name, i.e. message-in, and put them in a list as follows:

from selenium.webdriver.common.by import By
texto = navegador.find_elements(By.CLASS_NAME, 'message-in')

Now you can print the desired texts with respect to their index as follows:

  • To print cool text here:

    print(texto[0].text) # prints-> cool text here
  • To print bad text here:

    print(texto[1].text) # prints-> bad text here


Outro

You can also create a list of the texts using List Comprehension and print them as follows:

texto = [my_elem.text for my_elem in navegador.find_elements(By.CLASS_NAME, "message-in")]
print(texto[0]) # prints-> cool text here
print(texto[1]) # prints-> bad text here
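
Indexing with texto[1] raises an IndexError when fewer than two elements match, so it can help to wrap the lookup in a small function and check the length of the result. The sketch below runs without a browser: FakeElement and FakeDriver are stand-ins invented for the demonstration (not Selenium classes), and the locator is passed as the plain string "class name", which is the value of By.CLASS_NAME:

```python
CLASS_NAME = "class name"  # the string value of selenium's By.CLASS_NAME


def texts_by_class(driver, class_name):
    """Return the text of every element with the given class, in page order."""
    return [el.text for el in driver.find_elements(CLASS_NAME, class_name)]


class FakeElement:
    """Stand-in for a WebElement; only the .text attribute is modelled."""

    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Stand-in for a WebDriver; only find_elements() is modelled."""

    def __init__(self, elements_by_class):
        self._elements = elements_by_class

    def find_elements(self, by, value):
        # Like Selenium, return an empty list when nothing matches
        return self._elements.get(value, []) if by == CLASS_NAME else []


navegador = FakeDriver({
    "message-in": [FakeElement("cool text here"), FakeElement("bad text here")],
})

texts = texts_by_class(navegador, "message-in")
print(texts)
print(texts_by_class(navegador, "message-out"))
```

With a real driver, texts_by_class(navegador, "message-in") would be called the same way.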

Scraping and writing the table into dataframe shows me TypeError

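To scrape the data from all the columns, switch to the <iframe> that contains the table, induce WebDriverWait for visibility_of_element_located() on the <table> element, extract its outerHTML and parse it with read_html(). Note that read_html() returns a list of DataFrames, which is why the console output is wrapped in square brackets and the table itself is df[0]. You can use the following Locator Strategies: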

  • Code Block:

    driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
    WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
    data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#sites"))).get_attribute("outerHTML")
    df = pd.read_html(data)
    print(df)
    driver.quit()
  • Console Output:

    [  FAMI-QS Number                             Site Name              City  ... Status Certified from Expiry date
    0 FAM-1293 AmTech Ingredients albert lea ... Valid 2020-10-08 2023-10-07
    1 FAM-0841 3F FEED & FOOD S L vizcolozano ... Valid 2020-04-17 2023-04-16
    2 FAM-1361 5N Plus Additives GmbH eisenhüttenstadt ... Valid 2020-10-01 2023-09-30
    3 FAM-1301-01 A & V Corp. Limited xiamen ... Valid 2020-09-09 2023-09-08
    4 FAM-1146 A. + E. Fischer-Chemie GmbH & Co. KG wiesbaden ... Valid 2020-06-05 2023-06-04
    5 FAM-1589 A.M FOOD CHEMICAL CO LIMITED jinan ... Valid 2020-01-07 2023-01-06
    6 FAM-0613-01 A.W.P. S.r.l crevalcore ... Valid 2020-02-27 2023-02-07
    7 FAM-0867 AB AGRI POLSKA Sp. z o.o. smigiel ... Valid 2020-08-03 2023-03-19
    8 FAM-1510-02 AB Vista marlborough ... Valid 2020-04-16 2023-04-15
    9 FAM-1510-01 AB Vista * rotterdam ... Valid 2020-04-16 2023-04-15

    [10 rows x 7 columns]]

