Web Scraping Python fails to load the url on button.click() - python-3.x

The CSV file contains the names of the countries used. However, after Argentina, the script fails to retrieve the URL and returns an empty string instead.
country,country_url
Afghanistan,https://openaq.org/#/locations?parameters=pm25&countries=AF&_k=tomib2
Algeria,https://openaq.org/#/locations?parameters=pm25&countries=DZ&_k=dcc8ra
Andorra,https://openaq.org/#/locations?parameters=pm25&countries=AD&_k=crspt2
Antigua and Barbuda,https://openaq.org/#/locations?parameters=pm25&countries=AG&_k=l5x5he
Argentina,https://openaq.org/#/locations?parameters=pm25&countries=AR&_k=962zxt
Australia,
Austria,
Bahrain,
Bangladesh,
The country.csv looks like this:
Afghanistan,Algeria,Andorra,Antigua and Barbuda,Argentina,Australia,Austria,Bahrain,Bangladesh,Belgium,Bermuda,Bosnia and Herzegovina,Brazil,
The code used is:
driver = webdriver.Chrome(options = options, executable_path = driver_path)
url = 'https://openaq.org/#/locations?parameters=pm25&_k=ggmrvm'
driver.get(url)
time.sleep(2)
# This function opens .csv file that we created at the first stage
# .csv file includes names of countries
with open('1Countries.csv', newline='') as f:
    reader = csv.reader(f)
    list_of_countries = list(reader)
    list_of_countries = list_of_countries[0]
    print(list_of_countries) # printing a list of countries
# Let's create Data Frame of the country & country_url
df = pd.DataFrame(columns=['country', 'country_url'])
# With this function we are generating urls for each country page
for country in list_of_countries[:92]:
    try:
        path = ('//span[contains(text(),' + '\"' + country + '\"' + ')]')
        # "path" is used to filter each country on the website by
        # iterating country names.
        next_button = driver.find_element_by_xpath(path)
        next_button.click()
        # Using "button.click" we get to the page of the next country
        time.sleep(2)
        country_url = (driver.current_url)
        # "country_url" is used to get the url of the current page
        next_button.click()
    except:
        country_url = None
    d = [{'country': country, 'country_url': country_url}]
    df = df.append(d)
I've tried increasing the sleep time, but I'm not sure what is causing this.

The challenge you face is that the country list is scrollable:
Conveniently, your code stops working exactly when the countries are no longer displayed.
It's a relatively easy fix: you need to scroll the element into view. I ran a quick test with your code to confirm it works. I removed the CSV part, hard-coded a country that's further down the list, and added the parts that scroll it into view:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time
def ScrollIntoView(element):
    actions = ActionChains(driver)
    actions.move_to_element(element).perform()
url = 'https://openaq.org/#/locations?parameters=pm25&_k=ggmrvm'
driver = webdriver.Chrome()
driver.get(url)
driver.implicitly_wait(10)
country = 'Bermuda'
path = ('//span[contains(text(),' + '\"' + country + '\"' + ')]')
next_button = driver.find_element_by_xpath(path)
ScrollIntoView(next_button) # added this
next_button.click()
time.sleep(2)
country_url = (driver.current_url)
print(country_url) # added this
next_button.click()
This is the output from the print:
https://openaq.org/#/locations?parameters=pm25&countries=BM&_k=7sp499
Are you happy to merge that into your solution? (Just say if you need more support.)
If it helps, one reason you didn't spot this yourself is that the bare try/except was masking an ElementNotInteractableException. Have a look at how to handle errors here.
try statements are great and useful, but it's also good to track when exceptions occur so you can fix them later. Borrowing some code from that link, you can try something like this in your except block (it needs import sys at the top of the script):
except:
    print("Unexpected error:", sys.exc_info()[0])
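A slightly more targeted variant of the same idea, sketched without Selenium so it runs anywhere: catch the exception, record its type and message next to the item that failed, and carry on. The helper name try_each and the fake_click stand-in are made up for illustration and are not part of the original code.

```python
def try_each(items, action):
    """Run action(item) for every item; return (results, failures)."""
    results, failures = {}, []
    for item in items:
        try:
            results[item] = action(item)
        except Exception as exc:  # still broad, but no longer silent
            results[item] = None
            failures.append((item, type(exc).__name__, str(exc)))
    return results, failures


def fake_click(country):
    # Stand-in for the Selenium click; pretends "Bermuda" is off-screen.
    if country == "Bermuda":
        raise RuntimeError("element not interactable")
    return "https://example.org/" + country


results, failures = try_each(["Andorra", "Bermuda"], fake_click)
for item, error_type, message in failures:
    print(f"{item}: {error_type}: {message}")
```

Keeping the failures in a list means the empty country_url values can be traced back to a concrete exception after the run instead of disappearing into None.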

Related

Output from web scraping with bs4 returns empty lists

I am trying to scrape specific information from a website with 25 pages, but when I run my code I get empty lists. My output is supposed to be a dictionary with the specific information scraped. Any help would be appreciated.
# Loading libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import mitosheet
# Assigning column names using class_ names
name_selector = "af885_1iPzH"
old_price_selector = "f6eb3_1MyTu"
new_price_selector = "d7c0f_sJAqi"
discount_selector = "._6c244_q2qap"
# Placeholder list
data = []
# Looping over each page
for i in range(1,26):
    url = "https://www.konga.com/category/phones-tablets-5294?brand=Samsung&page=" +str(i)
    website = requests.get(url)
    soup = BeautifulSoup(website.content, 'html.parser')
    name = soup.select(name_selector)
    old_price = soup.select(old_price_selector)
    new_price = soup.select(new_price_selector)
    discount = soup.select(discount_selector)
    # Combining the elements into a zipped list to be able to pull the data simultaneously
    for names, old_prices, new_prices, discounts in zip(name, old_price, new_price, discount):
        dic = {"Phone Names": names.getText(),"New Prices": new_prices.getText(),"Old Prices": old_prices.getText(),"Discounts": discounts.getText()}
        data.append(dic)
data
I tested the below and it works for me getting 40 name values.
I wasn't able to get the values using Beautiful Soup, but I could get them directly through Selenium.
If you decide to use Chrome and PyCharm as I have then:
Open Chrome. Click on three dots near top right. Click on Settings then About Chrome to see the version of your Chrome. Download the corresponding driver here. Save the driver in the PyCharm PATH folder
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
# Assigning column names using class_ names
name_selector = "af885_1iPzH"
# Looping over each page
for i in range(1, 27):
    url = "https://www.konga.com/category/phones-tablets-5294?brand=Samsung&page=" +str(i)
    driver.get(url)
    xPath = './/*[@class="' + name_selector + '"]'
    name = driver.find_elements(By.XPATH, xPath)
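Two details in the question's loop may also be worth checking, whether or not the page is rendered by JavaScript. First, soup.select() takes CSS selectors, so a class name normally needs a leading dot (as discount_selector has, but the other three do not). Second, zip() stops at its shortest input, so if any one of the four selections comes back empty, the loop body never runs and data stays empty. A minimal sketch of the second point with made-up values:

```python
from itertools import zip_longest

names = ["Galaxy A", "Galaxy B"]   # made-up values
new_prices = ["100", "200"]
discounts = []                     # e.g. a selector that matched nothing

# zip() yields nothing because one input is empty.
strict_rows = list(zip(names, new_prices, discounts))

# zip_longest() keeps every row and pads the gaps instead.
padded_rows = list(zip_longest(names, new_prices, discounts, fillvalue=None))

print(strict_rows)
print(padded_rows)
```

Padding keeps the rows but does not guarantee that values from different lists belong to the same product, so selecting each product container first and reading its fields from there is usually the safer design.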

Stuck in loop <> Code doesn't want to pull anything except row 1

I am stuck in a loop and don't know what to change to make my code work normally.
The problem is with the CSV file: it contains a list of domains (freedommortgage.com, google.com, amd.com etc.). When I run the code everything is fine at the start, but then it keeps giving me the same result over and over:
the monthly total visits to freedommortgage.com is 1.10M
Here is my code:
import csv
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import urllib
from captcha2upload import CaptchaUpload
import time
# setting the firefox driver
def init_driver():
    driver = webdriver.Firefox(executable_path=r'C:\Users\muki\Desktop\similarweb_scrapper-master\geckodriver.exe')
    driver.implicitly_wait(10)
    return driver
# solving the captcha (with 2captcha.com)
def captcha_solver(driver):
    captcha_src = driver.find_element_by_id('recaptcha_challenge_image').get_attribute("src")
    urllib.urlretrieve(captcha_src, "captcha.jpg")
    captcha = CaptchaUpload("4cfd308fd703d40291a7e250d743ca84") # 2captcha API KEY
    captcha_answer = captcha.solve("captcha.jpg")
    wait = WebDriverWait(driver, 10)
    captcha_input_box = wait.until(
        EC.presence_of_element_located((By.ID, "recaptcha_response_field")))
    captcha_input_box.send_keys(captcha_answer)
    driver.implicitly_wait(10)
    captcha_input_box.submit()
# inputting the domain in similar web search box and finding necessary values
def lookup(driver, domain, short_method):
    # short method - inputting the domain in the url
    if short_method:
        driver.get("https://www.similarweb.com/website/" + domain)
    else:
        driver.get("https://www.similarweb.com")
    attempt = 0
    # trying 3 times before quiting (due to second refresh by the website that clears the search box)
    while attempt < 1:
        try:
            captcha_body_page = driver.find_elements_by_class_name("block-page")
            driver.implicitly_wait(10)
            if captcha_body_page:
                print("Captcha ahead, solving the captcha, it may take a few seconds")
                captcha_solver(driver)
                print("Captcha solved! the program will continue shortly")
                time.sleep(20) # to prevent second refresh affecting the upcoming elements finding after captcha solved
            # for normal method, inputting the domain in the searchbox instead of url
            if not short_method:
                input_element = driver.find_element_by_id("js-swSearch-input")
                input_element.click()
                input_element.send_keys(domain)
                input_element.submit()
            wait = WebDriverWait(driver, 10)
            time.sleep(10)
            total_visits = wait.until(
                EC.presence_of_element_located((By.XPATH, "//span[@class='engagementInfo-valueNumber js-countValue']")))
            total_visits_line = "the monthly total visits to %s is %s" % (domain, total_visits.text)
            time.sleep(10)
            print('\n' + total_visits_line)
        except TimeoutException:
            print("Box or Button or Element not found in similarweb while checking %s" % domain)
            attempt += 1
            print("attempt number %d... trying again" % attempt)
# main
if __name__ == "__main__":
    with open('bigdomains.csv', 'rt') as f:
        reader = csv.reader(f)
        driver = init_driver()
        for row in reader:
            domain = row[0]
            lookup(driver, domain, True) # user need to give as a parameter True or False, True will activate the
            # short method, False will take the normal method
(Sorry for the long block of code, but I have to present everything, even though the focus is on the LAST PART of the code.)
My question is simple:
Why does it keep taking the domain from row 1 and ignoring row 2, row 3, row 4, etc.?
The time delay has to be 10 seconds or more to avoid the captcha issue on this website.
If anyone wants to run this, you have to edit the name of the CSV file and put a few domains in it in the format google.com (not www.google.com).
Looks like you're accessing the same index every time with:
domain = row[0]
Index 0 is the first item, which is why you keep getting the same value.
This post explains an alternative way to use a for loop in Python.
Accessing the index in 'for' loops?
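To see what row[0] actually returns, here is a small sketch using in-memory text in place of bigdomains.csv (the two layouts are assumptions about how the file might look). With one domain per line, row[0] changes on every iteration; with all domains on a single comma-separated line, there is only one row and row[0] is always the first domain.

```python
import csv
import io

# Layout A: one domain per line (hypothetical file contents).
one_per_line = "freedommortgage.com\ngoogle.com\namd.com\n"
# Layout B: every domain on a single comma-separated line.
single_line = "freedommortgage.com,google.com,amd.com\n"

# row[0] is the first column of *each* row, so layout A yields all three.
domains_a = [row[0] for row in csv.reader(io.StringIO(one_per_line)) if row]

# With layout B there is only one row, so row[0] yields just the first domain.
first_only = [row[0] for row in csv.reader(io.StringIO(single_line)) if row]

# Iterating over every cell handles both layouts.
domains_b = [cell.strip()
             for row in csv.reader(io.StringIO(single_line))
             for cell in row if cell.strip()]

print(domains_a, first_only, domains_b)
```

Separately, if attempt += 1 in lookup() only runs in the except branch, nothing ends the while loop after a successful lookup, which would also repeat the same domain forever; adding a break after the print of total_visits_line would be worth trying.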

Is it possible to move through a HTML Table and grab the data within w/ BeautifulSoup4?

For a project, I'm working on creating an API to interface with my school's course finder, and I'm struggling to grab the data from the HTML table they store it in without using Selenium. I was able to pull the HTML data initially using Selenium, but my instructor says he would prefer that I use the BeautifulSoup4 and MechanicalSoup libraries. I got as far as submitting a search and grabbing the HTML table the data is stored in. I'm not sure how to iterate through the data stored in the HTML table as I did with my Selenium code below.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options
Chrome_Options = Options()
Chrome_Options.add_argument("--headless") #allows program to run without opening a chrome window
driver = webdriver.Chrome()
driver.get("https://winnet.wartburg.edu/coursefinder/") #points the Selenium driver at the course finder
select = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder1_FormView1_DropDownList_Term"))
term_options = select.options
#for index in range(0, len(term_options) - 1):
# select.select_by_index(index)
lst = []
DeptSelect = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder1_FormView1_DropDownList_Department"))
DeptSelect.select_by_visible_text("History") #finds the desired department
search = driver.find_element_by_name("ctl00$ContentPlaceHolder1$FormView1$Button_FindNow")
search.click() #sends query
table_id = driver.find_element_by_id("ctl00_ContentPlaceHolder1_GridView1")
rows = table_id.find_elements_by_tag_name("tr")
for row in rows: #creates a list of lists containing our data
    col_lst = []
    col = row.find_elements_by_tag_name("td")
    for data in col:
        lst.append(data.text)
def chunk(l, n): #generator that partitions our list neatly
    print("chunking...")
    for i in range(0, len(l), n):
        yield l[i:i + n]
n = 16 #each list contains 16 items regardless of contents or search
uberlist = list(chunk(lst, n)) #call chunk fn to partition list
with open('class_data.txt', 'w') as handler: #output of scraped data
    print("writing file...")
    for listitem in uberlist:
        handler.write('%s\n' % listitem)
driver.close() #ends and closes Selenium control over browser
This is my Soup Code and I'm wondering how I can take the data from the HTML in a similar way I did above with my Selenium.
import mechanicalsoup
import requests
from lxml import html
from lxml import etree
import pandas as pd
def text(elt):
    return elt.text_content().replace(u'\xa0', u' ')
#This will use MechanicalSoup to grab the form, submit it and find the data table
browser = mechanicalsoup.StatefulBrowser()
winnet = "http://winnet.wartburg.edu/coursefinder/"
browser.open(winnet)
Searchform = browser.select_form()
Searchform.choose_submit('ctl00$ContentPlaceHolder1$FormView1$Button_FindNow')
response1 = browser.submit_selected() #This Progresses to Second Form
dataURL = browser.get_url() #Get URL of Second Form w/ Data
dataURL2 = 'https://winnet.wartburg.edu/coursefinder/Results.aspx'
pageContent=requests.get(dataURL2)
tree = html.fromstring(pageContent.content)
dataTable = tree.xpath('//*[@id="ctl00_ContentPlaceHolder1_GridView1"]')
rows = [] #initialize a collection of rows
for row in dataTable[0].xpath(".//tr")[1:]: #add new rows to the collection
    rows.append([cell.text_content().strip() for cell in row.xpath(".//td")])
df = pd.DataFrame(rows) #load the collection to a dataframe
print(df)
#XPath to Table
#//*[@id="ctl00_ContentPlaceHolder1_GridView1"]
#//*[@id="ctl00_ContentPlaceHolder1_GridView1"]/tbody
It turns out I was passing the wrong thing when using MechanicalSoup. I was able to store the new page's contents in a variable called table by calling .find('table') on the page, which retrieves the table HTML rather than the full page's HTML. From there I just used table.get_text().split('\n') to make essentially a giant list of all of the rows.
I also dabbled with setting form filters, which worked as well.
import mechanicalsoup
from bs4 import BeautifulSoup
#Sets StatefulBrowser object to winnet, then grabs the form
browser = mechanicalsoup.StatefulBrowser()
winnet = "http://winnet.wartburg.edu/coursefinder/"
browser.open(winnet)
Searchform = browser.select_form()
#Selects submit button and has filter options listed.
Searchform.choose_submit('ctl00$ContentPlaceHolder1$FormView1$Button_FindNow')
Searchform.set('ctl00$ContentPlaceHolder1$FormView1$TextBox_keyword', "") #Keyword Searches by Class Title. Inputting string will search by that string ignoring any stored nonsense in the page.
#ACxxx Course Codes have 3 spaces after them, THIS IS REQUIRED. Except the All value for not searching by a Department does not.
Searchform.set("ctl00$ContentPlaceHolder1$FormView1$DropDownList_Department", 'All') #For Department List, it takes the CourseCodes as inputs and displays as the Full Name
Searchform.set("ctl00$ContentPlaceHolder1$FormView1$DropDownList_Term", "2020 Winter Term") # Term Dropdown takes a value that is a string. String is Exactly the Term date.
Searchform.set('ctl00$ContentPlaceHolder1$FormView1$DropDownList_MeetingTime', 'all') #Takes the Week Class Time as a String. Need to Retrieve list of options from pages
Searchform.set('ctl00$ContentPlaceHolder1$FormView1$DropDownList_EssentialEd', 'none') #takes a small string signalling the EE req or 'all' or 'none'. None doesn't select an option and all selects all courses w/ an EE
Searchform.set('ctl00$ContentPlaceHolder1$FormView1$DropDownList_CulturalDiversity', 'none')# Cultural Diversity, Takes none, C, D or all
Searchform.set('ctl00$ContentPlaceHolder1$FormView1$DropDownList_WritingIntensive', 'none') # options are none or WI
Searchform.set('ctl00$ContentPlaceHolder1$FormView1$DropDownList_PassFail', 'none')# Pass/Fail takes 'none' or 'PF'
Searchform.set('ctl00$ContentPlaceHolder1$FormView1$CheckBox_OpenCourses', False) #Check Box, It's True or False
Searchform.set('ctl00$ContentPlaceHolder1$FormView1$DropDownList_Instructor', '0')# 0 is for None Selected otherwise it is a string of numbers (Instructor ID?)
#Submits Page, Grabs results and then launches a browser for test purposes.
browser.submit_selected()# Submits Form. Retrieves Results.
table = browser.get_current_page().find('table') #Finds Result Table
print(type(table))
rows = table.get_text().split('\n') # List of all Class Rows split by \n.
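The flat list produced by split('\n') can be regrouped in the same spirit as the chunk() helper in the Selenium version. This is only a sketch: the function name rows_from_flat_text and the sample values are made up, and it assumes every table row has the same number of non-empty cells (16 in the question, 3 here to keep the example short).

```python
def rows_from_flat_text(cells, width):
    """Drop blank entries, then group the rest into rows of `width` cells."""
    cleaned = [c.strip() for c in cells if c.strip()]
    return [cleaned[i:i + width] for i in range(0, len(cleaned), width)]


# Made-up text standing in for table.get_text().
flat = "\nHI 101\nIntro\n3\n\nHI 202\nEurope\n4\n".split("\n")
table_rows = rows_from_flat_text(flat, width=3)
print(table_rows)
```

Dropping blank entries will shift the columns if a cell is genuinely empty, so iterating over the tr and td tags of the table directly is likely more robust when the data has gaps.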

Python, Selenium, Pandas DataFrame and Excel

I am having trouble piecing together the last part of a puzzle. The entire code is shown below, which includes a non-essential username and password to a site where I am scraping data.
After looping through part numbers from an Excel file using
pd.read_excel()
Selenium is used to scrape various items of the website in question; the code then writes these values to the output window successfully.
As opposed to writing the data to an output window, I aim to write to the same Excel file I am pulling data from, writing it to the appropriate columns.
In the final for loop of the code, I initially tried to write the variables (which were printing to the screen) to Excel by appending
.to_excel('filePathHere')
to the variable in question. As an example, I attempted
description.to_excel('pathToFile/output.xlsx')
which yielded the error EOL while scanning string literal (<string>, line 1).
I then thought, maybe this variable needs to be converted to a DataFrame, so I then tried
description_DataFrame = pd.DataFrame(description)
description_DataFrame.to_excel('pathToFile/output.xlsx')
which resulted in the same error message.
I am not even sure if this is the correct logic to write each item to the existing (or new) file. If it is, I found an explanation on how to deal with long strings here: StackOverFlow EOL Error, but none of my data consists of long strings, so I can't see how that applies.
I then started to think I might need to create a dictionary and append to it.
So I then removed any attempts from above and tried:
description = []
description.append(mfg_part)
mfg_part.to_excel('pathToFile/output.xlsx')
which still gives me the same EOL error.
I am not too sure what is wrong, and why I can't write the variables mfg_part, mfg_OEM, description to their respective columns in the loaded Excel file.
Any hints / tips would be greatly appreciated.
The complete working code, printing to the screen, is as follows:
import time
#Need Selenium for interacting with web elements
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
import itertools
# load in manufacture part number from a collection of components, via an Excel file
mfg_id_list = pd.read_excel("C:/Users/James/Documents/Python Scripts/jupyterNoteBooks/ScrapingData/MasterQuoteTemplate.xls")['Model']
# Create a dictionary to store product and price
# While the below works just fine, we want to create an empty pandas dataframe, so we can output to Excel later
productInfo = {}
chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.get("https://www.tessco.com/login")
userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"
#Set a wait, for elements to load into the DOM
wait10 = WebDriverWait(driver, 10)
wait20 = WebDriverWait(driver, 20)
wait30 = WebDriverWait(driver, 30)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID")))
elem.send_keys(userName)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "password")))
elem.send_keys(password)
#Press the login button
driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
for i in mfg_id_list:
    #Expand the search bar
    searchBar = wait10.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#searchBar input")))
    #Enter information into the search bar
    #If cell is not blank
    if len(str(i)) != 0:
        searchBar.send_keys(Keys.CONTROL, 'a')
        searchBar.send_keys(i)
        driver.find_element_by_css_selector('a.inputButton').click()
        time.sleep(5)
        try:
            # wait for the products information to be loaded
            products = wait10.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
            #isProductsThere = driver.find_element_by_xpath("//div[@class='CoveoResult']")
            if products:
                # iterate through all products in the search result and add details to dictionary
                for product in products:
                    # get product info such as OEM, Description and Part Number
                    productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
                    mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
                    mfgName = product.find_element_by_tag_name("img").get_attribute("alt")
                    # There are multiple classes, some are "class sale" or else.
                    #We will locate by CSS
                    price = product.find_element_by_css_selector("div.price").text.split('\n')[1]
                    # add details to dictionary
                    productInfo[mfgPart, mfgName, productDescr] = price
                # prints the searched products information
                for (mfg_part, mfg_OEM, description), price in productInfo.items():
                    mfg_id = mfg_part.split(': ')[1]
                    if mfg_id == i:
                        #Here is where I would write to an Excel file
                        #And where I made attempts as described above
                        print('________________________________________________')
                        print('Part #:', mfg_id)
                        print('Company:', mfg_OEM)
                        print('Description:', description)
                        print('Price:', price)
                        print('________________________________________________')
                #time.sleep(5)
                #driver.close()
            else:
                mfg_id = "Not on Tessco"
                mfg_OEM = "Not on Tessco"
                description = "Not on Tessco"
                price = "Not on Tessco"
                #driver.close()
                print("Item was not found on Tessco.com")
        except Exception as e:
            print('________________________________________________')
            print(e)
            mfg_id = "Not on Tessco"
            mfg_OEM = "Not on Tessco"
            description = "Not on Tessco"
            price = "Not on Tessco"
            #driver.close()
            print("Item was not found on Tessco.com")
            print('________________________________________________')
driver.close()
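One way to approach the Excel step, offered as a sketch rather than a tested fix for this script: to_excel is a method of pandas DataFrame and Series objects, so plain strings such as description do not have it. A common pattern is to collect one dict per result inside the loop and build a single DataFrame at the end. The helper names add_record and write_records, the sample values and the output path are all hypothetical, and writing .xlsx files assumes an Excel engine such as openpyxl is installed. The "EOL while scanning string literal" message is a SyntaxError that usually points at an unterminated string literal (for example a path ending in a backslash before the closing quote), so the path string itself may be worth checking too.

```python
def add_record(records, part, oem, description, price):
    """Append one scraped result as a dict (one future spreadsheet row)."""
    records.append({"Part #": part, "Company": oem,
                    "Description": description, "Price": price})


def write_records(records, path):
    """Write all collected rows to an Excel file in one go."""
    import pandas as pd  # imported here so the rest runs without pandas
    pd.DataFrame(records).to_excel(path, index=False)


records = []
add_record(records, "ABC-1", "Acme", "Antenna", "$10.00")  # made-up values
add_record(records, "XYZ-9", "Not on Tessco", "Not on Tessco", "Not on Tessco")
print(len(records), "rows collected")
# write_records(records, "output.xlsx")  # hypothetical output path
```

In the scraping loop, add_record would take the place of the print calls, and write_records would be called once after the loop finishes.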

Web Scraping with Try: Except: in For Loop

I have written the code below attempting to practice web-scraping with Python, Pandas, etc. In general I have four steps I am trying to follow to achieve my desired output:
1. Get a list of names to append to a base url
2. Create a list of player specific urls
3. Use the player urls to scrape tables
4. Add the player name to the table I scraped, to keep track of which player belongs to which stats. In other words, add a column to each row of the table with the name of the player whose page was scraped.
I was able to get #1 and #2 working. The components of #3 seem to work, but I believe I have something wrong with my try/except, because if I run just the line of code that scrapes a specific playerUrl, the tables DataFrame populates as expected. The first player scraped has no data, so I believe my error catching is what fails.
For #4 I really haven't been able to find a solution. How do I add the name to the table as it is iterating in the for loop?
Any help is appreciated.
import requests
import pandas as pd
from bs4 import BeautifulSoup
### get the player data to create player specific urls
res = requests.get("https://www.mlssoccer.com/players?page=0")
soup = BeautifulSoup(res.content,'html.parser')
data = soup.find('div', class_ = 'item-list' )
names=[]
for player in data:
    name = data.find_all('div', class_ = 'name')
    for obj in name:
        names.append(obj.find('a').text.lower().lstrip().rstrip().replace(' ','-'))
### create a list of player specific urls
url = 'https://www.mlssoccer.com/players/'
playerUrl = []
x = 0
for name in (names):
    playerList = names
    newUrl = url + str(playerList[x])
    print("Gathering url..."+newUrl)
    playerUrl.append(newUrl)
    x +=1
### now take the list of urls and gather stats tables
tbls = []
i = 0
for url in (playerUrl):
    try: ### added the try, except, pass because some players have no stats table
        tables = pd.read_html(playerUrl[i], header = 0)[2]
        tbls.append(tables)
        i +=1
    except Exception:
        continue
There is a lot of redundancy in your script. You can clean it up as follows. I've used select() instead of find_all() to shake off the verbosity. To get rid of that IndexError, you can make use of the continue keyword as shown below:
import requests
import pandas as pd
from bs4 import BeautifulSoup
base_url = "https://www.mlssoccer.com/players?page=0"
url = 'https://www.mlssoccer.com/players/'
res = requests.get(base_url)
soup = BeautifulSoup(res.text,'lxml')
names = []
for player in soup.select('.item-list .name a'):
    names.append(player.get_text(strip=True).replace(" ","-"))
playerUrl = {}
for name in names:
    playerUrl[name] = f'{url}{name}'
tbls = []
for url in playerUrl.values():
    tables = pd.read_html(url, header=0)  # fetch and parse each page only once
    if len(tables) <= 2:  # skip players without a stats table
        continue
    tbls.append(tables[2])
print(tbls)
You can do a couple of things to improve your code and get steps #3 and #4 done.
(i) When using the for name in names loop, there is no need to index explicitly; just use the variable name.
(ii) You can save each player's name and its corresponding URL in a dict, where the name is the key. Then in steps 3 and 4 you can use that name.
(iii) Construct a DataFrame for each parsed HTML table and just append the player's name to it. Save this data frame individually.
(iv) Finally concatenate these data frames to form a single one.
Here is your code modified with above suggested changes:
import requests
import pandas as pd
from bs4 import BeautifulSoup
### get the player data to create player specific urls
res = requests.get("https://www.mlssoccer.com/players?page=0")
soup = BeautifulSoup(res.content,'html.parser')
data = soup.find('div', class_ = 'item-list' )
names=[]
for player in data:
    name = data.find_all('div', class_ = 'name')
    for obj in name:
        names.append(obj.find('a').text.lower().lstrip().rstrip().replace(' ','-'))
### create a list of player specific urls
url = 'https://www.mlssoccer.com/players/'
playerUrl = {}
x = 0
for name in names:
    newUrl = url + str(name)
    print("Gathering url..."+newUrl)
    playerUrl[name] = newUrl
### now take the list of urls and gather stats tables
tbls = []
for name, url in playerUrl.items():
    try:
        tables = pd.read_html(url, header = 0)[2]
        df = pd.DataFrame(tables)
        df['Player'] = name
        tbls.append(df)
    except Exception as e:
        print(e)
        continue
result = pd.concat(tbls)
print(result.head())

Resources