I am running a scraper that gets data from Nowgoal. It was running fine until this morning; I ran it again without any changes to the program, and it showed me the error "driver is not defined".
However, I have already defined it as follows:
options = webdriver.ChromeOptions() #initialise webdriver options
driver = webdriver.Chrome(resource_path('./drivers/chromedriver.exe'),options=options)
I am not sure what exactly the problem is; the error points to the last line of the program, where I quit the driver as follows:
driver.quit()
It happened a few times before: I closed all the IDLE windows, opened them again, and it worked. But now, no matter what I do, I get the same error.
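I suspect it is something along these lines, a minimal sketch of how the name can end up unbound (not my actual code; the driver creation is simply skipped here to illustrate):

from selenium import webdriver

urls = []  # suppose nothing was read from Link_output.xlsx this run
for url in urls:
    driver = webdriver.Chrome()  # never executed, so the name 'driver' is never bound
driver.quit()  # NameError: name 'driver' is not defined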
Below is the detailed code; the Link_output file supplies the main URLs.
#######Main script starts here - open the file where the URLs are stored - make sure Link_output is in the same folder as the running script######
read_workbook = load_workbook(filename="Link_output.xlsx")
read_sheet = read_workbook.active
for row in range(2,read_sheet.max_row+1):
for column in "BDG": #Here you can add or reduce the columns
cell_name = "{}{}".format(column, row)
main_urls.append(read_sheet[cell_name].value)
#we have URL ready
print('Urls we are going to scrape : ' ,main_urls)
#filter out the dictionary based on bookmakers entered in config file - we will have the bookmakers available in my_dictionary_button
wanted_bookmaker_lst = bkmaker_param.split(',')
for maker in wanted_bookmaker_lst:
for k,v in main_dictionary_button.items():
if k.startswith(maker):my_dictionary_button[k]=v
#now loop through each URL
for file_key ,main_url in enumerate(main_urls):
#start the new workbook for each new URL
workbook = Workbook()
#Error flag clear - first time action flag also cleared
i=0
error_flag =0
file_key += 1
#Main url -print here-every third value is the link
if file_key % 3 == 0:
print(main_url)
# first we will enter into main_url - urls generally open with Crown tab - so we will click through each button of the bookmakers
for bookmaker ,odds_url_button in my_dictionary_button.items():
if i == 0 and error_flag == 0 :#first time action
#start the driver for the first time
driver = webdriver.Chrome(resource_path('./drivers/chromedriver.exe'),options=options)
#driver = webdriver.Chrome(executable_path = driver_path,options=options )
try:
driver.get(main_url) #Get the main url
except TimeoutException:
driver.get(main_url) #in case of timeout error - try again
time.sleep(5)
try:
driver.find_element_by_xpath(odds_url_button).click() #click on the first bookmaker button
driver.switch_to.window(driver.window_handles[0]) #in case any pop-up opens for any reason - switch to main window
except NoSuchElementException: #In case button is not found
print('Button not found')
lst_of_reattempt.append(driver.current_url) #Get the current url for which we were not able to find the button
saved_button_for_halftime = odds_url_button #save the button for later reattempt
driver.quit()
i+=1 #First time actions are over
error_flag = 1 #set the error flag
continue
i+=1
elif error_flag == 1: #if previous one went into error
if odds_url_button == '//*[@id="htBtn"]/a': #In case the error happened while clicking on half time button
half_time = 1
revised_url = get_url_from_button(saved_button_for_halftime,main_url,half_time)# Get the revised url
userAgent = ua.random #change user agent every time the browser goes into error
options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome(resource_path('./drivers/chromedriver.exe'),options=options) #trigger driver
#driver = webdriver.Chrome(executable_path = driver_path,options=options )
try:
driver.get(revised_url) #Revised URL open
time.sleep(5)
except TimeoutException:
driver.get(revised_url) #In case of timeout- reattempt
time.sleep(5)
error_flag = 0 #disable error flag - so we can proceed as usual
else:
revised_url = get_url_from_button(odds_url_button,main_url)
userAgent = ua.random
options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome(resource_path('./drivers/chromedriver.exe'),options=options)
#driver = webdriver.Chrome(executable_path = driver_path,options=options )
try:
driver.get(revised_url)
except TimeoutException:
driver.get(revised_url)
error_flag = 0
else: #In case of no error
driver.find_element_by_xpath(odds_url_button).click()#Click on next button
driver.switch_to.window(driver.window_handles[0]) #in case any pop-up opens for any reason - switch to main window
i+=1
time.sleep(random.randint(5,7)) #sleep for random amount of time - to make the script robust
htmlSource = driver.page_source #Get the html code
soup = bs4.BeautifulSoup(htmlSource,'html.parser') #parse the page
#get the fixed data which is common and does not change for one bookmaker
title, home_team , away_team , place, weather ,tournament,m_date,m_time,data_first,data_second,data_third,final_score = get_fixed_data(soup)
#home team ranking
home_team_ranking = main_urls[file_key-3]
away_team_ranking = main_urls[file_key-2]
print('Title data :',title)
if title != 'No Data':#check if the data found or not
#create the folder path
print(m_date)
folder_month ,folder_day ,folder_year = m_date.split('-') #/
folder_hour ,folder_minute = m_time.split(':')
#fle_name = folder_day +folder_month + folder_year
#folder_name = folder_day +'_'+folder_month+'_' + folder_year
#convert the time to gmt
folder_time_string = folder_year +'-'+folder_month +'-'+folder_day +' '+ folder_hour+':'+folder_minute+':00'
#folder name change
folder_name =time.strftime("%d-%m-%Y", time.gmtime(time.mktime(time.strptime(folder_time_string, "%Y-%d-%m %H:%M:%S"))))
print(bookmaker)
#Output_file_format
try:
print('Creating directory')
os.mkdir(os.path.join(os.getcwd()+'\\'+folder_name))
except FileExistsError:
print('Directory already exists')
inter_file_name = 'Odds_nowgoal_'+str(title.replace('v/s','vs'))+'_'+folder_name+'.xlsx'
ola = os.path.join('\\'+folder_name,inter_file_name)
output_file_name = os.path.join(os.getcwd()+ola)
#sheet_title_first_table
sheet_title = '1X2 Odds_'+bookmaker
#add data to excel
excel_add_table(sheet_title,data_first,title,home_team , away_team , place, weather ,tournament,m_date,m_time,bookmaker,home_team_ranking,away_team_ranking,final_score)
#sheet_title_second_table
sheet_title = 'Handicap Odds_'+bookmaker
#add data to excel
excel_add_table(sheet_title,data_second,title,home_team , away_team , place, weather ,tournament,m_date,m_time,bookmaker,home_team_ranking,away_team_ranking,final_score)
#sheet_title_third_table
sheet_title = 'Over_Under Odds_'+bookmaker
#add data to excel
excel_add_table(sheet_title,data_third,title,home_team , away_team , place, weather ,tournament,m_date,m_time,bookmaker,home_team_ranking,away_team_ranking,final_score)
else :
lst_of_reattempt.append(home_team_ranking)
lst_of_reattempt.append(away_team_ranking)
lst_of_reattempt.append(driver.current_url) #add the url into list of reattempt
saved_button_for_halftime = odds_url_button #save the button when error happens - so we can convert it into URL and later reattempt
error_flag = 1
driver.quit() #Quit the driver in case of any error
driver.quit()
So this is part of the scraping code, but it does not loop past the first page. Please help me loop through all 250 pages of the Etsy e-commerce website.
URL = f'https://www.etsy.com/in-en/c/jewelry/earrings/ear-jackets-and-climbers?ref=pagination&page={page}'
try:
#Count for every page of website
URL = URL.format(page)
browser.get(URL)
print("Scraping Page:",page)
#xpath of product table
PATH_1 ='//*[@id="content"]/div/div[1]/div/div[3]/div[2]/div[2]/div[9]/div/div/div'
#getting total items
items = browser.find_element(By.XPATH, PATH_1)
items = items.find_elements(By.TAG_NAME, 'li' )
#available items in page
end_product = len(items)
#Count for every product of the page
for product in range(0,end_product):
print("Scarping reviews for product", product +1)
#clicking on product
try:
items[product].find_element(By.TAG_NAME, 'a').click()
except:
print('Product link not found')
#switch the focus of driver to new tab
windows = browser.window_handles
browser.switch_to.window(windows[1])
try:
PATH_2 = '//*[@id="reviews"]/div[2]/div[2]'
count = browser.find_element(By.XPATH, PATH_2)
#Number of review on any page
count = count.find_elements(By.CLASS_NAME, 'wt-grid wt-grid--block wt-mb-xs-0')
for r1 in range(1,len(count)+1):
dat1 = browser.find_element(By.XPATH ,
'//*[#id="reviews"]/div[2]/div[2]/div[1]/div[1]/p'.format(
r1)).text
if dat1[:dat1.find(',')-6] not in person:
try:
person.append(dat1[:dat1.find(',')-6])
date.append(dat1[dat1.find(',')-6:])
except Exception:
person.append("Not Found")
date.append("Not Found")
try:
stars.append(browser.find_element(By.XPATH ,
'//*[#id="reviews"]/div[2]/div[2]/div[1]/div[2]/div[1]/div/div/span/span[2]'.format(
r1)).text[0])
except Exception:
stars.append("No stars")
except Exception:
browser.close()
#switching focus to main tab
browser.switch_to.window(windows[0])
#export data after every product
#export_data()
except Exception as e_1:
print(e_1)
print("Program stoped:")
export_data()
browser.quit()
#defining the main function
def main():
logging.basicConfig(filename='solution_etsy.log', level=logging.INFO)
logging.info('Started')
if 'page.txt' in os.listdir(os.getcwd()):
with open('page.txt','r') as file1:
page = int(file1.read())
for i in range(1 ,250):
run_scraper(i,browser)
else:
for i in range(1,250):
with open('page.txt','w') as file:
file.write(str(i))
run_scraper(i,browser)
export_data()
print("--- %s seconds ---" % (time.time() - start_time))
logging.info('Finished')
# Calling the main function
if __name__ == '__main__':
main()
So in this code, please help me loop from one page to another: where do I apply the loop?
stud = 'https://www.etsy.com/in-en/c/jewelry/earrings/ear-jackets-and-climbers?ref=pagination&page={}'
from time import sleep
from tqdm.notebook import tqdm
for i in tqdm(range(1, 250)):
url_pages = stud.format(i)
browser.get(url_pages)
sleep(4) ## a sleep of 4 pauses the code for 4 seconds so the entire page is loaded; adjust it according to your internet speed.
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')
### function or anything else that you want to apply from here
Then follow your steps as you wish; this will load all the pages.
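If a fixed sleep proves unreliable, an explicit wait is a common alternative; a rough sketch (the locator below is only a placeholder, swap in whatever element you actually scrape):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the listing grid to appear instead of sleeping a fixed time.
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'div.wt-grid')))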
If it is still not working, look at how I have scraped data from multiple pages of a similar site. GitHub link to my solution: https://github.com/PullarwarOm/devtomanager.com-Web-Scraping/blob/main/devtomanager%20Final.ipynb
The scraped page is similar to the page you are trying to scrape, so go through this ipynb file. Thank you.
I want to scrape data from the following website:
http://b2b.godrejinterio.com/GodrejInterio/dealer.aspx?id=29&menuid=2458&business=2
Here, the data is generated dynamically on the same page itself, without any change in the URL.
Only after you pick an option from the 1st dropdown menu does the 2nd dropdown become active and allow you to select an option, and so on for the 3rd and 4th dropdown menus.
After selecting from all the dropdown menus, you have to click on the search button; only then is the data generated on the same page.
I need to scrape data for all possible selections in one go. Below is the code I tried, but it won't work as desired. I am using Python with BeautifulSoup and Selenium. Help me with this!!
Mike67, I have used your suggestion and improved the code, but I am still unable to iterate over the options and save the data to a DataFrame. Help me with this!!
Code :
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome("C:/Users/Downloads/chromedriver")
rec=[]
driver.get("http://b2b.godrejinterio.com/GodrejInterio/dealer.aspx?id=29&menuid=2458&business=2")
# wait=WebDriverWait(driver,10)
time.sleep(2)
s1 = Select(driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlRange"))
s1.select_by_value("Institutional Furniture")
# print(s1.options[0].text)
time.sleep(2)
# wait.until(EC.presence_of_all_element_located((By.ID,"ucInterioDealerLocatorNewRight_ddlRange")))
s22 = driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlSubRange")
s2 = Select(driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlSubRange"))
all_options1 = s22.find_elements_by_tag_name("option")
for option1 in all_options1:
option1=option1.get_attribute("value")
print(option1)
if(option1=='0'):
continue
else:
s2.select_by_value(option1)
time.sleep(10)
# wait.until(EC.presence_of_all_element_located((By.ID,"ucInterioDealerLocatorNewRight_ddlSubRange")))
s33 = driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlState")
s3 = Select(driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlState"))
all_options2 = s33.find_elements_by_tag_name("option")
for option2 in all_options2:
option2=option2.get_attribute("value")
print(option2)
s3.select_by_value(option2)
# print(s3.options[1].text)
time.sleep(10)
# wait.until(EC.presence_of_all_elements_located((By.ID,"ucInterioDealerLocatorNewRight_ddlState")))
s44 = driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlCity")
s4 = Select(driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlCity"))
all_options3 = s44.find_elements_by_tag_name("option")
for option3 in all_options3:
option3=option3.get_attribute("value")
print(option3)
if(option3=='0'):
continue
else:
s4.select_by_value(option3)
# print(s4.options[1].text)
time.sleep(10)
# wait.until(EC.presence_of_all_elements_located((By.ID,"ucInterioDealerLocatorNewRight_ddlCity")))
s55 = driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlArea")
s5 = Select(driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlArea"))
all_options4 = s55.find_elements_by_tag_name("option")
for option4 in all_options4:
option4=option4.get_attribute("value")
print(option4)
if(option4=='0'):
continue
else:
s5.select_by_value(option4)
# print(s4.options[1].text)
time.sleep(10)
s6=driver.find_element_by_id("ucInterioDealerLocatorNewRight_imgBtnSearch").click()
# for i in s6.find_all('div')
# print(type(s6))
# print(s4.content)
time.sleep(10)
# wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME,"dealer_search_maindiv")))
# r1 = driver.find_element_by_class_name("dealer_search_maindiv")
html=driver.page_source
# print(html)
soup=BeautifulSoup(html,'html5lib')
try:
cl=soup.find('div',attrs={'class':'dealer_search_maindiv'})
for i in range(0,10):
i=str(i)
idd= f"ucInterioDealerLocatorNewRight_dlDealer_ctl0{i}_tblDealer"
kwargs={'id': 'idd' }
kwargs['id'] = idd
d1=cl.find('table', kwargs)
data=";"
d2 = d1.find('table')
for d3 in d2.find_all('tr'):
j=d3.find('td').text
print(j)
data = data + j + ';'
print(data)
rec.append(data)
except:
print("no record for this selection")
continue
print("state done")
print("all subrange completed")
print(len(rec))
df=pd.DataFrame({'Record':rec})
driver.close()
If you call time.sleep in between each dropdown change, the page works:
driver.get("http://b2b.godrejinterio.com/GodrejInterio/dealer.aspx?id=29&menuid=2458&business=2")
time.sleep(2)
s1 = Select(driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlRange"))
s1.select_by_value("Institutional Furniture")
print(s1.options[0].text)
time.sleep(2)
s2 = Select(driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlSubRange"))
s2.select_by_value("Desking")
time.sleep(2)
s3 = Select(driver.find_element_by_id("ucInterioDealerLocatorNewRight_ddlState"))
s3.select_by_value("Delhi")
print(s3.options[0].text)
driver.find_element_by_id("ucInterioDealerLocatorNewRight_imgBtnSearch").click()
I have a Python 3 script that needs to make thousands of requests to multiple different websites and check whether their source code passes some predefined rules.
I am using Selenium to make the requests because I need the source code after the JS finishes executing, but due to the high number of URLs I need to check, I am trying to run multiple threads simultaneously. Each thread creates and maintains an instance of webdriver to make the requests. The problem is that after a while all threads go silent and simply stop executing, leaving just a single thread doing all the work. Here is the relevant part of my code:
def get_browser(use_firefox = True):
if use_firefox:
options = FirefoxOptions()
options.headless = True
browser = webdriver.Firefox(options = options)
browser.implicitly_wait(4)
return browser
else:
chrome_options = ChromeOptions()
chrome_options.add_argument("--headless")
browser = webdriver.Chrome(chrome_options=chrome_options)
browser.implicitly_wait(4)
return browser
def start_validations(urls, rules, results, thread_id):
try:
log("thread %s started" % thread_id, thread_id)
browser = get_browser(thread_id % 2 == 1)
while not urls.empty():
url = "http://%s" % urls.get()
try:
log("starting %s" % url, thread_id)
browser.get(url)
time.sleep(0.5)
WebDriverWait(browser, 6).until(selenium_wait_reload(4))
html = browser.page_source
result = check_url(html, rules)
original_domain = url.split("://")[1].split("/")[0].replace("www.","")
tested_domain = browser.current_url.split("://")[1].split("/")[0].replace("www.","")
redirected_url = "" if tested_domain == original_domain else browser.current_url
results.append({"Category":result, "URL":url, "Redirected":redirected_url})
log("finished %s" % url, thread_id)
except Exception as e:
log("couldn't test url %s" % url, thread_id )
log(str(e), thread_id)
results.append({"Category":"Connection Error", "URL":url, "Redirected":""})
browser.quit()
time.sleep(2)
browser = get_browser(thread_id % 2 == 1)
except Exception as e:
log(str(e), thread_id)
finally:
log("closing thread", thread_id)
browser.quit()
def calculate_progress(urls):
progress_folder ="%sprogress/" % WEBROOT
if not os.path.exists(progress_folder):
os.makedirs(progress_folder)
initial_size = urls.qsize()
while not urls.empty():
current_size = urls.qsize()
on_queue = initial_size - current_size
progress = '{0:.0f}'.format((on_queue / initial_size * 100))
for progress_file in os.listdir(progress_folder):
file_path = os.path.join(progress_folder, progress_file)
if os.path.isfile(file_path) and not file_path.endswith(".csv"):
os.unlink(file_path)
os.mknod("%s%s" % (progress_folder, progress))
time.sleep(1)
if __name__ == '__main__':
while True:
try:
log("scraper started")
if os.path.isfile(OUTPUT_FILE):
os.unlink(OUTPUT_FILE)
manager = Manager()
rules = fetch_rules()
urls = manager.Queue()
fetch_urls()
results = manager.list()
jobs = []
p = Process(target=calculate_progress, args=(urls,))
jobs.append(p)
p.start()
for i in range(THREAD_POOL_SIZE):
log("spawning thread with id %s" % i)
p = Process(target=start_validations, args=(urls, rules, results, i))
jobs.append(p)
p.start()
time.sleep(2)
for j in jobs:
j.join()
save_results(results, OUTPUT_FILE)
log("scraper finished")
except Exception as e:
log(str(e))
As you can see, at first I thought I could only have one instance of each browser, so I tried to run at least Firefox and Chrome in parallel, but this still leaves only one thread doing all the work.
Sometimes the driver crashed and the thread stopped working even though it is inside a try/except block, so I started getting a new instance of the browser every time this happens, but it still didn't work. I also tried waiting a few seconds between creating each instance of the driver, still with no results.
here is a pastebin of one of the log files:
https://pastebin.com/TsjZdRYf
A strange thing I noticed is that almost every time, the only thread that keeps running is the last one spawned (with id 3).
Thanks for your time and your help!
EDIT:
[1] Here is the full code: https://pastebin.com/fvVPwPVb
[2] custom selenium wait condition: https://pastebin.com/Zi7nbNFk
Am I allowed to curse on SO? I solved the problem, and I don't think this answer should exist on SO because nobody else will benefit from it. The problem was a custom wait condition that I had created. This class is in the pastebin that was added in edit 2, but I'll also add it here for convenience:
import time
class selenium_wait_reload:
def __init__(self, desired_repeating_sources):
self.desired_repeating_sources = desired_repeating_sources
self.repeated_pages = 0
self.previous_source = None
def __call__(self, driver):
while True:
current_source = driver.page_source
if current_source == self.previous_source:
self.repeated_pages = self.repeated_pages +1
if self.repeated_pages >= self.desired_repeating_sources:
return True
else:
self.previous_source = current_source
self.repeated_pages = 0
time.sleep(0.3)
The goal of this class was to make Selenium wait, because the JS could still be loading additional DOM.
So this class makes Selenium wait a short time and check the source code, wait a little again and check it again, repeating until the source code is the same 3 times in a row.
The problem is that some pages have a JS carousel, so the source code is never the same. I thought that in cases like this the WebDriverWait timeout (its second parameter) would make it fail with a TimeoutException. I was wrong: because __call__ loops internally with while True and never returns, WebDriverWait never gets a chance to enforce its timeout.
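For anyone who lands here with the same symptom, this is one way the condition could be restructured so the timeout actually applies (my reworking, not necessarily the original fix): check the source once per call and return True or False, letting WebDriverWait do the polling.

class SeleniumWaitReload:
    def __init__(self, desired_repeating_sources):
        self.desired_repeating_sources = desired_repeating_sources
        self.repeated_pages = 0
        self.previous_source = None

    def __call__(self, driver):
        # One check per poll: WebDriverWait keeps calling this until it returns True
        # or the timeout expires, so a never-settling carousel now raises TimeoutException.
        current_source = driver.page_source
        if current_source == self.previous_source:
            self.repeated_pages += 1
        else:
            self.previous_source = current_source
            self.repeated_pages = 0
        return self.repeated_pages >= self.desired_repeating_sources

# Usage: WebDriverWait(browser, 6, poll_frequency=0.3).until(SeleniumWaitReload(3))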
I am trying to grab a text element from a page. To get to this element my script clicks on two filters on the page. I need to crawl 5,000 pages. The script works in terms of collecting the text element; however, after a certain number of pages it always returns the message "element not visible". I am assuming it's because the page didn't load in time, since I checked the pages where it breaks and the text element is there. (I already have time.sleep(3) after every click.) What can I use in my script to just skip a page if it doesn't load in time?
def yelp_scraper(url):
driver.get(url)
# get total number of restaurants
total_rest_loc = '//span[contains(text(),"Showing 1")]'
total_rest_raw = driver.find_element_by_xpath(total_rest_loc).text
total_rest = int(re.sub(r'Showing 1.*of\s','',total_rest_raw))
button1 = driver.find_element_by_xpath('//span[#class="filter-label filters-toggle js-all-filters-toggle show-tooltip"]')
button1.click()
time.sleep(1)
button2 = driver.find_element_by_xpath('//span[contains(text(),"Walking (1 mi.)")]')
button2.click()
time.sleep(2)
rest_num_loc = '//span[contains(text(),"Showing 1")]'
rest_num_raw = driver.find_element_by_xpath(rest_num_loc).text
rest_num = int(re.sub(r'Showing 1.*of\s','',rest_num_raw))
if total_rest==rest_num:
button3 = driver.find_element_by_xpath('//span[contains(text(),"Biking (2 mi.)")]')
button3.click()
time.sleep(2)
button4 = driver.find_element_by_xpath('//span[contains(text(),"Walking (1 mi.)")]')
button4.click()
time.sleep(2)
rest_num_loc = '//span[contains(text(),"Showing 1")]'
rest_num_raw = driver.find_element_by_xpath(rest_num_loc).text
rest_num = int(re.sub(r'Showing 1.*of\s','',rest_num_raw))
return(rest_num)
chromedriver = "/Applications/chromedriver" # path to the chromedriver executable
os.environ["webdriver.chrome.driver"] = chromedriver
chrome_options = Options()
# add headless mode
chrome_options.add_argument("--headless")
# turn off image loading
prefs = {"profile.managed_default_content_settings.images":2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(chromedriver, chrome_options=chrome_options)
for url in url_list:
yelp_data[url] = yelp_scraper(url)
json.dump(yelp_data, open('../data/yelp_json/yelp_data.json', 'w'), indent="\t")
driver.close()
EXAMPLE:
from selenium.common.exceptions import NoSuchElementException
for item in driver.find_elements_by_class_name('item'):
try:
model = item.find_element_by_class_name('product-model')
price = item.find_element_by_class_name('product-display-price')
title = item.find_element_by_class_name('product-title')
url = item.find_element_by_class_name('js-detail-link')
items.append({'model': model, 'price': price, 'title': title, 'url': url})
print (model.text, price.text, title.text, url.get_attribute("href"))
c = (model.text, price.text, title.text, url.get_attribute("href"))
a.writerow(c)
except NoSuchElementException:
#here you can do what you want to do when an element is not found. Then it'll continue with the next one.
b.close()
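Applied to the question above, the same try/except idea can be combined with an explicit wait so a slow page is skipped instead of failing with "element not visible"; a sketch (the XPath comes from the question, the 10-second timeout is arbitrary):

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_rest_count(driver, url, timeout=10):
    driver.get(url)
    try:
        # Wait until the "Showing 1 ..." counter is visible, up to `timeout` seconds.
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located(
                (By.XPATH, '//span[contains(text(),"Showing 1")]')))
        return element.text
    except TimeoutException:
        return None  # page too slow to load: skip it and move on to the next URL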
I thought this would be easy, but here I am!!
I want to call a procedure when the button is pressed and display the result on a label.
class DSFRSapp(App):
def build(self):
self.root = FloatLayout()
i = Image(source='DSFRSLogo.png',
allow_stretch=True,
pos_hint = ({'center_x':0.5, 'y': .25}))
spinner = Spinner(
text='Pick a Station',
values=('Appledore','Axminster','Bampton','Barnstaple','Bere Alston','Bideford','Bovey Tracey','Braunton','Bridgwater','Brixham','Buckfastleigh','Budleigh Salterton','Burnham on sea','Camels Head','Castle Cary','Chagford','Chard','Cheddar','Chulmleigh','Colyton','Combe Martin','Crediton','Crewkerne','Crownhill','Cullompton','Dartmouth','Dawlish','Exeter Danes Castle','Exeter Middlemoor','Exmouth','Frome','Glastonbury','Greenbank','Hartland','Hatherleigh','Holsworthy','Honiton','Ilfracombe','Ilminster','Ivybridge','Kingsbridge','Kingston','Lundy Island','Lynton','Martock','Minehead','Modbury','Moretonhampstead','Nether Stowey','Newton Abbot','North Tawton','Okehampton','Ottery St Mary','Paignton','Plympton','Plymstock','Porlock','Princetown','Salcombe','Seaton','Shepton Mallet','Sidmouth','Somerton','South Molton','Street','Taunton','Tavistock','Teignmouth','Tiverton','Topsham','Torquay','Torrington','Totnes','USAR','Wellington','Wells','Williton','Wincanton','Witheridge','Wiveliscombe','Woolacombe','Yelverton','Yeovil'),
size_hint=(None, None),
size=(150, 44),
pos_hint = ({'center_x':0.5, 'y': 0.35}))
b = Button(text="Search For Incidents",size_hint=(None, None),
pos_hint =({'center_x':0.5, 'y': 0.25}),
size=(150, 44))
LblRes = Label(text="Results will display here",
pos_hint =({'center_x':0.5, 'y': 0.15}),
size_hint=(600,100),color=(1,1,1,1),font_size=35)
b.bind(on_press=FindIncident(Spinner.text))
self.root.add_widget(spinner)
self.root.add_widget(LblRes)
self.root.add_widget(i)
self.root.add_widget(b)
return
def FindIncident( sStation ):
webpage = request.urlopen("http://www.dsfire.gov.uk/News/Newsdesk/IncidentsPast7days.cfm?siteCategoryId=3&T1ID=26&T2ID=35")#main page
soup = BeautifulSoup(webpage)
incidents = soup.find(id="CollapsiblePanel1") #gets todays incidents panel
Links = [] #create list call Links
for line in incidents.find_all('a'): #get all hyperlinks
Links.append("http://www.dsfire.gov.uk/News/Newsdesk/"+line.get('href')) #loads links into Links list while making them full links
n = 0
e = len(Links)
if e == n: #if no links available no need to continue
print("No Incidents Found Please Try Later")
sys.exit(0)
sFound = False
while n < e: #loop through links to find station
if sFound: #if the station has been found stop looking
sys.exit(0)
webpage = request.urlopen(Links[n]) #opens link in list)
soup = BeautifulSoup(webpage) #loads webpage
if soup.find_all('p', text=re.compile(r'{}'.format(sStation))) == []:#check if returned value is found
#do nothing leaving blank gave error
a = "1" #this is pointless but stops the error
else:
print(soup.find_all('p', text=re.compile(r'{}'.format(sStation)))) #output result
WebLink = Links[n]
sFound = True # to avoid un needed goes through the loop process
n=n+1 # moves counter to next in list
if not sFound: #after looping process if nothing has been found output nothing found
print("nothing found please try again later")
return;
if __name__ =="__main__":
DSFRSapp().run()
So when the button is pressed, it should call FindIncident using the spinner text as the string. That way it can search for the station, and instead of print I will put the results in the label, possibly with a link to the website too.
Any help would be great!
Raif
b.bind(on_press=FindIncident(Spinner.text))
You need to pass a function as the argument to bind. You aren't passing FindIncident, you're calling it and passing the result...which is None.
Try
from functools import partial
b.bind(on_press=partial(FindIncident, spinner.text))
Also declare FindIncident as def FindIncident( sStation, *args ):, as bind will automatically pass extra arguments. You could also do the same thing with a lambda function.
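For instance, the lambda version could look like this (a sketch; Kivy passes the pressed button instance as the first argument, which the lambda simply ignores):

b.bind(on_press=lambda instance: FindIncident(spinner.text))

A small difference worth noting: the lambda reads spinner.text at press time, whereas partial captures whatever the value was when bind was called.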
Be careful with your code cases as well - at the moment you use Spinner.text, when you probably mean spinner.text, as you created a variable spinner.