Scraping stocks - output in tkinter - python-3.x

So here is my problem. I am trying to output my scraping results in a GUI using tkinter in Python. The code I use works in the shell, but when I use it with tkinter it doesn't. Here is my code:
import sys
from tkinter import *
from urllib.request import urlopen
import re
def stockSearch():
    searchTerm = userInput.get()
    url = ""+searchTerm+"&q1=1"
    htmlfile = urlopen(url)
    htmltext = str(
    regex = '<span id="yfs_l84_'+searchTerm+'">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern, htmltext)
    outputStock = str(["The price of ", searchTerm, "is ", price])
    sLabel2 = Label(sGui, text=outputStock).pack()
sGui = Tk()
userInput = StringVar()
sLabel = Label(sGui, text="Stocks List", fg="black")
sButton = Button(sGui, text="LookUp", command = stockSearch), y=400)
uEntry = Entry(sGui, textvariable=userInput).pack()
If I search for Google (GOOG), for example, I get this:
"The price of GOOG is []"
However, if I use the same code but print the result in a shell instead of using tkinter, I get the price as expected.
It appears your code isn't properly handling case. If you search for "goog" the value shows up. The problem is this line:
regex = '<span id="yfs_l84_'+searchTerm+'">(.+?)</span>'
If you type "GOOG", the regex becomes:
<span id="yfs_l84_GOOG">(.+?)</span>
However, the HTML that is returned doesn't contain that pattern, because the id in the page is lowercase. Doing a case-insensitive search should solve that problem:
pattern = re.compile(regex, flags=re.IGNORECASE)
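To see the effect of the flag without hitting the network, here is a minimal sketch. The `find_price` helper and the sample HTML string are made up for illustration; only the regex and the `re.IGNORECASE` flag come from the code above.

```python
import re

def find_price(html_text, symbol):
    """Return every price found for the symbol, ignoring case in the span id."""
    # re.escape guards against symbols that contain regex metacharacters.
    regex = '<span id="yfs_l84_' + re.escape(symbol) + '">(.+?)</span>'
    pattern = re.compile(regex, flags=re.IGNORECASE)
    return pattern.findall(html_text)

# Made-up fragment: the id is lowercase, as in the page described above.
sample_html = '<div><span id="yfs_l84_goog">1,234.56</span></div>'

print(find_price(sample_html, "GOOG"))  # matches despite the uppercase input
print(find_price(sample_html, "goog"))
```

Lowercasing the user's input before building the regex would probably work too, but the flag keeps working if the page ever mixes cases.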
Also, there's no need to create a new Label every time -- you can create the label once and then change the text each time you do a lookup.
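A rough sketch of the "create the label once" idea follows. The names `format_result` and `build_gui`, and the `lookup` callable, are hypothetical and not part of the original code. `lookup` stands in for whatever function returns the list of matched prices for a symbol.

```python
def format_result(symbol, prices):
    """Build the text shown in the label from the list of regex matches."""
    if not prices:
        return "No price found for " + symbol
    return "The price of " + symbol + " is " + prices[0]

def build_gui(lookup):
    """Create the window once; each lookup only changes the label's text."""
    # Imported here so format_result can be used without a display.
    import tkinter as tk

    root = tk.Tk()
    user_input = tk.StringVar()
    result_text = tk.StringVar(value="Enter a ticker symbol")

    def on_lookup():
        symbol = user_input.get().strip()
        # Updating the StringVar updates the existing label in place.
        result_text.set(format_result(symbol, lookup(symbol)))

    tk.Entry(root, textvariable=user_input).pack()
    tk.Button(root, text="LookUp", command=on_lookup).pack()
    tk.Label(root, textvariable=result_text).pack()
    return root
```

To try it with a stubbed lookup, something like `build_gui(lambda symbol: ["123.45"]).mainloop()` should open the window. Building the message as a plain string also avoids the bracketed list output that `str([...])` produces.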


Switching to pop-up window that cannot be found via window_handles Selenium/Python

I dug up my old code, which I used for scraping Scopus. I wrote it while I was learning to program. Now a window pops up on the Scopus site that I can't detect using window_handles.
import openpyxl
from selenium import webdriver
from import By
from import Select
import time
import pandas as pd
from openpyxl import load_workbook
DOI = []
TITLE = []
YEAR = []
DATES = []
chrome_driver_path = "C:\Development\chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)
# searching details
search = input("Search documents: \n")
date = input("Do you want to specify dates?(Yes/No)")
if date.capitalize() == "Yes":
driver.find_element(By.CLASS_NAME, 'flex-grow-1').send_keys(search)
starting_date = input("Put starting year.")
to_date = input("Put end date.")
drop_menu_from = Select(driver.find_element(By.XPATH,
drop_menu_to = Select(driver.find_element(By.XPATH,
DATES = ["XXX", "YYY"]
driver.find_element(By.CLASS_NAME, 'flex-grow-1').send_keys(search)
doc_num = int(driver.find_element(By.XPATH,
",", ""))
driver.find_element((By.XPATH, "/html/body/div[11]/div[2]/div[1]/div[4]/div/div[2]/button")).click()
This is what the beginning of the code looks like. The last statement,
driver.find_element((By.XPATH, "/html/body/div[11]/div[2]/div[1]/div[4]/div/div[2]/button")).click()
should find and click the Dismiss button, but I do not know how to handle it.
I have tried locating the element with driver.find_element and checking whether the pop-up window can be detected and handled via window_handles.
Actually, that is not a popup, because its code is contained in the HTML of the page itself. Popups are either browser prompts (not contained in the HTML) or other browser windows (which have their own separate HTML).
I suggest targeting the button by the text it contains. In this case we look for a button whose text is exactly "Dismiss":
driver.find_element(By.XPATH, '//button[text()="Dismiss"]').click()
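The idea of matching a button by its text can be tried offline. The sketch below does not use Selenium: it swaps in the standard library's ElementTree, which supports only a subset of XPath, so the predicate is written `[.='Dismiss']` instead of `[text()="Dismiss"]`. The page fragment is made up and only illustrates that the overlay lives in the same document as the rest of the page.

```python
import xml.etree.ElementTree as ET

# Made-up page fragment: the "pop-up" is just another <div> in the same document.
page = """
<html><body>
  <div id="results"><button>Search</button></div>
  <div id="overlay"><p>Welcome back</p><button>Dismiss</button></div>
</body></html>
"""

root = ET.fromstring(page)

# Find every <button>, at any depth, whose text is exactly "Dismiss".
matches = root.findall(".//button[.='Dismiss']")
print(len(matches), matches[0].text)
```

Unlike the absolute path `/html/body/div[11]/...`, a text-based locator does not depend on how many `div` elements precede the overlay.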

How Can I Assign A Variable To All Of The Items In A List?

I'm following a guide and it's saying to print the first item from an html document that contains the dollar sign.
It seems to do it correctly, outputting a price to the terminal and it actually being present on the webpage. However, I don't want to have just that single listing, I want to have all of the listings and print them to the terminal.
I'm almost positive that you could do this with a for loop, but I don't know how to set that up correctly. Here's the code I have so far with a comment on line 14, and the code I'm asking about on line 15.
from bs4 import BeautifulSoup
import requests
import os
url = ''
result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
prices = doc.find_all(text="$")
#Print all prices instead of just the specified number?
parent = prices[0].parent
strong = parent.find("strong")
You could try the following:
from bs4 import BeautifulSoup
import requests
import os
url = ''
result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
prices = doc.find_all(text="$")
for price in prices:
    parent = price.parent
    strong = parent.find("strong")
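Here is a self-contained sketch of the same loop that also prints each price. The HTML is a made-up stand-in for the real page, and it assumes each "$" string sits next to a `<strong>` tag inside the same parent, as in the original code. `string=` is the newer spelling of the `text=` argument.

```python
from bs4 import BeautifulSoup

# Made-up listing markup, for illustration only.
html = """
<div class="item"><span>$<strong>1,299</strong></span></div>
<div class="item"><span>$<strong>849</strong></span></div>
<div class="item"><span>$<strong>2,050</strong></span></div>
"""

doc = BeautifulSoup(html, "html.parser")
prices = doc.find_all(string="$")  # every text node that is exactly "$"

found = []
for price in prices:
    strong = price.parent.find("strong")
    if strong is not None:  # skip "$" strings that have no <strong> sibling
        found.append(strong.get_text())
        print(strong.get_text())
```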

Python 3.7 Issue with append function

I'm learning Python and decided to adapt code from an example that scrapes Craigslist data to look at car prices.
I've created a Jupyter notebook and modified the code for my use. I recreated the same error when running the code in Spyder Python 3.7.
I'm running into an issue at line 116.
File "C:/Users/UserM/Documents/GitHub/learning/Spyder Python Craigslist Scrape", line 116
post_prices.append(post_price)
I receive "SyntaxError: invalid syntax".
Any help appreciated. Thanks.
# -*- coding: utf-8 -*-
# Created on Wed Oct 2 12:26:06 2019
#import get to call a get request on the site
from requests import get
#get the first page of the Chicago car prices
response = get('') #eliminate duplicates and show owner only sales
from bs4 import BeautifulSoup
html_soup = BeautifulSoup(response.text, 'html.parser')
#get the macro-container for the housing posts
posts = html_soup.find_all('li', class_= 'result-row')
print(type(posts)) #to double check that I got a ResultSet
print(len(posts)) #to double check I got 120 (elements/page
#grab the first post
post_one = posts[0]
#grab the price of the first post
post_one_price = post_one.a.text
#grab the time of the post in datetime format to save on cleaning efforts
post_one_time = post_one.find('time', class_= 'result-date')
post_one_datetime = post_one_time['datetime']
#title is a and that class, link is grabbing the href attribute of that variable
post_one_title = post_one.find('a', class_='result-title hdrlnk')
post_one_link = post_one_title['href']
#easy to grab the post title by taking the text element of the title variable
post_one_title_text = post_one_title.text
#the neighborhood is grabbed by finding the span class 'result-hood' and pulling the text element from that
post_one_hood = posts[0].find('span', class_='result-hood').text
#the price is grabbed by finding the span class 'result-price' and pulling the text element from that
post_one_hood = posts[0].find('span', class_='result-price').text
#build out the loop
from time import sleep
import re
from random import randint #avoid throttling by not sending too many requests one after the other
from warnings import warn
from time import time
from IPython.core.display import clear_output
import numpy as np
#find the total number of posts to find the limit of the pagination
results_num = html_soup.find('div', class_= 'search-legend')
results_total = int(results_num.find('span', class_='totalcount').text) #pulled the total count of posts as the upper bound of the pages array
#each page has 119 posts so each new page is defined as follows: s=120, s=240, s=360, and so on. So we need to step in size 120 in the np.arange function
pages = np.arange(0, results_total+1, 120)
iterations = 0
post_timing = []
post_hoods = []
post_title_texts = []
post_links = []
post_prices = []
for page in pages:
    #get request
    response = get(""
                   + "s=" #the parameter for defining the page number
                   + str(page) #the page number in the pages array from earlier
                   + "&hasPic=1"
                   + "&availabilityMode=0")
    #throw warning for status codes that are not 200
    if response.status_code != 200:
        warn('Request: {}; Status code: {}'.format(requests, response.status_code))
    #define the html text
    page_html = BeautifulSoup(response.text, 'html.parser')
    #define the posts
    posts = html_soup.find_all('li', class_= 'result-row')
    #extract data item-wise
    for post in posts:
        if post.find('span', class_ = 'result-hood') is not None:
            #posting date
            #grab the datetime element 0 for date and 1 for time
            post_datetime = post.find('time', class_= 'result-date')['datetime']
            post_hood = post.find('span', class_= 'result-hood').text
            #title text
            post_title = post.find('a', class_='result-title hdrlnk')
            post_title_text = post_title.text
            #post link
            post_link = post_title['href']
            #removes the \n whitespace from each side, removes the currency symbol, and turns it into an int
            #test removed: post_price = int(post.a.text.strip().replace("$", ""))
            post_price = int(float((post.a.text.strip().replace("$", ""))) #does this work??
    iterations += 1
    print("Page " + str(iterations) + " scraped successfully!")
print("Scrape complete!")
import pandas as pd
eb_apts = pd.DataFrame({'posted': post_timing,
'neighborhood': post_hoods,
'post title': post_title_texts,
'URL': post_links,
'price': post_prices})
Welcome to StackOverflow. Usually, when you see a syntax error in code that was already working, it means you've messed up the indentation, forgotten to terminate a string somewhere, or missed a closing bracket.
You can tell when a line of seemingly fine code throws a syntax error: the line before it isn't terminated properly, and the interpreter is giving you a hint about where to look.
In this case, you're short a parenthesis in the line before:
post_price = int(float((post.a.text.strip().replace("$", "")))
should be
post_price = int(float((post.a.text.strip().replace("$", ""))))
or, alternatively, delete the extra parenthesis after float:
post_price = int(float(post.a.text.strip().replace("$", "")))
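If you want to check this kind of thing without running the whole script, the built-in `compile()` parses a snippet without executing it. The sketch below uses a hypothetical `first_syntax_error` helper and a literal string in place of `post.a.text`. Which line the interpreter blames depends on the Python version: older releases such as 3.7 tend to point at the following line, while newer ones may point at the unclosed bracket itself.

```python
def first_syntax_error(source):
    """Return the SyntaxError raised while parsing source, or None if it parses."""
    try:
        compile(source, "<snippet>", "exec")  # parse only, nothing is executed
    except SyntaxError as err:
        return err
    return None

# Same shape as the line in the question: four "(" opened, only three ")" closed.
broken = (
    'post_price = int(float(("$1500".strip().replace("$", "")))\n'
    'post_prices.append(post_price)\n'
)
fixed = (
    'post_price = int(float(("$1500".strip().replace("$", ""))))\n'
    'post_prices.append(post_price)\n'
)

for label, source in (("broken", broken), ("fixed", fixed)):
    err = first_syntax_error(source)
    if err is None:
        print(label, "parses cleanly")
    else:
        print(label, "fails at line", err.lineno, "-", err.msg)
```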

How to make a text link in Python3.7 tkinter

I'm trying to make an RSS reader with a GUI in Python 3.7. Everything is almost working, but there is a problem: instead of each line pointing to its own link, every line opens the exact same link.
What can I do to solve this?
The Code:
import feedparser
from tkinter import *
import webbrowser
def callback(event):
feed = feedparser.parse("")
feed_title = feed['feed']['title']
feed_entries = feed.entries
root = Tk()
text = Text(root)
for entry in feed.entries:
    article_title = entry.title
    article_link =
    article_published_at = entry.published # Unicode string
    article_published_at_parsed = entry.published_parsed # Time object
    article_author =
    content = entry.summary
    article_tags = entry.tags
    print ("{}[{}]".format(article_title, article_link))
    print ("Published at {}".format(article_published_at))
    print ("Published by {}".format(article_author))
    print("Content {}".format(content))
    link = Label(root, text="{}\n".format(article_title), fg="blue", cursor="hand2")
    link.bind("<Button-1>" ,callback)
You are not passing the link as an argument to the callback function, so when callback is triggered it uses the current value of article_link, which is the link of the last entry. To fix this, pass the link to callback through a lambda with a default argument in the for loop:
def callback(event, article_link):
    # ...
for entry in feed.entries:
    # ...
    link.bind("<Button-1>", lambda event, link=article_link: callback(event, link))
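The reason the default argument matters can be shown without tkinter. In this sketch the URLs are placeholders and the lambdas stand in for the bound callbacks. A plain lambda looks up the loop variable only when it is called, whereas `link=url` captures the value at the moment the lambda is created.

```python
# Placeholder links standing in for the feed entries.
links = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

# Late binding: every callback reads `url` when called, after the loop has finished.
late_bound = []
for url in links:
    late_bound.append(lambda event=None: url)

# Default argument: each callback stores its own copy of the current value.
early_bound = []
for url in links:
    early_bound.append(lambda event=None, link=url: link)

print([cb() for cb in late_bound])   # the last link, three times
print([cb() for cb in early_bound])  # one distinct link per callback
```

`functools.partial(callback, link=article_link)` would be another way to get the same effect, if you prefer to avoid the lambda.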

Iterate through list while parsing

I am trying to download worksheets for this workout. The workouts are split across different days, so all that needs to be done is to add a new number at the end of the link. Here is my code:
import urllib
import urllib.request
from bs4 import BeautifulSoup
import re
import os
theurl = ""
urls = []
count = 1
while count <29:
    urls.append(theurl + str(count))
    count +=1
for url in urls:
thepage = urllib
thepage = urllib.request.urlopen(urls)
soup = BeautifulSoup(thepage,"html.parser")
init_data = open('/Users/paribaker/Desktop/scrapping/workout/4weekdata.txt', 'a')
workout = []
for data_all in soup.findAll('div',{'class':"b-workout-program-day-exercises"}):
for item in data_all.findAll('div',{'class':"b-workout-part--item"}):
for desc in item.findAll('div', {'class':"b-workout-part--description"}):
workout.append(desc.find('h4',{'class':"b-workout-part--exercise-count"}).text.strip("\n") +",\t")
workout.append(desc.find('strong',{'class':"b-workout-part--promo-title"}).text +",\t")
workout.append(desc.find('span',{'class':"b-workout-part--equipment"}).text +",\t")
for instr in item.findAll('div', {'class':"b-workout-part--instructions"}):
workout.append(instr.find('div',{'class':"b-workout-part--instructions--item workouts-sets"}).text.strip("\n") +",\t")
workout.append(instr.find('div',{'class':"b-workout-part--instructions--item workouts-reps"}).text.strip("\n") +",\t")
workout.append(instr.find('div',{'class':"b-workout-part--instructions--item workouts-rest"}).text.strip("\n"))
except AttributeError:
init_data.write("".join(map(lambda x:str(x), workout)))
The problem is that the server times out. I'm assuming it's not iterating through the list properly, or it's adding characters I do not need and crashing the server's parser.
I have also tried writing another script that grabs all the links and puts them in a text document, then reopening that file in this script and iterating through it, but that gave me the same error. What are your thoughts?
There's a typo here:
thepage = urllib.request.urlopen(urls)
You probably wanted:
thepage = urllib.request.urlopen(url)
Otherwise you are trying to open the whole list of URLs rather than a single one.
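A minimal sketch of the corrected loop follows, under a few assumptions: the base URL is a placeholder, `fetch_all` is a hypothetical helper, and `fake_opener` exists only so the example runs without network access. With the default `opener`, each call goes to `urllib.request.urlopen` with a single URL string.

```python
import io
import urllib.request

def fetch_all(urls, opener=urllib.request.urlopen):
    """Fetch each URL in turn and return a dict mapping URL to response body."""
    pages = {}
    for url in urls:
        response = opener(url)  # one URL string per call, never the whole list
        pages[url] = response.read()
    return pages

# Placeholder base URL; the real one is not shown in the question.
base_url = "https://example.com/workout/day/"
urls = [base_url + str(day) for day in range(1, 29)]  # days 1 through 28

def fake_opener(url):
    """Offline stand-in for urlopen: returns a file-like object with some bytes."""
    return io.BytesIO(("page for " + url).encode("utf-8"))

pages = fetch_all(urls, opener=fake_opener)
print(len(pages))
```

The list comprehension replaces the `while count < 29` loop and produces the same 28 URLs.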
