Scraping Billboard Data with Python; code doesn't crawl to previous chart


First of all, I'm a relative newbie to coding. My goal is to scrape at least the last decade of Billboard Hot 100 charts using the Python code below with billboard.py. My problem is that I have tried a few variants of the while loop and none of them gets me to the previous chart. I have an idea of how it should look from the billboard.py documentation, but my code either terminates prematurely or raises AttributeError: 'ChartEntry' object has no attribute 'previousDate'.
Any advice on debugging this, or corrected code, is appreciated. Thank you.
import billboard
import csv
chart = billboard.ChartData('hot-100')
#chart = billboard.ChartData('hot-100', date=None, fetch=True, max_retries=5, timeout=25)
f = open('Hot100.csv', 'w')
headers = 'title, artist, peakPos, lastPos, weeks, rank, date\n'
f.write(headers)
while chart.previousDate:
    date = chart.date
    for chart in chart:
        title = chart.title
        artist = chart.artist
        peakPos = str(chart.peakPos)
        lastPos = str(chart.lastPos)
        weeks = str(chart.weeks)
        rank = str(chart.rank)
        f.write('\"' + title + '\",\"' + artist.replace('Featuring', 'Feat.') + '\",' + peakPos + ',' + lastPos + ',' + weeks + ',' + rank + ',' + date + '\n')
    chart = billboard.ChartData('hot-100', chart.previousDate)
f.close()

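I figured it out. The inner for loop reused the name chart for each entry, so after the loop chart referred to a ChartEntry instead of the ChartData object, and chart.previousDate failed. Renaming the loop variable to song fixed it.
My revised code is below: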
import billboard
import csv
chart = billboard.ChartData('hot-100')
#chart = billboard.ChartData('hot-100', date=None, fetch=True, max_retries=5, timeout=25)
f = open('hot-100.csv', 'w')
headers = 'title, artist, peakPos, lastPos, weeks, rank, date\n'
f.write(headers)
date = chart.date
while chart.previousDate:
    date = chart.date
    for song in chart:
        title = song.title
        artist = song.artist
        peakPos = str(song.peakPos)
        lastPos = str(song.lastPos)
        weeks = str(song.weeks)
        rank = str(song.rank)
        f.write('\"' + title + '\",\"' + artist.replace('Featuring', 'Feat.') + '\",' + peakPos + ',' + lastPos + ',' + weeks + ',' + rank + ',' + date + '\n')
    chart = billboard.ChartData('hot-100', chart.previousDate)
f.close()
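As a possible refinement (a sketch, not part of the original answer), the manual quoting in f.write could be handed to the standard csv module, which escapes commas and quotes inside titles. The Entry tuple and the sample row below are made-up stand-ins for billboard.py chart entries so the snippet runs without network access; write_chart is a hypothetical helper name.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for a billboard.py chart entry, so the sketch runs offline.
Entry = namedtuple("Entry", "title artist peakPos lastPos weeks rank")

FIELDS = ["title", "artist", "peakPos", "lastPos", "weeks", "rank", "date"]

def write_chart(writer, entries, date):
    """Write one row per entry; csv.writer handles quoting of commas and quotes."""
    for song in entries:
        writer.writerow([
            song.title,
            song.artist.replace("Featuring", "Feat."),
            song.peakPos, song.lastPos, song.weeks, song.rank,
            date,
        ])

# Usage with made-up data (not real chart data).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDS)
sample = [Entry('Song, "A"', "X Featuring Y", 1, 2, 10, 1)]
write_chart(writer, sample, "2020-01-04")
print(buffer.getvalue())
```

In the real script the writer would wrap a file opened with open('hot-100.csv', 'w', newline=''), and write_chart would be called once per chart inside the while loop.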

Related

ValueError: Length of values (1) does not match length of index (50)

Hey there awesome peeps,
I am trying to retrieve trend information for keywords that I have in a list (1000 keywords). To minimize the chance of getting blocked by Google, I use a cutoff of 50 keywords and a 10-second pause between requests. At the moment I get an error saying that the length of values does not match the length of the index. It fails on this line:
df3['Trend'] = trends
If anyone can help, I would really appreciate it.
Thanks!
!pip install pytrends
import pandas as pd
import json
import time
from pytrends.request import TrendReq
get_gsc_file = "/content/Queries.csv"
sortby = "Clicks"
cutoff = 50
pause = 10
timeframe = "today 3-m"
geo = "US"
df = pd.read_csv(get_gsc_file, encoding='utf-8')
df.sort_values(by=[sortby], ascending=False, inplace=True)
df = df[:cutoff]
d = {'Keyword': [], sortby:[], 'Trend': []}
df3 = pd.DataFrame(data=d)
keywords = []
trends = []
metric = df[sortby].tolist()
up = 0
down = 0
flat = 0
na = 0
for index, row in df.iterrows():
    keyword = row['Top queries']
    pytrends = TrendReq(hl='en-US', tz=360, retries=2, backoff_factor=0.1)
    kw_list = [keyword]
    pytrends.build_payload(kw_list, cat=0, timeframe=timeframe, geo=geo, gprop='')
    df2 = pytrends.interest_over_time()
    keywords.append(keyword)
    try:
        trend1 = int((df2[keyword][-5] + df2[keyword][-4] + df2[keyword][-3])/3)
        trend2 = int((df2[keyword][-4] + df2[keyword][-3] + df2[keyword][-2])/3)
        trend3 = int((df2[keyword][-3] + df2[keyword][-2] + df2[keyword][-1])/3)
        if trend3 > trend2 and trend2 > trend1:
            trends.append('UP')
            up+=1
        elif trend3 < trend2 and trend2 < trend1:
            trends.append('DOWN')
            down+=1
        else:
            trends.append('FLAT')
            flat+=1
    except:
        trends.append('N/A')
        na+=1
    time.sleep(pause)
df3['Keyword'] = keywords
df3['Trend'] = trends
df3[sortby] = metric
def colortable(val):
    if val == 'DOWN':
        color="lightcoral"
    elif val == 'UP':
        color = "lightgreen"
    elif val == 'FLAT':
        color = "lightblue"
    else:
        color = 'white'
    return 'background-color: %s' % color
df3 = df3.style.applymap(colortable)
total = len(trends)
print("Up: " + str(up) + " | " + str(round((up/total)*100,0)) + "%")
print("Down: " + str(down) + " | " + str(round((down/total)*100,0)) + "%")
print("Flat: " + str(flat) + " | " + str(round((flat/total)*100,0)) + "%")
print("N/A: " + str(na) + " | " + str(round((na/total)*100,0)) + "%")
df3
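No answer is recorded for this question, so the following is only a guess at the cause. The error message says trends held a single value while the index had 50 rows. That pattern appears when the try/except block runs once, after the for loop, instead of once per keyword; the indentation of the original post was not preserved, so this is an assumption. A minimal sketch of keeping the two lists the same length by producing exactly one label per keyword follows. It uses plain lists with made-up numbers in place of pytrends results, and classify_trend is a hypothetical helper name.

```python
def classify_trend(values):
    """Label a series 'UP', 'DOWN', 'FLAT', or 'N/A' from its last five points."""
    if len(values) < 5:
        return "N/A"  # not enough data for three 3-point averages
    last = values[-5:]
    trend1 = int(sum(last[0:3]) / 3)
    trend2 = int(sum(last[1:4]) / 3)
    trend3 = int(sum(last[2:5]) / 3)
    if trend3 > trend2 > trend1:
        return "UP"
    if trend3 < trend2 < trend1:
        return "DOWN"
    return "FLAT"

# Made-up interest values standing in for pytrends results.
series_by_keyword = {
    "rising": [10, 20, 30, 40, 50],
    "falling": [50, 40, 30, 20, 10],
    "steady": [30, 30, 30, 30, 30],
    "empty": [],
}

keywords, trends = [], []
for keyword, values in series_by_keyword.items():
    keywords.append(keyword)
    trends.append(classify_trend(values))  # exactly one label per keyword

print(list(zip(keywords, trends)))
```

Because both appends happen in the same loop iteration, the lists cannot drift apart, and assigning them as DataFrame columns will not raise a length mismatch.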

Pandas and adding column and data to a table

Any idea how to add the division (j) to each row? The program runs through each division (1 through 5), and I want to record which division each row belongs to. I have the headers 'Name, Gender, State, Position, Grad, Club/HS, Rating, Commitment, Division' at the top of the table, but right now the Division column is blank, so I can't tell which division a row came from. Thanks for your help.
import pandas as pd
max_page_num = 10
with open('results.csv', 'a', newline='') as f:
    f.write('Name, Gender, State, Position, Grad, Club/HS, Rating, Commitment, Division\n')
def division():
    for j in range(1,5):
        division = str(j)
        for i in range(max_page_num):
            print('page:', i)
            graduation = str(2020)
            area = "commitments" # "commitments" or "clubplayer"
            gender = "m"
            page_num = str(i)
            source = "https://www.topdrawersoccer.com/search/?query=&divisionId=" + division + "&genderId=m&graduationYear=" + graduation + "&playerRating=&pageNo=" + page_num + "&area=" + area +""
            all_tables = pd.read_html(source)
            df = all_tables[0]
            print('items:', len(df))
            df.to_csv('results.csv', header=False, index=False, mode='a')
division()

Simply adding the column 'division' should do it if I understand correctly.
import pandas as pd
max_page_num = 10
with open('results.csv', 'a', newline='') as f:
    f.write('Name, Gender, State, Position, Grad, Club/HS, Rating, Commitment, Division\n')
def division():
    # range(1, 6) covers divisions 1 through 5; range(1, 5) would stop at 4
    for j in range(1, 6):
        division = str(j)
        for i in range(max_page_num):
            print('page:', i)
            graduation = str(2020)
            area = "commitments" # "commitments" or "clubplayer"
            gender = "m"
            page_num = str(i)
            source = "https://www.topdrawersoccer.com/search/?query=&divisionId=" + division + "&genderId=m&graduationYear=" + graduation + "&playerRating=&pageNo=" + page_num + "&area=" + area +""
            all_tables = pd.read_html(source)
            df = all_tables[0]
            # Tag every row of this page with the current division
            df['division'] = division
            print('items:', len(df))
            df.to_csv('results.csv', header=False, index=False, mode='a')
division()
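To illustrate why a single assignment is enough, here is a small sketch with made-up rows standing in for one scraped page. Assigning a scalar to a new column repeats it for every row of the DataFrame.

```python
import pandas as pd

# Made-up rows standing in for one scraped page.
df = pd.DataFrame({"Name": ["Player A", "Player B"], "Grad": [2020, 2020]})

# A scalar assigned to a column is repeated for every row.
df["division"] = "3"
print(df)
```

Since the new column is added last, it lines up with the trailing Division header written at the top of results.csv.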

Why is this while loop only doing the first loop?

I am trying to produce mechanical stress data from wind speed data for a crane boom at each angle between 0 and 90 degrees, with the data for each angle saved into its own file. The script works fine for a single file/angle, but when I use any sort of loop to do it for all angles, it creates all the files and only the first has any data in it. I am a beginner and not very savvy with Python, so I was hoping someone could spot something simple I have missed. I have included a short example file of the source data: Windspeed source file - cut down
import math
file = open("C:/Users/Jacob/Desktop/BOM Data/HD01D_Data_074272_999999999523840.txt", 'r')
boomDirection = 0
vaneSpeed = 120
maxShear = 75.97043478
maxVonMises = 500.0216811
while boomDirection < 91:
    data_file = open("Bolt Stress - " + str(boomDirection) + " Degrees.csv", 'w')
    line = file.readline()
    line = file.readline()
    while line != '':
        try:
            if len(line.split(','))>1:
                windSpeedHigh = int(line.split(',')[19])
                windSpeedLow = int(line.split(',')[22])
                windDirection = int(line.split(',')[14])
                relSpeedHigh = math.sin(math.radians((90-(boomDirection - windDirection))))*windSpeedHigh
                relSpeedLow = math.sin(math.radians((90-(boomDirection - windDirection))))*windSpeedLow
                VonMisesHigh = (maxVonMises/vaneSpeed)* relSpeedHigh
                VonMisesLow = (maxVonMises/vaneSpeed)* relSpeedLow
                data_file.write(str(round(VonMisesHigh,1)) + ('\n'))
                data_file.write(str(round(VonMisesLow,1)) + ('\n'))
        except ValueError:
            pass
        line = file.readline()
    data_file.close()
    boomDirection = boomDirection + 1
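No answer is recorded here, so treat this as a likely explanation, not a confirmed one. A file object keeps its read position: once the inner loop for the first angle reaches the end of the file, readline() returns '' immediately for every later angle, so those output files stay empty. The sketch below shows the effect with an in-memory stand-in for the wind data file (made-up contents) and one way around it, rewinding with seek(0).

```python
import io

# In-memory stand-in for the wind data file (made-up contents).
source = io.StringIO("header\n10\n20\n")

first_pass = source.readlines()
second_pass = source.readlines()   # the read position is already at the end
print(len(first_pass), len(second_pass))

source.seek(0)                     # rewind before reading again
third_pass = source.readlines()
print(len(third_pass))
```

In the script above, calling file.seek(0) at the top of the outer while loop, or reading all lines into a list once before the loop and iterating over that list for each angle, would give every angle the full data set.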

Bokeh Sliders for stacked vbar to increase segment size & HoverTool

In the code below I am aiming to make a stacked bar chart with Bokeh, with sliders attached so I can increase or decrease the size of each bar segment and shift the others in turn.
My issue right now is that the chart does not update when running from a Bokeh server. My guess is that Bokeh does not run the calculations again after the source is updated, or that I am getting a source conflict. (So far I have only implemented it for "Engineering"; I wanted to get that working before I sort out the rest.)
Other things of note: I am using a deprecated technique of providing each glyph with bottom/top data as well as a source. I did this because it was the only way I could get the HoverTool to show.
The only way I have got this to work was to redraw the graph completely. I would be OK with that option, but it was stacking the graphs on top of each other. Is there a way to clear all previous graphs in Bokeh? Obviously I would prefer a solution that just alters the data and doesn't completely redraw the graph.
from bokeh.plotting import figure, show, curdoc
from bokeh.models import NumeralTickFormatter
from bokeh.models import HoverTool
from bokeh.models import ColumnDataSource
from bokeh.layouts import widgetbox, column
from bokeh.models import CustomJS, Slider
from matplotlib import colors
import pandas as pd
import numpy as np
# Read Data
df=pd.read_csv('/home/mint/SAGD_Costs.csv')
# Master source
source = ColumnDataSource(df)
# Bar Tops Data
engtop = source.data['Engineering'][0]
equiptop = source.data['Engineering'][0] + source.data['Equipment'][0]
bulktop = source.data['Engineering'][0] + source.data['Equipment'][0] + source.data['Bulk_Materials'][0]
inditop = source.data['Engineering'][0] + source.data['Equipment'][0] + source.data['Bulk_Materials'][0] + source.data['Indirects'][0]
labtop = source.data['Engineering'][0] + source.data['Equipment'][0] + source.data['Bulk_Materials'][0] + source.data['Indirects'][0] + source.data['Labour'][0]
# Source for Stupid Hovertool
engsource = ColumnDataSource(data=dict(x=[0], y=[engtop], desc = ['Engineering']))
equipsource = ColumnDataSource(data=dict(x=[0], y=[equiptop-engtop], desc = ['Equipment']))
bulksource = ColumnDataSource(data=dict(x=[0], y=[bulktop-equiptop], desc = ['Bulk Materials']))
indisource = ColumnDataSource(data=dict(x=[0], y=[inditop-bulktop], desc = ['Indirects']))
labsource = ColumnDataSource(data=dict(x=[0], y=[labtop-inditop], desc = ['Labour']))
# HoverTool Label
hover = HoverTool(
    tooltips=[
        ('Item', '@desc'),
        ('Cost', '@y{$ 0.00 a}'),
    ]
)
# Other Tools
TOOLS = 'box_zoom, box_select, resize, reset'
# Figure
p = figure(title="Capital Costs Breakdown", title_location="above", plot_width=600, plot_height=600, x_range=(-2, 2), tools=[TOOLS, hover])
# Plots
engbar = p.vbar(x=source.data['Year'][0], width=2, bottom=0,
top=engtop, alpha=0.75, color="darkslategrey", legend="Engineering", source=engsource)
equipbar = p.vbar(x=[source.data['Year'][0]], width=2, bottom=engtop,
top = equiptop, alpha=0.75, color="teal", legend="Equipment", source=equipsource)
bulkbar = p.vbar(x=[source.data['Year'][0]], width=2, bottom=equiptop,
top=bulktop, alpha=0.75, color="cyan", legend="Bulk Materials", source=bulksource)
indibar = p.vbar(x=[source.data['Year'][0]], width=2, bottom=bulktop,
top=inditop, alpha=0.75, color="powderblue", legend="Indirects", source=indisource)
labbar = p.vbar(x=[source.data['Year'][0]], width=2, bottom=inditop,
top=labtop, alpha=0.75, color="lavender", legend="Labour", source=labsource)
# Format
p.yaxis[0].formatter = NumeralTickFormatter(format="$0,000")
# Set up widgets
eng_slider = Slider(start=5000000, end=100000000, value=40000000, step=5000000, title="Engineering")
def update_data(attrname, old, new):
    # Get the current slider values
    a = eng_slider.value
    # Generate the new curve
    df['Engineering'][0] = a
    source = ColumnDataSource(df)
    #source.data = dict(x=x, y=y)
for w in [eng_slider]:
    w.on_change('value', update_data)
# Set up layouts and add to document
inputs = widgetbox(eng_slider)
# Show!
curdoc().add_root(column(inputs, p))
curdoc().title = "Sliders"

I'm not sure of the etiquette on answering your own question... It's mostly fixed; however, the HoverTool is not working correctly. Because the tooltip uses @y, it shows the running total of the stack at each item, whereas I want it to show the size of that segment. Is it possible to calculate a value for the HoverTool?
In case my situation helps someone: the mistake I was making above was changing a value with the slider that then had to pass through a calculation before reaching the glyph.
The correct way is to do any calculations within the update function.
If you come from Pandas and Matplotlib like me, you may end up building DataFrame column lookups into your charts, e.g. x = df['Column_name'][0]. When plotting with glyphs in Bokeh, I believe the correct way is to create a source with the data you want, so you can just pass x and y into your glyph. See the sections marked Master source, Get source data, Calculate Top & Bottom, and New sources in my code below.
# Read Data
df=pd.read_csv('/home/mint/SAGD_Costs.csv')
# Master source
source = ColumnDataSource(df)
# Get source data
a = source.data['Engineering'][0]
b = source.data['Equipment'][0]
c = source.data['Bulk_Materials'][0]
d = source.data['Indirects'][0]
e = source.data['Labour'][0]
# Calculate Top & Bottom
ab = 0
at = a
bb = a
bt = a + b
cb = a + b
ct = a + b + c
db = a + b + c
dt = a + b + c + d
eb = a + b + c + d
et = a + b + c + d + e
# New sources
engsource = ColumnDataSource(data=dict(x=[ab], y=[at], desc = ['Engineering']))
equipsource = ColumnDataSource(data=dict(x=[bb], y=[bt], desc = ['Equipment']))
bulksource = ColumnDataSource(data=dict(x=[cb], y=[ct], desc = ['Bulk Materials']))
indisource = ColumnDataSource(data=dict(x=[db], y=[dt], desc = ['Indirects']))
labsource = ColumnDataSource(data=dict(x=[eb], y=[et], desc = ['Labour']))
# HoverTool Label
hover = HoverTool(
    tooltips=[
        ('Item', '@desc'),
        ('Cost', '@y{$ 0.00 a}'),
    ]
)
# Other Tools
TOOLS = 'box_zoom, box_select, resize, reset'
# Figure
p = figure(title="Capital Costs Breakdown", title_location="above", plot_width=600, plot_height=600, x_range=(-2, 2), tools=[TOOLS, hover])
# Plots
engbar = p.vbar(x=0, width=2, bottom = 'x',
top ='y', alpha=0.75, color="darkslategrey", legend="Engineering", source=engsource)
equipbar = p.vbar(x=0, width=2, bottom = 'x',
top = 'y', alpha=0.75, color="teal", legend="Equipment", source=equipsource)
bulkbar = p.vbar(x=0, width=2, bottom = 'x',
top ='y', alpha=0.75, color="cyan", legend="Bulk Materials", source=bulksource)
indibar = p.vbar(x=0, width=2, bottom = 'x',
top ='y', alpha=0.75, color="powderblue", legend="Indirects", source=indisource)
labbar = p.vbar(x=0, width=2, bottom = 'x',
top = 'y', alpha=0.75, color="lavender", legend="Labour", source=labsource)
# Format
p.yaxis[0].formatter = NumeralTickFormatter(format="$0,000")
# Set up widgets
eng_slider = Slider(start=5000000, end=100000000, value=40000000, step=5000000, title="Engineering")
equip_slider = Slider(start=5000000, end=100000000, value=40000000, step=5000000, title="Equipment")
bulk_slider = Slider(start=5000000, end=100000000, value=40000000, step=5000000, title="Bulk_Materials")
indi_slider = Slider(start=5000000, end=100000000, value=40000000, step=5000000, title="Indirects")
lab_slider = Slider(start=5000000, end=100000000, value=40000000, step=5000000, title="Labour")
def update_data(attrname, old, new):
    # Get the current slider values
    a = eng_slider.value
    b = equip_slider.value
    c = bulk_slider.value
    d = indi_slider.value
    e = lab_slider.value
    # Calculate Top & Bottom
    ab = 0
    at = a
    bb = a
    bt = a + b
    cb = a + b
    ct = a + b + c
    db = a + b + c
    dt = a + b + c + d
    eb = a + b + c + d
    et = a + b + c + d + e
    # New sources
    engsource.data=dict(x=[ab], y=[at], desc = ['Engineering'])
    equipsource.data=dict(x=[bb], y=[bt], desc = ['Equipment'])
    bulksource.data=dict(x=[cb], y=[ct], desc = ['Bulk Materials'])
    indisource.data=dict(x=[db], y=[dt], desc = ['Indirects'])
    labsource.data=dict(x=[eb], y=[et], desc = ['Labour'])
for w in [eng_slider, equip_slider, bulk_slider, indi_slider, lab_slider]:
    w.on_change('value', update_data)
# Set up layouts and add to document
inputs = widgetbox(eng_slider, equip_slider, bulk_slider, indi_slider, lab_slider)
# Show!
curdoc().add_root(column(inputs, p))
curdoc().title = "Sliders"
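The top and bottom arithmetic above is a running sum, and the open question about showing each segment's own size in the tooltip comes down to keeping that size alongside the bounds. Below is a minimal sketch in plain Python with made-up costs; stack_bounds is a hypothetical helper, not part of Bokeh.

```python
from itertools import accumulate

def stack_bounds(values):
    """Return (bottom, top, size) for each segment of a stacked bar."""
    tops = list(accumulate(values))   # running totals give each segment's top
    bottoms = [0] + tops[:-1]         # each bottom is the previous segment's top
    return list(zip(bottoms, tops, values))

# Made-up costs for Engineering, Equipment, Bulk Materials.
for bottom, top, size in stack_bounds([40, 25, 10]):
    print(bottom, top, size)
```

One option, stated as an assumption and not tested against this chart, is to store the size in an extra column of each source (for example a column named cost) and reference it in the tooltip as @cost, since tooltips can display any column of the glyph's source.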

Import excel data and keep date time

Thanks in advance for your help. I'm importing data from Excel using openpyxl, and I'd like to convert the time strings into datetime values. Below is the code I'm using:
import openpyxl, pprint, datetime
print ('Opening workbook...')
wb= openpyxl.load_workbook('ACLogs_test_Conv2.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
print(sheet)
ACLogsData = {}
print ('Reading rows...')
for row in range(2, sheet.max_row +1):
    pangalan = sheet['B' + str(row)].value
    dates = sheet['D' + str(row)].value
    time = sheet['E' + str(row)].value
    ACLogsData.setdefault(pangalan,{})
    ACLogsData[pangalan].setdefault(dates,{})
    ACLogsData[pangalan][dates].setdefault(time)

Use datetime.strptime() to parse each string:
# strptime lives on the datetime class, so import the class itself
from datetime import datetime
FMT = '%H:%M' # Whatever format your times are in
for row in range(2, sheet.max_row +1):
    pangalan = sheet['B' + str(row)].value
    dates = sheet['D' + str(row)].value
    time = datetime.strptime(sheet['E' + str(row)].value, FMT)
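One caveat worth checking: strptime only accepts strings, and depending on how the cells are formatted, openpyxl may already hand back datetime or time objects instead of text. A small sketch of a helper that copes with both cases follows; to_time is a hypothetical name, and the '%H:%M' format is an assumption about the spreadsheet.

```python
from datetime import datetime, time

FMT = "%H:%M"  # assumed format; adjust to match the spreadsheet

def to_time(value, fmt=FMT):
    """Return a datetime.time from a string, or pass through existing time values."""
    if isinstance(value, datetime):
        return value.time()
    if isinstance(value, time):
        return value
    return datetime.strptime(value, fmt).time()

print(to_time("08:30"))
print(to_time(time(17, 45)))
```

Inside the row loop this would be called as to_time(sheet['E' + str(row)].value).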
