I have an if/else statement in my function that is not working the way I want it to. Mind you, I am still learning Python and all things programming.
I have a function that defines a plot. The idea is to create a large Python repo for data analysis. EDIT: I added a working makeshift DataFrame for you to try.
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
#import numpy as np
#import os
#import dir_config as dcfg
#import data_config as datacfg
import matplotlib.dates as md
#import cartopy.crs as ccrs
data = {'dates': [20200901,20200902,20200903,20200904,20200905,20200906,20200907,20200908,20200909,20200910],
        'depth': [1,2,3,4,5,6,7,8,9,10],
        'cond': [30.1,30.2,30.3,30.6,31,31.1,31.0,31.4,31.1,30.9]
        }
df = pd.DataFrame(data, columns = ['dates', 'depth', 'cond'])
# parse the YYYYMMDD integers as calendar dates rather than as nanoseconds since the epoch
df['pd_datetime'] = pd.to_datetime(df['dates'].astype(str), format='%Y%m%d')
def ctd_plots_timeseries(time=[],cond=[], sal =[], temp=[], depth=[], density=[]):
    #-----------
    # CONDUCTIVITY PLOT
    #-----------
    if cond == []:
        print("there is no data for cond")
        pass
    else:
        plt.scatter(time,depth,s=15,c=cond,marker='o', edgecolor='none')
        plt.show()
    #-----------
    # SALINITY (PSU) PLOT: I do not want this to plot at all due to its parameter being 'empty' in the function when called
    #-----------
    if sal == []:
        print('there is no salinity data')
        pass
    else:
        plt.scatter(time,depth,s=15,c=sal,marker='o', edgecolor='none')
        plt.show()
ctd_plots_timeseries(depth = df['depth'], time = df['pd_datetime'], cond = df['cond'])
The idea here is that if there is no data in the cond value, pass so that the plot is not shown.
However, every time I run this, the plot shows, even though there is no data for it.
When I call the function I put in plot_timeseries(time=time_data, depth=depth_data, temp=temp_data)
My aim is for only the temp data in this example to show, not a cond graph with no variables.
What I have tried is
if cond != []:
    plotting code
    plt.show()
else:
    print('there is no cond data')
    pass
and
plotting code
if cond == []:
    print('no cond data')
    pass
else:
    plt.show()
to no avail.
Note that there are 4 other plots in this function for which I would like to do the same thing. Thanks for any insight this community can give me.
UPDATE:
I changed the defaults in the function to def ctd_plots_timeseries(time=0,cond=0, sal =0, temp=0, depth=0, density=0):
and then changed the conditional statement to
if cond != 0:
    graphing code
else:
    print('no data here')
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I've simplified it. Try that:
def plots_timeseries(cond = []): # Single argument for clarity
    if not cond:
        print('there is no cond value')
    else:
        print('There is cond')
plots_timeseries()
# there is no cond value
So I figured out a working solution.
if len(cond) == 0:
    print('there is no cond data')
else:
    plt.scatter(time,depth,s=15,c=cond)
    plt.show()
A lot of time and effort went into trying to solve this, and this solution was a test after a good night's sleep. Thanks for all the help. I hope this helps someone else if they have a similar issue.
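One way to read the fix above: `cond != 0` on a pandas Series is evaluated element-wise, which is what triggers the ValueError quoted earlier, while `len(cond) == 0` always yields a single boolean. Here is a minimal sketch of the same check, assuming `None` defaults instead of mutable `[]` defaults. The helper names `has_data` and `plottable_variables` are hypothetical and not part of the original code, and the sample values are made up:

```python
def has_data(values):
    """Return True if values is not None and contains at least one item."""
    return values is not None and len(values) > 0


def plottable_variables(cond=None, sal=None, temp=None, density=None):
    """Return the names of the variables that were actually passed with data."""
    candidates = {"cond": cond, "sal": sal, "temp": temp, "density": density}
    return [name for name, values in candidates.items() if has_data(values)]


# Only temp has data here (sal is passed but empty), so only temp would be plotted.
to_plot = plottable_variables(temp=[11.2, 11.4, 11.9], sal=[])
print(to_plot)
```

Since `len()` also works on a pandas Series, the same helper should work when the arguments are DataFrame columns.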
I am trying to grab some financial data off a financial website. I want to manipulate df['__this value__']. I did some research myself, and I understand the error fine, but I really have no idea how to fix it. This is what my code looks like:
import requests
import bs4
import os
import pandas as pd
def worker_names(code=600110):
    ......
    df = pd.DataFrame({'class': name_list})
    worker_years(df)
def worker_years(df, code=600110, years=None):
    if years is None:
        years = ['2019', '2018', '2017', '2016']
    url = 'http://quotes.money.163.com/f10/dbfx_'\
        + str(code) + '.html?date='\
        + str(years) + '-12-31,'\
        + str(years) + '-09-30#01c08'
    ......
    df['{}-12-31'.format(years)] = number_list # this is where the problem is
    df = df.drop_duplicates(subset=['class'], keep=False)
    df.to_csv(".\\__fundamentals__\\{:0>6}.csv".format(code),
              index=False, encoding='GBK')
    print(df)
if __name__ == '__main__':
    pd.set_option('display.max_columns', None)
    pd.set_option('display.unicode.ambiguous_as_wide', True)
    pd.set_option('display.unicode.east_asian_width', True)
    fundamental_path = '.\\__fundamentals__'
    stock_path = '.\\__stock__'
    worker_names(code=600110)
Is there any way I can work around this? Please help! Thanks, all!
Your line df['{}-12-31'.format(years)] = number_list is a very good example of why you can't have two variables on both sides of the assignment. Try this:
df_year = pd.DataFrame({'{}'.format(year): number_list})
df = pd.concat([df, df_year], axis=1)
When working with DataFrames, there are many different ways to get the same result.
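The underlying issue in the question appears to be that `years` is a list, so `'{}-12-31'.format(years)` formats the whole list into a single column name. Here is a minimal sketch of one way around it, looping over the years and adding one column per year. The function name `add_year_columns` and the sample numbers are hypothetical, and the scraping step is left out entirely:

```python
import pandas as pd


def add_year_columns(df, numbers_by_year):
    """Add one '<year>-12-31' column per year (hypothetical helper)."""
    out = df.copy()
    for year, number_list in numbers_by_year.items():
        # format a single year here, never the whole list of years
        out['{}-12-31'.format(year)] = number_list
    return out


df = pd.DataFrame({'class': ['revenue', 'profit']})
# made-up values standing in for the scraped number_list of each year
result = add_year_columns(df, {'2019': [1.0, 2.0], '2018': [3.0, 4.0]})
print(result.columns.tolist())
```

In the original code, the same loop would presumably also need to build the URL once per year, since `str(years)` has the same problem there.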
I'm pretty new to Python and this is my first time using Bokeh. I've followed a tutorial using NFL data to show graphs and I cannot get the graph to show on my machine. The script runs without error, but nothing shows. I'm sure I'm missing something very simple... but I just don't know what that is... Can someone please help me? Below is my code:
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, FactorRange, FixedTicker
from bokeh.io import output_notebook
from collections import Counter
from bokeh.transform import factor_cmap
from bokeh.palettes import Paired, Spectral
import itertools
pd.set_option('display.max_columns', 150)
output_notebook()
filename = '/Users/ksilva/Downloads/NFL Play by Play 2009-2017 (v4).csv'
df = pd.read_csv(filename,dtype={25: object, 51: object})
# print(df.shape)
# df['down'].isnull().sum()
pd.to_numeric(df['down'], errors='coerce').isnull().sum()
# print(df.loc[51])
# filter by team if desired
team = 'all'
if team == 'all':
    team_df = df
else:
    team_df = df.loc[df['posteam'] == team]
# drop rows with null in the 'down' column
team_df = team_df.loc[team_df['down'].notnull()]
all_play_types = Counter(team_df['PlayType'])
# print(team_df)
# print(all_play_types)
# list of downs I care about
downs = ['1','2','3','4']
# list of plays I care about
plays = ['Pass', 'Run', 'Punt', 'Field Goal']
# define x-axis categories to be used in the vbar plot
x = list(itertools.product(downs, plays))
# x = [('1', 'Pass'), ('1', 'Run'), ('1', 'Punt'), ..., ('4', 'Punt'), ('4', 'Field Goal')]
# create a list of Counters for each down--will include ALL PlayTypes for each down
plays_on_down = [Counter(team_df.loc[team_df['down'] == int(down)]['PlayType']) for down in downs]
# create a list of counts for each play in plays for each down in downs
counts = [plays_on_down[int(down)-1][play] for down, play in x]
# load the data into the ColumnDataSource
source = ColumnDataSource(data=dict(x=x, counts=counts))
# get the figure ready
p = figure(x_range=FactorRange(*x), plot_height=350, plot_width=750, title='Play by Down',
           toolbar_location=None, tools='')
# create the vbar
p.vbar(x='x', top='counts', width=0.9, source=source, line_color='white',
       fill_color=factor_cmap('x', palette=Spectral[4], factors=plays, start=1, end=2))
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xaxis.axis_label = 'Down'
p.yaxis.axis_label = 'Number of Plays'
p.xgrid.grid_line_color = None
show(p)
For whatever reason, nothing happens when executed from the terminal.
Any help is greatly appreciated!
Thanks.
You are calling output_notebook. This activates a mode that only displays in a Jupyter notebook. If you want to execute plain Python scripts to generate HTML file output, you should use output_file.
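A minimal sketch of the script-style workflow described above, assuming a small made-up bar chart rather than the NFL data. The file name and location are arbitrary. `output_file` sets the HTML target and `save` writes it, while `show` would additionally try to open the result in a browser:

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# write the HTML into a temporary directory (arbitrary location for this sketch)
html_path = os.path.join(tempfile.mkdtemp(), "play_by_down.html")
output_file(html_path, title="Play by Down")

p = figure(title="Play by Down")
# made-up counts, standing in for the real data
p.vbar(x=[1, 2, 3, 4], top=[10, 7, 4, 1], width=0.9)

# save(p) writes the file; use show(p) instead to also open it in a browser
save(p)
print(os.path.exists(html_path))
```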
I am having trouble coding this as I am quite new. My code is supposed to take data, return the average, make a chart of the information, and plot the peaks/valleys of the data.
I don't know why it isn't returning, and that fault is what makes the code fail towards the end. The only part that shows an error is the code that is supposed to find the peak/valley values. It also shows invalid syntax when I try to define a variable (more specifically 'original_data').
EDIT: Thanks to Jono and Ken, I have fixed some of my code, but I checked the values of my lists and they only have one value stored in each, so it's not printing all the peaks/valleys of the dataset I had. I am getting KeyError: -331 and I can't find out how to fix it.
# My Favorite Function
import os
clear = lambda: os.system('cls')
clear()
#import modules
import pandas as pd
import matplotlib.pyplot as plt
#key variables
data_set = pd.read_csv('C:/Users/sanderj/Documents/Work/Work_Experience/Day4.csv')
data = data_set["Data"]
peaks = []
valleys = []
#loop functions
for x in data:
    if data[x] == data[0] and data[x] > data[x+1]:
        peaks.append(x)
    elif data[x] > data[x+1] and data[x] > data[x-1]:
        peaks.append(x)
    else:
        continue
for x in data:
    if data[x] == data[0] and data[x] < data[x+1]:
        valleys.append(x)
    elif data[x] < data[x+1] and data[x] < data[x-1]:
        valleys.append(x)
    else:
        continue
#establishing points
a = peaks
b = valleys
plt.plot(a, b, 'ro')
plt.axis([ 0, 1024, -1000, 1000])
plt.title("Peaks and Valleys")
#final
clear()
plt.show()
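The `KeyError: -331` above is consistent with `for x in data` yielding the data values rather than their positions, so `data[x]` ends up looking up a value as if it were an index. Here is a minimal sketch that loops over positions instead, assuming a plain list of numbers (for a pandas column, `data.tolist()` would give one). The function name `find_peaks_and_valleys` is hypothetical, the sample data is made up, and the two endpoints are skipped for simplicity:

```python
def find_peaks_and_valleys(values):
    """Return (peaks, valleys) as lists of (position, value) pairs."""
    peaks, valleys = [], []
    # start at 1 and stop one short of the end so both neighbours always exist
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            peaks.append((i, values[i]))
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            valleys.append((i, values[i]))
    return peaks, valleys


sample = [0, 5, -331, 4, 9, 2, 3]  # made-up data
peaks, valleys = find_peaks_and_valleys(sample)
print(peaks)
print(valleys)
```

Keeping the position together with the value makes it possible to plot peaks and valleys as two separate point series against their positions, instead of plotting one list against the other.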
I'm trying to use a slider with a callback in Bokeh using Python 3 to filter the rows of my ColumnDataSource objects (which originate from a DataFrame). More specifically, if a slider with options of 0 to 10000000 (in multiples of 1 million) returns a value N of say 2000000, then I want my plot to only show the data for, in this case, US counties where the population is >= 2000000. Below is my code. Everything works as I want it to except for the slider callback.
from bokeh.io import curdoc
from bokeh.layouts import layout
from bokeh.models import HoverTool, ColumnDataSource, Select, Slider
from bokeh.plotting import figure
TOOLS='pan,wheel_zoom,box_zoom,reset,tap,save,box_select,lasso_select'
source1 = ColumnDataSource(df[df.winner == 'Democratic'])
source2 = ColumnDataSource(df[df.winner == 'Republican'])
hover = HoverTool(
    tooltips = [
        ('County Name', '@county'),
        ('Population', '@population'),
        ('Land Area', '@land_area'),
        ('Pop. Density', '@density'),
        ('Winning Party', '@winner'),
        ('Winning Vote %', '@winning_vote_pct'),
    ]
)
# Plot
plot = figure(plot_width=800, plot_height=450, tools=[hover, TOOLS],
title='2016 US Presidential Vote % vs. Population Density (by County)',
x_axis_label='Vote %', y_axis_label='Population Density (K / sq. mi.)')
y = 'density'
size = 'bokeh_size'
alpha = 0.5
c1 = plot.circle(x='pct_d', y=y, size=size, alpha=alpha, color='blue',
legend='Democratic-Won County', source=source1)
c2 = plot.circle(x='pct_r', y=y, size=size, alpha=alpha, color='red',
legend='Republican-Won County', source=source2)
plot.legend.location = 'top_left'
# Select widget
party_options = ['Show both parties', 'Democratic-won only', 'Republican-won only']
menu = Select(options=party_options, value='Show both parties')
# Slider widget
N = 2000000
slider = Slider(start=0, end=10000000, step=1000000, value=N, title='Population Cutoff')
# Select callback
def select_callback(attr, old, new):
    if menu.value == 'Democratic-won only': c1.visible=True; c2.visible=False
    elif menu.value == 'Republican-won only': c1.visible=False; c2.visible=True
    elif menu.value == 'Show both parties': c1.visible=True; c2.visible=True
menu.on_change('value', select_callback)
# Slider callback
def slider_callback(attr, old, new):
    N = slider.value
    # NEED HELP HERE...
    source1 = ColumnDataSource(df.loc[(df.winner == 'Democratic') & (df.population >= N)])
    source2 = ColumnDataSource(df.loc[(df.winner == 'Republican') & (df.population >= N)])
slider.on_change('value', slider_callback)
# Arrange plots and widgets in layouts
layout = layout([menu, slider],
[plot])
curdoc().add_root(layout)
Here is a solution using CustomJSFilter and CDSView, as suggested in the other answer by Alex. It does not directly use the data supplied in the question, but is rather a general hint at how this can be implemented:
from bokeh.layouts import column
from bokeh.models import CustomJS, ColumnDataSource, Slider, CustomJSFilter, CDSView
from bokeh.plotting import Figure, show
import numpy as np
# Create some data to display
x = np.arange(200)
y = np.random.random(size=200)
source = ColumnDataSource(data=dict(x=x, y=y))
plot = Figure(plot_width=400, plot_height=400)
# Create the slider that modifies the filtered indices
# I am just creating one that shows 0 to 100% of the existing data rows
slider = Slider(start=0., end=1., value=1., step=.01, title="Percentage")
# This callback is crucial, otherwise the filter will not be triggered when the slider changes
callback = CustomJS(args=dict(source=source), code="""
source.change.emit();
""")
slider.js_on_change('value', callback)
# Define the custom filter to return the indices from 0 to the desired percentage of total data rows. You could also compare against values in source.data
js_filter = CustomJSFilter(args=dict(slider=slider, source=source), code=f"""
desiredElementCount = slider.value * 200;
return [...Array(desiredElementCount).keys()];
""")
# Use the filter in a view
view = CDSView(source=source, filters=[js_filter])
plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6, view=view)
layout = column(slider, plot)
show(layout)
I hope this helps anyone who stumbles upon this in the future! Tested in bokeh 1.0.2
A quick solution with minimal change to your code would be:
def slider_callback(attr, old, new):
    N = new # this works also with slider.value but new is more explicit
    new1 = ColumnDataSource(df.loc[(df.winner == 'Democratic') & (df.population >= N)])
    new2 = ColumnDataSource(df.loc[(df.winner == 'Republican') & (df.population >= N)])
    source1.data = new1.data
    source2.data = new2.data
When updating data sources, you should replace the data, not the whole object. Here I still create a new ColumnDataSource as a shortcut. A more direct (but also more verbose) way would be to create the dictionary from the filtered df's columns:
new1 = {
    'winner': filtered_df.winner.values,
    'pct_d': filtered_df.pct_d.values,
    ...
}
new2 = {...}
source1.data = new1
source2.data = new2
Note that there's another solution which would make the callback local (not server-based) by using a CDSView with a CustomJSFilter. You can also write the other callback with a CDSView to make the plot completely server-independent.
I'm trying to code a function that plots the error of the composite trapezoidal rule against the step size.
Obviously this doesn't look too good since I'm just starting to learn these things.
Anyhow, I managed to get the plot and everything, but I'm supposed to get a plot with slope 2, so I need help figuring out where I went wrong.
from scipy import *
from pylab import *
from matplotlib import *
def f(x): #define function to integrate
    return exp(x)
a=int(input("Integrate from? ")) #input for a
b=int(input("to? ")) #input for b
n=1
def ctrapezoidal(f,a,b,n): #define the function ctrapezoidal
    h=(b-a)/n #assign h
    s=0 #clear sum1-value
    for i in range(n): #create the sum of the function
        s+=f(a+((i)/n)*(b-a)) #iterate the sum
    I=(h/2)*(f(a)+f(b))+h*s #the function + the sum
    return (I, h) #returns the approximation of the integral
val=[] #start list
stepsize=[]
error=[]
while len(val)<=2 or abs(val[-1]-val[-2])>1e-2:
    I, h=ctrapezoidal(f,a,b,n)
    val.append(I)
    stepsize.append(h)
    n+=1
for i in range(len(val)):
    error.append(abs(val[i]-(e**b-e**a)))
error=np.array(error)
stepsize=np.array(stepsize)
plt.loglog(stepsize, error, basex=10, basey=10)
plt.grid(True,which="both",ls="steps")
plt.ylabel('error')
plt.xlabel('h')
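One likely cause of the wrong slope: the loop in `ctrapezoidal` starts at `i = 0`, so `f(a)` is added to the interior sum on top of the endpoint term, which would leave an error proportional to h rather than h². Here is a minimal sketch with the interior sum running over i = 1, ..., n-1 only, assuming the integrand exp(x) on [0, 1]. The names `composite_trapezoid` and `slope` are hypothetical, and the slope is estimated from just two step sizes instead of a full plot:

```python
import math


def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals; returns (integral, h)."""
    h = (b - a) / n
    # interior points only: i = 1, ..., n-1
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior), h


a, b = 0.0, 1.0
exact = math.exp(b) - math.exp(a)

# compare the error at two step sizes to estimate the slope on a log-log plot
approx_1, h_1 = composite_trapezoid(math.exp, a, b, 10)
approx_2, h_2 = composite_trapezoid(math.exp, a, b, 100)
error_1, error_2 = abs(approx_1 - exact), abs(approx_2 - exact)
slope = math.log(error_1 / error_2) / math.log(h_1 / h_2)
print(slope)
```

With the corrected sum, the estimated slope should come out close to 2, which is the convergence order expected for the composite trapezoidal rule on a smooth integrand.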