How to visualize the list of lists? - python-3.x

I have two lists
l1 = [[['Arsenal F.C.']],[['Chelsea F.C.']],[['FC Barcelona']], [['FC Barcelona'], ['NFL']],[['Formula E']], [['Formula E'], ['NBA']], [['Hashtag United F.C.']],[['India National Cricket Team']], [['J1 League']],[['Liverpool F.C.']],[['Manchester United F.C.']], [['Manchester United F.C.'], ['LaLiga']], [['Manchester United F.C.'], ['LaLiga'], ['Real Madrid C.F.']]]
l2 = [2, 1, 5, 1, 4, 1, 1, 2, 1, 1, 3, 1, 1]
l1 holds the team names, grouped into combinations, and l2 holds how often each combination occurs. I want to visualize this with something like a bar chart where the x-axis has the team names and the y-axis has the respective frequencies. My code looks like the following:
fig,ax = plt.subplots(figsize=(30,12))
_ = ax.set_title("combination searched")
_ = ax.bar(l1,l2)
_ = ax.set_xlabel("teams")
_ = ax.set_ylabel("No of times combination is searched")
plt.show()
I also wanted to get the teams as xticks, but I got an error while plotting.
I got the following error:
TypeError: the dtypes of parameters x (object) and width (float64) are incompatible

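This is working for me now:
import matplotlib.pyplot as plt
# `unique` and `counts` are not defined in this post; presumably they hold the
# team combinations and their frequencies (l1 and l2 in the question).
l3 = [str(v) for v in unique]  # turn each nested list into a plain string label
plt.figure(figsize = (50,30))
plt.barh(l3,counts)
plt.show()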
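The fix above works because each nested list becomes a plain string, which matplotlib can treat as a category. A minimal sketch of the same idea with more readable labels follows. The helper names `combo_label` and `plot_counts` are made up for illustration, and joining the names with " + " is only one possible choice.

```python
l1 = [[['Arsenal F.C.']], [['Chelsea F.C.']], [['FC Barcelona']],
      [['FC Barcelona'], ['NFL']], [['Formula E']], [['Formula E'], ['NBA']],
      [['Hashtag United F.C.']], [['India National Cricket Team']],
      [['J1 League']], [['Liverpool F.C.']], [['Manchester United F.C.']],
      [['Manchester United F.C.'], ['LaLiga']],
      [['Manchester United F.C.'], ['LaLiga'], ['Real Madrid C.F.']]]
l2 = [2, 1, 5, 1, 4, 1, 1, 2, 1, 1, 3, 1, 1]


def combo_label(combo):
    """Join all team names in one nested combination into a single string."""
    return " + ".join(team for group in combo for team in group)


# One string label per combination, in the same order as l2
labels = [combo_label(combo) for combo in l1]


def plot_counts(labels, counts):
    """Draw a horizontal bar chart of counts per label and return the figure."""
    # Imported here so the label logic above can be used without matplotlib
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(12, 8))
    ax.barh(labels, counts)  # string labels are handled as categories
    ax.set_title("combination searched")
    ax.set_xlabel("No of times combination is searched")
    ax.set_ylabel("teams")
    fig.tight_layout()
    return fig
```

Calling `plot_counts(labels, l2)` followed by `plt.show()` should display the chart. Horizontal bars are used, as in the answer, so that long team names stay legible.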

Related

Python optimization of time-series data re-indexing based on multiple-parameter multi-variable input and singular value output

I am trying to optimize a function that maximizes the correlation between two (pandas) time series arrays (X and Y). This is done by using three parameters (a, b, c) and a third time series array (Z). The Z array is used to reindex the values in the X array (based on the parameters a, b, c) in such a way as to maximize the correlation of the reindexed X array (Xnew) with the Y array.
Below is some pseudo-code to demonstrate what I am trying to do. I have attempted this using LMfit and scipy optimize, but I am not sure how to make this task work in those packages. For example, in LMfit, if I try to minimize the MyOpt function (which passes back a single value of the correlation metric), it complains that I have more parameters than outputs. However, if I pass back the time series of the correlation metric (diff), then the parameter values remain fixed at their input values.
I know the reindexing function I am using works, because crude methods similar to the code below give significant changes in the mean (diff) metric passed back.
My knowledge of these optimization packages is not up to scratch for this job, so if anyone has a suggestion on how to tackle this, I would be grateful.
import numpy as np
import pandas as pd
def GetNewIndex(Z, a, b ,c):
    old_index = np.arange(0, len(Z))
    index_adj = some_func(a,b,c)
    new_index = old_index + index_adj
    # clip the shifted indices to the valid range
    max_old = np.max(old_index)
    new_index[new_index > max_old] = max_old
    new_index[new_index < 0] = 0
    return new_index
def MyOpt(params, X, Y ,Z):
    a = params['A']
    b = params['B']
    c = params['C']
    # estimate lag (in samples) based on ambient RH
    new_index = GetNewIndex(Z, a, b, c)
    # assign old values to new locations and convert back to pandas series
    Xnew = np.take(X.values, new_index)
    Xnew = pd.Series(Xnew, index=X.index)
    cc = Y.rolling(1201, center=True).corr(Xnew)
    cc = cc.interpolate(limit_direction='both', limit_area=None)
    diff = 1-np.abs(cc)
    return np.mean(diff)
#==================================================
X = some long pandas time series data
Y = some long pandas time series data
Z = some long pandas time series data
As = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
Bs = [0, 0 ,0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
Cs = [5, 6, 5, 6, 5, 6, 5, 6, 5, 6, 5, 6]
outs = []
for A, B, C in zip(As, Bs, Cs):
    params={'A':A, 'B':B, 'C':C}
    out = MyOpt(params, X, Y, Z)
    outs.append(out)
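The crude loop above is effectively a small grid search. A minimal sketch of that idea, written so that it also reports the best parameter set, is given below. `grid_search` and `toy_objective` are hypothetical names that do not appear in the post, and the toy objective is only a stand-in for MyOpt so that the example runs without the real X, Y and Z series.

```python
from itertools import product


def grid_search(objective, grid):
    """Evaluate objective(params) on every combination of the grid values.

    Returns the parameter dict with the smallest objective value and that value.
    """
    names = list(grid)
    best_params, best_value = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params):
    """Hypothetical stand-in for MyOpt: smallest at A=2, B=1, C=5."""
    return (params["A"] - 2) ** 2 + (params["B"] - 1) ** 2 + (params["C"] - 5) ** 2


grid = {"A": [1, 2], "B": [0, 1], "C": [5, 6]}
best_params, best_value = grid_search(toy_objective, grid)
print(best_params, best_value)
```

With the real data, the objective would be something like `lambda p: MyOpt(p, X, Y, Z)`. A grid search needs no gradient, so it still works if some_func yields integer index shifts, in which case small changes in a, b and c may leave the objective unchanged.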

Multiply each row of an array with coefficients in list - Python

I am very new to Python and need help. This is the problem statement:
I want to calculate the value of each of the three houses by multiplying the rows of the array X (each row representing one house) with the coefficients in list c, so for the first house: Price = (66x3000) + (5x200) + (15x-50) + (2x5000) + (500x100) = 258250
Do not use numpy
Print the price of the three houses
This is what I have so far:
# input values for three houses:
# - size [m^2],
# - size of the sauna [m^2],
# - distance to water [m],
# - number of indoor bathrooms,
# - proximity of neighbors [m]
X = [[66, 5, 15, 2, 500],
     [21, 3, 50, 1, 100],
     [120, 15, 5, 2, 1200]]
# coefficient values
c = [3000, 200 , -50, 5000, 100]
def predict(X, c):
    price = 0
    for i in range (len(X)):
        for j in range (len(X[i])):
            price += (c[j]*X[i][j])
        print(price)
predict(X, c)
The output is
258250
334350
827100.
The program adds the value of the 2nd and 3rd house to the previous result, rather than returning each house's value. How can I fix this?
Many thanks!
Move the line
    price = 0
into the outer for loop:
def predict(X, c):
    for i in range (len(X)):
        price = 0
        for j in range (len(X[i])):
            ...
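For reference, a complete version of the fix might look like the sketch below. It resets the running total for every house and, as one possible design choice that is not part of the answer, returns the prices as a list instead of printing them inside the function.

```python
X = [[66, 5, 15, 2, 500],
     [21, 3, 50, 1, 100],
     [120, 15, 5, 2, 1200]]
c = [3000, 200, -50, 5000, 100]


def predict(X, c):
    """Return one price per house: the sum of coefficient * feature per row."""
    prices = []
    for row in X:
        price = 0  # reset the running total for every house
        for coefficient, value in zip(c, row):
            price += coefficient * value
        prices.append(price)
    return prices


for price in predict(X, c):
    print(price)
# prints 258250, 76100 and 492750 on separate lines
```

The second and third values equal the differences between consecutive lines of the cumulative output shown in the question (334350 - 258250 and 827100 - 334350).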

Interactive Plot of Pandas Data-frame Color coding based on a group from a Column

I have an example pandas dataframe as follows:
day id cnt
2 catx 4
2 kagm 3
2 dyrt 5
3 catx 3
3 kagm 3
3 dyrt 4
5 catx 2
5 kagm 2
5 dyrt 2
I want to plot the scatter data cnt (y) vs day (x), where the points are labeled (colored, with a legend) based on the id column.
This is pretty simple in seaborn/matplotlib, where I know how to plot it and save the plot to a file.
However, I am looking for an interactive plot using plotly/bokeh/d3/mpld3 etc. and, finally, to put that plot at a URL (of my choice, or maybe account-based as in plotly). My goal is also to have a hover function that shows me the value of a point when I move the cursor over it.
I have tried bokeh and plotly with cufflinks, using ColumnDataSource and everything else I could think of, to get the plots. However, I have failed to get what I am looking for. Can I get some help in this direction from the experts? Thanks in anticipation.
This code plots the data the way you requested. I created a new dataframe for every category in your dataframe so the interactive legend also works. An array of hex color strings, with one entry per unique category, is generated and added to the dataframe to give every category its own color.
#!/usr/bin/python3
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import all_palettes
from bokeh.plotting import figure, output_file, show
data = {'day': [2, 2, 2, 3, 3, 3, 5, 5, 5], 'id': ['catx', 'kagm', 'dyrt', 'catx', 'kagm', 'dyrt', 'catx', 'kagm', 'dyrt'], 'cnt': [4, 3, 5, 3, 3, 4, 2, 2, 2]}
df = pd.DataFrame.from_dict(data)
output_file('plot.html')
# "@column" shows a column value; "@$name" looks up the column named by the glyph's name
tooltips = [
    ("day", "@day"),
    ("id", "@$name"),
    ("count", "@cnt")]
# Note: recent Bokeh releases use width/height instead of plot_width/plot_height
# and legend_label/legend_field/legend_group instead of legend.
p = figure(tooltips=tooltips, plot_width=800, plot_height=800)
sources = []
colors = all_palettes['Viridis'][len(set(df['id'].tolist()))]
pd.options.mode.chained_assignment = None  # Suppress false positive warning
for ID, color in zip(set(df['id'].tolist()), colors):
    dfSubset = df.loc[df['id'] == ID]
    dfSubset['color'] = color
    sources.append(ColumnDataSource(dfSubset))
    p.circle(x = 'day', y = 'cnt', legend = 'id', color = 'color', name = 'id', alpha = 0.5, size = 15, source = sources[-1])
p.legend.click_policy="hide"
show(p)

Is there a way to reverse the order of the y axis in heatmap of plotly

So I have
hours = [x for x in range(7,18)]
columns = [1, 2, 3, 4, 5]
matrixDatos = [[0,1,0,1,0],
[0,1,0,1,1],
[2,3,2,3,2],
[2,3,2,3,3],
[4,5,4,5,4],
[4,5,4,5,5],
[6,7,6,7,6],
[6,7,6,7,7],
[8,9,8,9,8],
[8,9,8,9,8]
]
table = ff.create_table(matrixDatos)
fig = ff.create_annotated_heatmap(matrixDatos, x=columns, y=hours, colorscale='Viridis')
But it prints the heatmap with the y-axis running from 18 to 7. Is there a way to print it from 7 to 18?
Hi, I tried the code provided and got an error saying that the number of y values (hours) does not equal the number of rows in z (matrixDatos). So I reduced the range to 7 through 16 for the code to work.
I used the "autorange" parameter of the yaxis object in the layout object; to reverse the axis, set it to "reversed".
Original Code (provided in question) Output:
Code Change:
import plotly.figure_factory as ff
from plotly.offline import iplot
hours = [x for x in range(7,17)]  # 10 values, one per row of matrixDatos
columns = [1, 2, 3, 4, 5]
matrixDatos = [[0,1,0,1,0],
               [0,1,0,1,1],
               [2,3,2,3,2],
               [2,3,2,3,3],
               [4,5,4,5,4],
               [4,5,4,5,5],
               [6,7,6,7,6],
               [6,7,6,7,7],
               [8,9,8,9,8],
               [8,9,8,9,8]
               ]
table = ff.create_table(matrixDatos)
fig = ff.create_annotated_heatmap(matrixDatos, x=columns, y=hours, colorscale='Viridis')
fig['layout']['yaxis']['autorange'] = "reversed"  # flip the y-axis direction
iplot(fig)
Code Change Output:
I hope this is what you need.
References:
plotly layout xaxis reference
Add
fig.update_yaxes(autorange="reversed")
before showing figure.

Social Graph of common users of groups

I am trying to create a social graph using NetworkX. In theory (as I understand it) everything should work, but in practice the result is wrong.
I have information about some groups in this format:
members={'Group Name 1':[User 1 ID, User ID 2...],...,'Group Name N' : [User 1 ID,...,User K Id]}
For example:
members={'Group 1' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Group 2' : [10, 11, 12, 13, 14, 9],
'Group 3':[21,22,23,24] }
As the outcome I need a graph in which:
Vertices - social groups
Edges - the existence of common subscribers (user IDs)
Vertex size - user count
Distance between vertices - number of common subscribers (user IDs)
My code:
matrix={}
for i in members:
    for j in members:
        if i!=j:
            matrix[i+j]=len(set(members[i]) & set(members[j]))*1.0/min(len(set(members[i])),len(set(members[j])))
max_matrix = max(matrix.values())
min_matrix = min(matrix.values())
for i in matrix:
    matrix[i] = (matrix[i] - min_matrix) / (max_matrix - min_matrix)
g = networkx.Graph(directed=False)
for i in members:
    for j in members:
        if i != j:
            g.add_edge(i, j, weight=matrix[i+j])
members_count = {x:len(members[x]) for x in members}
max_value = max(members_count.values()) * 1.0
size = []
max_size = 900
min_size = 100
for node in g.nodes():
    size.append(((members_count[node]/max_value)*max_size + min_size)*10)
import matplotlib.pyplot as plt
pos=networkx.spring_layout(g)
plt.figure(figsize=(20,20))
networkx.draw_networkx(g, pos, node_size=size, width=0.5, font_size=8)
plt.axis('off')
plt.show()
But I can't understand why edges are drawn for groups that have no common IDs.
NetworkX only uses weight as an attribute of an edge. Whether an edge exists does not depend on its weight.
In other words, edges with weight 0 still count as edges and will be displayed by the drawing function.
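One way to act on this is to add an edge only when two groups actually share members. The sketch below builds such an edge list in plain Python. The function name `overlap_edges` is made up for illustration, and the min-max normalization from the question is left out for brevity.

```python
from itertools import combinations

members = {'Group 1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
           'Group 2': [10, 11, 12, 13, 14, 9],
           'Group 3': [21, 22, 23, 24]}


def overlap_edges(members):
    """Return (group_a, group_b, weight) for pairs sharing at least one user."""
    edges = []
    for a, b in combinations(members, 2):
        set_a, set_b = set(members[a]), set(members[b])
        shared = len(set_a & set_b)
        if shared > 0:  # skip pairs with no common users, so no edge is created
            weight = shared / min(len(set_a), len(set_b))
            edges.append((a, b, weight))
    return edges


edges = overlap_edges(members)
print(edges)
```

The result can then be loaded with `g = networkx.Graph()`, `g.add_nodes_from(members)` and `g.add_weighted_edges_from(edges)`. Adding the nodes explicitly keeps groups without any shared users, such as Group 3 here, in the drawing as isolated vertices.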
