Plot Bar graph with multiple series - python-3.x

I currently have three dictionaries with the same keys but different values, one for each of three years. I am trying to compare the years and would like to plot all three series on the same axes. I have the following code:
import pandas as pd
dt = {'2018':nnum.values, '2019':nnum2.values, '2020':nnum3.values,}
df1=pd.DataFrame(dt,index=nnum.keys())
df1.plot.bar()
I am trying to do something along these lines, since each dictionary contains the same keys, but I keep getting an error. Is there any way to set the index to the keys of the dictionary without typing them out manually?

Call values() with parentheses and convert each result to a list, so that dt becomes a dictionary of lists:
nnum = {1:4,5:3,7:3}
nnum2 = {7:8,9:1,1:0}
nnum3 = {0:7,4:3,8:5}
dt = {'2018': list(nnum.values()), '2019':list(nnum2.values()), '2020':list(nnum3.values())}
df1=pd.DataFrame(dt,index=nnum.keys())
print(df1)
   2018  2019  2020
1     4     8     7
5     3     1     3
7     3     0     5
df1.plot.bar()
EDIT: If the dictionaries have different lengths and the missing values should be filled with 0, it is possible to use:
nnum = {1:4,5:3,7:3}
nnum2 = {7:8,9:1,1:0}
nnum3 = {0:7,4:3}
from itertools import zip_longest
L = [list(nnum.values()), list(nnum2.values()), list(nnum3.values())]
L = list(zip_longest(*L, fillvalue=0))
df1 = pd.DataFrame(L,index=nnum.keys(), columns=['2018','2019','2020'])
print(df1)
   2018  2019  2020
1     4     8     7
5     3     1     3
7     3     0     0
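Note that both snippets above pair the values by position, not by key. As a possible alternative, here is a minimal sketch that assumes alignment by key is what is wanted: passing a dictionary of dictionaries to pd.DataFrame uses the inner keys as the index, so nothing has to be typed out manually. The fillna(0) and sort_index() steps are choices made for this illustration, not part of the original answer.

```python
import pandas as pd

nnum = {1: 4, 5: 3, 7: 3}
nnum2 = {7: 8, 9: 1, 1: 0}
nnum3 = {0: 7, 4: 3}

# Outer keys become columns, inner keys become the (aligned) index
df1 = pd.DataFrame({'2018': nnum, '2019': nnum2, '2020': nnum3})

# Keys missing in a given year show up as NaN; fill them with 0 here
df1 = df1.fillna(0).astype(int).sort_index()
print(df1)
```

Calling df1.plot.bar() on this frame should then draw one group of bars per key, with one bar per year.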

Related

Combine dataframe within the list to form a single dataframe using pandas in python [duplicate]

Let's say I have a list df_list containing three single-column pandas DataFrames, as below:
>>> df_list
[ A
0 1
1 2
2 3, B
0 4
1 5
2 6, C
0 7
1 8
2 9]
I would like to merge them to become a single dataframe dat as below:
>>> dat
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
One way I can get it done is to create a blank DataFrame and concatenate each of them to it using a for loop:
dat = pd.DataFrame([])
for i in range(0, len(df_list)):
    dat = pd.concat([dat, df_list[i]], axis = 1)
Is there a more efficient way to achieve this without using iteration? Thanks in advance.
Use concat with list of DataFrames:
dat = pd.concat(df_list, axis = 1)
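For completeness, a small self-contained sketch of that call, rebuilding df_list from the question (the construction of the three frames is an assumption about how the list was made):

```python
import pandas as pd

# Three single-column DataFrames sharing the default 0..2 index
df_list = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [4, 5, 6]}),
    pd.DataFrame({'C': [7, 8, 9]}),
]

# axis=1 places the frames side by side, aligning rows on the index
dat = pd.concat(df_list, axis=1)
print(dat)
```

Because concat aligns on the index, frames with different indexes would produce NaN in the rows that do not match.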

Python: dictionaries of different dimensions to excel

Is there an efficient way to write dictionaries of different dimensions to Excel using pandas?
Example:
import pandas as pd
mylist1=[7,8,'woo']
mylist2=[[1,2,3],[4,5,6],['foo','boo','doo']]
d=dict(y=mylist1,x=mylist2)
df=pd.DataFrame.from_dict(d, orient='index').transpose().fillna('')
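# The context manager saves and closes the file on exit
# (writer.save() is no longer available in recent pandas versions)
with pd.ExcelWriter('output.xlsx', engine = 'xlsxwriter') as writer:
    df.to_excel(writer)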
The current results,
The desired results,
Please note that my database is much bigger than this simple example. So a generic answer would be appreciated.
You can fix your dataframe first before exporting to excel:
df=pd.DataFrame.from_dict(d, orient='index').transpose()
df = pd.concat([df["y"],pd.DataFrame(df["x"].tolist(),columns=list("x"*len(df["x"])))],axis=1)
Or do it upstream:
df = pd.DataFrame([[a, *b] for a,b in zip(mylist1, mylist2)],columns=list("yxxx"))
Both yield the same result:
     y    x    x    x
0    7    1    2    3
1    8    4    5    6
2  woo  foo  boo  doo
First get the data into the appropriate format, then save it to Excel:
df = df.join(df.x.apply(pd.Series)).drop(columns='x')
df.columns = list('yxxx')
df
     y    x    x    x
0    7    1    2    3
1    8    4    5    6
2  woo  foo  boo  doo
For dynamic column names:
df.columns = ['y'] + list('x' * (len(df.columns)-1))
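A more generic variant might look like the sketch below. It assumes the inner lists of mylist2 may vary in length and derives the number of 'x' columns from the widest one instead of hard-coding "yxxx". The variable name expanded is made up for this illustration.

```python
import pandas as pd

mylist1 = [7, 8, 'woo']
mylist2 = [[1, 2, 3], [4, 5, 6], ['foo', 'boo', 'doo']]

# One row per inner list; shorter lists would be padded with missing values
expanded = pd.DataFrame(mylist2)

# Name every expanded column 'x', however many there are
expanded.columns = ['x'] * expanded.shape[1]

# Put the 'y' values in front of the expanded 'x' columns
df = pd.concat([pd.Series(mylist1, name='y'), expanded], axis=1)
print(df)
```

The resulting frame can then be passed to to_excel as in the question.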

How can I delete useless strings by index from a Pandas DataFrame defining a function?

I have a DataFrame named 'traj', as follows:
x y z
0 5 3 4
1 4 2 8
2 1 1 7
3 Some string here
4 This is spam
5 5 7 8
6 9 9 7
... #continues repeatedly a lot with the same strings here in index 3 and 4
79 4 3 3
80 Some string here
I'm defining a function to delete the useless strings located at certain index positions in the DataFrame. Here is what I'm trying:
def spam(names,df): # names is a list containing, for instance, "Some" and "This" from 'traj'
    return df.drop(index = ([traj[(traj.iloc[:,0] == n)].index for n in names]))
But when I call it, it returns the error:
traj_clean = spam(my_list_of_names, traj)
...
KeyError: '[(3,4,...80)] not found in axis'
If I try alone:
traj.drop(index = ([traj[(traj.iloc[:,0] == 'Some')].index for n in names]))
it works.
I solved it in a different way:
df = traj[~traj[:].isin(names)].dropna()
Here names is a list of the terms you wish to delete, and df will contain only the rows without these terms.
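A reusable function along these lines could be sketched as follows. It assumes the stray text is split across the columns so that its first word ("Some", "This") lands in the first column; the sample data and the function name drop_rows_starting_with are invented for this illustration. Unlike the attempt in the question, it uses the df argument rather than the global traj and builds a single boolean mask instead of a list of Index objects.

```python
import pandas as pd

# Small stand-in for 'traj': rows 2 and 3 hold leftover text
traj = pd.DataFrame({
    'x': [5, 4, 'Some', 'This', 5],
    'y': [3, 2, 'string', 'is', 7],
    'z': [4, 8, 'here', 'spam', 8],
})
names = ['Some', 'This']


def drop_rows_starting_with(df, names):
    """Return df without the rows whose first column is one of names."""
    mask = df.iloc[:, 0].isin(names)
    return df[~mask]


traj_clean = drop_rows_starting_with(traj, names)
print(traj_clean)
```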

How to specify a random seed while using Python's numpy random choice?

I have a list of four strings. In a pandas DataFrame I want to create a variable by randomly selecting a value from this list for each row. I am using NumPy's random.choice, but according to its documentation there is no seed option. How can I specify a random seed so that the random assignment is the same every time?
service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
df['SERVICE_CODE'] = [np.random.choice(service_code_options ) for i in df.index]
You need to set the seed beforehand with numpy.random.seed. The list comprehension is also unnecessary, because numpy.random.choice accepts a size parameter:
np.random.seed(123)
df = pd.DataFrame({'a':range(10)})
service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
df['SERVICE_CODE'] = np.random.choice(service_code_options, size=len(df))
print (df)
   a SERVICE_CODE
0  0       13.59P
1  1       12.42R
2  2       13.59P
3  3       13.59P
4  4      899.59O
5  5       13.59P
6  6       13.59P
7  7       12.42R
8  8      204.68L
9  9       13.59P
See the documentation for numpy.random.seed:
np.random.seed(this_is_my_seed)
The seed can be an integer or a list of integers:
np.random.seed(300)
Or
np.random.seed([3, 1415])
Example
np.random.seed([3, 1415])
service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
np.random.choice(service_code_options, 3)
array(['899.59O', '204.68L', '13.59P'], dtype='<U7')
Notice that I passed a 3 to the choice function to specify the size of the array.
See also the documentation for numpy.random.choice.
According to the notes on numpy.random.seed in the NumPy v1.24 documentation:
Best practice is to use a dedicated Generator instance rather than the random variate generation methods exposed directly in the random module.
Such a Generator is constructed using np.random.default_rng.
Thus, instead of np.random.seed, the current best practice is to use a np.random.default_rng with a seed to construct a Generator, which can be further used for reproducible results.
Combining jezrael's answer and the current best practice, we have:
import pandas as pd
import numpy as np
rng = np.random.default_rng(seed=121)
df = pd.DataFrame({'a':range(10)})
service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
df['SERVICE_CODE'] = rng.choice(service_code_options, size=len(df))
print(df)
   a SERVICE_CODE
0  0       12.42R
1  1       13.59P
2  2       12.42R
3  3       12.42R
4  4      899.59O
5  5      204.68L
6  6      204.68L
7  7       13.59P
8  8       12.42R
9  9       13.59P
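To check that a seeded Generator really gives the same assignment every time, a quick sketch is to build two generators from the same seed and compare their draws (the names first and second are only for this illustration):

```python
import numpy as np

service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']

# Two independent generators created from the same seed
first = np.random.default_rng(seed=121).choice(service_code_options, size=10)
second = np.random.default_rng(seed=121).choice(service_code_options, size=10)

# Same seed, same sequence of draws
print((first == second).all())  # True
```

Each Generator carries its own state, so unlike np.random.seed this does not affect other code that uses NumPy's global random functions.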

How do I copy to a range, rather than a list, of columns?

I am looking to append several columns to a dataframe.
Let's say I start with this:
import pandas as pd
dfX = pd.DataFrame({'A': [1,2,3,4],'B': [5,6,7,8],'C': [9,10,11,12]})
dfY = pd.DataFrame({'D': [13,14,15,16],'E': [17,18,19,20],'F': [21,22,23,24]})
I am able to append the dfY columns to dfX by defining the new columns in list form:
dfX[[3,4]] = dfY.iloc[:,1:3].copy()
...but I would rather do so this way:
dfX.iloc[:,3:4] = dfY.iloc[:,1:3].copy()
The former works! The latter executes, returns no errors, but does not alter dfX.
Are you looking for this?
dfX = pd.concat([dfX, dfY], axis = 1)
It returns
   A  B   C   D   E   F
0  1  5   9  13  17  21
1  2  6  10  14  18  22
2  3  7  11  15  19  23
3  4  8  12  16  20  24
You can append several DataFrames in the same way, e.g. pd.concat([dfX, dfY, dfZ], axis = 1).
If you need to append, say, only columns D and E from dfY to dfX, use:
pd.concat([dfX, dfY[['D', 'E']]], axis = 1)
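On why the iloc assignment in the question changes nothing: dfX has only three columns, so dfX.iloc[:, 3:4] selects an empty range, and iloc cannot add new columns. If the goal is to pick the source columns by position rather than by name, one possible sketch is to slice dfY with iloc and still let concat do the appending:

```python
import pandas as pd

dfX = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
dfY = pd.DataFrame({'D': [13, 14, 15, 16], 'E': [17, 18, 19, 20], 'F': [21, 22, 23, 24]})

# The target slice is empty: dfX has no column at position 3
print(dfX.iloc[:, 3:4].shape)  # (4, 0)

# Select source columns by position (here E and F), then append them
result = pd.concat([dfX, dfY.iloc[:, 1:3]], axis=1)
print(result)
```

The appended columns keep their original names from dfY.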
