I have simple Pandas DataFrame with 3 columns. I am trying to Transpose it into and then rename that new dataframe and I am having bit trouble.
df = pd.DataFrame({'TotalInvoicedPrice': [123],
'TotalProductCost': [18],
'ShippingCost': [5]})
I tried using
df =df.T
which transpose the DataFrame into:
TotalInvoicedPrice,123
TotalProductCost,18
ShippingCost,5
So now i have to add column names to this data frame "Metrics" and "Values"
I tried using
df.columns["Metrics","Values"]
but im getting errors.
What I need to get is DataFrame that looks like:
Metrics Values
0 TotalInvoicedPrice 123
1 TotalProductCost 18
2 ShippingCost 5
Let's reset the index then set the column labels
df.T.reset_index().set_axis(['Metrics', 'Values'], axis=1)
Metrics Values
0 TotalInvoicedPrice 123
1 TotalProductCost 18
2 ShippingCost 5
Maybe you can avoid transpose operation (little performance overhead)
#YOUR DATAFRAME
df = pd.DataFrame({'TotalInvoicedPrice': [123],
'TotalProductCost': [18],
'ShippingCost': [5]})
#FORM THE LISTS FROM YOUR COLUMNS AND FIRST ROW VALUES
l1 = df.columns.values.tolist()
l2 = df.iloc[0].tolist()
#CREATE A DATA FRAME.
df2 = pd.DataFrame(list(zip(l1, l2)),columns = ['Metrics', 'Values'])
print(df2)
Related
I have a dataframe df as below.
I want the final dataframe to be like this as follows. i.e, for each unique Name only last 2 rows must be present in the final output.
i tried the following snippet but its not working.
df = df[df['Name']].tail(2)
Use GroupBy.tail:
df1 = df.groupby('Name').tail(2)
Just one more way to solve this using GroupBy.nth:
df1 = df.groupby('Name').nth([-1,-2]) ## this will pick the last 2 rows
I am new to python and doing a time series analysis of stocks.I created a data frame of rolling average of 5 stocks according to their percentage change in close price.Therefore this df has 5 columns and i have another df index rolling average of percentage change of closing price.I want to plot individual stock column of the df with the index df. I wrote this code
fig.add_subplot(5,1,1)
plt.plot(pctchange_RA['HUL'])
plt.plot(N50_RA)
fig.add_subplot(5,1,2)
plt.plot(pctchange_RA['IRCON'])
plt.plot(N50_RA)
fig.add_subplot(5,1,3)
plt.plot(pctchange_RA['JUBLFOOD'])
plt.plot(N50_RA)
fig.add_subplot(5,1,4)
plt.plot(pctchange_RA['PVR'])
plt.plot(N50_RA)
fig.add_subplot(5,1,5)
plt.plot(pctchange_RA['VOLTAS'])
plt.plot(N50_RA)
NOTE:pctchange_RA is a pandas df of 5 stocks and N50_RA is a index df of one column
You can put your column names in a list and then just loop over it and create subplots dynamically. A pseudocode would look like the following
cols = ['HUL', 'IRCON', 'JUBLFOOD', 'PVR', 'VOLTAS']
for i, col in enumerate(cols):
ax = fig.add_subplot(5, 1, i+1)
ax.plot(pctchange_RA[col])
ax.plot(N50_RA)
I have a dataframe with 1M+ rows. A sample of the dataframe is shown below:
df
ID Type File
0 123 Phone 1
1 122 Computer 2
2 126 Computer 1
I want to split this dataframe based on Type and File. If the total count of Type is 2 (Phone and Computer), total number of files is 2 (1,2), then the total number of splits will be 4.
In short, total splits is as given below:
total_splits=len(set(df['Type']))*len(set(df['File']))
In this example, total_splits=4. Now, I want to split the dataframe df in 4 based on Type and File.
So the new dataframes should be:
df1 (having data of type=Phone and File=1)
df2 (having data of type=Computer and File=1)
df3 (having data of type=Phone and File=2)
df4 (having data of type=Computer and File=2)
The splitting should be done inside a loop.
I know we can split a dataframe based on one condition (shown below), but how do you split it based on two ?
My Code:
data = {'ID' : ['123', '122', '126'],'Type' :['Phone','Computer','Computer'],'File' : [1,2,1]}
df=pd.DataFrame(data)
types=list(set(df['Type']))
total_splits=len(set(df['Type']))*len(set(df['File']))
cnt=1
for i in range(0,total_splits):
for j in types:
locals()["df"+str(cnt)] = df[df['Type'] == j]
cnt += 1
The result of the above code gives 2 dataframes, df1 and df2. df1 will have data of Type='Phone' and df2 will have data of Type='Computer'.
But this is just half of what I want to do. Is there a way we can make 4 dataframes here based on 2 conditions ?
Note: I know I can first split on 'Type' and then split the resulting dataframe based on 'File' to get the output. However, I want to know of a more efficient way of performing the split instead of having to create multiple dataframes to get the job done.
EDIT
This is not a duplicate question as I want to split the dataframe based on multiple column values, not just one!
You can make do with groupby:
dfs = {}
for k, d in df.groupby(['Type','File']):
type, file = k
# do want ever you want here
# d is the dataframe corresponding with type, file
dfs[k] = d
You can also create a mask:
df['mask'] = df['File'].eq(1) * 2 + df['Type'].eq('Phone')
Then, for example:
df[df['mask'].eq(0)]
gives you the first dataframe you want, i.e. Type==Phone and File==1, and so on.
I have several dataframes that I have concatenated with pandas in the line:
xspc = pd.concat([df1,df2,df3], axis = 1, join_axes = [df3.index])
In df2 the index values read one day later than the values of df1, and df3. So for instance when the most current date is 7/1/19 the index values for df1 and df3 will read "7/1/19" while df2 reads '7/2/19'. I would like to be able to concatenate each series so that each dataframe is joined on the most recent date, so in other words I would like all the dataframe values from df1 index value '7/1/19' to be concatenated with dataframe 2 index value '7/2/19' and dataframe 3 index value '7/1/19'. When methods can I use to shift the data around to join on these not matching index values?
You can reset the index of the data frame and then concat the dataframes
df1=df1.reset_index()
df2=df2.reset_index()
df3=df3.reset_index()
df_final = pd.concat([df1,df2,df3],axis=1, join_axes=[df3.index])
This should work since you mentioned that the date in df2 will be one day after df1 or df3
I want to have a dataframe with repeated values with the same id number. But i want to split the repeated rows into colunms.
data = [[10450015,4.4],[16690019 4.1],[16690019,4.0],[16510069 3.7]]
df = pd.DataFrame(data, columns = ['id', 'k'])
print(df)
The resulting dataframe would have n_k (n= repated values of id rows). The repeated id gets a individual colunm and when it does not have repeated id, it gets a 0 in the new colunm.
data_merged = {'id':[10450015,16690019,16510069], '1_k':[4.4,4.1,3.7], '2_k'[0,4.0,0]}
print(data_merged)
Try assiging the column idx ref, using DataFrame.assign and groupby.cumcount then DataFrame.pivot_table. Finally use a list comprehension to sort column names:
df_new = (df.assign(col=df.groupby('id').cumcount().add(1))
.pivot_table(index='id', columns='col', values='k', fill_value=0))
df_new.columns = [f"{x}_k" for x in df_new.columns]
print(df_new)
1_k 2_k
id
10450015 4.4 0
16510069 3.7 0
16690019 4.1 4