I have a pandas dataframe as follows:
code title amount_1 amount_2 currency_1 currency_2
0 246 ex 500 550 USD GBP
1 300 am 200 250 USD GBP
2 315 ple 300 325 USD GBP
I'd like to get this into the format
code title amount currency
246 ex 500 USD
246 ex 550 GBP
All of the rows share the same pair of currencies. How can I get this format? I've tried melt and reset_index, but neither seemed to do exactly what I need.
Thank you
Use wide_to_long:
df1 = pd.wide_to_long(df,
                      stubnames=['amount', 'currency'],
                      i=['code', 'title'],
                      j='measure', sep='_').reset_index()
print (df1)
code title measure amount currency
0 246 ex 1 500 USD
1 246 ex 2 550 GBP
2 300 am 1 200 USD
3 300 am 2 250 GBP
4 315 ple 1 300 USD
5 315 ple 2 325 GBP
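The measure column is just a by-product of the reshape; if you want exactly the four columns from the question, you could drop it afterwards, for example:
# drop the helper 'measure' column created by wide_to_long
df1 = df1.drop(columns='measure')
print (df1.head(2))
  code title  amount currency
0  246    ex     500      USD
1  246    ex     550      GBP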
I have a numeric dataset and I want to calculate the z-score for the 'KM' column and replace the original values with the z-score values. I'm new to Python, please help.
KM CC Doors Gears Quarterly_Tax Weight Guarantee_Period
46986 2000 3 5 210 1165 3
72937 2000 3 5 210 1165 3
38500 2000 3 5 210 1170 3
31461 1800 3 6 100 1185 12
32189 1800 3 6 100 1185 3
23000 1800 3 6 100 1185 3
18739 1800 3 6 100 1185 3
34000 1800 3 5 100 1185 3
21716 1600 3 5 85 1105 18
64359 1600 3 5 85 1105 3
67660 1600 3 5 85 1105 3
43905 1600 3 5 100 1170 3
Something like this should do it for you:
from scipy import stats

# zscore operates on the whole column at once, not element-wise via apply
df["KM"] = stats.zscore(df["KM"])
x y z amount absolute_amount
121 abc def 500 500
131 fgh xyz -800 800
121 abc xyz 900 900
131 fgh ijk 800 800
141 obc pqr 500 500
151 mbr pqr -500 500
141 obc pqr -500 500
151 mbr pqr 900 900
I need to find the duplicate rows in the dataset where x and y are the same, with the conditions being:
sum(amount) !=0
abs(sum(amount)) != absolute_amount
I tried grouping them; the code I used in R works, but I need the same thing in Python:
logic1 <- tablename %>%
group_by('x','y')%>%
filter(n()>1 && sum(`amount`) != 0 && abs(sum(`amount`)) != absolute_amount)
Expected output
x y z amount absolute_amount
121 abc def 500 500
121 abc xyz 900 900
151 mbr pqr -500 500
151 mbr pqr 900 900
Use transform with groupby sum to broadcast the group sum back to each row, then compare against the two conditions you have:
c=df.groupby(['x','y'])['amount'].transform('sum')
df[c.ne(0) & c.abs().ne(df.absolute_amount)]
x y z amount absolute_amount
0 121 abc def 500 500
2 121 abc xyz 900 900
5 151 mbr pqr -500 500
7 151 mbr pqr 900 900
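The R filter also checks n() > 1; if you need that duplicate-count condition as well, one way is to add a size transform, for example:
c = df.groupby(['x','y'])['amount'].transform('sum')
n = df.groupby(['x','y'])['amount'].transform('size')
df[n.gt(1) & c.ne(0) & c.abs().ne(df.absolute_amount)]
With the sample data this returns the same four rows, because every x/y pair occurs more than once.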
I have a dataframe with three columns: Year, Product, Price. I want to calculate the minimum Price for each Year, excluding zeros, and also return the adjacent Product value for that minimum.
Data:
Year Product Price
2000 Grapes 0
2000 Apple 220
2000 pear 185
2000 Watermelon 172
2001 Orange 0
2001 Muskmelon 90
2001 Pear 165
2001 Watermelon 99
Desired output in a new dataframe:
Year Minimum Price Product
2000 172 Watermelon
2001 90 Muskmelon
First filter out 0 rows by boolean indexing:
df1 = df[df['Price'] != 0]
Then use DataFrameGroupBy.idxmin to get the index of the minimal Price per group and select those rows with loc:
df2 = df1.loc[df1.groupby('Year')['Price'].idxmin()]
An alternative is sort_values with drop_duplicates:
df2 = df1.sort_values(['Year', 'Price']).drop_duplicates('Year')
print (df2)
Year Product Price
3 2000 Watermelon 172
5 2001 Muskmelon 90
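If you also want the column names and order from the desired output, a final rename and reorder could look like this (the label 'Minimum Price' is taken from the expected output above):
df2 = (df2.rename(columns={'Price': 'Minimum Price'})
          [['Year', 'Minimum Price', 'Product']]
          .reset_index(drop=True))
print (df2)
   Year  Minimum Price     Product
0  2000            172  Watermelon
1  2001             90   Muskmelon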
If there may be multiple minimal values and you need all of them per group:
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 172
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2001 Pear 165
7 2001 Watermelon 99
df1 = df[df['Price'] != 0]
df = df1[df1['Price'].eq(df1.groupby('Year')['Price'].transform('min'))]
print (df)
Year Product Price
2 2000 pear 172
3 2000 Watermelon 172
5 2001 Muskmelon 90
EDIT: If a Year contains only zero Prices (like 2002 below), replace 0 with NaN so it never becomes the minimum, and mark the Product as 'No data':
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 185
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2002 Pear 0
7 2002 Watermelon 0
df['Price'] = df['Price'].replace(0, np.nan)
df2 = df.sort_values(['Year', 'Price']).drop_duplicates('Year')
df2['Product'] = df2['Product'].mask(df2['Price'].isnull(), 'No data')
print (df2)
Year Product Price
3 2000 Watermelon 172.0
5 2001 Muskmelon 90.0
6 2002 No data NaN
I've got a headache of a problem which I would love some help with.
So I have the following table:
Table 1:
Date Hour Volume Value Price
10/09/2018 1 10 400 40.0
10/09/2018 2 80 200 2.5
10/09/2018 3 14 190 13.6
10/09/2018 4 74 140 1.9
11/09/2018 1 34 547 16.1
11/09/2018 2 26 849 32.7
11/09/2018 3 95 279 2.9
11/09/2018 4 31 216 7.0
Then what I want to do is view the weighted average price by hour, e.g.
Hour Price
1 21.52272727
2 9.896226415
3 4.302752294
4 3.39047619
And if possible (bonus points), I'd then like to be able to restrict this by time period, e.g. each hour within specified dates.
The way it looks in Excel:
A B C D E
1 Date Hour Volume Value Price
2 10/09/2018 1 10 400 40.0
3 10/09/2018 2 80 200 2.5
4 10/09/2018 3 14 190 13.6
5 10/09/2018 4 74 140 1.9
6 11/09/2018 1 34 547 16.1
7 11/09/2018 2 26 849 32.7
8 11/09/2018 3 95 279 2.9
9 11/09/2018 4 31 216 7.0
The output should be:
Hour Price
1 21.52272727
2 9.896226415
3 4.302752294
4 3.39047619
Calculated Like:
Hour Price
1 ((E2*C2)+(E6*C6))/SUM(C2,C6)
2 ((E3*C3)+(E7*C7))/SUM(C3,C7)
3 ((E4*C4)+(E8*C8))/SUM(C4,C8)
4 ((E5*C5)+(E9*C9))/SUM(C5,C9)
I've looked at lots of weighted average questions and answers and they all make sense but I can't quite put them together the way I want.
I hope that makes sense.
Thanks guys,
I have reproduced your desired report.
There is no need to involve price in the calculations. The weighted average price is simply total value / total volume for the selected set of dates. Let's say your table is called "Data". Create a measure:
Weighted Average Price = DIVIDE( SUM(Data[Value]), SUM(Data[Volume]))
Put it into a pivot table against hours, and you are done.
The formula will work correctly for any set of dates you select. For example, in a version of the above report with both hours and dates on the pivot, you can see that it calculates prices correctly for individual dates, subtotals, and the total.
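For completeness, the same calculation is straightforward in pandas as well; a minimal sketch, assuming the table is loaded into a DataFrame df with the columns shown above and Date parsed as a datetime:
# weighted average price per hour = total Value / total Volume
g = df.groupby('Hour')[['Value', 'Volume']].sum()
print (g['Value'] / g['Volume'])

# bonus: restrict to a date range first, then group by hour
sub = df[df['Date'].between('2018-09-10', '2018-09-11')]
g = sub.groupby('Hour')[['Value', 'Volume']].sum()
print (g['Value'] / g['Volume'])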
I have two pandas df which look like this:
df1:
pid Name score age
100 Ram 3 36
101 Tony 2 40
101 Jack 4 56
200 Jill 6 30
df2
pid Name score age
100 Ram 3 36
101 Tony 2 40
101 John 4 51
101 Jack 9 32
200 Jill 6 30
Both dataframes are indexed by 'pid'. I would like to compare df1 and df2 based on the column 'score', i.e. keep only those rows of df2 that match df1 on the index and the value of score.
My expected result should be
new df2:
pid Name score age
100 Ram 3 36
101 Tony 2 40
101 John 4 51
200 Jill 6 30
Any help in this regard is highly appreciated.
Use merge on the columns pid and score, but first turn the index into a column with reset_index, then restore the pid index and reindex by df2.columns so the result has the same columns as df2:
df = (pd.merge(df1.reset_index(),
               df2.reset_index(),
               on=['score', 'pid'], how='left', suffixes=['_', ''])
        .set_index('pid')
        .reindex(columns=df2.columns))
print (df)
Name score age
pid
100 Ram 3 36
101 Tony 2 40
101 John 4 51
200 Jill 6 30
Inputs:
print (df1)
Name score age
pid
100 Ram 3 36
101 Tony 2 40
101 Jack 4 56
200 Jill 6 30
print (df2)
Name score age
pid
100 Ram 3 36
101 Tony 2 40
101 John 4 51
101 Jack 9 32
200 Jill 6 30
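An alternative sketch with the same inputs is an inner merge of df2 against just the pid/score pairs of df1, which keeps only the matching rows of df2 and gives the same result as above:
# pid/score pairs present in df1
keys = df1.reset_index()[['pid', 'score']].drop_duplicates()
df = (df2.reset_index()
         .merge(keys, on=['pid', 'score'], how='inner')
         .set_index('pid'))
print (df)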