Creating URNs based on a row ID - python-3.x

I have a pandas dataframe in which several rows share the same Site ID. I want to create a new, unique ID for each row. Currently I have a df like this:
SiteID  SomeData1  SomeData2
100001         20         30
100001         20         30
100002         30         40
I am looking to achieve the output below:
SiteID  SomeData1  SomeData2  Site_ID2
100001         20         30   1000011
100001         20         30   1000012
100002         30         40   1000021
What would be the best way to achieve this?

Create a helper Series with GroupBy.cumcount (a per-group running counter), convert it to string, and append it to column SiteID:
s = df.groupby('SiteID').cumcount().add(1)
df['Site_ID2'] = df['SiteID'].astype(str).add(s.astype(str))
print(df)
   SiteID  SomeData1  SomeData2 Site_ID2
0  100001         20         30  1000011
1  100001         20         30  1000012
2  100002         30         40  1000021
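For reference, a minimal self-contained sketch of the same approach, built from the sample data in the question:

import pandas as pd

df = pd.DataFrame({'SiteID': [100001, 100001, 100002],
                   'SomeData1': [20, 20, 30],
                   'SomeData2': [30, 30, 40]})

# cumcount numbers the rows within each SiteID group from 0; add(1) makes it 1-based
s = df.groupby('SiteID').cumcount().add(1)

# append the counter to the string form of SiteID
df['Site_ID2'] = df['SiteID'].astype(str) + s.astype(str)
print(df)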

Related

How to sum certain columns ending with a certain word of a dataframe in python pandas?

I am trying to sum the columns ending in 'Load' and 'Gen' into two new columns.
My dataframe is:
Date      A_Gen  A_Load  B_Gen  B_Load
1-1-2010     30      20     40      30
1-2-2010     45      25     35      25
The result wanted is:
Date      A_Gen  A_Load  B_Gen  B_Load  S_Gen  S_Load
1-1-2010     30      20     40      30     70      50
1-2-2010     45      25     35      25     80      50
Try using filter(like=...) to select the relevant columns, sum along axis=1, and assign your 2 new columns:
df['S_Gen'], df['S_Load'] = df.filter(like='Gen').sum(axis=1), df.filter(like='Load').sum(axis=1)
Output:
df
Out[146]:
        Date  A_Gen  A_Load  B_Gen  B_Load  S_Gen  S_Load
0 2010-01-01     30      20     40      30     70      50
1 2010-02-01     45      25     35      25     80      50
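A self-contained sketch of the same approach, using the question's data (Date left as plain strings here):

import pandas as pd

df = pd.DataFrame({'Date': ['1-1-2010', '1-2-2010'],
                   'A_Gen': [30, 45], 'A_Load': [20, 25],
                   'B_Gen': [40, 35], 'B_Load': [30, 25]})

# filter(like=...) keeps every column whose name contains the substring
df['S_Gen'] = df.filter(like='Gen').sum(axis=1)
df['S_Load'] = df.filter(like='Load').sum(axis=1)
print(df)

Note that like= matches anywhere in the name, so re-running the 'Gen' line after S_Gen exists would include it in the sum; compute each sum once, or select with a stricter pattern such as filter(regex='^[AB]_Gen') for this column naming.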

Pandas: Saving individual files for columns having the same name in two dataframes

Hello, I want to concat two dataframes which share the same column names, and save each column as an individual file named after the column.
My dataframes look like:
A1 =
name  exam1  exam2  exam3  exam4
arun      0     12     25      0
joy      20      1      0     26
jeev     30      0      0     25
B1 =
name  exam1  exam2  exam3  exam4
arun     20     26      0      0
joy      30      0     25      3
jeev     17      2     15     25
What I want as output: a different file for each column, named after the column, such as exam1.txt, exam2.txt, exam3.txt, etc. (I have a very big dataframe.) Each individual output file should look like:
example: exam1.txt
name  exam1_A1  exam1_B1
arun         0        20
joy         20        30
jeev        30        17
I tried to concat the two dataframes with pd.concat([A1, B1], axis=0) but was not able to get what I wanted. Can anyone suggest something?
You can do a loop with merge:
for col in A1.columns[1:]:
    (A1[['name', col]]
       .merge(B1[['name', col]], on='name', suffixes=('_A1', '_B1'))
       .to_csv(f'{col}.txt', index=False))
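A runnable sketch with the sample data, assuming the first column (name) is the join key:

import pandas as pd

A1 = pd.DataFrame({'name': ['arun', 'joy', 'jeev'],
                   'exam1': [0, 20, 30], 'exam2': [12, 1, 0],
                   'exam3': [25, 0, 0], 'exam4': [0, 26, 25]})
B1 = pd.DataFrame({'name': ['arun', 'joy', 'jeev'],
                   'exam1': [20, 30, 17], 'exam2': [26, 0, 2],
                   'exam3': [0, 25, 15], 'exam4': [0, 3, 25]})

for col in A1.columns[1:]:          # every exam column, skipping 'name'
    (A1[['name', col]]
       .merge(B1[['name', col]], on='name', suffixes=('_A1', '_B1'))
       .to_csv(f'{col}.txt', index=False))   # writes exam1.txt, exam2.txt, ...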

Apply multiple operations on same columns after groupby

I have the following df,
id  year_month  amount
10      201901      10
10      201901      20
10      201901      30
20      201902      40
20      201902      20
I want to groupby id and year_month and then get the group size and the sum of amount:
df.groupby(['id', 'year_month'], as_index=False)['amount'].sum()
df.groupby(['id', 'year_month'], as_index=False).size().reset_index(name='count')
I am wondering how to do both at the same time, in one line:
id  year_month  amount  count
10      201901      60      3
20      201902      60      2
Use agg:
df.groupby(['id', 'year_month']).agg({'amount': ['count', 'sum']})
              amount
               count sum
id year_month
10 201901          3  60
20 201902          2  60
If you want to remove the MultiIndex, use MultiIndex.droplevel:
s = df.groupby(['id', 'year_month']).agg({'amount': ['count', 'sum']}).rename(columns={'sum': 'amount'})
s.columns = s.columns.droplevel(level=0)
s.reset_index()
   id  year_month  count  amount
0  10      201901      3      60
1  20      201902      2      60
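Since pandas 0.25 you can also get both aggregates in one line with named aggregation, which avoids the MultiIndex entirely (a sketch, not from the original answers):

df.groupby(['id', 'year_month'], as_index=False).agg(
    amount=('amount', 'sum'),    # total per group
    count=('amount', 'size'))    # rows per group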

row subtraction in lambda pandas dataframe

I have a dataframe with multiple columns. One of them is a cumulative revenue column. If the year has not ended yet, the revenue stays constant for the rest of the period because the incoming daily revenue is 0.
The dataframe looks like this
Now I want to create a new column where each row's value is the row minus the previous row; if the result is 0, print 0 for that row in the new column, otherwise keep the row's value. The new dataframe should look like this:
My idea was to do this with the apply/lambda method, along these lines:
df['2017new'] = df['2017'].apply(lambda x: 0 if row - lastrow == 0 else x)
But I do not know how to write the row - lastrow part of the code. How can I do this? Thanks in advance!
By using np.where:
import numpy as np
df2['New'] = np.where(df2['2017'].diff().eq(0), 0, df2['2017'])
df2
Out[190]:
   2016  2017  New
0    10    21   21
1    15    34   34
2    70    40   40
3    90    53   53
4    93    53    0
5    99    53    0
We can shift the data and fill the values based on a condition using np.where, i.e.
df['new'] = np.where(df['2017'] - df['2017'].shift(1) == 0, 0, df['2017'])
or with df.where, i.e.
df['new'] = df['2017'].where(df['2017'] - df['2017'].shift(1) != 0, 0)
   2016  2017  new
0    10    21   21
1    15    34   34
2    70    40   40
3    90    53   53
4    93    53    0
5    99    53    0
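A self-contained check of the diff() variant, reconstructing df2 from the output shown above (diff() is just shorthand for x - x.shift(1)):

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'2016': [10, 15, 70, 90, 93, 99],
                    '2017': [21, 34, 40, 53, 53, 53]})

# rows where the cumulative value stopped changing become 0
df2['New'] = np.where(df2['2017'].diff().eq(0), 0, df2['2017'])
print(df2)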

Excel: Average values where column values match

What I am attempting to accomplish is this: where Report ID matches, I need to calculate the average of Value, and then fill the rows with matching Report IDs with the average for that group of Values.
The data essentially looks like this:
Report ID  Report Instance  Value
    11111                1     20
    11112                1     50
    11113                1     40
    11113                2     30
    11113                3     20
    11114                1     40
    11115                1     20
    11116                1     30
    11116                2     40
    11117                1     20
The end goal should look like this:
Report ID  Report Instance  Value  Average
    11111                1     20       20
    11112                1     50       50
    11113                1     40       30
    11113                2     30       30
    11113                3     20       30
    11114                1     40       40
    11115                1     20       20
    11116                1     30       35
    11116                2     40       35
    11117                1     20       20
I have tried AVERAGE(IF()), INDEX(MATCH()), VLOOKUP(MATCH()) and similar combinations of functions, but I haven't had much luck getting the final output. I'm relatively new to using array formulas in Excel and don't have a strong grasp of how they're evaluated yet, so any help is much appreciated.
Keep it simple :-)
Why not use =SUMIF(...)/COUNTIF(...)?
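For example, assuming Report ID is in column A and Value in column C with headers in row 1, put this in D2 and fill it down:
=SUMIF(A:A,A2,C:C)/COUNTIF(A:A,A2)
Every row then shows the average Value for its own Report ID, with no array formula needed.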
