Python Pandas: Change a value in pd1 based of pd2 with a different name - python-3.x

i have 2 data frames where the index is the same value but different name. I need to add a column to DF1 but get that information from DF2
DF1:
SKU X Y Z
1234 0 0 0
5642 0 0 0
DF2:
AH SKU X Y Z Total
1234 4 5 1 10
5642 1 0 1 2
So I know I can add the total column to DF1 by doing
df1 ["Total"]
How can I now have that total from DF2 to DF1 making sure the SKU and AH SKU are matching?

You would do it like this, using default index matching.
df1["Total"] = df2["Total"]
That the indexes have different names shouldn't matter in this case (there's only one index level).
Make sure that those "columns" you pointed to in the answer really are the index: print(df1.index) should have something named SKU, and so on.

Related

Get the difference between two dates when string value changes

I want to get the number of days between the change of string values (ie., the symbol column) in one column, grouped by their respective id. I want a separate column for datediff like the one below.
id date symbol datediff
1 2022-08-26 a 0
1 2022-08-27 a 0
1 2022-08-28 a 0
2 2022-08-26 a 0
2 2022-08-27 a 0
2 2022-08-28 a 0
2 2022-08-29 b 3
3 2022-08-29 c 0
3 2022-08-30 b 1
For id = 1, datediff = 0 since symbol stayed as a. For id = 2, datediff = 3 since symbol changed after 3 days from a to b. Hence, what I'm looking for is a code that computes the difference in which the id changes it's symbol.
I am currently using this code:
df['date'] = pd.to_datetime(df['date'])
diff = ['0 days 00:00:00']
for st,i in zip(df['symbol'],df.index):
if i > 0:#cannot evaluate previous from index 0
if df['symbol'][i] != df['symbol'][i-1]:
diff.append(df['date'][i] - df['data_ts'][i-1])
else:
diff.append('0 days 00:00:00')
The output becomes:
id date symbol datediff
1 2022-08-26 a 0
1 2022-08-27 a 0
1 2022-08-28 a 0
2 2022-08-26 a 0
2 2022-08-27 a 0
2 2022-08-28 a 0
2 2022-08-29 b 1
3 2022-08-29 c 0
3 2022-08-30 b 1
It also computes the difference between two different ids. But I want the computation to be separate from different ids.
I only see questions about difference of dates when values changes, but not when string changes. Thank you!
IIUC: my solution works with the assumption that the symbols within one id ends with a single changing symbol, if there is any (as in the example given in the question).
First use df.groupby on id and symbol and get the minimum date for each combination. Then, find the difference between the dates within each id. This gives the datediff. Finally, merge the findings with the original dataframe.
df1 = df.groupby(['id', 'symbol'], sort=False).agg({'date': np.min}).reset_index()
df1['datediff'] = abs(df1.groupby('id')['date'].diff().dt.days.fillna(0))
df1 = df1.drop(columns='date')
df_merge = pd.merge(df, df1, on=['id', 'symbol'])

Doubts pandas filtering data row by row

How can I solve this issue related on pandas? I've a dataframe of the following approach:
datetime64ns
type(int)
datetime64ns(analysis)
2019-02-02T10:02:05
4
2019-02-02T10:02:01
3
2019-02-02T10:02:02
4
2019-02-02T10:02:02
2019-02-02T10:02:04
3
2019-02-02T10:02:04
The goal is to do the following issue:
# psuedocode
for all the rows:
if datetime(analysis) exists and type=4:
insert in the a new row column type4=1
elseif datetime(analysis) exists and type=2:
insert in the a new row column type2=1
the idea to develop it is in order to make a group by count value. I'm sure that is possible because I manage to develop it in the past but I lost my .py file. Thanks for the attention
Need this?
df = pd.concat([df, pd.get_dummies(df['type(int)'].mask(
df['datetime64ns(analysis)'].isna()).astype('Int64')).add_prefix('type')], 1)
OUTPUT:
datetime64ns type(int) datetime64ns(analysis) type3 type4
0 2019-02-02T10:02:05 4 NaN 0 0
1 2019-02-02T10:02:01 3 NaN 0 0
2 2019-02-02T10:02:02 4 2019-02-02T10:02:02 0 1
3 2019-02-02T10:02:04 3 2019-02-02T10:02:04 1 0

How to find the length of non-exclusive data in Pandas DataFrame

Looking to find the total length of non-exclusive data in DataFrame
df1:
ID
0 7878aa
1 6565dd
2 9899ad
3 4158hf
4 4568fb
5 6877gh
df2:
ID
0 4568fb <-is in df1
1 9899ad <-is in df1
2 6877gh <-is in df1
3 9874ad <-not in df1
4 8745ag <-not in df1
desired output:
2
My code:
len(df1['ID'].isin(df2['ID'] == False)
My code end up showing the total length of the DataFrame which is 6. How do I find the total length of non-exclusive rows?
Thanks!
Use isin with negation and then sum
(~df2['ID'].isin(df1['ID'])).sum()

Selecting data from multiple dataframes

my workbook Rule.xlsx has following data.
sheet1:
group ordercode quantity
0 A 1
B 3
1 C 1
E 2
D 1
Sheet 2:
group ordercode quantity
0 x 1
y 3
1 x 1
y 2
z 1
I have created dataframe using below method.
df1 =data.parse('sheet1')
df2=data.parse('sheet2')
my desired result is writing a sequence using these two dataframe.
df3:
group ordercode quantity
0 A 1
B 3
0 x 1
y 3
1 C 1
E 2
D 1
1 x 1
y 2
z 1
one from df1 and one from df2.
I wish to know how I can print the data by selecting group number (eg. group(0), group(1) etc).
any suggestion ?
After some comments solution is:
#create OrderDict of DataFrames
dfs = pd.read_excel('Rule.xlsx', sheet_name=None)
#ordering of DataFrames
order = 'SWC_1380_81,SWC_1382,SWC_1390,SWC_1391,SWM_1380_81'.split(',')
#in loops lookup dictionaries, replace NaNs and create helper column
L = [dfs[x].ffill().assign(g=i) for i, x in enumerate(order)]
#last join together, sorting and last remove helper column
df = pd.concat(L).sort_values(['group','g'])

Pandas Pivot Table Conditional Counting

I have a simple dataframe:
df = pd.DataFrame({'id': ['a','a','a','b','b'],'value':[0,15,20,30,0]})
df
id value
0 a 0
1 a 15
2 a 20
3 b 30
4 b 0
And I want a pivot table with the number of values greater than zero.
I tried this:
raw = pd.pivot_table(df, index='id',values='value',aggfunc=lambda x:len(x>0))
But returned this:
value
id
a 3
b 2
What I need:
value
id
a 2
b 1
I read lots of solutions with groupby and filter. Is it possible to achieve this only with pivot_table command? If it is not, which is the best approach?
Thanks in advance
UPDATE
Just to make it clearer why I am avoinding filter solution. In my real and complex df, I have other columns, like this:
df = pd.DataFrame({'id': ['a','a','a','b','b'],'value':[0,15,20,30,0],'other':[2,3,4,5,6]})
df
id other value
0 a 2 0
1 a 3 15
2 a 4 20
3 b 5 30
4 b 6 0
I need to sum the column 'other', but when i filter I got this:
df=df[df['value']>0]
raw = pd.pivot_table(df, index='id',values=['value','other'],aggfunc={'value':len,'other':sum})
other value
id
a 7 2
b 5 1
Instead of:
other value
id
a 9 2
b 11 1
Need sum for count Trues created by condition x>0:
raw = pd.pivot_table(df, index='id',values='value',aggfunc=lambda x:(x>0).sum())
print (raw)
value
id
a 2
b 1
As #Wen mentioned, another solution is:
df = df[df['value'] > 0]
raw = pd.pivot_table(df, index='id',values='value',aggfunc=len)
You can filter the dataframe before pivoting:
pd.pivot_table(df.loc[df['value']>0], index='id',values='value',aggfunc='count')

Resources