Group by id and change column value based on condition - python-3.x

I'm a bit stuck on some code. I've looked through Stack Overflow and found many similar questions, but all of them differ in some way.
I have a dataframe df_jan which looks like this.
df_jan
ID  Date        days_since_last_purchase  x_1
1   01/01/2020  0                         0
1   04/01/2020  3                         0
2   04/01/2020  0                         0
1   06/02/2020  33                        1
Basically x_1 denotes whether it has been over 30 days since their last purchase.
What I want to achieve: if an ID has x_1 = 1 anywhere in its lifetime, all the x_1 values for that specific ID are set to 1, like this.
df_jan
ID  Date        days_since_last_purchase  x_1
1   01/01/2020  0                         1
1   04/01/2020  3                         1
2   04/01/2020  0                         0
1   06/02/2020  33                        1
I've tried using a .groupby function along with a .loc but it says they can't work together. I also tried modifying the answers to this without much luck.
Thank you in advance for any help you guys can give!

You can groupby and transform, e.g.:
df_jan['x_1'] = df_jan.groupby('ID')['days_since_last_purchase'].transform(lambda v: int(v.gt(30).any()))
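A minimal runnable sketch of this on the question's sample data:

```python
import pandas as pd

df_jan = pd.DataFrame({
    'ID': [1, 1, 2, 1],
    'Date': ['01/01/2020', '04/01/2020', '04/01/2020', '06/02/2020'],
    'days_since_last_purchase': [0, 3, 0, 33],
})

# For each ID, flag every row with 1 if any purchase in that ID's
# lifetime came more than 30 days after the previous one.
df_jan['x_1'] = (
    df_jan.groupby('ID')['days_since_last_purchase']
    .transform(lambda v: int(v.gt(30).any()))
)
```

`transform` broadcasts the single per-group value back to every row of the group, so all rows for ID 1 become 1 while ID 2 stays 0.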

Related

Counting the number of instances a row changes from a specific number to another

Got a bit of a conundrum I've been racking my brain on for far too long, and I was wondering if anyone could help.
I have a list of items in column A and columns labelled as weekly periods from B:DA
Item Code  Week 1  Week 2  Week 3  Week 4  Week 5  Week 6  Week 7  Week 8  Results
Item 1     1       1       0       1       0       0       1       0       3
Item 2     1       1       0       0       1       1       1
I need to count the number of times the weekly status goes from 1 to 0 but not from 0 to 1.
In the table above, I would expect the results to be Item 1 = 3 and Item 2 = 1.
Any help pointing me in the right direction would be much appreciated!
Use COUNTIFS():
=COUNTIFS(B2:CZ2,1,C2:DA2,0)
The offset ranges count each position where a cell is 1 and the cell immediately to its right is 0.
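The same shifted-range comparison can be sketched in pandas (Item 2's Week 8 value is not shown in the question, so it is assumed to be 1 here, which adds no extra transition):

```python
import pandas as pd

# Weekly status per item, mirroring the table above.
# Item 2's Week 8 is an assumption (not given in the question).
weeks = pd.DataFrame(
    [[1, 1, 0, 1, 0, 0, 1, 0],
     [1, 1, 0, 0, 1, 1, 1, 1]],
    index=['Item 1', 'Item 2'],
)

# Compare every column with the one after it: a 1 followed by a 0
# is a 1 -> 0 transition; sum transitions per row.
prev = weeks.iloc[:, :-1].to_numpy()
nxt = weeks.iloc[:, 1:].to_numpy()
results = ((prev == 1) & (nxt == 0)).sum(axis=1)
```

This is the direct analogue of the COUNTIFS trick: pair each cell with its right-hand neighbour and count the matches.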

Get the difference between two dates when string value changes

I want to get the number of days between changes of a string value (i.e., the symbol column), grouped by the respective id. I want a separate datediff column like the one below.
id  date        symbol  datediff
1   2022-08-26  a       0
1   2022-08-27  a       0
1   2022-08-28  a       0
2   2022-08-26  a       0
2   2022-08-27  a       0
2   2022-08-28  a       0
2   2022-08-29  b       3
3   2022-08-29  c       0
3   2022-08-30  b       1
For id = 1, datediff = 0 since the symbol stayed a. For id = 2, datediff = 3 since the symbol changed from a to b after 3 days. In short, I'm looking for code that computes the day difference whenever an id changes its symbol.
I am currently using this code:
df['date'] = pd.to_datetime(df['date'])
diff = ['0 days 00:00:00']
for st, i in zip(df['symbol'], df.index):
    if i > 0:  # cannot evaluate previous from index 0
        if df['symbol'][i] != df['symbol'][i - 1]:
            diff.append(df['date'][i] - df['date'][i - 1])
        else:
            diff.append('0 days 00:00:00')
The output becomes:
id  date        symbol  datediff
1   2022-08-26  a       0
1   2022-08-27  a       0
1   2022-08-28  a       0
2   2022-08-26  a       0
2   2022-08-27  a       0
2   2022-08-28  a       0
2   2022-08-29  b       1
3   2022-08-29  c       0
3   2022-08-30  b       1
It also computes the difference across two different ids, but I want the computation done separately for each id.
I only see questions about the difference between dates when numeric values change, not when a string changes. Thank you!
IIUC: my solution assumes that the symbols within one id end with a single changing symbol, if there is any (as in the example given in the question).
First use df.groupby on id and symbol and get the minimum date for each combination. Then, find the difference between the dates within each id. This gives the datediff. Finally, merge the findings with the original dataframe.
df1 = df.groupby(['id', 'symbol'], sort=False).agg({'date': 'min'}).reset_index()
df1['datediff'] = df1.groupby('id')['date'].diff().dt.days.abs().fillna(0)
df1 = df1.drop(columns='date')
df_merge = pd.merge(df, df1, on=['id', 'symbol'])
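A runnable sketch of this approach, end to end, on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'date': ['2022-08-26', '2022-08-27', '2022-08-28',
             '2022-08-26', '2022-08-27', '2022-08-28', '2022-08-29',
             '2022-08-29', '2022-08-30'],
    'symbol': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'c', 'b'],
})
df['date'] = pd.to_datetime(df['date'])

# First date each (id, symbol) combination appears
df1 = df.groupby(['id', 'symbol'], sort=False)['date'].min().reset_index()

# Day gap between consecutive symbols within the same id
df1['datediff'] = df1.groupby('id')['date'].diff().dt.days.fillna(0).astype(int)

# Broadcast the per-(id, symbol) gap back onto every original row
df_merge = df.merge(df1.drop(columns='date'), on=['id', 'symbol'])
```

Because `diff` is computed inside `groupby('id')`, the first symbol of each id gets NaN (filled with 0), so gaps never leak across ids.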

Count rows where two values appear together

My data are in MS Excel:
   Col A  Col B  Col C  Col D
1  amy    john   bob    andy
2  andy   mel    amy    john
3  max    andy   jim    bob
4  wil    steve  andy   amy
So, in the 4x4 table there are 9 different values.
I need to create a table that counts how many times each PAIR occurs in the same ROW. Something like this:
       amy  andy  bob  jim  john  max  mel  steve  will
amy    0
andy   3    0
bob    1    2     0
jim    0    1     1    0
john   2    2     1    0    0
max    0    1     1    1    0     0
mel    1    1     0    0    1     0    0
steve  1    1     0    0    0     0    0    0
will   1    1     0    0    0     0    0    1      0
And I have no clue how to do it...
To reiterate: there are no duplicated values within a row (each row has unique values, each value in a separate cell), but within a column values can repeat.
Any help will be much appreciated!
Assuming your data is in A5:D8, I proceeded like this:
created a helper column with this formula (copied downwards):
=A5&"-"&B5&"-"&C5&"-"&D5
named this helper column helper (a named range)
listed the unique names across H4:P4 and down G5:G13
entered this formula in H5 and copied it both downwards and across to fill the whole 9x9 matrix:
=IF($G5=H$4,0,COUNTIFS(helper,"*"&$G5&"*",helper,"*"&H$4&"*"))
Your desired matrix is ready.
A detailed blog post covering this approach is available on the web.
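For comparison, the same pair-count matrix can be built in Python with `itertools.combinations`; a minimal sketch on the question's data (using "wil" as it appears in the rows):

```python
from collections import Counter
from itertools import combinations

import pandas as pd

rows = [
    ['amy', 'john', 'bob', 'andy'],
    ['andy', 'mel', 'amy', 'john'],
    ['max', 'andy', 'jim', 'bob'],
    ['wil', 'steve', 'andy', 'amy'],
]

# Count every unordered pair that shares a row
pairs = Counter()
for row in rows:
    for a, b in combinations(sorted(row), 2):
        pairs[(a, b)] += 1

# Place the counts in the lower triangle, like the desired table
names = sorted({name for row in rows for name in row})
matrix = pd.DataFrame(0, index=names, columns=names)
for (a, b), n in pairs.items():
    matrix.loc[b, a] = n
```

Sorting each row before taking combinations guarantees every pair is counted once under a single (alphabetical) ordering.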

How can I make a square matrix from a nested dict?

I have recently been using the networkx module, and now I am trying to get distance data among countries.
So the excel raw data is something like this:
Nat1  Nat2  Y/N
ABW   ANT   0
ABW   ARG   0
ABW   BEK   1
ABW   BHS   1
ABW   BRA   0
...
ALB   COL   0
ALB   CYP   1
...
And thanks to GeckStar (Networkx: Get the distance between nodes), I managed to learn how the dataset is coded: as a nested dictionary.
The problem is, I am not familiar with dictionaries. If it were a nested list, I could deal with it, but a nested dict... I need help from others.
So I checked what this would give me with code like this:
distance = dict(nx.all_pairs_shortest_path_length(graph))
df = pd.DataFrame(list(distance.items()))
df.to_excel("C_C.xlsx")
(FYI,
distance = dict(nx.all_pairs_shortest_path_length(graph))
will calculate the shortest path length from each nation to every other nation. So if a nation is not directly connected to another nation and needs a detour, the value will be more than 1.)
Of course, it didn't go well.
  0    1
0 ABW  {'ABW': 0, 'ANT': 1, ..., 'BHS': 2, ...}
1 ANT  {'ANT': 0, 'ABW': 1, ...}
...
3 BEL  {'BEL': 0, 'ABW': 1, ..., 'BHS': 4, ...}
...
But I know there should be a way to make those data to a square matrix like this:
     ABW  ANT  ARG  BEL  BHS  ...
ABW  0    0    0    1    2    ...
ANT  0    0    1    0    1    ...
ARG  0    1    0    1    0    ...
BEL  2    0    1    0    4    ...
...
Can you guys enlighten me, please?
Thanks for taking the time to check this out, and thank you in advance for any solutions.
I just did a workaround with a list.
dis = dict(nx.all_pairs_shortest_path_length(graph))
Nations = list(dis.keys())
master = [[""]]  # header row: empty corner cell, then nation codes
for x in Nations:
    master[0].append(x)
for Nat1 in dis:
    master.append([Nat1])  # each row starts with its nation code
    for Nat2 in Nations:
        master[-1].append(dis[Nat1][Nat2])
Thanks for everyone taking care of this problem.
Have a wonderful day!
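For reference, pandas can also build the square matrix directly from the nested dict, without the list workaround; a minimal sketch on a hypothetical four-nation dict shaped like the networkx output:

```python
import pandas as pd

# Hypothetical nested dict, shaped like the output of
# dict(nx.all_pairs_shortest_path_length(graph))
distance = {
    'ABW': {'ABW': 0, 'ANT': 1, 'BEL': 2, 'BHS': 3},
    'ANT': {'ANT': 0, 'ABW': 1, 'BEL': 1, 'BHS': 2},
    'BEL': {'BEL': 0, 'ABW': 2, 'ANT': 1, 'BHS': 1},
    'BHS': {'BHS': 0, 'ABW': 3, 'ANT': 2, 'BEL': 1},
}

# Outer keys become rows, inner keys become columns
df = pd.DataFrame.from_dict(distance, orient='index')
# Sort rows and columns so the matrix is consistently ordered
df = df.reindex(sorted(df.columns), axis=1).sort_index()
```

Note that `all_pairs_shortest_path_length` omits unreachable nodes from the inner dicts, so on a disconnected graph the resulting frame will contain NaN for those cells.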

Total row visualisation IBM cognos

I have a list of employees with their clients next to them, followed by a 1 or 0 indicating whether there has been contact with the client for longer than 3 months.
Now I want to visualise the list, but I can't get the total per employee. I have grouped by employee, but that doesn't seem to work.
example:
**Employee  Clientname  longer than 3 months**
1           1           0
1           33          1
1           12          0
**total 1**
2           2           1
2           3           1
**total 2**
Can anyone help me with this?
Have you tried:
total([longer than 3 months] for [Employee])
