Friends,
I'm trying to figure out a formula that checks whether each row of table 2 has a matching row in table 1. If not, the formula must show that the row was not listed, as shown in column E (CHECK). Is that possible? Or maybe with a VBA macro, I don't know.
TABLE 1
A   B  C  D
29  1  1  1
29  2  1  2
30  3  1  2
15  1  1  1
15  2  1  2
15  3  1  2
20  1  1  1
20  2  1  2
20  3  2  1
20  4  2  2
20  5  1  3
TABLE 2
A    B  C  D  CHECK
29   1  1  1  EXISTS
15   1  1  2  NOT
15   2  1  2  EXISTS
15   3  1  2  EXISTS
20   6  1  1  NOT
100  1  2  3  NOT LISTED
Thanks, guys, would appreciate some help.
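If a scripted check is acceptable instead of a worksheet formula, the same comparison can be sketched in Python with pandas. This is only a sketch under assumptions: the names table1 and table2 are made up for the example, the data is typed in from the tables above, and every row without a match is labeled NOT LISTED.

```python
import pandas as pd

cols = ["A", "B", "C", "D"]

# Data copied from TABLE 1 and TABLE 2 in the question.
table1 = pd.DataFrame(
    [(29, 1, 1, 1), (29, 2, 1, 2), (30, 3, 1, 2), (15, 1, 1, 1),
     (15, 2, 1, 2), (15, 3, 1, 2), (20, 1, 1, 1), (20, 2, 1, 2),
     (20, 3, 2, 1), (20, 4, 2, 2), (20, 5, 1, 3)],
    columns=cols,
)
table2 = pd.DataFrame(
    [(29, 1, 1, 1), (15, 1, 1, 2), (15, 2, 1, 2),
     (15, 3, 1, 2), (20, 6, 1, 1), (100, 1, 2, 3)],
    columns=cols,
)

# A left merge keeps every row of table2 in its original order; the
# indicator column says whether the same A-D combination was found in table1.
# Duplicates are dropped from table1 so that no table2 row is repeated.
merged = table2.merge(table1.drop_duplicates(), on=cols, how="left", indicator=True)
table2["CHECK"] = ["EXISTS" if m == "both" else "NOT LISTED" for m in merged["_merge"]]

print(table2)
```

A row counts as a match only when all four columns A to D are equal, which is how the CHECK column in the question appears to be filled in.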
I want to add 2 days to a date column based on the value of another column. I am using this table:
Company Type Joining Date Starting day
1 1 19/01/2019
2 0 19/01/2019
3 0 19/01/2019
4 1 20/01/2019
5 0 20/01/2019
6 1 21/01/2019
I want the Starting day column to hold Joining Date + 2 days when the company has Type 1. How can I do it?
What have I tried?
[image]
Desired results:
Company Type Joining Date Starting day
1 1 19/01/2019 21/01/2019
2 0 19/01/2019
3 0 19/01/2019
4 1 20/01/2019 22/01/2019
5 0 20/01/2019
6 1 21/01/2019 23/01/2019
To demonstrate the formula from my comment:
=IF(B2=1,C2+2,"")
It works. The output cell must be formatted as a date in the desired format.
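For anyone doing the same thing outside the spreadsheet, here is a rough pandas equivalent of that IF formula. It is a sketch, not the method used above: the DataFrame is typed in from the question, and the column names are assumptions taken from the table header.

```python
import pandas as pd

# Data copied from the question; dates are day/month/year strings.
df = pd.DataFrame({
    "Company": [1, 2, 3, 4, 5, 6],
    "Type": [1, 0, 0, 1, 0, 1],
    "Joining Date": ["19/01/2019", "19/01/2019", "19/01/2019",
                     "20/01/2019", "20/01/2019", "21/01/2019"],
})

joined = pd.to_datetime(df["Joining Date"], format="%d/%m/%Y")

# Add two days, then keep the result only where Type is 1 (other rows become NaT).
start = (joined + pd.Timedelta(days=2)).where(df["Type"] == 1)

# Format back to day/month/year; rows without a start date become empty strings,
# like the "" branch of the IF formula.
df["Starting day"] = start.dt.strftime("%d/%m/%Y").fillna("")

print(df)
```

As in the spreadsheet, the addition happens on real date values and the formatting is applied only at the end.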
I have a MultiIndexed DataFrame, the result of a groupby on 'id' and 'date':
                x  y
id  date
abc 3/1/1994  100  7
    9/1/1994   90  8
    3/1/1995   80  9
bka 5/1/1993   50  8
    7/1/1993   40  9
I'd like to convert those dates into integer-like day labels, such as:
             x  y
id  date
abc day 0  100  7
    day 1   90  8
    day 2   80  9
bka day 0   50  8
    day 1   40  9
I thought it would be simple but I couldn't get there easily. Is there a simple way to work on this?
Try this:
s = 'day ' + df.groupby(level=0).cumcount().astype(str)
df1 = df.set_index([s], append=True).droplevel(1)
             x  y
id
abc day 0  100  7
    day 1   90  8
    day 2   80  9
bka day 0   50  8
    day 1   40  9
You can calculate the new level and create a new index:
lvl1 = 'day ' + df.groupby('id').cumcount().astype('str')
df.index = pd.MultiIndex.from_tuples((x,y) for x,y in zip(df.index.get_level_values('id'), lvl1) )
output:
             x  y
abc day 0  100  7
    day 1   90  8
    day 2   80  9
bka day 0   50  8
    day 1   40  9
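For reference, here is a self-contained version of the cumcount idea that can be run as-is. It is only a sketch: the frame is rebuilt by hand from the question, and it assumes the rows are already in chronological order within each id, because cumcount numbers rows by position rather than by date.

```python
import pandas as pd

# Rebuild the frame from the question with an (id, date) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("abc", "3/1/1994"), ("abc", "9/1/1994"), ("abc", "3/1/1995"),
     ("bka", "5/1/1993"), ("bka", "7/1/1993")],
    names=["id", "date"],
)
df = pd.DataFrame({"x": [100, 90, 80, 50, 40], "y": [7, 8, 9, 8, 9]}, index=index)

# cumcount numbers the rows 0, 1, 2, ... inside each id group.
day = "day " + df.groupby(level="id").cumcount().astype(str)

# Replace the date level with the new labels, keeping both level names.
out = df.copy()
out.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values("id"), day.to_numpy()],
    names=["id", "date"],
)

print(out)
```

Building the index with from_arrays keeps the level names, so later lookups such as out.loc[("bka", "day 1")] still work by name.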
I have a DataFrame that contains the observed data:
import pandas as pd
d = {'humanID': [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
     'dogID': [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
     'month': [1, 1, 2, 3, 1, 2, 3, 1, 2, 2]}
df = pd.DataFrame(data=d)
The df looks as follows:
humanID dogID month
0 1 1 1
1 1 2 1
2 2 1 2
3 2 5 3
4 2 4 1
5 2 6 2
6 2 7 3
7 2 20 1
8 2 9 2
9 2 7 2
In total we have two humans and twenty dogs, and the df above contains the observed data. For example:
The first row means: human1 adopted dog1 in January
The second row means: human1 adopted dog2 in January
The third row means: human2 adopted dog1 in February
========================================================================
My goal is to randomly generate two unobserved samples for each (human, month) pair, i.e. samples that do not appear in the original observed data.
For example, in January human1 did not adopt the dogs [3,4,5,6,7,...,20], and I want to randomly create two unobserved samples for that (human, month) pair in triple form:
humanID dogID month
1 20 1
1 10 1
However, the following sample is not allowed since it appears in the original df:
humanID dogID month
1 2 1
human1 has no activity in February, so we don't need to sample unobserved data for that month.
human2 has activity in January, February and March, so for each of those months we want to randomly create unobserved data. For example, in January human2 adopted dog4 and dog20. The two random unobserved samples could be:
humanID dogID month
2 2 1
2 6 1
The same process applies to February and March.
I want to put all of the unobserved samples in one DataFrame, such as the following unobserved:
humanID dogID month
0 1 20 1
1 1 10 1
2 2 2 1
3 2 6 1
4 2 13 2
5 2 16 2
6 2 1 3
7 2 20 3
Is there a fast way to do this?
PS: this is a code interview for a start-up company.
Using groupby and random.sample (random.choices draws with replacement, so it could return the same dog twice):
import random
dogs = list(range(1,21))
dfs = []
n_sample = 2
for i,d in df.groupby(['humanID', 'month']):
    h_id, month = i
    # dogs this human did not adopt in this month
    unobserved = sorted(set(dogs)-set(d['dogID']))
    sample = pd.DataFrame([(h_id, dogID, month) for dogID in random.sample(unobserved, k=n_sample)])
    dfs.append(sample)
new_df = pd.concat(dfs).reset_index(drop=True)
new_df.columns = ['humanID', 'dogID', 'month']
print(new_df)
humanID dogID month
0 1 11 1
1 1 5 1
2 2 19 1
3 2 18 1
4 2 15 2
5 2 14 2
6 2 16 3
7 2 18 3
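Since the output is random, it helps to check that no sampled row is actually observed. Below is a minimal sketch that wraps the same groupby idea in a function and verifies the result. The function name sample_unobserved and its seed parameter are made up for this example, and the dog population is assumed to be the IDs 1 to 20.

```python
import random
import pandas as pd

def sample_unobserved(df, all_dogs, k=2, seed=None):
    """Draw up to k distinct unobserved dogs for every observed (human, month) pair."""
    rng = random.Random(seed)  # local generator so results are reproducible per seed
    rows = []
    for (human, month), grp in df.groupby(["humanID", "month"]):
        # Candidates are the dogs this human did not adopt in this month.
        candidates = sorted(set(all_dogs) - set(grp["dogID"]))
        # sample() draws without replacement; min() guards against tiny candidate lists.
        for dog in rng.sample(candidates, min(k, len(candidates))):
            rows.append((human, dog, month))
    return pd.DataFrame(rows, columns=["humanID", "dogID", "month"])

observed = pd.DataFrame({
    "humanID": [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    "dogID": [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
    "month": [1, 1, 2, 3, 1, 2, 3, 1, 2, 2],
})

unobserved = sample_unobserved(observed, range(1, 21), k=2, seed=0)

# An inner merge on all three columns finds sampled rows that are also observed;
# it should be empty.
overlap = unobserved.merge(observed.drop_duplicates())
print(unobserved)
print("overlap rows:", len(overlap))
```

Only (human, month) pairs that occur in the observed data are visited, so human1 gets no samples for February, as the question requires.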
If I understand you correctly, you can use np.random.permutation() to generate a random permutation of the dogID column:
import numpy as np
df_new=df.copy()
df_new['dogID']=np.random.permutation(df.dogID)
print(df_new.sort_values('month'))
humanID dogID month
0 1 1 1
1 1 20 1
4 2 9 1
7 2 1 1
2 2 4 2
5 2 5 2
8 2 2 2
9 2 7 2
3 2 7 3
6 2 6 3
Or, to draw random values from within the range of dogID:
df_new=df.copy()
# +1 so that the largest dogID is included in the range
a=np.random.permutation(range(df_new.dogID.min(),df_new.dogID.max()+1))
df_new['dogID']=np.random.choice(a,df_new.shape[0])
print(df_new.sort_values('month'))
humanID dogID month
0 1 18 1
1 1 16 1
4 2 1 1
7 2 8 1
2 2 4 2
5 2 2 2
8 2 16 2
9 2 14 2
3 2 4 3
6 2 12 3
I am trying to delete rows when the date in column B is not present exactly 4 times for a given filekey in column C. Sample data below:
A B C
Row Date Filekey
2 1/6/2014 1
3 1/6/2014 1
4 1/6/2014 1
5 1/6/2014 1
6 1/7/2014 1
7 1/7/2014 1
8 1/8/2014 1
9 1/9/2014 1
10 1/9/2014 1
11 1/9/2014 1
12 1/9/2014 1
13 1/9/2014 1
14 1/6/2014 2
15 1/6/2014 2
16 1/6/2014 2
17 1/6/2014 2
The result I am looking for:
Row Date Filekey
2 1/6/2014 1
3 1/6/2014 1
4 1/6/2014 1
5 1/6/2014 1
14 1/6/2014 2
15 1/6/2014 2
16 1/6/2014 2
17 1/6/2014 2
Please note that rows 6-7 were removed because only 2 rows share that date (too few), row 8 because only 1 row has that date (too few), and rows 9-13 because 5 rows share that date (too many).
Rows 14-17 were kept because:
there are exactly 4 rows with that date, and they have a different filekey (column C) than rows 2-5 even though they share the same date.
Thanks for your help.
In cell D2 use this formula and copy down:
=COUNTIFS(B:B,B2,C:C,C2)
Then filter column D for everything other than 4 and delete those rows. Finally, remove the filter and delete the formulas in column D.
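If the data ever ends up in pandas, the same COUNTIFS-and-filter idea can be sketched as a group count followed by a boolean mask. This is an illustration only: the DataFrame below is typed in from the sample data, and the column names Row, Date and Filekey are taken from the question's header.

```python
import pandas as pd

# Sample data from the question: rows 2-13 have filekey 1, rows 14-17 have filekey 2.
df = pd.DataFrame({
    "Row": list(range(2, 18)),
    "Date": (["1/6/2014"] * 4 + ["1/7/2014"] * 2 + ["1/8/2014"]
             + ["1/9/2014"] * 5 + ["1/6/2014"] * 4),
    "Filekey": [1] * 12 + [2] * 4,
})

# Like COUNTIFS(B:B,B2,C:C,C2): for each row, count how many rows
# share its (Filekey, Date) combination.
counts = df.groupby(["Filekey", "Date"])["Row"].transform("size")

# Keep only the rows whose combination occurs exactly 4 times.
kept = df[counts == 4]

print(kept)
```

The counts Series plays the role of helper column D, and the mask replaces the filter-and-delete step.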