excel 2016 - delete rows based on multiple - excel

I am trying to delete rows when the date in column B is not present exactly 4 times for a given filekey in column C. Sample data below:
A B C
Row Date Filekey
2 1/6/2014 1
3 1/6/2014 1
4 1/6/2014 1
5 1/6/2014 1
6 1/7/2014 1
7 1/7/2014 1
8 1/8/2014 1
9 1/9/2014 1
10 1/9/2014 1
11 1/9/2014 1
12 1/9/2014 1
13 1/9/2014 1
14 1/6/2014 2
15 1/6/2014 2
16 1/6/2014 2
17 1/6/2014 2
The result I am looking for:
Row Date Filekey
2 1/6/2014 1
3 1/6/2014 1
4 1/6/2014 1
5 1/6/2014 1
14 1/6/2014 2
15 1/6/2014 2
16 1/6/2014 2
17 1/6/2014 2
Please note that rows 6-7 were removed because only 2 rows share that date (too few), row 8 because only 1 row has that date (too few), and rows 9-13 because 5 rows share that date (too many).
Rows 14-17 were kept because there are exactly 4 rows with that date, and they have a different filekey (column C) than rows 2-5 even though they share the same date.
Thanks for your help.

In cell D2 use this formula and copy down:
=COUNTIFS(B:B,B2,C:C,C2)
Then filter column D for everything other than 4 and delete those rows. Finally, remove the filter and delete the helper formulas in column D.
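If the data is ever processed outside Excel, the same counting idea can be sketched in pandas. This is only an illustration of the COUNTIFS logic, not part of the answer above; the column names `Row`, `Date`, `Filekey` and the helper column `Count` are assumptions taken from the sample data.

```python
import pandas as pd

# Sample data from the question (rows 2-17).
df = pd.DataFrame({
    "Row": list(range(2, 18)),
    "Date": ["1/6/2014"] * 4 + ["1/7/2014"] * 2 + ["1/8/2014"]
            + ["1/9/2014"] * 5 + ["1/6/2014"] * 4,
    "Filekey": [1] * 12 + [2] * 4,
})

# Same idea as COUNTIFS(B:B,B2,C:C,C2): for each row, count how many
# rows share its Date and Filekey.
df["Count"] = df.groupby(["Date", "Filekey"])["Row"].transform("size")

# Keep only rows whose (Date, Filekey) pair occurs exactly 4 times,
# then drop the helper column, as the answer does with column D.
kept = df[df["Count"] == 4].drop(columns="Count")
print(kept)
```

With the sample data this keeps rows 2-5 and 14-17, matching the expected result in the question.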

Related

Row comparison on different tables

Hi friends,
I'm trying to figure out a formula that checks whether each row of table 2 has a matching row in table 1. If not, the formula must show that the row was not listed, as in column E (CHECK). Is that possible? Or maybe a VBA macro would work, I don't know.
TABLE 1
A B C D
29 1 1 1
29 2 1 2
30 3 1 2
15 1 1 1
15 2 1 2
15 3 1 2
20 1 1 1
20 2 1 2
20 3 2 1
20 4 2 2
20 5 1 3
TABLE 2
A B C D CHECK
29 1 1 1 EXISTS
15 1 1 2 NOT
15 2 1 2 EXISTS
15 3 1 2 EXISTS
20 6 1 1 NOT
100 1 2 3 NOT LISTED
Thanks, guys, would appreciate some help.

Grouped Sum on complicated calculated fields in other column

I have an Excel sheet with data (Sheet1). The first column is a sequential number representing the month.
Sheet1 <month, year, data1, data2>
[first row: titles]
1 1 data11 data12
2 1 data21 data22
3 1 data31 data32
4 1 data41 data42
5 1 data51 data52
6 1 data61 data62
7 1 data71 data72
8 1 data81 data82
9 1 data91 data92
10 1 data101 data102
11 1 data111 data112
12 1 data121 data122
13 2 data131 data132
14 2 data141 data142
Sheet2
[month, year, formula]
1 1 sheet1!C2-3*sheet1!B1
2 1 sheet1!C3-3*sheet1!B2
3 1 sheet1!C4-3*sheet1!B3
4 1 sheet1!C5-3*sheet1!B4
5 1 sheet1!C6-3*sheet1!B5
6 1 sheet1!C7-3*sheet1!B6
7 1 sheet1!C8-3*sheet1!B7
8 1 sheet1!C9-3*sheet1!B8
9 1 sheet1!C10-3*sheet1!B9
10 1 sheet1!C11-3*sheet1!B10
11 1 sheet1!C12-3*sheet1!B11
12 1 sheet1!C13-3*sheet1!B12
13 2 sheet1!C14-3*sheet1!B13
14 2 sheet1!C15-3*sheet1!B14
Sheet3
[year, Sum of column C in sheet2 grouped by year]
First row <year, formula>
1 =SUMIF(sheet2!B$2:B$15, A2, sheet2!C$2:C$15)
2 =SUMIF(sheet2!B$2:B$15, A3, sheet2!C$2:C$15)
My question: can I remove Sheet2 and do the calculation directly in Sheet3?
I could if column C of Sheet2 were moved to Sheet1, but I don't want to add many columns to Sheet1 because Sheet2 has many columns. If Sheet2 can be removed, a lot of formulas go away (in this example, 14 + 2 formulas become only 2).
Thanks
Solved: the year is in column 2, so
=SUMPRODUCT((Sheet1!B$2:B$424=Sheet3!B2)*(Formula using $2:$424 in each column of the monthly formula))
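The SUMPRODUCT trick works because the comparison produces a 0/1 mask that zeroes out rows from other years before summing. A minimal sketch of the same idea in Python with NumPy follows; the numbers are made up, and `data1 - 3 * year` is only a stand-in for the per-row monthly formula, which depends on the real sheet.

```python
import numpy as np

# Made-up stand-ins for Sheet1 column B (year) and column C (data1).
year = np.array([1, 1, 1, 2, 2])
data1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])


def grouped_sum(target_year):
    """Mimic SUMPRODUCT((year = target) * (per-row formula))."""
    per_row = data1 - 3 * year      # hypothetical per-row formula
    mask = year == target_year      # True/False acts as 1/0 when multiplied
    return float(np.sum(mask * per_row))


# One total per year, with no intermediate helper column.
totals = {y: grouped_sum(y) for y in (1, 2)}
print(totals)
```

The per-row values are computed inside the sum, so no helper sheet is needed, which is the point of the SUMPRODUCT solution.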

Excel - Shift starting column right by x

In Excel I have a dataset showing how many units of 2 products were sold in the first, second, third, etc. month each product was on the shelves (data starts in A1):
Month 1 2 3 4 5 6 7 8 9 10 11 12
Product 1 3 5 2 1 6 1 2 4 7 2 1 5
Product 2 2 1 5 6 2 8 2 1 2 3 4 9
However, a product's first sales do not always occur in month 1; they occur in month X. Is there a way (without VBA or copy and paste) to shift the entries right by X so they align with the actual month?
Example for data above
Product 1 starts in month 2
Product 2 starts in month 5
Month 1 2 3 4 5 6 7 8 9 10 11 12
Product 1 0 3 5 2 1 6 1 2 4 7 2 1 5
Product 2 0 0 0 0 2 1 5 6 2 8 2 1 2 3 4 9
*0 not required (great if possible), but more for illustration
Thanks
I have created a simple example that does the same job. The shown formula is copied over the shown cells in the row of new data. (The number '2' in the formula refers to the column number of the starting data cell which is column B, hence 2.)
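For readers who want to check the expected layout, the shift itself can be sketched in Python. This is not the spreadsheet formula referred to above, just an illustration of the alignment rule; the function name `shift_right` and the `fill` parameter are hypothetical.

```python
def shift_right(sales, start_month, fill=0):
    """Pad (start_month - 1) fill values in front so the first
    sales figure lands in column start_month."""
    if start_month < 1:
        raise ValueError("start_month must be >= 1")
    return [fill] * (start_month - 1) + list(sales)


product_1 = [3, 5, 2, 1, 6, 1, 2, 4, 7, 2, 1, 5]
product_2 = [2, 1, 5, 6, 2, 8, 2, 1, 2, 3, 4, 9]

shifted_1 = shift_right(product_1, 2)   # Product 1 starts in month 2
shifted_2 = shift_right(product_2, 5)   # Product 2 starts in month 5
print(shifted_1)
print(shifted_2)
```

As in the question's example, the shifted rows run past month 12; slice the result (for example `shifted_1[:12]`) if only the first 12 months are wanted.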

How to randomly generate an unobserved data in Python3

I have a dataframe containing the observed data:
import pandas as pd
d = {'humanID': [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
     'dogID': [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
     'month': [1, 1, 2, 3, 1, 2, 3, 1, 2, 2]}
df = pd.DataFrame(data=d)
The df looks like this:
humanID dogID month
0 1 1 1
1 1 2 1
2 2 1 2
3 2 5 3
4 2 4 1
5 2 6 2
6 2 7 3
7 2 20 1
8 2 9 2
9 2 7 2
In total we have two humans and twenty dogs, and the df above contains the observed data. For example:
The first row means: human1 adopted dog1 in January
The second row means: human1 adopted dog2 in January
The third row means: human2 adopted dog1 in February
========================================================================
My goal is to randomly generate two unobserved rows for each (human, month) pair, i.e. rows that do not appear in the original observed data.
For example, in January human1 did not adopt dogs [3,4,5,6,7,..20], and I want to randomly create two unobserved samples for that (human, month) pair in triple form:
humanID dogID month
1 20 1
1 10 1
However, the following sample is not allowed since it appears in the original df:
humanID dogID month
1 2 1
Human1 has no activity in Feb, so we don't need to sample unobserved data for that month.
Human2 has activity in Jan, Feb and March, so for each of those months we want to randomly create unobserved data. For example, in Jan, human2 adopted dog4 and dog20 (dog1 was adopted in Feb). The two random unobserved samples could be:
humanID dogID month
2 2 1
2 6 1
The same process can be used for Feb and March.
I want to put all of the unobserved rows in one dataframe, such as the following:
humanID dogID month
0 1 20 1
1 1 10 1
2 2 2 1
3 2 6 1
4 2 13 2
5 2 16 2
6 2 1 3
7 2 20 3
Any fast way to do this?
PS: this is a code interview for a start-up company.
Using groupby and random.choices:
import random
import pandas as pd

dogs = list(range(1, 21))  # all 20 dog IDs
dfs = []
n_sample = 2
for i, d in df.groupby(['humanID', 'month']):
    h_id, month = i
    # dogs not observed for this (human, month) pair
    unseen = list(set(dogs) - set(d['dogID']))
    # note: random.choices samples with replacement, so a dog can repeat
    sample = pd.DataFrame([(h_id, dogID, month) for dogID in random.choices(unseen, k=n_sample)])
    dfs.append(sample)
new_df = pd.concat(dfs).reset_index(drop=True)
new_df.columns = ['humanID', 'dogID', 'month']
print(new_df)
humanID dogID month
0 1 11 1
1 1 5 1
2 2 19 1
3 2 18 1
4 2 15 2
5 2 14 2
6 2 16 3
7 2 18 3
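Since random.choices draws with replacement, the same dog can occasionally be picked twice for one (human, month) pair. If the two samples should be distinct, a variant with random.sample might look like the sketch below. It assumes, as the question states, that dog IDs run from 1 to 20; the names `all_dogs`, `rows` and `unobserved` are illustrative.

```python
import random
import pandas as pd

# Observed data from the question.
df = pd.DataFrame({
    "humanID": [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    "dogID": [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
    "month": [1, 1, 2, 3, 1, 2, 3, 1, 2, 2],
})

all_dogs = set(range(1, 21))  # assumption: 20 dogs with IDs 1..20
n_sample = 2

rows = []
for (h_id, month), group in df.groupby(["humanID", "month"]):
    # Dogs this human did NOT adopt in this month.
    candidates = sorted(all_dogs - set(group["dogID"]))
    # random.sample draws without replacement, so the picks are distinct.
    for dog_id in random.sample(candidates, k=n_sample):
        rows.append((h_id, dog_id, month))

unobserved = pd.DataFrame(rows, columns=["humanID", "dogID", "month"])
print(unobserved)
```

The output is random, so it will differ from run to run, but every row is guaranteed to be absent from the observed df.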
If I understand you correctly, you can use np.random.permutation() on the dogID column to shuffle its values:
import numpy as np
df_new = df.copy()
# note: this only reorders the existing dogIDs; it does not exclude observed triples
df_new['dogID'] = np.random.permutation(df.dogID)
print(df_new.sort_values('month'))
humanID dogID month
0 1 1 1
1 1 20 1
4 2 9 1
7 2 1 1
2 2 4 2
5 2 5 2
8 2 2 2
9 2 7 2
3 2 7 3
6 2 6 3
Or, to randomly sample values from the full range of dogID:
df_new = df.copy()
# +1 so that the largest dogID is included in the range
a = np.random.permutation(range(df_new.dogID.min(), df_new.dogID.max() + 1))
df_new['dogID'] = np.random.choice(a, df_new.shape[0])
print(df_new.sort_values('month'))
humanID dogID month
0 1 18 1
1 1 16 1
4 2 1 1
7 2 8 1
2 2 4 2
5 2 2 2
8 2 16 2
9 2 14 2
3 2 4 3
6 2 12 3

Excel - How do I create a cumulative sum column within a group?

In Excel, I have an hours log that looks like this:
PersonID Hours JobCode
1 7 1
1 6 2
1 8 3
1 10 1
2 5 3
2 3 5
2 12 2
2 4 1
What I would like to do is create a column with a running total, but only within each PersonID so I want to create this:
PersonID Hours JobCode Total
1 7 1 7
1 6 2 13
1 8 3 21
1 10 1 31
2 5 3 5
2 3 5 8
2 12 2 20
2 4 1 24
Any ideas on how to do that?
In D2 and fill down:
=SUMIF(A$2:A2,A2,B$2:B2)
Assuming that your data starts in cell A1, this formula will accumulate the hours until it finds a change in person ID.
=IF(A2=A1,D1+B2,B2)
Put the formula in cell D2, and copy down for each row of your data.
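For comparison, the same per-person running total can be sketched in pandas. This is an illustration outside Excel, not part of either answer; the frame name `log` is an assumption, and the column names come from the question.

```python
import pandas as pd

# Hours log from the question.
log = pd.DataFrame({
    "PersonID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Hours": [7, 6, 8, 10, 5, 3, 12, 4],
    "JobCode": [1, 2, 3, 1, 3, 5, 2, 1],
})

# Running total of Hours that restarts for each PersonID,
# like =SUMIF(A$2:A2,A2,B$2:B2) filled down.
log["Total"] = log.groupby("PersonID")["Hours"].cumsum()
print(log)
```

Like the SUMIF formula, this does not require the rows for one person to be adjacent, whereas the IF formula relies on the data being sorted by PersonID.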
