I have a table full of numbers with headings. I also have a separate list of numbers that are contained in the table. I would like to find the location of each number on the list, in the table. I would then like to use the cell location to provide the corresponding row heading. I demonstrated what I'm looking for below.
How do I go about doing this? I'm imagining some combination of INDEX/MATCH functions, or perhaps VLOOKUP, but none of the formulas I've tried have worked so far. I'm completely lost at this point, so any help will be appreciated.
Thanks in advance!
Imagine something like this:
Table:
- Category A 1 2 3 4 5
- Category B 6 7 8 9 10
- Category C 11 12 13 14 15
- Category D 16 17 18 19 20
- Category E 21 22 23 24 25
List:
22
5
10
4
18
6
14
2
Desired Outcome:
- 22 Category E
- 5 Category A
- 10 Category B
- 4 Category A
- 18 Category D
- 6 Category B
- 14 Category C
- 2 Category A
Step 1: Find the row that the matching value is in
You can find the matching row by combining a Boolean comparison with SUMPRODUCT:
SUMPRODUCT((dataRange=22)*ROW(dataRange))
(Note that this assumes all values are unique: if there is more than one match, the matching row numbers are added together and the result is no longer a valid row.)
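To see what SUMPRODUCT((dataRange=22)*ROW(dataRange)) actually computes, here is a plain-Python sketch of the same arithmetic. The grid `data` and the helper `sumproduct_row` are hypothetical stand-ins for the sheet, and row numbers start at 1 like Excel's ROW(). It also shows why duplicates break the formula: the matching row numbers get added together.

```python
# Hypothetical stand-in for dataRange B1:F5; list index 0 is sheet row 1.
data = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25],
]


def sumproduct_row(grid, target, first_row=1):
    """Mimic SUMPRODUCT((grid=target)*ROW(grid)).

    Each cell contributes its sheet row number when it equals target
    (True counts as 1) and 0 otherwise; the contributions are summed.
    """
    return sum(
        (cell == target) * row_number
        for row_number, row in enumerate(grid, start=first_row)
        for cell in row
    )


print(sumproduct_row(data, 22))  # 5: the only 22 is in sheet row 5

# With a duplicate value, the row numbers are added, so the result is not a row.
dup = [[1, 2], [2, 3]]
print(sumproduct_row(dup, 2))  # 3 = row 1 + row 2
```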
Step 2: find the category for that row
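OFFSET(categoryACell, rows - ROW(categoryACell), 0)
(ROW returns the absolute sheet row, so subtract the row of the first category cell to get the offset from that cell.)
So the resulting formula would be:
OFFSET(categoryACell, SUMPRODUCT((dataRange=22)*ROW(dataRange)) - ROW(categoryACell), 0)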
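For example, with the table in A1:F5 and the list in A10:A17 (here ROW already gives the sheet row, so INDIRECT can use it directly with column A):
A | B | C | D | E | F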
_________________________________________________________
1 || Category A | 1 | 2 | 3 | 4 | 5
2 || Category B | 6 | 7 | 8 | 9 | 10
3 || Category C | 11 | 12 | 13 | 14 | 15
4 || Category D | 16 | 17 | 18 | 19 | 20
5 || Category E | 21 | 22 | 23 | 24 | 25
6 ||
7 ||
8 ||
9 ||
10 || 22 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A10)*ROW(B1:F5)))
11 || 5 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A11)*ROW(B1:F5)))
12 || 10 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A12)*ROW(B1:F5)))
13 || 4 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A13)*ROW(B1:F5)))
14 || 18 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A14)*ROW(B1:F5)))
15 || 6 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A15)*ROW(B1:F5)))
16 || 14 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A16)*ROW(B1:F5)))
17 || 2 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A17)*ROW(B1:F5)))
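For comparison outside the spreadsheet, the same lookup can be sketched in Python by indexing every value once. This swaps in a different technique (a dictionary from value to row heading) rather than reproducing the INDIRECT/SUMPRODUCT formula; the names `table` and `value_to_category` are illustrative. Unlike the SUMPRODUCT version, it simply keeps the first heading if a value appears twice.

```python
# The sample table: row heading -> values in that row.
table = {
    "Category A": [1, 2, 3, 4, 5],
    "Category B": [6, 7, 8, 9, 10],
    "Category C": [11, 12, 13, 14, 15],
    "Category D": [16, 17, 18, 19, 20],
    "Category E": [21, 22, 23, 24, 25],
}

# Build a reverse index once; setdefault keeps the first heading seen
# for a value, so a duplicate value does not overwrite it.
value_to_category = {}
for heading, values in table.items():
    for value in values:
        value_to_category.setdefault(value, heading)

lookup_list = [22, 5, 10, 4, 18, 6, 14, 2]
for value in lookup_list:
    # .get returns None for a value that is not in the table.
    print(value, value_to_category.get(value))
```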
I have a dataframe that looks like this:
ID EVENT DATE
1 1 142
1 5 167
1 3 245
2 1 54
2 5 87
3 3 165
3 2 178
And I would like to generate something like this:
EVENT_1 EVENT_2 COUNT
1 5 2
5 3 1
3 2 1
The idea is to count how many items (IDs) go from one event to the next. I don't care about earlier states; I only want to consider the next state from the current state (e.g., for ID 1, I don't want to count a transition from 1 to 3, because it first goes to event 5 and only then to 3).
The date format is the number of days from a specific date (sort of like SAS format).
Is there a clean way to achieve this?
Let's try this:
(df.groupby([df['EVENT'].rename('EVENT_1'),
df.groupby('ID')['EVENT'].shift(-1).rename('EVENT_2')])['ID']
.count()).rename('COUNT').reset_index().astype(int)
Output:
| | EVENT_1 | EVENT_2 | COUNT |
|---:|----------:|----------:|--------:|
| 0 | 1 | 5 | 2 |
| 1 | 3 | 2 | 1 |
| 2 | 5 | 3 | 1 |
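Details: group by 'EVENT' and by the next 'EVENT' within each ID (shift(-1) inside groupby('ID')), then count. The last event of each ID has no next event (NaN), so groupby drops it, and astype(int) turns the float EVENT_2 values back into integers.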
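The same "next event within each ID" idea can be sketched without pandas, which makes the shift(-1) step explicit. Both pandas answers assume the rows are already in DATE order within each ID; this sketch sorts by (ID, DATE) to make that assumption visible. `rows` holds the sample data from the question.

```python
from collections import Counter

# (ID, EVENT, DATE) rows from the question.
rows = [
    (1, 1, 142), (1, 5, 167), (1, 3, 245),
    (2, 1, 54), (2, 5, 87),
    (3, 3, 165), (3, 2, 178),
]

# Order each ID's events by date so "next" means the next event in time.
rows.sort(key=lambda r: (r[0], r[2]))

transitions = Counter()
for current, following in zip(rows, rows[1:]):
    # Only pair consecutive rows that belong to the same ID; this mirrors
    # groupby('ID')['EVENT'].shift(-1), where the last event of an ID has no successor.
    if current[0] == following[0]:
        transitions[(current[1], following[1])] += 1

for (event_1, event_2), count in sorted(transitions.items()):
    print(event_1, event_2, count)
```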
You could use groupby and shift. We'll also use rename_axis and reset_index to tidy up the final output:
(pd.concat([f.groupby([f['EVENT'], f['EVENT'].shift(-1).astype('Int64')]).size()
for _, f in df.groupby('ID')])
.groupby(level=[0, 1]).sum()
.rename_axis(['EVENT_1', 'EVENT_2']).reset_index(name='COUNT'))
[out]
EVENT_1 EVENT_2 COUNT
0 1 5 2
1 3 2 1
2 5 3 1
I want to create a new column in a Python DataFrame based on specific rules involving other columns. For example, my DataFrame df:
A | B
-----------
5 | 0
5 | 1
15 | 1
10 | 1
10 | 1
20 | 2
15 | 2
10 | 2
5 | 3
15 | 3
10 | 4
20 | 0
I want to create a new column C with the following requirements:
When B = 0, C = 0.
Rows with the same value of B get the same value of C. Each group of equal B values is split into a start row, middle rows, and an end row: B = 1 has 1 start, 2 middle, and 1 end row; B = 3 has 1 start, 0 middle, and 1 end row. Each part is calculated as follows.
I use a threshold of 10.
Let's look at the rows where B = 1 (rows are numbered from 1, in the order of the table above):
Start:
C.loc[2] = min(threshold, A.loc[1]) + A.loc[2]
Middle:
C.loc[3] = A.loc[3]
C.loc[4] = A.loc[4]
End:
C.loc[5] = min(threshold, A.loc[6])
The value of C for every row in the group is the sum of these calculations: 10 + 15 + 10 + 10 = 45.
When a non-zero value of B occurs in only one row, for example B = 4:
C.loc[11] = min(threshold, A.loc[10]) + min(threshold, A.loc[12])
I can solve the B = 0 case and the single-row case, but I'm struggling with the start/middle/end case.
So, the final output will be:
A | B | C
--------------------
5 | 0 | 0
5 | 1 | 45
15 | 1 | 45
10 | 1 | 45
10 | 1 | 45
20 | 2 | 50
15 | 2 | 50
10 | 2 | 50
5 | 3 | 25
15 | 3 | 25
10 | 4 | 20
20 | 0 | 0
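One way to read these rules is sketched below in plain Python, treating each run of equal B values as a group. It is a reconstruction of the rules exactly as stated (start = min(threshold, previous row's A) + its own A; middle rows = their own A; end = min(threshold, next row's A) only; a single-row group = min(threshold, previous A) + min(threshold, next A)), not an established method. The function name `compute_c` is hypothetical, and it assumes every non-zero group has a row before and after it.

```python
from itertools import groupby

A = [5, 5, 15, 10, 10, 20, 15, 10, 5, 15, 10, 20]
B = [0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 0]
THRESHOLD = 10


def compute_c(a, b, threshold):
    """Hypothetical reading of the start/middle/end rules from the question."""
    c = [0] * len(a)
    pos = 0
    # groupby yields runs of consecutive equal B values.
    for key, run in groupby(b):
        size = len(list(run))
        first, last = pos, pos + size - 1
        pos += size
        if key == 0:
            continue  # B == 0 gives C == 0 (already zero)
        # Assumes a row exists before and after every non-zero group.
        before = min(threshold, a[first - 1])  # capped A of the row before
        after = min(threshold, a[last + 1])    # capped A of the row after
        if size == 1:
            total = before + after
        else:
            # start: before + own A; middle: own A; end: only the capped next A
            total = before + a[first] + sum(a[first + 1:last]) + after
        for i in range(first, last + 1):
            c[i] = total
    return c


print(compute_c(A, B, THRESHOLD))
```

On the sample input this reproduces the C column of the desired output (0, 45, 45, 45, 45, 50, 50, 50, 25, 25, 20, 0).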
I want to create a new column in a Python DataFrame based on the values of other columns across multiple rows.
For example, my python dataframe df:
A | B
------------
10 | 1
20 | 1
30 | 1
10 | 1
10 | 2
15 | 3
10 | 3
I want to create a variable C based on variable A, conditioned on variable B across multiple rows: when B has the same value in rows i, i+1, ..., the value of C is the sum of A over those rows. In this case, my output data frame will be:
A | B | C
--------------------
10 | 1 | 70
20 | 1 | 70
30 | 1 | 70
10 | 1 | 70
10 | 2 | 10
15 | 3 | 25
10 | 3 | 25
I have no idea what the best way to achieve this is. Can anyone help?
Thanks in advance
Recreate the data:
import pandas as pd
A = [10,20,30,10,10,15,10]
B = [1,1,1,1,2,3,3]
df = pd.DataFrame({'A':A, 'B':B})
df
A B
0 10 1
1 20 1
2 30 1
3 10 1
4 10 2
5 15 3
6 10 3
Then I'll create a lookup Series from df:
lookup = df.groupby('B')['A'].sum()
lookup
A
B
1 70
2 10
3 25
Then I'll apply that lookup to each row of df:
df.loc[:,'C'] = df.apply(lambda row: lookup[lookup.index == row['B']].values[0], axis=1)
df
A B C
0 10 1 70
1 20 1 70
2 30 1 70
3 10 1 70
4 10 2 10
5 15 3 25
6 10 3 25
Use the groupby() method to group the rows by B and sum A within each group; transform() returns each group's sum on every row of that group:
df['C'] = df.groupby('B')['A'].transform('sum')
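Both answers group by the value of B across the whole column. The question, however, talks about consecutive rows i, i+1, ...; if the same B value can reappear later in a separate block, the two readings differ. A plain-Python sketch of the difference, using the sample columns plus one hypothetical extra row with B = 1 appended at the end:

```python
from collections import defaultdict
from itertools import groupby

A = [10, 20, 30, 10, 10, 15, 10, 5]
B = [1, 1, 1, 1, 2, 3, 3, 1]  # last row is a hypothetical second block of B == 1

# Like groupby('B')['A'].transform('sum'): one total per distinct B value.
totals = defaultdict(int)
for a, b in zip(A, B):
    totals[b] += a
c_by_value = [totals[b] for b in B]

# One total per consecutive run of equal B values, as the question describes.
c_by_run = []
for _, run in groupby(zip(A, B), key=lambda pair: pair[1]):
    run = list(run)
    run_total = sum(a for a, _ in run)
    c_by_run.extend([run_total] * len(run))

print(c_by_value)  # [75, 75, 75, 75, 10, 25, 25, 75]
print(c_by_run)    # [70, 70, 70, 70, 10, 25, 25, 5]
```

In pandas, the run-based version is usually written by grouping on a run id such as (df['B'] != df['B'].shift()).cumsum() instead of on B itself.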
I need to turn this Excel sheet (one number per cell):
A B C D E F G
--------------------------------
1 | 1 2 3 4 5 6 7
2 | 8 9 10 11 12 13 14
3 | 15 16 17 18 19 20 21
into this (each row combined into a single cell, with the numbers separated by spaces and line breaks as shown):
A
------------
| 1
1 | 2
| 3 4 5
| 6 7
—-line break—-
| 8
2 | 9
| 10 11 12
| 13 14
—-line break—-
| 15
3 | 16
| 17 18 19
| 20 21
Does anyone have any ideas or suggestions?
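After a little playing around, I finally found a formula that worked for the above. The CHAR(10)s are the line breaks (the cell needs Wrap Text enabled for them to display as separate lines).
=(TRANSPOSE(A1)&CHAR(10)&TRANSPOSE(B1)&CHAR(10)&CONCATENATE(C1," ",D1," ",E1)&CHAR(10)&CONCATENATE(F1," ",G1)&CHAR(10))
Since TRANSPOSE of a single cell just returns that cell, the same result can also be written as:
=A1&CHAR(10)&B1&CHAR(10)&C1&" "&D1&" "&E1&CHAR(10)&F1&" "&G1&CHAR(10)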
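The grouping the formula produces can be sketched in Python to check it: cells 1 and 2 on their own lines, cells 3 to 5 joined by spaces, cells 6 and 7 joined by spaces, with a newline standing in for CHAR(10). The function name `format_row` is hypothetical.

```python
def format_row(cells):
    """Return one output cell's text, using newlines where the formula uses CHAR(10)."""
    if len(cells) != 7:
        raise ValueError("expected 7 values (columns A to G)")
    pieces = [
        str(cells[0]),                   # A on its own line
        str(cells[1]),                   # B on its own line
        " ".join(map(str, cells[2:5])),  # C, D, E separated by spaces
        " ".join(map(str, cells[5:7])),  # F, G separated by spaces
    ]
    # The formula also ends with CHAR(10), so the text ends with a newline.
    return "\n".join(pieces) + "\n"


sheet = [
    [1, 2, 3, 4, 5, 6, 7],
    [8, 9, 10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19, 20, 21],
]
for row in sheet:
    print(format_row(row))
```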
I have a table with students' answers to 20 math problems like this:
A | B | C | D | E |...
------------+-----+-----+-----+-----+...
problem no | 1 | 2 | 3 | 4 |...
------------+-----+-----+-----+-----+...
right answer| 3 | 2 | A | 15 |...
------------+-----+-----+-----+-----+...
student1 | 3 | 4 | A | 12 |...
student2 | 2 | 2 | C | 15 |...
student3 | 3 | 2 | A | 13 |...
Now I need a column that counts the right answers for each student.
I can do it like this: =(IF(D$3=D5;1;0))+(IF(E$3=E5;1;0))+(IF(F$3=F5;1;0))+...
...but it's not the nicest way :)
This is a typical use case for SUMPRODUCT:
A B C D E F G
1 problem no 1 2 3 4
2 right answer 3 2 A 15 right answers per student
3 student1 3 4 A 12 2
4 student2 2 2 C 15 2
5 student3 3 2 A 13 3
Formula in G3:
=SUMPRODUCT($B$2:$E$2=$B3:$E3)
If there are more problems, extend both ranges accordingly; for example, for 20 problems in columns B to U, $E$2 and $E3 become $U$2 and $U3.
How it works:
SUMPRODUCT evaluates its arguments as arrays, so $B$2:$E$2=$B3:$E3 becomes an array such as {TRUE, FALSE, TRUE, FALSE}, depending on whether $B$2=$B3, $C$2=$C3, $D$2=$D3, and $E$2=$E3.
In LibreOffice or OpenOffice, TRUE is 1 and FALSE is 0, so SUMPRODUCT sums all the TRUEs.
In Excel, the Boolean values have to be converted to numbers first, so the formula in Excel is =SUMPRODUCT(($B$2:$E$2=$B3:$E3)*1).
The formula in row 3 can then be filled down for all student rows. The $ before the row number 2 keeps the reference to the right-answer row fixed while filling down.
Greetings
Axel
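The same comparison can be sketched in Python, where True also counts as 1 when summed, much as in LibreOffice. The answer key and student rows are the sample values from the table above, stored as strings because the answers mix numbers and letters; the variable names are illustrative.

```python
right_answers = ["3", "2", "A", "15"]
students = {
    "student1": ["3", "4", "A", "12"],
    "student2": ["2", "2", "C", "15"],
    "student3": ["3", "2", "A", "13"],
}

# Pairwise comparison gives booleans, like $B$2:$E$2=$B3:$E3;
# sum() adds them with True == 1 and False == 0.
scores = {
    name: sum(given == correct for given, correct in zip(answers, right_answers))
    for name, answers in students.items()
}

for name, score in scores.items():
    print(name, score)
```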