Sum of a column if another column matches a value - python-3.x

I have a dataframe, and I want to add a new column with the sum of all count values where foo equals the current row's foo.
It would be possible to create a new dataframe, group-sum it there and merge it back, but I guess there is a much simpler solution.
Any hints are highly appreciated.
Input:
foo  count
a    3
a    7
b    1
b    2
Output:
foo  count  sum_of_count
a    3      10
a    7      10
b    1      3
b    2      3

Related

Sum values of one column depending on the ID of another

I have a table with many values as such (this is an oversimplified example):
IDx  Namex  Pricex
1    a      5
2    b      2
1    a2     5
3    c      3
2    b2     9
and another table with only the ID, in which I'd like to add a column that shows the addition of all the values that match that ID, in this example:
IDy  Totaly
1    10
2    11
3    3
I'm guessing this is a combination of VLOOKUP with SUM or SUMIF. So far I've tried:
=SUM(VLOOKUP(IDy1,$IDx$1:$IDx$5,$Pricex$1:$Pricex$5),// don't know how to proceed here
Try this, where B5:B9 holds the IDx values, D5:D9 holds the Pricex values, and B16 is the IDy value to look up:
=SUMIF(B$5:B$9,B16,D$5:D$9)
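For readers coming from the dataframe question above, the same lookup can also be written in pandas; this is only an illustrative sketch, with frame and column names chosen to mirror the example tables (they are not from the original post):
import pandas as pd

prices = pd.DataFrame({'IDx': [1, 2, 1, 3, 2],
                       'Namex': ['a', 'b', 'a2', 'c', 'b2'],
                       'Pricex': [5, 2, 5, 3, 9]})
ids = pd.DataFrame({'IDy': [1, 2, 3]})

# sum Pricex per IDx, then map those totals onto the IDy table (the SUMIF equivalent)
totals = prices.groupby('IDx')['Pricex'].sum()
ids['Totaly'] = ids['IDy'].map(totals)
print(ids)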

I want to count the occurrence of duplicate values in a column in a dataframe and update the count in a new column in python

Example: Let's say I have a df
Id
A
B
C
A
A
B
It should look like:
Id  count
A   1
B   1
C   1
A   2
A   3
B   2
Note: I've tried the for loop and while loop approaches; they work for small datasets but take a lot of time for large datasets.
count = 0
for i in df['Id']:
    for j in df['Id']:
        if i == j:
            count += 1
You can groupby with cumcount, like this:
df['counts'] = df.groupby('Id', sort=False).cumcount() + 1
print(df)
Id counts
0 A 1
1 B 1
2 C 1
3 A 2
4 A 3
5 B 2
Alternatively, to get the total number of occurrences of each Id (rather than a running count):
dups_values = df.pivot_table(index=['Id'], aggfunc='size')
print(dups_values)

Identify and count alternating parts of a column in a (timeseries) dataframe

I am analyzing trades done in a futures contract, based on a csv file with a list of trades (columns are Side, Qty, Price, Date).
I have imported the file and sorted the trades chronologically by time. The column "Side" (BUY/SELL) is now:
B
S
S
B
B
S
S
B
B
B
B
I want to give each run of consecutive B's and each run of consecutive S's a unique number, so that I can group the individual runs of B's and S's for further analysis. For example, I want to find the average price of each run of B's and each run of S's.
In the example above there are 5 runs in total, 3 of B's and 2 of S's. The first run of B's should be numbered 1, the second run of B's 3, and the last run of B's 5. Basically I want to add a column with this output:
1
2
2
3
3
4
4
5
5
5
5
Now I should be able to find the average price of the four B's in run number 5 using groupby with the new column as argument and mean().
But how can I make the counter needed for this new column? I am able to identify each change using something like np.where(), diff(), abs() + cumsum() and 1 and -1, but I don't see how I can add +1 at each alternation.
Compare the column with its Series.shift using Series.ne (not equal) and take the cumulative sum with Series.cumsum:
df['new'] = df['Side'].ne(df['Side'].shift()).cumsum()
How it works:
df = df.assign(shifted=df['Side'].shift(),
               mask=df['Side'].ne(df['Side'].shift()),
               new=df['Side'].ne(df['Side'].shift()).cumsum())
print (df)
Side shifted mask new
0 B NaN True 1
1 S B True 2
2 S S False 2
3 B S True 3
4 B B False 3
5 S B True 4
6 S S False 4
7 B S True 5
8 B B False 5
9 B B False 5
10 B B False 5
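With the new column in place, the average price per run that the question asks about could be computed along these lines (a small usage sketch relying on the Price column mentioned in the question; it is not part of the original answer):
# average price within each run of consecutive B's / S's
avg_price_per_run = df.groupby('new')['Price'].mean()
print(avg_price_per_run)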

Subtract a subset of columns from a key column in Pandas Pivot

I have a pivot table with multiple columns of data in a time series:
A B C D
11/1/2018 1 5 5 7
11/2/2018 2 6 6 8
11/3/2018 3 7 7 9
The values in the data columns are not important for this example. I would like to subtract the value in the "key" column (column A in this case) from a subset of columns: B & C in this case. I would then like to drop any columns not in the subset or the key column. Result would be:
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4
I have subtracted columns in the past via code like this:
df['dif'] = df['B'] - df['A']
But this will add a "dif" column. I would like to replace column B with the B-A values. Also, instead of passing the instructions one at a time (B-A, C-A), I would like to pass a list: something like "if column is in the list, subtract the key column, else drop the column."
Thanks
pandas.DataFrame.sub with axis=0
When subtracting a Series from a DataFrame Pandas will align the columns of the DataFrame with the index of the Series by default. This is what happens when you use the - operator. However, when you use the pandas.DataFrame.sub method, you can override that default and specify that the DataFrame should align its index with the index of the Series.
def f(d, key, subset):
    return d[[key]].join(d[subset].sub(d[key], axis=0))

f(df, 'A', ['B', 'C'])
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4
You can use apply to subtract A from the subset columns that you choose, and finally join again with A.
df['A'].to_frame().join(df[['B','C']].apply(lambda x: x - df['A']))
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4

How can I count the number of values by group in Excel

I was wondering if there was a way to count the number of values by category. Example:
A 3
A 3
A 3
B 4
B 4
B 4
B 4
C 5
C 5
C 5
C 5
C 5
D 2
D 2
What is happening there is that there are 4 categories (A, B, C, D), each with a different number of duplicate values. I would like to create a new column that outputs the number of times each category occurs, as shown above. Please no VBA as I don't know it.
Try this...
=IF(A2<>A1,COUNTIF(A:A,A2),"")
