Identify and count alternating parts of a column in a (timeseries) dataframe - python-3.x

I am analyzing trades done in a futures contract, based on a csv file with a list of trades (columns are Side, Qty, Price, Date).
I have imported the file and sorted the trades chronologically by time. The column "Side" (BUY/SELL) is now:
B
S
S
B
B
S
S
B
B
B
B
I want to give each consecutive run of B's and each run of S's a unique number, so that I can group each individual run of B's or S's for further analysis. For example, I want to find the average price of each run of B's and each run of S's.
In the example above there are 5 runs in total: 3 runs of B's and 2 runs of S's. The first run of B's should be numbered 1, the second run of B's 3, and the last run of B's 5. Basically, I want to add a column with this output:
1
2
2
3
3
4
4
5
5
5
5
Then I should be able to find the average price of the four B's in run number 5 by calling groupby() with the new column as the key, followed by mean().
But how can I build the counter needed for this new column? I can identify each change using something like np.where(), diff(), abs() and cumsum() with 1 and -1, but I don't see how to add +1 at each alternation.

Use Series.shift to compare each value with the previous one using not-equal (Series.ne), then take the cumulative sum with Series.cumsum:
df['new'] = df['Side'].ne(df['Side'].shift()).cumsum()
How it works:
df = df.assign(shifted = df['Side'].shift(),
mask = df['Side'].ne(df['Side'].shift()),
new = df['Side'].ne(df['Side'].shift()).cumsum())
print (df)
Side shifted mask new
0 B NaN True 1
1 S B True 2
2 S S False 2
3 B S True 3
4 B B False 3
5 S B True 4
6 S S False 4
7 B S True 5
8 B B False 5
9 B B False 5
10 B B False 5
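The following minimal end-to-end sketch builds the run counter and then averages the price per run, as the question intended. The Side values come from the question. The Price values and the column name `run` are invented for illustration only.

```python
import pandas as pd

# Side values from the question; Price values are hypothetical, for illustration only
df = pd.DataFrame({
    'Side': list('BSSBBSSBBBB'),
    'Price': [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
})

# True wherever Side differs from the previous row (row 0 compares with NaN, so it is True);
# the cumulative sum of those booleans increments once per new run
df['run'] = df['Side'].ne(df['Side'].shift()).cumsum()

# One row per run: its side label and the mean price of the trades in it
avg = df.groupby('run').agg(Side=('Side', 'first'), avg_price=('Price', 'mean'))
print(avg)
```

Taking `first` for Side works because every row in a run has the same Side by construction.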

Related

Sum of column if another column matches a value

I have a dataframe, and I want a new column containing the sum of all count values whose foo equals the foo of the current row.
It would be possible to create a new dataframe, group and sum it there, and merge it back, but I guess there is a much simpler solution.
Any hints are highly appreciated.
Input:
foo count
a 3
a 7
b 1
b 2
Output:
foo count sum_of_count
a 3 10
a 7 10
b 1 3
b 2 3
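No answer to this question survives here. One common pandas approach, offered as a sketch rather than the accepted answer, is `groupby(...).transform('sum')`. It returns a result aligned to the original rows, so it avoids the separate group-and-merge step the question describes. The data below is the question's input.

```python
import pandas as pd

# Input data from the question
df = pd.DataFrame({'foo': ['a', 'a', 'b', 'b'], 'count': [3, 7, 1, 2]})

# transform('sum') broadcasts each group's total back to every row of that group,
# keeping df's original index, so no merge is needed
df['sum_of_count'] = df.groupby('foo')['count'].transform('sum')
print(df)
```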

Subtract a subset of columns from a key column in Pandas Pivot

I have a pivot table with multiple columns of data in a time series:
A B C D
11/1/2018 1 5 5 7
11/2/2018 2 6 6 8
11/3/2018 3 7 7 9
The values in the data columns are not important for this example. I would like to subtract the value in the "key" column (column A in this case) from a subset of columns: B & C in this case. I would then like to drop any columns not in the subset or the key column. Result would be:
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4
I have subtracted columns in the past via code like this:
df['dif'] = df['B'] - df['A']
But this adds a new "dif" column, and I would like to replace column B with the B-A values instead. Also, instead of giving the instructions one at a time (B-A, C-A), I would like to pass a list, with logic like "if the column is in the list, subtract the key column; otherwise drop the column."
Thanks
pandas.DataFrame.sub with axis=0
When you subtract a Series from a DataFrame, pandas aligns the DataFrame's columns with the Series' index by default. This is what happens when you use the - operator. The pandas.DataFrame.sub method lets you override that default: with axis=0, the Series is aligned with the DataFrame's index (its rows) instead.
def f(d, key, subset):
    return d[[key]].join(d[subset].sub(d[key], axis=0))

f(df, 'A', ['B', 'C'])
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4
Alternatively, you can use apply to subtract A from the subset of columns you choose, and finally join the result back with A.
df['A'].to_frame().join(df[['B','C']].apply(lambda x: x - df['A']))
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4
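Here is a self-contained sketch of the `DataFrame.sub(..., axis=0)` approach. It rebuilds the example pivot from the question with a date index. The helper name `subtract_key` is hypothetical and is simply a renamed version of `f` above.

```python
import pandas as pd

# Example pivot from the question, indexed by date
df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [5, 6, 7], 'C': [5, 6, 7], 'D': [7, 8, 9]},
    index=pd.to_datetime(['2018-11-01', '2018-11-02', '2018-11-03']),
)

def subtract_key(d, key, subset):
    """Keep `key`, replace each column in `subset` with (column - key), drop everything else."""
    # axis=0 aligns d[key] with the row index (dates) rather than the column labels
    diffs = d[subset].sub(d[key], axis=0)
    # Joining on the shared index puts the key column back in front
    return d[[key]].join(diffs)

result = subtract_key(df, 'A', ['B', 'C'])
print(result)
```

Column D disappears because only `key` and the columns in `subset` are selected. This matches the question's "if the column is in the list, subtract the key column; otherwise drop it" rule.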

Excel Formula comparing two columns

Below is a sample of my data. I want to match the data in columns A and B. If column B does not match column A, I want to add a row and copy the value from column A to column B. For example, "4" is missing in column B, so I want to insert a blank cell there and put "4" in it so that column B matches column A. I have a large data set, so I am looking for a way to do this other than checking for duplicate values in the two columns and manually adding one row at a time. Thanks!
A B C D
3 3 Y B
4 5 G B
5 6 B G
6 8 P G
7 9 Y P
8 11 G Y
9 12 B Y
10
11
12
11
12
I would move columns B, C and D to separate columns, say E, F and G. Then use INDEX/MATCH to look up each column A value in the moved column B (now E) and identify which records are missing.
For col C: =IFERROR(INDEX(F:F,MATCH(A1,E:E,0)),"N/A")
For col D: =IFERROR(INDEX(G:G,MATCH(A1,E:E,0)),"N/A")
After this, you can filter for C="N/A" to find the cases where a B value is missing for an A value, and edit them manually. Since you want A and B to match, column B is no longer needed. Here is the final result after removing column B and shifting C->B and D->C:
A B C
3 Y B
4 N/A N/A
5 G B
6 B G
7 N/A N/A
Hope this helps!

Finding, in a list of patients with multiple treatment events, those who don't have an initial event

I have a table with a list of patients with repeating treatment events (several thousand records).
E.g., in the following table, column A holds the patient codes (the same number means the same patient), and column B holds the treatment events of each patient.
I want to exclude those patients who don’t have an initial treatment event (here "a"), and to mark them in column C for example with "E".
A B C
1 a
1 b
1 c
2 b E
2 c E
3 a
3 c
4 a
4 b
5 a
5 b
5 c
6 a
6 b
6 c
6 d
6 e
6 f
7 b E
7 f E
The formula to put in column C is:
=IF(COUNTIFS($A$2:$A$21,A2,$B$2:$B$21,"a")=0,"E","")
It counts the occurrences of treatment "a" for each patient. Where there are none (count = 0), it puts the letter E.

Find summation and count only if they are EQUAL in Excel

In an Excel sheet I have 1728 rows and 2 columns (L and O), and I add these 2 columns together in column P. I then want to count how often the sum in this column equals 2, 4, 6 or 8, BUT only when the values in columns L and O are EQUAL and their sum is 2, 4, 6 or 8.
This means that only rows where L and O hold "1+1", "2+2", "3+3" or "4+4" should be counted. Sums such as "1+3" or "4+2" should not be counted.
=COUNTIF(P:P,4)
does not work, because it also counts rows such as 1+3.
L O P M
===========================
1 1 2 1 (NO OF 2'S)
2 2 4 1 (NO OF 4'S)
3 3 6 1 (NO OF 6'S)
1 3 4* NOT TO BE COUNTED
4 4 8 1 (NO OF 8'S)
2 4 6* NOT TO BE COUNTED
4 2 6*
As seen above, the result of the counting is stored in column M. What formula should I use?
=IF(L29=M29,SUMPRODUCT(--($L$29:$L$35=$M$29:$M$35)*(L29=$L$29:$L$35)),"Not Counted")
My data started in row 29 and used columns L and M for the two values, so you will need to adjust the references. The formula evaluates the entire table in one pass. So if you added a row at the bottom containing 1, 1 and 2, the result for your first row would become 2, and the same for the row you just added.
Will this formula help?
=IF(AND(A1=B1,OR(SUM(A1,B1)=2,SUM(A1,B1)=4,SUM(A1,B1)=6,SUM(A1,B1)=8)),SUM(A1,B1),"NOT TO BE COUNTED")
Just drag the formula down as far as you have data. You will need to adjust the references, because this formula uses columns A and B.
