Is there a way to convert my column of incrementing integers separated by zero to the number of intervals encountered so far in a pandas datafram? - python-3.x

I'm working in pandas and I have a column in my dataframe filled by 0s and incrementing integers starting at one. I would like to add another column of integers but that column would be a counter of how many intervals separated by zero we have encountered to this point. For example my data would like like
Index
1
2
3
0
1
2
0
1
and I would like it to look like
Index IntervalCount
1 1
2 1
3 1
0 1
1 2
2 2
0 2
1 2
Is it possible to do this with vectorized operation or do I have to do this iteratively? Note, it's not important that it be a new column could also overwrite the old one.

You can use cumsum function.
df["IntervalCount"] = (df["Index"] == 1).cumsum()

Related

Csv file split comma separated values into separate rows and dividing the corresponding dollar amount by the number of comma separated values in panda

beginner here!
I have a csv file with comma separated values. I want to split each comma separated value in different rows in pandas. However, the corresponding dollar amounts should be divided by the number of comma separated values in each cell and export the result in a different csv file.
the csv table and the desired output table
I have used df.explode(IDs) but couldn’t figure out how to divide the Dollar_Amount by the number of IDs in the corresponding cells.
import pandas as pd
in_csv = pd.read_csv(‘inputCSV.csv’)
new_csv = df.explode(‘IDs’)
new_csv.to_csv(‘outputCSV.csv’)
You can divide the dollar amount by the number of ids in each row before using explode. This can be done as follows:
# Preprocessing
df['Dollar_Amount'] = df['Dollar_Amount'].str[1:].str.replace(',', '').astype(float)
df['IDs'] = df['IDs'].str.split(",")
# Compute the new dollar amount and explode
df['Dollar_Amount'] = df['Dollar_Amount'] / df['IDs'].str.len()
df = df.explode('IDs')
# Postprocessing
df['Dollar_Amount'] = df['Dollar_Amount'].round(2).apply(lambda x: '${0:,.2f}'.format(x))
With an example input:
IDs Dollar_Amount A
0 1,2,3,4 $100,000.00 4
1 5,6,7 $50,000.00 3
2 9 $20,000.00 1
3 10,11 $20,000.00 2
The result is as follows:
IDs Dollar_Amount A
0 1 $25,000.00 4
0 2 $25,000.00 4
0 3 $25,000.00 4
0 4 $25,000.00 4
1 5 $16,666.67 3
1 6 $16,666.67 3
1 7 $16,666.67 3
2 9 $20,000.00 1
3 10 $10,000.00 2
3 11 $10,000.00 2
There will be a one line way to do this with a lambda function (if you are new, read up on lambda functions!) but as a slightly less new beginner, I think its easier to think about this as two separate operations.
Operation 1 - get the count of ids, Operation 2 - do the division
If you take a look here https://towardsdatascience.com/count-occurrences-of-a-value-pandas-e5dad02303e9 you'll get a good lesson on how to do the group by you need to get the count of ids and join it back to your data frame. I'd read that because its a much more detailed explainer, but if you want a simple line of code consider this Pandas, how to count the occurance within grouped dataframe and create new column?
Once you have it, the divison is as simple as df['new_col'] = df['col1']/df['col2']

How to generate a Dataframe whose length equals to the product of all columns lengths?

I am looking for a quick way to generate a long dataframe. For example, the input is:
Column "color": [1,2,3] (length: 3)
Column "weekday": [0,1] (length: 2)
The expected output is:
color weekday
1 0
2 0
3 0
1 1
2 1
3 1
And this output dataframe has the length as 2*3 = 6.
Is there a quick way to generate such dataframes based on the series as the input? And it is possible that there are many columns. Thanks.

In Excel, how can I create a counter to increment based on a condition?

Lets say in Excel, that I have this in a column. When the number in column N is greater than 3, then the number in the Counter column increases. I have tried if statements, but the count never works.
N Counter
1 1
2 1
3 1
1 2
2 2
3 2
1 3
2 3
3 3
Try this simple COUNTIFS() function.
=COUNTIFS($A$2:$A2,A2)

Repeat X in seven rows before incrementing X

Like so:
1
1
1
1
1
1
1
2
2
2
2
2
2
2
...
Something like =IF(A1=4,1,A1+1) would work if the sequence wasn't all the same value.
I believe you are looking for division:
=INT((ROW()-1)/7)
NOTE: You will have to play with the 1 and 7 in the formula above to adapt to your needs. -1 is like an offset, and 7 is the number of times for the repeat. Use -2 if the numbers should start on row 2 for example. Lastly, if you want to start with 1 instead of zero, simply add 1.
=(the previous row's value) + IF(the previous 7 rows all have the same value, 1, 0 )
Obviously(?) this will not work for the first 7 rows; the first row could be the starting value, the next 6 just a straight copy of that.

How to compute the maximum series of a specific condition returning true

i have a slight issue to count the MAX frequency of where the third colmn is bigger than the second. This is just a statistic with scores.
The issue is that i want to have it in one single formula without a macro.
B C
------
2 0
1 2
2 1
2 3
0 1
1 2
0 1
3 3
0 2
0 2
i have tried it with:
{=MAX(FREQUENCY(B3:B100;B3:B100>=C3:C100))} to get 1 for B
{=MAX(FREQUENCY(C3:C100;C3:C100>=B3:B100))} to get 7 for C
I excpected it to deliver me the longest series where the value in the one column was bigger than in the other one, but i failed hard...
Try this version to get 7
=MAX(FREQUENCY(IF(C3:C100>=B3:B100,IF(B3:B100<>"",ROW(B3:B100))),IF(C3:C100<B3:B100,ROW(B3:B100))))
confirmed with CTRL+SHIFT+ENTER
obviously reverse the ranges to get your other result
See example here

Resources