i want Cumulative count of zero only in column c grouped by column a and sorted by b if other number the count reset to 1
this a sample
df = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
'b':[1,2,3,4,1,2,3,4],
'c':[10,0,0,5,1,0,1,0]}
)
i try next code that work but if zero appear more than one time shift function didn't depend on new value and need to run more than one time depend on count of zero series
df.loc[df.c == 0 ,'n'] = df.n.shift(1)+1
i try next code it done with small data frame but when try with large data take a long time and didn't finsh
for ind in df.index:
if df.loc[ind,'c'] == 0 :
df.loc[ind,'new'] = df.loc[ind-1,'new']+1
else :
df.loc[ind,'new'] = 1
pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
'b':[1,2,3,4,1,2,3,4],
'c':[10,0,0,5,1,0,1,0]}
The desired result
a b c n
0 1 1 10 1
1 1 2 0 2
2 1 3 0 3
3 1 4 5 1
4 2 1 1 1
5 2 2 0 2
6 2 3 1 1
7 2 4 0 2
Try use cumsum to create a group variable and then use groupby.cumcount to create the new column:
df.sort_values(['a', 'b'], inplace=True)
df['n'] = df['c'].groupby([df.a, df['c'].ne(0).cumsum()]).cumcount() + 1
df
a b c n
0 1 1 10 1
1 1 2 0 2
2 1 3 0 3
3 1 4 5 1
4 2 1 1 1
5 2 2 0 2
6 2 3 1 1
7 2 4 0 2
Starting Dataframe:
A B
0 1 1
1 1 2
2 2 3
3 3 4
4 3 5
5 1 6
6 1 7
7 1 8
8 2 9
Desired result - eg. Remove rows where column A has values that match the row above or below:
A B
0 1 1
2 2 3
3 3 4
5 1 6
8 2 9
You can use boolean indexing, the following condition will return true if value of A is NOT equal to value of A's next row
new_df = df[df['A'].ne(df['A'].shift())]
A B
0 1 1
2 2 3
3 3 4
5 1 6
8 2 9
I have a table like this
times v2
0 4 10
1 2 20
2 0 30/n30
3 1 40
4 0 9
What I want if change the values of v2 when times != 0, and the change consists in adding "\0" as many times as the times columns says.
times v2
0 4 10\n0\n0\n0\n0
1 2 20\n0\n0
2 0 30\n30
3 1 40\n0
4 0 9
You can do
df.v2+=df.times.map(lambda x : x*"\n0")
df
Out[325]:
times v2
0 4 10\n0\n0\n0\n0
1 2 20\n0\n0
2 0 30/n30
3 1 40\n0
4 0 9
I have DataFrame containing three columns:
The incrementor
The incremented
Other
I would like lengthen the DataFrame in a particular way. In each row, I want to add a number of rows, depending on the incrementor, and in these rows we increment the incremented, while the "other" is just replicated.
I made a small example which makes it more clear:
df = pd.DataFrame([[2,1,3], [5,20,0], ['a','b','c']]).transpose()
df.columns = ['incrementor', 'incremented', 'other']
df
incrementor incremented other
0 2 5 a
1 1 20 b
2 3 0 c
The desired output is:
incrementor incremented other
0 2 5 a
1 2 6 a
2 1 20 b
3 3 0 c
4 3 1 c
5 3 2 c
Is there a way to do this elegantly and efficiently with Pandas? Or is there no way to avoid looping?
First get repeated rows on incrementor using repeat and .loc
In [1029]: dff = df.loc[df.index.repeat(df.incrementor.astype(int))]
Then, modify incremented with cumcount
In [1030]: dff.assign(
incremented=dff.incremented + dff.groupby(level=0).incremented.cumcount()
).reset_index(drop=True)
Out[1030]:
incrementor incremented other
0 2 5 a
1 2 6 a
2 1 20 b
3 3 0 c
4 3 1 c
5 3 2 c
Details
In [1031]: dff
Out[1031]:
incrementor incremented other
0 2 5 a
0 2 5 a
1 1 20 b
2 3 0 c
2 3 0 c
2 3 0 c
In [1032]: dff.groupby(level=0).incremented.cumcount()
Out[1032]:
0 0
0 1
1 0
2 0
2 1
2 2
dtype: int64
I have a data set that is spread across five columns. Sample of data:
Raw Data End Results
A B C D E A B C D E
1 2 2 1 6 1 2 2 1 6
0 3 3 0 6 0 3 3 0 6
1 2 2 1 6
0 3 3 0 6
1 2 2 1 6
0 3 3 0 6
1 2 2 1 6
0 3 3 0 6
1 2 2 1 6
0 3 3 0 6
1 2 2 1 6
0 3 3 0 6
1 2 2 1 6
0 3 3 0 6
The length of record varies from 10 to 40.
The data is to help me keep record of inventory and I wish to know which orders are popular.
Unfortunately I am still using Excel 2003.
Because I am not really sure what you have, this is deliberately simple:
In ColumnG Row1 put:
=A1&B1&C1&D1&E1
and copy down to suit. Select ColumnG and Paste Special, Values. Select ColumnG and sort. Insert in H1 and copy down to suit:
=COUNTIF(G$1:G1,G1)
1 should indicate the first ("unique") instance of each of the rows of Raw Data (and the other numbers the number of repetitions - up to 7 in your example, so one 'original' and six 'copies'.