What is a good way to add 1 to column values if a value is greater than 2 in Python? - python-3.x

I want to add 1 to the values in column flag wherever column A is greater than 2.
Here is my DataFrame:
df=pd.DataFrame({'A':[1,1,1,1,1,1,3,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,5,1,1,0,1,1,1]})
Desired output (df_out):
df_out=pd.DataFrame({'A':[1,1,1,1,1,1,3,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,6,1,1,0,1,1,1]})

Use DataFrame.loc to select the rows where A is greater than 2 and add 1 to flag:
df.loc[df.A.gt(2), 'flag'] += 1
print (df)
A flag
0 1 1
1 1 1
2 1 0
3 1 1
4 1 1
5 1 1
6 3 6
7 2 1
8 2 1
9 2 0
10 2 1
11 2 1
12 2 1
Or, with numpy.where (requires import numpy as np):
df['flag'] = np.where(df.A.gt(2), df['flag'] + 1, df['flag'])
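A self-contained sketch of both approaches above on the question's data, assuming `pandas` and `numpy` are installed. It works on copies so that each result can be compared against the expected `flag` values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 2],
                   'flag': [1, 1, 0, 1, 1, 1, 5, 1, 1, 0, 1, 1, 1]})
expected = [1, 1, 0, 1, 1, 1, 6, 1, 1, 0, 1, 1, 1]

# Approach 1: increment flag in place, only on the rows where A > 2
df_loc = df.copy()
df_loc.loc[df_loc['A'].gt(2), 'flag'] += 1

# Approach 2: np.where picks flag + 1 or flag row by row and builds a new column
df_where = df.copy()
df_where['flag'] = np.where(df_where['A'].gt(2), df_where['flag'] + 1, df_where['flag'])

print(df_loc['flag'].tolist())
print(df_where['flag'].tolist())
```

`.loc` edits only the selected cells, while `np.where` is convenient when the "else" branch is a different column or expression.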
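EDIT: for the follow-up case, where the condition is on the mean of y within bins of x (built with pd.cut):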
mean = df.groupby(pd.cut(df['x'], bins))['y'].transform('mean')
df['flag'] = np.where(mean.gt(2), df['y'] + 1, df['y'])
Then, for the absolute deviation of each y value from its bin mean:
x = df.groupby(pd.cut(df['x'], bins))['y'].apply(lambda x: abs(x - np.mean(x)))
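The follow-up snippets use columns `x` and `y` and a `bins` list that the thread never shows. The sketch below invents small values for all three (hypothetical data) so that each step's output can be inspected. It also uses `transform` for the deviation, which keeps every result aligned with the original rows, and passes `observed=True` because `pd.cut` returns a categorical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the follow-up's x, y and bins are not given in the thread
df = pd.DataFrame({'x': [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
                   'y': [1, 2, 3, 4, 2, 2]})
bins = [0, 2, 4, 6]

groups = pd.cut(df['x'], bins)
# Mean of y inside each x-bin, broadcast back to every row of that bin
mean = df.groupby(groups, observed=True)['y'].transform('mean')
# Add 1 to y wherever its bin mean is greater than 2
df['flag'] = np.where(mean.gt(2), df['y'] + 1, df['y'])
# Absolute deviation of each y from its own bin mean
df['dev'] = (df['y'] - mean).abs()
print(df)
```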

Related

Cumulative count using grouping, sorting, and condition

I want a cumulative count of zeros in column c, grouped by column a and sorted by b; when any other number appears, the count resets to 1.
Here is a sample:
df = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
'b':[1,2,3,4,1,2,3,4],
'c':[10,0,0,5,1,0,1,0]}
)
I tried the following code. It works, but when zero appears more than once in a row, shift does not see the newly computed value, so the code has to be run again for each zero in the run:
df.loc[df.c == 0 ,'n'] = df.n.shift(1)+1
I also tried the following loop. It works on a small DataFrame, but on large data it takes a very long time and never finishes:
for ind in df.index:
    if df.loc[ind,'c'] == 0 :
        df.loc[ind,'new'] = df.loc[ind-1,'new']+1
    else :
        df.loc[ind,'new'] = 1
The desired result
a b c n
0 1 1 10 1
1 1 2 0 2
2 1 3 0 3
3 1 4 5 1
4 2 1 1 1
5 2 2 0 2
6 2 3 1 1
7 2 4 0 2
Try using cumsum to create a group variable, then use groupby.cumcount to create the new column:
df.sort_values(['a', 'b'], inplace=True)
df['n'] = df['c'].groupby([df.a, df['c'].ne(0).cumsum()]).cumcount() + 1
df
a b c n
0 1 1 10 1
1 1 2 0 2
2 1 3 0 3
3 1 4 5 1
4 2 1 1 1
5 2 2 0 2
6 2 3 1 1
7 2 4 0 2
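A self-contained version of the answer on the sample data, with the run id split out as its own step so it can be inspected. It reassigns the sorted frame instead of using `inplace=True`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2],
                   'b': [1, 2, 3, 4, 1, 2, 3, 4],
                   'c': [10, 0, 0, 5, 1, 0, 1, 0]})

df = df.sort_values(['a', 'b'])
# Every non-zero value starts a new run; the zeros that follow share its run id
run_id = df['c'].ne(0).cumsum()
# Number the rows inside each (a, run) pair starting at 1
df['n'] = df.groupby([df['a'], run_id]).cumcount() + 1
print(df)
```

Because each non-zero value opens a new run, the zeros after it are numbered 2, 3, and so on; grouping by `a` as well makes the count restart when the group changes.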

Change every CSV file value

I'm sure there's a simple solution to this, but I'm struggling. I want to set the values of a CSV file I've created to 1s and 0s so that I can work out the probability for each row.
Here's the CSV data:
0 1 2 3 4 5 6 7 \
0 Reference China Greece Japan S Africa S Korea Sri lanka Taiwan
1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ...
14898 1 1 1 1 1 1 1 1
14899 1 1 1 1 1 1 1 1
14900 1 1 1 1 1 1 1 1
14901 1 1 1 1 1 1 1 1
14902 1 1 1 1 1 1 1 1
8 9 10 11 12 13 14 15 16
0 USA Ecuador Egypt Ghana India Isreal Pakistan Taiwan USA Ohio
1 1.031 1 1 1 1 1 1 1 1
2 1.031 1 1 1 1 1 1 1 1
3 1.031 1 1 1 1 1 1 1 1
4 1.031 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ... ...
14898 1 1 1 1 1 1 1 1 1
14899 1 1 1 1 1 1 1 1 1
14900 1 1 1 1 1 1 1 1 1
14901 1 1 1 1 1 1 1 1 1
14902 1 1 1 1 1 1 1 1 1
[14903 rows x 17 columns]
And I've tried this:
data = pd.DataFrame(pd.read_csv('IEratios.csv', header=None, sep=','))
for x in data:
    if x == 1:
        x = 0
    else:
        x = 1
I thought this would be simple and would work, but I was wrong, and nothing I find seems to apply to all columns and rows, so I am lost.
You can use the .map() method in pandas, which runs a function through an entire DataFrame column, like so:
import pandas as pd
def changeNumber(x):
    if x == 1:
        return 0
    else:
        return 1
df = pd.read_csv('IEratios.csv', sep=',')
df['China'] = df['China'].map(changeNumber)
I'm not sure I understand what you want to do. Do you want to replace the ones with zeros and the zeros with ones?
If so, since you are using pandas, you can use the following statement:
df.replace({"0": "1", "1": "0"}, inplace=True)
Be careful with the data types in your DataFrame: this version matches the strings "0" and "1", so for numeric columns the keys and values must be the integers 0 and 1 instead.
Have you tried using the numpy.where function?
data = pd.DataFrame(pd.read_csv('IEratios.csv', header=None, sep=','))
data = np.where((data == 1), 0, 1)
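The loop in the question changes nothing because iterating over a DataFrame yields its column labels, not cell values, and rebinding `x` never writes back into the frame. Below is a vectorized sketch over every cell. The CSV text is a hypothetical stand-in for `IEratios.csv`, read with its first line as the header so that names such as `Reference` and `China` stay column labels instead of being turned into numbers.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for IEratios.csv; the first line holds the column names
csv_text = """Reference,China,USA
1,1,1.031
1,0,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Iterating over a DataFrame gives the column labels, not the values
print(list(df))

# Applied to every cell at once: 1 becomes 0, anything else becomes 1
flipped = pd.DataFrame(np.where(df == 1, 0, 1), index=df.index, columns=df.columns)
# Same result without NumPy: compare, then cast the booleans to integers
flipped2 = df.ne(1).astype(int)
print(flipped)
```

Wrapping the `np.where` result back into a DataFrame keeps the column names and index, which a bare `np.where` call drops.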

All possible sequences of 1 or 2 steps to reach the nth stair

I am working on a Python program where I want to find all possible ways to reach the nth stair.
Here is my program taken from here:
# A program to count the number of ways to reach n'th stair
# Recursive program to find n'th fibonacci number
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)
# returns no. of ways to reach s'th stair
def countWays(s):
    return fib(s + 1)
# Driver program
s = 10
print("Number of ways = ", countWays(s))
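A short derivation of why `countWays(s)` returns `fib(s + 1)`. Writing W(s) for the number of ways to reach stair s, the last move is either a 1-step from stair s - 1 or a 2-step from stair s - 2, so the counts follow the Fibonacci recurrence:

```latex
W(s) = W(s-1) + W(s-2), \qquad W(1) = 1,\quad W(2) = 2
\;\Longrightarrow\; W(s) = F(s+1), \qquad F(1) = F(2) = 1
```

With the `fib` above (fib(0) = 0, fib(1) = 1) this gives W(3) = fib(4) = 3 and W(10) = fib(11) = 89, matching the counts in the examples.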
This gives me the total number of ways to reach the nth stair, but I want a function that returns a list of all the possible ways.
Example:
1) For s = 3, the output should be the possible step sequences {1,1,1}, {2,1}, {1,2}.
2) s = 10 has 89 combinations:
1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
1 2 1 1 1 1 1 1 1
1 1 2 1 1 1 1 1 1
2 2 1 1 1 1 1 1
1 1 1 2 1 1 1 1 1
2 1 2 1 1 1 1 1
1 2 2 1 1 1 1 1
1 1 1 1 2 1 1 1 1
2 1 1 2 1 1 1 1
1 2 1 2 1 1 1 1
1 1 2 2 1 1 1 1
2 2 2 1 1 1 1
1 1 1 1 1 2 1 1 1
2 1 1 1 2 1 1 1
1 2 1 1 2 1 1 1
1 1 2 1 2 1 1 1
2 2 1 2 1 1 1
1 1 1 2 2 1 1 1
2 1 2 2 1 1 1
1 2 2 2 1 1 1
1 1 1 1 1 1 2 1 1
2 1 1 1 1 2 1 1
1 2 1 1 1 2 1 1
1 1 2 1 1 2 1 1
2 2 1 1 2 1 1
1 1 1 2 1 2 1 1
2 1 2 1 2 1 1
1 2 2 1 2 1 1
1 1 1 1 2 2 1 1
2 1 1 2 2 1 1
1 2 1 2 2 1 1
1 1 2 2 2 1 1
2 2 2 2 1 1
1 1 1 1 1 1 1 2 1
2 1 1 1 1 1 2 1
1 2 1 1 1 1 2 1
1 1 2 1 1 1 2 1
2 2 1 1 1 2 1
1 1 1 2 1 1 2 1
2 1 2 1 1 2 1
1 2 2 1 1 2 1
1 1 1 1 2 1 2 1
2 1 1 2 1 2 1
1 2 1 2 1 2 1
1 1 2 2 1 2 1
2 2 2 1 2 1
1 1 1 1 1 2 2 1
2 1 1 1 2 2 1
1 2 1 1 2 2 1
1 1 2 1 2 2 1
2 2 1 2 2 1
1 1 1 2 2 2 1
2 1 2 2 2 1
1 2 2 2 2 1
1 1 1 1 1 1 1 1 2
2 1 1 1 1 1 1 2
1 2 1 1 1 1 1 2
1 1 2 1 1 1 1 2
2 2 1 1 1 1 2
1 1 1 2 1 1 1 2
2 1 2 1 1 1 2
1 2 2 1 1 1 2
1 1 1 1 2 1 1 2
2 1 1 2 1 1 2
1 2 1 2 1 1 2
1 1 2 2 1 1 2
2 2 2 1 1 2
1 1 1 1 1 2 1 2
2 1 1 1 2 1 2
1 2 1 1 2 1 2
1 1 2 1 2 1 2
2 2 1 2 1 2
1 1 1 2 2 1 2
2 1 2 2 1 2
1 2 2 2 1 2
1 1 1 1 1 1 2 2
2 1 1 1 1 2 2
1 2 1 1 1 2 2
1 1 2 1 1 2 2
2 2 1 1 2 2
1 1 1 2 1 2 2
2 1 2 1 2 2
1 2 2 1 2 2
1 1 1 1 2 2 2
2 1 1 2 2 2
1 2 1 2 2 2
1 1 2 2 2 2
2 2 2 2 2
Update:
I found this working Java code, but I don't understand how to convert it to Python:
import java.util.ArrayList;
import java.util.List;
public class ClimbStairs {
    public static void main(String args[]) {
        int s = 10;
        List<Integer> vals = new ArrayList<>();
        ClimbWays(s, 0, new int[s], vals);
        vals.sort(null);
        System.out.println(vals);
    }
    public static void ClimbWays(int n, int currentIndex, int[] currectClimb, List<Integer> vals) {
        if (n < 0)
            return;
        if (n == 0) {
            vals.add(currentIndex);
            int last = 0;
            for (int i = currentIndex - 1; i >= 0; i--) {
                int current = currectClimb[i];
                int res = current - last;
                last = current;
                System.out.print(res + " ");
            }
            System.out.println();
            return;
        }
        currectClimb[currentIndex] = n;
        ClimbWays(n - 1, currentIndex + 1, currectClimb, vals);
        ClimbWays(n - 2, currentIndex + 1, currectClimb, vals);
    }
}
It seems like you are looking for a modification of the partitions of a number:
import itertools as it
def partitions(n, I=1):
    yield (n,)
    for i in range(I, n//2 + 1):
        for p in partitions(n-i, i):
            yield (i,) + p
def countWays(s):
    for i in partitions(s):
        if max(i) > 2: continue  # only steps of 1 or 2 are allowed
        yield from set(it.permutations(i))  # set to remove duplicates
print(list(countWays(3)))
Displays:
[(1, 2), (2, 1), (1, 1, 1)]
Note that this will return them in no particularly sorted order.
(Partitions algorithm from here.)
Here is a conversion of your Java code into Python:
def climbWays(n, currentIndex, currentClimb, vals):
    if n < 0:
        return
    if n == 0:
        vals.append(currentIndex)
        last = 0
        for i in range(currentIndex - 1, -1, -1):
            current = currentClimb[i]
            res = current - last
            last = current
            print(res, end=" ")
        print()
        return
    currentClimb[currentIndex] = n
    climbWays(n - 1, currentIndex + 1, currentClimb, vals)
    climbWays(n - 2, currentIndex + 1, currentClimb, vals)
s = 10
vals = []
climbWays(s, 0, [0] * s, vals)
vals.sort()
print(vals)
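A more direct alternative, sketched here and not taken from either answer: recurse on the first step, so only sequences of 1s and 2s are ever generated and no duplicate filtering is needed. The function name `climb_ways` is illustrative.

```python
from typing import Iterator, Tuple


def climb_ways(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every sequence of 1- and 2-steps that sums to n."""
    if n == 0:
        yield ()  # nothing left to climb: the steps chosen so far form one way
        return
    for step in (1, 2):
        if step <= n:
            # Take this step first, then enumerate every way to climb the rest
            for rest in climb_ways(n - step):
                yield (step,) + rest


print(list(climb_ways(3)))
print(len(list(climb_ways(10))))
```

Sequences come out in lexicographic order (1 before 2 at each position). Their number grows like the Fibonacci numbers, so listing them all is only practical for modest `s`.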

Calculate column by condition

I have a table in df:
X1 X2
1 1
1 2
2 2
2 2
3 3
3 3
I want to calculate Y, where Y = Yprevious + 1 if X1 = X1previous and X2 = X2previous, else 0. Y on the first line is 0. Example:
X1 X2 Y
1 1 0
2 2 0
2 2 1
2 2 2
2 2 3
3 3 0
Not a duplicate: the previous question was simpler (adding a value on a specific row). Here the previous Y feeds into the calculation, so I need a cumulative calculation.
What I need, with a longer example:
X1 X2 Y
1 1 0
2 2 0
2 2 1
2 2 2
2 2 3
3 3 0
3 3 1
2 2 0
What I get with the solution from the linked duplicate:
X1 X2 Y
1 1 0
2 2 0
2 2 1
2 2 2
2 2 3
3 3 0
3 3 1
2 2 4
Use GroupBy.cumcount, grouping by helper columns that number the runs of consecutive values:
df1 = df[['X1','X2']].ne(df[['X1','X2']].shift()).cumsum()
df['Y'] = df.groupby([df1['X1'], df1['X2']]).cumcount()
print (df)
X1 X2 Y
0 1 1 0
1 2 2 0
2 2 2 1
3 2 2 2
4 2 2 3
5 3 3 0
6 3 3 1
7 2 2 0
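A variant of the same idea, sketched on the longer example: collapse the per-column change flags into a single run id with `any(axis=1)`, which works unchanged for any number of key columns.

```python
import pandas as pd

df = pd.DataFrame({'X1': [1, 2, 2, 2, 2, 3, 3, 2],
                   'X2': [1, 2, 2, 2, 2, 3, 3, 2]})
cols = ['X1', 'X2']

# True where a row differs from the previous row in any key column
# (the first row compares against NaN, so it always counts as a change)
changed = df[cols].ne(df[cols].shift()).any(axis=1)
# Cumulative sum turns each change into a new run id: 1, 2, 2, 2, 2, 3, 3, 4
run_id = changed.cumsum()
# Position inside the run is Y: 0 on a run's first row, +1 for each repeat
df['Y'] = df.groupby(run_id).cumcount()
print(df)
```

The final row gets 0 because its run id is new, even though the pair (2, 2) was seen earlier; that is the behaviour the question asks for.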

How to apply a function to a DataFrame column to create an iterated column

I have IDs with system event times. I grouped the event times by ID (individual systems) and made a new column, flag, whose value is 1 if eventtimes.diff() is greater than 1 day, else 0. Now that I have the flag, I am trying to write a function to apply with groupby('ID') so that the new column starts at 1 and keeps returning 1 for each row until the flag shows 1; then the new column goes up by 1, to 2, and keeps returning 2 until the flag shows 1 again.
I will apply this along with groupby('ID') since I need the new column to start over again at 1 for each ID.
I have tried the following:
def try(x):
    y = 1
    if row['flag']==0:
        y = y
    else:
        y += y+1
df['NewCol'] = df.groupby('ID')['flag'].apply(try)
I have tried different variations of the above, to no avail. Thanks in advance for any help you may provide.
Also, feel free to let me know if I messed up posting the question. Not sure if my title is great either.
Use boolean indexing for filtering, then cumcount and reindex; this is much faster than a loop-based apply.
I think you need to count only the 1s per group, with 1 filled in for rows where flag is not 1:
df = pd.DataFrame({
'ID': ['a','a','a','a','b','b','b','b','b'],
'flag': [0,0,1,1,0,0,1,1,1]
})
df['new'] = (df[df['flag'] == 1].groupby('ID')['flag']
.cumcount()
.add(1)
.reindex(df.index, fill_value=1))
print (df)
ID flag new
0 a 0 1
1 a 0 1
2 a 1 1
3 a 1 2
4 b 0 1
5 b 0 1
6 b 1 1
7 b 1 2
8 b 1 3
Detail:
#filter by condition
print (df[df['flag'] == 1])
ID flag
2 a 1
3 a 1
6 b 1
7 b 1
8 b 1
#count per group
print (df[df['flag'] == 1].groupby('ID')['flag'].cumcount())
2 0
3 1
6 0
7 1
8 2
dtype: int64
#add 1 to count from 1
print (df[df['flag'] == 1].groupby('ID')['flag'].cumcount().add(1))
2 1
3 2
6 1
7 2
8 3
dtype: int64
If you need to count the 0s instead, with -1 filled in for rows where flag is not 0:
df['new'] = (df[df['flag'] == 0].groupby('ID')['flag']
.cumcount()
.add(1)
.reindex(df.index, fill_value=-1))
print (df)
ID flag new
0 a 0 1
1 a 0 2
2 a 1 -1
3 a 1 -1
4 b 0 1
5 b 0 2
6 b 1 -1
7 b 1 -1
8 b 1 -1
Another two-step solution:
df['new'] = df[df['flag'] == 1].groupby('ID')['flag'].cumcount().add(1)
df['new'] = df['new'].fillna(1).astype(int)
print (df)
ID flag new
0 a 0 1
1 a 0 1
2 a 1 1
3 a 1 2
4 b 0 1
5 b 0 1
6 b 1 1
7 b 1 2
8 b 1 3
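If the question is read literally (start at 1 for each ID and step up by one on every row where flag is 1, holding that value until the next 1), a per-group cumulative sum may be all that is needed. A sketch under that reading, reusing the example data; note it differs from the outputs above, where the first flagged row still gets 1.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
    'flag': [0, 0, 1, 1, 0, 0, 1, 1, 1]
})

# Running total of flags within each ID: every flag == 1 bumps the counter by one,
# and the + 1 makes each ID start counting at 1
df['new'] = df.groupby('ID')['flag'].cumsum() + 1
print(df)
```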
