Replacing less frequent values with "Others".
In one column of a data frame df, named "Name", I have the data below:
Sample Input:
Name
A
A
A
B
B
C
D
df['Name'].value_counts()
A 3
B 2
C 1
D 1
I need the output in the format below.
Expected output:
A 3
B 2
Others 2
Any code in python3 is appreciated. Thanks in advance.
You need:
import numpy as np
x = list(df['Name'].value_counts()[:2].index)  # fetch the top N (here 2) most frequent values
df['Name'] = np.where(df['Name'].isin(x), df['Name'], 'Others')  # everything else becomes 'Others'
print(df['Name'].value_counts())
Output:
A 3
B 2
Others 2
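For anyone who wants to run the answer end to end, here is a minimal sketch that rebuilds the sample data from the question. The variable names `top_n` and `top_values` are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Name": ["A", "A", "A", "B", "B", "C", "D"]})

top_n = 2  # illustrative parameter: how many of the most frequent values to keep
top_values = df["Name"].value_counts().index[:top_n]

# Keep names that are in the top N; replace everything else with "Others".
df["Name"] = np.where(df["Name"].isin(top_values), df["Name"], "Others")

counts = df["Name"].value_counts()
print(counts)
```

Note that when several values tie at the cut-off, which of them lands in the top N depends on the order `value_counts()` returns them in, so the choice of `top_n` may need care on real data.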
Related
I have a dataframe and want to add a new column holding the sum of count over all rows whose foo equals that of the current row.
It would be possible to create a new dataframe, group and sum there, and merge the result back, but I guess there is a much simpler solution.
Any hints are highly appreciated.
Input:
foo count
a 3
a 7
b 1
b 2
Output:
foo count sum_of_count
a 3 10
a 7 10
b 1 3
b 2 3
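No answer to this question is quoted here. One possible approach, offered only as a sketch, is `groupby(...).transform("sum")`, which broadcasts each group's sum back onto the original rows without a separate merge. The data below is the sample input from the question.

```python
import pandas as pd

# Sample input from the question.
df = pd.DataFrame({"foo": ["a", "a", "b", "b"], "count": [3, 7, 1, 2]})

# transform("sum") returns a result aligned with the original index,
# so every row receives the total of its own "foo" group.
df["sum_of_count"] = df.groupby("foo")["count"].transform("sum")

print(df)
```

The column is selected with `df["count"]` rather than `df.count`, because `count` is also the name of a DataFrame method.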
Key
----------
0 a
1 a
2 b
3 b
4 a
5 c
So far I tried this:
df.groupby(["key1"],).count()
However, it also shows the counts of b and c; I want the count only for a.
Create a boolean mask and count the matches with sum:
df["Key"].eq('a').sum()
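As a runnable sketch of the mask-and-sum idea, using the Key column shown in the question (the names `mask` and `n_a` are illustrative only):

```python
import pandas as pd

# The Key column from the question.
df = pd.DataFrame({"Key": ["a", "a", "b", "b", "a", "c"]})

# eq("a") yields a boolean Series; True counts as 1 when summed.
mask = df["Key"].eq("a")
n_a = int(mask.sum())

print(n_a)
```

The same mask can be reused to filter rows, for example `df[mask]`, if the matching rows themselves are needed as well as their count.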
Example: Let's say I have a df
Id
A
B
C
A
A
B
It should look like:
Id count
A 1
B 1
C 1
A 2
A 3
B 2
Note: I've tried a for loop and a while loop. Both work for small datasets but take a lot of time on large ones.
for i in df:
    for j in df:
        if i==j:
            count+=1
You can groupby with cumcount, like this:
df['counts'] = df.groupby('Id', sort=False).cumcount() + 1
df.head(6)
Id counts
0 A 1
1 B 1
2 C 1
3 A 2
4 A 3
5 B 2
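A self-contained version of the cumcount answer, assuming the six-row Id column from the question:

```python
import pandas as pd

# The Id column from the question.
df = pd.DataFrame({"Id": ["A", "B", "C", "A", "A", "B"]})

# cumcount() numbers the rows within each group starting at 0,
# so adding 1 gives a running occurrence count per Id.
df["counts"] = df.groupby("Id", sort=False).cumcount() + 1

print(df)
```

This avoids the nested Python loops, since the grouping and counting are done by pandas in a single pass over the column.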
dups_values = df.pivot_table(index=['values'], aggfunc='size')
print(dups_values)
This is the original series. I'm trying to replace every value outside the top 2 with 'Other'.
Original Series(ser3):
b 8
c 6
a 5
h 4
g 2
d 2
f 2
e 1
This is my extracted top 2.
Top 2:
t2 = ((ser3.value_counts().head(2)))
b 8
c 6
Expected Output:
b 8
c 6
a Other
h Other
g Other
d Other
f Other
e Other
How can I do that? I do not want to convert to a dictionary and replace the values by indexing; I would prefer to stay with a Series. I tried using .isin, but my code gives me an error.
a[a[~a.isin(t2)].index]='Other'
The above gives me an error.
You are close; you need to select t2.index and remove the outer ser3[]:
ser3[~ser3.isin(t2.index)]='Other'
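A runnable sketch of this fix follows. It assumes `ser3` holds the raw letters and that the counts shown in the question are its `value_counts()`. The `freq` dictionary and the copy named `out` are illustrative additions so that the original series stays untouched.

```python
import pandas as pd

# Rebuild a series of letters whose value counts match the question
# (assumption: ser3 holds the letters themselves, not the counts).
freq = {"b": 8, "c": 6, "a": 5, "h": 4, "g": 2, "d": 2, "f": 2, "e": 1}
ser3 = pd.Series([letter for letter, n in freq.items() for _ in range(n)])

# The two most frequent letters; their labels live in t2.index.
t2 = ser3.value_counts().head(2)

# Work on a copy, then overwrite every value that is not in the top 2.
out = ser3.copy()
out[~out.isin(t2.index)] = "Other"

print(out.value_counts())
```

The comparison has to be against `t2.index`, because `t2` itself holds the counts (8 and 6), not the letters.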
I was wondering if there was a way to count the number of values by category. Example:
A 3
A 3
A 3
B 4
B 4
B 4
B 4
C 5
C 5
C 5
C 5
C 5
D 2
D 2
There are 4 categories (A, B, C, D), each occurring a different number of times, so the values are duplicated. I would like to create a new column that shows how many times each category occurs, as in the example above. Please no VBA, as I don't know it.
Try this...
=IF(A2<>A1,COUNTIF(A:A,A2),"")