pd.Series(pred).value_counts(): how to get the first column into a dataframe? - python-3.x

I apply pd.Series(pred).value_counts() and get this output:
0 2084
-1 15
1 13
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2
dtype: int64
When I convert it to a list, I get only the second column (the counts):
c_list = list(pd.Series(pred).value_counts())
Out:
[2084, 15, 13, 10, 7, 4, 3, 3, 3, 2, 2, 2, 2]
How do I ultimately get a dataframe that looks like this, including a new column giving each size as a % of the total size?
df=
[class , size ,relative_size]
0 2084 , x%
-1 15 , y%
1 13 , etc.
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2

You are very nearly there. Typing this blind, as you didn't provide a sample input:
df = pd.Series(pred).value_counts().to_frame().reset_index()
df.columns = ['class', 'size']
df['relative_size'] = df['size'] / df['size'].sum()
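To make this concrete, here is a self-contained sketch with a made-up `pred` (the question provides none); multiply by 100, as below, if you want literal percentages rather than fractions:

```python
import pandas as pd

# Hypothetical stand-in for `pred`; any iterable of class labels works
pred = [0, 0, 0, 0, -1, -1, 1]

df = pd.Series(pred).value_counts().to_frame().reset_index()
df.columns = ['class', 'size']
df['relative_size'] = df['size'] / df['size'].sum() * 100  # size as % of total
print(df)
```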


New DataFrame column that contains IDs where value is outside bounds?

I have the following DataFrame:
data: dict[str, list[int]] = {
    "x1": [5, 6, 7, 8, 9],
    "min1": [3, 3, 3, 3, 3],
    "max1": [8, 8, 8, 8, 8],
    "x2": [0, 1, 2, 3, 4],
    "min2": [2, 2, 2, 2, 2],
    "max2": [7, 7, 7, 7, 7],
    "x3": [7, 6, 7, 6, 7],
    "min3": [1, 1, 1, 1, 1],
    "max3": [6, 6, 6, 6, 6],
}
n: int = 3  # number of xi
df: pd.DataFrame = pd.DataFrame(data=data)
print(df)
Output
x1 min1 max1 x2 min2 max2 x3 min3 max3
0 5 3 8 0 2 7 7 1 6
1 6 3 8 1 2 7 6 1 6
2 7 3 8 2 2 7 7 1 6
3 8 3 8 3 2 7 6 1 6
4 9 3 8 4 2 7 7 1 6
I would like to add a new column alert to df that contains the IDs i where xi < mini or xi > maxi.
Expected result
x1 min1 max1 x2 min2 max2 x3 min3 max3 alert
0 5 3 8 0 2 7 7 1 6 "2,3"
1 6 3 8 1 2 7 6 1 6 "2"
2 7 3 8 2 2 7 7 1 6 "3"
3 8 3 8 3 2 7 6 1 6 ""
4 9 3 8 4 2 7 7 1 6 "1,3"
I looked at this answer but could not understand how to apply it to my problem.
Below is my working implementation that I wish to improve.
def f(row: pd.Series) -> str:
    alert: str = ""
    for k in range(1, n + 1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert += f"{k}"
    return ",".join(list(alert))
df["alert"] = df.apply(f, axis=1)
Actually, given that your output is made of strings, your approach isn't too bad. I would just suggest making alert a list, not a string (this also keeps working if i ever has more than one digit):
def f(row: pd.Series) -> str:
    alert: list = []
    for k in range(1, n + 1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert.append(f"{k}")
    return ",".join(alert)
In a slightly fancier way, you can do:
xs = df.filter(regex='^x')
mins = df.filter(like='min').to_numpy()
maxes = df.filter(like='max').to_numpy()
mask = (xs < mins) | (xs > maxes)
df['alert'] = (mask @ xs.columns.str.replace('x', ',')).str.replace('^,', '', regex=True)
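The trick above relies on the fact that matrix-multiplying a boolean mask by string labels concatenates the labels of the True cells. A self-contained sketch on the question's data, using the equivalent `.dot` form with trailing commas (the variable names here are my own):

```python
import pandas as pd

df = pd.DataFrame({
    "x1": [5, 6, 7, 8, 9], "min1": [3] * 5, "max1": [8] * 5,
    "x2": [0, 1, 2, 3, 4], "min2": [2] * 5, "max2": [7] * 5,
    "x3": [7, 6, 7, 6, 7], "min3": [1] * 5, "max3": [6] * 5,
})

xs = df.filter(regex='^x')
mins = df.filter(like='min').to_numpy()
maxes = df.filter(like='max').to_numpy()
mask = (xs < mins) | (xs > maxes)  # boolean frame, one column per xi

# True cells contribute their '<i>,' label, False cells contribute '';
# DataFrame.dot concatenates them, and the trailing comma is stripped
labels = pd.Series([c.lstrip('x') + ',' for c in xs.columns], index=xs.columns)
df['alert'] = mask.dot(labels).str.rstrip(',')
print(df['alert'].tolist())
```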
We can also group the dataframe along columns, keyed by the integer suffix each column name contains:
df['alert'] = (df.groupby(df.columns.str.extract(r'(\d+)$')[0].tolist(), axis=1)
               .apply(lambda g: g[f'x{g.name}'].lt(g[f'min{g.name}']) | g[f'x{g.name}'].gt(g[f'max{g.name}']))
               .apply(lambda row: ','.join(row.index[row]), axis=1))
print(df)
x1 min1 max1 x2 min2 max2 x3 min3 max3 alert
0 5 3 8 0 2 7 7 1 6 2,3
1 6 3 8 1 2 7 6 1 6 2
2 7 3 8 2 2 7 7 1 6 3
3 8 3 8 3 2 7 6 1 6
4 9 3 8 4 2 7 7 1 6 1,3
Intermediate result
(df.groupby(df.columns.str.extract(r'(\d+)$')[0].tolist(), axis=1)
 .apply(lambda g: g[f'x{g.name}'].lt(g[f'min{g.name}']) | g[f'x{g.name}'].gt(g[f'max{g.name}'])))
1 2 3
0 False True True
1 False True False
2 False False True
3 False False False
4 True False True
Using pandas:
a = (pd.wide_to_long(df.reset_index(), ['x', 'min', 'max'], 'index', 'alert')
     .loc[lambda x: x['x'].lt(x['min']) | x['x'].gt(x['max'])]
     .reset_index()
     .groupby('index')['alert'].agg(lambda x: ','.join(x.astype(str))))
df.join(a)
x1 min1 max1 x2 min2 max2 x3 min3 max3 alert
0 5 3 8 0 2 7 7 1 6 2,3
1 6 3 8 1 2 7 6 1 6 2
2 7 3 8 2 2 7 7 1 6 3
3 8 3 8 3 2 7 6 1 6 NaN
4 9 3 8 4 2 7 7 1 6 1,3

How to change rows to columns in python

I want to convert my dataframe's rows to columns, taking only the last value of the last column.
Here is my dataframe:
df=pd.DataFrame({'flag_1':[1,2,3,1,2,500],'dd':[1,1,1,7,7,8],'x':[1,1,1,7,7,8]})
print(df)
flag_1 dd x
0 1 1 1
1 2 1 1
2 3 1 1
3 1 7 7
4 2 7 7
5 500 8 8
df_out:
1 2 3 1 2 500 1 1 1 7 7 8 8
Assuming you want a list as output, you can mask the initial values of the last column and stack:
import numpy as np
out = (df
       .assign(**{df.columns[-1]: np.r_[[pd.NA] * (len(df) - 1), [df.iloc[-1, -1]]]})
       .T.stack().to_list()
       )
Output:
[1, 2, 3, 1, 2, 500, 1, 1, 1, 7, 7, 8, 8]
For a wide dataframe with a single row, use .to_frame().T in place of to_list() (here with a MultiIndex):
flag_1 dd x
0 1 2 3 4 5 0 1 2 3 4 5 5
0 1 2 3 1 2 500 1 1 1 7 7 8 8
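Since the target is simply every column except the last in full, followed by the final value of the last column, a plain comprehension gives the same list (a sketch on the question's df):

```python
import pandas as pd

df = pd.DataFrame({'flag_1': [1, 2, 3, 1, 2, 500],
                   'dd': [1, 1, 1, 7, 7, 8],
                   'x': [1, 1, 1, 7, 7, 8]})

# All values of every column but the last, then the last column's final value
out = [v for col in df.columns[:-1] for v in df[col]] + [df.iloc[-1, -1]]
```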

Pandas grouping with loops

Is there a way to group a dataframe (a csv file) in these ways?
For example, I want to select blocks of ten rows of the first column to average, and then do the same for the second column, but grouping every 10th row instead of consecutive blocks.
For ex. I want the average of:
1 1 3rd 4th
1 2 .. ..
1 3 .. ..
..
1 9 .. ..
1 10 .. ..
2 1 .. ..
2 2 .. ..
2 3 .. ..
So selecting the first chunk of the 1st column to calculate an average, and then every x rows for the second column.
For example, from a df like this one...
241888 1 1
241888 2 1
241888 3 2
241888 4 2
241888 5 3
241888 6 3
241888 7 4
241888 8 4
241888 9 5
241888 10 5
665309 1 3
665309 2 3
665309 3 4
665309 4 4
665309 5 5
665309 6 5
665309 7 6
665309 8 6
665309 9 7
665309 10 7
and then
df.groupby('241888').mean()[3]
df.groupby('665309').mean()[3]
df.groupby('1' of the 2nd column).mean()[3]
df.groupby('10' of the 2nd column).mean()[3]
giving 3, 5, 2 and 6.
Sorry if I did not understand you properly. Do you want this?
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [11, 12, 13, 14, 15, 16, 17],
                   'c': [21, 22, 23, 24, 25, 26, 27]})
print(df)
print("sum of 1st column over every 2nd row (i.e. 2, 4, 6):")
print(df.iloc[[x for x in df.index if (x + 1) % 2 == 0], 0].sum())
print("sum of 3rd column over every 3rd row (i.e. 23, 26):")
print(df.iloc[[x for x in df.index if (x + 1) % 3 == 0], 2].sum())
output:
a b c
0 1 11 21
1 2 12 22
2 3 13 23
3 4 14 24
4 5 15 25
5 6 16 26
6 7 17 27
sum of 1st column over every 2nd row (i.e. 2, 4, 6):
12
sum of 3rd column over every 3rd row (i.e. 23, 26):
49
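For the block and stride averaging described in the question itself, grouping on a computed key may be closer to the goal. A hedged sketch (the toy columns below stand in for the real CSV, whose layout isn't shown):

```python
import pandas as pd

df = pd.DataFrame({'a': range(20), 'b': range(20)})

# Average blocks of 10 consecutive rows of column 'a'
block_means = df['a'].groupby(df.index // 10).mean()

# Average every 10th row of column 'b': rows 0,10 together, rows 1,11 together, ...
stride_means = df['b'].groupby(df.index % 10).mean()
```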

Need to create incremental series number using python

I need to create an incremental series for a given column of a dataframe in Python. Any help is much appreciated.
Suppose I have this dataframe column:
df['quadrant']
Out[6]:
0 4
1 4
2 4
3 3
4 3
5 3
6 2
7 2
8 2
9 1
10 1
11 1
I want to create a new column such that
index quadrant new value
0 4 1
1 4 5
2 4 9
3 3 2
4 3 6
5 3 10
6 2 3
7 2 7
8 2 11
9 1 4
10 1 8
11 1 12
Using NumPy, you can create the array as:
import numpy as np

def value(q, k=1):
    diff_quadrant = np.diff(q)
    j = 0
    ramp = []
    for i in np.where(diff_quadrant != 0)[0]:
        ramp.extend(range(i - j + 1))
        j = i + 1
    ramp.extend(range(len(q) - j))
    ramp = np.array(ramp) * k  # sawtooth-shaped array
    a = np.ones(len(q), dtype=int) * 5
    return a - q + ramp

quadrant = np.array([3, 3, 3, 3, 4, 4, 4, 2, 2, 1, 1, 1])
b = value(quadrant, 4)
# [ 2  6 10 14  1  5  9  3  7  4  8 12]
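If the pattern is exactly the one in the question (quadrants 4 down to 1, each repeat within a quadrant adding 4), a shorter pandas sketch is possible; the `5 - quadrant` base value is an assumption read off the expected output:

```python
import pandas as pd

df = pd.DataFrame({'quadrant': [4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1]})

# Base value 5 - quadrant maps 4->1, 3->2, 2->3, 1->4;
# each further occurrence within the same quadrant adds 4
df['new_value'] = (5 - df['quadrant']) + df.groupby('quadrant').cumcount() * 4
```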

Expanding/Duplicating dataframe rows based on condition

I am an R user who has recently started using Python 3 for data management. I am struggling with a way to expand/duplicate data frame rows based on a condition. I also need to be able to expand rows in a variable way. I'll illustrate with this example.
I have this data:
df = pd.DataFrame([[1, 10], [1,15], [2,10], [2, 15], [2, 20], [3, 10], [3, 15]], columns = ['id', 'var'])
df
Out[6]:
id var
0 1 10
1 1 15
2 2 10
3 2 15
4 2 20
5 3 10
6 3 15
I would like to expand rows for both ID == 1 and ID == 3. I would also like to expand each ID == 1 row by 1 duplicate each, and I would like to expand each ID == 3 row by 2 duplicates each. The result would look like this:
df2
Out[8]:
id var
0 1 10
1 1 10
2 1 15
3 1 15
4 2 10
5 2 15
6 2 20
7 3 10
8 3 10
9 3 10
10 3 15
11 3 15
12 3 15
13 3 15
I've been trying to use np.repeat, but I am failing to think of a way that I can use both ID condition and variable duplication numbers at the same time. Index ordering does not matter here, only that the rows are duplicated appropriately. I apologize in advance if this is an easy question. Thanks in advance for any help and feel free to ask clarifying questions.
This should do it (pd.concat is used here because DataFrame.append was removed in pandas 2.0):
dup = {1: 1, 3: 2}  # which id, and how many extra copies to add
res = df.copy()
for k, v in dup.items():
    for i in range(v):
        res = pd.concat([res, df.loc[df['id'] == k]], ignore_index=True)
res.sort_values(['id', 'var'], inplace=True)
res.reset_index(inplace=True, drop=True)
res
# id var
#0 1 10
#1 1 10
#2 1 15
#3 1 15
#4 2 10
#5 2 15
#6 2 20
#7 3 10
#8 3 10
#9 3 10
#10 3 15
#11 3 15
#12 3 15
P.S. Your desired output has 7 values for id 3, while your description implies 6 values.
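A loop-free alternative is `Index.repeat` with a per-row repeat count (1 plus the number of extra duplicates); this follows the written description, i.e. six rows for id 3:

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [1, 15], [2, 10], [2, 15], [2, 20], [3, 10], [3, 15]],
                  columns=['id', 'var'])

extra = {1: 1, 3: 2}  # id -> number of extra copies per row
counts = df['id'].map(extra).fillna(0).astype(int) + 1
res = df.loc[df.index.repeat(counts)].reset_index(drop=True)
```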
I think the code below gets your job done (pd.concat replaces the removed DataFrame.append):
df_1 = df.loc[df.id == 1]
df_3 = df.loc[df.id == 3]
df1 = pd.concat([df] + [df_1] * 1, ignore_index=True)
df1 = pd.concat([df1] + [df_3] * 2, ignore_index=True).sort_values(by='id')
id var
0 1 10
1 1 15
7 1 10
8 1 15
2 2 10
3 2 15
4 2 20
5 3 10
6 3 15
9 3 10
10 3 15
11 3 10
12 3 15
