How to change rows to columns in Python - python-3.x

I want to flatten the columns of my dataframe into a single row, keeping only the last value of the last column.
Here is my dataframe:
df=pd.DataFrame({'flag_1':[1,2,3,1,2,500],'dd':[1,1,1,7,7,8],'x':[1,1,1,7,7,8]})
print(df)
flag_1 dd x
0 1 1 1
1 2 1 1
2 3 1 1
3 1 7 7
4 2 7 7
5 500 8 8
df_out:
1 2 3 1 2 500 1 1 1 7 7 8 8

Assuming you want a list as output, you can mask all but the final value of the last column and stack:
import numpy as np
out = (df
.assign(**{df.columns[-1]: np.r_[[pd.NA]*(len(df)-1),[df.iloc[-1,-1]]]})
.T.stack().to_list()
)
Output:
[1, 2, 3, 1, 2, 500, 1, 1, 1, 7, 7, 8, 8]
For a wide dataframe with a single row, use .to_frame().T in place of to_list() (here with a MultiIndex):
flag_1 dd x
0 1 2 3 4 5 0 1 2 3 4 5 5
0 1 2 3 1 2 500 1 1 1 7 7 8 8
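The same flat list can also be built without masking. The following is only a minimal sketch, assuming (as in the example) that every column except the last is kept in full and that the last column contributes just its final cell:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'flag_1': [1, 2, 3, 1, 2, 500],
                   'dd': [1, 1, 1, 7, 7, 8],
                   'x': [1, 1, 1, 7, 7, 8]})

# all columns but the last, flattened column by column (Fortran order)
head = df.iloc[:, :-1].to_numpy().ravel(order='F')
# only the bottom-right cell of the frame
tail = [df.iloc[-1, -1]]

out = np.concatenate([head, tail]).tolist()
print(out)
```

Wrapping the result as pd.DataFrame([out]) gives a single-row frame, although without the column labels that the stack-based version keeps.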

Related

New DataFrame column that contains IDs where value is outside bounds?

I have the following DataFrame :
data: Dict[str, list[int]] = {
"x1": [5 , 6, 7, 8, 9],
"min1": [3 , 3, 3, 3, 3],
"max1": [8, 8, 8, 8, 8],
"x2": [0 , 1, 2, 3, 4],
"min2": [2 , 2, 2, 2, 2],
"max2": [7, 7, 7, 7, 7],
"x3": [7 , 6, 7, 6, 7],
"min3": [1 , 1, 1, 1, 1],
"max3": [6, 6, 6, 6, 6],
}
n: int = 3 # number of xi
df: pd.DataFrame = pd.DataFrame(data=data)
print(df)
Output
x1 min1 max1 x2 min2 max2 x3 min3 max3
0 5 3 8 0 2 7 7 1 6
1 6 3 8 1 2 7 6 1 6
2 7 3 8 2 2 7 7 1 6
3 8 3 8 3 2 7 6 1 6
4 9 3 8 4 2 7 7 1 6
I would like to add a new column alert to df that contains the IDs i where xi < mini or xi > maxi.
Expected result
x1 min1 max1 x2 min2 max2 x3 min3 max3 alert
0 5 3 8 0 2 7 7 1 6 "2,3"
1 6 3 8 1 2 7 6 1 6 "2"
2 7 3 8 2 2 7 7 1 6 "3"
3 8 3 8 3 2 7 6 1 6 ""
4 9 3 8 4 2 7 7 1 6 "1,3"
I looked at this answer but could not understand how to apply it to my problem.
Below is my working implementation that I wish to improve.
def f(row: pd.Series) -> str:
    alert: str = ""
    for k in range(1, n+1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert += f"{k}"
    return ",".join(list(alert))

df["alert"] = df.apply(f, axis=1)
Actually given your output as strings, your approach isn't too bad. I would just suggest making alert a list, not a string:
def f(row: pd.Series) -> str:
    alert: list = []
    for k in range(1, n+1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert.append(f"{k}")
    return ",".join(alert)
A slightly fancier, vectorized way:
xs = df.filter(regex='^x')
mins = df.filter(like='min').to_numpy()
maxes = df.filter(like='max').to_numpy()
mask = (xs < mins) | (xs > maxes)
df['alert'] = (mask @ xs.columns.str.replace('x', ',')).str.replace('^,', '', regex=True)
We can group the dataframe along its columns according to the integer each column name ends with. Note the strict comparison lt for the lower bound, so that a value equal to its minimum does not raise an alert:
df['alert'] = (df.groupby(df.columns.str.extract(r'(\d+)$')[0].tolist(), axis=1)
.apply(lambda g: g[f'x{g.name}'].lt(g[f'min{g.name}']) | g[f'x{g.name}'].gt(g[f'max{g.name}']))
.apply(lambda row: ','.join(row.index[row]), axis=1))
print(df)
x1 min1 max1 x2 min2 max2 x3 min3 max3 alert
0 5 3 8 0 2 7 7 1 6 2,3
1 6 3 8 1 2 7 6 1 6 2
2 7 3 8 2 2 7 7 1 6 3
3 8 3 8 3 2 7 6 1 6
4 9 3 8 4 2 7 7 1 6 1,3
Intermediate result
(df.groupby(df.columns.str.extract(r'(\d+)$')[0].tolist(), axis=1)
.apply(lambda g: g[f'x{g.name}'].lt(g[f'min{g.name}']) | g[f'x{g.name}'].gt(g[f'max{g.name}'])))
1 2 3
0 False True True
1 False True False
2 False False True
3 False False False
4 True False True
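Using pd.wide_to_long to reshape, then filtering and joining the IDs per row (rows without any alert come out as NaN):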
a = (pd.wide_to_long(df.reset_index(), ['x', 'min', 'max'],'index', 'alert')
.loc[lambda x: x['x'].lt(x['min']) | x['x'].gt(x['max'])]
.reset_index()
.groupby('index')['alert'].agg(lambda x: ','.join(x.astype(str))))
df.join(a)
x1 min1 max1 x2 min2 max2 x3 min3 max3 alert
0 5 3 8 0 2 7 7 1 6 2,3
1 6 3 8 1 2 7 6 1 6 2
2 7 3 8 2 2 7 7 1 6 3
3 8 3 8 3 2 7 6 1 6 NaN
4 9 3 8 4 2 7 7 1 6 1,3
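As a middle ground between the row-wise function and the reshaping approaches above, the out-of-bounds test can be vectorized per ID and only the final string join done row by row. This is a minimal sketch, not a benchmarked recommendation; the intermediate name out_of_bounds is introduced here purely for illustration:

```python
import pandas as pd

data = {
    "x1": [5, 6, 7, 8, 9], "min1": [3, 3, 3, 3, 3], "max1": [8, 8, 8, 8, 8],
    "x2": [0, 1, 2, 3, 4], "min2": [2, 2, 2, 2, 2], "max2": [7, 7, 7, 7, 7],
    "x3": [7, 6, 7, 6, 7], "min3": [1, 1, 1, 1, 1], "max3": [6, 6, 6, 6, 6],
}
n = 3  # number of xi
df = pd.DataFrame(data)

# one boolean column per ID k: True where xk lies outside [mink, maxk]
out_of_bounds = pd.DataFrame({
    str(k): (df[f"x{k}"] < df[f"min{k}"]) | (df[f"x{k}"] > df[f"max{k}"])
    for k in range(1, n + 1)
})

# join the labels of the True columns in each row
df["alert"] = out_of_bounds.apply(
    lambda row: ",".join(row.index[row.to_numpy()]), axis=1
)
print(df["alert"].tolist())
```

Rows with no violation get an empty string, which matches the expected result in the question.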

count Total rows of an Id from another column

I have a dataframe
# Initialise data of lists.
data = {'Id':['1', '2', '3', '4','5','6','7','8','9','10'], 'reply_id':[2, 2,2, 5,5,6,8,8,1,1]}
# Create DataFrame
df = pd.DataFrame(data)
Id reply_id
0 1 2
1 2 2
2 3 2
3 4 5
4 5 5
5 6 6
6 7 8
7 8 8
8 9 1
9 10 1
For every Id, I want the number of times it occurs in reply_id, stored in a new column named new.
For example, Id=1 occurs 2 times in reply_id, so its value in new should be 2.
Desired output
Id reply_id new
0 1 2 2
1 2 2 3
2 3 2 0
3 4 5 0
4 5 5 2
5 6 6 1
6 7 8 0
7 8 8 2
8 9 1 0
9 10 1 0
I have tried this line of code:
df['new'] = df.reply_id.eq(df.Id).astype(int).groupby(df.Id).transform('sum')
In this answer, I used Series.value_counts to count values in reply_id, and converted the result to a dict. Then, I used Series.map on the Id column to associate counts to Id. fillna(0) is used to fill values not present in reply_id
df['new'] = (df['Id']
.astype(int)
.map(df['reply_id'].value_counts().to_dict())
.fillna(0)
.astype(int))
Use Series.groupby on the column reply_id, then the aggregation function GroupBy.count to create a mapping series counts, and finally Series.map to map the values in the Id column to their respective counts. Because Id holds strings in the sample data while reply_id holds integers, Id is cast to int before mapping:
counts = df['reply_id'].groupby(df['reply_id']).count()
df['new'] = df['Id'].astype(int).map(counts).fillna(0).astype(int)
Result:
# print(df)
Id reply_id new
0 1 2 2
1 2 2 3
2 3 2 0
3 4 5 0
4 5 5 2
5 6 6 1
6 7 8 0
7 8 8 2
8 9 1 0
9 10 1 0
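One detail of the sample data is easy to miss: Id is a column of strings while reply_id is a column of integers. The sketch below, with the illustrative names no_cast and with_cast, shows why the lookup needs a cast on one side before Series.map can match anything:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'reply_id': [2, 2, 2, 5, 5, 6, 8, 8, 1, 1],
})

counts = df['reply_id'].value_counts()  # indexed by integer reply_id

# string keys never match the integer index of counts, so every lookup misses
no_cast = df['Id'].map(counts).fillna(0).astype(int)

# casting Id to int first makes the keys comparable
with_cast = df['Id'].astype(int).map(counts).fillna(0).astype(int)

print(no_cast.tolist())
print(with_cast.tolist())
```

If Id must stay a string column, the cast could instead be applied to the index of counts; that choice depends on how the rest of the code uses Id.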

pd.Series(pred).value_counts() how to get the first column in dataframe?

I apply pd.Series(pred).value_counts() and get this output:
0 2084
-1 15
1 13
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2
dtype: int64
When I create a list I get only the second column:
c_list=list(pd.Series(pred).value_counts()), Out:
[2084, 15, 13, 10, 7, 4, 3, 3, 3, 2, 2, 2, 2]
How do I get ultimately a dataframe that looks like this including a new column for size% of total size?
df=
[class , size ,relative_size]
0 2084 , x%
-1 15 , y%
1 13 , etc.
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2
You are very nearly there. Typing this in the blind as you didn't provide a sample input:
df = pd.Series(pred).value_counts().to_frame().reset_index()
df.columns = ['class', 'size']
df['relative_size'] = df['size'] / df['size'].sum()
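Since the question asks for a percentage, here is a minimal end-to-end sketch. The pred list is made-up sample data (the original input was not given), and rename_axis plus reset_index(name=...) is used to name the columns instead of assigning df.columns:

```python
import pandas as pd

# hypothetical predictions, only for illustration
pred = [0, 0, 0, 0, -1, 1, 1, 0]

df = (pd.Series(pred)
      .value_counts()            # counts per class, largest first
      .rename_axis('class')      # name the index so it becomes the 'class' column
      .reset_index(name='size'))

# share of each class in percent of the total size
df['relative_size'] = df['size'] / df['size'].sum() * 100
print(df)
```

The classes that appeared as the "first column" in the printed output are the index of the value_counts result, which is why list() returned only the counts.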

How can I calculate population in pandas?

I have a data set like this:
S.No.,Year of birth,year of death
1, 1, 5
2, 3, 6
3, 2, -
4, 5, 7
I need to calculate the population alive in each year, for example:
year,population
1 1
2 2
3 3
4 3
5 4
6 3
7 2
8 1
How can I solve this in pandas?
I am not good at pandas, so any help would be appreciated.
First, choose the year to use where year of death is missing; this solution uses 8.
Then convert the values of year of death to numeric and replace the missing values with that year. The first solution repeats each row once per year lived (the difference between the death and birth columns, plus one) with Index.repeat, adds GroupBy.cumcount to turn the repeats into consecutive years, and counts the years with Series.value_counts:
#if need working with years
#today_year = pd.to_datetime('now').year
today_year = 8
df['year of death'] = pd.to_numeric(df['year of death'], errors='coerce').fillna(today_year)
df = df.loc[df.index.repeat(df['year of death'].add(1).sub(df['Year of birth']).astype(int))]
df['Year of birth'] += df.groupby(level=0).cumcount()
df1 = (df['Year of birth'].value_counts()
.sort_index()
.rename_axis('year')
.reset_index(name='population'))
print (df1)
year population
0 1 1
1 2 2
2 3 3
3 4 3
4 5 4
5 6 3
6 7 2
7 8 1
Another solution uses a list comprehension with range to repeat the years (starting again from the original df):
# if you need to work with real calendar years
#today_year = pd.to_datetime('now').year
today_year = 8
# cast to int because range() does not accept floats
death = pd.to_numeric(df['year of death'], errors='coerce').fillna(today_year).astype(int)
L = [x for s, e in zip(df['Year of birth'], death) for x in range(s, e + 1)]
df1 = (pd.Series(L).value_counts()
.sort_index()
.rename_axis('year')
.reset_index(name='population'))
print (df1)
year population
0 1 1
1 2 2
2 3 3
3 4 3
4 5 4
5 6 3
6 7 2
7 8 1
Similar to the previous solution, but a Counter builds the dictionary for the final DataFrame:
from collections import Counter
# if you need to work with real calendar years
#today_year = pd.to_datetime('now').year
today_year = 8
# cast to int because range() does not accept floats
death = pd.to_numeric(df['year of death'], errors='coerce').fillna(today_year).astype(int)
d = Counter([x for s, e in zip(df['Year of birth'], death) for x in range(s, e + 1)])
print (d)
Counter({5: 4, 3: 3, 4: 3, 6: 3, 2: 2, 7: 2, 1: 1, 8: 1})
df1 = pd.DataFrame({'year':list(d.keys()),
'population':list(d.values())})
print (df1)
year population
0 1 1
1 2 2
2 3 3
3 4 3
4 5 4
5 6 3
6 7 2
7 8 1
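The solutions above all expand each person into one entry per year lived. An alternative that avoids the expansion is a running total: add one in the birth year, subtract one in the year after death, and take the cumulative sum. This is only a sketch of that different technique, under the same assumption as above that missing deaths are filled with year 8; the names last_year, births and exits are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Year of birth': [1, 3, 2, 5],
                   'year of death': ['5', '6', '-', '7']})

last_year = 8  # assumption: same fill value as in the answer above
death = (pd.to_numeric(df['year of death'], errors='coerce')
         .fillna(last_year)
         .astype(int))

years = range(df['Year of birth'].min(), last_year + 1)

# people entering the population in each year
births = df['Year of birth'].value_counts().reindex(years, fill_value=0)
# people leaving: they are still counted in their year of death,
# so they drop out the year after
exits = (death + 1).value_counts().reindex(years, fill_value=0)

population = (births - exits).cumsum()
df1 = population.rename_axis('year').reset_index(name='population')
print(df1)
```

For a handful of rows the difference is irrelevant, but with long lifespans this keeps the intermediate data proportional to the number of years rather than the number of person-years.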

Need to create incremental series number using python

I need to create an incremental series from the values of a dataframe column in Python.
Any help is much appreciated.
Suppose I have dataframe column
df['quadrant']
Out[6]:
0 4
1 4
2 4
3 3
4 3
5 3
6 2
7 2
8 2
9 1
10 1
11 1
I want to create a new column such that
index quadrant new value
0 4 1
1 4 5
2 4 9
3 3 2
4 3 6
5 3 10
6 2 3
7 2 7
8 2 11
9 1 4
10 1 8
11 1 12
Using NumPy, you can create the array as:
import numpy as np
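def value(q, k=1):
    # non-zero entries mark the positions where the quadrant value changes
    diff_quadrant = np.diff(q)
    j = 0
    ramp = []
    for i in np.where(diff_quadrant != 0)[0]:
        # count 0, 1, 2, ... within each run of equal values
        ramp.extend(list(range(i-j+1)))
        j = i+1
    ramp.extend(list(range(len(q)-j)))
    ramp = np.array(ramp) * k # sawtooth-shaped array
    # np.int was removed from NumPy; the builtin int is equivalent here
    a = np.ones([len(q)], dtype=int)*5
    return a - q + ramp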
quadrant = np.array([3, 3, 3, 3, 4, 4, 4, 2, 2, 1, 1, 1])
b = value(quadrant, 4)
# [ 2 6 10 14 1 5 9 3 7 4 8 12]
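The same result can be sketched directly in pandas. The formula here is inferred from the expected output in the question rather than stated by the asker: each value starts at (max quadrant + 1 - quadrant) and then grows by the number of distinct quadrants for every repeat. Note that groupby collects equal values wherever they occur, whereas the NumPy function above treats each consecutive run separately:

```python
import pandas as pd

df = pd.DataFrame({'quadrant': [4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1]})

n_groups = df['quadrant'].nunique()                 # 4 distinct quadrants
offset = df['quadrant'].max() + 1 - df['quadrant']  # quadrant 4 -> 1, ..., quadrant 1 -> 4
step = df.groupby('quadrant').cumcount()            # 0, 1, 2 within each quadrant

df['new value'] = offset + n_groups * step
print(df)
```

With the question's data this reproduces the requested column; for data where a quadrant reappears after other values, the two approaches would differ.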
