How to add text element to series data in Python - python-3.x

I have a pandas Series in Python defined as:
scores_data = (pd.Series([F1[0], auc, ACC[0], FPR[0], FNR[0], TPR[0], TNR[0]])).round(4)
I want to insert the text 'Featues' at position 0 of the Series.
I tried assigning to scores_data.loc[0], but that replaced the value at position 0.
Thanks for your help.

You can't insert a value directly into a Series (the way DataFrame.insert adds a column to a DataFrame).
You can use concat:
s = pd.Series([1,2,3,4])
s2 = pd.concat([pd.Series([0], index=[-1]), s])
output:
-1 0
0 1
1 2
2 3
3 4
dtype: int64
Or create a new Series from the values:
pd.Series([0]+s.to_list())
output:
0 0
1 1
2 2
3 3
4 4
dtype: int64
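For the series in the question, a minimal sketch of the concat approach might look like this. The score values are made up for illustration, the label is spelled 'Features' here, and ignore_index=True is an assumption about the desired result (it renumbers the index from 0). Because a string is mixed with floats, the result has object dtype rather than float64.

```python
import pandas as pd

# Hypothetical scores standing in for F1, AUC, ACC, ... from the question.
scores_data = pd.Series([0.91234, 0.88011, 0.93456]).round(4)

# Prepend the label; ignore_index=True renumbers the result from 0.
label = pd.Series(["Features"], dtype=object)
labelled = pd.concat([label, scores_data], ignore_index=True)

print(labelled)
```

If the label is meant to name the values rather than be one of them, setting the Series name or index may be a better fit than mixing types.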

Related

Convert string column to int pandas DataFrame

I have a DataFrame with a column of unique strings, like below:
id customerId ...
1 vqUkxUDuEmB7gHWQvcYrBn
2 KaLEhwzZxCQ7GjPmVwBVav
3 pybDYgTiCUv3Pv3WLgxKCM
4 zqPiDV33KwrMBZoyeQXMJW
5 CR8z3ThPyzBKXFqqzemQAS
I want to replace the customerId values with integers, using something like:
# replace dataFrame.customerId[from start to end]
dataFrame.customerId.replace(sum(map(ord, ???)))
How can I do that?
Given something like
import pandas as pd
df = pd.DataFrame(columns=['UID'], index=range(7))
df.iloc[0,0] = 'vqUkxUDuEmB7gHWQvcYrBn'
df.iloc[1,0] = 'KaLEhwzZxCQ7GjPmVwBVav'
df.iloc[2,0] = 'pybDYgTiCUv3Pv3WLgxKCM'
df.iloc[3,0] = 'zqPiDV33KwrMBZoyeQXMJW'
df.iloc[4,0] = 'CR8z3ThPyzBKXFqqzemQAS'
df.iloc[5,0] = 'zqPiDV33KwrMBZoyeQXMJW' # equal to 3
df.iloc[6,0] = 'vqUkxUDuEmB7gHWQvcYrBn' # equal to 0
PS: I added 2 UIDs equal to previous ones to check that they are categorized correctly.
You can use a categorical type:
df['UID_categorical'] = df.UID.astype('category').cat.codes
output
UID UID_categorical
0 vqUkxUDuEmB7gHWQvcYrBn 3
1 KaLEhwzZxCQ7GjPmVwBVav 1
2 pybDYgTiCUv3Pv3WLgxKCM 2
3 zqPiDV33KwrMBZoyeQXMJW 4
4 CR8z3ThPyzBKXFqqzemQAS 0
5 zqPiDV33KwrMBZoyeQXMJW 4
6 vqUkxUDuEmB7gHWQvcYrBn 3
where UID_categorical has an integer dtype (int8 here):
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 UID 7 non-null object
1 UID_categorical 7 non-null int8
dtypes: int8(1), object(1)
memory usage: 191.0+ bytes
If you want to replace the original column, just do:
df['UID'] = df.UID.astype('category').cat.codes
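Categorical codes follow the sorted order of the categories. If the integers should instead follow the order in which each UID first appears, pandas.factorize is an alternative. A minimal sketch, reusing the sample UIDs above (the column name UID_code is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"UID": [
    "vqUkxUDuEmB7gHWQvcYrBn",
    "KaLEhwzZxCQ7GjPmVwBVav",
    "pybDYgTiCUv3Pv3WLgxKCM",
    "zqPiDV33KwrMBZoyeQXMJW",
    "CR8z3ThPyzBKXFqqzemQAS",
    "zqPiDV33KwrMBZoyeQXMJW",  # repeats row 3
    "vqUkxUDuEmB7gHWQvcYrBn",  # repeats row 0
]})

# factorize returns integer codes plus the unique values they refer to;
# codes are assigned in order of first appearance.
codes, uniques = pd.factorize(df["UID"])
df["UID_code"] = codes

print(df)
```

Either way, equal strings always map to the same integer, which the sum(map(ord, ...)) idea from the question would not guarantee to keep distinct for different strings.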

Calculation using shifting is not working in a for loop

The problem is to compute the column "accumulated" of a dataframe from the columns "accumulated" and "weekly". The formula is: accumulated at t = weekly at t + accumulated at t-1
The desired result should be:
weekly accumulated
2 0
1 1
4 5
2 7
The result I'm obtaining is:
weekly accumulated
2 0
1 1
4 4
2 2
What I have tried is:
for key, value in df_dic.items():
    df_aux = df_dic[key]
    df_aux['accumulated'] = 0
    df_aux['accumulated'] = (df_aux.weekly + df_aux.accumulated.shift(1))
    #df_aux["accumulated"] = df_aux.iloc[:,2] + df_aux.iloc[:,3].shift(1)
    df_aux.iloc[0,3] = 0 # I put this because I want to force the first cell to be 0.
Here df_aux.iloc[0,3] is the first row of the column "accumulated".
What am I doing wrong?
Thank you
EDIT: df_dic is a dictionary with 5 dataframes. It looks like {0: df1, 1: df2, 2: df3}. All the dataframes have the same size and the same column names, so I use the for loop to do the same calculation on every dataframe in the dictionary.
EDIT2: I tried doing the computation outside the for loop and it still does not work.
What I'm doing is:
df_auxp = df_dic[0]
df_auxp['accumulated'] = 0
df_auxp['accumulated'] = df_auxp["weekly"] + df_auxp["accumulated"].shift(1)
df_auxp.iloc[0,3] = df_auxp.iloc[0,3].fillna(0)
Maybe it has something to do with the dictionary interaction...
To solve for 3 dataframes
import pandas as pd
df1 = pd.DataFrame({'weekly':[2,1,4,2]})
df2 = pd.DataFrame({'weekly':[3,2,5,3]})
df3 = pd.DataFrame({'weekly':[4,3,6,4]})
print (df1)
print (df2)
print (df3)
for d in [df1,df2,df3]:
    d['accumulated'] = d['weekly'].cumsum() - d.iloc[0,0]
    print (d)
The output of this will be as follows:
Original dataframes:
df1
weekly
0 2
1 1
2 4
3 2
df2
weekly
0 3
1 2
2 5
3 3
df3
weekly
0 4
1 3
2 6
3 4
Updated dataframes:
df1:
weekly accumulated
0 2 0
1 1 1
2 4 5
3 2 7
df2:
weekly accumulated
0 3 0
1 2 2
2 5 7
3 3 10
df3:
weekly accumulated
0 4 0
1 3 3
2 6 9
3 4 13
To solve for 1 dataframe
You need to use cumsum and then subtract the value from the first row. That gives the desired result. Here's how to do it:
import pandas as pd
df = pd.DataFrame({'weekly':[2,1,4,2]})
print (df)
df['accumulated'] = df['weekly'].cumsum() - df.iloc[0,0]
print (df)
Original dataframe:
weekly
0 2
1 1
2 4
3 2
Updated dataframe:
weekly accumulated
0 2 0
1 1 1
2 4 5
3 2 7
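As for why the shift-based attempt does not accumulate: the expression is evaluated once on whole columns, and accumulated is all zeros at that moment, so weekly + accumulated.shift(1) is just weekly with a NaN in the first row. A minimal sketch of that effect, and of the cumsum fix applied to a dictionary of dataframes (df_dic here is a small stand-in with two made-up entries):

```python
import pandas as pd

# Stand-in for the dictionary of dataframes from the question.
df_dic = {
    0: pd.DataFrame({"weekly": [2, 1, 4, 2]}),
    1: pd.DataFrame({"weekly": [3, 2, 5, 3]}),
}

# Why shift() alone does not accumulate: the zero column is shifted once,
# so the sum is NaN in row 0 and equal to "weekly" everywhere else.
zeros = pd.Series(0, index=df_dic[0].index)
one_step = df_dic[0]["weekly"] + zeros.shift(1)
print(one_step)

# Running total that starts at 0: cumulative sum minus the first value.
for key, frame in df_dic.items():
    frame["accumulated"] = frame["weekly"].cumsum() - frame["weekly"].iloc[0]
    print(key)
    print(frame)
```

Using frame["weekly"].iloc[0] instead of frame.iloc[0,0] avoids relying on weekly being the first column, which matters if the real dataframes have more columns.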

Taking different records from groups using group by in pandas

Suppose I have dataframe like this
>>> df = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4],'value':[1,2,3,1,2,3,4,1,1]})
>>> df
id value
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 2 3
6 2 4
7 3 1
8 4 1
Now I want all records from each group (grouped by id) except the last 3. That is, I want to drop the last 3 records from every group. How can I do this using pandas groupby? This is dummy data.
Use GroupBy.cumcount with ascending=False to number the rows of each group from the end, then use Series.gt to keep the rows whose counter is greater than 2 (the counter starts at 0, so the last 3 rows of a group are numbered 0, 1 and 2):
df = df[df.groupby('id').cumcount(ascending=False).gt(2)]
print (df)
id value
3 2 1
Details:
print (df.groupby('id').cumcount(ascending=False))
0 2
1 1
2 0
3 3
4 2
5 1
6 0
7 0
8 0
dtype: int64
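The same idea can be wrapped in a small helper for any number of trailing rows. This is only a sketch: drop_last_n is a hypothetical name, not a pandas function, and it uses ge(n), which is equivalent to gt(n - 1).

```python
import pandas as pd

def drop_last_n(frame, key, n):
    """Return frame without the last n rows of each group of `key`."""
    # Number rows within each group from the end: the last row is 0.
    from_end = frame.groupby(key).cumcount(ascending=False)
    # Keep rows that have at least n rows after them in their group.
    return frame[from_end.ge(n)]

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4],
                   'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})

print(drop_last_n(df, 'id', 3))  # only the first row of group 2 survives
print(drop_last_n(df, 'id', 1))  # drops the last row of every group
```

Groups with n rows or fewer disappear entirely, as groups 1, 3 and 4 do for n = 3.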

Highest frequency in a dataframe

I am looking for a way to get the most frequent value in an entire pandas DataFrame, not in a particular column. I have looked at value_counts, but it seems to work on one column at a time. Is there any way to do that?
Use DataFrame.stack with Series.mode to get the most frequent values, then select the first one by position:
df = pd.DataFrame({
'B':[4,5,4,5,4,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
})
a = df.stack().mode().iat[0]
print (a)
4
Or, if you also need the frequency, you can use Series.value_counts:
s = df.stack().value_counts()
print (s)
4 6
5 4
3 3
9 2
7 2
2 2
1 2
8 1
6 1
0 1
dtype: int64
print (s.index[0])
4
print (s.iat[0])
6
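When all columns hold the same kind of values, another option is to count on the underlying NumPy array. A sketch using numpy.unique with return_counts=True on the same sample data (this swaps in NumPy for the stack-based approach above, it is not what the answer uses):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'B': [4, 5, 4, 5, 4, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
})

# np.unique flattens the 2-D array and returns the sorted unique values
# together with how often each one occurs.
values, counts = np.unique(df.to_numpy(), return_counts=True)

# argmax picks the first maximum, so ties resolve to the smallest value.
most_common = values[counts.argmax()]
print(most_common, counts.max())
```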

Selective multiplication of a pandas dataframe

I have a pandas Dataframe and Series of the form
df = pd.DataFrame({'Key':[2345,2542,5436,2468,7463],
'Segment':[0] * 5,
'Values':[2,4,6,6,4]})
print (df)
Key Segment Values
0 2345 0 2
1 2542 0 4
2 5436 0 6
3 2468 0 6
4 7463 0 4
s = pd.Series([5436, 2345])
print (s)
0 5436
1 2345
dtype: int64
In the original df, I want to multiply the third column (Values) by 7, except for the rows whose Key is present in the series.
What is the best way to achieve this in Python 3.x?
Use DataFrame.loc with Series.isin to select the Values column for the rows whose Key is not in the series (the condition is inverted with ~), and multiply them by the scalar:
df.loc[~df['Key'].isin(s), 'Values'] *= 7
print (df)
Key Segment Values
0 2345 0 2
1 2542 0 28
2 5436 0 6
3 2468 0 42
4 7463 0 28
Another method uses numpy.where():
import numpy as np
df['Values'] *= np.where(~df['Key'].isin([5436, 2345]), 7, 1)
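A third option, shown here only as a sketch, is Series.where, which keeps the original value where the condition is true and substitutes the alternative elsewhere. The variable name excluded is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Key': [2345, 2542, 5436, 2468, 7463],
                   'Segment': [0] * 5,
                   'Values': [2, 4, 6, 6, 4]})
s = pd.Series([5436, 2345])

# True for rows whose Key appears in s; those rows keep their value.
excluded = df['Key'].isin(s)

# Everywhere else, use the value multiplied by 7.
df['Values'] = df['Values'].where(excluded, df['Values'] * 7)

print(df)
```

All three variants rely on the same boolean mask from isin; they differ only in how the mask is applied.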
