Data frame transformation using transposing and flatening - python-3.x

I have a data frame that looks like:
tdelta A B label
1 11 21 Lab1
2 24 45 Lab2
3 44 65 Lab3
4 77 22 Lab4
5 12 64 Lab5
6 39 09 Lab6
7 85 11 Lab7
8 01 45 Lab8
And I need to transform this dataset into:
For selected window: 4
A1 A2 A3 A4 B1 B2 B3 B4 L1 label
11 24 44 77 21 45 65 22 Lab1 Lab4
12 39 85 01 64 09 11 45 Lab5 Lab8
So based on the selected window - 'w', I need to transpose w rows with the first corresponding label as my X values and the corresponding last label as my Y value. here is what I have developed till now:
def data_process(data,window):
n=len(data)
A = pd.DataFrame(data['A'])
B = pd.DataFrame(data['B'])
lb = pd.DataFrame(data['lab'])
df_A = pd.concat([gsr.loc[i] for i in range(0,window)],axis=1).reset_index()
df_B = pd.concat([st.loc[i] for i in range(0,window)],axis=1).reset_index()
df_lb = pd.concat([lb.loc[0],axis=1).reset_index()
X = pd.concat([df_A,df_B,df_lab],axis=1)
Y = pd.DataFrame(data['lab']).shift(-window)
return X, Y
I think this works for only the first 'window' rows. I need it to work for my entire dataframe.

This is essentially a pivot, with a lot of cleaning up after the pivot. For the pivot to work we need to use integer and modulus division so that we can group the rows into windows of length w and figure out which column they then belong to.
# Number of rows to group together
w = 4
df['col'] = np.arange(len(df))%w + 1
df['i'] = np.arange(len(df))//w
# Reshape and flatten the MultiIndex
df = (df.drop(columns='tdelta')
.pivot(index='i', columns='col')
.rename_axis(index=None))
df.columns = [f'{x}{y}'for x,y in df.columns]
# Define these columns and remove the intermediate label columns.
df['L1'] = df['label1']
df['label'] = df[f'label{w}']
df = df.drop(columns=[f'label{i}' for i in range(1, w+1)])
print(df)
A1 A2 A3 A4 B1 B2 B3 B4 L1 label
0 11 24 44 77 21 45 65 22 Lab1 Lab4
1 12 39 85 1 64 9 11 45 Lab5 Lab8

Related

Creating a list from series of pandas

Click here for the imageI m trying to create a list from 3 different series which will be of the shape "({A} {B} {C})" where A denotes the 1st element from series 1, B is for 1st element from series 2, C is for 1st element from series 3 and this way it should create a list containing 600 element.
List 1 List 2 List 3
u_p0 1 v_p0 2 w_p0 7
u_p1 21 v_p1 11 w_p1 45
u_p2 32 v_p2 25 w_p2 32
u_p3 45 v_p3 76 w_p3 49
... .... ....
u_p599 56 v_p599 78 w_599 98
Now I want the output list as follows
(1 2 7)
(21 11 45)
(32 25 32)
(45 76 49)
.....
These are the 3 series I created from a dataframe
r1=turb_1.iloc[qw1] #List1
r2=turb_1.iloc[qw2] #List2
r3=turb_1.iloc[qw3] #List3
Pic of the seriesFor the output I think formatted string python method will be useful but I m quite not sure how to proceed.
turb_3= ["({A} {B} {C})".format(A=i,B=j,C=k) for i in r1 for j in r2 for k in r3]
Any kind of help will be useful.
Use pandas.DataFrame.itertuples with str.format:
# Sample data
print(df)
col1 col2 col3
0 1 2 7
1 21 11 45
2 32 25 32
3 45 76 49
fmt = "({} {} {})"
[fmt.format(*tup) for tup in df[["col1", "col2", "col3"]].itertuples(False, None)]
Output:
['(1 2 7)', '(21 11 45)', '(32 25 32)', '(45 76 49)']

Check each value in a dataframe and if that is less means then change the value which is given

cateory Percentage
AB 99
CD 65
EF 12
GH 25
IJ 90
KL 100
If CD's percentage is less than 70 then change that as 71 else existing value is fine
If EF's percentage is less than 20 then change that as 21 else existing value is fine
If GH's percentage is less than 30 then change that as 45 else existing value is fine
For AB existing value is fine
Output
cateory Percentage
AB 99
CD 65
EF 21
GH 45
IJ 90
KL 100
Create list of tuples for replacement by compare both columns and if match replace by new value - last value of tuple:
L = [('CD', 70, 71), ('EF', 20, 21), ('GH', 30, 45)]
for cat, less, new in L:
m = df['cateory'].eq(cat) & df['Percentage'].lt(less)
df.loc[m, 'Percentage'] = new
print (df)
cateory Percentage
0 AB 99
1 CD 71
2 EF 21
3 GH 45
4 IJ 90
5 KL 100

Add columns - specify name - row value based on other

I have a Dataframe like this:
data = {'TYPE':['X', 'Y', 'Z'],'A': [11,12,13], 'B':[21,22,23], 'C':[31,32,34]}
df = pd.DataFrame(data)
TYPE A B C
0 X 11 21 31
1 Y 12 22 32
2 Z 13 23 34
I like to get the following DataFrame:
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 40
For each column (next to TYPE column), here A,B,C:
add a new column with the name column_name_added
if TYPE = X add 4, if TYPE = Y add 6, if Z add 7
Idea is multiple values by helper Series created by Series.map with dictionary with DataFrame.add, add to original by DataFrame.join and last change order of columns by DataFrame.reindex:
d = {'X':4,'Y':6, 'Z':7}
cols = df.columns[:1].tolist() + [i for x in df.columns[1:] for i in (x, x + '_added')]
df1 = df.iloc[:, 1:].add(df['TYPE'].map(d), axis=0, fill_value=0).add_suffix('_added')
df2 = df.join(df1).reindex(cols, axis=1)
print (df2)
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 41
EDIT: For values not matched dictionary are created missing values, so if add Series.fillna it return value 7 for all another values:
d = {'X':4,'Y':6}
cols = df.columns[:1].tolist() + [i for x in df.columns[1:] for i in (x, x + '_added')]
df1 = df.iloc[:, 1:].add(df['TYPE'].map(d).fillna(7).astype(int), axis=0).add_suffix('_added')
df2 = df.join(df1).reindex(cols, axis=1)
print (df2)
TYPE A A_added B B_added C C_added
0 X 11 15 21 25 31 35
1 Y 12 18 22 28 32 38
2 Z 13 20 23 30 34 41

A vectorized solution producing a new column in DataFrame that depends on conditions of existing columns and also the new column itself

My current dataframe data is as follows:
df=pd.DataFrame([[1.4,3.5,4.6],[2.8,5.4,6.4],[7.8,6.5,5.8]],columns=['t','i','m'])
t i m
0 14 35 46
1 28 54 64
2 28 34 64
3 78 65 58
My goal is to apply a vectorized operations on a df with a conditions as follows (pseudo code):
New column of answer starts with value of 1.
For row in df.itertuples():
if (m > i) & (answer in row-1 is an odd number):
answer in row = answer in row-1 + m
elif (m > i):
answer in row = answer in row-1 - m
else:
answer in row = answer in row-1
The desired output is as follows:
t i m answer
0 14 35 46 1
1 28 54 59 60
2 78 12 58 2
3 78 91 48 2
Any elegant solution would be appreciated.

Return a Value from the first Column matching Max()

I have a table:
A B C D
2 10 70 45
2 20 80 55
3 30 90 65
3 40 15 76
4 50 25 85
4 60 35 95
I want to get the maximum from the array B1:D6 which is 95 and return
the value of the first column A which is 4
Insert a new column before column A and put the following formula in cell A2 and drag down:
=MAX(C2:E2)
Then put the following formula in cell H2:
=VLOOKUP(MAX(A2:A7), A2:B7, 2, FALSE)
Result:

Resources