adding reversed columns to dataframe [duplicate] - python-3.x

This question already has an answer here:
Reversing the order of values in a single column of a Dataframe
(1 answer)
Closed 1 year ago.
I'm trying to add a reversed column to a data frame, but it gets added in the original order. It looks like the assignment is just following the index of the dataframe. Is it possible to reorder the index?
df_reversed = df['Buy'].iloc[::-1]
Data["newColumn"] = df_reversed
Image of the output
Image of df_reversed
This is how I want the output to be

A slight modification of @Chicodelarose's answer: you can reverse just the values and get the result you want as follows:
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
df["calories_reversed"] = df["calories"].values[::-1]
print(df)
Output will be:
   calories  duration
0       420        50
1       380        40
2       390        45

   calories  duration  calories_reversed
0       420        50                390
1       380        40                380
2       390        45                420

You need to call reset_index on the reversed values before assigning them to the new column; otherwise pandas aligns on the original index and undoes the reversal:
Example:
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
df["calories_reversed"] = df["calories"][::-1].reset_index(drop=True)
print(df)
Output:
   calories  duration
0       420        50
1       380        40
2       390        45

   calories  duration  calories_reversed
0       420        50                390
1       380        40                380
2       390        45                420
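Both answers work by defeating index alignment: assigning the reversed Series directly realigns on the index and silently undoes the reversal. A minimal sketch of the difference (using a made-up Buy column):

```python
import pandas as pd

df = pd.DataFrame({'Buy': [1, 2, 3]})

# direct assignment aligns on the index, so the reversal is undone
df['aligned'] = df['Buy'].iloc[::-1]

# stripping the index (via .values) assigns positionally instead
df['reversed'] = df['Buy'].iloc[::-1].values

print(df['aligned'].tolist())   # [1, 2, 3]
print(df['reversed'].tolist())  # [3, 2, 1]
```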

Related

Change the structure of column name

I have the column as
id_no| 2021-05-19 00:00:00 | 2021-05-20 00:00:00 | decider
100 20 20 878
200 64 38 917
here idno is the index and the rest are columns
I want the outupt as
id_no| 2021-05-19 | 2021-05-20 | decider
100 20 20 878
200 64 38 917
I tried converting the column names, but the names are not getting changed; the column names are in datetime format except the last column. I tried the code below:
for (columnName, columnData) in df.iteritems():
    columnName = pd.to_datetime(columnName)
We can try a string slice, as long as no other column name is longer than 10 characters:
df.columns = df.columns.astype(str).str[:10]
df
Out[356]:
   id_no  2021-05-19  2021-05-20  decider
0    100          20          20      878
1    200          64          38      917
Changing a loop variable changes only... the loop variable, not the column name! You must create a list of strings representing the new column names, and make it the new column index:
new_columns = [df.columns[0]] + \
              pd.to_datetime(df.columns[1:-1]).astype(str).tolist() + \
              [df.columns[-1]]
df.columns = new_columns
You can just assign a list of names to the columns attribute of your df.
data = {'id_no': {0: 100, 1: 200},
        '2021-05-19 00:00:00': {0: 20, 1: 64},
        '2021-05-20 00:00:00': {0: 20, 1: 38},
        'decider': {0: 878, 1: 917}}
df = pd.DataFrame(data)
df.columns = ['id_no', '2021-05-19', '2021-05-20', 'decider'] # simple solution
# edit, you can use a list comprehension with conditional
df.columns = [str(x)[0:10] if x[0] == '2' else x for x in df.columns]
Output:
   id_no  2021-05-19  2021-05-20  decider
0    100          20          20      878
1    200          64          38      917
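An equivalent approach, if you prefer not to assign to the columns attribute directly, is rename with a callable; the function is applied to every label (the startswith('2') check is the same heuristic as the conditional above):

```python
import pandas as pd

data = {'id_no': {0: 100, 1: 200},
        '2021-05-19 00:00:00': {0: 20, 1: 64},
        '2021-05-20 00:00:00': {0: 20, 1: 38},
        'decider': {0: 878, 1: 917}}
df = pd.DataFrame(data)

# rename accepts a function mapping each old label to a new one
df = df.rename(columns=lambda c: c[:10] if c.startswith('2') else c)

print(df.columns.tolist())  # ['id_no', '2021-05-19', '2021-05-20', 'decider']
```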

In pandas dataframe, how to make one column act on all the others?

Consider the small following dataframe:
import pandas as pd
value1 = [15, 20, 50, 70]
value2 = [15, 80, 45, 30]
base = [175, 150, 200, 125]
df = pd.DataFrame({"val1": value1, "val2": value2, "base": base})
df
   val1  val2  base
0    15    15   175
1    20    80   150
2    50    45   200
3    70    30   125
Actually, there are many more rows and many more val* columns...
I would like to express the figures in the val* columns as a percentage of the corresponding base value in the same row; for example, 70 (last in val1) should become (70/125)*100, which is 56, and 30 (last in val2) should become (30/125)*100, which is 28; and so on for every figure.
I am sure the solution lies in a correct use of assign or apply with a lambda, but I can't work out how to do it.
We can filter the val-like columns, divide them by the base column along axis=0, then multiply by 100 to calculate the percentages:
df.filter(like='val').div(df['base'], axis=0).mul(100).add_suffix('%')
       val1%      val2%
0   8.571429   8.571429
1  13.333333  53.333333
2  25.000000  22.500000
3  56.000000  24.000000
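If you want the percentages written back into the original frame rather than a new one (an assumption about the desired result), the same div call can target the val* columns in place:

```python
import pandas as pd

df = pd.DataFrame({"val1": [15, 20, 50, 70],
                   "val2": [15, 80, 45, 30],
                   "base": [175, 150, 200, 125]})

# overwrite every val* column with its percentage of base
val_cols = df.filter(like='val').columns
df[val_cols] = df[val_cols].div(df['base'], axis=0).mul(100)

print(df.round(2))
```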

Python Pandas DF Pivot and Groupby

I need to iterate through my dataframe rows and pivot the single column bounding_box_y into 8 columns each time the value in text_y column changes.
original data frame
desired data frame
Can anyone help with some code that does NOT hardcode values into the code? The entire dataframe is over 6000 rows. I need to pivot the one column into 8 each time the value in another column changes.
Thanks!
Please try to include your data as runnable code, so others can easily copy/paste and experiment. In your case you can get it with df.head(16).to_dict('list'). I used the following:
df = pd.DataFrame({
    'boundingBox_y': [183, 120, 305, 120, 305, 161, 182, 161,
                      318, 120, 381, 120, 382, 162, 318, 161],
    'text_y': (['FORM'] * 8) + (['ABC'] * 8),
    'confidence': ([0.987] * 8) + ([0.976] * 8)
})
Then you can pivot your dataframe but you need to add a new column to hold the pivoted column names.
# rename the current values column
df.rename({'boundingBox_y': 'value'}, axis=1, inplace=True)
# create a column that contains the column headers to pivot into
df['boundingBox_y'] = df.groupby(['confidence', 'text_y']).cumcount()
# pivot your df
df = df.pivot(index=['confidence', 'text_y'],
              columns='boundingBox_y', values='value')
Output
boundingBox_y        0    1    2    3    4    5    6    7
confidence text_y
0.976      ABC     318  120  381  120  382  162  318  161
0.987      FORM    183  120  305  120  305  161  182  161
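A self-contained version of the steps above, with one extra line at the end in case you want confidence and text_y back as ordinary columns rather than a row MultiIndex (an assumption about the final shape you need):

```python
import pandas as pd

df = pd.DataFrame({
    'boundingBox_y': [183, 120, 305, 120, 305, 161, 182, 161,
                      318, 120, 381, 120, 382, 162, 318, 161],
    'text_y': ['FORM'] * 8 + ['ABC'] * 8,
    'confidence': [0.987] * 8 + [0.976] * 8,
})

df = df.rename({'boundingBox_y': 'value'}, axis=1)
# number the rows 0-7 within each (confidence, text_y) group
df['boundingBox_y'] = df.groupby(['confidence', 'text_y']).cumcount()
wide = df.pivot(index=['confidence', 'text_y'],
                columns='boundingBox_y', values='value')

flat = wide.reset_index()  # MultiIndex rows back into ordinary columns
print(flat)
```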

Getting columns by list of substring values

I have the dataframe mentioned below. I have a large dataset and want to create different data frames from substrings of the column names.
df
ID    ex_srr123  ex2_srr124  ex3_srr125  ex4_srr1234  ex23_srr5323
san          12          43           0           34             0
mat          53           0          34           76           656
jon          82         223          23           32            21
jack          0          12           2            0             0
I have lists of column-name substrings:
coln1=['srr123', 'srr124']
coln2=['srr1234','srr5323']
I wanted:

df2 =
ID    ex_srr123  ex2_srr12
san          12         43
mat          53          0
jon          82        223
jack          0         12
I tried:

df2 = df[coln1]

but I didn't get what I wanted. Please help me get the desired output.
Statically
df2 = df.filter(regex="srr123$|srr124$").copy()
Dynamically
coln1 = ['srr123', 'srr124']
df2 = df.filter(regex=f"{coln1[0]}$|{coln1[1]}$").copy()
The $ signifies the end of the string, so that the column ex4_srr1234 isn't also included in your result.
Look into the filter method
df.filter(regex="srr123|srr124").copy()
I am making a few assumptions:
'ID' is a column and not the index.
The third column in df2 should read 'ex2_srr124' instead of 'ex2_srr12'.
You do not want to include columns of 'df' in 'df2' if the substring does not match everything after the underscore (since 'srr123' is a substring of 'ex4_srr1234' but you did not include it in 'df2').
# set the provided data frames
df = pd.DataFrame([['san', 12, 43, 0, 34, 0],
                   ['mat', 53, 0, 34, 76, 656],
                   ['jon', 82, 223, 23, 32, 21],
                   ['jack', 0, 12, 2, 0, 0]],
                  columns=['ID', 'ex_srr123', 'ex2_srr124', 'ex3_srr125', 'ex4_srr1234', 'ex23_srr5323'])
# set the list of column-substrings
coln1=['srr123', 'srr124']
coln2=['srr1234','srr5323']
I suggest to solve this as follows:
# create df2 and add the ID column
df2 = pd.DataFrame()
df2['ID'] = df['ID']
# iterate over each substring in a list of column-substrings
for substring in coln1:
    # iterate over each column name in the df columns
    for column_name in df.columns.values:
        # check if column name ends with substring
        if substring == column_name[-len(substring):]:
            # assign the new column to df2
            df2[column_name] = df[column_name]
This yields the desired dataframe df2:
     ID  ex_srr123  ex2_srr124
0   san         12          43
1   mat         53           0
2   jon         82         223
3  jack          0          12
df.filter(regex = '|'.join(['ID'] + [col+ '$' for col in coln1])).copy()
     ID  ex_srr123  ex2_srr124
0   san         12          43
1   mat         53           0
2   jon         82         223
3  jack          0          12
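An equivalent non-regex sketch, using the tuple form of str.endswith, which sidesteps regex escaping if the substrings ever contain special characters:

```python
import pandas as pd

df = pd.DataFrame([['san', 12, 43, 0, 34, 0],
                   ['mat', 53, 0, 34, 76, 656],
                   ['jon', 82, 223, 23, 32, 21],
                   ['jack', 0, 12, 2, 0, 0]],
                  columns=['ID', 'ex_srr123', 'ex2_srr124', 'ex3_srr125',
                           'ex4_srr1234', 'ex23_srr5323'])
coln1 = ['srr123', 'srr124']

# str.endswith accepts a tuple of candidate suffixes
keep = ['ID'] + [c for c in df.columns if c.endswith(tuple(coln1))]
df2 = df[keep].copy()

print(df2.columns.tolist())  # ['ID', 'ex_srr123', 'ex2_srr124']
```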

Groupby without an aggregation function and sort that data

I have customer IDs and purchase dates. I need to sort the purchase dates for each customer ID separately.
I need a groupby operation but without an aggregation, sorting the purchase dates within each customer.
Tried this way
new_data = data.groupby('custID').sort_values('purchase_date')
AttributeError: Cannot access callable attribute 'sort_values' of
'DataFrameGroupBy' objects, try using the 'apply' method
Expected result is like:
custID  purchase_date
   100     23/01/2019
   100     29/01/2019
   100     03/04/2019
   120     02/05/2018
   120     09/03/2019
   120     11/05/2019
# import the pandas library
import pandas as pd
data = {
    'purchase_date': ['23/01/2019', '19/01/2019', '12/01/2019', '23/01/2019',
                      '11/01/2019', '23/01/2019', '06/05/2019', '05/05/2019',
                      '05/01/2019', '02/07/2019'],
    'custID': [100, 160, 100, 110, 160, 110, 110, 110, 110, 160]
}
df = pd.DataFrame(data)
sortedData = df.groupby('custID').apply(
    lambda x: x.sort_values(by='purchase_date', ascending=True))
sortedData = sortedData.reset_index(drop=True, inplace=False)
OUTPUT:
print(sortedData)
   custID purchase_date
0     100    12/01/2019
1     100    23/01/2019
2     110    05/01/2019
3     110    05/05/2019
4     110    06/05/2019
5     110    23/01/2019
6     110    23/01/2019
7     160    02/07/2019
8     160    11/01/2019
9     160    19/01/2019
print(sortedData.to_string(index=False))
custID purchase_date
100 12/01/2019
100 23/01/2019
110 05/01/2019
110 05/05/2019
110 06/05/2019
110 23/01/2019
110 23/01/2019
160 02/07/2019
160 11/01/2019
160 19/01/2019
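One caveat: purchase_date here is a string in day/month/year form, so the sort above is lexicographic (within custID 110, 23/01/2019 lands after 06/05/2019, which is not chronological). Converting to datetime first fixes that, and a plain sort_values on both columns then replaces the groupby entirely (a sketch, assuming the dd/mm/yyyy format):

```python
import pandas as pd

data = {
    'purchase_date': ['23/01/2019', '19/01/2019', '12/01/2019', '23/01/2019',
                      '11/01/2019', '23/01/2019', '06/05/2019', '05/05/2019',
                      '05/01/2019', '02/07/2019'],
    'custID': [100, 160, 100, 110, 160, 110, 110, 110, 110, 160],
}
df = pd.DataFrame(data)

# parse the day-first date strings so sorting is chronological
df['purchase_date'] = pd.to_datetime(df['purchase_date'], format='%d/%m/%Y')
out = df.sort_values(['custID', 'purchase_date']).reset_index(drop=True)
print(out)
```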
