Using Python3.x rename a column name of a dataframe with user provided new column name in console window


How can I rename a column of my DataFrame to a new name that the user types in the console?

Say you have a dataframe like this
>>> df
A B C
0 1 9 1
1 4 7 2
2 3 7 6
3 6 1 6
4 8 1 9
And you want to rename the column B to that given by user:
>>> col_name = input("Enter Column Name: ")
Enter Column Name: test
>>> col_names = df.columns.tolist()
>>> col_names
['A', 'B', 'C']
>>> col_names[1] = col_name
>>> col_names
['A', 'test', 'C']
>>> df.columns = col_names
>>> df
A test C
0 1 9 1
1 4 7 2
2 3 7 6
3 6 1 6
4 8 1 9
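The same result can also be reached with DataFrame.rename, which selects the column by its current name instead of its position. The following is a minimal sketch; the helper name rename_column is hypothetical, and a fixed string stands in for the value that input() would return in an interactive session.

```python
import pandas as pd


def rename_column(df, old_name, new_name):
    """Return a copy of df with the column old_name renamed to new_name.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    if old_name not in df.columns:
        raise KeyError(f"No column named {old_name!r}")
    # rename() returns a new DataFrame and leaves the original untouched
    return df.rename(columns={old_name: new_name})


df = pd.DataFrame({"A": [1, 4, 3, 6, 8], "B": [9, 7, 7, 1, 1], "C": [1, 2, 6, 6, 9]})

# In the console this would be: new_name = input("Enter Column Name: ")
new_name = "test"  # stand-in for the user's input
renamed = rename_column(df, "B", new_name)
print(renamed.columns.tolist())
```

Unlike assigning to df.columns, this does not depend on knowing the position of the column to rename.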

Related

Pandas DataFrame copy with condition [duplicate]

I want to replicate rows in a Pandas Dataframe. Each row should be repeated n times, where n is a field of each row.
import pandas as pd
what_i_have = pd.DataFrame(data={
'id': ['A', 'B', 'C'],
'n' : [ 1, 2, 3],
'v' : [ 10, 13, 8]
})
what_i_want = pd.DataFrame(data={
'id': ['A', 'B', 'B', 'C', 'C', 'C'],
'v' : [ 10, 13, 13, 8, 8, 8]
})
Is this possible?

You can use Index.repeat to get repeated index values based on the column then select from the DataFrame:
df2 = df.loc[df.index.repeat(df.n)]
id n v
0 A 1 10
1 B 2 13
1 B 2 13
2 C 3 8
2 C 3 8
2 C 3 8
Or you could use np.repeat to get the repeated indices and then use that to index into the frame:
df2 = df.loc[np.repeat(df.index.values, df.n)]
id n v
0 A 1 10
1 B 2 13
1 B 2 13
2 C 3 8
2 C 3 8
2 C 3 8
After which there's only a bit of cleaning up to do:
df2 = df2.drop("n", axis=1).reset_index(drop=True)
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
Note that if the index may contain duplicate labels, you can use .iloc instead:
df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
which uses the positions, and not the index labels.
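Put together as one script, the Index.repeat approach looks roughly like this. It is a sketch that reuses the what_i_have frame from the question; the name repeated is only illustrative.

```python
import pandas as pd

what_i_have = pd.DataFrame({
    "id": ["A", "B", "C"],
    "n": [1, 2, 3],
    "v": [10, 13, 8],
})

repeated = (
    # repeat each index label n times, then select those rows
    what_i_have.loc[what_i_have.index.repeat(what_i_have["n"])]
    .drop(columns="n")        # the counter column is no longer needed
    .reset_index(drop=True)   # replace the repeated labels with 0..5
)
print(repeated)
```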

You could use set_index and repeat
In [1057]: df.set_index(['id'])['v'].repeat(df['n']).reset_index()
Out[1057]:
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
Details
In [1058]: df
Out[1058]:
id n v
0 A 1 10
1 B 2 13
2 C 3 8

It's something like the uncount in tidyr:
https://tidyr.tidyverse.org/reference/uncount.html
I wrote a package (https://github.com/pwwang/datar) that implements this API:
from datar import f
from datar.tibble import tribble
from datar.tidyr import uncount
what_i_have = tribble(
f.id, f.n, f.v,
'A', 1, 10,
'B', 2, 13,
'C', 3, 8
)
what_i_have >> uncount(f.n)
Output:
id v
0 A 10
1 B 13
1 B 13
2 C 8
2 C 8
2 C 8

Not the best solution, but I want to share this: you could also use DataFrame.reindex() together with Index.repeat():
df.reindex(df.index.repeat(df.n)).drop('n', axis=1)
Output:
id v
0 A 10
1 B 13
1 B 13
2 C 8
2 C 8
2 C 8
You can further append .reset_index(drop=True) to reset the .index.

Dropping a column by position removes every column with the same name in a DataFrame

import pandas as pd
df1 = pd.DataFrame({"A":[14, 4, 5, 4],"B":[1,2,3,4]})
df2 = pd.DataFrame({"A":[14, 4, 5, 4],"C":[5,6,7,8]})
df = pd.concat([df1,df2],axis=1)
Let's look at the concatenated df; the first and third columns share the same column name, A.
df
A B A C
0 14 1 14 5
1 4 2 4 6
2 5 3 5 7
3 4 4 4 8
I want to get the following format.
df
A B C
0 14 1 5
1 4 2 6
2 5 3 7
3 4 4 8
I tried dropping the column by position:
result = df.drop(df.columns[2],axis=1)
result
B C
0 1 5
1 2 6
2 3 7
3 4 8
I can get what I expect this way:
import pandas as pd
df1 = pd.DataFrame({"A":[14, 4, 5, 4],"B":[1,2,3,4]})
df2 = pd.DataFrame({"A":[14, 4, 5, 4],"C":[5,6,7,8]})
df2 = df2.drop(df2.columns[0],axis=1)
df = pd.concat([df1,df2],axis=1)
It is strange that both the first and the third column are removed when I drop a single column by position.
1. Why does the DataFrame behave this way?
2. How can I remove the third column while keeping the first one?

Here's a way using indexes:
index_to_drop = 2
# get indexes to keep
col_idxs = [en for en, _ in enumerate(df.columns) if en != index_to_drop]
# subset the df
df = df.iloc[:,col_idxs]
A B C
0 14 1 5
1 4 2 6
2 5 3 7
3 4 4 8
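As for the reason: df.columns[2] only returns the label "A", and drop() removes columns by label, so every column named "A" is dropped. The sketch below shows that behaviour and, as an alternative to listing positions, keeps the first occurrence of each column name with Index.duplicated(). The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [14, 4, 5, 4], "B": [1, 2, 3, 4]})
df2 = pd.DataFrame({"A": [14, 4, 5, 4], "C": [5, 6, 7, 8]})
df = pd.concat([df1, df2], axis=1)

# df.columns[2] is the label "A", not a reference to the third column,
# so drop() removes both columns carrying that label.
label = df.columns[2]
dropped_by_label = df.drop(label, axis=1)

# duplicated() marks repeated names; negating it keeps the first "A" only.
deduplicated = df.loc[:, ~df.columns.duplicated()]
print(deduplicated)
```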

How to split the column names of a DataFrame in Python 3

I have a dataframe:
import pandas as pd
d = {'A-101': ['1','2','4'], 'B-102':
['5','7','8'],'A-102': ['1','2','9']}
df = pd.DataFrame(data=d)
which is
A-101 B-102 A-102
0 1 5 1
1 2 7 2
2 4 8 9
How can I change the above into the following:
company number '0' '1' '2'
A 101 1 2 4
B 102 5 7 8
A 102 1 2 9
Here I want to split each column name, such as A-101, into two columns (company and number) and transpose each original column into a row, with the new columns named '0', '1', '2'.

Solution
output = pd.DataFrame(columns=['company', 'number', '0', '1', '2'])
output['company'] = [col.split('-')[0] for col in df.columns]
output['number'] = [col.split('-')[1] for col in df.columns]
output['0'] = df.iloc[0].values
output['1'] = df.iloc[1].values
output['2'] = df.iloc[2].values
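The solution above hard-codes three rows. A variant that works for any number of rows could transpose the frame and split the old column names with str.split. This is a sketch under the assumption that every column name contains exactly one hyphen.

```python
import pandas as pd

d = {"A-101": ["1", "2", "4"], "B-102": ["5", "7", "8"], "A-102": ["1", "2", "9"]}
df = pd.DataFrame(data=d)

# After transposing, the old column names become the row index.
transposed = df.T
transposed.columns = [str(c) for c in transposed.columns]  # '0', '1', '2'

# Split each old column name on the hyphen into two new columns.
names = transposed.index.to_series().str.split("-", expand=True)
names.columns = ["company", "number"]

# Both frames share the same index, so they align row by row.
output = pd.concat([names, transposed], axis=1).reset_index(drop=True)
print(output)
```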

Pandas aggregate column and keep header

I have code that works but gives me data without the header. Is there a way to write this code so that the header is not removed? I know one way would be to add the header back afterwards, but is there a better way?
My code:
df = pd.read_csv("_data.csv", skiprows=[0], header=None)
df = df.groupby([2])[10].sum().astype(float)
Data:
A B
1 2
1 1
2 3
2 4
I have data like above trying to get this result:
A B
1 3
2 7

Try to use the function reset_index after the sum:
data = [{'a': 1, 'b': 2},{'a': 1, 'b': 1},{'a': 2, 'b': 3},{'a': 2, 'b': 4}]
df = pd.DataFrame(data)
df
a b
0 1 2
1 1 1
2 2 3
3 2 4
df.groupby('a').sum().reset_index()
a b
0 1 3
1 2 7

You should specify the separator (one or more whitespace characters in your case) and that the header is the first row (row 0, with Python indexing), then group by the column you want.
df = pd.read_csv("_data.csv", sep=r'\s+', header=0)
A B
0 1 2
1 1 1
2 2 3
3 2 4
df = df.groupby(['A']).sum()
B
A
1 3
2 7
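To keep both headers as ordinary columns without a separate reset_index step, groupby also accepts as_index=False. The sketch below reads the sample data from an in-memory string instead of "_data.csv", so it runs on its own; the variable names are illustrative.

```python
import io

import pandas as pd

# In-memory stand-in for the whitespace-separated file from the question
raw = io.StringIO("A  B\n1  2\n1  1\n2  3\n2  4\n")

# header=0 keeps the first row as column names
df = pd.read_csv(raw, sep=r"\s+", header=0)

# as_index=False leaves the grouping key "A" as a regular column
result = df.groupby("A", as_index=False)["B"].sum()
print(result)
```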

Pandas dataframe merge by function on column names

Say I have two dataframes.
df_A has columns A__a, B__b, C (shape 5,3).
df_B has columns A_a, B_b, D (shape 4,3).
How can I unify them (without having to iterate over all columns) to get one df with columns A, B (shape 9,2)? That is, A__a and A_a should be unified into the same column.
I need to merge them while applying the function lambda x: x.replace("_",""). Is that possible?

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,5,size=(5, 3)), columns=['A__a', 'B__b', 'C'])
df:
A__a B__b C
0 3 0 2
1 0 3 4
2 0 4 4
3 4 2 1
4 3 4 3
df2:
df2 = pd.DataFrame(np.random.randint(0,4,size=(4, 3)), columns=['A__a', 'B__b', 'D'])
A__a B__b D
0 3 2 0
1 3 1 1
2 0 2 0
3 3 2 0
df3 = pd.concat([df, df2], join='inner', ignore_index=True)
df_final = df3.rename(lambda x: str(x).split("__")[0],axis='columns')
df_final
df_final:
A B
0 3 0
1 0 3
2 0 4
3 4 2
4 3 4
5 3 2
6 3 1
7 0 2
8 3 2

A simple concatenation will do, once the columns share the names A and B:
pd.concat([df_A, df_B], join='outer')[['A', 'B']].copy()
or
pd.concat([df_A, df_B], join='inner')

You have to merge Dataframe using 'outer'
import pandas as pd
import numpy as np
df_A = pd.DataFrame(np.random.randint(10,size=(5,3)), columns=['A','B','C'])
df_B = pd.DataFrame(np.random.randint(10,size=(4,3)), columns=['A','B','D'])
print(df_A.shape,df_B.shape)
#(5, 3) (4, 3)
new_df = df_A.merge(df_B , how= 'outer', on = ['A','B'])[['A','B']]
print(new_df.shape)
#(9,2)

If you can't change the names of the columns in advance and you want to use lambda x: x.replace("_",""), this is one way (rename accepts a function for the column labels; recent pandas versions no longer allow rename_axis for this):
df = pd.concat([df1.rename(lambda x: str(x).replace("_",""), axis='columns'), df2.rename(lambda x: str(x).replace("_",""), axis='columns')], join='inner', ignore_index=True)
Example:
d1 = {'A__a' : ('A', 'B', 'C', 'D', 'E') , 'B__b' : ('a', 'b', 'c', 'd', 'e') ,'C': (1,2,3,4,5)}
df1 = pd.DataFrame(d1)
A__a B__b C
0 A a 1
1 B b 2
2 C c 3
3 D d 4
4 E e 5
d2 = {'A_a' : ('B', 'C', 'D','G') , 'B_b' : ('l','m','n','o') ,'D': (6,7,8,9)}
df2=pd.DataFrame(d2)
A_a B_b D
0 B l 6
1 C m 7
2 D n 8
3 G o 9
Output:
Aa Bb
0 A a
1 B b
2 C c
3 D d
4 E e
5 B l
6 C m
7 D n
8 G o
Alternatively, rename the columns explicitly:
df = pd.concat([df1.rename(columns={'A__a':'A', 'B__b':'B'}), df2.rename(columns={'A_a':'A', 'B_b':'B'})], join='inner', ignore_index=True)
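If the goal is the column names A and B from the question rather than Aa and Bb, the mapper can keep only the part before the first underscore. This is a sketch; the helper base_name is hypothetical and assumes that this prefix identifies the column.

```python
import pandas as pd

df1 = pd.DataFrame({"A__a": list("ABCDE"), "B__b": list("abcde"), "C": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"A_a": list("BCDG"), "B_b": list("lmno"), "D": [6, 7, 8, 9]})


def base_name(col):
    """Hypothetical mapper: 'A__a' and 'A_a' both become 'A'."""
    return str(col).split("_")[0]


unified = pd.concat(
    [df1.rename(columns=base_name), df2.rename(columns=base_name)],
    join="inner",        # keep only the columns present in both frames
    ignore_index=True,   # renumber the rows 0..8
)
print(unified.shape)
```

With join='inner', the columns C and D drop out because each exists in only one of the two frames.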
