I've joined two dataframes using an inner join, creating multi-indexed column headers in the process. I've also set the 'ID' column as the index; however, the 'ID' label now sits a row below the other column headers, leaving a blank row of data under the headers. How can I realign this 'ID' label with the column headers?
Here is my code:
result.set_index('ID', inplace=True)
result.sort_values(by=('ID'), inplace=True)
columns = [('c', 'c'), ('X', 'd'), ('X', 'e'), ('X', 'f'), ('g', 'g'), ('Y', 'h'), ('Y', 'i')]
result.columns = pd.MultiIndex.from_tuples(columns)
result.to_excel("data.xlsx", index=True)
This writes the following to my Excel file (notice the 'ID' header is a row below; I've also removed the data for privacy reasons):
[screenshot of the Excel output, with the 'ID' label one row below the other column headers]
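For what it's worth, one possible workaround (not from the original post) is to let pandas write the file as-is and then move the stray 'ID' label up with openpyxl. This is a minimal sketch; the row numbers assume exactly two column-header rows, so adjust them if your layout differs:

from openpyxl import load_workbook

# Assumption: with a two-level column header, pandas writes the headers on
# rows 1-2 and puts the index name ('ID') alone on row 3, leaving that row
# otherwise blank.  Move the label into the last header row and drop row 3.
wb = load_workbook("data.xlsx")
ws = wb.active
ws["A2"] = ws["A3"].value   # put 'ID' alongside the second header row
ws.delete_rows(3)           # remove the now-empty row
wb.save("data.xlsx")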
I have a pandas dataframe which I want to transpose like so, relabeling the first 2 columns "title" and "heading" while the remainder are labeled starting from 0:
   title                          heading                      0                     1                       2                      3
0  Residential Zones              Purpose                      0.021688304841518402  0.034876345             0.01611880026757717    -0.014965585432946682
1  Purposes of Residential Zones  RR (Residential Ranchette)   0.00977486465126276   -0.0021642891224473715  -0.008860375732183456  0.01690787263214588
I tried:
df.T
df.rename(columns = {0:'title', 1: 'heading'}, inplace=True)
But while this renames the first two columns, I still need the remaining column labels to start from 0 as above. There are loads of columns, so a for loop would be totally inefficient. How can I do this while transposing the pandas dataframe? This originated from a CSV file, by the way.
Your current dataset does not contain the appropriate column labels, so once it has been transposed you must relabel each column. However, you can simply pop the last label and inject the new 'Heading' label without ever using a for loop.
from pandas import DataFrame
import numpy as np

data = np.array([['', 'RZ', 'POTRZ'],
                 ['0', 'Purpose', 'RR'],
                 ['1', 0.1, 0.2],
                 ['2', 0.3, 0.4]])
df = DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:])

# Transpose the data set
df = df.T

# Take the current index ('RZ', 'POTRZ'), then shift & replace
col_index = df.columns[:-1]
shifted_col = df.index
df.index = col_index
df.insert(0, '', shifted_col)

# Pop the last label & inject the new 'Heading' column label, then rename
df.rename(columns=dict(zip(df.columns, ['Title', 'Heading'] + list(df.columns[1:-1].values))), inplace=True)
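If you run the snippet above and print df, the result should look roughly like this (values come from the toy data in the example):

print(df)
#    Title  Heading    0    1
# 0     RZ  Purpose  0.1  0.3
# 1  POTRZ       RR  0.2  0.4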
Suppose my data frame df has the following column names: ['date', 'value', '20211010', '20211017', '20211024', ...]
I want to rename the columns '20211010', '20211017', '20211024', ... (that is, all the columns starting from 20211010) to t1, t2, t3, ..., continuing to increase.
The expected new column names will be ['date', 'value', 't1', 't2', 't3', ...].
How can I achieve this in pandas? Thanks.
Reference:
how do i rename the columns from unnamed:0 to columns with increment number in python
IIUC, a robust method could be to use pandas.to_datetime and pandas.factorize:
idx, _ = pd.factorize(pd.to_datetime(df.columns, format='%Y%m%d',
                                     errors='coerce'),
                      sort=True)
df.columns = ('t'+pd.Series(idx+1, dtype='str')).mask(idx<0, df.columns)
Example output:
Index(['date', 'value', 't1', 't2', 't4', 'other', 't3'], dtype='object')
Input columns:
Index(['date', 'value', '20211010', '20211017', '20211024', 'other',
       '20211018'],
      dtype='object')
Robustness
to_datetime ensures that only valid dates are used, and sort=True in factorize keeps the tn numbering in date order.
Example on this input:
['X', '20211010', '20229999', '20211018', '20211024', 'Y', '20211001']
The output would be:
['X', 't2', '20229999', 't3', 't4', 'Y', 't1']
The invalid date is ignored and the tn are in order.
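To make this reproducible, here is a minimal, self-contained run of the same approach; the one-row frame is just a stand-in for the real data:

import numpy as np
import pandas as pd

# Stand-in frame whose columns match the example input above
df = pd.DataFrame(np.zeros((1, 7)),
                  columns=['X', '20211010', '20229999', '20211018',
                           '20211024', 'Y', '20211001'])

idx, _ = pd.factorize(pd.to_datetime(df.columns, format='%Y%m%d',
                                     errors='coerce'),
                      sort=True)
df.columns = ('t' + pd.Series(idx + 1, dtype='str')).mask(idx < 0, df.columns)

print(list(df.columns))
# ['X', 't2', '20229999', 't3', 't4', 'Y', 't1']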
If the first two columns will always be skipped and you just want to rename the rest, a simple way would be:
for i, col in enumerate(df.columns[2:]):
    df.rename(columns={col: f't{i+1}'}, inplace=True)
But this does not consider the name of any column, so if you happen to have a column in the middle that doesn't need to be renamed, it will be renamed anyway.
If you need it to be bulletproof, I would go with the answer from mozway.
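If you prefer, the same loop can be collapsed into a single rename built with a dict comprehension (my variant, with the same assumption that everything after the first two columns gets renamed):

df = df.rename(columns={col: f't{i+1}' for i, col in enumerate(df.columns[2:])})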
I have the following dataframe df with 3 rows, where the 3rd row consists of all empty strings. I am trying to drop all the rows in which every column is empty, but somehow the rows are not getting dropped. Below is my snippet.
import pandas as pd
d = {'col1': [1, 2, ''], 'col2': [3, 4, '']}
df = pd.DataFrame(data=d)
df = df.dropna(how='all')
Please suggest where I am going wrong.
You don't have NaN values. You have '', which is not NaN. So:
df[df.ne('').any(axis=1)]
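An equivalent route, if you prefer to keep dropna, is to turn the empty strings into real NaN first (my example, not part of the original answer):

import numpy as np

# '' becomes NaN, so dropna(how='all') now removes the all-empty rows
df = df.replace('', np.nan).dropna(how='all')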
I have 2 DataFrames, df0 and df1, with df1.shape[0] > df0.shape[0].
df0 and df1 have the exact same columns.
Most of the rows of df0 are in df1.
The indices of df0 and df1 are
df0.index = range(df0.shape[0])
df1.index = range(df1.shape[0])
I then created dft
dft = pd.concat([df0, df1], axis=0, sort=False)
and removed duplicated rows with
dft.drop_duplicates(subset='this_col_is_not_index', keep='first', inplace=True)
I have some duplicates in the index of dft. For example:
dft.loc[3].shape
returns
(2, 38)
My aim is to change the index of the second returned row so that index 3 becomes unique again.
This second row should instead be indexed dft.index.sort_values()[-1]+1.
I would like to apply this operation to all duplicates.
References :
Python Pandas: Get index of rows which column matches certain value
Pandas: Get duplicated indexes
Redefining the Index in a Pandas DataFrame object
Add the parameter ignore_index=True to concat to avoid duplicated index values:
dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
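A small toy illustration of the effect (made-up frames, not the asker's data):

import pandas as pd

df0 = pd.DataFrame({'a': [1, 2]})
df1 = pd.DataFrame({'a': [2, 3, 4]})

dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
print(dft.index)  # RangeIndex(start=0, stop=5, step=1)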
Use reset_index(drop=True), remembering to assign the result back:
dft = dft.reset_index(drop=True)
I have a dataframe with three columns containing 220 datapoints. Now I need to make one column the key and the other column the value and remove the third column. How do I do that?
I have created the dataframe by scraping Wikipedia in order to create a keyword search. Now I need to create an index of the terms contained, for which dictionaries are the most effective. How do I create a dictionary out of a dataframe where one column is the key for another column?
I have used a sample dataframe with 3 columns and 3 rows, as you have not provided the actual data. You can replace it with your data and column names.
I have used a for loop with iterrows() to loop over each row.
Code:
import pandas as pd

df = pd.DataFrame(
    {'Alphabet': ['A', 'B', 'C'],
     'Number': [1, 2, 3],
     'To_Remove': [10, 15, 8]})

sample_dictionary = {}
for index, row in df.iterrows():
    sample_dictionary[row['Alphabet']] = row['Number']

print(sample_dictionary)
Output:
{'A': 1, 'B': 2, 'C': 3}
You can use the pandas function
pd.DataFrame.to_dict
Documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html
Example
import pandas as pd

# Original dataframe
df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [0.5, 0.75, 1.0],
                   'col3': [0.1, 0.9, 1.9]},
                  index=['a', 'b', 'c'])

# To dictionary: {column -> {index -> value}}
dictionary = df.to_dict()
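And since the question asks for one column as the key and another as the value (dropping the third), here is a short sketch of the same idea applied to that shape, using the toy frame above:

# Keys come from 'col1', values from 'col2'; 'col3' is simply not selected
key_value = df.set_index('col1')['col2'].to_dict()
print(key_value)  # {1: 0.5, 2: 0.75, 3: 1.0}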