Is there a way in pandas to give the same column of a pandas dataframe two names, so that I can index the column by only one of the two names? Here is a quick example illustrating my problem:
import pandas as pd
index=['a','b','c','d']
# The list of tuples here is really just to
# somehow visualize my problem below:
columns = [('A','B'), ('C','D'),('E','F')]
df = pd.DataFrame(index=index, columns=columns)
# I can index like that:
df[('A','B')]
# But I would like to be able to index like this:
df[('A',*)] #error
df[(*,'B')] #error
You can create a multi-index column:
df.columns = pd.MultiIndex.from_tuples(df.columns)
Then you can do:
df.loc[:, ("A", slice(None))]
Or: df.loc[:, (slice(None), "B")]
Here slice(None) selects every label at that level, so (slice(None), "B") selects the columns whose second level is B regardless of the first-level name. It means the same as a bare : in ordinary slicing. The second case can also be written with pandas' IndexSlice helper: df.loc[:, pd.IndexSlice[:, "B"]].
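As a quick check of the selections described above, here is a minimal sketch that rebuilds the question's frame with MultiIndex columns. The fill value 0 is a placeholder, and the xs call is an extra alternative not mentioned in the answer.

```python
import pandas as pd

# Rebuild the question's frame, this time with real MultiIndex columns.
columns = pd.MultiIndex.from_tuples([("A", "B"), ("C", "D"), ("E", "F")])
df = pd.DataFrame(0, index=["a", "b", "c", "d"], columns=columns)

# Select by the first level only: every column whose level 0 is "A".
by_first = df.loc[:, ("A", slice(None))]

# Select by the second level only, using the IndexSlice helper.
by_second = df.loc[:, pd.IndexSlice[:, "B"]]

# Alternative: cross-section on level 1, keeping both levels in the result.
by_second_xs = df.xs("B", axis=1, level=1, drop_level=False)

print(by_first.columns.tolist())
print(by_second.columns.tolist())
```

All three selections return the single column ("A", "B"), which is the "index by either name" behaviour the question asks for.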
This is how I am reading and creating the dataframe with pandas
def get_sheet_data(sheet_name='SomeName'):
    df = pd.read_excel(f'{full_q_name}',
                       sheet_name=sheet_name,
                       header=[0,1],
                       index_col=0)#.fillna(method='ffill')
    df = df.swapaxes(axis1="index", axis2="columns")
    return df.set_index('Product Code')
Printing this in tabular form gives me (this will potentially have hundreds of columns):
I can't seem to add those first two rows into the header. I've tried:
python:pandas - How to combine first two rows of pandas dataframe to dataframe header? https://stackoverflow.com/questions/59837241/combine-first-row-and-header-with-pandas
and I'm failing at each point. I think it's because of the multiindex, not necessarily the axis swap? But using https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.html is kind of going over my head right now. Please help me add those two rows into the header.
The output of df.columns is massive, so I've cut it down a lot:
Index(['Product Code','Product Narrative\nHigh-level service description','Product Name','Huawei Product ID','Type','Bill Cycle Alignment',nan,'Stackable',nan,
and ends with:
nan], dtype='object')
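We create new column names and assign them to df.columns. The new names are generated by joining the two MultiIndex header levels with the first row of the DataFrame, with NaN in that row replaced by an empty string (np is numpy, imported as import numpy as np).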
df.columns = ['_'.join(i) for i in zip(df.columns.get_level_values(0).tolist(), df.columns.get_level_values(1).tolist(), df.iloc[0,:].replace(np.nan,'').tolist())]
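To make the joining step concrete, here is a minimal sketch on a small hypothetical frame. The column labels and values are invented for illustration and are not the asker's spreadsheet. Unlike the one-liner above, this variant skips empty parts so no trailing underscore is left, and it drops the first row afterwards. Both of those choices are assumptions about what is wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a two-level header plus a first row holding a third label.
columns = pd.MultiIndex.from_tuples(
    [("Product", "Code"), ("Product", "Name"), ("Billing", "Type")]
)
df = pd.DataFrame(
    [["id", np.nan, "cycle"], [1, "x", "monthly"]],
    columns=columns,
)

# Turn the first row into strings, mapping missing values to "".
first_row = ["" if pd.isna(v) else str(v) for v in df.iloc[0]]

# Join level 0, level 1 and the first-row label, skipping empty parts.
flat = [
    "_".join(part for part in (str(top), str(sub), extra) if part)
    for top, sub, extra in zip(
        df.columns.get_level_values(0),
        df.columns.get_level_values(1),
        first_row,
    )
]

df.columns = flat
# Assumption: the first row was header material only, so remove it.
df = df.iloc[1:].reset_index(drop=True)
print(df.columns.tolist())
```

Converting every part with str() also avoids the TypeError that '_'.join raises when a header or first-row value is not a string.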
I'm merging my two dataframes below on two fields.
successes = pd.merge(failures, successes, left_on=['name', 'project_name'], right_on=['name', 'project_name'], how='left')
But I get this error - can anyone help me out please?
/usr/local/lib/python3.8/site-packages/pandas/core/reshape/merge.py:643: UserWarning: merging between different levels can give an unintended result (1 levels on the left,2 on the right)
warnings.warn(msg, UserWarning)
I think it must be written this way:
successes.merge(failures, on=['name', 'project_name'])
This happens when you merge DataFrames whose column indices have different numbers of levels.
The artificial example below reproduces your warning:
import pandas as pd
# a has 2 level column index
a = pd.DataFrame({("name_0","name_01"):[1,2,3,4],
("name_0","name_02"):[4,3,2,1]})
# b has 1 level column index
b = pd.DataFrame({"name_0":[10,2,30,40],
"name_1":[40,30,20,10]})
# Notice how left_on accepts a list of tuples. Tuples can be used to address multilevel columns
pd.merge(a,b,how="left",left_on=[("name_0","name_01")],right_on=["name_0"])
If you instead keep only level 1 of the multilevel column index in DataFrame "a", the warning disappears:
import pandas as pd
a = pd.DataFrame({("name_0","name_01"):[1,2,3,4],
("name_0","name_02"):[4,3,2,1]})
# Keep only level 1 of the column index (the inner labels "name_01" and "name_02")
a.columns = a.columns.get_level_values(1)
b = pd.DataFrame({"name_0":[10,2,30,40],
"name_1":[40,30,20,10]})
# Notice how left_on is now a normal string since only 1 level is used
pd.merge(a,b,how="left",left_on=["name_01"],right_on=["name_0"])
I suggest you check whether both of your DataFrames have the same number of column index levels. If not, consider dropping one level or flattening the columns to a single level.
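A minimal sketch of that check, reusing the artificial frames from above. The joined names such as name_0_name_01 are just one possible flattening convention, chosen here for illustration.

```python
import pandas as pd

a = pd.DataFrame({("name_0", "name_01"): [1, 2, 3, 4],
                  ("name_0", "name_02"): [4, 3, 2, 1]})
b = pd.DataFrame({"name_0": [10, 2, 30, 40],
                  "name_1": [40, 30, 20, 10]})

# nlevels reports how many levels each column index has.
levels_before = (a.columns.nlevels, b.columns.nlevels)
print(levels_before)

# Flatten the multilevel columns of "a" by joining the level labels.
if a.columns.nlevels > 1:
    a.columns = ["_".join(map(str, col)) for col in a.columns]

# Both frames now have single-level columns, so the merge is unambiguous.
merged = pd.merge(a, b, how="left",
                  left_on="name_0_name_01", right_on="name_0")
print(merged)
```

Flattening keeps the information from both levels, whereas get_level_values(1) discards the outer label. Which one is preferable depends on whether the outer label is needed to tell columns apart.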
I have a dataframe df as below.
I want the final dataframe to look as follows, i.e. for each unique Name only the last 2 rows must be present in the final output.
I tried the following snippet but it's not working.
df = df[df['Name']].tail(2)
Use GroupBy.tail:
df1 = df.groupby('Name').tail(2)
Just one more way to solve this using GroupBy.nth:
df1 = df.groupby('Name').nth([-1,-2]) ## this will pick the last 2 rows
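Since the question's data was not preserved, here is a small sketch of GroupBy.tail on a hypothetical frame. The Name and Value entries are invented for illustration.

```python
import pandas as pd

# Hypothetical data: "x" has three rows, "y" two, "z" one.
df = pd.DataFrame({
    "Name": ["x", "x", "x", "y", "y", "z"],
    "Value": [1, 2, 3, 4, 5, 6],
})

# Keep at most the last two rows of each Name group.
last_two = df.groupby("Name").tail(2)
print(last_two)
```

The rows keep their original order and index labels, and a group with fewer than two rows (here "z") is returned whole.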
I need to find the names of the columns that contain one of these words: COMPLETE, UPDATED or PARTIAL.
This is my code, which is not working.
import pandas as pd
df=pd.DataFrame({'col1': ['', 'COMPLETE',''],
'col2': ['UPDATED', '',''],
'col3': ['','PARTIAL','']},
)
print(df)
items=["COMPLETE", "UPDATED", "PARTIAL"]
if x in items:
print (df.columns)
This is the desired output:
I tried to get inspired by the question "Get column name where value is something in pandas dataframe", but I couldn't wrap my head around it.
We can combine isin, where and stack:
s=df.where(df.isin(items)).stack().reset_index(level=0,drop=True).sort_index()
s
col1 COMPLETE
col2 UPDATED
col3 PARTIAL
dtype: object
Here's one way to do it.
# check each column for any matches from the items list.
matched = df.isin(items).any(axis=0)
# produce a list of column labels with a match.
matches = list(df.columns[matched])
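For reference, here is the same approach run end to end on the frame from the question. The found dictionary is an extra step, not part of the answer above, added on the assumption that it is useful to see which word matched in each column.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["", "COMPLETE", ""],
                   "col2": ["UPDATED", "", ""],
                   "col3": ["", "PARTIAL", ""]})
items = ["COMPLETE", "UPDATED", "PARTIAL"]

# Boolean Series: True for each column containing at least one item.
matched = df.isin(items).any(axis=0)

# Column labels with a match.
matches = list(df.columns[matched])

# Extra step: map each matching column to the values that matched.
found = {col: df.loc[df[col].isin(items), col].tolist() for col in matches}

print(matches)
print(found)
```

Note that isin tests for exact equality, so a cell such as "COMPLETE (draft)" would not count as a match.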
I have the following spreadsheet that I am bringing in to pandas:
Excel Spreadsheet
I import it with:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
Jupyter shows it like this:
Panda Dataframe 1
I then transpose the dataframe with
df = df.T
Which results in this
Transposed DataFrame
At this stage, how can I change the text in the leftmost index column? I want to change the word Day to the word Service, but I am not sure how to address that cell/header. I can't refer to column 0 and change the header for that.
Likewise, how could I then go on to change the A, B, C, D text, which is now the index column?
You could first assign to the columns attribute, and then apply the transposition.
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df.columns = ['Service','AA', 'BB', 'CC', 'DD']
df = df.T
Renaming the columns before transposing would work. To do exactly what you want, you can use the rename function. The documentation also has a helpful example of how to rename the index.
Your example in full:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df = df.T
dict_rename = {'Day': 'Service'}
# rename returns a new DataFrame, so assign the result back
df = df.rename(index=dict_rename)
To extend this to more index values, you merely need to adjust the dict_rename argument before renaming.
Full sample:
import pandas as pd
df = pd.read_excel("sessions.xlsx")
df = df.T
dict_rename = {'Day': 'Service','A':'AA','B':'BB','C':'CC','D':'DD'}
# rename returns a new DataFrame, so assign the result back
df = df.rename(index=dict_rename)
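Because sessions.xlsx is not available here, the following sketch uses a small hypothetical frame in its place. The column names Day, A and B mirror the question, while the cell values are invented.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("sessions.xlsx").
df = pd.DataFrame({"Day": ["Mon", "Tue"], "A": [1, 2], "B": [3, 4]})

# After transposing, the old column names become the index labels.
df = df.T

# Map old index labels to new ones; labels not in the dict are left alone.
dict_rename = {"Day": "Service", "A": "AA", "B": "BB"}
df = df.rename(index=dict_rename)

print(df.index.tolist())
```

The same dictionary could instead be passed as columns= before transposing, which gives the same labels.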