I want to iterate over all the index values of my first dataframe.
If an index value also exists in the index of the second dataframe, I want to return that row.
I see that df1.loc[2] returns the data in the row where the index is 2.
How can I iterate over all of the indexes in both dataframes?
You can use .join with how='inner' to get the rows whose index appears in both dataframes, without looping.
In [1]: import pandas as pd
   ...: a = pd.DataFrame({'a': [1, 3]}, index=[1, 2])
   ...:
   ...: b = pd.DataFrame({'b': [3, 4]}, index=[2, 5])
   ...: a.join(b, how='inner')
Out[1]:
   a  b
2  3  3
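If you only need the matching rows of the first dataframe, without the columns of the second, filtering on the index is an alternative to the join. A minimal sketch, reusing the frames a and b from the example:

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 3]}, index=[1, 2])
b = pd.DataFrame({'b': [3, 4]}, index=[2, 5])

# Boolean mask: True where a's index label also appears in b's index
rows_in_both = a[a.index.isin(b.index)]

# The shared labels themselves, as an explicit intersection
common = a.index.intersection(b.index)

print(rows_in_both)
print(common.tolist())
```

Either form avoids an explicit loop over the index.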
Related
I am trying to scan a column in a df and keep only the values made up of the digits 0-9. I want to exclude or flag the columns in this dataframe that contain alphanumeric values.
df_analysis[df_analysis['unique_values'].astype(str).str.contains(r'^[0-9]*$', na=True)]
import pandas as pd
df = pd.DataFrame({"string": ["asdf", "lj;k", "qwer"], "numbers": [6, 4, 5], "more_numbers": [1, 2, 3], "mixed": ["wef", 8, 9]})
print(df.select_dtypes(include=["int64"]).columns.to_list())
print(df.select_dtypes(include=["object"]).columns.to_list())
Create a dataframe with multiple columns. Use .select_dtypes to find the columns that are integers and return them as a list. You can add "float64" or any other numeric type to the include list.
Output:
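Note that select_dtypes looks only at the column dtype, so a column that mixes text and numbers is reported as object. If the check should be on the values themselves, one option is to test every value against a digits-only pattern. This is a sketch on the same kind of sample data; the names digits_only, numeric_columns and flagged_columns are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "string": ["asdf", "lj;k", "qwer"],
    "numbers": [6, 4, 5],
    "mixed": ["wef", 8, 9],
})

# For each column: True only when every value, rendered as text,
# consists solely of the digits 0-9.
digits_only = df.apply(
    lambda col: col.astype(str).str.fullmatch(r"[0-9]+").all()
).astype(bool)

numeric_columns = digits_only[digits_only].index.tolist()
flagged_columns = digits_only[~digits_only].index.tolist()

print(numeric_columns)
print(flagged_columns)
```

Negative numbers and decimals would be flagged by this pattern, so it would need widening if those count as numeric.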
I can put multiple dataframes into one xlsx sheet, and I can put multiple dataframes on separate sheets (tabs), as clearly explained in "Putting many python pandas dataframes to one excel worksheet". However, I cannot figure out how to do both in one go, without generating the files first and then combining them.
I have 4 dataframes and I would like to have 2 of them in one sheet and the other 2 in another sheet.
df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df1a = pd.DataFrame({'col1a': [1, 2], 'col2a': [3, 4]})
df2 = pd.DataFrame({'col3': [1, 2], 'col4': [3, 4]})
df2a = pd.DataFrame({'col3a': [1, 2], 'col4a': [3, 4]})
dic = {'sheet_1': [df1, df1a],
       'sheet_2': [df2, df2a]}
def multiple_dfs(df_list, sheets):
    writer = pd.ExcelWriter('test1.xlsx', engine='xlsxwriter')
    row = 0
    for dataframe in df_list:
        dataframe.to_excel(writer,sheet_name=sheets)
        row = row + len(dataframe.index) + 4
    writer.save()
for k, v in dic.items():
    multiple_dfs(v, k)
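In the snippet above a new writer is created on every call, so each sheet overwrites the previous file, and row is computed but never passed to to_excel. A possible fix, sketched here under the assumption that an Excel engine such as openpyxl or xlsxwriter is installed, is to open one writer for the whole workbook and pass startrow. The helper names start_rows and write_sheets are hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df1a = pd.DataFrame({'col1a': [1, 2], 'col2a': [3, 4]})
df2 = pd.DataFrame({'col3': [1, 2], 'col4': [3, 4]})
df2a = pd.DataFrame({'col3a': [1, 2], 'col4a': [3, 4]})

dic = {'sheet_1': [df1, df1a],
       'sheet_2': [df2, df2a]}


def start_rows(frames, gap=3):
    """Return the first row of each frame when they are stacked vertically."""
    rows, row = [], 0
    for frame in frames:
        rows.append(row)
        # data rows + 1 header row + blank rows between frames
        row += len(frame.index) + 1 + gap
    return rows


def write_sheets(path, sheets, gap=3):
    """Write {sheet_name: [DataFrame, ...]} into a single workbook."""
    # One writer for the whole file; the with-block saves it on exit.
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frames in sheets.items():
            for frame, row in zip(frames, start_rows(frames, gap)):
                frame.to_excel(writer, sheet_name=sheet_name, startrow=row)


# Usage (writes the file to disk):
# write_sheets('test1.xlsx', dic)
```

Using the writer as a context manager also avoids calling writer.save(), which newer pandas versions no longer provide.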
I have the following dataframe df with 3 rows, where the 3rd row consists entirely of empty strings. I am trying to drop all rows in which every column is empty, but the rows are not getting dropped. Below is my snippet.
import pandas as pd
d = {'col1': [1, 2, ''], 'col2': [3, 4, '']}
df = pd.DataFrame(data=d)
df = df.dropna(how='all')
What am I doing wrong?
You don't have NaN values. You have '', which is not NaN, so dropna finds nothing to drop. Instead, keep the rows where at least one column is not '':
df[df.ne('').any(axis=1)]
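Another route, if you would rather keep using dropna, is to convert the empty strings to NaN first. A minimal sketch on the same data (the name cleaned is just for illustration):

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, ''], 'col2': [3, 4, '']}
df = pd.DataFrame(data=d)

# Turn '' into a real missing value, then drop rows that are entirely missing
cleaned = df.replace('', np.nan).dropna(how='all')
print(cleaned)
```

This also treats rows that already contain only NaN as empty, which the comparison against '' alone would not.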
I want to add a list of unique values to a DataFrame column. Here is the code:
IDs = set(Remedy['Ticket ID'])
log['ID Incidencias'] = IDs
But I obtain the following error:
ValueError: Length of values does not match length of index
Any idea how I could add a list of unique values to an existing DataFrame column?
Thanks
Not sure if this is what you really need, but to add a list or set of values to each row of an existing dataframe column you can use:
log['ID Incidencias'] = [IDs] * len(log)
Example:
df = pd.DataFrame({'col1': list('abc')})
IDs = set((1,2,3,4))
df['col2'] = [IDs] * len(df)
print(df)
# col1 col2
#0 a {1, 2, 3, 4}
#1 b {1, 2, 3, 4}
#2 c {1, 2, 3, 4}
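If the aim is instead one unique value per row, rather than the whole set in every row, one option is to wrap the values in a Series. This is a sketch on made-up data: assignment aligns on the index, so rows beyond the number of unique values get NaN, and surplus values are dropped if the dataframe is shorter.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abcde')})
ids = {1, 2, 3}

# A Series is aligned on the index, so no length error is raised;
# rows 3 and 4 have no matching label and become NaN.
df['col2'] = pd.Series(sorted(ids))
print(df)
```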
I have a dataframe with three columns containing 220 datapoints. Now I need to make one column the key and the other column the value and remove the third column. How do I do that?
I created the dataframe by scraping Wikipedia in order to build a keyword search. Now I need to create an index of the terms it contains, for which dictionaries are the most effective. How do I create a dictionary out of a dataframe where one column is the key for another column?
I have used a sample dataframe with 3 columns and 3 rows, as you have not provided the actual data. You can replace it with your data and column names.
I have used a for loop with iterrows() to loop over each row.
Code:
import pandas as pd
df = pd.DataFrame(
    {'Alphabet': ['A', 'B', 'C'],
     'Number': [1, 2, 3],
     'To_Remove': [10, 15, 8]})
sample_dictionary = {}
for index, row in df.iterrows():
    sample_dictionary[row['Alphabet']] = row['Number']
print(sample_dictionary)
Output:
{'A': 1, 'B': 2, 'C': 3}
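For larger frames the row loop can be replaced by building the dictionary directly from the two columns. A short sketch on the same sample data; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {'Alphabet': ['A', 'B', 'C'],
     'Number': [1, 2, 3],
     'To_Remove': [10, 15, 8]})

# Pair the key column with the value column; the third column is simply ignored
by_zip = dict(zip(df['Alphabet'], df['Number']))

# Equivalent: make the key column the index, then convert the value column
by_index = df.set_index('Alphabet')['Number'].to_dict()

print(by_zip)
```

If the key column contains duplicates, later rows overwrite earlier ones in both variants.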
You can use the pandas method
pd.DataFrame.to_dict
Documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html
Example
import pandas as pd
# Original dataframe
df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [0.5, 0.75, 1.0],
                   'col3': [0.1, 0.9, 1.9]},
                  index=['a', 'b', 'c'])
# To dictionary (to_dict takes no dataframe argument)
dictionary = df.to_dict()
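The shape of the result depends on how to_dict is called. The sketch below, on the same sample frame, shows the default column-oriented form, the row-oriented form with orient='index', and a single column converted on its own; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [0.5, 0.75, 1.0],
                   'col3': [0.1, 0.9, 1.9]},
                  index=['a', 'b', 'c'])

# Default: {column -> {index -> value}}
by_column = df.to_dict()

# orient='index': {index -> {column -> value}}
by_row = df.to_dict(orient='index')

# A single column: {index -> value}
col1_only = df['col1'].to_dict()

print(col1_only)
```

For a key/value mapping between two columns, the single-column form after set_index on the key column is usually the one you want.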