I am trying to scan a column in a df and check that it only contains values made up of the digits 0-9. I want to exclude or flag columns in this dataframe that contain alphanumeric values.
df_analysis[df_analysis['unique_values'].astype(str).str.contains(r'^[0-9]*$', na=True)]
import pandas as pd
df = pd.DataFrame({"string": ["asdf", "lj;k", "qwer"], "numbers": [6, 4, 5], "more_numbers": [1, 2, 3], "mixed": ["wef", 8, 9]})
print(df.select_dtypes(include=["int64"]).columns.to_list())
print(df.select_dtypes(include=["object"]).columns.to_list())
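Create a dataframe with multiple columns. Use .select_dtypes to find the integer columns and return their names as a list. You can add "float64" or any other numeric type to the include list. The "mixed" column holds both strings and integers, so pandas stores it as object and it is not reported as an integer column.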
Output:
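The dtype check only looks at how pandas stores each column, not at the individual values. If the goal is to test the values themselves against the 0-9 pattern from the question, a minimal sketch could look like the following. The helper name digits_only and the two result lists are made up for illustration, and the sketch assumes that converting every value to a string first is acceptable.

```python
import pandas as pd

df = pd.DataFrame({"string": ["asdf", "lj;k", "qwer"],
                   "numbers": [6, 4, 5],
                   "more_numbers": [1, 2, 3],
                   "mixed": ["wef", 8, 9]})


def digits_only(column):
    # Hypothetical helper: True when every value, viewed as a string,
    # consists of one or more of the characters 0-9 and nothing else.
    return bool(column.astype(str).str.fullmatch(r"[0-9]+").all())


digit_columns = [name for name in df.columns if digits_only(df[name])]
flagged_columns = [name for name in df.columns if name not in digit_columns]

print(digit_columns)    # ['numbers', 'more_numbers']
print(flagged_columns)  # ['string', 'mixed']
```

Note that a missing value becomes the string "nan" after astype(str), so a column with gaps would be flagged under this approach.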
Related
I have a dataset catalog with 3 columns: product id, brand name and product class.
import pandas as pd
catalog = {'product_id': [1, 2, 3, 1, 2, 4, 3, 5, 6],
'brand_name': ['FW', 'GW', 'FK','FW','GW','WU','FK','MU', 'AS'],
'product_class': ['ACCESSORIES', 'DRINK', 'FOOD', 'ACCESSORIES', 'DRINK', 'FURNITURE','FOOD', 'ELECTRONICS', 'APPAREL']}
df = pd.DataFrame(data=catalog)
Assume I have a list of product ids, prod = [1,3,4]. Now, with Python, I want to list all the brand names corresponding to this list prod based on the product_id. How can I do this using only the groupby() and get_group() functions? I can do this using pd.DataFrame() combined with the zip() function, but that is too inefficient, as I would need to obtain each column individually.
Expected output (in dataframe)
Product_id Brand_name
1 'FW'
3 'FK'
4 'WU'
Can anyone give some help on this?
You can use pandas functions isin() and drop_duplicates() to achieve this:
prod = [1,3,4]
print(df[df.product_id.isin(prod)][["product_id", "brand_name"]].drop_duplicates())
Output:
product_id brand_name
0 1 FW
2 3 FK
5 4 WU
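Since the question asks specifically about groupby() and get_group(), here is one possible sketch along those lines. It assumes every id in prod actually occurs in product_id (get_group raises a KeyError otherwise) and that the first row of each group is enough, because the brand is the same for every row with a given id in this data.

```python
import pandas as pd

catalog = {'product_id': [1, 2, 3, 1, 2, 4, 3, 5, 6],
           'brand_name': ['FW', 'GW', 'FK', 'FW', 'GW', 'WU', 'FK', 'MU', 'AS'],
           'product_class': ['ACCESSORIES', 'DRINK', 'FOOD', 'ACCESSORIES', 'DRINK',
                             'FURNITURE', 'FOOD', 'ELECTRONICS', 'APPAREL']}
df = pd.DataFrame(data=catalog)
prod = [1, 3, 4]

grouped = df.groupby('product_id')

# Take the first row of each requested group and stack the pieces together.
result = pd.concat(
    [grouped.get_group(p)[['product_id', 'brand_name']].head(1) for p in prod]
)
print(result)
```

This builds one small frame per id, so for a long prod list the isin() version is likely the cheaper choice.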
I have the following dataframe df with 3 rows, where the 3rd row consists entirely of empty strings. I am trying to drop all the rows in which every column is empty, but the rows are not getting dropped. Below is my snippet.
import pandas as pd
d = {'col1': [1, 2, ''], 'col2': [3, 4, '']}
df = pd.DataFrame(data=d)
df = df.dropna(how='all')
Where am I going wrong?
You don't have NaN values. You have '', which is not NaN. So:
df[df.ne('').any(axis=1)]
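As a runnable sketch of that point, using the frame from the question: the first option filters on the empty strings directly, and the second converts them to NaN so that dropna() can act on them. The variable names kept_by_mask and kept_by_dropna are only illustrative.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, ''], 'col2': [3, 4, '']}
df = pd.DataFrame(data=d)

# Option 1: keep rows where at least one column is not an empty string.
kept_by_mask = df[df.ne('').any(axis=1)]

# Option 2: turn empty strings into NaN first, then drop all-NaN rows.
kept_by_dropna = df.replace('', np.nan).dropna(how='all')

print(kept_by_mask)
print(kept_by_dropna)
```

Both options keep the rows with index 0 and 1. The resulting column dtypes can differ between the two and between pandas versions.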
I want to add a list of unique values to a DataFrame column. Here is the code:
IDs = set(Remedy['Ticket ID'])
log['ID Incidencias'] = IDs
But I obtain the following error:
ValueError: Length of values does not match length of index
Any idea how I could add a list of unique values to an existing DataFrame column?
Thanks
Not sure if this is what you really need, but to add a list or set of values to each row of an existing dataframe column you can use:
log['ID Incidencias'] = [IDs] * len(log)
Example:
df = pd.DataFrame({'col1': list('abc')})
IDs = set((1,2,3,4))
df['col2'] = [IDs] * len(df)
print(df)
# col1 col2
#0 a {1, 2, 3, 4}
#1 b {1, 2, 3, 4}
#2 c {1, 2, 3, 4}
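If the intent is instead to place one unique value per row, a possible sketch is to wrap the values in a pd.Series, because pandas aligns a Series on the index during assignment instead of requiring equal lengths. This is only one reading of the question, and it assumes the frame has the default 0, 1, 2, ... index. The name unique_ids is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abc')})
unique_ids = sorted({20, 10})  # a set has no order, so sort it explicitly

# The Series has index 0 and 1; row 2 has no matching label and becomes NaN.
df['col2'] = pd.Series(unique_ids)
print(df)
#   col1  col2
# 0    a  10.0
# 1    b  20.0
# 2    c   NaN
```

If there are more unique values than rows, the extra values are silently dropped by the same alignment.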
I want to iterate over all the index values of my first dataframe.
If an index value also exists in the index of the second dataframe, I want to return that row.
I see that df1.loc[2] returns the data in the row where the index is 2.
How can I iterate over all of the indexes in both dataframes?
You can use .join between the dataframes to get the rows that share the same index values.
In [1]: import pandas as pd
...: a = pd.DataFrame({'a': [1, 3]}, index=[1, 2])
...:
...: b = pd.DataFrame({'b': [3, 4]}, index=[2, 5])
...: a.join(b, how='inner')
Out[1]:
a b
2 3 3
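If only the rows of the first dataframe are wanted, without the columns of the second, a minimal sketch is to compare the two indexes directly. It reuses the frames a and b from the example above. The names shared, rows_from_a and rows_via_mask are illustrative.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 3]}, index=[1, 2])
b = pd.DataFrame({'b': [3, 4]}, index=[2, 5])

# Index labels present in both frames.
shared = a.index.intersection(b.index)
rows_from_a = a.loc[shared]

# Equivalent boolean-mask form.
rows_via_mask = a[a.index.isin(b.index)]

print(rows_from_a)
#    a
# 2  3
```

Neither form needs an explicit Python loop over the index.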
I have a dataframe with three columns containing 220 data points. Now I need to make one column the key and another column the value, and remove the third column. How do I do that?
I created the dataframe by scraping Wikipedia in order to build a keyword search. Now I need to create an index of the terms it contains, for which dictionaries are the most effective. How do I create a dictionary out of a dataframe where one column is the key for another column?
I have used a sample dataframe with 3 columns and 3 rows, as you have not provided the actual data. You can replace it with your own data and column names.
I have used a for loop with iterrows() to loop over each row.
Code:
import pandas as pd
df = pd.DataFrame(
    {'Alphabet': ['A', 'B', 'C'],
     'Number': [1, 2, 3],
     'To_Remove': [10, 15, 8]})
sample_dictionary = {}
for index, row in df.iterrows():
    sample_dictionary[row['Alphabet']] = row['Number']
print(sample_dictionary)
Output:
{'A': 1, 'B': 2, 'C': 3}
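The same mapping can also be built without an explicit loop. The sketch below shows two common ways, using the sample column names from above. It assumes the key column has no duplicates (with duplicates, the last value wins in both forms).

```python
import pandas as pd

df = pd.DataFrame(
    {'Alphabet': ['A', 'B', 'C'],
     'Number': [1, 2, 3],
     'To_Remove': [10, 15, 8]})

# Pair the two columns up element by element.
from_zip = dict(zip(df['Alphabet'], df['Number']))

# Or make the key column the index and convert the value column.
from_series = df.set_index('Alphabet')['Number'].to_dict()

print(from_zip)     # {'A': 1, 'B': 2, 'C': 3}
print(from_series)  # {'A': 1, 'B': 2, 'C': 3}
```

The third column is simply never selected, so there is no need to drop it first.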
You can use the pandas method
pd.DataFrame.to_dict
Documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html
Example
import pandas as pd
# Original dataframe
df = pd.DataFrame({'col1': [1, 2, 3],
'col2': [0.5, 0.75, 1.0],
'col3':[0.1, 0.9, 1.9]},
index=['a', 'b', 'c'])
# To dictionary (the default orient maps column -> {index -> value})
dictionary = df.to_dict()
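The shape of the result depends on the orient argument of to_dict. A short sketch of three of the documented orientations follows, on a two-column version of the frame above. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [0.5, 0.75, 1.0]},
                  index=['a', 'b', 'c'])

# Default ('dict'): column -> {index label -> value}
by_column = df.to_dict()

# 'index': index label -> {column -> value}
by_row = df.to_dict(orient='index')

# 'list': column -> list of values
as_lists = df.to_dict(orient='list')

print(by_column['col1'])  # {'a': 1, 'b': 2, 'c': 3}
print(by_row['a'])
print(as_lists)           # {'col1': [1, 2, 3], 'col2': [0.5, 0.75, 1.0]}
```

For a single key column and a single value column, selecting the value column after set_index on the key column and calling to_dict() on that Series gives a flat key -> value dictionary.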