looping over a function with multiple arguments - python-3.x

Beginning python user here. This seems like a simple question but I haven't been able to find an answer or at least recognize an answer.
I have a function as follows:
def standardize(columns, dictionary):
    for x in columns:
        df.iloc[:, x] = df.iloc[:, x].map(dictionary)
The function takes a list of columns and recodes all the values in that column according to the associated dictionary.
Rather than calling the function a dozen times for each list of columns and its associated dictionary:
standardize([15,19,27], dict1)
standardize([47,65,108], dict2)
standardize([49,53,55,90], dict3)
ideally I'd like it to loop over a list of all the column lists and a list of all the dictionaries, something like:
for column_list in [list1, list2, list3]:
    standardize(column_list, associated_dictionary)
How would I go about this?

There's no need to loop when applying a single replacement; you can use applymap instead:
col_list = [...]
df.iloc[:, col_list] = df.iloc[:, col_list].applymap(dictionary.get)
Now, for multiple sets, you can zip your column lists and dictionaries and iterate:
column_lists = [col_list1, col_list2, ...]
dictionaries = [dict1, dict2, ...]
for c, d in zip(column_lists, dictionaries):
    df.iloc[:, c] = df.iloc[:, c].applymap(d.get)
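To make the zip pattern concrete, here is a minimal, self-contained sketch on a toy DataFrame (the column positions and dictionaries are invented for illustration; note that DataFrame.applymap was renamed to DataFrame.map in pandas 2.1):
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y'], 'b': ['y', 'x'], 'c': ['x', 'x']})

column_lists = [[0, 2], [1]]                           # positional column indices
dictionaries = [{'x': 1, 'y': 2}, {'x': 10, 'y': 20}]  # one dict per column list

for c, d in zip(column_lists, dictionaries):
    df.iloc[:, c] = df.iloc[:, c].applymap(d.get)

print(df)
#    a   b  c
# 0  1  20  1
# 1  2  10  1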

Related

How can I subset rows by category with a for loop

How can I subset these lines of code with a for loop
I'm trying to subset these lines of code but I couldn't; I think it could be done with a groupby and a dictionary, but I couldn't figure it out:
df_belgium = df_sales[df_sales["Country"]=="Belgium"]
df_norway = df_sales[df_sales["Country"]=="Norway"]
df_portugal = df_sales[df_sales["Country"]=="portugal"]
The most straightforward way would be to loop through ["Belgium","Norway","portugal"], but creating variables with dynamic names like df_{country_name} is highly discouraged (see here), so I would recommend creating a dictionary that stores your subset DataFrames with the country names as keys.
You can use a dict comprehension:
df_sales_by_country = {
    country_name: df_sales[df_sales["Country"] == country_name]
    for country_name in ["Belgium", "Norway", "portugal"]
}
The ideal approach is to use groupby and store the sub-DataFrames in a dictionary:
d = dict(df.groupby('Country'))
Then access d['Belgium'] for example.
If you need to filter a subset of the countries:
# use a set for efficiency
keep = {'Belgium', 'Norway', 'Portugal'}
d = {key: g for key, g in df.groupby('Country') if key in keep}
or:
keep = ['Belgium', 'Norway', 'Portugal']
d = dict(df[df['Country'].isin(keep)].groupby('Country'))
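As a quick sanity check, here is a toy example of the groupby-to-dict idea (data invented for illustration):
import pandas as pd

df_sales = pd.DataFrame({
    'Country': ['Belgium', 'Norway', 'Belgium', 'portugal'],
    'Amount': [10, 20, 30, 40],
})

d = dict(df_sales.groupby('Country'))
print(d['Belgium'])
#    Country  Amount
# 0  Belgium      10
# 2  Belgium      30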

Assign values from list to specific values in a list of dictionaries

I have list of values:
list = [value1, value2, value3]
And a list of dictionaries where on specific keys I must set the corresponding values:
dictionaries = [{"key1":{"key2":{"key3":position_value1}}},{"key1":{"key2":{"key3":position_value2}}}]
I'm trying to assign the values while avoiding solutions that require explicit iteration over the numerical indexes of the list and dictionaries.
I found the following near-solution, which iterates over the two iterables at the same time with a for-each loop:
for (dict, value) in zip(dictionaries, list):
    dict['key1']['key2']['key3'] = value
print(dictionaries)
But it doesn't work: all the dictionaries end up holding only the last value of the list of values, giving this result:
[{"key1":{"key2":{"key3":position_value3}}},{"key1":{"key2":{"key3":position_value3}}}]
It's important to note that the list of dictionaries was created with the dict.copy() method, but maybe that doesn't take effect for the references held in nested dictionaries.
Dictionary list creation
base_dict = {"key1": {"key2":{"key3": None}}}
dictionaries = [base_dict.copy() for n in range(3)]
I appreciate any compact solution, even solutions based on unpacking.
base_dict = {"key1": {"key2":{"key3": None}}}
dictionaries = [base_dict.copy() for n in range(3)]
This will create shallow copies of base_dict. That means that while the copies are independent at the top level, their values are copied by reference; hence, the inner dictionaries {"key2":{"key3": None}} are still all the same object. When key3 is rebound, every copy is affected.
You can avoid that by making a deepcopy:
from copy import deepcopy
dictionaries = [deepcopy(base_dict) for _ in range(3)]
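A minimal demonstration of the difference, using None placeholders so the snippet runs as-is:
from copy import deepcopy

base_dict = {"key1": {"key2": {"key3": None}}}

shallow = [base_dict.copy() for _ in range(3)]
deep = [deepcopy(base_dict) for _ in range(3)]

for d, value in zip(shallow, [1, 2, 3]):
    d['key1']['key2']['key3'] = value
print(shallow[0]['key1']['key2']['key3'])  # 3 -- the inner dicts are shared

for d, value in zip(deep, [1, 2, 3]):
    d['key1']['key2']['key3'] = value
print(deep[0]['key1']['key2']['key3'])     # 1 -- each copy is independent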

Python sort list of dictionaries by float values

The following is my list
List_data = [{'cases_covers': 0.1625}, {'headphone': 0.1988}, {'laptop': 0.2271}, {'mobile': 0.2501}, {'perfume': 0.4981}, {'shoe': 0.1896}, {'sunglass': 0.1693}]
Final answer should be like this:
[{'perfume': 0.4981}, {'mobile': 0.2501}, {'laptop': 0.2271}, {'headphone': 0.1988}, {'shoe': 0.1896}, {'sunglass': 0.1693}, {'cases_covers': 0.1625}]
I want them sorted by value, in descending order.
You can get the values of a dictionary d with d.values(). Since your dictionaries have only one entry each, list(d.values()) will be a singleton list. You can use these singleton lists to sort List_data by supplying them via the key argument of the sort function.
Please note that in your example, "perfume", "mobile", "laptop" are the keys and 0.4981, 0.2501 are the values, according to the standard vocabulary for dictionaries in python.
List_data = [{'cases_covers': 0.1625}, {'headphone': 0.1988}, {'laptop': 0.2271}, {'mobile': 0.2501}, {'perfume': 0.4981}, {'shoe': 0.1896}, {'sunglass': 0.1693}]
List_data.sort(key=lambda d: list(d.values()), reverse=True)
print(List_data)
Output:
[{'perfume': 0.4981}, {'mobile': 0.2501}, {'laptop': 0.2271}, {'headphone': 0.1988}, {'shoe': 0.1896}, {'sunglass': 0.1693}, {'cases_covers': 0.1625}]
Important remark
The previous piece of code was answering your question literally, without knowledge of the context in which you are trying to sort this list of dictionaries.
I am under the impression that your use of lists and dictionaries is not optimal. Of course, without knowledge of the context, I am only guessing. But perhaps using only one dictionary would better suit your needs:
dictionary_data = {'cases_covers': 0.1625, 'headphone': 0.1988, 'laptop': 0.2271, 'mobile': 0.2501, 'perfume': 0.4981, 'shoe': 0.1896, 'sunglass': 0.1693}
list_data = sorted(dictionary_data.items(), key=lambda it: it[1], reverse=True)
print(list_data)
Output:
[('perfume', 0.4981), ('mobile', 0.2501), ('laptop', 0.2271), ('headphone', 0.1988), ('shoe', 0.1896), ('sunglass', 0.1693), ('cases_covers', 0.1625)]
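If you do need the result back in the original shape (a list of single-entry dicts), you can rebuild it from the sorted (key, value) pairs:
list_of_dicts = [{k: v} for k, v in list_data]
# [{'perfume': 0.4981}, {'mobile': 0.2501}, ...]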

Look up a number inside a list within a pandas cell, and return corresponding string value from a second DF

(I've edited the first column name in the labels_df for clarity)
I have two DataFrames, train_df and labels_df. train_df has integers that map to attribute names in the labels_df. I would like to look up each number within a given train_df cell and return in the adjacent cell, the corresponding attribute name from the labels_df.
So for example, the first observation in train_df has attribute_ids of 147, 616 and 813, which map (in the labels_df) to culture::french, tag::dogs, tag::men. And I would like to place those strings inside one cell on the same row as the corresponding integers.
I've tried variations of the function below but fear I am wayyy off:
def my_mapping(df1, df2):
    tags = df1['attribute_ids']
    for i in tags.iteritems():
        df1['new_col'] = df2.iloc[i]
    return df1
The data are originally from two csv files:
train.csv
labels.csv
I tried this, from @Danny:
sample_train_df['attribute_ids'].apply(
    lambda x: [sample_labels_df[sample_labels_df['attribute_name'] == i]['attribute_id_num']
               for i in x])
*Please note: I am running the above code on samples of each DF due to run times on the original DFs.
which returned: (output screenshot omitted)
I hope this is what you are looking for. I am sure there's a much more efficient way using a lookup.
df['new_col'] = df['attribute_ids'].apply(lambda x: [labels_df[labels_df['attribute_id'] == i]['attribute_name'] for i in x])
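One caveat: the boolean-indexing lookup inside the lambda returns a Series for each id, so the lists will contain Series objects rather than plain strings. If you want the bare attribute names, here is a sketch that pulls out the scalar (assuming each id appears exactly once in labels_df):
df['new_col'] = df['attribute_ids'].apply(
    lambda x: [labels_df.loc[labels_df['attribute_id'] == i, 'attribute_name'].iloc[0]
               for i in x])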
This is super ugly and one day, hopefully sooner rather than later, I'll be able to accomplish this task in an elegant fashion; until then, this is what got me the result I need.
split train_df['attribute_ids'] into their own cell/column
helper_df = train_df['attribute_ids'].str.split(expand=True)
combine train_df with the helper_df so I have the id column (they are photo id's)
train_df2 = pd.concat([train_df, helper_df], axis=1)
drop the original attribute_ids column
train_df2.drop(columns = 'attribute_ids', inplace=True)
rename the new columns
train_df2 = train_df2.rename(columns={0: 'attr1', 1: 'attr2', 2: 'attr3', 3: 'attr4', 4: 'attr5', 5: 'attr6',
                                      6: 'attr7', 7: 'attr8', 8: 'attr9', 9: 'attr10', 10: 'attr11'})
convert the labels_df into a dictionary
def create_file_mapping(df):
    mapping = dict()
    for i in range(len(df)):
        name, tags = df['attribute_id_num'][i], df['attribute_name'][i]
        mapping[str(name)] = tags
    return mapping

my_map = create_file_mapping(labels_df)
map and replace the tag numbers with their corresponding tag names
train_df3 = train_df2.applymap(lambda s: my_map.get(s) if s in my_map else s)
create a new column of the observations tags in a list of concatenated values
helper1['new_col'] = helper1[helper1.columns[0:10]].apply(lambda x: ','.join(x.astype(str)), axis = 1)
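For what it's worth, the whole pipeline above can probably be collapsed into one dictionary lookup plus a single apply. A sketch, not tested against the real data, assuming labels_df has the columns attribute_id_num and attribute_name used above, and that attribute_ids holds space-separated id strings:
# hypothetical compact version of the split/rename/map/join steps
id_to_name = dict(zip(labels_df['attribute_id_num'].astype(str),
                      labels_df['attribute_name']))

train_df['new_col'] = train_df['attribute_ids'].apply(
    lambda s: ','.join(id_to_name.get(i, i) for i in s.split()))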

Remove values from dictionary

I have a large dictionary and I am trying to remove values from keys if they start with certain values. Below is a small example of the dictionary.
a_data = {'78567908': {'26.01.17', '02.03.24', '26.01.12', '04.03.03', '01.01.13', '02.03.01', '01.01.10', '26.01.21'}, '85789070': {'26.01.02', '09.01.04', '02.05.04', '02.03.17', '02.05.01'}, '87140110': {'03.15.25', '03.15.24', '03.15.19'}, '87142218': {'26.17.13', '02.11.01', '02.03.22'}, '87006826': {'28.01.03'}}
After I read in the dictionary, I want to remove, from every key, the values that start with '26.' or '02.'. It is possible that this leaves a key with no values (an empty set).
I do have code that works:
exclude = ('26.', '02.')
f_a_data = {}
for k, v in a_data.items():
    f_a_data.setdefault(k, [])
    for code in v:
        print(k, code, not code.startswith(exclude))
        if not code.startswith(exclude):
            f_a_data[k].append(code)
print('Filtered dict:')
print(f_a_data)
This returns a filtered dict:
Filtered dict:
{'78567908': ['04.03.03', '01.01.13', '01.01.10'], '85789070': ['09.01.04'], '87140110': ['03.15.25', '03.15.24', '03.15.19'], '87142218': [], '87006826': ['28.01.03']}
Question 1: Is this the best way to filter a dictionary?
Question 2: How could I modify the above snippet to return values in a set, like the original dict?
Your code is quite all right in complexity terms, but it can be "pythonized" a little and still remain readable.
My proposal: rebuild the dictionary using nested comprehensions, with all to test whether each value should be kept:
a_data = {'78567908': {'26.01.17', '02.03.24', '26.01.12', '04.03.03', '01.01.13', '02.03.01', '01.01.10', '26.01.21'}, '85789070': {'26.01.02', '09.01.04', '02.05.04', '02.03.17', '02.05.01'}, '87140110': {'03.15.25', '03.15.24', '03.15.19'}, '87142218': {'26.17.13', '02.11.01', '02.03.22'}, '87006826': {'28.01.03'}}
exclude = ('26.', '02.')
new_data = {k: {x for x in v if all(not x.startswith(s) for s in exclude)} for k, v in a_data.items()}
result:
>>> new_data
{'78567908': {'01.01.10', '01.01.13', '04.03.03'},
'85789070': {'09.01.04'},
'87006826': {'28.01.03'},
'87140110': {'03.15.19', '03.15.25', '03.15.24'},
'87142218': set()}
(Here we use a dictionary comprehension embedding a set comprehension, since you need a set, with a generator expression passed to all.)
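Side note: since str.startswith accepts a tuple of prefixes, the inner all test can also be written more directly:
new_data = {k: {x for x in v if not x.startswith(exclude)}
            for k, v in a_data.items()}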
