Appending values to dictionary/list - python-3.x

I have a mylist = [[a,b,c,d],...[]] with 650 lists inside. I am trying to insert this into a relational database with dictionaries. I have the following code:
for i in mylist:
    if len(i) == 4:
        cve_ent = {'state':[], 'muni':[], 'area':[]}
        cve_ent['state'].append(i[1])
        cve_ent['muni'].append(i[2])
        cve_ent['area'].append(i[3])
However, this code leaves only the last list of mylist in the dictionary. I have also tried a counter and a while loop, but I cannot make it work.
I do not know if this is the fastest way to store the data. What I will do next is compare the values of the first and second keys with other tables in order to multiply the values of the third key.

First of all, pull
cve_ent = {'state':[], 'muni':[], 'area':[]}
out of your for loop. As written, the dictionary is re-created (and therefore emptied) on every iteration, so only the last list survives.
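A minimal sketch of that fix, assuming each inner list has the form [id, state, muni, area]. The sample values are made up for illustration; only the names mylist and cve_ent come from the question.

```python
# Hypothetical sample data; the real mylist has about 650 inner lists.
mylist = [
    ["r1", "01", "001", 10.5],
    ["r2", "01", "002", 7.25],
    ["bad", "row"],              # skipped below: it does not have 4 items
    ["r3", "02", "001", 3.0],
]

# Create the dictionary once, before the loop, so it is not reset on each pass.
cve_ent = {'state': [], 'muni': [], 'area': []}

for i in mylist:
    if len(i) == 4:
        cve_ent['state'].append(i[1])
        cve_ent['muni'].append(i[2])
        cve_ent['area'].append(i[3])

print(cve_ent)
```

The three lists stay aligned by position, so the n-th state, muni and area all come from the same inner list.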

Related

How to extract several dataframes from dictionary

I am currently trying to extract several dataframes from a dictionary. The problem is that the number of dataframes varies: sometimes there are two dataframes in it and sometimes 30.
At the beginning I create a dictionary (dict_of_exceptions) from a dataframe (exceptions_df). This dictionary holds several dataframes, depending on how many different 'Source Wells' I have. With the current code I can extract only one dataframe from the dictionary, which ends up in j:
dict_of_exceptions = {k: v for k, v in exceptions_df.groupby('Source Well')}
print(dict_of_exceptions)
for k in dict_of_exceptions.keys():
    j = dict_of_exceptions[k]
Could someone help me modify the last line to go through the dictionary and extract each dataframe (and name each one after its corresponding key)?
I think I get your intention, though I could not really read it from your code. Currently, as @cyrilb38 stated in the comments, your loop overwrites j on every iteration, so you only ever see the result of the last one. Rather than transforming the data into a dictionary, work with the DataFrame directly; I also think (I may be wrong) that what you are calling a dataframe is really a set of rows. Replacing the groupby object with a dict is probably not what you want, or at least it only prolongs the process for nothing.
If you want to see the info of Well X only for example, try this
exceptions_df[exceptions_df['Source Well'] == 'Well X']

Python3 print selected values of dict

In this simple code to read a tsv file with many columns:
InColnames = ['Chr','Pos','Ref','Alt']
tsvin = csv.DictReader(fin, delimiter='\t')
for row in tsvin:
    print(', '.join(row[InColnames]))
How can I make the print work?
The following will do:
for row in tsvin:
    print(', '.join(row[col] for col in InColnames))
You cannot pass a list of keys to the dict's item-lookup and magically get a list of values. You have to somehow iterate the keys and retrieve each one's value individually. The approach at hand uses a generator expression for that.
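As a self-contained illustration of the approach above, the sketch below reads a small in-memory TSV instead of a real file. The sample rows and the extra Qual column are made up; fin, InColnames and tsvin are the names from the question.

```python
import csv
import io

# Hypothetical in-memory TSV standing in for the real file handle `fin`.
fin = io.StringIO(
    "Chr\tPos\tRef\tAlt\tQual\n"
    "1\t12345\tA\tG\t50\n"
    "2\t67890\tC\tT\t99\n"
)

InColnames = ['Chr', 'Pos', 'Ref', 'Alt']
tsvin = csv.DictReader(fin, delimiter='\t')

lines = []
for row in tsvin:
    # Look up each selected column individually, then join the values.
    lines.append(', '.join(row[col] for col in InColnames))

print('\n'.join(lines))
```

Columns that are not listed in InColnames (Qual here) are simply never looked up, so they do not appear in the output.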

Slow loop aggregating rows and columns

I have a DataFrame with a column named 'UserNbr' and a column named 'Spclty', which is composed of elements like this:
[['104', '2010-01-31'], ['215', '2014-11-21'], ['352', '2016-07-13']]
where there can be 0 or more elements in the list.
Some UserNbr keys appear in multiple rows, and I wish to collapse each such group into a single row such that 'Spclty' contains all the unique dicts like those in the list shown above.
To save the overhead of appending to a DataFrame, I'm appending each output row to a list instead of to the DataFrame.
My code is working, but it's taking hours to run on 0.7M rows of input. (Actually, I've never been able to keep my laptop open long enough for it to finish executing.)
Is there a better way to aggregate into a structure like this, maybe using a library that provides more data reshaping options instead of looping over UserNbr? (In R, I'd use the data.table and dplyr libraries.)
# loop over all UserNbr:
# consolidate specialty fields into dict-like sets (to remove redundant codes);
# output one row per user to new data frame
out_rows = list()
spcltycol = df_tmp.columns.get_loc('Spclty')
all_UserNbr = df_tmp['UserNbr'].unique()
for user in all_UserNbr:
    df_user = df_tmp.loc[df_tmp['UserNbr'] == user]
    if df_user.shape[0] > 0:
        open_combined = df_user.iloc[0, spcltycol]  # capture 1st row
        for row in range(1, df_user.shape[0]):  # union with any subsequent rows
            open_combined = open_combined.union(df_user.iloc[row, spcltycol])
        new_row = df_user.drop(['Spclty', 'StartDt'], axis = 1).iloc[0].tolist()
        new_row.append(open_combined)
        out_rows.append(new_row)
# construct new dataframe with no redundant UserID rows:
df_out = pd.DataFrame(out_rows,
                      columns = ['UserNbr', 'Spclty'])
# convert Spclty sets to dicts:
df_out['Spclty'] = [dict(df_out['Spclty'][row]) for row in range(df_out.shape[0])]
The conversion to dict gets rid of specialties that are repeated between rows. In the output, a Spclty value should look like this:
{'104': '2010-01-31', '215': '2014-11-21', '352': '2016-07-13'}
except that there may be more key-value pairs than in any corresponding input row (resulting from aggregation over UserNbr).
I withdraw this question.
I had hoped there was an efficient way to use groupby with something else, but I haven't found any examples with a complex data structure like this one and have received no guidance.
For anyone who gets similarly stuck with very slow aggregation problems in Python, I suggest stepping up to PySpark. I am now tackling this problem with a Databricks notebook and am making headway with the pyspark.sql.window Window functions. (Now, it only takes minutes to run a test instead of hours!)
A partial solution is in the answer here:
PySpark list() in withColumn() only works once, then AssertionError: col should be Column
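For data that fits in memory, the aggregation described in the question can also be sketched in plain Python with a single pass over the rows. This is neither the asker's code nor the PySpark approach mentioned above; the row layout (a UserNbr plus a list of [code, date] pairs) is an assumption based on the sample shown in the question, and the values are made up.

```python
from collections import defaultdict

# Hypothetical rows: (UserNbr, list of [code, date] pairs), as in the sample.
rows = [
    (1, [['104', '2010-01-31'], ['215', '2014-11-21']]),
    (2, []),
    (1, [['215', '2014-11-21'], ['352', '2016-07-13']]),
]

combined = defaultdict(dict)
for user, spclty in rows:
    # dict.update accepts an iterable of key/value pairs,
    # so codes repeated across rows collapse into one entry.
    combined[user].update(spclty)

result = dict(combined)
print(result)
```

Each row is visited once, so the work grows linearly with the number of rows instead of re-filtering the whole table for every user. With a DataFrame, the pairs could presumably be fed in with something like zip(df_tmp['UserNbr'], df_tmp['Spclty']), assuming both columns exist as described.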

Adding the results of a for loop to the same key in a nested dictionary

I have a dictionary that contains a few sub-dictionaries, each with many keys. A for loop with an if condition generates the results. I want to add ALL the results under the desired key, but my code only stores the result of the last iteration, replacing the value from the previous one.
What I actually want is to print all the results.
for item in list1:  # item is a tuple & list1 has tuples in it
    if item == node_pair:  # node_pair is another tuple
        high_p[i]["links"] = link_name  # "links" is the key
Desired output:
"links": [link_name1, link_name2, link_name3]
What I get:
"links": link_name3
Please guide me.
So each sub-dictionary needs to have lists as values. You could pre-populate each sub-dictionary with lists ahead of time, but it's easier to create them on demand using setdefault.
for item in list1:
    if item == node_pair:
        high_p[i].setdefault("links", []).append(link_name)
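To see the setdefault pattern run end to end, here is a small sketch. The question does not show how high_p, i or link_name are produced, so the values below (and the link_names list paired with list1) are hypothetical stand-ins.

```python
# Hypothetical stand-ins for the names used in the question.
high_p = {0: {"name": "path A"}}   # outer dict holding one sub-dictionary
i = 0                              # key of the sub-dictionary to update
node_pair = ("n1", "n2")
list1 = [("n1", "n2"), ("n3", "n4"), ("n1", "n2"), ("n1", "n2")]
link_names = ["link_name1", "skipped", "link_name2", "link_name3"]

for item, link_name in zip(list1, link_names):
    if item == node_pair:
        # setdefault returns the existing list, or stores and returns a new one.
        high_p[i].setdefault("links", []).append(link_name)

print(high_p[i]["links"])
```

The first matching iteration creates the list and every later match appends to that same list, so nothing is overwritten.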

comprehensive dict with multiple or repeated values in python3

I have an extensive list with tuples of pairs. It goes like this:
travels =[(passenger_1, destination_1), (passenger_2, destination_2),(passenger_1, destination_2)...]
And so on. Passengers and destinations may repeat and even the same passenger-destination tuple may repeat.
I want to build a dict, using a comprehension, that has each passenger as a key and that passenger's most recurrent destination as the value.
My first try was this:
dictionary = {k:v for k,v in travels}
but each key overwrites the last. I was hoping to get multiple values for each key so that I could then count them per key. Then I tried this:
dictionary = {k:v for k,v in travels if k not in dictionary else dictionary[k].append(v)}
but I can't refer to dictionary inside its own definition. Any ideas on how I can get it done? It's important that it's done with a comprehension and not with loops.
This is how it can be done with a for loop:
result = dict()
for passenger, destination in travels:
    result.setdefault(passenger, list()).append(destination)
result is a single dictionary where keys are passengers, values are lists with destinations.
I doubt you can do the same with a single dictionary comprehension, since inside a comprehension you can only generate elements; you cannot freely modify ones that were already produced.
EDIT.
If you want to (or have to) use comprehension expression no matter what then you can do it like this (2 comprehensions and no explicit loops):
result = {
    passenger: [destination_
                for passenger_, destination_
                in travels
                if passenger_ == passenger]
    for passenger, dummy_destination
    in travels}
This is a poor algorithm for what you want: it rescans all of travels for every tuple, so its time complexity is O(n^2), while the first method is O(n).
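Both versions above stop at a list of destinations per passenger. To get to the most recurrent destination the question asks for, one possible sketch is to group once with setdefault and then count with collections.Counter inside a dict comprehension. The sample data and the names grouped and favorite are made up for illustration.

```python
from collections import Counter

# Hypothetical sample data in the shape described in the question.
travels = [
    ("ann", "rome"), ("bob", "oslo"), ("ann", "paris"),
    ("ann", "rome"), ("bob", "oslo"), ("bob", "lima"),
]

# One O(n) pass: group the destinations by passenger.
grouped = {}
for passenger, destination in travels:
    grouped.setdefault(passenger, []).append(destination)

# Dict comprehension: keep the most common destination for each passenger.
favorite = {
    passenger: Counter(destinations).most_common(1)[0][0]
    for passenger, destinations in grouped.items()
}

print(favorite)
```

The sample has no ties; if two destinations are equally frequent for a passenger, most_common(1) returns just one of them, so a tie-breaking rule would have to be added if that matters.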
