Remove redundant items from a list of dicts - python-3.x

I have a list of dicts built from user selections in a GUI (the values are returned by Plotly). When a user clicks a data point (or selects a group of data points), the data point(s) are added to the list.
However, if the user clicks the same data point again (or selects a group that includes an already-selected data point), redundant dictionaries appear in the list for those data points.
For example:
[
{
"clicked": true,
"selected": true,
"hovered": false,
"x": 0,
"y": 71100.0988957607,
"selected_xcol": "injection_id",
"xvalue": "e54112f9-4497-4a7e-91cd-e26842a4092f",
"selected_ycol": "peak_area",
"yvalue": 71100.0988957607,
"injection_id": "e54112f9-4497-4a7e-91cd-e26842a4092f"
},
{
"clicked": true,
"selected": true,
"hovered": false,
"x": 0,
"y": 75283.2386064552,
"selected_xcol": "injection_id",
"xvalue": "e54112f9-4497-4a7e-91cd-e26842a4092f",
"selected_ycol": "peak_area",
"yvalue": 75283.2386064552,
"injection_id": "e54112f9-4497-4a7e-91cd-e26842a4092f"
},
{ # Redundant, same as first item
"clicked": true,
"selected": true,
"hovered": false,
"x": 0,
"y": 71100.0988957607,
"selected_xcol": "injection_id",
"xvalue": "e54112f9-4497-4a7e-91cd-e26842a4092f",
"selected_ycol": "peak_area",
"yvalue": 71100.0988957607,
"injection_id": "e54112f9-4497-4a7e-91cd-e26842a4092f"
}
]
Because users can select one or multiple datapoints in one GUI stroke, and the code doesn't know which, I simply add the returned list to the cumulative list like so...
LOCAL["selected_data"] += selectable_data_chart(
    LOCAL["df"],
    key="st_react_plotly_control_main_chart",
    custom_data_columns=custom_data_columns,
    hovertemplate=hovertemplate,
    svgfilename=svgfilename,
)
I have tried filtering out the redundant items with ...
LOCAL["selected_data"] = list(set(LOCAL["selected_data"]))
...but it raises an error...
TypeError: unhashable type: 'dict'
I have also tried...
result = []
LOCAL["selected_data"] = [result.append(d) for d in LOCAL["selected_data"] if d not in result]
...but it returns null no matter what.
[
null,
null
]

You can't add a mutable value to a set (or use it as a dictionary key)...what if, after adding an item to a set, you changed the values so that it was identical to another set member? That would invalidate the guarantees provided by the set data type.
One possible solution is to transform your dictionaries into a structured type. For example, using the dataclasses module, we could write (assuming that your sample data is contained in the file data.json):
import json
import dataclasses

@dataclasses.dataclass(frozen=True)
class Event:
    clicked: bool
    selected: bool
    hovered: bool
    x: float
    y: float
    selected_xcol: str
    xvalue: str
    selected_ycol: str
    yvalue: float
    injection_id: str

with open("data.json") as fd:
    data = json.load(fd)

# frozen=True makes the instances hashable, so identical events collapse in the set
events = set(Event(**item) for item in data)
As @lemon pointed out in a comment, this won't actually work for the sample data in your question, because the third item in the list is not identical to the first item (in the first item, x=0, but in the third item, x="e54112f9-4497-4a7e-91cd-e26842a4092f"). If this was just a typo when entering your question, the solution here will work just fine.
A less structured solution would be to transform each dictionary into a list of tuples using the items() method, turn that into a tuple, and then add those to your "unique" set:
import json

with open("data.json") as fd:
    data = json.load(fd)

events = set(tuple(item.items()) for item in data)
In this case, events is a set of tuples; you could transform it back into a list of dictionaries like this:
dict_events = [dict(item) for item in events]
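Two side notes on the attempts in the question. The list comprehension yields null entries because list.append always returns None, so the comprehension collects those return values instead of the dicts. Also, a set does not keep the original click order. A minimal sketch of an order-preserving variant, assuming every value in the dicts is hashable (strings, numbers, booleans); the helper name dedupe_dicts is made up for illustration:

```python
def dedupe_dicts(dicts):
    """Return the dicts with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for d in dicts:
        # Sort the items so that key order does not affect the comparison.
        key = tuple(sorted(d.items()))
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


selected = [
    {"x": 0, "y": 71100.0988957607},
    {"x": 0, "y": 75283.2386064552},
    {"x": 0, "y": 71100.0988957607},  # duplicate of the first item
]
print(dedupe_dicts(selected))
```

In the question's setting this would presumably be applied to LOCAL["selected_data"] after each += step.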

Related

Extracting string from lists of dictionaries (or generator)

I am scraping data with scrapetube to get the video IDs of all the videos from a YouTube channel. The scraping code returns a generator object, which I have converted to a list of dictionaries containing other dictionaries, lists, and strings. The scraping code works, but here is some sample data anyway. I am only interested in the videoId string (the original post illustrated this with a picture).
How can I iterate through all the video IDs stored under videoId and save them in a new variable (list or dataframe) for further processing?
import scrapetube
vid = scrapetube.get_channel('UC_zxivooFdvF4uuBosUnJxQ')
type(vid) #generator file
video = next(vid) #extract values from generator & then convert it
videoL = list(vid) #convert it to a list
#code not working
for item in videoL['videoId']:
    entry = {}
    videoId = item['videoId']
    for i in range(len(videoId)):
        entry.append(int(videoId[i][0:10]))
#error message: TypeError: list indices must be integers or slices, not str
I used code snippet from this post but can't seem to make it work.
It's helpful when you know the terminology so let's go through it step by step.
What is a generator?
A generator, as its name implies, generates values on demand.
Generators are useful here because, if you don't want all the data in memory at once, you can iterate over one generated value at a time and extract only what you need.
Consider this:
def gen_one_million():
    for i in range(0, 1_000_000):
        yield i

for i in gen_one_million():
    pass  # do something with i
Rather than having a million elements in a list or some container in memory, you only get one at a time. If you want them all in a list it's very easy to do with list(gen_one_million()) but you're not tied to having them all in memory if you don't need them.
What is a list and how do I use them?
A list in python is a container represented by brackets []. To access elements in a list you can index into it i = my_list[0] or iterate over it.
for i in my_list:
    pass  # do something with i
What is a dict and how do I use them?
A dict is a Python key/value container type, written with curly braces and a colon between each key and value: {key: value}
To access a value in a dict, reference the key whose value you want, i = my_dict[key], where key is a string, integer, or some other hashable type. You can also iterate over it.
for key in my_dict:
    pass  # do something with the key
for value in my_dict.values():
    pass  # do something with the value
for key, value in my_dict.items():
    pass  # do something with the key and value
How does my case fit into all this?
Looking at your sample data it looks like you already have it converted from a generator to a list.
[
    {
        'videoId': '8vCvSmAIv1s',
        'thumbnail': {
            'thumbnails': [
                {
                    'url': 'https://i.ytimg.com/vi/8vCvSmAIv1s/hqdefault.jpg?sqp=-oaymwEbCKgBEF5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLDn3-yb8BvctGrMxqabxa_nH-UYzQ',
                    'width': 168,
                    'height': 94
                },  # etc..
            ]
        }
    }
]
However, since you just need to iterate over it and access the 'videoId' key in each generated dict, there's no reason to convert.
Just iterate directly over the generator and access the key of each generated dict.
video_ids = []
for item in vid:
    video_ids.append(item['videoId'])
Or even better, as a list comprehension.
video_ids = [item['videoId'] for item in vid]
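One caveat about the code in the question: a generator can only be consumed once, so next(vid) removes the first video and list(vid) then empties the generator. The sketch below illustrates this with a stand-in generator; fake_channel and its made-up IDs are hypothetical and only imitate the shape of the scrapetube results.

```python
def fake_channel():
    """Hypothetical stand-in for scrapetube.get_channel(); yields dicts with a 'videoId' key."""
    for video_id in ["8vCvSmAIv1s", "aaaaaaaaaaa", "bbbbbbbbbbb"]:
        yield {"videoId": video_id}


vid = fake_channel()
first = next(vid)                              # consumes the first item
video_ids = [item["videoId"] for item in vid]  # consumes the remaining items
leftover = list(vid)                           # nothing is left to iterate over

print(first["videoId"], video_ids, leftover)
```

So if every ID is needed, it is probably simplest to build the list comprehension from a fresh generator without calling next() on it first.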

How to dynamically access a class instance attribute?

I am doing some parsing work with hl7apy and ran into a problem.
I use hl7apy to parse an HL7 message, which can be parsed like this:
from hl7apy.parser import parse_message
message = "MSH|^~\&|HIS|HIS|MediII|MediII|20170902141711||ORM^O01^ORM_O01|15b 37e7132504a0b95ade4654b596dc5|P|2.4\r"
msg = parse_message(message, find_groups=False)
print(msg.msh.msh_3.msh_3_1.value)
output:
'HIS'
So, how can I get a field value dynamically according to a field config?
For example, given this msh field config:
{
"field": "msh",
"field_index": [3,1]
}
the value can be found with:
msg.msh.msh_3.msh_3_1.value
and if the config changes to:
{
"field": "pid",
"field_index": [2,4]
}
the line to get the field becomes:
msg.pid.pid_2.pid_2_4.value
You could combine a few list comprehensions and use getattr recursively.
# recursively get the methods from a list of names
def get_method(method, names):
    if names:
        return get_method(getattr(method, names[0]), names[1:])
    return method
field_config = {
'field': 'msh',
'field_index': [3, 1]
}
# just get the field
field = field_config['field']
# get a list of the indexes as string. ['3', '1']
indexes = [str(i) for i in field_config['field_index']]
# join the indexes with a '_' starting with nothing
# and put it in a list of names. ['msh', 'msh_3', 'msh_3_1']
names = ['_'.join([field] + indexes[:i]) for i in range(len(indexes) + 1)]
# get the method from the recursive function
method = get_method(msg, names)
print(method.value)
As a disclaimer, I had no way of testing it so it may not work exactly as you expect it. But this should be a good starting point.
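Since the answer above could not be tested against hl7apy, here is a sketch of the same getattr idea that can be run without the library. It uses functools.reduce instead of recursion and a SimpleNamespace mock in place of a parsed message; the mock and the helper name get_field are assumptions for illustration, not part of hl7apy.

```python
from functools import reduce
from types import SimpleNamespace


def get_field(msg, field, field_index):
    """Walk msg.<field>.<field>_<i>.<field>_<i>_<j>... and return the final object."""
    names = [field]
    for index in field_index:
        # Each level appends the next index to the previous name: msh -> msh_3 -> msh_3_1
        names.append(f"{names[-1]}_{index}")
    return reduce(getattr, names, msg)


# Mock object standing in for a parsed hl7apy message (an assumption for testing only).
msg = SimpleNamespace(
    msh=SimpleNamespace(
        msh_3=SimpleNamespace(msh_3_1=SimpleNamespace(value="HIS"))
    )
)

config = {"field": "msh", "field_index": [3, 1]}
print(get_field(msg, config["field"], config["field_index"]).value)
```

With a real message, msg would come from parse_message as in the question; whether every configured path exists on the parsed object would still need checking.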

How to define a function that will convert list into dictionary while also inserting 'None' where there is no value for a key (Python3)?

Say we have a contact_list like this:
[["Joey", 30080],["Miranda"],["Lisa", 30081]]
So essentially "Miranda" doesn't have a zipcode, but with the function I want to define, I'd like it to automatically detect that and add "None" into her value slot, like this:
{
"Joey": 30080,
"Miranda": None,
"Lisa": 30081
}
So far I have this, which just converts the list to a dict:
def user_contacts(contact_list):
    dict_contact = dict(contact_list)
    print(dict_contact)
I'm not sure where to go from here as far as adding the None for "Miranda". Currently, I just get an error saying that element 1 (["Miranda"]) has length 1 where 2 is required.
Eventually I want to pass any list like the one above to the function user_contacts and get the dictionary above as the output.
user_contacts([["Joey", 30080],["Miranda"],["Lisa", 30081]])
Here is what you can do: check whether the length of each element in your list meets the expectation (in this case 2, for name and zipcode). If it does not, append None:
contacts = [["Joey", 30080], ["Miranda"], ["Lisa", 30081]]
contacts_dict = {}
for c in contacts:
    if len(c) < 2:
        c.append(None)  # the None object, not the string 'None'
    contacts_dict.update({c[0]: c[1]})
print(contacts_dict)
and the output is:
{'Joey': 30080, 'Miranda': None, 'Lisa': 30081}
Try this:
def user_contacts(contact_list):
    dict_contact = dict((ele[0], ele[1] if len(ele) > 1 else None)
                        for ele in contact_list)
    print(dict_contact)
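Both answers print the result. If the dictionary is needed for further processing, returning it is usually more convenient. A possible variant, assuming each inner list starts with the name and that the input list should not be modified:

```python
def user_contacts(contact_list):
    """Build {name: zipcode}, using None when an entry has no zipcode."""
    # "name, *rest" unpacks the first element and collects any remaining ones in a list.
    return {name: (rest[0] if rest else None) for name, *rest in contact_list}


contacts = [["Joey", 30080], ["Miranda"], ["Lisa", 30081]]
print(user_contacts(contacts))
```

Unlike appending to each short entry, this leaves the original contact list untouched.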

Iterating over a text column in a dataframe

Hi all. I am working on a dataframe (shown as a picture in the original post) with over 18000 observations. I'd like to take the text in the 'review' column row by row and later do a word count on it. So far I have been trying to iterate over it, but I keep getting an error like "TypeError: 'float' object is not iterable". Here is the code I used:
def tokenize(text):
    for row in text:
        for i in row:
            if i is not None:
                words = i.lower().split()
                return words
            else:
                return None

data['review_two'] = data['review'].apply(tokenize)
Now my question is: how do I iterate effectively and efficiently over the 'review' column so that I can preprocess each row one after the other before performing the word count?
My hypothesis for the error is that you have missing data, which is NaN (a float) and makes the tokenize function fail. You can check this with pd.isnull(df["review"]), which returns a boolean Series indicating whether each row is NaN. If any(pd.isnull(df["review"])) is true, then there is a missing value in the column.
I cannot reproduce the error as I don't have the data, but I think your goal can be achieved with this.
import pandas as pd
from collections import Counter

df = pd.DataFrame([{"name": "A", "review": "No it is not good.", "rating": 2},
                   {"name": "B", "review": "Awesome!", "rating": 5},
                   {"name": "C", "review": "This is fine.", "rating": 3},
                   {"name": "C", "review": "This is fine.", "rating": 3}])
# first .lower, then .replace to strip punctuation, and finally .split to get lists
# (regex=True is needed in recent pandas versions, where the default is a literal match)
df["splitted"] = df.review.str.lower().str.replace(r'[^\w\s]', '', regex=True).str.split()
# pass a counter to count every list. Then sum counters. (Counters can be added.)
df["splitted"].transform(lambda x: Counter(x)).sum()
Counter({'awesome': 1,
         'fine': 2,
         'good': 1,
         'is': 3,
         'it': 1,
         'no': 1,
         'not': 1,
         'this': 2})
The str.replace part removes punctuation; see the answer Replacing punctuation in a data frame based on punctuation list from @EdChum.
I'm not sure what you're trying to do, especially with for i in row. In any case, apply already iterates over the rows of your DataFrame/Series, so there's no need to do it in the function that you pass to apply.
Besides, your code does not return a TypeError for a DataFrame such as yours where the columns contain strings. See here for how to check if your 'review' column contains only text.
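Because apply passes one cell at a time, the function only needs to handle a single value. A minimal sketch in plain Python, assuming that missing reviews arrive as float NaN (which is what raises the "'float' object is not iterable" error when the function tries to loop over the cell):

```python
def tokenize(cell):
    """Return the lowercase words of one cell, or an empty list for missing or non-text values."""
    if not isinstance(cell, str):
        # NaN is a float; iterating over it raises the TypeError from the question.
        return []
    return cell.lower().split()


reviews = ["No it is not good.", float("nan"), "Awesome!"]
print([tokenize(r) for r in reviews])
# With pandas this would be used as: data['review_two'] = data['review'].apply(tokenize)
```

Returning an empty list rather than None keeps every row the same type, which tends to make a later word count simpler.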
Maybe something like this, which gives you the word count. I did not understand the rest of what you want.
import pandas as pd

a = ['hello friend', 'a b c d']
b = pd.DataFrame(a)
print(b[0].str.split().str.len())
>> 0    2
   1    4

Nested dictionaries in Python: how to make them and how to use them?

I'm still trying to figure out how nested dictionaries in Python really work.
I know that when you're using [] it's a list, () it's a tuple and {} a dict.
But what about when you want to make nested dictionaries like this structure (that's what I want):
{KeyA :
{ValueA :
[KeyB : ValueB],
[Keyc : ValueC],
[KeyD : ValueD]},
{ValueA for each ValueD]}}
For now I have a dict like:
{KeyA : {KeyB : [ValueB],
KeyC : [ValueC],
KeyD : [ValueD]}}
Here's my code:
json_file = importation()
dict_guy = {}
for key, value in json_file['clients'].items():
    n_customerID = normalization(value['shortname'])
    if n_customerID not in dict_guy:
        dict_guy[n_customerID] = {
            'clientsName': [],
            'company': [],
            'contacts': [],
        }
    dict_guy[n_customerID]['clientsName'].append(n_customerID)
    dict_guy[n_customerID]['company'].append(normalization(value['name']))
    dict_guy[n_customerID]['contacts'].extend(
        [norma_email(item) for item in value['contacts']])
Can someone please give me more information or explain how a nested dict really works?
So, I hope I get it right from our conversation in the comments :)
json_file = importation()
dict_guy = {}
for key, value in json_file['clients'].items():
    n_customerID = normalization(value['shortname'])
    if n_customerID not in dict_guy:
        dict_guy[n_customerID] = {
            'clientsName': [],
            'company': [],
            'contacts': {},  # Assign empty dict, not list
        }
    dict_guy[n_customerID]['clientsName'].append(n_customerID)
    dict_guy[n_customerID]['company'].append(normalization(value['name']))
    for item in value['contacts']:
        normalized_email = norma_email(item)
        # Use the contacts dictionary like every other dictionary
        dict_guy[n_customerID]['contacts'][normalized_email] = n_customerID
There is no problem with simply assigning a dictionary to a key inside another dictionary. That's what I do in this code sample. You can create dictionaries nested as deep as you wish.
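As a stripped-down illustration of the same idea, here is a sketch that needs none of the helper functions from the question. The client name and e-mail address are invented, and dict.setdefault is used in place of the explicit "if ... not in" check:

```python
clients = {}  # outer dict: customer id -> inner dict

# setdefault returns the existing inner dict, or inserts the given one first.
record = clients.setdefault("acme", {"company": [], "contacts": {}})
record["company"].append("Acme Corp")
record["contacts"]["jane@example.com"] = "acme"

# Chained indexing reads through the levels one key at a time.
print(clients["acme"]["contacts"]["jane@example.com"])
print(clients)
```

Each pair of square brackets simply looks up one key in whatever object the previous lookup returned.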
Hope this helped you. If not, we'll work on it further :)
EDIT:
About list/dict comprehensions. You are almost right that:
I know that when you're using [] it's a list, () it's a tuple and {} a dict.
The {} brackets are a little tricky in Python 3. They can be used to create a dictionary as well as a set!
a = {} # a becomes an empty dictionary
a = set() # a becomes an empty set
a = {1,2,3} # a becomes a set with 3 values
a = {1: 1, 2: 4, 3: 9} # a becomes a dictionary with 3 keys
a = {x for x in range(10)} # a becomes a set with 10 elements
a = {x: x*x for x in range(10)} # a becomes a dictionary with 10 keys
Your line dict_guy[n_customerID] = { {'clientsName':[], 'company':[], 'contacts':[]}} tried to create a set with a single dictionary in it, and because dictionaries are not hashable, you got the TypeError exception informing you that something is not hashable :) (sets can store only elements that are hashable)
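The difference can be reproduced in isolation. A small sketch, with the error caught so that the script keeps running (the exact wording of the message may vary between Python versions):

```python
error = None
try:
    bad = {{"clientsName": []}}  # double braces: a set literal containing a dict
except TypeError as exc:
    error = str(exc)

print(error)
print(type({}).__name__, type({1, 2}).__name__)  # empty braces make a dict, not a set
```

Dropping the outer pair of braces turns the value back into a plain dict, which can be assigned to the key without any problem.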
Check out this page.
example = {'app_url': '', 'models': [{'perms': {'add': True, 'change': True,
'delete': True}, 'add_url': '/admin/cms/news/add/', 'admin_url': '/admin/cms/news/',
'name': ''}], 'has_module_perms': True, 'name': u'CMS'}
