Assigning multiple values to dictionary keys from a file in Python 3

I'm fairly new to Python but I haven't found the answer to this particular problem.
I am writing a simple recommendation program and I need to have a dictionary where cuisine is a key and name of a restaurant is a value. There are a few instances where I have to split a string of a few cuisine names and make sure all other restaurants (values) which have the same cuisine get assigned to the same cuisine (key). Here's a part of a file:
Georgie Porgie
87%
$$$
Canadian, Pub Food
Queen St. Cafe
82%
$
Malaysian, Thai
Mexican Grill
85%
$$
Mexican
Deep Fried Everything
52%
$
Pub Food
So it's just the first and the last one that share a cuisine here, but there are more later in the file.
And here is my code:
def new(file):
    file = "/.../Restaurants.txt"
    d = {}
    key = []
    with open(file) as file:
        lines = file.readlines()
    for i in range(len(lines)):
        if i % 5 == 0:
            if "," not in lines[i + 3]:
                d[lines[i + 3].strip()] = [lines[i].strip()]
            else:
                key += (lines[i + 3].strip().split(', '))
                for j in key:
                    if j not in d:
                        d[j] = [lines[i].strip()]
                    else:
                        d[j].append(lines[i].strip())
    return d
It gets all the keys and values printed but it doesn't assign two values to the same key where it should. Also, with this last 'else' statement, the second restaurant is assigned to the wrong key as a second value. This should not happen. I would appreciate any comments or help.

In the case when there is only one category, you don't check whether the key is already in the dictionary, so a later restaurant with the same cuisine overwrites the earlier one. Handle it the same way as the case with multiple categories and then it works fine.
I'm not sure why you take file as an argument when you overwrite it on the next line.
Additionally, you should build a fresh key list for each record instead of using += (which keeps adding to the existing key list, so the current restaurant is also added to the cuisines of earlier restaurants).
When you check whether j is in the dictionary, you can test j in d.keys(); plain j in d is equivalent.
def new(file):
    file = "/.../Restaurants.txt"
    d = {}
    key = []
    with open(file) as file:
        lines = file.readlines()
    for i in range(len(lines)):
        if i % 5 == 0:
            if "," not in lines[i + 3]:
                # strip() here too, so the lookup matches the stripped keys
                if lines[i + 3].strip() not in d.keys():
                    d[lines[i + 3].strip()] = [lines[i].strip()]
                else:
                    d[lines[i + 3].strip()].append(lines[i].strip())
            else:
                # a fresh list for each record, instead of +=
                key = (lines[i + 3].strip().split(', '))
                for j in key:
                    if j not in d.keys():
                        d[j] = [lines[i].strip()]
                    else:
                        d[j].append(lines[i].strip())
    return d
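A minimal sketch of the same idea that treats one cuisine and several cuisines identically: always split the cuisine line on commas, so the single-cuisine case needs no separate branch. It assumes four-line records (name, rating, price, cuisines) separated by blank lines, and it parses a string so it can be tried without the file; the function name index_by_cuisine is hypothetical.

```python
def index_by_cuisine(text):
    """Map each cuisine to the list of restaurant names that serve it."""
    d = {}
    # Records are separated by a blank line; strip() drops trailing newlines.
    for record in text.strip().split("\n\n"):
        lines = record.split("\n")
        name, cuisines = lines[0].strip(), lines[3]
        # A single cuisine is just a one-element split, so no special case.
        for cuisine in cuisines.split(","):
            d.setdefault(cuisine.strip(), []).append(name)
    return d


sample = """Georgie Porgie
87%
$$$
Canadian, Pub Food

Queen St. Cafe
82%
$
Malaysian, Thai

Mexican Grill
85%
$$
Mexican

Deep Fried Everything
52%
$
Pub Food
"""

print(index_by_cuisine(sample))
```

With the real file you would pass the result of open(...).read() instead of sample.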

Normally, I find that if you use names for the dictionary keys, you may have an easier time handling them later.
In the example below, I return a series of dictionaries, one for each restaurant. I also wrap the functionality of processing the values in a method called add_value(), to keep the code more readable.
In my example, I open the file with an explicit UTF-8 encoding. Although not necessary, depending on the characters you are dealing with it may be useful. I'm also using itertools to read the file lines with an iterator. Again, not necessary depending on the case, but it might be useful if you are dealing with really big files.
import copy, itertools

class RestaurantListParser(object):
    file_name = "restaurants.txt"
    base_item = {
        "_type": "undefined",
        "_fields": {
            "name": "undefined",
            "nationality": "undefined",
            "rating": "undefined",
            "pricing": "undefined",
        }
    }

    def add_value(self, formatted_item, field_name, field_value):
        if isinstance(field_value, str):
            # strip and process the values as you need
            field_value = field_value.strip()
            formatted_item["_fields"][field_name] = field_value
        else:
            print('Error parsing field "%s", with value: %s' % (field_name, field_value))

    def generator(self, file_name):
        with open(file_name, encoding='utf-8') as file:
            while True:
                # read the next record: 4 data lines plus the blank separator
                lines = tuple(itertools.islice(file, 5))
                if not lines: break
                # Initialize our dictionary for this item
                formatted_item = copy.deepcopy(self.base_item)
                if "," not in lines[3]:
                    formatted_item['_type'] = lines[3].strip()
                else:
                    formatted_item['_type'] = lines[3].split(',')[1].strip()
                    self.add_value(formatted_item, 'nationality', lines[3].split(',')[0])
                self.add_value(formatted_item, 'name', lines[0])
                self.add_value(formatted_item, 'rating', lines[1])
                self.add_value(formatted_item, 'pricing', lines[2])
                yield formatted_item

    def split_by_type(self):
        d = {}
        for restaurant in self.generator(self.file_name):
            if restaurant['_type'] not in d:
                d[restaurant['_type']] = [restaurant['_fields']]
            else:
                d[restaurant['_type']] += [restaurant['_fields']]
        return d
Then, if you run:
p = RestaurantListParser()
print(p.split_by_type())
You should get:
{
'Mexican': [{
'name': 'Mexican Grill',
'nationality': 'undefined',
'pricing': '$$',
'rating': '85%'
}],
'Pub Food': [{
'name': 'Georgie Porgie',
'nationality': 'Canadian',
'pricing': '$$$',
'rating': '87%'
}, {
'name': 'Deep Fried Everything',
'nationality': 'undefined',
'pricing': '$',
'rating': '52%'
}],
'Thai': [{
'name': 'Queen St. Cafe',
'nationality': 'Malaysian',
'pricing': '$',
'rating': '82%'
}]
}
Your solution is simple, so it's ok. I'd just like to mention a couple of ideas that come to mind when I think about this kind of problem.

Here's another take, using defaultdict and split to simplify things.
from collections import defaultdict

record_keys = ['name', 'rating', 'price', 'cuisine']

def load(file):
    with open(file) as file:
        data = file.read()
    restaurants = []
    # chop up input on each blank line (2 newlines in a row);
    # strip() first so trailing newlines don't produce an empty record
    for record in data.strip().split("\n\n"):
        fields = record.split("\n")
        # build a dictionary by zipping together the fixed set
        # of field names and the values from this particular record
        restaurant = dict(zip(record_keys, fields))
        # split chops apart the cuisine types on commas, then _.strip()
        # removes any leading/trailing whitespace on each type of cuisine
        restaurant['cuisine'] = [_.strip() for _ in restaurant['cuisine'].split(",")]
        restaurants.append(restaurant)
    return restaurants

def build_index(database, key, value):
    index = defaultdict(set)
    for record in database:
        for v in record.get(key, []):
            # defaultdict creates an empty set the first time v is seen,
            # then add() puts the value into it
            index[v].add(record[value])
    return index

restaurant_db = load('/var/tmp/r')
print(restaurant_db)
by_type = build_index(restaurant_db, 'cuisine', 'name')
print(by_type)

Related

Data Structure Option

I'm wondering what data structure I should use to store information about chemical elements that I have in a text file. My program should read and process input from the user. If the user enters an integer, the program should display the symbol and name of the element with that number of protons. If the user enters a string, the program should display the number of protons for the element with that name or symbol.
The text file is formatted as below
# element.txt
1,H,Hydrogen
2,He,Helium
3,Li,Lithium
4,Be,Beryllium
...
I thought of a dictionary, but figured that mapping a string to a list could be tricky, since my program has to respond differently depending on whether the user provides an integer or a string.
You shouldn't be worried about the "performance" of looking for an element:
There are no more than 200 elements, which is a small number for a computer;
Since the program interacts with a human user, the human will be orders of magnitude slower than the computer anyway.
Option 1: pandas.DataFrame
Hence I suggest a simple pandas DataFrame:
import pandas as pd

# element.txt has no header row, so don't let read_csv use the first element as one;
# comment='#' skips a line such as "# element.txt" if it is present
df = pd.read_csv('element.txt', header=None, names=['Number', 'Symbol', 'Name'], comment='#')

def get_column_and_key(s):
    s = s.strip()
    try:
        k = int(s)
        return 'Number', k
    except ValueError:
        if len(s) <= 2:
            return 'Symbol', s
        else:
            return 'Name', s

def find_element(s):
    column, key = get_column_and_key(s)
    return df[df[column] == key]

def play():
    keep_going = True
    while keep_going:
        s = input('>>>> ')
        if s.startswith('q'):  # unlike s[0], also safe for empty input
            keep_going = False
        else:
            print(find_element(s))

if __name__ == '__main__':
    play()
See also:
Finding elements in a pandas dataframe
Option 2: three redundant dicts
One of Python's most used data structures is the dict. Here we have three different possible keys, so we'll use three dicts.
import csv

with open('element.txt', 'r') as f:
    data = csv.reader(f)
    elements_by_num = {}
    elements_by_symbol = {}
    elements_by_name = {}
    for row in data:
        if not row or row[0].startswith('#'):
            continue  # skip blank lines and comment lines such as "# element.txt"
        num, symbol, name = int(row[0]), row[1], row[2]
        elements_by_num[num] = num, symbol, name
        elements_by_symbol[symbol] = num, symbol, name
        elements_by_name[name] = num, symbol, name

def get_dict_and_key(s):
    s = s.strip()
    try:
        k = int(s)
        return elements_by_num, k
    except ValueError:
        if len(s) <= 2:
            return elements_by_symbol, s
        else:
            return elements_by_name, s

def find_element(s):
    d, key = get_dict_and_key(s)
    return d[key]

def play():
    keep_going = True
    while keep_going:
        s = input('>>>> ')
        if s.startswith('q'):  # unlike s[0], also safe for empty input
            keep_going = False
        else:
            print(find_element(s))

if __name__ == '__main__':
    play()
You are right that it is tricky. However, I suggest you just make three dictionaries. You certainly can just store the data in a 2d list, but that'd be way harder to make and access than using three dicts. If you desire, you can join the three dicts into one. I personally wouldn't, but the final choice is always up to you.
number = {1: ("H", "Hydrogen"), 2: ...}
symbol = {"H": (1, "Hydrogen"), "He": ...}
name = {"Hydrogen": (1, "H"), "Helium": ...}
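If you do want a single dictionary, one way (a sketch, not the only option) is to store the same tuple under all three keys, lower-casing the text keys so that "he" and "He" both work. It assumes element.txt has no header line, as in the sample; build_lookup and this find_element are hypothetical names.

```python
import csv


def build_lookup(lines):
    """Index each (number, symbol, name) row by number, symbol and name."""
    lookup = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        num, symbol, name = int(row[0]), row[1].strip(), row[2].strip()
        element = (num, symbol, name)
        lookup[num] = element             # int key: the number of protons
        lookup[symbol.lower()] = element  # e.g. "he"
        lookup[name.lower()] = element    # e.g. "helium"
    return lookup


def find_element(lookup, s):
    s = s.strip()
    key = int(s) if s.isdigit() else s.lower()
    return lookup.get(key)  # None if nothing matches


rows = ["1,H,Hydrogen", "2,He,Helium", "3,Li,Lithium", "4,Be,Beryllium"]
lookup = build_lookup(rows)
print(find_element(lookup, "2"))   # (2, 'He', 'Helium')
print(find_element(lookup, "li"))  # (3, 'Li', 'Lithium')
```

With the real file, build_lookup(f) works on an open file object too, since csv.reader accepts any iterable of lines. Integer and string keys never collide, so one dict is enough here.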
If you want to get into databases and query languages such as SQL, I suggest looking into sqlite3. It's a classic, so it's well documented.
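A minimal sqlite3 sketch of the same lookup, using an in-memory database. The table and column names are made up for illustration, and a real program would insert the rows read from element.txt instead of the hard-coded list.

```python
import sqlite3

# Hypothetical sample rows; a real program would load these from element.txt.
rows = [(1, "H", "Hydrogen"), (2, "He", "Helium"), (3, "Li", "Lithium")]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE elements (number INTEGER PRIMARY KEY, symbol TEXT, name TEXT)")
conn.executemany("INSERT INTO elements VALUES (?, ?, ?)", rows)


def find_element(conn, s):
    s = s.strip()
    if s.isdigit():
        cur = conn.execute(
            "SELECT number, symbol, name FROM elements WHERE number = ?", (int(s),))
    else:
        # Match either the symbol or the name, ignoring case
        cur = conn.execute(
            "SELECT number, symbol, name FROM elements "
            "WHERE lower(symbol) = lower(?) OR lower(name) = lower(?)",
            (s, s))
    return cur.fetchone()  # a tuple, or None when nothing matches


print(find_element(conn, "2"))
print(find_element(conn, "lithium"))
```

The ? placeholders let sqlite3 handle quoting of the user's input, rather than building the SQL string by hand.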

How to add data with same key in dictionary [duplicate]

I have a text file which contains duplicate car registration numbers with different values, like so:
EDF768, Bill Meyer, 2456, Vet_Parking
TY5678, Jane Miller, 8987, AgHort_Parking
GEF123, Jill Black, 3456, Creche_Parking
ABC234, Fred Greenside, 2345, AgHort_Parking
GH7682, Clara Hill, 7689, AgHort_Parking
JU9807, Jacky Blair, 7867, Vet_Parking
KLOI98, Martha Miller, 4563, Vet_Parking
ADF645, Cloe Freckle, 6789, Vet_Parking
DF7800, Jacko Frizzle, 4532, Creche_Parking
WER546, Olga Grey, 9898, Creche_Parking
HUY768, Wilbur Matty, 8912, Creche_Parking
EDF768, Jenny Meyer, 9987, Vet_Parking
TY5678, Jo King, 8987, AgHort_Parking
JU9807, Mike Green, 3212, Vet_Parking
I want to create a dictionary from this data, which uses the registration numbers (first column) as keys and the data from the rest of the line for values.
I wrote this code:
data_dict = {}
data_list = []

def createDictionaryModified(filename):
    path = r"C:\Users\user\Desktop"  # raw string: otherwise "\U" is read as an escape sequence
    basename = "ParkingData_Part3.txt"
    filename = path + "//" + basename
    file = open(filename)
    contents = file.read()
    print(contents,"\n")
    data_list = [lines.split(",") for lines in contents.split("\n")]
    for line in data_list:
        regNumber = line[0]
        name = line[1]
        phoneExtn = line[2]
        carpark = line[3].strip()
        details = (name,phoneExtn,carpark)
        data_dict[regNumber] = details
    print(data_dict,"\n")
    print(data_dict.items(),"\n")
    print(data_dict.values())
The problem is that the data file contains duplicate values for the registration numbers. When I try to store them in the same dictionary with data_dict[regNumber] = details, the old value is overwritten.
How do I make a dictionary with duplicate keys?
Sometimes people want to "combine" or "merge" multiple existing dictionaries by just putting all the items into a single dict, and are surprised or annoyed that duplicate keys are overwritten. See the related question How to merge dicts, collecting values from matching keys? for dealing with this problem.
Python dictionaries don't support duplicate keys. One way around this is to store lists or sets inside the dictionary.
One easy way to achieve this is by using defaultdict:
from collections import defaultdict
data_dict = defaultdict(list)
All you have to do is replace
data_dict[regNumber] = details
with
data_dict[regNumber].append(details)
and you'll get a dictionary of lists.
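Applied to the parking data, a minimal sketch might look like this. It parses a few of the question's lines from a string so it runs on its own (the question reads the same text from ParkingData_Part3.txt), and unlike the question's code it strips the spaces after each comma.

```python
from collections import defaultdict

contents = """EDF768, Bill Meyer, 2456, Vet_Parking
TY5678, Jane Miller, 8987, AgHort_Parking
EDF768, Jenny Meyer, 9987, Vet_Parking
"""

data_dict = defaultdict(list)
for line in contents.splitlines():
    if not line.strip():
        continue  # skip blank lines
    regNumber, name, phoneExtn, carpark = [field.strip() for field in line.split(",")]
    # append instead of assign, so a repeated registration keeps both entries
    data_dict[regNumber].append((name, phoneExtn, carpark))

print(dict(data_dict))
```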
You can change the behavior of built-in types in Python by subclassing them. For your case it's really easy to create a dict subclass that will store duplicated values in lists under the same key automatically:
class Dictlist(dict):
    def __setitem__(self, key, value):
        try:
            self[key]
        except KeyError:
            super(Dictlist, self).__setitem__(key, [])
        self[key].append(value)
Output example:
>>> d = Dictlist()
>>> d['test'] = 1
>>> d['test'] = 2
>>> d['test'] = 3
>>> d
{'test': [1, 2, 3]}
>>> d['other'] = 100
>>> d
{'test': [1, 2, 3], 'other': [100]}
Rather than using a defaultdict or messing around with membership tests or manual exception handling, use the setdefault method to add new empty lists to the dictionary when they're needed:
results = {} # use a normal dictionary for our output
for k, v in some_data: # the keys may be duplicates
results.setdefault(k, []).append(v) # magic happens here!
setdefault checks whether the first argument (the key) is already in the dictionary. If it isn't, it assigns the second argument (the default value, an empty list in this case) as the new value for the key. If the key does exist, nothing special is done (the default goes unused). In either case, the value (whether old or new) is returned, so we can unconditionally call append on it (knowing it will always be a list).
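The behaviour described above can be checked directly. This small sketch, using names from the parking data, shows that setdefault returns the list already stored under a key and ignores the default on later calls.

```python
results = {}

first = results.setdefault("EDF768", [])   # key missing: stores [] and returns it
first.append("Bill Meyer")

second = results.setdefault("EDF768", [])  # key present: this new [] is unused
second.append("Jenny Meyer")

print(second is first)  # True: both names refer to the same stored list
print(results)          # {'EDF768': ['Bill Meyer', 'Jenny Meyer']}
```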
You can't have a dict with duplicate keys, by definition!
Instead, you can use a single key and, as the value, a list of the elements that had that key.
So you can follow these steps:
1. See if the current element's key (from your initial set) is in the final dict. If it is, go to step 3.
2. Add the key to the dict, with an empty list as its value.
3. Append the new value to the dict[key] list.
4. Repeat steps 1-3 for every element.
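The steps above, written out with an explicit membership test. This is a sketch; group_pairs is a hypothetical name and the input is any iterable of (key, value) pairs.

```python
def group_pairs(pairs):
    grouped = {}
    for key, value in pairs:        # step 4: repeat steps 1-3 for every pair
        if key not in grouped:      # step 1: is the key already in the dict?
            grouped[key] = []       # step 2: if not, add it with an empty list
        grouped[key].append(value)  # step 3: append the value to its list
    return grouped


pairs = [("EDF768", "Bill Meyer"), ("TY5678", "Jane Miller"), ("EDF768", "Jenny Meyer")]
print(group_pairs(pairs))
# {'EDF768': ['Bill Meyer', 'Jenny Meyer'], 'TY5678': ['Jane Miller']}
```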
If you want to have lists only when they are necessary, and values in any other cases, then you can do this:
class DictList(dict):
    def __setitem__(self, key, value):
        try:
            # Assumes there is a list on the key
            self[key].append(value)
        except KeyError:  # If it fails, because there is no key
            super(DictList, self).__setitem__(key, value)
        except AttributeError:  # If it fails because it is not a list
            super(DictList, self).__setitem__(key, [self[key], value])
You can then do the following:
dl = DictList()
dl['a'] = 1
dl['b'] = 2
dl['b'] = 3
Which will store the following {'a': 1, 'b': [2, 3]}.
I tend to use this implementation when I want to have reverse/inverse dictionaries, in which case I simply do:
my_dict = {1: 'a', 2: 'b', 3: 'b'}
rev = DictList()
for k, v in my_dict.items():
    rev[v] = k
Which will generate the same output as above: {'a': 1, 'b': [2, 3]}.
CAVEAT: This implementation relies on the non-existence of the append method (in the values you are storing). This might produce unexpected results if the values you are storing are lists. For example,
dl = DictList()
dl['a'] = 1
dl['b'] = [2]
dl['b'] = 3
would produce the same result as before, {'a': 1, 'b': [2, 3]}, but one might expect the following: {'a': 1, 'b': [[2], 3]}.
You can refer to the following article:
http://www.wellho.net/mouth/3934_Multiple-identical-keys-in-a-Python-dict-yes-you-can-.html
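In a dict, if a key is an instance of a class that doesn't override __eq__ and __hash__, keys are compared by identity, so two separate objects never count as duplicates, even if they print the same.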
For example:
class p(object):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __str__(self):
        return self.name

d = {p('k'): 1, p('k'): 2}
You can't have duplicated keys in a dictionary. Use a dict of lists:
for line in data_list:
    regNumber = line[0]
    name = line[1]
    phoneExtn = line[2]
    carpark = line[3].strip()
    details = (name,phoneExtn,carpark)
    if regNumber not in data_dict:  # dict.has_key() no longer exists in Python 3
        data_dict[regNumber] = [details]
    else:
        data_dict[regNumber].append(details)
It's a pretty old question, but maybe my solution will help someone.
By overriding the __hash__ magic method, you can store equal objects as separate keys in a dict.
Example:
import string
from random import choices

class DictStr(str):
    """
    This class behaves exactly like the str class but
    can be duplicated in a dict
    """
    def __new__(cls, value='', custom_id='', id_length=64):
        # If you want to know why I use __new__ instead of __init__
        # SEE: https://stackoverflow.com/a/2673863/9917276
        obj = str.__new__(cls, value)
        if custom_id:
            obj.id = custom_id
        else:
            # Make a random id of length id_length (64 by default)
            choice_str = string.ascii_letters + string.digits
            obj.id = ''.join(choices(choice_str, k=id_length))
        return obj

    def __hash__(self) -> int:
        return self.id.__hash__()
Now let's create a dict:
>>> a_1 = DictStr('a')
>>> a_2 = DictStr('a')
>>> a_3 = 'a'
>>> a_1
'a'
>>> a_2
'a'
>>> a_1 == a_2 == a_3
True
>>> d = dict()
>>> d[a_1] = 'some_data'
>>> d[a_2] = 'other'
>>> print(d)
{'a': 'some_data', 'a': 'other'}
NOTE: The same approach can be applied to other built-in types such as int and float.
EXPLANATION:
Almost any object can be used as a key in a dict (known as a HashMap or HashTable in other languages), but the dict needs a way to tell keys apart, because it knows nothing about the objects themselves.
For this purpose, an object used as a key has to provide a number computed from it, its hash, which the dict uses to locate the key; it then compares candidate keys with ==.
Because dictionaries are so widely used, most programming languages produce this number through a built-in hash method (__hash__ in Python) that the dict calls when it stores or looks up a key.
So if you change the __hash__ method of your class, you change how its instances behave as dictionary keys. Note that this breaks Python's rule that objects which compare equal must have equal hashes, so a plain lookup such as d['a'] will almost certainly not find either entry.
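To see the roles of __hash__ and __eq__ concretely, here is a small comparison (the class names are hypothetical): a class that defines both consistently collapses equal keys into one entry, while a class that keeps the default identity-based behaviour, like class p in the earlier answer, gets one entry per object.

```python
class NamedKey:
    """Equality and hash both come from name, so equal keys share one entry."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, NamedKey) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class IdentityKey:
    """No __eq__/__hash__ override: compared by identity, like class p."""

    def __init__(self, name):
        self.name = name


d1 = {NamedKey("k"): 1, NamedKey("k"): 2}
d2 = {IdentityKey("k"): 1, IdentityKey("k"): 2}
print(len(d1))  # 1: the second key equals the first, so its value replaces 1
print(len(d2))  # 2: two distinct objects, two entries
```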
Dictionaries do not support duplicate keys; instead, you can use a defaultdict.
Below is an example of how to use defaultdict in Python 3 to solve your problem:
from collections import defaultdict

sdict = defaultdict(list)
# contents is the file text read in the question's code
data_list = [lines.split(",") for lines in contents.split("\n")]
for data in data_list:
    key = data.pop(0)
    detail = data
    # defaultdict creates an empty list the first time a key is seen,
    # so every detail can simply be appended
    sdict[key].append(detail)
print("\n", dict(sdict))
The code above produces the following output:
{'EDF768': [[' Bill Meyer', ' 2456', ' Vet_Parking'], [' Jenny Meyer', ' 9987', ' Vet_Parking']], 'TY5678': [[' Jane Miller', ' 8987', ' AgHort_Parking'], [' Jo King', ' 8987', ' AgHort_Parking']], 'GEF123': [[' Jill Black', ' 3456', ' Creche_Parking']], 'ABC234': [[' Fred Greenside', ' 2345', ' AgHort_Parking']], 'GH7682': [[' Clara Hill', ' 7689', ' AgHort_Parking']], 'JU9807': [[' Jacky Blair', ' 7867', ' Vet_Parking'], [' Mike Green', ' 3212', ' Vet_Parking']], 'KLOI98': [[' Martha Miller', ' 4563', ' Vet_Parking']], 'ADF645': [[' Cloe Freckle', ' 6789', ' Vet_Parking']], 'DF7800': [[' Jacko Frizzle', ' 4532', ' Creche_Parking']], 'WER546': [[' Olga Grey', ' 9898', ' Creche_Parking']], 'HUY768': [[' Wilbur Matty', ' 8912', ' Creche_Parking']]}

How do I convert a text file into a dictionary

I would like to parse the following text file into a dictionary:
Train A
Travelled 150km
No longer in use
Stored in warehouse
Train B
Travelled 100km
Used by X company
Daily usage
Actively upgrading
The end result dictionary should have Train A and Train B as keys, and the rest of values as list of values:
{
'Train A': ['Travelled 150km', 'No longer in use', 'Stored in warehouse'],
'Train B': ['Travelled 100km', 'Used by X company', 'Daily usage', 'Actively upgrading']
}
I've currently tried
with open('file.txt') as f:
    data = f.read().split('\n')
dict = {}
for i in data:
    key = i[0]
    value = i[1:]
    d[key] = value
print(dict)
I'm really not sure where I'm going wrong. I want to split on the \n after Train A, so that Train A is the key and all the other information listed is the value.
Your units are separated by blank lines - so, you should first split by two newlines, not by one. The following implementation is somewhat inefficient (it splits the same variable twice), but it works, and you can improve it if you want:
[{x.split("\n")[0]: x.split("\n")[1:]} for x in data.split("\n\n")]
#[{'Train A': ['Travelled 150km', 'No longer in use',
#              'Stored in warehouse']},
# {'Train B': ['Travelled 100km', 'Used by X company',
#              'Daily usage', 'Actively upgrading']}]
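If you want one dictionary rather than a list of one-item dictionaries, a dict comprehension works the same way, and splitting each block once with str.partition avoids the double split. This is a sketch assuming, as above, that data holds the whole file contents as a string.

```python
data = """Train A
Travelled 150km
No longer in use
Stored in warehouse

Train B
Travelled 100km
Used by X company
Daily usage
Actively upgrading
"""

# partition("\n") splits each block once: (first line, "\n", the rest)
parts = (block.partition("\n") for block in data.strip().split("\n\n"))
trains = {key: rest.split("\n") for key, _, rest in parts}
print(trains)
```

The strip() call removes the trailing newline, so the last list does not end with an empty string.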
You are close. You need to split the file on empty lines first ('\n\n'), then continue with your idea.
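with open('file.txt') as f:
    # <=== splitting on blank lines is what's missing;
    # strip() drops the trailing newline so the last value has no empty string
    data = f.read().strip().split('\n\n')
print(data)
d = {}
for i in data:
    i = i.split('\n')
    key = i[0]
    value = i[1:]
    d[key] = value
print(d)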

Python 3 Having an error saving a file, opening it by a different program and comparing list values

I am ultimately trying to save variables that my browser finds to a file, so that they can be recalled later to check whether it has already gone through those values. Before I reach that step, I am testing my code and have been running into issues:
First part of my code with no error:
import shelve
shelfFile = shelve.open('mydata')
cats = ['Zophie', 'Pooka', 'Simon']
shelfFile['cats'] = cats
shelfFile.close()
This does what it is intended to do, saves cats to a file.
import shelve
shelfFile = shelve.open('mydata')
cats = shelfFile['cats']
shelfFile.close()
new = 'Zophie', 'Pooka', 'Simon'
if new in cats:
    print('Found it!')
else:
    print("There is an error")
When I run the code it tells me there is an error rather than saying that it found it. Since the list variables are the same, why are they not matching?
I haven't often seen variables declared with a bare comma-separated list as you did: new = 'Zophie', 'Pooka', 'Simon'.
I'm pretty sure you just made a typo and should use a list of names, checking each one:
new = ['Zophie', 'Pooka', 'Simon']
for item in new:
    if item in cats:
        print('Found it!')
    else:
        print("Not found")
or:
new = ['Zophie', 'Pooka', 'Simon']
for item in new:
    if item in cats:
        print(f'Found {item}!')
    else:
        print(f'{item} is not found')
You are checking whether a tuple is an element of a list.
new = 'Zophie', 'Pooka', 'Simon' is a tuple object,
while cats = ['Zophie', 'Pooka', 'Simon'] is a list.
If cats = shelfFile['cats'] returns the list of cats, then what you need to do is:
for n in new:
    if n in cats:
        print('Found it!')
    else:
        print('Not found')
You can demonstrate the logic by running this script
>>> x = "a", "b", "c"
>>> print(x)
('a', 'b', 'c')
>>> y = ["a", "b", "c"]
>>> x in y
False
>>> for n in x:
...     if n in y:
...         print("found")
...
found
found
found
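If the goal is to know whether every saved name is present, rather than printing one message per name, all() or a set comparison does it in one expression. A sketch, with cats standing in for the list loaded from the shelf:

```python
cats = ['Zophie', 'Pooka', 'Simon']  # stands in for shelfFile['cats']
new = 'Zophie', 'Pooka', 'Simon'     # a tuple, as in the question

print(new in cats)                        # False: looks for the whole tuple as one element
print(all(name in cats for name in new))  # True: checks each name
print(set(new) <= set(cats))              # True: every name in new is also in cats
print(list(new) == cats)                  # True: same names in the same order
```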

Never resets list

I am trying to create a calorie counter. The standard input goes like this:
python3 calories.txt < test.txt
Inside calories.txt, each food is listed in the following format: apples 500
The problem I am having is that whenever I calculate the values for a person, it never seems to return to an empty list.
import sys

food = {}
eaten = {}
finished = {}
total = 0

#mappings
def calories(x):
    with open(x,"r") as file:
        for line in file:
            lines = line.strip().split()
            key = " ".join(lines[0:-1])
            value = lines[-1]
            food[key] = value

def calculate(x):
    a = []
    for keys,values in x.items():
        for c in values:
            try:
                a.append(int(food[c]))
            except:
                a.append(100)
        print("before",a)
        a = []
        total = sum(a) # Problem here
        print("after",a)
        print(total)

def main():
    calories(sys.argv[1])
    for line in sys.stdin:
        lines = line.strip().split(',')
        for c in lines:
            values = lines[0]
            keys = lines[1:]
            eaten[values] = keys
    calculate(eaten)

if __name__ == '__main__':
    main()
Edit: I forgot to include what test.txt looks like:
joe,almonds,almonds,blue cheese,cabbage,mayonnaise,cherry pie,cola
mary,apple pie,avocado,broccoli,butter,danish pastry,lettuce,apple
sandy,zuchini,yogurt,veal,tuna,taco,pumpkin pie,macadamia nuts,brazil nuts
trudy,waffles,waffles,waffles,chicken noodle soup,chocolate chip cookie
How to make it easier on yourself:
When reading the calorie data, convert the calories to int() as soon as possible; that way you don't need to do it every time you want to sum something up.
A dictionary has a .get(key, default) method, so using 100 as the default when a food is not found is a one-liner, without try: ... except:.
This works for me without using sys.stdin: I pass the second file as a command-line argument as well, instead of piping it into the program with <.
I modified some of the parsing to remove whitespace, and calculate now returns a list of (name, calories) tuples.
I hope this helps you adapt it to your liking:
import sys

food = {}   # food name -> calories, filled by calories()
eaten = {}  # person -> list of foods eaten

def calories(x):
    with open(x,"r") as file:
        for line in file:
            lines = line.strip().split()
            if not lines:
                continue  # skip blank lines
            key = " ".join(lines[0:-1])
            value = lines[-1].strip()  # ensure no whitespace in the number
            food[key] = int(value)

def getCal(foodlist, defValueUnknown = 100):
    """Get sum / total calories of a list of ingredients, unknown cost 100."""
    return sum(food.get(x, defValueUnknown) for x in foodlist)  # if unknown, assume 100

def calculate(x):
    a = []
    for name,foods in x.items():
        a.append((name, getCal(foods)))  # append as tuple to list for all names/foods eaten
    return a

def main():
    calories(sys.argv[1])
    with open(sys.argv[2]) as f:  # parse as file, not piped in via sys.stdin
        for line in f:
            lines = line.strip().split(',')
            values = lines[0].strip()
            keys = [x.strip() for x in lines[1:]]  # ensure no whitespace in food names
            eaten[values] = keys
    calced = calculate(eaten)  # calculate after all are read into the dict
    print(calced)

if __name__ == '__main__':
    main()
Output:
[('joe', 1400), ('mary', 1400), ('sandy', 1600), ('trudy', 1000)]
Using sys.stdin and piping just led to my console blinking and waiting for manual input; maybe that's related to VS.
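For reference, the smallest change to the question's original calculate is to take the sum before a is reset, and to start a fresh list for each person. A self-contained sketch, with a hypothetical calorie table passed in as a parameter instead of the global food dict:

```python
def calculate(eaten, food, default=100):
    """Return {person: total calories}; unknown foods count as `default`."""
    totals = {}
    for person, items in eaten.items():
        a = []  # fresh list for each person
        for item in items:
            a.append(int(food.get(item, default)))
        totals[person] = sum(a)  # sum before a is reset on the next pass
    return totals


food = {"almonds": "300", "cola": "150"}  # hypothetical values, stored as strings as in the question
eaten = {"joe": ["almonds", "almonds", "cola"], "trudy": ["waffles"]}
print(calculate(eaten, food))  # {'joe': 750, 'trudy': 100}
```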
