I would like to parse the following text file into a dictionary:
Train A
Travelled 150km
No longer in use
Stored in warehouse

Train B
Travelled 100km
Used by X company
Daily usage
Actively upgrading
The resulting dictionary should have Train A and Train B as keys, and the remaining lines of each block as a list of values:
{
'Train A': ['Travelled 150km', 'No longer in use', 'Stored in warehouse'],
'Train B': ['Travelled 100km', 'Used by X company', 'Daily usage', 'Actively upgrading']
}
This is what I've tried so far:
with open('file.txt') as f:
    data = f.read().split('\n')
d = {}
for i in data:
    key = i[0]
    value = i[1:]
    d[key] = value
print(d)
I'm really not sure where I'm going wrong. I want to split on the \n after Train A, so that Train A is the key and all the other information listed under it is the value.
Your units are separated by blank lines, so you should first split on two newlines, not one. The following implementation is somewhat inefficient (it splits each block twice), but it works, and you can improve it if you want. Here data is the whole file as a single string, i.e. the result of f.read(), and the result is a list of one-key dictionaries:
[{x.split("\n")[0]: x.split("\n")[1:]} for x in data.split("\n\n")]
#[{'Train A': ['Travelled 150km', 'No longer in use',
#              'Stored in warehouse']},
# {'Train B': ['Travelled 100km', 'Used by X company',
#              'Daily usage', 'Actively upgrading']}]
You are close. You need to split the file's contents on the empty lines first ('\n\n'), then continue with your idea.
with open('file.txt') as f:
    data = f.read().split('\n\n') # <=== this is what's missing
print(data)
d = {}
for i in data:
    i = i.split('\n')
    key = i[0]
    value = i[1:]
    d[key] = value
print(d)
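The same idea can be wrapped in a small function. This is only a sketch: the name parse_blocks is hypothetical, and it assumes the blocks are separated by exactly one blank line, as both answers above do. It also skips stray empty lines inside a block so that a trailing newline at the end of the file does not produce an empty key.

```python
def parse_blocks(text):
    """Map the first line of each blank-line-separated block to its remaining lines."""
    result = {}
    for block in text.strip().split("\n\n"):
        # Drop empty lines and surrounding whitespace within the block.
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if lines:
            result[lines[0]] = lines[1:]
    return result


# Inline sample standing in for the contents of file.txt.
sample = (
    "Train A\nTravelled 150km\nNo longer in use\nStored in warehouse\n"
    "\n"
    "Train B\nTravelled 100km\nUsed by X company\nDaily usage\nActively upgrading\n"
)
trains = parse_blocks(sample)
print(trains)
```

To use it on a real file, pass the result of f.read() instead of the inline sample.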
Related
I am learning Python. I tried to get the keys of a dictionary, but I only get the last key. As I understand it, the keys() method is used to get all the keys in a dictionary.
My questions are:
1. Why can't I get all the keys?
2. If I have a dictionary, how can I get the value if I know the key? E.g. dict = {'Ben':8, 'Joe':7, 'Mary' : 9}. How can I input the key "Ben" so that the program outputs the value 8? The tutorial says that the key must be immutable. This constraint seems very inconvenient when trying to get a value for a given key.
Any suggestions would be highly appreciated.
Here is my code.
import os, tarfile, urllib
work_path = os.getcwd()
input_control_file = "input_control"
input_control = work_path + "/" + input_control_file
#open control file if file exist
#read setting info
try:
    #if the file does not exist,
    #then it would throw an IOError
    f = open(input_control, 'r')
    #define dictionary/hash table
    for LINE in f:
        LINE = LINE.strip() #remove leading and trailing whitespace
        lst = LINE.split() #split string into lists
        lst[0] = lst[0].split(":")[0]
        dic = {lst[0].strip():lst[1].strip()}
except IOError:
    # print(os.error) will print <class 'OSError'>
    print("Reading file error. File " + input_control + " does not exist.")
#get keys
def getkeys(dict):
    return list(dict.keys())
print("l39")
print(getkeys(dic))
print("end")
Below are the outputs.
l39
['source_type']
end
The reason is that you are reassigning the variable dic on every iteration of the for loop. You are not updating or adding to the dictionary; you are replacing it with a new one-entry dictionary each time, so dic ends up holding only the last entry. You can change your for loop to:
dic = {}
for LINE in f:
    LINE = LINE.strip() #remove leading and trailing whitespace
    lst = LINE.split() #split string into lists
    lst[0] = lst[0].split(":")[0]
    dic.update({lst[0].strip():lst[1].strip()}) # update the dictionary with new values.
For your other question, if you have the dictionary dic = {'Ben':8, 'Joe':7, 'Mary' : 9}, then you can get the value by: dic['Ben']. It will return the value 8 or will raise KeyError if key Ben is not found in the dictionary. To avoid KeyError, you can use the get() method of dictionary. It will return None if provided key is not found in the dictionary.
val = dic['Ben'] # returns 8
val = dic['Hen'] # will raise KeyError
val = dic.get('Hen') # will return None
In your for loop, you are re-initializing the dictionary on every iteration, while you need to update it, i.e., add the key-value pair to the pre-existing dictionary. For this, create dic = {} once before the loop and then use
dic.update({lst[0].strip() : lst[1].strip()})
This adds the key-value pair to the dictionary. Now, when you use dic.keys(), you will get all the keys of dic. In Python 3 this is a view object rather than a list, so wrap it in list(), as your getkeys() function already does, if you need an actual list.
As for your second question, access the dictionary, just like accessing a list, except that list is accessed with indices, and dictionary will be accessed by keys. Say, you have a list and a dictionary as
lst = [1, 2, 3, 4, 5]
dic = {'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4, 'e' : 5}
To get value 2 from list, you do lst[1], i.e., value at index 1. Similarly, if you want to get the value 2 from dictionary, do dic['b'], i.e., value of key 'b'. It is as simple as that.
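Both points can be put together in a short sketch. The function name parse_settings and the sample lines are hypothetical, and it assumes each line has the form "key: value" (it splits on the first colon rather than on whitespace, which is a slightly different parse from the original loop). It uses plain item assignment, which has the same effect here as dic.update({...}).

```python
def parse_settings(lines):
    """Build one dictionary from 'key: value' lines, keeping every entry."""
    settings = {}  # created once, before the loop
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()  # add to the existing dictionary
    return settings


# Hypothetical control-file contents, for illustration only.
sample_lines = ["source_type: tar", "work_dir: /tmp/work", ""]
settings = parse_settings(sample_lines)

print(list(settings.keys()))                  # all keys, as a list
print(settings["source_type"])                # lookup by key
print(settings.get("missing"))                # None instead of KeyError
print(settings.get("missing", "fallback"))    # explicit default value
```

String keys such as "Ben" are immutable, so the immutability requirement does not get in the way of this kind of lookup.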
I have data in a CSV file with 2 columns: the first column contains the member id and the second contains characteristics as key-value pairs (nested one under another).
I have seen code online that converts simple key-value pairs, but I have not been able to transform data like what I have shown above.
I want to transform this data into an Excel table as below.
I did it with this XlsxWriter package, so first you have to install it by running pip install XlsxWriter command.
import csv # to read csv file
import xlsxwriter # to write xlsx file
import ast
# you can change these names according to your local ones
csv_file = 'data.csv'
xlsx_file = 'data.xlsx'
# read the csv file and get all the JSON values into data list
data = []
with open(csv_file, 'r') as csvFile:
    # read line by line in csv file
    reader = csv.reader(csvFile)
    # convert every line into list and select the JSON values
    for row in list(reader)[1:]:
        # csv are comma separated, so combine all the necessary
        # part of the json with comma
        json_to_str = ','.join(row[1:])
        # convert it to python dictionary
        str_to_dict = ast.literal_eval(json_to_str)
        # append those completed JSON into the data list
        data.append(str_to_dict)
# define the excel file
workbook = xlsxwriter.Workbook(xlsx_file)
# create a sheet for our work
worksheet = workbook.add_worksheet()
# cell format for merge fields with bold and align center
# letters and design border
merge_format = workbook.add_format({
    'bold': 1,
    'border': 1,
    'align': 'center',
    'valign': 'vcenter'})
# other cell format to design the border
cell_format = workbook.add_format({
    'border': 1,
})
# create the header section dynamically
first_col = 0
last_col = 0
for index, value in enumerate(data[0].items()):
    if isinstance(value[1], dict):
        # this branch means the JSON key holds something
        # other than a single value, like a dict or list
        last_col += len(value[1].keys())
        worksheet.merge_range(first_row=0,
                              first_col=first_col,
                              last_row=0,
                              last_col=last_col,
                              data=value[0],
                              cell_format=merge_format)
        for k, v in value[1].items():
            # this goes one level deeper into the value, if it exists
            worksheet.write(1, first_col, k, merge_format)
            first_col += 1
        first_col = last_col + 1
    else:
        # 'age' has only one value, so this else section
        # creates normal headers like 'age'
        worksheet.write(1, first_col, value[0], merge_format)
        first_col += 1
# now we know how many columns exist in the
# excel, and set the width to 20
worksheet.set_column(first_col=0, last_col=last_col, width=20)
# filling values to excel file
for index, value in enumerate(data):
    last_col = 0
    for k, v in value.items():
        if isinstance(v, dict):
            # this handles values that are dictionaries
            for k1, v1 in v.items():
                if isinstance(v1, list):
                    # this will capture last 'type' list (['Grass', 'Hardball'])
                    # in the 'conditions'
                    worksheet.write(index + 2, last_col, ', '.join(v1), cell_format)
                else:
                    # just filling other values other than list
                    worksheet.write(index + 2, last_col, v1, cell_format)
                last_col += 1
        else:
            # this handles a single value other than dict or list
            worksheet.write(index + 2, last_col, v, cell_format)
            last_col += 1
# finally close to create the excel file
workbook.close()
I commented most of the lines to make the code easier to follow, since you are very new to Python. If anything is unclear, let me know and I'll explain as much as I can. I also used the built-in enumerate() function, which is useful for numbering the items in a list. Here is its description and a small example, taken directly from the official documentation:
Return an enumerate object. iterable must be a sequence, an iterator, or some other object which supports iteration. The __next__() method of the iterator returned by enumerate() returns a tuple containing a count (from start which defaults to 0) and the values obtained from iterating over iterable.
>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
Here is my csv file,
and here is the final output of the excel file. I just merged the duplicate header values (matchruns and conditions).
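The core of the approach above is walking a nested dictionary and turning each leaf into one column. That step can be sketched on its own, without any Excel writing. The function name flatten and the sample record are hypothetical (the keys are modelled on the 'age', 'matchruns' and 'conditions' fields mentioned in the comments, but the values are made up), and joining parent and child keys with a dot is just one possible naming choice.

```python
def flatten(record, parent=""):
    """Flatten a nested dict into {'parent.child': value}; lists become comma-joined strings."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the parent name along.
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            # A list such as ['Grass', 'Hardball'] becomes a single cell.
            flat[name] = ", ".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat


# Made-up record for illustration only.
sample = {
    "age": 25,
    "matchruns": {"odi": 100, "test": 250},
    "conditions": {"type": ["Grass", "Hardball"]},
}
row = flatten(sample)
print(list(row.keys()))    # column headers
print(list(row.values()))  # one row of cell values
```

Each flattened record then maps directly onto one spreadsheet row, with the keys as headers.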
I'm trying to make a function that opens a textfile, counts the words and then returns the top 7 words in percent. However when I run the code with "return" it only retrieves the first value but "print" gives me what I need + a "None".
def count_wfrequency(fname):
    with open(fname,"r") as f:
        dictionary = dict()
        for line in f:
            line = line.rstrip()
            delaorden = line.split()
            for word in delaorden:
                dictionary[word] = dictionary.get(word,0) + 1
        tmp = list()
        for key, value in dictionary.items():
            tmp.append((value, key))
        tmp.sort(reverse=True)
        for value, key in tmp[:7]:
            b = ("{0:.0f}%".format(value / len(tmp)* 100))
            print(key, b, end=", ")
the 9%, to 6%, of 5%, and 4%, in 3%, a 3%, would 2%, Frequency of words: None
What am I doing wrong?
edit: To clarify, I want it to print "Frequency of words: the 9%, to 6%, ..." and so on.
Trying to use the function like this:
if args.word_frequency:
    print("Frequency of words: {}".format(funk.count_wfrequency(args.filename)))
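A function that only prints and never reaches a return statement returns None, which is why the caller's format() call shows None after the words that were already printed; a return inside the loop, on the other hand, ends the function on the first iteration. One possible way around both, as a sketch: collect the pieces in a list and return a single joined string. The name word_frequency_summary is hypothetical, the function takes any iterable of lines (an open file works), and it assumes the percentage should be relative to the total number of words, whereas the code above divides by the number of distinct words.

```python
from collections import Counter


def word_frequency_summary(lines, top=7):
    """Return the `top` most frequent words as 'word NN%, word NN%, ...'."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    total = sum(counts.values())  # assumption: percent of all words
    if total == 0:
        return ""
    parts = [
        "{} {:.0f}%".format(word, count / total * 100)
        for word, count in counts.most_common(top)
    ]
    return ", ".join(parts)  # build the whole string, then return it once


# Inline lines standing in for an open text file.
summary = word_frequency_summary(["a a a b", "b c"])
print("Frequency of words: {}".format(summary))
```

With a real file, the call would look like word_frequency_summary(open_file_object) inside a with block.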
I'm fairly new to Python but I haven't found the answer to this particular problem.
I am writing a simple recommendation program and I need to have a dictionary where cuisine is a key and name of a restaurant is a value. There are a few instances where I have to split a string of a few cuisine names and make sure all other restaurants (values) which have the same cuisine get assigned to the same cuisine (key). Here's a part of a file:
Georgie Porgie
87%
$$$
Canadian, Pub Food

Queen St. Cafe
82%
$
Malaysian, Thai

Mexican Grill
85%
$$
Mexican

Deep Fried Everything
52%
$
Pub Food
So here only the first and the last restaurants share a cuisine, but there are more later in the file.
And here is my code:
def new(file):
    file = "/.../Restaurants.txt"
    d = {}
    key = []
    with open(file) as file:
        lines = file.readlines()
    for i in range(len(lines)):
        if i % 5 == 0:
            if "," not in lines[i + 3]:
                d[lines[i + 3].strip()] = [lines[i].strip()]
            else:
                key += (lines[i + 3].strip().split(', '))
                for j in key:
                    if j not in d:
                        d[j] = [lines[i].strip()]
                    else:
                        d[j].append(lines[i].strip())
    return d
It gets all the keys and values printed but it doesn't assign two values to the same key where it should. Also, with this last 'else' statement, the second restaurant is assigned to the wrong key as a second value. This should not happen. I would appreciate any comments or help.
In the case where there is only one category, you don't check whether the key is already in the dictionary, so an existing entry gets overwritten. You should handle this the same way as in the case of multiple categories, and then it works fine.
I don't know why you take file as an argument when you immediately overwrite it inside the function.
Additionally, you should build a fresh 'key' list for each restaurant with =, not += (which keeps adding to the existing 'key' list).
When you check whether j is in the dictionary, a clean way is to check whether j is in the keys (d.keys()).
def new(file):
    file = "/.../Restaurants.txt"
    d = {}
    key = []
    with open(file) as file:
        lines = file.readlines()
    for i in range(len(lines)):
        if i % 5 == 0:
            if "," not in lines[i + 3]:
                # strip once so that the lookup and the insert use the same key
                cuisine = lines[i + 3].strip()
                if cuisine not in d.keys():
                    d[cuisine] = [lines[i].strip()]
                else:
                    d[cuisine].append(lines[i].strip())
            else:
                key = (lines[i + 3].strip().split(', '))
                for j in key:
                    if j not in d.keys():
                        d[j] = [lines[i].strip()]
                    else:
                        d[j].append(lines[i].strip())
    return d
Normally, I find that if you use names for the dictionary keys, you may have an easier time handling them later.
In the example below, I return a series of dictionaries, one for each restaurant. I also wrap the functionality of processing the values in a method called add_value(), to keep the code more readable.
In my example, I'm using codecs to decode the value. Although not necessary, depending on the characters you are dealing with it may be useful. I'm also using itertools to read the file lines with an iterator. Again, not necessary depending on the case, but might be useful if you are dealing with really big files.
# Note: this is Python 2 code (print statement, basestring).
import copy, itertools, codecs
class RestaurantListParser(object):
    file_name = "restaurants.txt"
    base_item = {
        "_type": "undefined",
        "_fields": {
            "name": "undefined",
            "nationality": "undefined",
            "rating": "undefined",
            "pricing": "undefined",
        }
    }
    def add_value(self, formatted_item, field_name, field_value):
        if isinstance(field_value, basestring):
            # handle encoding, strip, process the values as you need.
            field_value = codecs.encode(field_value, 'utf-8').strip()
            formatted_item["_fields"][field_name] = field_value
        else:
            print 'Error parsing field "%s", with value: %s' % (field_name, field_value)
    def generator(self, file_name):
        with open(file_name) as file:
            while True:
                lines = tuple(itertools.islice(file, 5))
                if not lines: break
                # Initialize our dictionary for this item
                formatted_item = copy.deepcopy(self.base_item)
                if "," not in lines[3]:
                    formatted_item['_type'] = lines[3].strip()
                else:
                    formatted_item['_type'] = lines[3].split(',')[1].strip()
                    self.add_value(formatted_item, 'nationality', lines[3].split(',')[0])
                self.add_value(formatted_item, 'name', lines[0])
                self.add_value(formatted_item, 'rating', lines[1])
                self.add_value(formatted_item, 'pricing', lines[2])
                yield formatted_item
    def split_by_type(self):
        d = {}
        for restaurant in self.generator(self.file_name):
            if restaurant['_type'] not in d:
                d[restaurant['_type']] = [restaurant['_fields']]
            else:
                d[restaurant['_type']] += [restaurant['_fields']]
        return d
Then, if you run:
p = RestaurantListParser()
print p.split_by_type()
You should get:
{
    'Mexican': [{
        'name': 'Mexican Grill',
        'nationality': 'undefined',
        'pricing': '$$',
        'rating': '85%'
    }],
    'Pub Food': [{
        'name': 'Georgie Porgie',
        'nationality': 'Canadian',
        'pricing': '$$$',
        'rating': '87%'
    }, {
        'name': 'Deep Fried Everything',
        'nationality': 'undefined',
        'pricing': '$',
        'rating': '52%'
    }],
    'Thai': [{
        'name': 'Queen St. Cafe',
        'nationality': 'Malaysian',
        'pricing': '$',
        'rating': '82%'
    }]
}
Your solution is simple, so it's ok. I'd just like to mention a couple of ideas that come to mind when I think about this kind of problem.
Here's another take, using defaultdict and split to simplify things.
from collections import defaultdict
record_keys = ['name', 'rating', 'price', 'cuisine']
def load(file):
    with open(file) as file:
        data = file.read()
    restaurants = []
    # chop up input on each blank line (2 newlines in a row)
    for record in data.split("\n\n"):
        fields = record.split("\n")
        # build a dictionary by zipping together the fixed set
        # of field names and the values from this particular record
        restaurant = dict(zip(record_keys, fields))
        # split chops apart the type cuisine on comma, then _.strip()
        # removes any leading/trailing whitespace on each type of cuisine
        restaurant['cuisine'] = [_.strip() for _ in restaurant['cuisine'].split(",")]
        restaurants.append(restaurant)
    return restaurants
def build_index(database, key, value):
    index = defaultdict(set)
    for record in database:
        for v in record.get(key, []):
            # defaultdict will create a set if one is not present or add to it if one does
            index[v].add(record[value])
    return index
restaurant_db = load('/var/tmp/r')
print(restaurant_db)
by_type = build_index(restaurant_db, 'cuisine', 'name')
print(by_type)
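For the original goal (cuisine as key, list of restaurant names as value), the defaultdict idea can be condensed into one function. This is a sketch under a few assumptions: the name index_by_cuisine is hypothetical, the input is passed as a string rather than read from the file, every record has exactly four lines, and records are separated by one blank line as in the question's file.

```python
from collections import defaultdict


def index_by_cuisine(text):
    """Map each cuisine to the list of restaurant names that serve it."""
    index = defaultdict(list)  # missing keys start out as an empty list
    for record in text.strip().split("\n\n"):
        name, rating, price, cuisines = [field.strip() for field in record.split("\n")]
        for cuisine in cuisines.split(","):
            index[cuisine.strip()].append(name)
    return dict(index)


# The four records shown in the question, inlined for illustration.
sample = """Georgie Porgie
87%
$$$
Canadian, Pub Food

Queen St. Cafe
82%
$
Malaysian, Thai

Mexican Grill
85%
$$
Mexican

Deep Fried Everything
52%
$
Pub Food
"""
by_cuisine = index_by_cuisine(sample)
print(by_cuisine["Pub Food"])
```

Because defaultdict creates the list on first access, there is no need for a separate "key already present" branch.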
I tried modifying this example code on python 3.x.
import csv

def cmp(a, b):
    return (a > b) - (a < b)

# write stocks data as comma-separated values
f = open('stocks.csv', 'w')
writer = csv.writer(f)
writer.writerows([
    ('GOOG', 'Google, Inc.', 505.24, 0.47, 0.09),
    ('YHOO', 'Yahoo!, Inc.', 27.38, 0.33, 1.22),
    ('CNET', 'CNET Networks, Inc.', 8.62, -0.13, -1.49)
])
f.close()

# read stocks data, print status messages
f = open('stocks.csv', 'r')
stocks = csv.reader(f)
status_labels = {-1: 'down', 0: 'unchanged', 1: 'up'}
for ticker, name, price, change, pct in stocks:
    status = status_labels[cmp(float(change), 0.0)]
    print('%s is %s (%s%%)' % (name, status, pct))
f.close()
With suggestions from #glibdud, and #bernie, I have updated my code.
Am getting the below error:
ValueError: not enough values to unpack (expected 5, got 0)
What am I missing?
Note: Removed my question about double quotes in CSV file for string. Double quotes will be there if we have comma separated string, otherwise not.
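The problem occurs while writing the file.
It is caused by the newline handling of the csv module: the file was opened for writing without newline='', so on platforms that use \r\n line endings an extra \r is written after every row, and the reader then sees an empty row after each record. See footnote 1 of the csv module documentation.
If you add print(*stocks, sep='\n') just before the for loop (between lines 19 and 20), you will get the following output: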
['GOOG', 'Google, Inc.', '505.24', '0.47', '0.09']
[]
['YHOO', 'Yahoo!, Inc.', '27.38', '0.33', '1.22']
[]
['CNET', 'CNET Networks, Inc.', '8.62', '-0.13', '-1.49']
[]
You see... an empty list can not have 5 values to unpack
#bernie already gave you the solution in his comment.
Change the line that opens the file for writing (line 7) to:
f = open('stocks.csv', 'w', newline='')
                            ^^^^^^^^^^
and you're fine.
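Put together, a round trip with the fix applied might look like the sketch below. The function name roundtrip and the temporary file name are hypothetical. It opens both the writer and the reader with newline='', as the csv module documentation recommends, and uses with blocks so the files are closed automatically. Skipping empty rows while reading is an extra safety net, not part of the fix itself.

```python
import csv
import os
import tempfile


def roundtrip(rows, path):
    """Write rows to a CSV file and read them back, without blank rows in between."""
    with open(path, 'w', newline='') as f:  # newline='' prevents the extra \r
        csv.writer(f).writerows(rows)
    with open(path, 'r', newline='') as f:
        # Skip any empty rows, e.g. from a file written without newline=''.
        return [row for row in csv.reader(f) if row]


rows = [
    ('GOOG', 'Google, Inc.', 505.24, 0.47, 0.09),
    ('YHOO', 'Yahoo!, Inc.', 27.38, 0.33, 1.22),
    ('CNET', 'CNET Networks, Inc.', 8.62, -0.13, -1.49),
]
# Hypothetical scratch location for the demonstration file.
path = os.path.join(tempfile.gettempdir(), 'stocks_sketch.csv')
result = roundtrip(rows, path)
for ticker, name, price, change, pct in result:
    print(ticker, name, price, change, pct)
```

Note that every value comes back as a string, which is why the original code converts change with float() before comparing it.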