Dictionary to a txt file - python-3.x

I am trying to write a dictionary to a .txt file. I haven't found an efficient way to add multiple values for keys to a text doc.
players = {}

def save_roster(players):
    with open("Team Roster.txt", "wt") as out_file:
        for k, v in players.items():
            out_file.write(str(k) + ', ' + str(v) + '\n\n')
    display_menu()
I have a dictionary that has multiple values for each key. This part of the program leaves me with:
Bryce, <__main__.Roster object at 0x00000167D6DB6550>
The output I am aiming for is:
Bryce, 23, third, 23

Python doesn't inherently know how to print an object. You need to define the __str__ method to tell Python how to represent your object as a string; otherwise it falls back to the default representation you're seeing. In your case, I might go with something like:
def __str__(self):
    return str(self.position)+", "+str(self.jersey)
or whichever attributes you want to print.
And to read the data back in from the text file:
with open("Team Roster.txt", "r") as in_file:
for line in in_file:
player = Roster(*(line.split(", "))
#do something with player, like store it in a list
Assuming Roster.__init__() is set up appropriately, i.e. a Roster object is initialized by passing in the parameters in each line of the file in order.
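For concreteness, here is a minimal sketch of what such a Roster class could look like; the attribute names are assumptions inferred from the "Bryce, 23, third, 23" example, not the asker's actual class:

from builtins import str  # standard built-in; shown only for clarity

class Roster:
    def __init__(self, name, age, position, jersey):
        # hypothetical attributes, guessed from the desired output line
        self.name = name
        self.age = age
        self.position = position
        self.jersey = jersey

    def __str__(self):
        # everything after the name, comma-separated
        return str(self.age) + ", " + str(self.position) + ", " + str(self.jersey)

With a __str__ like this, str(k) + ', ' + str(v) in save_roster produces a line such as Bryce, 23, third, 23, and the read-back loop above can rebuild each player from that line (after stripping the trailing newlines).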

Related

Why are the values in my dictionary returning as a list within a list for each element?

I've got a file with an id and lineage info for species.
For example:
162,Bacteria,Spirochaetes,Treponemataceae,Spirochaetia,Treponema
174,Bacteria,Spirochaetes,Leptospiraceae,Spirochaetia,Leptospira
192,Bacteria,Proteobacteria,Azospirillaceae,Alphaproteobacteria,Azospirillum
195,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
197,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
199,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
201,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
2829358,,,,,
2806529,Eukaryota,Nematoda,,,
I'm writing a script where I need to get the counts for each lineage depending on user input (i.e. if genus, then I would be looking at the last word in each line such as Treponema, if class, then the fourth, etc).
I'll later need to convert the counts into a dataframe, but first I am trying to turn this file of lineage info into a dictionary where, depending on user input, the lineage info (say, genus) is the key and the id is the value. This is because multiple ids can map to the same lineage info; for example, ids 195, 197, 199, and 201 all return a hit for Campylobacter.
Here is my code:
def create_dicts(filename):
    '''Transforms the unique_taxids_lineage_allsamples file into a dictionary.
    Note: There can be multiple ids mapping to the same lineage info. Therefore ids should be values.'''
    # Creating a genus_dict
    unique_ids_dict = {}  # the main dict to return
    phylum_dict = 2  # third item in line
    family_dict = 3  # fourth item in line
    class_dict = 4  # fifth item in line
    genus_dict = 5  # sixth item in line
    type_of_dict = input("What type of dict? 2=phylum, 3=family, 4=class, 5=genus\n")
    with open(filename, 'r') as f:
        content = f.readlines()
        for line in content:
            key = line.split(",")[int(type_of_dict)].strip("\n")  # lineage info
            value = line.split(",")[0].split("\n")  # the id, there can be multiple mapping to the same key
            if key in unique_ids_dict:  # if the lineage info is already a key
                unique_ids_dict[key].append(value)
            else:
                unique_ids_dict[key] = value
    return unique_ids_dict
I had to add the .split("\n") at the end of value because I kept getting an error that a str object doesn't have the attribute append.
I am trying to get a dictionary like the following if the user input was 5 for genus:
unique_ids_dict={'Treponema': ['162'], 'Leptospira': ['174'], 'Azospirillum': ['192'], 'Campylobacter': ['195', '197', '199', '201'], '': ['2829358', '2806529']}
But instead I am getting the following:
unique_ids_dict={'Treponema': ['162'], 'Leptospira': ['174'], 'Azospirillum': ['192'], 'Campylobacter': ['195', ['197'], ['199'], ['201']], '': ['2829358', ['2806529']]}  # also still missing the "NONE" key; I haven't figured out how to convert empty strings to "NONE"
Also, if anyone knows how to convert all empty hits into "NONE" or something similar, that would be great. This is a secondary question, so if needed I can open it separately.
Thank you!
SOLVED ~~~~
Needed to use extend instead of append.
To change the empty-string key into "NONE" I used dict.pop, so after my if statement:
unique_ids_dict["NONE"] = unique_ids_dict.pop("")
Thank you!
def create_dicts(filename):
    '''Transforms the unique_taxids_lineage_allsamples file into a dictionary.
    Note: There can be multiple ids mapping to the same lineage info. Therefore ids should be values.'''
    # Creating a genus_dict
    unique_ids_dict = {}  # the main dict to return
    phylum_dict = 2  # third item in line
    family_dict = 3  # fourth item in line
    class_dict = 4  # fifth item in line
    genus_dict = 5  # sixth item in line
    type_of_dict = input("What type of dict? 2=phylum, 3=family, 4=class, 5=genus\n")
    with open(filename, 'r') as f:
        content = f.readlines()
        for line in content:
            key = line.split(",")[int(type_of_dict)].strip("\n")  # lineage info
            value = line.split(",")[0].split("\n")  # the id, there can be multiple mapping to the same key
            if key in unique_ids_dict:  # if the lineage info is already a key
                unique_ids_dict[key].extend(value)
            else:
                unique_ids_dict[key] = value
    return unique_ids_dict
This worked for me: using extend on the list instead of append.
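As an aside, a collections.defaultdict avoids the append/extend confusion entirely, because every key starts out mapped to a list. A hedged sketch of that variant, folding in the empty-key to "NONE" rename from above:

from collections import defaultdict

def create_dicts(filename):
    unique_ids_dict = defaultdict(list)
    type_of_dict = input("What type of dict? 2=phylum, 3=family, 4=class, 5=genus\n")
    with open(filename) as f:
        for line in f:
            fields = line.rstrip("\n").split(",")
            key = fields[int(type_of_dict)] or "NONE"  # empty lineage info becomes "NONE"
            unique_ids_dict[key].append(fields[0])     # the id is always the first field
    return dict(unique_ids_dict)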
I suggest you work with pandas; it's much simpler, and it also lets you assign header names:
import pandas as pd

def create_dicts(filename):
    """
    Transforms the unique_taxids_lineage_allsamples file into a
    dictionary.
    Note: There can be multiple ids mapping to the same lineage info.
    Therefore ids should be values.
    """
    # Reading File:
    content = pd.read_csv(
        filename,
        names=("ID", "Kingdom", "Phylum", "Family", "Class", "Genus")
    )
    # Printing input and choosing clade to work with:
    print("\nWhat type of dict?")
    print("- Phylum")
    print("- Family")
    print("- Class")
    print("- Genus")
    clade = input("> ").capitalize()
    # Replacing empty values with string 'None':
    content = content.where(pd.notnull(content), "None")
    # Selecting columns and aggregating accordingly to the chosen
    # clade and ID:
    series = content.groupby(clade).agg("ID").unique()
    # Creating dict:
    content_dict = series.to_dict()
    # If you do not want to work with Numpy arrays, just create
    # another dict of lists:
    content_dict = {k: list(v) for k, v in content_dict.items()}
    return content_dict

if __name__ == "__main__":
    d = create_dicts("temp.csv")
    print(d)
temp.csv:
162,Bacteria,Spirochaetes,Treponemataceae,Spirochaetia,Treponema
174,Bacteria,Spirochaetes,Leptospiraceae,Spirochaetia,Leptospira
192,Bacteria,Proteobacteria,Azospirillaceae,Alphaproteobacteria,Azospirillum
195,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
197,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
199,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
201,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
829358,,,,,
2806529,Eukaryota,Nematoda,,,
I hope this is what you wanted to do.
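One hedged tweak, not part of the answer above: the expected dictionary in the question keeps the ids as strings such as '195', whereas read_csv will infer them as integers by default. Passing dtype=str keeps them as strings, and empty fields still come back as NaN, so the where(...) replacement above keeps working:

content = pd.read_csv(
    filename,
    names=("ID", "Kingdom", "Phylum", "Family", "Class", "Genus"),
    dtype=str,  # read every column, including ID, as strings
)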

python: How to read a file and store each line using map function?

I'm trying to rework a program that I wrote, getting rid of all the for loops.
The original code reads a file with thousands of ';'-separated lines. For example, the first of two sample lines starts with LPPD;LEMD and the second starts with DAAE;LFML. I'm only interested in the very first and second element of each line.
The original code I wrote is:
# Libraries
import sys
from collections import Counter
import collections
from itertools import chain
from collections import defaultdict
import time

# START
# #time=0
start = time.time()

# Defining default program argument
if len(sys.argv) == 1:
    fileName = "file.txt"
else:
    fileName = sys.argv[1]

takeOffAirport = []
landingAirport = []

# Reading file
lines = 0  # Counter for file lines
try:
    with open(fileName) as file:
        for line in file:
            words = line.split(';')
            # Relevant data, item1 and item2 from each file line
            origin = words[0]
            destination = words[1]
            # Populating lists
            landingAirport.append(destination)
            takeOffAirport.append(origin)
            lines += 1
except IOError:
    print("\n\033[0;31mIoError: could not open the file:\033[00m %s" % fileName)

airports_dict = defaultdict(list)
# Merge lists into a dictionary key:value
for key, value in chain(Counter(takeOffAirport).items(),
                        Counter(landingAirport).items()):
    # 'AIRPOT_NAME':[num_takeOffs, num_landings]
    airports_dict[key].append(value)

# Sum key values and add it as another value
for key, value in airports_dict.items():
    # 'AIRPOT_NAME':[num_totalMovements, num_takeOffs, num_landings]
    airports_dict[key] = [sum(value), value]

# Sort dictionary by the top 10 total movements
airports_dict = sorted(airports_dict.items(),
                       key=lambda kv: kv[1], reverse=True)[:10]
airports_dict = collections.OrderedDict(airports_dict)

# Print results
print("\nAIRPORT" + "\t\t#TOTAL_MOVEMENTS" + "\t#TAKEOFFS" + "\t#LANDINGS")
for k in airports_dict:
    print(k, "\t\t", airports_dict[k][0],
          "\t\t\t", airports_dict[k][1][1],
          "\t\t", airports_dict[k][1][0])

# #time=1
end = time.time() - start
print("\nAlgorithm execution time: %0.5f" % end)
print("Total number of lines read in the file: %u\n" % lines)

airports_dict.clear
takeOffAirport.clear
landingAirport.clear
My goal is to simplify the program using map, reduce, and filter. So far I have sorted out the creation of the two independent lists, one with the first element of each file line and another with the second element, by using:
# Creates two independent lists with the first and second element from each line
takeOff_Airport = list(map(lambda sub: (sub[0].split(';')[0]), lines))
landing_Airport = list(map(lambda sub: (sub[0].split(';')[1]), lines))
I was hoping to find a way to open the file and achieve the exact same result as the original code by being able to open the file through a map() function, so I could pass each list to the maps defined above: takeOff_Airport and landing_Airport.
So if we have a file as such
line 1
line 2
line 3
line 4
and we do this:
open(file_name).read().split('\n')
we get this
['line 1', 'line 2', 'line 3', 'line 4', '']
Is this what you wanted?
Edit 1
I feel this is somewhat redundant, but since map applies a function to each element of an iterable, we have to put our file name in a list, and of course define our function:
def open_read(file_name):
    return open(file_name).read().split('\n')

print(list(map(open_read, ['test.txt'])))
This gets us
>>> [['line 1', 'line 2', 'line 3', 'line 4', '']]
So first off, calling split('\n') on each line is silly; the line is guaranteed to have at most one newline, at the end, and nothing after it, so you'd end up with a bunch of ['all of line', ''] lists. To avoid the empty string, just strip the newline. This won't leave each line wrapped in a list, but frankly, I can't imagine why you'd want a list of one-element lists containing a single string each.
So I'm just going to demonstrate using map+strip to get rid of the newlines, using operator.methodcaller to perform the strip on each line:
from operator import methodcaller

def readFile(fileName):
    try:
        with open(fileName) as file:
            return list(map(methodcaller('strip', '\n'), file))
    except IOError:
        print("\n\033[0;31mIoError: could not open the file:\033[00m %s" % fileName)
Sadly, since your file is context managed (a good thing, just inconvenient here), you do have to listify the result; map is lazy, and if you didn't listify before the return, the with statement would close the file, and pulling data from the map object would die with an exception.
To get around that, you can implement it as a trivial generator function, so the generator context keeps the file open until the generator is exhausted (or explicitly closed, or garbage collected):
def readFile(fileName):
    try:
        with open(fileName) as file:
            yield from map(methodcaller('strip', '\n'), file)
    except IOError:
        print("\n\033[0;31mIoError: could not open the file:\033[00m %s" % fileName)
yield from will introduce a tiny amount of overhead over directly iterating the map, but not much, and now you don't have to slurp the whole file if you don't want to; the caller can just iterate the result and get a stripped line on each iteration without pulling the whole file into memory. It does have the slight weakness that opening the file happens lazily, so you won't see the exception (if there is one) until you begin iterating. This can be worked around, but it's not worth the trouble if you don't really need it.
I'd generally recommend the latter implementation as it gives the caller flexibility. If they want a list anyway, they just wrap the call in list and get the list result (with a tiny amount of overhead). If they don't, they can begin processing faster, and have much lower memory demands.
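For example (a small usage sketch; process is just a stand-in for whatever the caller does with each line):

# lazy: lines are pulled from the file only as the loop asks for them
for line in readFile("file.txt"):
    process(line)  # hypothetical per-line handler

# eager: wrap the call in list() to slurp the whole file up front
all_lines = list(readFile("file.txt"))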
Mind you, this whole function is fairly odd; replacing IOErrors with prints and (implicitly) returning None is hostile to API consumers (they now have to check return values, and can't actually tell what went wrong). In real code, I'd probably just skip the function and insert:
with open(fileName) as file:
    for line in map(methodcaller('strip', '\n'), file):
        # do stuff with line (with newline pre-stripped)
inline in the caller; maybe define split_by_newline = methodcaller('split', '\n') globally to use a friendlier name. It's not that much code, and I can't imagine that this specific behavior is needed in that many independent parts of your file, and inlining it removes the concerns about when the file is opened and closed.
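Tying this back to the original goal, a hedged sketch of how that inline pattern could build the two airport lists from the question (assuming every line has at least two ';'-separated fields):

from operator import methodcaller

takeOffAirport = []
landingAirport = []
with open(fileName) as file:
    for origin, destination, *rest in map(methodcaller('split', ';'), file):
        takeOffAirport.append(origin)
        landingAirport.append(destination)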

How would I go about reading a .txt file, then ordering the data in descending order?

OK, so I would like it so that when the user wants to check the high scores, the output prints the data in descending order. Keep in mind that there are both names and numbers in the .txt file, which is why I'm finding this so hard. If there is anything else you need, please tell me.
def highscore():
    global line  # sets global variable
    for line in open('score.txt'):
        print(line)

# =================================================================================

def new_highscores():
    global name, f  # sets global variables
    if score >= 1:  # if score is equal or more than 1 run code below
        name = input('what is your name? ')
        f = open('score.txt', 'a')  # opens score.txt file and puts it into append mode
        f.write(str(name))  # write name on .txt file
        f.write(' - ')  # write - on .txt file
        f.write(str(score))  # write score on .txt file
        f.write('\n')  # signifies end of line
        f.close()  # closes .txt file
    if score <= 0:  # if score is equal to zero go back to menu 2
        menu2()
I added this just in case there was a problem with the way I was writing to the file.
The easiest thing to do is just maintain the high scores file in a sorted state. That way every time you want to print it out, just go ahead and do it. When you add a score, sort the list again. Here's a version of new_highscores that accomplishes just that:
def new_highscores():
    """Adds a global variable score to scores.txt after asking for name"""
    # not sure you need name and f as global variables without seeing
    # the rest of your code. This shouldn't hurt though
    global name, f  # sets global variables
    if score >= 1:  # if score is equal or more than 1 run code below
        name = input('What is your name? ')
        # here is where you do the part I was talking about:
        # get the lines from the file
        with open('score.txt') as f:
            lines = f.readlines()
        scores = []
        for line in lines:
            name_, score_ = line.split(' - ')
            # turn score_ from a string to a number
            score_ = float(score_)
            # store score_ first so that we are sorting by score_ later
            scores.append((score_, name_))
        # add the data from the user
        scores.append((score, name))
        # sort the scores
        scores.sort(reverse=True)
        # erase the file
        with open('score.txt', 'w') as f:
            # write the new data
            for score_, name_ in scores:
                f.write('{} - {}\n'.format(name_, score_))
    if score <= 0:  # if score is equal to zero go back to menu 2
        menu2()
You'll notice I'm using the with statement. You can learn more about that here, but essentially it works like this:
with open(filename) as file:
    # do stuff with file
# file is **automatically** closed once you get down here.
Even if you leave the block for another reason (an Exception is thrown, you return from a function early, etc.) Using a with statement is a safer way to deal with files, because you're basically letting Python handle the closing of the file for you. And Python will never forget like a programmer will.
You can read more about split and format here and here
P.S., There is a method called binary search that would be more efficient, but I get the feeling you're just starting so I wanted to keep it simple. Essentially what you would do is search for the location in the file where the new score should be inserted by halving the search area at each point. Then when you write back to the file, only write the stuff that's different (from the new score onward.)
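If you do want to explore that idea later, the standard library's bisect module handles the "insert into an already-sorted list" part for you; a minimal sketch with made-up scores, keeping the list ascending and printing it highest-first:

import bisect

# scores kept in ascending order: [(score, name), ...]
scores = [(5.0, 'Dana'), (12.0, 'Alex')]
bisect.insort(scores, (9.0, 'Sam'))  # inserts at the right spot without re-sorting

for score_, name_ in reversed(scores):  # highest score first
    print('{} - {}'.format(name_, score_))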

Python file I/O error -- How to write to a file when the filename given is a string?

I have a function that accepts d (a dictionary that must be sorted asciibetically by key) and filename (a file that may or may not exist). I have to write an exact format to this file, and the function must return None.
Format:
Every key-value pair of the dictionary should be output as: a string that starts with key, followed by ":", a tab, then the integers from the value list. Every integer should be followed by a "," and a tab except for the very last one, which should be followed by a newline.
The issue is when I go to close the file and run my testers, it tells me this error:
'str' object has no attribute 'close'
Obviously that means my file isn't a file, it's a string. How do I fix this?
Here are my current functions that work together to accept the dictionary, sort it, open/create a file for writing, write the dictionary to the file in the specified format so it can be read back as a string, and then close the file:
def format_item(key,value):
    return key+ ":\t"+",\t".join(str(x) for x in value)

def format_dict(d):
    return sorted(format_item(key,value) for key, value in d.items())

def store(d,filename):
    with open(filename, 'w') as f:
        f.write("\n".join(format(dict(d))))
    filename.close()
    return None
Example of expected output:
IN: d = {'orange':[1,3],'apple':[2]}
OUT: store(d,"out.txt")
the file contents should be read as this string: "apple:\t2\norange:\t1,\t3\n"
You have actually assigned the file handle to f, but you are trying to close filename, so your close command should be f.close().
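Putting it together, a hedged sketch of the corrected function: inside a with block the file is closed automatically when the block exits, so the explicit close can simply be dropped (and the join call is presumably meant to be format_dict(d) rather than format(dict(d))):

def store(d, filename):
    with open(filename, 'w') as f:
        f.write("\n".join(format_dict(d)))
    # no close() needed: the with statement closes f on exit
    return None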

Python trouble debugging I/O: how do I get the correct format?

I am attempting to turn a dictionary into a formatted string and then write it to a file, but my formatting seems to be entirely incorrect. I'm not sure how to debug, since all my tester cases are given different files. I was able to use interactive mode in Python to find out what my function is actually writing to the file, and man, is it wrong! Can you help me correct the format?
Given a sorted dictionary, I turn it into a string. I need the function to return it like so:
Dictionary is : {'orange':[1,3],'apple':[2]}
"apple:\t2\norange:\t1,\t3\n"
The format is: every key-value pair of the dictionary should be output as a string that starts with the key, followed by ":", a tab, then the integers from the value list. Every integer should be followed by a "," and a tab except for the very last one, which should be followed by a newline.
Here is my function that I thought would work:
def format_item(key,value):
    return key+ ":\t"+",\t".join(str(x) for x in value)

def format_dict(d):
    return sorted(format_item(key,value) for key, value in d.items())

def store(d,filename):
    with open(filename, 'w') as f:
        f.write("\n".join(format_dict(d)))
        f.close()
    return None
I now have too many tabs on the last line. How do I edit the last line only out of the for loop?
ex input:
d = {'orange':[1,3],'apple':[2]}
my function gives: ['apple:\t2', 'orange:\t1,\t3']
but should give: "apple:\t2\norange:\t1,\t3\n"
Adding the newline character to the end of the return statement in format_item seems to yield the correct output.
return key+ ":\t"+",\t".join(str(x) for x in value) + '\n'
In [10]: format_dict(d)
Out[10]: ['apple:\t2\n', 'orange:\t1,\t3\n']
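One follow-up worth hedging: if format_item now ends every entry with '\n', the '\n' separator in store's join would double the newlines, so store would presumably switch to an empty-string join:

def store(d, filename):
    with open(filename, 'w') as f:
        # each formatted item already ends in '\n', so no separator is needed
        f.write("".join(format_dict(d)))
    return None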
