Reading multiple lines from another file into a tuple - python-3.x

I can't seem to pull each individual line from a .txt file into a tuple. The 'city-data.txt' file is just a list of the 50 states, their capitals, and their lat/longs. I need to create a tuple of all the states.
This is my code so far:
def read_cities(file_name):
    file_name = open('city-data.txt' , 'r')
    for line in file_name:
        road_map = ((line.split('\t')))
        return road_map
    file_name.close()
print(read_cities('city-data.txt'))
When it's run, it only prints the very first line from the .txt file, as such:
['Alabama', 'Montgomery', '32.361538', '-86.279118\n']

It prints only the very first line because of this:
for line in file_name:
    road_map = ((line.split('\t')))
    return road_map
You are returning immediately after you consume the first line. This is why it only prints the very first line.
Instead, you need to store these in a list, and return that list in the end.
def read_cities(file_name):
    file_name = open('city-data.txt' , 'r')
    road_maps = []
    for line in file_name:
        road_map = ((line.split('\t')))
        road_maps.append(road_map)
    file_name.close()
    # road_maps is a list, since you wanted a tuple we convert it to that
    return tuple(road_maps)
print(read_cities('city-data.txt'))
You wrote: "I need to create a tuple of all the states."
Does this mean you only want the first column from each line? If so, modify it to:
def read_cities(file_name):
    # notice I've changed this to use file_name instead of
    # the hard-coded filename string
    file_name = open(file_name , 'r')
    # if you need uniqueness, use states=set() and use .add instead
    # of .append
    states = []
    for line in file_name:
        line_split = line.split('\t')
        # line_split is a list and the 0th index is the state column
        state = line_split[0]
        # use states.add if you used a set instead of a list
        states.append(state)
    file_name.close()
    return tuple(states)
print(read_cities('city-data.txt'))
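A more compact variant, as a sketch only: it assumes the same tab-separated layout with the state in the first column, and uses a with block so the file is closed even if an error occurs. The function name read_states and the two-line sample file are made up for illustration.

```python
import os
import tempfile


def read_states(file_name):
    """Return a tuple holding the first tab-separated field of each line."""
    with open(file_name, 'r') as f:
        # skip blank lines so a trailing newline does not add an empty entry
        return tuple(line.split('\t')[0] for line in f if line.strip())


# Hypothetical two-line sample standing in for city-data.txt
sample = ("Alabama\tMontgomery\t32.361538\t-86.279118\n"
          "Alaska\tJuneau\t58.301935\t-134.419740\n")
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)

states = read_states(tmp.name)
os.remove(tmp.name)
print(states)  # ('Alabama', 'Alaska')
```

The tuple is built completely before the with block exits, so nothing reads from the file after it has been closed.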

Related

Python 3 split string multiple times

Text file input:
10G/Host_IP,UID,PWD,Host-Name,15-2-7
10G/Host_IP,UID,PWD,Host-Name,12-2-7
import tkinter as tk
import tkinter.filedialog
root = tk.Tk()
root.attributes("-topmost", True)
root.withdraw()
file = tkinter.filedialog.askopenfilename()
def _10g_script (params):
    print (type(params)) ## says params is a str
    for items in params:
        params1 = items.split(",")
        ## print(IP, UID, PWD, TID, SH_SL_PT) ## is what I am wanting here,
        ##then I will split the SH_SL_PT
    print (type(params1)) ## says params is a list
with open(file,"r") as fh:
    for lines in fh:
        rate, param = lines.strip().split("/")
        if rate == "10G":
            _10g_script(param)
        print (type(param)) ## says param is a str
What I am trying to do is split each line of the text file into the rate and the rest of the parameters, held in separate variables. I then pass the parameters into the function and split params further into more variables (Host_IP, UID, PWD, Host-Name, SH_SL_PT).
The result of the first split is a str, but when I try the second split it says it is a list.
I have tried join, but it puts every character as its own string with a "," in between characters.
Any help would be appreciated.
Let's walk through the code. Your code starts here:
with open(file,"r") as fh:
    for lines in fh:
        rate, param = lines.strip().split("/")
        if rate == "10G":
            _10g_script(param)
        print (type(param)) ## says param is a str
We first open the file and then enter the for loop. Iterating over a file object yields one line at a time, so the variable lines is a string holding a single line of the document, and each iteration moves on to the next line.
Next we split the line using "/". This split creates a list containing two elements: lines.strip().split("/") gives ["10G", "Host_IP,UID,PWD,Host-Name,12-2-7"]. Because you put two variables on the left side, rate and param, Python unpacks the list and sets rate = "10G" and param = "Host_IP,UID,PWD,Host-Name,12-2-7".
Inside your function, params is a string, as you saw. Looping over a string yields one character per iteration, so items.split(",") is applied to a single character each time and returns a one-element list.
So, instead of writing the function _10g_script, what you can do is:
with open(file,"r") as fh:
    for lines in fh:
        rate, param = lines.strip().split("/")
        if rate == "10G":
            # e.g. IP = "Host_IP", UID = "UID", PWD = "PWD", TID = "Host-Name", SH_SL_PT = "12-2-7"
            IP, UID, PWD, TID, SH_SL_PT = param.split(",")
            print(IP, UID, PWD, TID, SH_SL_PT)
Then you would do the same for SH_SL_PT, writing:
SH,SL,PT = SH_SL_PT.split("-")
Wherever you needed that.
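The two-stage split can be sketched as a small function. This is only an illustration under the assumption that every line has exactly one "/" and five comma-separated fields, as in the sample input; the name parse_line is made up here.

```python
def parse_line(line):
    """Split 'rate/ip,uid,pwd,host,sh-sl-pt' into its parts."""
    rate, params = line.strip().split("/")
    ip, uid, pwd, host_name, sh_sl_pt = params.split(",")
    # only the last field is split on "-", so "Host-Name" stays intact
    sh, sl, pt = sh_sl_pt.split("-")
    return rate, ip, uid, pwd, host_name, (sh, sl, pt)


# Looping over a string yields single characters, which is why the
# original loop over params did not produce the fields.
chars = list("a,b")
print(chars)  # ['a', ',', 'b']

parsed = parse_line("10G/Host_IP,UID,PWD,Host-Name,15-2-7\n")
print(parsed)
# ('10G', 'Host_IP', 'UID', 'PWD', 'Host-Name', ('15', '2', '7'))
```

If a line has a different number of fields, the unpacking raises ValueError, which is usually a helpful signal that the input is malformed.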

Why are the values in my dictionary returning as a list within a list for each element?

I've got a file with an id and lineage info for species.
For example:
162,Bacteria,Spirochaetes,Treponemataceae,Spirochaetia,Treponema
174,Bacteria,Spirochaetes,Leptospiraceae,Spirochaetia,Leptospira
192,Bacteria,Proteobacteria,Azospirillaceae,Alphaproteobacteria,Azospirillum
195,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
197,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
199,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
201,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
2829358,,,,,
2806529,Eukaryota,Nematoda,,,
I'm writing a script where I need to get the counts for each lineage depending on user input (i.e. if genus, then I would be looking at the last word in each line such as Treponema, if class, then the fourth, etc).
I'll later need to convert the counts into a dataframe but first I am trying to turn this file of lineage info into a dictionary where depending on user input, that lineage info (i.e. let's say genus) is the key, and the id is the value. This is because there can be multiple ids that match to the same lineage info such as ids 195, 197, 199, 201 would all return a hit for Campylobacter.
Here is my code:
def create_dicts(filename):
    '''Transforms the unique_taxids_lineage_allsamples file into a dictionary.
    Note: There can be multiple ids mapping to the same lineage info. Therefore ids should be values.'''
    # Creating a genus_dict
    unique_ids_dict={} # the main dict to return
    phylum_dict=2 # third item in line
    family_dict=3 # fourth item in line
    class_dict=4 # fifth item in line
    genus_dict=5 # sixth item in line
    type_of_dict=input("What type of dict? 2=phylum, 3=family, 4=class, 5=genus\n")
    with open(filename, 'r') as f:
        content = f.readlines()
    for line in content:
        key = line.split(",")[int(type_of_dict)].strip("\n") # lineage info
        value = line.split(",")[0].split("\n") # the id, there can be multiple mapping to the same key
        if key in unique_ids_dict: # if the lineage info is already a key, skip
            unique_ids_dict[key].append(value)
        else:
            unique_ids_dict[key]=value
    return unique_ids_dict
I had to add the .split("\n") at the end of value because I kept getting the error where str object doesn't have attribute append.
I am trying to get a dictionary like the following if the user input was 5 for genus:
unique_ids_dict={'Treponema': ['162'], 'Leptospira': ['174'], 'Azospirillum': ['192'], 'Campylobacter': ['195', '197', '199', '201'], '': ['2829358', '2806529']}
But instead I am getting the following:
unique_ids_dict={'Treponema': ['162'], 'Leptospira': ['174'], 'Azospirillum': ['192'], 'Campylobacter': ['195', ['197'], ['199'], ['201']], '': ['2829358', ['2806529']]} ##missing str "NONE" haven't figured out how to convert empty strings to say "NONE"
Also, if anyone knows how to convert all empty hits into "NONE" or something of the following that would be great. This is sort of a secondary question so if needed I can open this as a separate question.
Thank you!
SOLVED
I needed to use extend instead of append.
To change the empty-string key into "NONE" I used dict.pop, so after my if statement:
unique_ids_dict["NONE"] = unique_ids_dict.pop("")
Thank you!
def create_dicts(filename):
    '''Transforms the unique_taxids_lineage_allsamples file into a dictionary.
    Note: There can be multiple ids mapping to the same lineage info. Therefore ids should be values.'''
    # Creating a genus_dict
    unique_ids_dict = {} # the main dict to return
    phylum_dict = 2 # third item in line
    family_dict = 3 # fourth item in line
    class_dict = 4 # fifth item in line
    genus_dict = 5 # sixth item in line
    type_of_dict = input("What type of dict? 2=phylum, 3=family, 4=class, 5=genus\n")
    with open(filename, 'r') as f:
        content = f.readlines()
    for line in content:
        key = line.split(",")[int(type_of_dict)].strip("\n") # lineage info
        value = line.split(",")[0].split("\n") # the id, as a one-element list
        if key in unique_ids_dict: # if the lineage info is already a key, add the id to its list
            unique_ids_dict[key].extend(value)
        else:
            unique_ids_dict[key] = value
    return unique_ids_dict
This worked for me: using extend on the list, not append.
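The difference between append and extend, and one way to avoid the nested lists altogether, can be sketched as follows. The helper group_ids and the use of collections.defaultdict are illustrative assumptions, not part of the original code; the sample lines are taken from the question.

```python
from collections import defaultdict

appended = ['195']
appended.append(['197'])   # adds the list itself as one element
print(appended)            # ['195', ['197']]

extended = ['195']
extended.extend(['197'])   # adds each element of the list
print(extended)            # ['195', '197']


def group_ids(lines, column):
    """Map the value in `column` to the list of ids (field 0) that share it."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(",")
        key = fields[column] or "NONE"   # an empty field becomes "NONE"
        groups[key].append(fields[0])    # append a plain string, not a list
    return dict(groups)


sample = [
    "195,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter\n",
    "197,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter\n",
    "2829358,,,,,\n",
]
grouped = group_ids(sample, 5)
print(grouped)
# {'Campylobacter': ['195', '197'], 'NONE': ['2829358']}
```

Because the id is kept as a plain string, append is the right call here and no .split("\n") workaround is needed.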
I suggest that you work with pandas. It's much simpler, and it also lets you give the columns explicit header names:
import pandas as pd
def create_dicts(filename):
    """
    Transforms the unique_taxids_lineage_allsamples file into a
    dictionary.
    Note: There can be multiple ids mapping to the same lineage info.
    Therefore ids should be values.
    """
    # Reading File:
    content = pd.read_csv(
        filename,
        names=("ID", "Kingdom", "Phylum", "Family", "Class", "Genus")
    )
    # Printing input and choosing clade to work with:
    print("\nWhat type of dict?")
    print("- Phylum")
    print("- Family")
    print("- Class")
    print("- Genus")
    clade = input("> ").capitalize()
    # Replacing empty values with string 'None':
    content = content.where(pd.notnull(content), "None")
    # Grouping by the chosen clade and collecting the unique IDs
    # of each group:
    series = content.groupby(clade)["ID"].unique()
    # Creating dict:
    content_dict = series.to_dict()
    # If you do not want to work with Numpy arrays, just create
    # another dict of lists:
    content_dict = {k:list(v) for k, v in content_dict.items()}
    return content_dict
if __name__ == "__main__":
    d = create_dicts("temp.csv")
    print(d)
temp.csv:
162,Bacteria,Spirochaetes,Treponemataceae,Spirochaetia,Treponema
174,Bacteria,Spirochaetes,Leptospiraceae,Spirochaetia,Leptospira
192,Bacteria,Proteobacteria,Azospirillaceae,Alphaproteobacteria,Azospirillum
195,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
197,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
199,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
201,Bacteria,Proteobacteria,Campylobacteraceae,Epsilonproteobacteria,Campylobacter
829358,,,,,
2806529,Eukaryota,Nematoda,,,
I hope this is what you wanted to do.

Merge two consecutive lines only if both start with the same prefix in Python, and write the rest of the text normally

Input
02000|42163,54|
03100|4|6070,00
03110|||6070,00|00|00|
00000|31751150201912001|01072000600074639|
02000|288465,76|
03100|11|9060,00
03110|||1299,00|00|
03110||||7761,00|00|
03100|29|14031,21
03110|||14031,21|00|
00000|31757328201912001|01072000601021393|
Code
prev = ''
with open('out.txt') as f:
    for line in f:
        if prev.startswith('03110') and line.startswith('03110'):
            print(prev.strip()+ '|03100|XX|PARCELA|' + line)
        prev = line
Hi, I have this code that checks whether two consecutive lines start with 03110 and prints those lines merged, but I want to change the code so that it also prints (or writes to a .txt file) the rest of the lines.
The output should look like this:
02000|42163,54|
03100|4|6070,00
03110|||6070,00|00|00|
00000|31751150201912001|01072000600074639|
02000|288465,76|
03100|11|9060,00
03110|||1299,00|00|3100|XX|PARCELA|03110||||7761,00|00|
03100|29|14031,21
03110|||14031,21|00|
00000|31757328201912001|01072000601021393|
I know that I'm getting only those two merged lines, because that is all the print() call outputs:
03110|||1299,00|00|3100|XX|PARCELA|03110||||7761,00|00|
But I don't know how to produce the desired output. Can anyone help me with my code?
# I assume the input is in a text file:
with open('myFile.txt', 'r') as my_file:
    # split every line into its own list of fields
    splited_line = [line.rstrip().split('|') for line in my_file]
new_list = []
last = len(splited_line) - 1
for i in range(len(splited_line)):
    is_03110 = splited_line[i][0] == '03110'
    if is_03110 and i > 0 and splited_line[i-1][0] == '03110':
        # the current line and the previous line both start with 03110
        first = '|'.join(splited_line[i-1])
        second = '|'.join(splited_line[i])
        newLine = first + "|03100|XX|PARCELA|" + second
        new_list.append(newLine)
    elif is_03110 and i < last and splited_line[i+1][0] == '03110':
        # skip for now: this line is emitted together with the next one
        pass
    else:
        line = '|'.join(splited_line[i])
        new_list.append(line)
# To write new_list to a text file
with open('new_file' , 'a') as f:
    for item in new_list:
        print(item)
        f.write(item + '\n')
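The same idea can be sketched without indexing backwards, by looking one line ahead and consuming both lines when they match. This is an illustrative alternative, not the answer's code: the function name merge_consecutive and its parameters are made up, and a run of three or more matching lines is merged pairwise here.

```python
def merge_consecutive(lines, prefix='03110', joiner='|03100|XX|PARCELA|'):
    """Join each pair of adjacent lines that both start with `prefix`."""
    result = []
    i = 0
    while i < len(lines):
        current = lines[i]
        if (current.startswith(prefix) and i + 1 < len(lines)
                and lines[i + 1].startswith(prefix)):
            result.append(current + joiner + lines[i + 1])
            i += 2   # both lines were consumed
        else:
            result.append(current)
            i += 1
    return result


# A few lines from the question's input, already stripped of newlines
sample = [
    '03100|11|9060,00',
    '03110|||1299,00|00|',
    '03110||||7761,00|00|',
    '03100|29|14031,21',
    '03110|||14031,21|00|',
]
merged = merge_consecutive(sample)
for line in merged:
    print(line)
```

Lines that do not take part in a merge, including a single 03110 line at the end of the file, are passed through unchanged.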

Check for non-floats in a csv file python3

I'm trying to read a csv file, and create a 2 dimensional list from the values stored inside.
However I'm running into trouble when I try to check whether or not the values stored can be converted into floats.
Here is the function I have written, which reads the file and creates a list.
def readfile(amount, name):
tempfile = open(name).readlines()[1:] #First value in line is never a float, hence the [1:]
rain_list = []
count = 0.0
for line in tempfile:
line = line.rstrip()
part = line.split(",")
try:
part = float(part)
except ValueError:
print("ERROR: invalid float in line: {}".format(line))
rain_list.append(part[amount])
count += 1
if count == 0:
print("ERROR in reading the file.")
tempfile.close()
return rain_list
It might be a little messy, since it's essentially a patchwork of different possible solutions I have tried.
The values it gets are the name of the file (name) and the amount of values it reads from the file (amount).
Has anyone got an idea why this does not work as I expect it to work?
part is a list of strings, and float() cannot convert a whole list at once. To check and convert all the values, you'd have to do:
part = [float(x) for x in part]
(wrapped in your try/except block)
BTW, you should use the csv module to read comma-separated files. It's built in. Using enumerate also lets you print the number of the line where the error occurs, not only the data:
import csv
reader = csv.reader(tempfile) # better: pass directly the file handle
# and use next(reader) to discard the title line
for lineno,line in enumerate(reader,2): # lineno starts at 2 because of title line
    try:
        line = [float(x) for x in line]
    except ValueError:
        print("ERROR: invalid float in line {}: {}".format(lineno,line))
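Put together, the approach might look like the sketch below. It assumes every column after the title line should be numeric; the function name read_floats and the sample data are invented for illustration, and io.StringIO stands in for an open file handle.

```python
import csv
import io


def read_floats(file_handle):
    """Return (rows, errors): rows of floats, and (line number, row) for bad lines."""
    reader = csv.reader(file_handle)
    next(reader, None)  # discard the title line
    rows, errors = [], []
    for lineno, row in enumerate(reader, 2):
        try:
            rows.append([float(x) for x in row])
        except ValueError:
            errors.append((lineno, row))
    return rows, errors


# Hypothetical data: one good line and one line with a non-float value
data = io.StringIO("jan,feb\n1.5,2.0\n3.1,n/a\n")
rows, errors = read_floats(data)
print(rows)    # [[1.5, 2.0]]
print(errors)  # [(3, ['3.1', 'n/a'])]
```

Collecting the bad lines instead of printing them right away leaves it to the caller to decide whether to report them, skip them, or stop.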

File input frequency sorting

I have to write a program that:
Takes the filename as an argument.
Reads the file and counts, for each band, how many albums of that band are listed in the file. (http://vlm1.uta.edu/~cconly/teaching/cse1310_spring2015/assignments/assignment7/albums.txt)
Prints on the screen, in descending order of number of albums, a line for each band. Each line should contain the name of the band, followed by a colon and space, and then the number of albums for that band. This would look like this:
band1: number1
band2: number2
band3: number3
My code is below, but I keep getting errors telling me that things aren't defined when they are, and I also get this one: TypeError: 'NoneType' object is not iterable. Any help would be great!
import fileinput
import os
filename = open("albums.txt", "r") # open album.txt file
def process_line(line):
line = line.lower()
new_line = ""
for letter in line:
if letter in (""",.!"'()"""):
continue
elif letter == '-':
letter = ' '
new_line = new_line + letter
words = new_line.split()
return words
def count_words(filename):
if (os.path.isfile(filename) == False):
print("\nError: file " + filename + " does not exist.\n")
return
#in_file = open(filename, "r")
result = {}
for line in filename:
words = process_line(line)
for word in words:
if (word in result):
result[word] += 1
else:
result[word] = 1
def print_word_frequencies(dictionary):
print()
inverse = inverse_dictionary(dictionary)
frequencies = inverse.keys()
frequencies = list(frequencies) # convert frequencies to a list, so that we can sort it.
frequencies.sort() # sorting the list
frequencies.reverse() # reverse the sorting of the list
for frequency in frequencies: # for words with the same frequency, we want them sorted in
list_of_words = inverse[frequency]
list_of_words.sort() # sorting in alphabetical order
for word in list_of_words:
print(word + ":", frequency)
def inverse_dictionary(in_dictionary):
out_dictionary = {}
for key in in_dictionary:
value = in_dictionary[key]
if (value in out_dictionary):
list_of_keys = out_dictionary[value]
list_of_keys.append(key)
else:
out_dictionary[value] = [key]
return out_dictionary
def main():
filename = "albums.txt"
dictionary = count_words(filename)
print_word_frequencies(dictionary)
main()
Since this is an assignment, I will not give you the full code, but just point out some errors.
First, your indentation is all wrong, and indentation is important in Python! This may just have happened when you pasted your code into the question editor, but maybe not. In particular, make sure you are not mixing tabs and spaces!
Your count_words method does not return anything, thus dictionary is None and you get TypeError: 'NoneType' object is not iterable in inverse_dictionary.
When you do for line in filename, you are iterating over the characters in the file name, not the lines in the file, as the global variable filename is shadowed by the filename parameter. Open the file in that method using with open(filename) as the_file:
Your process_line method seems odd. It seems like you remove all the special characters, but then how do you plan to separate band name and album name? You seem to just count words, not albums per band. Try line.split(" - ")[0] to get the band.
All that dictionary inversion is not needed at all. In print_word_frequencies, just sort the items from the dictionary using a custom key function to sort by the count.
With those hints, you should be able to fix your program. (In case you want to know, I got your program down to about ten lines of code.)
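As a small illustration of the last hint only (not the full assignment), sorting dictionary items by their count might look like this. The dictionary contents are made up.

```python
counts = {'band1': 2, 'band2': 5, 'band3': 1}  # made-up counts

# sort (band, count) pairs by count, largest first
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for band, number in ordered:
    print(band + ":", number)
```

Python's sort is stable, so bands with equal counts keep the order in which they were inserted into the dictionary.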
