set function with file- python3 - python-3.x

I have a text file with the following content:
Credit
Debit
21/12/2017
09:10:00
I wrote Python code to convert the text into a set and discard \n:
with open('text_file_name', 'r') as file1:
    same = set(file1)
print(same)
print(same.discard('\n'))
For the first print statement, print(same), I get the correct result:
{'Credit\n','Debit\n','21/12/2017\n','09:10:00\n'}
But for the second print statement, print(same.discard('\n')), I get the result
None.
Can anybody help me figure out why I am getting None? I am using same.discard('\n') to discard \n from the set.
Note:
I am trying to understand how the discard method works on a set.

The discard method removes an element from the set only if that exact element is present. Your set does not contain the bare string '\n' (each element merely ends with it), so there is nothing to discard. What you are looking for is a map that strips the \n from each element, like so:
set(map(lambda x: x.rstrip('\n'), same))
which returns {'Credit', 'Debit', '09:10:00', '21/12/2017'} as the set. This works by using the map builtin, which applies its first argument to each element of the set. The first argument here is lambda x: x.rstrip('\n'), which removes any \n characters from the right-hand side of each string.

discard removes the given element from the set only if it is present in it.
In addition, the method doesn't return a value (which is why print shows None), because it modifies the set it was called on in place.
with open('text_file_name', 'r') as file1:
    same = set(file1)
print(same)
# rstrip keeps a final line that has no trailing newline instead of dropping it
same = {elem.rstrip('\n') for elem in same}
print(same)
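To see both points in isolation, here is a minimal sketch with made-up data (the variable names are only for illustration). It shows that discard returns None, changes the set in place, and only removes an element that matches exactly:

```python
# Illustrative data only; these names are not from the original post.
lines = {'Credit\n', 'Debit\n', '\n'}

result = lines.discard('\n')   # removes the element equal to '\n', if present
print(result)                  # None: discard() mutates the set in place
print(sorted(lines))           # ['Credit\n', 'Debit\n']

lines.discard('missing')       # no error when the element is absent

# Stripping the newline from every element needs a new set.
stripped = {elem.rstrip('\n') for elem in lines}
print(sorted(stripped))        # ['Credit', 'Debit']
```

If you want an error when the element is missing, set.remove raises KeyError in that case.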

There are 4 elements in the set, and none of them is a bare newline.
It would be more usual to use a list in this case: a list preserves order, while a set is not guaranteed to preserve order and also discards duplicate lines. Perhaps you have your reasons.
You seem to be looking for rstrip('\n'). Consider processing the file in this way:
s = set()  # {} would create an empty dict, which has no add method
with open('text_file_name') as file1:
    for line in file1:
        s.add(line.rstrip('\n'))
s.discard('Credit')
print(s)  # This displays 3 elements, without trailing newlines.
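For comparison, a rough sketch of the list-based version mentioned above. It uses io.StringIO as a stand-in for the real file (an assumption, so the example runs on its own); with a real file you would keep the with open(...) line instead:

```python
import io

# io.StringIO stands in for the real file so the sketch is self-contained.
file1 = io.StringIO("Credit\nDebit\n21/12/2017\n09:10:00\n")

lines = [line.rstrip('\n') for line in file1]
print(lines)  # ['Credit', 'Debit', '21/12/2017', '09:10:00']

# list.remove is the closest counterpart of set.discard,
# but it raises ValueError when the value is absent.
lines.remove('Credit')
print(lines)  # ['Debit', '21/12/2017', '09:10:00']
```

The list keeps the lines in file order and keeps duplicates, which a set would not.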

Related

Best way to fix inconsistent csv file in python

I have a csv file which is not consistent: some rows have a middle name and some do not, and I don't know the best way to fix this. The middle name, if it exists, is always in the second position; if it doesn't exist, the last name is in the second position instead.
john,doe,52,florida
jane,mary,doe,55,texas
fred,johnson,23,maine
wally,mark,david,44,florida
Let's say that you have ① wrong.csv and want to produce ② fixed.csv.
You want to read a line from ①, fix it and write the fixed line to ②. This can be done like this:
with open('wrong.csv') as infile, open('fixed.csv', 'w') as outfile:
    for line in infile:
        line = fix(line)
        outfile.write(line)
Now we want to define the fix function...
Each line in your sample has either 4 or 5 fields, separated by commas. So we split the line using the comma as a delimiter and return the unmodified line if the number of fields is 4; otherwise we join field 0 and field 1 (Python counts from zero...), reassemble the output line and return it to the caller.
def fix(line):
    items = line.split(',')  # items is a list of strings
    if len(items) == 4:  # no middle name: the line is OK as it stands
        return line
    # join first and middle name
    first_middle = ' '.join((items[0], items[1]))
    # we want to return a "fixed" line,
    # i.e., a string not a list of strings
    # we have to join the new name with the remaining info
    return ','.join([first_middle] + items[2:])
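As an alternative, here is a small sketch using the standard csv module, assuming (as in the sample data) that rows have four fields without a middle name and five with one. The helper name fix_row is made up for this example, and io.StringIO stands in for wrong.csv so the snippet runs on its own:

```python
import csv
import io

def fix_row(row):
    """Merge first and middle name when a row has five fields (hypothetical helper)."""
    if len(row) == 5:
        return [row[0] + ' ' + row[1]] + row[2:]
    return row

# Two sample rows from the question; io.StringIO stands in for wrong.csv.
source = io.StringIO(
    "john,doe,52,florida\n"
    "jane,mary,doe,55,texas\n"
)
fixed_rows = [fix_row(row) for row in csv.reader(source)]
print(fixed_rows)
# [['john', 'doe', '52', 'florida'], ['jane mary', 'doe', '55', 'texas']]
```

Working on lists of fields rather than raw lines means the trailing newline and any quoted commas are handled by csv.reader; csv.writer can then write the fixed rows back out.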

white space added in the beginning of new line when printing in new line

When I try to print any data on several lines using Python 3, a single space gets added to the beginning of every line except the first one. For example:
[in] print('a','\n','b','\n','c')
the output will be:
a
 b
 c
but my desired output is:
a
b
c
So far I've only been able to do this with three separate print calls. Does anyone have any thoughts?
From the docs:
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
Print objects to the text stream file, separated by sep and followed by end.
sep, end, file and flush, if present, must be given as keyword
arguments.
Calling print('a', '\n', 'b') will print each of those three items with a space in between, which is what you are seeing.
You can change the separator argument to get what you want:
print('a', 'b', sep='\n')
Also see the format method.
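Two other ways to get the same output, as a quick sketch (the list name items is just for illustration): build the string yourself with str.join, or put the newlines into a str.format template.

```python
items = ['a', 'b', 'c']

# Join the items yourself, so print only adds the final newline.
joined = '\n'.join(items)
print(joined)

# str.format with explicit newlines in the template.
formatted = '{}\n{}\n{}'.format(*items)
print(formatted)
```

Both calls print a, b and c on separate lines with no leading space, because print receives a single argument and never inserts sep.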

Python trouble debugging I/O, how do I get the correct format?

I am attempting to turn a dictionary into a formatted string and then write it to a file, but my formatting seems to be entirely incorrect. I'm not sure how to debug it, since all my test cases are given different files. I was able to use Python's interactive mode to find out what my function actually writes to the file, and it is very wrong! Can you help me format it correctly?
Given a sorted dictionary, I convert it into a string. I need the function to return it like so:
Dictionary is : {'orange':[1,3],'apple':[2]}
"apple:\t2\norange:\t1,\t3\n"
format is: Every key-value pair of the dictionary
should be output as: a string that starts with key, followed by ":", a tab, then the integers from the
value list. Every integer should be followed by a "," and a tab except for the very last one, which should be followed by a newline
Here is my function that I thought would work:
def format_item(key, value):
    return key + ":\t" + ",\t".join(str(x) for x in value)
def format_dict(d):
    return sorted(format_item(key, value) for key, value in d.items())
def store(d, filename):
    with open(filename, 'w') as f:
        f.write("\n".join(format_dict(d)))
        f.close()
    return None
I now have too many tabs on the last line. How do I edit the last line only out of the for loop?
ex input:
d = {'orange':[1,3],'apple':[2]}
my function gives: ['apple:\t2', 'orange:\t1,\t3']
but should give: "apple:\t2\norange:\t1,\t3\n"
Adding the newline character to the end of the return statement in format_item seems to yield the correct output.
return key+ ":\t"+",\t".join(str(x) for x in value) + '\n'
In [10]: format_dict(d)
Out[10]: ['apple:\t2\n', 'orange:\t1,\t3\n']
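Note that this still returns a list rather than the single string the question asks for. A possible sketch of the remaining step, assuming the expected output is exactly the string shown in the question, is to join the already newline-terminated items with an empty string:

```python
def format_item(key, value):
    # key, colon, tab, then the integers separated by ",\t", then a newline
    return key + ":\t" + ",\t".join(str(x) for x in value) + "\n"

def format_dict(d):
    # Each item already ends with "\n", so join with the empty string.
    return "".join(format_item(key, d[key]) for key in sorted(d))

d = {'orange': [1, 3], 'apple': [2]}
result = format_dict(d)
print(repr(result))  # 'apple:\t2\norange:\t1,\t3\n'
```

With this version, store would write format_dict(d) directly; joining with "\n" again would produce blank lines between entries.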

str.format places last variable first in print

The purpose of this script is to parse a text file (sys.argv[1]), extract certain strings, and print them in columns. I start by printing the header. Then I open the file, and scan through it, line by line. I make sure that the line has a specific start or contains a specific string, then I use regex to extract the specific value.
The matching and extraction work fine.
My final print statement doesn't work properly.
import re
import sys
print("{}\t{}\t{}\t{}\t{}".format("#query", "target", "e-value",
                                  "identity(%)", "score"))
with open(sys.argv[1], 'r') as blastR:
    for line in blastR:
        if line.startswith("Query="):
            queryIDMatch = re.match('Query= (([^ ])+)', line)
            queryID = queryIDMatch.group(1)
            queryID.rstrip
        if line[0] == '>':
            targetMatch = re.match('> (([^ ])+)', line)
            target = targetMatch.group(1)
            target.rstrip
        if "Score = " in line:
            eValue = re.search(r'Expect = (([^ ])+)', line)
            trueEvalue = eValue.group(1)
            trueEvalue = trueEvalue[:-1]
            trueEvalue.rstrip()
            print('{0}\t{1}\t{2}'.format(queryID, target, trueEvalue), end='')
The problem occurs when I try to print the columns. When I print the first 2 columns, it works as expected (except that it's still printing new lines):
#query target e-value identity(%) score
YAL002W Paxin1_129011
YAL003W Paxin1_167503
YAL005C Paxin1_162475
YAL005C Paxin1_167442
The 3rd column is a number in scientific notation like 2e-34
But when I add the 3rd column, eValue, it breaks down:
#query target e-value identity(%) score
YAL002W Paxin1_129011
4e-43YAL003W Paxin1_167503
1e-55YAL005C Paxin1_162475
0.0YAL005C Paxin1_167442
0.0YAL005C Paxin1_73182
I have removed all newlines, as far as I know, using the rstrip() method.
At least three problems:
1) queryID.rstrip and target.rstrip are missing the call parentheses (), so the method is never actually called.
2) Something like trueEvalue.rstrip() doesn't mutate the string (strings are immutable); you would need
trueEvalue = trueEvalue.rstrip()
if you want to keep the change.
3) This might be a problem, but without seeing your data I can't be 100% sure. The r in rstrip stands for "right". If trueEvalue is 4e-43\n, then trueEvalue.rstrip() would indeed be free of newlines. But your values seem to be something like \n4e-43. If you simply use .strip(), newlines will be removed from both sides.
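A short sketch of points 2 and 3, using a made-up value rather than the real BLAST data: the result of rstrip must be assigned, and strip handles both ends.

```python
raw = '\n4e-43\n'          # made-up value for illustration

raw.rstrip()               # result is thrown away; raw is unchanged
print(repr(raw))           # '\n4e-43\n'

right_only = raw.rstrip()  # removes trailing whitespace only
print(repr(right_only))    # '\n4e-43'

both_sides = raw.strip()   # removes whitespace on both ends
print(repr(both_sides))    # '4e-43'
```

The same applies to queryID and target in the script: each needs an assignment such as target = target.strip() for the cleaned value to be used later.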

Skipping over array elements of certain types

I have a csv file that gets read into my code where arrays are generated out of each row of the file. I want to ignore all the array elements with letters in them and only worry about changing the elements containing numbers into floats. How can I change code like this:
myValues = []
data = open(text_file, "r")
for line in data.readlines()[1:]:
    myValues.append([float(f) for f in line.strip('\n').strip('\r').split(',')])
so that the last line knows to only try converting numbers into floats, and to skip the letters entirely?
Put another way, given this list,
list = ['2','z','y','3','4']
what command should be given so the code knows not to try converting letters into floats?
You could use try/except:
myVal = []
for i in list:
    try:
        myVal.append(float(i))
    except ValueError:  # not a number, so skip it
        pass
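Applied to a whole row of the file, the same idea might look like the sketch below. The helper name to_floats and the sample line are made up for illustration:

```python
def to_floats(fields):
    """Return the fields that parse as floats, skipping the rest (hypothetical helper)."""
    values = []
    for field in fields:
        try:
            values.append(float(field))
        except ValueError:
            continue  # not a number, move on to the next field
    return values

line = "2,z,y,3,4\r\n"  # sample row, including a Windows-style line ending
row = to_floats(line.strip().split(','))
print(row)  # [2.0, 3.0, 4.0]
```

One caveat: float also accepts strings such as 'nan' and 'inf', so fields like those would be converted rather than skipped.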
