Issue with text file importing and manipulation - python-3.x

I have a txt file with this in it:
GWashington 83
JAdams 86
What I need to do is read the file, add 5 to the numbers and save it to a new file.
newFile = open('scores2.txt', 'w')
stdLines = [line.strip() for line in open('class_scores.txt')]
scrSep = [line.split(',') for line in stdLines]
print(stdLines, scrSep)
def convert_numbers(s):
    if not s:
        return s
    try:
        f = float(s)
        i = int(f)
        return i if f == i else f
    except ValueError:
        return s
g = list(map(convert_numbers, scrSep))
print(s)
print(scrSep)
Thank you in advance for your help.
What should happen is that it opens the file, separates the lines, and separates each line's components so that I can turn the numbers into ints and manipulate them. But strip and split are making it harder to access the items.

Never mind, I fixed it. I just had to index one level deeper into the list.
So instead of list[x] I had to use list[x][y].
That uhh... slipped my mind.
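For reference, the whole task (read the file, add 5 to each score, save to a new file) can be sketched as follows. This is only one possible approach, assuming every line has the form "name score" separated by whitespace as in the sample. The function name bump_scores and the temporary directory used for the demo are illustrative and not part of the original post.

```python
import os
import tempfile

def bump_scores(src, dst, bonus=5):
    """Add `bonus` to the score on every 'name score' line of src; write to dst."""
    rows = []
    with open(src) as infile:
        for line in infile:
            parts = line.split()          # e.g. ['GWashington', '83']
            if len(parts) != 2:
                continue                  # skip blank or malformed lines
            name, score = parts           # parts[0] is the name, parts[1] the score
            rows.append((name, int(score) + bonus))
    with open(dst, "w") as outfile:
        for name, score in rows:
            outfile.write(f"{name} {score}\n")
    return rows

# Demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "class_scores.txt")
    dst = os.path.join(tmp, "scores2.txt")
    with open(src, "w") as f:
        f.write("GWashington 83\nJAdams 86\n")
    rows = bump_scores(src, dst)
    with open(dst) as f:
        written = f.read()

print(rows)
```

Here parts is the inner list for one line, so parts[1] plays the role of list[x][y] from the fix above.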

Related

Printing an entire list, instead of one line

I am having trouble writing the entire list into an outfile. Here is the code:
with open(infile, "r") as f:
    lines = f.readlines()
    for l in lines:
        if "ATOM" in l:
            split = l.split()
            if split[-1] == "1":
                print(split)
                #print(type(split))
                with open(newFile, "w") as f:
                    f.write("Model Number One" + "\n")
                    f.write(str(split))
When I use print(split), I can see the entire list. I also tried this variant:
with open(infile, "r") as f:
    lines = f.readlines()
    for l in lines:
        if "ATOM" in l:
            split = l.split()
            if split[-1] == "1":
                #print(split)
                print(type(split))
                with open(newFile, "w") as f:
                    f.write("Model Number One" + "\n")
                    for i in range(len(split)):
                        f.write(str(split))
However, when I try to use f.write(split) I get an error because the function can only take a str not a list. So, I used f.write(str(split)) and it worked. The only issue now is that it only writes the last item in the list, not the whole list.
print is more permissive than f.write: it accepts lists and most other objects and converts them to text itself, whereas f.write only accepts a string, as you noticed.
I think the issue is that the write routine is nested inside the loop. Every time open(newFile, "w") runs, Python truncates newFile, so only the last matching line survives.
The problem can be fixed by changing the open call to open(newFile, "a"). The "a" flag tells Python to append the new contents to newFile without erasing what is already there. If newFile does not exist yet, Python creates it.
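Appending works, but with the write calls still inside the loop the header is written again for every matching line, and the output keeps growing across runs. An alternative sketch opens the output file once, before the loop. The function name write_model_one and the sample lines are made up for illustration. The filter (lines containing "ATOM" whose last field is "1") follows the question.

```python
import os
import tempfile

def write_model_one(infile, outfile):
    """Copy the fields of every ATOM line whose last field is '1' to outfile."""
    kept = 0
    # Both files are opened once, so the output is truncated only a single time.
    with open(infile) as src, open(outfile, "w") as dst:
        dst.write("Model Number One\n")
        for line in src:
            if "ATOM" not in line:
                continue
            fields = line.split()
            if fields[-1] == "1":
                dst.write(" ".join(fields) + "\n")   # join turns the list into one string
                kept += 1
    return kept

# Demo with invented sample data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    infile = os.path.join(tmp, "input.txt")
    newFile = os.path.join(tmp, "output.txt")
    with open(infile, "w") as f:
        f.write("ATOM 1 N 1\nATOM 2 CA 2\nREMARK model 1\nATOM 3 C 1\n")
    kept = write_model_one(infile, newFile)
    with open(newFile) as f:
        result = f.read()

print(result)
```

Using " ".join(fields) instead of str(split) writes the values without the brackets and quotes of the list representation. Which format is wanted is a judgment call.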

Check for non-floats in a csv file python3

I'm trying to read a csv file, and create a 2 dimensional list from the values stored inside.
However I'm running into trouble when I try to check whether or not the values stored can be converted into floats.
Here is the function I have written, which reads the file and creates a list.
def readfile(amount, name):
    tempfile = open(name).readlines()[1:] #First value in line is never a float, hence the [1:]
    rain_list = []
    count = 0.0
    for line in tempfile:
        line = line.rstrip()
        part = line.split(",")
        try:
            part = float(part)
        except ValueError:
            print("ERROR: invalid float in line: {}".format(line))
        rain_list.append(part[amount])
        count += 1
    if count == 0:
        print("ERROR in reading the file.")
    tempfile.close()
    return rain_list
It might be a little messy, since it's essentially a patchwork of different possible solutions I have tried.
The values it gets are the name of the file (name) and the amount of values it reads from the file (amount).
Has anyone got an idea why this does not work as I expect it to work?
part is a list of strings, so float(part) raises a TypeError, which your except ValueError clause does not catch. To check and convert every value, you'd have to do:
part = [float(x) for x in part]
(wrapped in your exception block)
BTW you should use the csv module to read comma-separated files. It's built in. Using enumerate also lets you print the number of the line where the error occurs, not only the data:
import csv
reader = csv.reader(tempfile) # better: pass directly the file handle
# and use next(reader) to discard the title line
for lineno,line in enumerate(reader,2): # lineno starts at 2 because of title line
    try:
        line = [float(x) for x in line]
    except ValueError:
        print("ERROR: invalid float in line {}: {}".format(lineno,line))
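Put together, a complete version might look like the sketch below. It is an illustration under a few assumptions: the function name read_column and the sample data are invented, and it converts only the requested column rather than the whole row, since the first value in each line is said never to be a float. An in-memory io.StringIO stands in for the real file.

```python
import csv
import io

def read_column(handle, column):
    """Return the floats found in `column` of a CSV stream, skipping the title line."""
    reader = csv.reader(handle)
    next(reader, None)                        # discard the title line
    values, errors = [], []
    for lineno, row in enumerate(reader, 2):  # data starts on line 2
        try:
            values.append(float(row[column]))
        except (ValueError, IndexError):
            # ValueError: not a number; IndexError: the row is too short
            errors.append(lineno)
            print("ERROR: invalid float in line {}: {}".format(lineno, row))
    return values, errors

# Invented sample: the second data row has a non-numeric value in column 1.
sample = io.StringIO("station,jan,feb\nA,1.5,2.0\nB,n/a,3.0\nC,4.0,5.5\n")
values, errors = read_column(sample, 1)
print(values, errors)
```

With a real file, the csv documentation recommends opening it with newline="" and passing the handle directly to csv.reader.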

How can I expand List capacity in Python?

read = open('700kLine.txt')
# use readline() to read the first line
line = read.readline()
aList = []
for line in read:
    try:
        num = int(line.strip())
        aList.append(num)
    except:
        print ("Not a number in line " + line)
read.close()
print(aList)
The file has 700k lines (each line holds a number of at most 2 digits).
I can only get ~280k of those lines into aList.
So how can I expand aList's capacity from 280k to 700k or more? (Is there a different solution for this case?)
Hello, I just solved the problem. Thanks for all your help. It was an output buffer problem: the console was not displaying all of the printed output.
The solution is to increase the size of the buffer.
The link is here:
Increase output buffer when running or debugging in PyCharm
Please try this.
filename = '700kLine.txt'
with open(filename) as f:
    data = f.readlines()
print(data)
print(type(data)) #stores the data in a list
Yes, you can.
Once a list is defined, you can add, edit or delete its elements. To add more elements at the end, use the append function:
MyList.append(data)
Where MyList is the name of the list and data is the element you want to add.
I tried to re-create your problem:
# creating 700kLine file
with open('700kLine.txt', 'w') as f:
    for i in range(700000):
        f.write(str(i+1) + '\n')
# creating list from file entries
aList = []
with open('700kLine.txt', 'r') as f:
    for line in f:
        num = int(line.strip())
        aList.append(num)
# print(aList)
print(aList[:30])
Jupyter notebook throws an error while printing all 700k lines because too much memory is used. If you really want to print all 700k values, run the Python script from a terminal.
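A quick way to tell a display limit from a genuine list limit is to check len() instead of printing every element. The sketch below assumes one integer per line. The helper name load_ints and the generated in-memory data are illustrative stand-ins for the real file.

```python
import io

def load_ints(handle):
    """Collect the integers in a text stream; return them with the count of rejected lines."""
    numbers, rejected = [], 0
    for line in handle:
        try:
            numbers.append(int(line.strip()))
        except ValueError:
            rejected += 1              # count non-numeric lines instead of printing each one
    return numbers, rejected

# 700,000 generated lines, each holding a number of at most two digits.
stream = io.StringIO("".join(f"{i % 100}\n" for i in range(700_000)))
numbers, rejected = load_ints(stream)

print(len(numbers), rejected)          # check the size rather than dumping the whole list
print(numbers[:10], numbers[-3:])      # peek at both ends
```

If len(numbers) reports the full count, the list is complete and only the console output was cut short.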
Could it be that your computer ran out of memory while processing the file? I tried running an infinite loop that appends a number to a list and ended up with about 47 million elements (len(list) was 47119572). The code I used for the test is below.
I tried this code on an online REPL and it reached a significantly lower len(list).
list = []
while True:
    try:
        if len(list) > 0:
            list.append(list[-1] + 1)
        else:
            list.append(1)
    except MemoryError:
        print("memory error, last count is: ", list[-1])
        raise MemoryError
Maybe try saving bits of data read instead of reading the whole file at once?
Just my assumption.

python: editing specific lines in a text file.File not being read after first edit

I am new to python. Right now I'm trying to learn how to edit text files(overwrite them).
So, I have a text file, which stores these ints just like that:
1
2
3
4
5
then when I do this
with open('badgeNumbers.txt', 'r') as f:
    lines = f.readlines()
self.firstBadge = lines[0].strip()
self.secondBadge = lines[1].strip()
self.thirdBadge = lines[2].strip()
self.fourthBadge = lines[3].strip()
self.fifthBadge = lines[4].strip()
int(self.thirdBadge)
lines[2] = 56
out = open('badgeNumbers.txt', 'w')
out.writelines(str(lines))
out.close()
it works and changes the number.
In the text file it is now saved like this:
['1\n', '2\n', 56, '3\n', '4\n', '5']
However, later if I want to run this again, it gives me this error:
self.secondBadge = lines[1].strip()
IndexError: list index out of range
I just need for it to be able to do the same thing as before the first text file edit.
Can somebody please help?
Thanks
The first problem is that 56 is an int with no newline at the end, so it and the next value would end up on the same line. The second problem is that str(lines) turns the whole list into a single string, so the file ends up containing one line of list syntax instead of one value per line. On the next run readlines() returns a list with a single element, which is why lines[1] raises IndexError. Change lines[2] = 56 to lines[2] = "56\n", and change out.writelines(str(lines)) to out.writelines(lines).
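As a rough sketch of the corrected round trip, the helper below replaces one line and writes the values back one per line, so the file can be edited again on a later run. The name replace_line is hypothetical, and the demo uses a temporary copy of badgeNumbers.txt. It uses splitlines() and join() instead of writelines(), which is a swapped-in variant of the fix described above.

```python
import os
import tempfile

def replace_line(path, index, value):
    """Overwrite line `index` (0-based) of a text file with `value`, one item per line."""
    with open(path) as f:
        lines = f.read().splitlines()      # strips the newlines, so none can go missing
    lines[index] = str(value)              # always store text, never a bare int
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

# Demo on a temporary file with the same contents as in the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "badgeNumbers.txt")
    with open(path, "w") as f:
        f.write("1\n2\n3\n4\n5")
    first = replace_line(path, 2, 56)
    second = replace_line(path, 1, 7)      # a second edit still finds five lines
    with open(path) as f:
        final_text = f.read()

print(second)
```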

Python IndexError: list index out of range large file

I have a very large file ~40GB and 674,877,098 lines I want to read and extract specific columns from. I can get about 3GB of data transferred then I get the following error.
Traceback (most recent call last):
  File "C:\Users\Codes\Read_cat_write.py", line 44, in <module>
    tid = int(columns[2])
IndexError: list index out of range
Sample of data that is being read in.
1,100000000,100000000,39,2.704006988169216e15,310057,0
2,100000001,100000000,38,2.650346740514816e15,303904,0.01
3,100000002,100000000,37,2.136985003098112e15,245039,0.03
4,100000003,100000000,36,2.29479163101184e15,263134,0.05
5,100000004,100000000,35,1.834645477916672e15,210371,0.06
6,100000005,100000000,34,1.814063860416512e15,208011,0.08
7,100000006,100000000,33,1.808883592986624e15,207417,0.1
8,100000007,100000000,32,1.806241248575488e15,207114,0.12
9,100000008,100000000,31,1.651783621410816e15,189403,0.14
10,100000009,100000000,30,1.634821184946176e15,187458,0.16
Code
from itertools import islice
F = r'C:\Users\Outfiles\comp_cat_raw.txt'
w = open(r'C:\Users\Outfiles\comp_cat_3col.txt','a')
def filesave(TID,M,R):
    X = str(TID)
    Y = str(M)
    Z = str(R)
    w.write(X)
    w.write('\t')
    w.write(Y)
    w.write('\t')
    w.write(Z)
    w.write('\n')
N = 680000000
f = open(F) #Opens file
f.readline() # Strips Header
nlines = islice(f, N) #slices file to only read N lines
for line in nlines:
    if line !='':
        line = line.strip()
        line = line.replace(',',' ') # Replace comma with space
        columns = line.split() # Splits into column
        tid = int(columns[2])
        m = float(columns[4])
        r = float(columns[6])
        filesave(tid,m,r)
w.close()
I have looked at the file being read in at the point where the error occurs, but I don't see anything wrong with the file so I am at a loss as to the cause of this error.
Chances are there is a line with only one comma in it, or none, or an empty line. Note that a blank line is read as '\n', not '', so the if line != '' check does not filter it out. Put a try-except statement around the parsing, catch the IndexError, and print the offending line; that should be enough to find it. Besides that, a few things in your code could be improved:
Have a look at the csv module. It uses optimized C code for exactly what you want to do, so it should be much faster. This answer shows mainly how to write the iteration with csv.
The islice construction seems superfluous. A simple for line in f: will do and is the most efficient way to handle this iteration.
Use line.split(',') directly instead of first replacing the commas with spaces.
Use with open(F) as f: instead of calling close yourself. For this script it might make no difference, but it ensures that file handles are not left open if an error occurs.
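These suggestions could be combined roughly as follows. This is a sketch, not a drop-in replacement: the function name extract_columns, the header line, and the deliberately broken rows in the demo are invented, and in-memory streams stand in for the 40 GB input and the output file. The column indices 2, 4 and 6 are taken from the question.

```python
import csv
import io

def extract_columns(src, dst):
    """Write columns 2, 4 and 6 of each CSV row to dst, tab-separated; return bad line numbers."""
    reader = csv.reader(src)
    next(reader, None)                         # skip the header line
    bad = []
    for lineno, row in enumerate(reader, 2):   # the first data row is line 2
        try:
            tid, m, r = int(row[2]), float(row[4]), float(row[6])
        except (IndexError, ValueError):
            bad.append(lineno)                 # short, empty or non-numeric row
            continue
        dst.write(f"{tid}\t{m}\t{r}\n")
    return bad

# Two rows from the sample data plus one truncated row and one blank line.
source = io.StringIO(
    "id,a,tid,b,m,c,r\n"
    "1,100000000,100000000,39,2.704006988169216e15,310057,0\n"
    "2,100000001\n"
    "\n"
    "3,100000002,100000000,37,2.136985003098112e15,245039,0.03\n"
)
output = io.StringIO()
bad = extract_columns(source, output)
out_lines = output.getvalue().splitlines()
print(bad)
```

For the real files, both would be opened in a single with statement, the input with newline="" as the csv documentation recommends. The list of bad line numbers then points directly at the rows to inspect.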

Resources