Python 3 going through a file until EOF. File is not just a set of similar lines needing processing - python-3.x

The answers to questions of the type "How do I do 'while not eof(file)'?" do not quite cover my issue.
I have a file with a format like:
header block
data
another header block
more data (with arbitrary number of data lines in each data block)
...
I do not know how many header-data sets there are.
I have successfully read the first header block and then a set of data, using loops that look for the blank line at the end of each data block.
I can't just use the "for line in openfile" approach, because I need to read the header-data blocks one at a time and then process them.
How can I detect the last header-data block?
My current approach is to use a try/except construction and wait for the exception, which is not terribly elegant.

It's hard to answer without seeing any of your code...
But my guess is that you are reading the file with fp.read():
fp = open("a.txt")
while True:
    data = fp.read()
Instead:
Always pass the length of the data you expect.
Check whether the chunk you read is an empty string, not None.
For example:
fp = open("a.txt")
while True:
    header = fp.read(headerSize)
    if header == '':
        # End of file
        break
    # placeholder: parse the length of the data block out of the header
    dataSize = read_dataSize_from_header(header)
    data = fp.read(dataSize)
    if data == '':
        # A header with no data after it: the file is truncated
        raise IOError('Error reading file')
    process_your_data(data)
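For the line-oriented layout in the question, a generator that groups lines into blocks is one way to avoid an explicit EOF test. The sketch below is only an illustration under a stated assumption: every header block and every data block is terminated by a blank line (the question only confirms this for data blocks). The names read_blocks and read_header_data_pairs are hypothetical.

```python
import io


def read_blocks(lines):
    """Yield one list of lines per blank-line-separated block."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            # A blank line closes the current block.
            yield block
            block = []
    if block:
        # The last block may end at EOF without a trailing blank line.
        yield block


def read_header_data_pairs(lines):
    """Pair up consecutive blocks as (header, data); assumes they alternate."""
    blocks = read_blocks(lines)
    for header in blocks:
        data = next(blocks, None)
        if data is None:
            raise ValueError("header block without a data block")
        yield header, data


# Small in-memory stand-in for the real file.
sample = io.StringIO("H1\nmeta\n\n1 2\n3 4\n\nH2\n\n5 6\n")
pairs = list(read_header_data_pairs(sample))
for header, data in pairs:
    print(header, data)
```

The for loop simply ends after the last pair, so the last header-data block needs no special detection and no exception handling in the calling code.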

This is some time later but I post this for others who do this search.
The following script, suitably adjusted, will read a file and deliver lines until the EOF.
"""
Script to read a file until the EOF
"""
def get_all_lines(the_file):
    for line in the_file:
        if line.endswith('\n'):
            line = line[:-1]
        yield line
line_counter = 1
data_in = open('OAall.txt')
for line in get_all_lines(data_in):
    print(line)
    print(line_counter)
    line_counter += 1
data_in.close()

Related

not getting return value through function or method in python

In this program I call the function repeatedly and write each result to a file. That works fine, with no issue whatsoever. But when I try to take the value returned by the last call, it returns nothing, even though the variable is not empty, because the else part only runs a single time.
#this is an ipynb file so spacing means they are getting executed from different blocks
def intersection(pre,i=0,point=0,count=0,result=dt):
    index=-1
    prefer=[]
    # print(i)
    if(0<i):
        url = "../data/result.csv"
        result= pd.read_csv(url,names=["a","b","c","d","e"])
    if(i<len(pre)):
        for j in result[pre[i]]:
            index=index+1
            if(demand[pre[i]][1] >= j):
                prefer.append(result.iloc[index,:])
        i=i+1
        file = open('../data/result.csv', 'w+', newline ='')
        header = ["a","b","c","d","e"]
        writer = csv.DictWriter(file, fieldnames = header)
        # writing data row-wise into the csv file
        writer.writeheader()
        # writing the data into the file
        with file:
            write = csv.writer(file)
            write.writerows(prefer)
            count=count+1
        # print(prefer,count) print the outputs step by step
        intersection(pre,i,point,count,result)
    else:
        print("Else Part",type(result))
        print(result)
        return result
#
pre=["a","b","c"]
rec=intersection(pre)
print(rec)
Output
It prints the whole value of result from the else part. I have excluded it from the snapshot because it was too large, and I have only a few fields here, but that does not affect the problem I am getting. Please answer if you know how I can get the value of result into rec.
OK. The code is a bit more complex than I thought. I was trying to work through it just now, and I hit some bugs. Maybe you can clear them up for me.
In the function definition, def intersection(pre,i=0,point=0,count=0,result=dt):, dt isn't defined. What should it be?
On the fourth line, the test is 0<i. The default value of i is zero so, unless i is given a value on calling the function, this piece of code will never run on the first call.
I notice that the file being read and the file being written are the same: ../data/result.csv - is this correct?
There's another undefined variable, demand, on line 14. Can you fill that in?
Let's see where we are after that.
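Independently of the undefined names, one thing that is certain about Python is that a function which reaches its end without executing a return statement returns None. In the question's code the recursive call is made but its result is not returned, so only the innermost call returns result and the outer calls return None. The toy functions below are hypothetical stand-ins that show the effect in isolation.

```python
def countdown_without_return(n):
    """Recursive call whose result is thrown away, as in the question."""
    if n > 0:
        countdown_without_return(n - 1)  # the inner result is discarded here
    else:
        return "done"


def countdown_with_return(n):
    """Same recursion, but each level passes the inner result upwards."""
    if n > 0:
        return countdown_with_return(n - 1)
    else:
        return "done"


print(countdown_without_return(3))  # None
print(countdown_with_return(3))     # done
```

Applied to the question, that would presumably mean writing return intersection(pre,i,point,count,result) on the recursive line. This is an assumption about the intended behavior, since dt and demand are not shown.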

Stuck in infinite loop while trying to read all lines in proc.stdout.readline

I am trying to read each line in proc.stdout.readline and send the lines over the network, for example:
data = b''
for line in iter(proc.stdout.readline, ''):
    data += line
clientsocket.send(data)
When I run this code I seem to be stuck in an infinite loop and never reach the line:
clientsocket.send(data)
Is there a more efficient way to read the data? I've also tried a while loop that breaks on 'if not line':
data = b''
while True:
    line += proc.stdout.readline()
    data += line
    if not line:
        break
clientsocket.send(data)
This seems to produce the same result. Is there a more efficient way to read all of the data from proc.stdout.readline?
I've encountered the very same problem. The strange thing is that in Python 2.7 the loop had no problem terminating.
While debugging (in Python 3.5) I noticed that all real lines were returned with the '\n' character, whereas the read past the end of the output returned an empty string, i.e. ''. So I just added an if-clause that checks against '' and breaks the loop if it matches.
My final version looks as follows:
lines = []
for _line in iter(process.stdout.readline, b''):
    if _line == '':
        break
    lines.append(_line)
One thing that might be worth mentioning is that I passed the universal_newlines=True argument to the subprocess.Popen(..) call. In that mode readline returns str rather than bytes, so the b'' sentinel never matches and the explicit check against '' is what ends the loop.
The statement iter(proc.stdout.readline, "") will do a blocking read until it receives an EOF.
If you want to read all the lines, then you can just do:
data = b"".join(proc.stdout.readlines())
There is no other solution than for the proc to produce lines faster.
If you want, you can read lines with a timeout (i.e. you can wait to read a given number of characters, or time out if that number of characters is not read).
Those answers can be found here:
https://stackoverflow.com/a/10759061/6400614 .
https://stackoverflow.com/a/5413588/6400614
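The behavior discussed in this thread comes down to the sentinel type: a pipe opened in the default binary mode returns bytes from readline, and at EOF it returns b"", which never compares equal to the str ''. A minimal sketch, assuming a hypothetical child process that prints two lines and exits:

```python
import subprocess
import sys

# Hypothetical child process used only for this demonstration.
child_code = "print('one'); print('two')"

proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE)

data = b""
# Binary pipe: readline() returns bytes and signals EOF with b"",
# so the sentinel passed to iter() has to be b"" as well.
for line in iter(proc.stdout.readline, b""):
    data += line
proc.stdout.close()
proc.wait()

print(data)
```

If the process is started with universal_newlines=True, readline returns str and the matching sentinel is '' instead.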

Reading file and getting values from a file. It shows only first one and others are empty

I am reading a file using with open in Python and doing all the other operations inside the with block. When I call the function, only the first operation inside the block returns data, while the others are empty. I could do this with another approach such as readlines, but I could not work out why this one does not work. I thought the reason might be that the file gets closed, but with open takes care of that. Could anyone please suggest what's wrong?
def read_datafile(filename):
    with open(filename, 'r') as f:
        a = [lines.split("\n")[0] for number, lines in enumerate(f) if number ==2]
        b = [lines.split("\n")[0] for number, lines in enumerate(f) if number ==3]
        c = [lines.split("\n")[0] for number, lines in enumerate(f) if number ==2]
    return a, b, c
read_datafile('data_file_name')
I only get values for a and all others are empty. When 'a' is commented, I get value for b and others are empty.
Updates
The file looks like this:
-0.6908270760153553 -0.4493128078936575 0.5090918714784820
0.6908270760153551 -0.2172871921063448 0.5090918714784820
-0.0000000000000000 0.6666999999999987 0.4597549674638203
0.3097856229862140 -0.1259623621214220 0.5475896447896115
0.6902143770137859 0.4593623621214192 0.5475896447896115
The construct
with open(filename) as handle:
    a = [line for line in handle if condition]
    b = [line for line in handle]
will always return an empty b because the iterator in a already consumed all the data from the open filehandle. Once you reach the end of a stream, additional attempts to read anything will simply return nothing.
If the input is seekable, you can rewind it and read all the same lines again; or you can close it (explicitly, or implicitly by leaving the with block) and open it again - but a much more efficient solution is to read it just once, and pick the lines you actually want from memory. Remember that reading a byte off a disk can easily take several orders of magnitude more time than reading a byte from memory. And keep in mind that the data you read could come from a source which is not seekable, such as standard output from another process, or a client on the other side of a network connection.
def read_datafile(filename):
    with open(filename, 'r') as f:
        lines = [line for line in f]
    a = lines[2]
    b = lines[3]
    c = lines[2]
    return a, b, c
If the file could be too large to fit into memory at once, you end up with a different set of problems. Perhaps in this scenario, where you only seem to want a few lines from the beginning, only read that many lines into memory in the first place.
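Reading only the first few lines can be sketched with itertools.islice, which stops pulling from the file once the requested number of lines has been delivered. The helper name read_first_lines and the temporary demo file are assumptions made for this illustration.

```python
import os
import tempfile
from itertools import islice


def read_first_lines(filename, count):
    """Return the first `count` lines of the file, without trailing newlines."""
    with open(filename, "r") as f:
        return [line.rstrip("\n") for line in islice(f, count)]


# Demonstration file standing in for the real data file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("row0\nrow1\nrow2\nrow3\nrow4\n")
    path = tmp.name

lines = read_first_lines(path, 4)  # only four lines are read
a, b = lines[2], lines[3]
print(a, b)
os.remove(path)
```

The file is read once, and the wanted lines are then picked from the in-memory list, so the iterator is never consumed twice.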
What exactly are you trying to do with this script? The lines variable here may not contain what you want: it will contain a single line because the file gets enumerated by lines.

Python IndexError: list index out of range large file

I have a very large file (~40 GB, 674,877,098 lines) that I want to read and extract specific columns from. I can get about 3 GB of data transferred, then I get the following error.
Traceback (most recent call last):
  File "C:\Users\Codes\Read_cat_write.py", line 44, in <module>
    tid = int(columns[2])
IndexError: list index out of range
Sample of data that is being read in.
1,100000000,100000000,39,2.704006988169216e15,310057,0
2,100000001,100000000,38,2.650346740514816e15,303904,0.01
3,100000002,100000000,37,2.136985003098112e15,245039,0.03
4,100000003,100000000,36,2.29479163101184e15,263134,0.05
5,100000004,100000000,35,1.834645477916672e15,210371,0.06
6,100000005,100000000,34,1.814063860416512e15,208011,0.08
7,100000006,100000000,33,1.808883592986624e15,207417,0.1
8,100000007,100000000,32,1.806241248575488e15,207114,0.12
9,100000008,100000000,31,1.651783621410816e15,189403,0.14
10,100000009,100000000,30,1.634821184946176e15,187458,0.16
Code
from itertools import islice
F = r'C:\Users\Outfiles\comp_cat_raw.txt'
w = open(r'C:\Users\Outfiles\comp_cat_3col.txt','a')
def filesave(TID,M,R):
    X = str(TID)
    Y = str(M)
    Z = str(R)
    w.write(X)
    w.write('\t')
    w.write(Y)
    w.write('\t')
    w.write(Z)
    w.write('\n')
N = 680000000
f = open(F) #Opens file
f.readline() # Strips Header
nlines = islice(f, N) #slices file to only read N lines
for line in nlines:
    if line !='':
        line = line.strip()
        line = line.replace(',',' ') # Replace comma with space
        columns = line.split() # Splits into column
        tid = int(columns[2])
        m = float(columns[4])
        r = float(columns[6])
        filesave(tid,m,r)
w.close()
I have looked at the file being read in at the point where the error occurs, but I don't see anything wrong with the file so I am at a loss as to the cause of this error.
Chances are there is some line with only a single comma in it, or none at all, or an empty line. Note that the check line != '' does not filter out blank lines, because a blank line is read as '\n'. Put a try-except statement around the parsing statements, catch the IndexError, and print out the line in question; that should tell you what is wrong. Besides that, there are some things in your code that might be worth improving:
Have a look at the csv module especially. It has optimized C code for exactly what you want to do, so it should be much faster. This answer shows mainly how to write the iteration with csv.
The whole slice construction seems to be superfluous. A simple for line in f: will do and is the most efficient way to handle this iteration.
Use line.split(',') directly, instead of first replacing the commas with spaces.
Use with open(F) as f: instead of calling close yourself. For this script it might make no difference, but this way you make sure that you don't, for example, leave file handles open in case of errors.
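Put together, the suggestions above might look like the following sketch. It is not the original poster's script: the function name extract_columns, the header names in the sample, and the decision to collect bad rows instead of stopping are all assumptions made for illustration.

```python
import csv
import io


def extract_columns(infile, outfile):
    """Write columns 2, 4 and 6 of each row to outfile, tab-separated.

    Returns the rows that could not be parsed, with their line numbers.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    next(reader, None)  # skip the header line, as the original script does
    bad_rows = []
    for row in reader:
        try:
            record = (int(row[2]), float(row[4]), float(row[6]))
        except (IndexError, ValueError):
            # Blank or short lines end up here instead of crashing the run.
            bad_rows.append((reader.line_num, row))
            continue
        writer.writerow(record)
    return bad_rows


# In-memory sample with one blank line and one truncated line.
source = io.StringIO(
    "id,a,tid,b,m,c,r\n"
    "1,100000000,100000000,39,2.5e15,310057,0\n"
    "\n"
    "2,100000001\n"
    "3,100000002,100000000,37,1.5e15,245039,0.03\n"
)
target = io.StringIO()
bad = extract_columns(source, target)
print(bad)
print(target.getvalue())
```

With real files, both would be opened in with blocks, and the csv documentation recommends passing newline='' to open for files handed to the csv module.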

User input after file input in Python?

First year Comp Sci student here.
I have an assignment that is asking us to make a simple game using Python, which takes an input file to create the game-world (2D grid). You're then supposed to give movement commands via user input afterwards. My program reads the input file one line at a time to create the world using:
def getFile():
    try:
        line = input()
    except EOFError:
        line = EOF
    return line
...after which it creates a list to represent the line, with each member being a character in the line, and then creates a list containing each of these lists (amounting to a grid with row and column coordinates).
The thing is, I later need to take input in order to move the character, and I can't do this because it still wants to read the file input, and the last line from the file is an EOF character, causing an error. Specifically the "EOF when reading a line" error.
How can I get around this?
Sounds like you are reading the file directly from stdin -- something like:
python3 my_game.py < game_world.txt
Instead, you need to pass the file name as an argument to your program, that way stdin will still be connected to the console:
python3 my_game.py game_world.txt
and then getFile looks more like:
def getFile(file_name):
    with open(file_name) as fh:
        for line in fh:
            # yield rather than return, so every line is delivered, not just the first
            yield line
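As a rough sketch of how the two kinds of input can coexist: the world is loaded from a file named on the command line, and input() is then free to read movement commands from the console. The names load_world and main, the prompt text, and the sample grid are hypothetical.

```python
import os
import sys
import tempfile


def load_world(file_name):
    """Read the world file into a grid: a list of rows, each a list of characters."""
    with open(file_name) as fh:
        return [list(line.rstrip("\n")) for line in fh]


def main(argv):
    world = load_world(argv[1])
    # stdin was never used for the file, so input() reads from the console.
    command = input("Move (w/a/s/d): ")
    print("Grid has", len(world), "rows; got command", command)


# main(sys.argv)  # uncomment when running: python3 my_game.py game_world.txt

# Demonstration of load_world with a temporary stand-in for game_world.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("#####\n#P..#\n#####\n")
    demo_path = tmp.name

grid = load_world(demo_path)
os.remove(demo_path)
print(grid[1])
```

Because the file is opened by name, reaching its end never touches stdin, so no EOFError is raised when the game later asks for a command.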
File interaction in Python 3 goes like this:
# the built-in open function opens a file in read-only text mode by default
f = open("path/to/file.txt")
# read all the lines in the file and return them in a list
lines = f.readlines()
# or, instead of readlines, iterate over the file directly
# (after readlines the file is already at EOF, so use one or the other)
for line in f:
    # now get each character from each line
    for char_in_line in line:
        pass  # do something
# close file
f.close()
By default the file is read in universal newlines mode (newline=None): lines ending in '\n', '\r' or '\r\n' are all returned with the ending translated to '\n'.
If you want something else, pass the newline parameter to open (it accepts None, '', '\n', '\r' and '\r\n'):
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

Resources