Stuck in infinite loop while trying to read all lines in proc.stdout.readline - python-3.x

I am trying to read each line in proc.stdout.readline and send the lines over the network, for example:
data = b''
for line in iter(proc.stdout.readline, ''):
    data += line
clientsocket.send(data)
When I run this code I seem to be stuck in an infinite loop, unable to escape to the line:
clientsocket.send(data)
Is there a more efficient way to read the data? I've also tried a while loop, breaking if not line:
data = b''
while True:
    line = proc.stdout.readline()
    data += line
    if not line:
        break
clientsocket.send(data)
This seems to also produce the same results. Is there a more efficient way to read all of the data from proc.stdout.readline?

I've encountered the very same problem. The strange thing is that in Python 2.7 it had no problem converging and actually stopped iterating.
During debugging (in Python 3.5) I noticed that all true lines were returned with the '\n' character, whereas the line that wasn't supposed to arrive was returned as an empty string, i.e. ''. So I just added an if-clause checking against '' and breaking the loop if it matches.
My final version looks as follows:
lines = []
for _line in iter(process.stdout.readline, b''):
    if _line == '':
        break
    lines.append(_line)
One thing that might be worth mentioning is that I used the universal_newlines=True argument in the subprocess.Popen(...) call.
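For completeness, here is a minimal sketch of how the pieces fit together; the command list is a placeholder, not something from the question:

import subprocess

# Placeholder command; with universal_newlines=True the pipe is a text stream.
process = subprocess.Popen(['some_command'],
                           stdout=subprocess.PIPE,
                           universal_newlines=True)
lines = []
# On a text stream the b'' sentinel never matches a str, so the explicit
# check against the empty string '' is what actually detects EOF.
for _line in iter(process.stdout.readline, b''):
    if _line == '':
        break
    lines.append(_line)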

The statement iter(proc.stdout.readline, "") will do a blocking read until it receives an EOF.
If you want to read all the lines, then you can just do:
data = b"".join(proc.stdout.readlines())
There is no way to make this faster, other than having the process produce lines faster.
If you want, you can read lines with a timeout (i.e. you can wait to read a set number of characters, or time out if that number of characters is not read).
Those answers can be found here:
https://stackoverflow.com/a/10759061/6400614 .
https://stackoverflow.com/a/5413588/6400614
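For the timeout route, a rough POSIX-only sketch using select might look like this; the command and the 5-second timeout are placeholders:

import select
import subprocess

# Placeholder command; bufsize=0 so Python-level buffering does not hide
# data that select() cannot see.
proc = subprocess.Popen(['some_command'], stdout=subprocess.PIPE, bufsize=0)
while True:
    # Wait up to 5 seconds (an arbitrary choice) for the pipe to be readable.
    ready, _, _ = select.select([proc.stdout], [], [], 5.0)
    if not ready:
        break  # timed out with no data
    line = proc.stdout.readline()
    if not line:  # b'' means the process closed its end (EOF)
        break
    # Send each line as it arrives instead of buffering everything first,
    # e.g. clientsocket.send(line)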

Related

not getting return value through function or method in python

In this program I am iterating the function and adding the result into the file. It works fine, no issue whatsoever, but when I try to take the value from the return of the last call, it just returns nothing even though the variable is not empty, because the else part only runs a single time.
# this is an ipynb file, so the spaced sections are executed from different cells
def intersection(pre, i=0, point=0, count=0, result=dt):
    index = -1
    prefer = []
    # print(i)
    if 0 < i:
        url = "../data/result.csv"
        result = pd.read_csv(url, names=["a", "b", "c", "d", "e"])
    if i < len(pre):
        for j in result[pre[i]]:
            index = index + 1
            if demand[pre[i]][1] >= j:
                prefer.append(result.iloc[index, :])
        i = i + 1
        file = open('../data/result.csv', 'w+', newline='')
        header = ["a", "b", "c", "d", "e"]
        writer = csv.DictWriter(file, fieldnames=header)
        # writing data row-wise into the csv file
        writer.writeheader()
        # writing the data into the file
        with file:
            write = csv.writer(file)
            write.writerows(prefer)
        count = count + 1
        # print(prefer, count) prints the outputs step by step
        intersection(pre, i, point, count, result)
    else:
        print("Else Part", type(result))
        print(result)
        return result
#
pre = ["a", "b", "c"]
rec = intersection(pre)
print(rec)
Output
It prints all the values of result from the else part. I have excluded it from the snapshot because it was too vast, and I have only a few fields here, but that will not affect the problem I am getting. Please answer if you know how I can get the value of result into rec.
OK. The code is a bit more complex than I thought. I was trying to work through it just now, and I hit some bugs. Maybe you can clear them up for me.
In the function definition, def intersection(pre, i=0, point=0, count=0, result=dt):, dt isn't defined. What should it be?
On the fourth line, 0 < i - the default value of i is zero, so unless i is given a value when the function is called, this piece of code will never run.
I notice that the file being read and the file being written are the same: ../data/result.csv - is this correct?
There's another undefined variable, demand, on line 14. Can you fill that in?
Let's see where we are after that.
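In the meantime, here is a tiny, hypothetical illustration (not the asker's code) of the most common cause of this symptom: a recursive call's return value is lost unless it is itself returned.

def countdown(i):
    # The 'return' in front of the recursive call is what propagates the
    # final value back to the original caller.
    if i > 0:
        return countdown(i - 1)
    else:
        return "done"

print(countdown(3))  # prints 'done'; drop the inner 'return' and it prints None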

python: How to read a file and store each line using map function?

I'm trying to rewrite a program that I wrote, getting rid of all for loops.
The original code reads a file with thousands of lines that are structured like:
Ex. 2 lines of the file (remaining fields elided):
LPPD;LEMD;...
DAAE;LFML;...
As you can see, the first line starts with LPPD;LEMD and the second line starts with DAAE;LFML. I'm only interested in the first and second element of each line.
The original code I wrote is:
# Libraries
import sys
from collections import Counter
import collections
from itertools import chain
from collections import defaultdict
import time

# START
# #time=0
start = time.time()
# Defining default program argument
if len(sys.argv) == 1:
    fileName = "file.txt"
else:
    fileName = sys.argv[1]
takeOffAirport = []
landingAirport = []
# Reading file
lines = 0  # Counter for file lines
try:
    with open(fileName) as file:
        for line in file:
            words = line.split(';')
            # Relevant data, item1 and item2 from each file line
            origin = words[0]
            destination = words[1]
            # Populating lists
            landingAirport.append(destination)
            takeOffAirport.append(origin)
            lines += 1
except IOError:
    print("\n\033[0;31mIoError: could not open the file:\033[00m %s" % fileName)
airports_dict = defaultdict(list)
# Merge lists into a dictionary key:value
for key, value in chain(Counter(takeOffAirport).items(),
                        Counter(landingAirport).items()):
    # 'AIRPORT_NAME': [num_takeOffs, num_landings]
    airports_dict[key].append(value)
# Sum key values and add it as another value
for key, value in airports_dict.items():
    # 'AIRPORT_NAME': [num_totalMovements, [num_takeOffs, num_landings]]
    airports_dict[key] = [sum(value), value]
# Sort dictionary by the top 10 total movements
airports_dict = sorted(airports_dict.items(),
                       key=lambda kv: kv[1], reverse=True)[:10]
airports_dict = collections.OrderedDict(airports_dict)
# Print results
print("\nAIRPORT" + "\t\t#TOTAL_MOVEMENTS" + "\t#TAKEOFFS" + "\t#LANDINGS")
for k in airports_dict:
    print(k, "\t\t", airports_dict[k][0],
          "\t\t\t", airports_dict[k][1][1],
          "\t\t", airports_dict[k][1][0])
# #time=1
end = time.time() - start
print("\nAlgorithm execution time: %0.5f" % end)
print("Total number of lines read in the file: %u\n" % lines)
airports_dict.clear()
takeOffAirport.clear()
landingAirport.clear()
My goal is to simplify the program using map, reduce and filter. So far I have sorted out the creation of the two independent lists, one with the first element of each file line and another with the second element, by using:
# Creates two independent lists with the first and second element from each line
takeOff_Airport = list(map(lambda sub: (sub[0].split(';')[0]), lines))
landing_Airport = list(map(lambda sub: (sub[0].split(';')[1]), lines))
I was hoping to find a way to open the file and achieve the exact same result as the original code by being able to open the file through a map() function, so I could pass each list to the maps defined above, takeOff_Airport and landing_Airport.
So if we have a file like this
line 1
line 2
line 3
line 4
and we do it like this
open(file_name).read().split('\n')
we get this
['line 1', 'line 2', 'line 3', 'line 4', '']
Is this what you wanted?
Edit 1
I feel this is somewhat redundant, but since map applies a function to each element of an iterator, we will have to have our file name in a list, and of course we define our function:
def open_read(file_name):
    return open(file_name).read().split('\n')

print(list(map(open_read, ['test.txt'])))
This gets us
>>> [['line 1', 'line 2', 'line 3', 'line 4', '']]
So first off, calling split('\n') on each line is silly; the line is guaranteed to have at most one newline, at the end, and nothing after it, so you'd end up with a bunch of ['all of line', ''] lists. To avoid the empty string, just strip the newline. This won't leave each line wrapped in a list, but frankly, I can't imagine why you'd want a list of one-element lists containing a single string each.
So I'm just going to demonstrate using map+strip to get rid of the newlines, using operator.methodcaller to perform the strip on each line:
from operator import methodcaller
def readFile(fileName):
    try:
        with open(fileName) as file:
            return list(map(methodcaller('strip', '\n'), file))
    except IOError:
        print("\n\033[0;31mIoError: could not open the file:\033[00m %s" % fileName)
Sadly, since your file is context managed (a good thing, just inconvenient here), you do have to listify the result; map is lazy, and if you didn't listify before the return, the with statement would close the file, and pulling data from the map object would die with an exception.
To get around that, you can implement it as a trivial generator function, so the generator context keeps the file open until the generator is exhausted (or explicitly closed, or garbage collected):
def readFile(fileName):
    try:
        with open(fileName) as file:
            yield from map(methodcaller('strip', '\n'), file)
    except IOError:
        print("\n\033[0;31mIoError: could not open the file:\033[00m %s" % fileName)
yield from will introduce a tiny amount of overhead over directly iterating the map, but not much, and now you don't have to slurp the whole file if you don't want to; the caller can just iterate the result and get a split line on each iteration without pulling the whole file into memory. It does have the slight weakness that opening the file will be done lazily, so you won't see the exception (if there is any) until you begin iterating. This can be worked around, but it's not worth the trouble if you don't really need it.
I'd generally recommend the latter implementation as it gives the caller flexibility. If they want a list anyway, they just wrap the call in list and get the list result (with a tiny amount of overhead). If they don't, they can begin processing faster, and have much lower memory demands.
Mind you, this whole function is fairly odd; replacing IOErrors with prints and (implicitly) returning None is hostile to API consumers (they now have to check return values, and can't actually tell what went wrong). In real code, I'd probably just skip the function and insert:
with open(fileName) as file:
    for line in map(methodcaller('strip', '\n'), file):
        ...  # do stuff with line (with newline pre-stripped)
inline in the caller; maybe define strip_newline = methodcaller('strip', '\n') globally to use a friendlier name. It's not that much code, and I can't imagine that this specific behavior is needed in that many independent parts of your code, and inlining it removes the concerns about when the file is opened and closed.
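To tie this back to the original goal, a sketch (assuming the generator version of readFile above and the semicolon-separated format from the question) that builds both airport lists without an explicit for loop might look like:

lines = readFile('file.txt')
# Materialize the split pairs once; the generator can only be consumed once.
pairs = list(map(lambda line: line.split(';')[:2], lines))
takeOff_Airport = list(map(lambda pair: pair[0], pairs))
landing_Airport = list(map(lambda pair: pair[1], pairs))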

Reading file and getting values from a file. It shows only first one and others are empty

I am reading a file using with open in Python and then doing all the other operations in the loop. While calling the function, I can print only the first operation inside the loop, while the others are empty. I can get this to work using another approach such as readlines, but I have not found out why this one does not work. I thought the reason might be the file closing, but with open takes care of that. Could anyone please suggest what's wrong?
def read_datafile(filename):
    with open(filename, 'r') as f:
        a = [lines.split("\n")[0] for number, lines in enumerate(f) if number == 2]
        b = [lines.split("\n")[0] for number, lines in enumerate(f) if number == 3]
        c = [lines.split("\n")[0] for number, lines in enumerate(f) if number == 2]
        return a, b, c

read_datafile('data_file_name')
I only get values for a and all the others are empty. When a is commented out, I get a value for b and the others are empty.
Updates
The file looks like this:
-0.6908270760153553 -0.4493128078936575 0.5090918714784820
0.6908270760153551 -0.2172871921063448 0.5090918714784820
-0.0000000000000000 0.6666999999999987 0.4597549674638203
0.3097856229862140 -0.1259623621214220 0.5475896447896115
0.6902143770137859 0.4593623621214192 0.5475896447896115
The construct
with open(filename) as handle:
    a = [line for line in handle if condition]
    b = [line for line in handle]
will always return an empty b because the iterator in a already consumed all the data from the open filehandle. Once you reach the end of a stream, additional attempts to read anything will simply return nothing.
If the input is seekable, you can rewind it and read all the same lines again; or you can close it (explicitly, or implicitly by leaving the with block) and open it again - but a much more efficient solution is to read it just once, and pick the lines you actually want from memory. Remember that reading a byte off a disk can easily take several orders of magnitude more time than reading a byte from memory. And keep in mind that the data you read could come from a source which is not seekable, such as standard output from another process, or a client on the other side of a network connection.
def read_datafile(filename):
    with open(filename, 'r') as f:
        lines = [line for line in f]
    a = lines[2]
    b = lines[3]
    c = lines[2]
    return a, b, c
If the file could be too large to fit into memory at once, you end up with a different set of problems. Perhaps in this scenario, where you only seem to want a few lines from the beginning, only read that many lines into memory in the first place.
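For instance, a small sketch using itertools.islice (the count of 4 is an assumption based on the indices used above) that never reads past the lines it needs:

from itertools import islice

def read_datafile_head(filename, count=4):
    # Hypothetical variant: pull only the first `count` lines into memory.
    with open(filename, 'r') as f:
        lines = list(islice(f, count))
    return lines[2], lines[3], lines[2]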
What exactly are you trying to do with this script? The lines variable here may not contain what you want: it will hold a single line at a time, because the file gets enumerated line by line.

Python 3 going through a file until EOF. File is not just a set of similar lines needing processing

The answers to questions of the type "How do I do while not eof(file)?" do not quite cover my issue.
I have a file with a format like
header block
data
another header block
more data (with an arbitrary number of data lines in each data block)
...
I do not know how many header-data sets there are
I have successfully read the first block, then a set of data using loops that look for the blank line at the end of the data block.
I can't just use the "for each line in openfile" type approach as I need to read the header-data blocks one at a time and then process them.
How can I detect the last header-data block?
My current approach is to use a try except construction and wait for the exception. Not terribly elegant.
It's hard to answer without seeing any of your code...
But my guess is that you are reading the file with fp.read():
fp = open("a.txt")
while True:
    data = fp.read()
Instead:
Always pass the length of data you expect.
Check whether the read chunk is an empty string, not None.
For example:
fp = open("a.txt")
while True:
    header = fp.read(headerSize)
    if header == '':
        # End of file
        break
    # read dataSize from the header here
    data = fp.read(dataSize)
    if data == '':
        # Error reading file
        raise IOError('Error reading file')
    process_your_data(data)
This is some time later, but I'm posting this for others who find this in a search.
The following script, suitably adjusted, will read a file and deliver lines until the EOF.
"""
Script to read a file until the EOF
"""
def get_all_lines(the_file):
    for line in the_file:
        if line.endswith('\n'):
            line = line[:-1]
        yield line

line_counter = 1
data_in = open('OAall.txt')
for line in get_all_lines(data_in):
    print(line)
    print(line_counter)
    line_counter += 1
data_in.close()
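Since the question is really about consuming header-data blocks one at a time, here is a sketch of a generator that groups lines into blocks, assuming (as the question states) that a blank line ends each data block; the file name is a placeholder:

def read_blocks(the_file):
    block = []
    for line in the_file:
        line = line.rstrip('\n')
        if line == '':
            # A blank line closes the current header-data block.
            if block:
                yield block
                block = []
        else:
            block.append(line)
    if block:
        # The last block may not be followed by a blank line.
        yield block

with open('OAall.txt') as data_in:
    for block in read_blocks(data_in):
        header, data = block[0], block[1:]
        # Process one header-data block at a time.
        print(header, len(data))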

Searching through a huge list of short strings

I have a HUGE plaintext file with 1 billion strings, where average string length is around 10-12, with potential duplicates, and each string is on a different line. My task is that, when given a query string, find the line of first match if the string exists in my file, or return "not found."
A natural solution is to run grep -m1 -n '^querystring$' every time, which takes around 15-20 seconds, and this does not require extra storage and is not memory-intensive. Is this a good solution, or is there something much better?
(N.B. As a rough guide, my storage requirement: <10GB, and memory requirement: <16GB)
You can use simple Python code for that:
def find_query(filename, query):
    queryLine = 1
    with open(filename, 'r') as f:
        for line in f:
            if query in line:
                return queryLine
            queryLine += 1
    return 'not found'
This way you stop as soon as you find a match, instead of using grep and going over the whole file every time.
Here's a Python solution:
When you open a file you get an iterator giving you one line at a time, which is very memory efficient. My suggestion is to enumerate the file and get the first line meeting your criterion, like this:
def first_occurrence(filename, query):
    with open(filename) as f:
        filtered = (i for i, line in enumerate(f, 1) if query in line)
        return next(filtered, 'not found')
If there's no such line, the default value 'not found' is returned. filtered = (...) builds a generator by employing a generator expression. Generators are iterators, so this part is memory efficient as well.
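Note that query in line is a substring test, while the grep pattern '^querystring$' in the question anchors to the whole line; an exact-match variant of the same idea might look like this:

def first_exact_match(filename, query):
    # Compare the newline-stripped line for equality, mirroring grep '^query$'.
    with open(filename) as f:
        matches = (i for i, line in enumerate(f, 1) if line.rstrip('\n') == query)
        return next(matches, 'not found')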
