Python avoid partial writes with non-blocking write to named pipe - python-3.x

I am running python3.8 on linux.
In my script, I create a named pipe, and open it as follows:
import os
import posix
import time
file_name = 'fifo.txt'
os.mkfifo(file_name)
f = posix.open(file_name, os.O_RDWR | os.O_NONBLOCK)
os.set_blocking(f, False)
Without yet having opened the file for reading elsewhere (for instance, with cat), I start writing to the file in a loop.
base_line = 'abcdefghijklmnopqrstuvwxyz'
s = base_line * 10000 + '\n'
while True:
    try:
        posix.write(f, s.encode())
    except BlockingIOError as e:
        print("Exception occurred: {}".format(e))
        time.sleep(.5)
When I then go to read from the named pipe with cat, I find that a partial write took place.
I am confused about how I can know how many bytes were written in this instance. Since the exception was thrown, I do not have access to the return value (the number of bytes written). The documentation suggests that BlockingIOError has a characters_written attribute; however, when I try to access this field, an AttributeError is raised.
In summary: How can I either avoid this partial write in the first place, or at least know how much was partially written in this instance?

os.write performs an unbuffered write. The docs state that BlockingIOError only has a characters_written attribute when a buffered write operation would block.
If any bytes were successfully written before the pipe became full, that number of bytes will be returned from os.write. Otherwise, you'll get an exception. Of course, something like a drive failure will also cause an exception, even if some bytes were written. This is no different from how POSIX write works, except instead of returning -1 on error, an exception is raised.
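In other words, as long as os.write returns, its return value tells you how far you got; only a write that could make no progress at all raises. A minimal sketch of a retry loop along those lines (the helper name write_all is mine, not from the question):

import os
import time

def write_all(fd, data, retry_delay=0.5):
    # Keep calling os.write until every byte of `data` has gone into the
    # non-blocking descriptor `fd`.
    written = 0
    while written < len(data):
        try:
            # os.write returns how many bytes were actually written,
            # which may be fewer than len(data) - written.
            written += os.write(fd, data[written:])
        except BlockingIOError:
            # No progress was possible right now; wait and retry.
            time.sleep(retry_delay)
    return written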
If you don't like dealing with the exception, you can use a wrapper around the file descriptor, such as an io.FileIO object. I've modified your code, since it tries to write the entire buffer every time you loop back to the os.write call (if it failed once, it will fail every time):
import io
import os
import time
base_line = 'abcdefghijklmnopqrstuvwxyz'
data = (base_line * 10000 + '\n').encode()
file_name = 'fifo.txt'
os.mkfifo(file_name)
fd = os.open(file_name, os.O_RDWR | os.O_NONBLOCK)
# os.O_NONBLOCK makes os.set_blocking(fd, False) unnecessary.
with io.FileIO(fd, 'wb') as f:
    written = 0
    while written < len(data):
        n = f.write(data[written:])
        if n is None:
            time.sleep(.5)
        else:
            written += n
BTW, you might use the selectors module instead of time.sleep; I noticed a slight delay when trying to read from the pipe because of the sleep, which shouldn't happen if you use selectors:
import selectors

with io.FileIO(fd, 'wb') as f:
    written = 0
    sel = selectors.DefaultSelector()
    sel.register(f, selectors.EVENT_WRITE)
    while written < len(data):
        n = f.write(data[written:])
        if n is None:
            # Wait here until we can start writing again.
            sel.select()
        else:
            written += n
    sel.unregister(f)
Some useful information can also be found in the answer to POSIX named pipe (fifo) drops record in nonblocking mode.

FileNotFoundError But The File Is There: Cryptography Edition

I'm working on a script that takes a checksum and directory as inputs.
Without too much background, I'm looking for 'malware' (i.e., a flag) in a directory of executables. I'm given the SHA512 sum of the 'malware'. I've gotten it to work (I found the flag), but I ran into an issue with the output after generalizing the function for different cryptographic protocols, encodings, and individual files instead of directories:
FileNotFoundError: [Errno 2] No such file or directory: 'lessecho'
There is indeed a file lessecho in the directory, and as it happens, is close to the file that returns the actual flag. Probably a coincidence. Probably.
Below is my Python script:
#!/usr/bin/python3
import hashlib, sys, os

"""
### TO DO ###
Add other encryption techniques
Include file read functionality
"""

def main(to_check = sys.argv[1:]):
    dir_to_check = to_check[0]
    hash_to_check = to_check[1]
    BUF_SIZE = 65536
    for f in os.listdir(dir_to_check):
        sha256 = hashlib.sha256()
        with open(f, 'br') as f:  # <--- line where the issue occurs
            while True:
                data = f.read(BUF_SIZE)
                if not data:
                    break
                sha256.update(data)
            f.close()
        if sha256.hexdigest() == hash_to_check:
            return f

if __name__ == '__main__':
    k = main()
    print(k)
Credit to Randall for his answer here
Here are some humble trinkets from my native land in exchange for your wisdom.
Your listdir call is giving you bare filenames (e.g. lessecho), but that is within the dir_to_check directory (which I'll call foo for convenience). To open the file, you need to join those two parts of the path back together, to get a proper path (e.g. foo/lessecho). The os.path.join function does exactly that:
for f in os.listdir(dir_to_check):
    sha256 = hashlib.sha256()
    with open(os.path.join(dir_to_check, f), 'br') as f:  # add os.path.join call here!
        ...
There are a few other issues in the code, unrelated to your current error. One is that you're using the same variable name f for both the file name (from the loop) and file object (in the with statement). Pick a different name for one of them, since you need both available (because I assume you intend return f to return the filename, not the recently closed file object).
And speaking of the closed file, you're actually closing the file object twice. The first one happens at the end of the with statement (that's why you use with). The second is your manual call to f.close(). You don't need the manual call at all.
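Putting those three fixes together, the loop might look something like this (a sketch under the same assumptions as the original script, not a tested drop-in replacement):

import hashlib
import os
import sys

def main(to_check=sys.argv[1:]):
    dir_to_check = to_check[0]
    hash_to_check = to_check[1]
    BUF_SIZE = 65536
    for name in os.listdir(dir_to_check):
        sha256 = hashlib.sha256()
        # Join the directory back onto the bare filename, and use a different
        # variable for the file object than for the filename.
        with open(os.path.join(dir_to_check, name), 'br') as fh:
            while True:
                data = fh.read(BUF_SIZE)
                if not data:
                    break
                sha256.update(data)
        # No manual close: the with statement already closed fh.
        if sha256.hexdigest() == hash_to_check:
            return name

if __name__ == '__main__':
    print(main())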

Problem with using CreatePseudoConsole to create a pty pipe in python

I am trying to make a pty pipe. For that I have to use the CreatePseudoConsole function from the Windows API. I am loosely copying this, which is this, but in Python.
I don't know if it's relevant but I am using Python 3.7.9 and Windows 10.
This is my code:
from ctypes.wintypes import DWORD, HANDLE, SHORT
from ctypes import POINTER, HRESULT
import ctypes
import msvcrt
import os

# The COORD data type used for the size of the console
class COORD(ctypes.Structure):
    _fields_ = [("X", SHORT),
                ("Y", SHORT)]

# HPCON is the same as HANDLE
HPCON = HANDLE

CreatePseudoConsole = ctypes.windll.kernel32.CreatePseudoConsole
CreatePseudoConsole.argtypes = [COORD, HANDLE, HANDLE, DWORD, POINTER(HPCON)]
CreatePseudoConsole.restype = HRESULT

def create_console(width: int, height: int) -> HPCON:
    read_pty_fd, write_fd = os.pipe()
    read_pty_handle = msvcrt.get_osfhandle(read_pty_fd)
    read_fd, write_pty_fd = os.pipe()
    write_pty_handle = msvcrt.get_osfhandle(write_pty_fd)
    # Create the console
    size = COORD(width, height)
    console = HPCON()
    result = CreatePseudoConsole(size, read_pty_handle, write_pty_handle,
                                 DWORD(0), ctypes.byref(console))
    # Check if any errors occurred
    if result != 0:
        raise ctypes.WinError(result)
    # Add references for the fds to the console
    console.read_fd = read_fd
    console.write_fd = write_fd
    # Return the console object
    return console

if __name__ == "__main__":
    consol = create_console(80, 80)
    print("Writing...")
    os.write(consol.write_fd, b"abc")
    print("Reading...")
    print(os.read(consol.read_fd, 1))
    print("Done")
The problem is that it isn't able to read from the pipe. I expected it to print "a" but it just gets stuck on the os.read. Please note that this is the first time I use the WinAPI so the problem is likely to be there.
There is nothing wrong with the code: what you got wrong is your expectations.
What you are doing is writing to the pipe meant to feed ‘keyboard’ input to the program, and reading from another pipe that returns ‘screen’ output from the program. But there is no actual program at the other end of either pipe, and so there is nothing the read call can ever return.
The HPCON handle returned by the CreatePseudoConsole API is supposed to be passed in a thread attribute list to a newly-spawned process via CreateProcess. Pretty much nothing can be done with it other than that. (You cannot even connect yourself to the pseudoconsole, as you would be able on Unix.) After the handle is passed in this manner, you can communicate with the process using the read_fd and write_fd descriptors.
An article on MSDN provides a full sample in C that creates a pseudoconsole and passes it to a new process; the exact same thing is done by the very source you linked.

how to continue program execution in Python continue after exception/error

I am a teacher of Python programming. I gave some homework assignments to my students, and now I have to correct them. The homework is submitted as functions. This way, I can use import_module from importlib to import the function written by each student. I have put all of the tests inside a try/except block, but when a student did something wrong (e.g., asked for user input, used wrong indentation, etc.) the main test program hangs or stops.
Is there a way to perform all the tests without making the main program stop because of a student's errors?
Thanks in advance.
Python looks for errors in two passes.
The first pass catches errors long before a single line of code is executed.
The second pass will only find mistakes at run-time.
try-except blocks will not catch incorrect indentation.
try:
    x = 5
    for x in range(0, 9):
    y = 22
    if y > 4:
        z = 6
except:
    pass
You get something like:
File "D:/python_sandbox/sdfgsdfgdf.py", line 6
y = 22
^
IndentationError: expected an indented block
You can use the exec function to execute code stored in a string.
with open("test_file.py", mode='r') as student_file:
lines = student_file.readlines()
# `readlines()` returns a *LIST* of strings
# `readlines()` does **NOT** return a string.
big_string = "\n".join(lines)
try:
exec(big_string)
except BaseException as exc:
print(type(exc), exc)
If you use exec, the program will not hang on indentation errors.
exec is very dangerous.
A student could delete all of the files on one or more of your hard-drives with the following code:
import os
import shutil
import pathlib

cwd_string = os.getcwd()
cwd_path = pathlib.Path(cwd_string)
cwd_root = cwd_path.parts[0]

def keep_going(*args):
    # keep_going(function, path, excinfo)
    args = list(args)
    for idx, arg in enumerate(args):
        args[idx] = repr(str(arg))
    spacer = "\n" + 80*"#" + "\n"
    spacer.join(args)

shutil.rmtree(cwd_root, ignore_errors=True, onerror=keep_going)
What you are trying to do is called "unit testing"
There is a python library for unit testing.
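For example, a minimal unittest skeleton might look like this (the function add_numbers is a hypothetical stand-in for a function imported from a student's submission, not something from the question):

import unittest

def add_numbers(a, b):
    # Stand-in for a function imported from a student's submission.
    return a + b

class TestHomework(unittest.TestCase):
    def test_add_numbers(self):
        self.assertEqual(add_numbers(2, 3), 5)

if __name__ == "__main__":
    # A failing test is reported, but the runner keeps going to the next test.
    unittest.main()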
Ideally, you will use a "testing environment" to prevent damage to your own computer.
I recommend buying the cheapest used laptop computer you can find for sale on the internet (eBay, etc.). Make sure that there is a photograph of the laptop working (it might be missing the battery; maybe just leave the laptop plugged in all of the time).
Use the cheap laptop for testing students' code.
You can overwrite the built-in input function.
That can prevent the program from hanging...
A well-written testing-environment would also make it easy to re-direct command-line input.
def input(*args, **kwargs):
    return str(4)

def get_user_input(tipe):
    prompt = "please type in a(n) " + str(tipe) + ":\n"
    while True:
        ugly_user_input = input(prompt)
        pretty_user_input = str(ugly_user_input).strip()
        try:
            ihnt = int(pretty_user_input)
            return ihnt
        except BaseException as exc:
            print(type(exc))
            print("that's not a " + str(tipe))

get_user_input(int)
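A different way to keep one bad submission from stalling everything (not part of the answer above, just a sketch) is to run each student file in its own subprocess with a timeout; the file names below are hypothetical:

import subprocess
import sys

def run_submission(path, timeout=10):
    # Run one student script in a separate interpreter so that an infinite
    # loop or an unexpected input() call cannot hang the grading program.
    try:
        completed = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
            input="",   # empty stdin: input() raises EOFError instead of blocking
        )
        return completed.returncode, completed.stdout, completed.stderr
    except subprocess.TimeoutExpired:
        return None, "", "timed out after {} seconds".format(timeout)

if __name__ == "__main__":
    for student_file in ["alice.py", "bob.py"]:   # hypothetical submissions
        print(student_file, run_submission(student_file))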

Need to do CPU bound processing using 2+ processes in Python by reading from a gzipped file

I have a gzipped file (10 GB compressed, 100 GB uncompressed) which contains some reports separated by demarcations, and I have to parse it.
Parsing and processing the data takes a long time, so this is a CPU-bound problem (not an IO-bound problem). I am therefore planning to split the work across multiple processes using the multiprocessing module. The problem is that I am unable to send/share data with the child processes efficiently. I am using subprocess.Popen to stream the uncompressed data into the parent process.
process = subprocess.Popen('gunzip --keep --stdout big-file.gz',
                           shell=True,
                           stdout=subprocess.PIPE)
I am thinking of using a Lock() to read/parse one report in child-process-1, then release the lock and switch to child-process-2 to read/parse the next report (and then switch back to child-process-1 for the report after that). When I share process.stdout as an argument with the child processes, I get a pickling error.
I have tried creating a multiprocessing.Queue() and a multiprocessing.Pipe() to send data to the child processes, but this is way too slow (in fact it is way slower than doing it in a single thread, i.e., serially).
Any thoughts/examples about sending data to child processes efficiently would help.
Could you try something simple instead? Have each worker process run its own instance of gunzip, with no interprocess communication at all. Worker 1 can process the first report and just skip over the second. The opposite for worker 2. Each worker skips every other report. Then an obvious generalization to N workers.
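A minimal sketch of that idea, using Python's gzip module rather than an external gunzip (the file name, delimiter, and process_report function below are assumptions, not from the question):

import gzip
import multiprocessing as mp

FILE_NAME = "big-file.gz"
DELIM = "xxxxx\n"
NWORKERS = 4

def process_report(lines):
    # Placeholder for the real CPU-bound parsing of one report.
    return len(lines)

def worker(worker_id):
    # Each worker decompresses the whole file itself, but only parses
    # every NWORKERS-th report, so there is no interprocess communication.
    total = 0
    with gzip.open(FILE_NAME, "rt") as f:
        report, index = [], 0
        for line in f:
            if line == DELIM:
                if index % NWORKERS == worker_id:
                    total += process_report(report)
                report, index = [], index + 1
            else:
                report.append(line)
        if report and index % NWORKERS == worker_id:
            total += process_report(report)
    return total

if __name__ == "__main__":
    with mp.Pool(NWORKERS) as pool:
        print(sum(pool.map(worker, range(NWORKERS))))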
Or not ...
I think you'll need to be more specific about what you tried, and perhaps give more info about your problem (like: how many records are there? how big are they?).
Here's a program ("genints.py") that prints a bunch of random ints, one per line, broken into groups via "xxxxx\n" separator lines:
from random import randrange, seed

seed(42)

for i in range(1000):
    for j in range(randrange(1, 1000)):
        print(randrange(100))
    print("xxxxx")
Because it forces the seed, it generates the same stuff every time. Now a program to process those groups, both in parallel and serially, via the most obvious way I first thought of. crunch() takes time quadratic in the number of ints in a group, so it's quite CPU-bound. The output from one run, using (as shown) 3 worker processes for the parallel part:
parallel result: 10,901,000,334 0:00:35.559782
serial result: 10,901,000,334 0:01:38.719993
So the parallelized run took about one-third the time. In what relevant way(s) does that differ from your problem? Certainly, a full run of "genints.py" produces less than 2 million bytes of output, so that's a major difference - but it's impossible to guess from here whether it's a relevant difference. Perhaps, e.g., your problem is only very mildly CPU-bound? It's obvious from the output here that the overheads of passing chunks of stdout to worker processes are all but insignificant in this program.
In short, you probably need to give people - as I just did for you - a complete program they can run that reproduces your problem.
import multiprocessing as mp

NWORKERS = 3
DELIM = "xxxxx\n"

def runjob():
    import subprocess
    # 'py' is just a shell script on my box that
    # invokes the desired version of Python -
    # which happened to be 3.8.5 for this run.
    p = subprocess.Popen("py genints.py",
                         shell=True,
                         text=True,
                         stdout=subprocess.PIPE)
    return p.stdout

# Return list of lines up to (but not including) next DELIM,
# or EOF. If the file is already exhausted, return None.
def getrecord(f):
    result = []
    foundone = False
    for line in f:
        foundone = True
        if line == DELIM:
            break
        result.append(line)
    return result if foundone else None

def crunch(rec):
    total = 0
    for a in rec:
        for b in rec:
            total += abs(int(a) - int(b))
    return total

if __name__ == "__main__":
    import datetime
    now = datetime.datetime.now

    s = now()
    total = 0
    f = runjob()
    with mp.Pool(NWORKERS) as pool:
        for i in pool.imap_unordered(crunch,
                                     iter((lambda: getrecord(f)), None)):
            total += i
    f.close()
    print(f"parallel result: {total:,}", now() - s)

    s = now()
    # try the same thing serially
    total = 0
    f = runjob()
    while True:
        rec = getrecord(f)
        if rec is None:
            break
        total += crunch(rec)
    f.close()
    print(f"serial result: {total:,}", now() - s)

Reopening a closed stringIO object in Python 3

So, I create a StringIO object to treat my string as a file:
>>> a = 'Me, you and them\n'
>>> import io
>>> f = io.StringIO(a)
>>> f.read(1)
'M'
And then I proceed to close the 'file':
>>> f.close()
>>> f.closed
True
Now, when I try to open the 'file' again, Python does not permit me to do so:
>>> p = open(f)
Traceback (most recent call last):
File "<pyshell#166>", line 1, in <module>
p = open(f)
TypeError: invalid file: <_io.StringIO object at 0x0325D4E0>
Is there a way to 'reopen' a closed StringIO object? Or should it be declared again using the io.StringIO() method?
Thanks!
I have a nice hack which I am currently using for testing (since my code can perform I/O operations, and giving it a StringIO is a nice workaround).
If this problem is a one-time kind of thing:
from io import StringIO

st = StringIO()
close = st.close
st.close = lambda: None
f(st)            # Some function which can make I/O changes and finally close st
st.getvalue()    # This is available now
close()
# If you don't want to store the close function you can also:
StringIO.close(st)
If this is recurring thing, you can also define a context-manager:
import contextlib

@contextlib.contextmanager
def uncloseable(fd):
    """
    Context manager which turns the fd's close operation to no-op for the duration of the context.
    """
    close = fd.close
    fd.close = lambda: None
    yield fd
    fd.close = close
which can be used in the following way:
st = StringIO()
with uncloseable(st):
    f(st)
# Now st is still open!!!
I hope this helps you with your problem, and if not, I hope you will find the solution you are looking for.
Note: This should work exactly the same for other file-like objects.
No, there is no way to re-open an io.StringIO object. Instead, just create a new object with io.StringIO().
Calling close() on an io.StringIO object throws away the "file contents" data, so re-opening couldn't give access to that anyways.
If you need the data, call getvalue() before closing.
See also the StringIO documentation here:
The text buffer is discarded when the close() method is called.
and here:
getvalue()
Return a str containing the entire contents of the buffer.
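For instance, a minimal illustration of grabbing the contents before close and building a fresh buffer from them:

import io

buf = io.StringIO("Me, you and them\n")
buf.read(1)                  # 'M'
saved = buf.getvalue()       # still works: close() has not been called yet
buf.close()

# buf cannot be reopened, but a brand-new, independent buffer can be
# built from the saved string.
buf2 = io.StringIO(saved)
print(buf2.read(1))          # 'M' again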
The builtin open() creates a file object (i.e. a stream), but in your example, f is already a stream.
That's the reason why you get TypeError: invalid file
After the method close() has executed, any stream operation will raise ValueError.
And the documentation does not mention about how to reopen a closed stream.
Maybe you need not close() the stream yet if you want to use (reopen) it again later.
When you f.close(), you remove it from memory. You're basically doing a deref x, then call x; you're looking for a memory location that doesn't exist.
Here is what you could do instead:
import io
a = 'Me, you and them\n'
f = io.StringIO(a)
f.read(1)
f.close()
# Put the text form a without the first char into StringIO.
p = io.StringIO(a[1:])
# do some work with p.
I think your confusion comes from thinking of io.StringIO as a file on the block device. If you used open() and not StringIO, then you would be correct in your example and you could reopen the file. StringIO is not a file. It's the idea of a file object in memory. A file object does have a buffer, but it also exists physically on the block device. A StringIO is just a buffer, a staging area in memory for the data within it. When you call open(), a buffer is created, but the data still exists on the block device.
Perhaps this is more what you want
fo = open('f.txt', 'w+')
fo.write('Me, you and them\n')
fo.seek(0)  # rewind to the start, otherwise read(1) returns ''
fo.read(1)
fo.close()
# reopen the now closed file 'f.txt'
p = open('f.txt', 'r')
# do stuff with p
p.close()
Here we are writing the string to the block device, so that when we close the file, the information written to it will remain after it's closed. Because this creates a file in the directory the program is run in, it may be a good idea to give the file an extension. For example, you could name the file f.txt instead of f.
