Dataclass that can 'yield' a new line from a file when requested - python-3.x

Until now I've used something like the following code to create a generator that reads a very big file line by line and lets me work on each line as I wish.
def readfile(self):
    with open(self.filename) as infile:
        for line in infile:
            yield line
What would be a good way to edit this so that I get a new line every time I call a function, e.g.:
@dataclass
class Reader:
    filename: str
    line: int = field(default=None)

    def __post_init__(self):
        self.file = open(self.filename)
        self.line = 1

    def __del__(self):
        self.file.close()

    def next_line(self):
        ...
So ideally I would call next_line and get back the next line of file filename.

I don't quite understand why you want to create a class just to call a method that returns the next line of a file, so I created just a function. If you need to, you can use it as a method in some class as well.
A file object IS already an iterator. Therefore you can call the __next__() method directly on the file object (or, more idiomatically, pass it to the built-in next()). __next__() returns the next line.
def next_line():
    return f.__next__()

with open('file.txt') as f:
    print(next_line())    # a line
    second = next_line()  # next line
    print(next_line())    # next line
Or you can even omit the function completely:
with open('file.txt') as f:
    print(f.__next__())    # a line
    second = f.__next__()  # next line
    print(f.__next__())    # next line
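If you do want the class from the question, a minimal sketch is below (my own filling-in, not from the answer): next() on the stored file object returns one line per call. The throwaway demo file at the end is just for illustration.

```python
from dataclasses import dataclass, field
import os
import tempfile

@dataclass
class Reader:
    filename: str
    line: int = field(default=0)  # number of lines handed out so far

    def __post_init__(self):
        self.file = open(self.filename)

    def next_line(self):
        line = next(self.file)  # raises StopIteration at end of file
        self.line += 1
        return line

    def close(self):
        self.file.close()

# Demo with a throwaway two-line file:
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first\nsecond\n")

r = Reader(path)
first = r.next_line()   # 'first\n'
second = r.next_line()  # 'second\n'
r.close()
```

Relying on __del__ to close the file, as in the question, is fragile; an explicit close() (or making the class a context manager) is more predictable.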

Related

how to interact between 2 python scripts?

I'm writing a program that runs one script which takes pictures and writes a number into a txt file; after it's done, it should tell the other script that it can read that txt file. I can't seem to import this "Perrasytas" variable into the other script. It just says it's not defined.
Script1
if line == 'echo:SD card ok':
    Perrasytas = 0
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
    GPIO.output(4, GPIO.LOW)
    with open(cnt, 'r') as f:
        line = f.read()
    num = int((line.split())[0]) + 1
    with open(cnt, 'w') as f:
        f.write(str(num))
    Perrasytas = 1
Script2
import Script1
if Script1.Perrasytas == 1:
    cnt2 = '/home/pi/Prints_photos/counter.txt'
    with open(cnt2, 'r') as f:
        num2 = f.read()
If I leave a "Perrasytas = 0" line at the top of the script, it does import, but its state never changes.
Is it even possible to do this kind of communication between scripts?
Importing a module only runs its code once -- the first time you import it. Since the value of line is probably not equal to "echo:..." when the import happens, the if block is never entered, and the Perrasytas variable is never set.
You could put all of that code into a function, and return the value of Perrasytas from that function. That way, you can execute that code whenever you call the function.
def get_perrasytas(line):
    if line == 'echo:SD card ok':
        Perrasytas = 0
        cap = cv2.VideoCapture(0)
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
        GPIO.output(4, GPIO.LOW)
        with open(cnt, 'r') as f:
            line = f.read()
        num = int((line.split())[0]) + 1
        with open(cnt, 'w') as f:
            f.write(str(num))
        Perrasytas = 1
        return Perrasytas
Then, you could call it like so:
import Script1
if Script1.get_perrasytas(line) == 1:
    cnt2 = '/home/pi/Prints_photos/counter.txt'
    with open(cnt2, 'r') as f:
        num2 = f.read()
Note that you will need to have line before you call the function, or include a way to get line inside the function.

Gooey from argument to read file

So the first argument is the file to open and the second argument is the pattern (or text) to search for.
The program is made to scan a document, find items matching "Pattern", and print the detector address matched by "DetectorPattern". I got this program working without Gooey, but I thought about adding it for ease of use. My problem arises when the argument gets passed to the "with open(filename)" line.
This is the error I get:
Traceback (most recent call last):
  File "C:/Users/haral/Google Drive (synkroniseres ikke)/Programmering/Programmer/LogSearch/LogSearchGooey.py", line 42, in <module>
    main()
  File "C:\Users\haral\PycharmProjects\AutomateBoringStuff\venv\lib\site-packages\gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
    return lambda *args, **kwargs: func(*args, **kwargs)
  File "C:/Users/haral/Google Drive (synkroniseres ikke)/Programmering/Programmer/LogSearch/LogSearchGooey.py", line 27, in main
    with open(filename, 'r') as reader:
TypeError: expected str, bytes or os.PathLike object, not Namespace
import os
import re

from gooey import Gooey, GooeyParser

pattern = ""
# Chosen search pattern
detectorPattern = re.compile(r'\d\d\.\d\d\d')
# Fire alarm detector pattern, e.g. 03.040
filename = ""
foundDetector = []

@Gooey
def main():
    parser = GooeyParser(description="Testing")
    parser.add_argument(
        "Filename",
        help="Choose a file",
        widget="FileChooser"
    )
    parser.add_argument(
        "store",
        help="Choose a pattern to search for"
    )
    filename = parser.parse_args()
    with open(filename, 'r') as reader:
        # Read and print the entire file line by line
        for line in reader:
            findLine = re.search(pattern, line)
            if findLine is not None:
                mo = detectorPattern.search(findLine.string)
                mog = mo.group()
                if mog not in foundDetector:
                    foundDetector.append(mog)
    for x in foundDetector:
        print(x)

if __name__ == '__main__':
    main()
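No answer is recorded here, but the traceback points at the fix: parse_args() returns a Namespace object, not a string, so the file name must be read off the namespace by attribute. A sketch using plain argparse, whose interface GooeyParser follows (the sample argv values are made up for illustration):

```python
import argparse

parser = argparse.ArgumentParser(description="Testing")
parser.add_argument("Filename", help="Choose a file")
parser.add_argument("store", help="Choose a pattern to search for")

# Stand-in command-line values; Gooey would supply these from the GUI.
args = parser.parse_args(["log.txt", "ERROR"])

filename = args.Filename  # a str, safe to pass to open()
pattern = args.store
print(filename, pattern)
```

In the question's code, replacing `filename = parser.parse_args()` with `args = parser.parse_args()` and then using `args.Filename` in the open() call should resolve the TypeError.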

How to seek write pointer in append mode?

I am trying to open a file, read its contents, and write to it using the contents that were read earlier. I am opening the file in 'a+' mode. I can't use 'r+' mode since it won't create the file if it doesn't exist.
a+ will put the pointer at the end of the file. You can save that position with tell() for later writing, then use seek(0, 0) to return to the beginning of the file for reading. See the documentation for tell() and seek().
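The tell()/seek() dance described above can be sketched like this (using a throwaway temp file for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# 'a+' creates the file if it is missing and puts the pointer at the end.
with open(path, "a+") as f:
    f.write("first\n")
    end = f.tell()      # remember the end-of-file position
    f.seek(0, 0)        # jump back to the start to read
    content = f.read()  # everything written so far
    f.seek(end)         # in 'a' mode writes always append anyway
    f.write("second\n")

with open(path) as f:
    final = f.read()

print(content, final)
```

Note that in append mode the seek(end) before the second write is cosmetic: as the documentation quoted below says, writes in 'a' mode go to the end of the file regardless of the current seek position.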
Default open
Using the default a(+) option, it is not possible, as stated in the documentation:
"mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. Other common values are 'w' for writing (truncating the file if it already exists), 'x' for creating and writing to a new file, and 'a' for appending (which on some Unix systems, means that all writes append to the end of the file regardless of the current seek position)."
Alternative
Using the default open, this is not possible. However, we can of course create our own file handler that creates the file in r and r+ mode when it doesn't exist.
A minimal working example that behaves exactly like open(filename, 'r+', *args, **kwargs) would be:
import os

class FileHandler:
    def __init__(self, filename, mode='r', buffering=None, encoding=None, errors=None, newline=None, closefd=True):
        self.filename = filename
        self.mode = mode
        self.kwargs = dict(buffering=buffering, encoding=encoding, errors=errors, newline=newline, closefd=closefd)
        if self.kwargs['buffering'] is None:
            del self.kwargs['buffering']

    def __enter__(self):
        if self.mode.startswith('r') and not os.path.exists(self.filename):
            with open(self.filename, 'w'):
                pass
        self.file = open(self.filename, self.mode, **self.kwargs)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.file.close()
Now when you use the following code:
with FileHandler("new file.txt", "r+") as file:
    file.write("First line\n")
    file.write("Second line\n")
    file.seek(0, 0)
    file.write("Third line\n")
It will generate a new file new file.txt, when it doesn't exist, with the content:
Third line
Second line
If you were to use the plain open, you would receive a FileNotFoundError if the file doesn't exist.
Notes
I am only creating a new file when the mode starts with an r; all other modes are handled as they would be by the normal open function.
For some reason passing buffering=None directly to the open function crashes it with a TypeError: an integer is required (got type NoneType), so I had to remove it from the keyword arguments when it was None. (The actual default in the open signature is buffering=-1, not None, which is why None is rejected.)
Edit
The above code didn't handle the following cases:
file = FileHandler("new file.txt", "r+")
file.seek(0, 0)
file.write("Welcome")
file.close()
In order to support all of the open use cases, the above class can be adjusted by using __getattr__ as follows:
import os

class FileHandler:
    def __init__(self, filename, mode='r', buffering=None, encoding=None, errors=None, newline=None, closefd=True):
        self.filename = filename
        self.mode = mode
        self.kwargs = dict(buffering=buffering, encoding=encoding, errors=errors, newline=newline, closefd=closefd)
        if self.kwargs['buffering'] is None:
            del self.kwargs['buffering']
        if self.mode.startswith('r') and not os.path.exists(self.filename):
            with open(self.filename, 'w'):
                pass
        self.file = open(self.filename, self.mode, **self.kwargs)

    def __enter__(self):
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.file.close()

    def __getattr__(self, item):
        if hasattr(self.file, item):
            return getattr(self.file, item)
        raise AttributeError(f"{type(self).__name__} doesn't have the attribute {item!r}")

Reading from file raises IndexError in python

I am making an app which returns one random line from a .txt file. I made a class to implement this behaviour. The idea was to use one method to open the file (which would remain open) and another method to close it when the app exits. I do not have much experience working with files, hence the following behaviour is strange to me:
In __init__ I called self.open_file() in order to just open it, and it works fine to get self.len. I thought I would not need to call self.open_file() again, but when I call file.get_term() (which returns a random line) it raises IndexError (as if the file were empty). But if I call file.open_file() again first, everything works as expected.
In addition, closing the file raises AttributeError - object has no attribute 'close', so I assumed the file somehow closes automatically, even though I did not use with open.
import random
import os

class Pictionary_file:
    def __init__(self, file):
        self.file = file
        self.open_file()
        self.len = self.get_number_of_lines()

    def open_file(self):
        self.opened = open(self.file, "r", encoding="utf8")

    def get_number_of_lines(self):
        i = -1
        for i, line in enumerate(self.opened):
            pass
        return i + 1

    def get_term_index(self):
        term_line = random.randint(0, self.len-1)
        return term_line

    def get_term(self):
        term_line = self.get_term_index()
        term = self.opened.read().splitlines()[term_line]

    def close_file(self):
        self.opened.close()

if __name__ == "__main__":
    print(os.getcwd())
    file = Pictionary_file("pictionary.txt")
    file.open_file()  # WITHOUT THIS -> IndexError
    file.get_term()
    file.close()  # AttributeError
Where is my mistake and how can I correct it?
Here in __init__:
self.open_file()
self.len = self.get_number_of_lines()
self.get_number_of_lines() actually consumes the whole file because it iterates over it:
def get_number_of_lines(self):
    i = -1
    for i, line in enumerate(self.opened):
        # read all lines of the file
        pass
    # at this point, `self.opened` is exhausted
    return i + 1
So when get_term calls self.opened.read(), it gets an empty string, so self.opened.read().splitlines() is an empty list.
file.close() is an AttributeError, because the Pictionary_file class doesn't have the close method. It does have close_file, though.
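One possible fix (my own sketch, not from the answer) is to read all the lines once in __init__ and index into the stored list, so the exhausted-iterator problem never comes up. This assumes the word list fits comfortably in memory; the demo file at the end is just for illustration.

```python
import os
import random
import tempfile

class PictionaryFile:
    def __init__(self, filename):
        # Read every line once; the file is closed right away.
        with open(filename, "r", encoding="utf8") as f:
            self.lines = f.read().splitlines()
        self.len = len(self.lines)

    def get_term(self):
        # randint is inclusive on both ends, hence len - 1
        return self.lines[random.randint(0, self.len - 1)]

# Demo with a throwaway word list:
path = os.path.join(tempfile.mkdtemp(), "pictionary.txt")
with open(path, "w", encoding="utf8") as f:
    f.write("cat\ndog\nhouse\n")

pf = PictionaryFile(path)
term = pf.get_term()
```

With the lines cached there is no need for open_file()/close_file() at all, which also removes the AttributeError trap of calling close() instead of close_file().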

Multiprocessing in enumerate loop

I have a script that downloads images from urls, but I would like to parallelise it otherwise it will take hours. With this code:
import requests
from math import floor, log10
import urllib
import time
import multiprocessing

with open('images.csv', 'r') as f:
    images = f.readlines()

num_position = floor(log10(len(images)) + 1)

a = time.time()
for i, image in enumerate(images[1:10]):
    if (i+1) % 1000 == 0:
        print('Downloading {} image'.format(i+1))
    # a = time.time()
    with open(str(i).zfill(num_position)+'a.jpg', 'wb') as file:
        try:
            writing = file.write(requests.get(image.split(',')[2]).content)
            p = multiprocessing.Process(target=writing, args=(image,))
            p.start()
            p.join()
        except:
            print('Skipping an image!')
            pass
b = time.time()
print('multiple process -- {}'.format(b-a))
I get this error:
Process Process-9:
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: 'int' object is not callable
Why am I getting an error even though the task still completes and the code doesn't break? (By that I mean the piece inside try:.)
What would be the easiest way to include some kind of parallelism here?
You get the error because this line
writing = file.write(requests.get(image.split(',')[2]).content)
evaluates to an integer: write returns the number of bytes written. You then assign that to the variable writing, so writing becomes a number.
p = multiprocessing.Process(target=writing, args=(image,))
uses writing as the target function, which raises the error since you are not passing a function but the integer writing (not callable). The code still "works" because the file has already been written by that point; the workers have nothing to do and exit immediately.
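The int-not-callable mistake is easy to reproduce in isolation; write() on a binary file returns the byte count:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    writing = f.write(b"fake image bytes")  # returns the number of bytes written

# `writing` is now the int 16, so using it as a Process target fails:
try:
    writing()
except TypeError as e:
    print(e)  # 'int' object is not callable
```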
To get things working, you would have to define a function that takes your image (and perhaps the file name) as arguments, and pass that function when you set up your workers. Something like this:
def write_file(image, filename):
    # requests .content is bytes, so the file must be opened in binary mode
    with open(filename, mode="wb") as filestream:
        filestream.write(requests.get(image.split(',')[2]).content)
And in your application
p = multiprocessing.Process(target=write_file, args=(image, filename,))
However, that is just the writing part. If you want to do the downloads in a separate task too, then you have to put that code into the separate function as well.
def download_write(urls):
    for image in iter(urls.get, 'STOP'):
        # download code here #
        with open(filename, mode="wb") as filestream:
            filestream.write(requests.get(image.split(',')[2]).content)
And your main application:
list_urls = []  # your list of urls to download
urls = multiprocessing.Queue()
for element in list_urls:
    urls.put(element)

p = multiprocessing.Process(target=download_write, args=(urls,))
urls.put("STOP")  # signals end of tasks for your workers
p.start()  # start worker
p.join()   # wait for worker to finish
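As for the easiest way to parallelise this: downloads are I/O-bound, so a thread pool is often simpler than multiprocessing. This is an alternative suggestion, not part of the answer above; the fetch helper and the urls list are illustrative stand-ins for the real requests-based download code.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(task):
    index, url = task
    # Real code would look roughly like:
    #   data = requests.get(url).content
    #   with open('{:05d}a.jpg'.format(index), 'wb') as f:
    #       f.write(data)
    return index  # report which task finished

urls = ['http://example.com/a.jpg', 'http://example.com/b.jpg']
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, enumerate(urls)))

print(results)
```

pool.map preserves input order and blocks until all tasks finish, so no manual Queue or "STOP" sentinel is needed.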
