I have a 6000-line data file that I want to load into a buffer, parse, and write to a JSON file. What is the best way to accomplish this? Should I load the whole file into a buffer, parse it, and then write it out? Or should I load a chunk of the file into a buffer, process it, and write it to the output while the tasks run concurrently? Is this close to an async function in JavaScript? Are there examples in Python of simple file loading and writing to a file?
You can use aiofiles:
import aiofiles
# The lines below must run inside an async function (a coroutine).
async with aiofiles.open('filename', mode='r') as f:
    async for line in f:
        print(line)
They have good usage documentation in their GitHub repo.
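For a file of roughly 6000 lines, a plain synchronous loop that handles one line at a time may already be fast enough. The sketch below streams the input and writes one JSON object per line, so only a single line is held in memory at once. The names convert_to_json_lines and parse_line, and the whitespace-separated record layout, are assumptions made for illustration; the question does not describe the actual data format.

```python
import json


def parse_line(line):
    # Hypothetical parser: assumes whitespace-separated fields.
    return {"fields": line.split()}


def convert_to_json_lines(src_path, dst_path):
    """Stream src_path line by line and write one JSON object per line."""
    count = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            dst.write(json.dumps(parse_line(line)) + "\n")
            count += 1
    return count  # number of records written
```

Calling convert_to_json_lines("data.txt", "data.jsonl") would process the file in a single pass; both file names here are placeholders.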
I have a huge CSV (1.5 GB) which I need to process line by line to construct 2 XML files. When I run the processing alone, my program takes about 4 minutes to execute. If I also generate my XML files, it takes over 2.5 hours to generate two 9 GB XML files.
My code for writing the xml files is really simple, I use fs.appendFileSync to write my opening/closing xml tags and the text inside them. To sanitize the data I run this function on the text inside the xml tags.
function() {
    return this.replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
};
Is there something I could optimize to reduce the execution time?
fs.appendFileSync() is a relatively expensive operation: it opens the file, appends the data, then closes it again.
It'll be faster to use a writable stream:
const fs = require('node:fs');
// create the stream
const stream = fs.createWriteStream('output.xml');
// then for each chunk of XML
stream.write(yourXML);
// when done, end the stream to close the file
stream.end();
I drastically reduced the execution time (to 30 minutes) by doing 2 things.
Setting the ENV variable UV_THREADPOOL_SIZE=64
Buffering my writes to the xml file (I flush the buffer to the file after 20,000 closed tags)
I have a dataframe which I need to convert to a CSV file, and then I need to send this CSV to an API. As I'm sending it to an API, I do not want to save it to the local filesystem and need to keep it in memory. How can I do this?
Easy way: convert your dataframe to a pandas DataFrame with toPandas(), then save it to a string. To save to a string rather than a file, call to_csv with path_or_buf=None. Then send the string in an API call.
From to_csv() documentation:
Parameters
path_or_buf : str or file handle, default None
File path or object, if None is provided the result is returned as a string.
So your code would likely look like this:
csv_string = df.toPandas().to_csv(path_or_buf=None)
Alternatives: use tempfile.SpooledTemporaryFile with a large buffer to create an in-memory file. Or you can even use a regular file, just make your buffer large enough and don't flush or close the file. Take a look at Corey Goldberg's explanation of why this works.
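The SpooledTemporaryFile alternative could look roughly like the sketch below. It uses the standard csv module instead of a dataframe so that it runs without extra dependencies; the function name rows_to_csv_text, the sample rows, and the 10 MB threshold are assumptions for illustration. The file stays in memory until it grows past max_size bytes and only then spills to disk.

```python
import csv
import tempfile


def rows_to_csv_text(rows, max_size=10 * 1024 * 1024):
    """Write rows to a spooled (in-memory) file and return the CSV text."""
    # newline="" lets the csv module control line endings itself.
    with tempfile.SpooledTemporaryFile(
        max_size=max_size, mode="w+", newline="", encoding="utf-8"
    ) as buf:
        csv.writer(buf).writerows(rows)
        buf.seek(0)  # rewind before reading the content back
        return buf.read()


# Hypothetical sample data standing in for the dataframe rows.
csv_text = rows_to_csv_text([["id", "name"], [1, "alice"], [2, "bob"]])
```

The resulting string can then be sent as the body of the API request.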
In a Flask web app, I know I can't read a "file" from request.files multiple times because it's a stream: once I read it, it's exhausted. But I need to use the "file" multiple times without saving it locally, and I'm having trouble doing that.
For example, from this
image = request.files["image"]
I'd like to have something like
image2 = image.copy
and perform different operations on image and image2.
Can someone please help me with this?
image = request.files["image"]
# Seek the pointer to the beginning of the file to read again
request.files["image"].seek(0)
After reading the file, call f.stream.seek(0). This moves the pointer back to the beginning of the file stream, so you can read the file from the start again. You can put the following snippet in a loop to see it in action.
import csv
import io

# f is the uploaded file object taken from request.files
f.stream.seek(0)
stream = io.StringIO(f.stream.read().decode("UTF8"), newline=None)
reader = csv.reader(stream)
for row in reader:
    print(row)
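If two truly independent copies are wanted (closer to the image.copy idea in the question), one option is to read the bytes once and wrap them in separate io.BytesIO objects. This is only a sketch: duplicate_stream is a hypothetical helper, and it assumes the upload is small enough to hold in memory. In Flask it would be called with something like request.files["image"].stream; the example below uses a plain BytesIO as a stand-in.

```python
import io


def duplicate_stream(stream, copies=2):
    """Read a binary stream once and return independent in-memory copies."""
    stream.seek(0)        # make sure we read from the start
    data = stream.read()  # the whole upload is held in memory
    stream.seek(0)        # leave the original rewound for later use
    return [io.BytesIO(data) for _ in range(copies)]


# Stand-in for an uploaded file stream.
upload = io.BytesIO(b"fake image bytes")
image, image2 = duplicate_stream(upload)
```

Each copy keeps its own read position, so reading image does not affect image2.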
Hey, I am trying to write to the same txt file several times in my program.
I have some other processes going on, which is why I start a separate process to calculate some data and write it to the file. I close the file afterwards, and later on I have to reopen it and write to it again. I also want to overwrite the old data, so I am using the following code.
The first time it works very well, but the second (third, ...) time nothing is written to the file. Does anyone know what the reason for this is?
file_out = open("Daten.txt", "w")
file_out.write("%.2f %.2f\n" %(distance, time))
file_out.close()
You're rewriting the file again and again, so use mode "a" or "a+" to append to it instead.
Try this:
file_out = open("Daten.txt", "a+")
file_out.write("%.2f %.2f\n" %(distance, time))
file_out.close()
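The difference between the two modes can be sketched as follows. write_result and its append parameter are hypothetical names introduced for this example, and the with block is a swapped-in idiom that closes the file even if the write fails.

```python
def write_result(path, distance, time, append=False):
    """Write one formatted line; overwrite by default, append if requested."""
    mode = "a" if append else "w"
    # "w" truncates the file first; "a" keeps existing content and adds to it.
    with open(path, mode, encoding="utf-8") as file_out:
        file_out.write("%.2f %.2f\n" % (distance, time))
```

Calling write_result("Daten.txt", distance, time) replaces the previous content, while passing append=True adds a new line after it.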
I have an application that streams data to a file, can I use Node.js to read the file while it's being streamed to?
I tried using createReadStream, but it only read one chunk and then the stream ended.
You could try watching for file changes with fs.watchFile(filename[, options], listener) or node-watch. On each file change you could then read just the last line with read-last-lines.
Although I'm not sure how efficient it would be.