I'm reading a gzip file from bytes that I have loaded from AWS S3. I have tried the code below to read it:

gzip_bytes = s3.get_file()  # for example, bytes I have loaded from S3
gzip_file = BytesIO(gzip_bytes)
with GzipFile(gzip_file, mode="rb") as file:
    # TODO: something

I'm getting the error below:
Traceback (most recent call last):
...
with GzipFile(BytesIO(pre_file_bytes), mode="rb") as pre_zip_file:
File "/usr/lib/python3.6/gzip.py", line 163, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO
How can I resolve this issue? Or am I missing something?
Many thanks!
The GzipFile constructor's signature is:

class gzip.GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None, mtime=None)

However, you are passing a BytesIO object as the first positional argument, filename, which expects a string (or bytes) path.
This is explained by the error message:

expected str, bytes or os.PathLike object, not _io.BytesIO

It looks like you should either download the file to disk and pass its path as filename, or pass your in-memory buffer explicitly via the fileobj parameter.
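A minimal sketch of both options, assuming s3.get_file() returns the object's raw bytes as in the question (the temporary path below is made up for illustration):

from io import BytesIO
from gzip import GzipFile

gzip_bytes = s3.get_file()  # assumed: raw gzip bytes downloaded from S3

# Option 1: keep everything in memory and pass the buffer via fileobj
with GzipFile(fileobj=BytesIO(gzip_bytes), mode="rb") as f:
    first_line = f.readline()

# Option 2: write the bytes to disk and pass the path as filename
with open("/tmp/data.gz", "wb") as tmp:  # hypothetical temporary path
    tmp.write(gzip_bytes)
with GzipFile("/tmp/data.gz", mode="rb") as f:
    first_line = f.readline()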
To decompress and read a downloaded gzip file, perhaps from a URL, pass the corresponding io.BytesIO object to the fileobj parameter instead of the default filename parameter. For example,
from io import BytesIO
from gzip import GzipFile
import urllib.request
url = urllib.request.urlopen("https://oeis.org/names.gz")
with GzipFile(fileobj=BytesIO(url.read()), mode='rb') as f:
    # now you may treat f as an uncompressed file
    # for example, print first line of file
    for l in f:
        print(l)
        break
(This was already pointed out by the OP in a comment. I'm putting it in an answer so it is easier to find.)
I want my server to send a file to the user, and then delete the file.
The problem is that in order to return the file to the user, I am using this:

return send_file(pathAndFilename, as_attachment=True, attachment_filename=requestedFile)

Since this returns, how can I then delete the file from the OS with os.remove(pathAndFilename)?
I also tried this:
send_file(pathAndFilename, as_attachment=True, attachment_filename = requestedFile)
os.remove(pathAndFilename)
return 0
But I got this error:
TypeError: The view function did not return a valid response. The return type must be a string, dict, tuple, Response instance, or WSGI callable, but it was a int.
Since send_file already returns the response from the endpoint, it is no longer possible to execute code afterwards.
However, it is possible to write the file to a stream before the file is deleted and then to send the stream in response.
from flask import send_file
import io, os, shutil

@app.route('/download/<path:filename>')
def download(filename):
    path = os.path.join(
        app.static_folder,
        filename
    )
    cache = io.BytesIO()
    with open(path, 'rb') as fp:
        shutil.copyfileobj(fp, cache)
        cache.flush()
    cache.seek(0)
    os.remove(path)
    return send_file(cache, as_attachment=True, attachment_filename=filename)
In order to make better use of memory with larger files, I think a temporary file is more suitable as a buffer.
from flask import send_file
import os, shutil, tempfile

@app.route('/download/<path:filename>')
def download(filename):
    path = os.path.join(
        app.static_folder,
        filename
    )
    cache = tempfile.NamedTemporaryFile()
    with open(path, 'rb') as fp:
        shutil.copyfileobj(fp, cache)
        cache.flush()
    cache.seek(0)
    os.remove(path)
    return send_file(cache, as_attachment=True, attachment_filename=filename)
I hope this meets your requirements.
Have fun implementing your project.
Hi everyone,
I have a module in Odoo 8 that imports products and quantities into an inventory adjustment. The code that reads the file is:

import cStringIO

data = base64.b64decode(self.data)

(self.data contains the base64-encoded file data.)

file_input = cStringIO.StringIO(data)

This works fine in Odoo 8.
I want to implement this module in Odoo 11.
In Odoo 11 the Python version has changed, so it does not know about cStringIO; it only knows io.StringIO. So I changed the code to:
from io import StringIO
import io
data = base64.b64decode(self.data)
file_input = io.StringIO(data)
This raises the error:

TypeError: initial_value must be str or None, not bytes

I changed the code to the lines above, but it does not accept the data, because base64.b64decode returns bytes and io.StringIO wants a string.
Then I tried this instead:

file_input = io.BytesIO(data)

which raises the error:

TypeError: initial_value must be str or None, not bytes

Please can anyone help me rectify this?
Thanks in advance.
Replace this:

data = base64.b64decode(self.data)

with:

data = base64.b64decode(self.data).decode('utf-8')
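A minimal sketch of both options in Python 3, assuming self.data holds the base64-encoded file content as in the original code:

import base64
import io

data_bytes = base64.b64decode(self.data)  # assumed: self.data is the base64-encoded upload

# Option 1: decode to text and keep the old text-based interface
file_input = io.StringIO(data_bytes.decode('utf-8'))

# Option 2: keep the raw bytes and use a binary buffer instead
file_input = io.BytesIO(data_bytes)

Which one you want depends on whether the code that consumes file_input expects text or bytes.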
def a():
    import json
    path = open('C:\\Users\\Bishal\\code\\57.json').read()
    config = json.load(path)
    for key in config:
        return key
You have already read the file's contents into a string with path = open(r'C:\Users\Bishal\code\57.json').read(), so when you call json.load(path) you are handing it that string rather than a file object; json.load expects a file-like object, so nothing gets loaded or parsed.
Either pass the open file directly to json.load, or read the contents and then parse the string with json.loads (note the s).
Option 1:
path = open(r'C:\Users\Bishal\code\57.json').read()
config = json.loads(path)
Option 2:
path = open(r'C:\Users\Bishal\code\57.json')
config = json.load(path)
path.close()
Then you can do whatever you like with the result:
for key, item in config.items():
    print('{} - {}'.format(key, item))
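For completeness, a small variant of Option 2 using a with block so the file is closed automatically (same file path as above):

import json

with open(r'C:\Users\Bishal\code\57.json') as fp:
    config = json.load(fp)

for key, item in config.items():
    print('{} - {}'.format(key, item))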
So, I create a StringIO object to treat my string as a file:
>>> a = 'Me, you and them\n'
>>> import io
>>> f = io.StringIO(a)
>>> f.read(1)
'M'
And then I proceed to close the 'file':
>>> f.close()
>>> f.closed
True
Now, when I try to open the 'file' again, Python does not permit me to do so:
>>> p = open(f)
Traceback (most recent call last):
File "<pyshell#166>", line 1, in <module>
p = open(f)
TypeError: invalid file: <_io.StringIO object at 0x0325D4E0>
Is there a way to 'reopen' a closed StringIO object? Or should it be declared again using the io.StringIO() method?
Thanks!
I have a nice hack which I am currently using for testing (since my code performs I/O operations, and handing it a StringIO is a nice workaround).
If this problem is a one-time thing:

st = StringIO()
close = st.close
st.close = lambda: None
f(st)           # some function which does I/O and finally closes st
st.getvalue()   # this is still available now
close()
# If you don't want to store the close function, you can also call:
StringIO.close(st)
If this is a recurring thing, you can also define a context manager:

import contextlib

@contextlib.contextmanager
def uncloseable(fd):
    """
    Context manager which turns the fd's close operation into a no-op
    for the duration of the context.
    """
    close = fd.close
    fd.close = lambda: None
    yield fd
    fd.close = close
which can be used in the following way:
st = StringIO()
with uncloseable(st):
    f(st)
# Now st is still open!!!
I hope this helps you with your problem, and if not, I hope you will find the solution you are looking for.
Note: This should work exactly the same for other file-like objects.
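For example, here is a quick sketch of the same trick applied to an io.BytesIO; write_and_close below is a made-up stand-in for any function that closes the stream it is given:

import contextlib
import io

@contextlib.contextmanager
def uncloseable(fd):
    # same context manager as above
    close = fd.close
    fd.close = lambda: None
    yield fd
    fd.close = close

def write_and_close(stream):  # hypothetical callee that closes its output
    stream.write(b'hello')
    stream.close()

bs = io.BytesIO()
with uncloseable(bs):
    write_and_close(bs)
print(bs.getvalue())  # b'hello' -- the buffer survived the close() call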
No, there is no way to re-open an io.StringIO object. Instead, just create a new object with io.StringIO().
Calling close() on an io.StringIO object throws away the "file contents" data, so re-opening could not give access to it anyway.
If you need the data, call getvalue() before closing.
See also the StringIO documentation here:
The text buffer is discarded when the close() method is called.
and here:
getvalue()
Return a str containing the entire contents of the buffer.
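A minimal sketch of that pattern:

import io

f = io.StringIO('Me, you and them\n')
f.read(1)                  # 'M'
contents = f.getvalue()    # grab the full buffer before closing
f.close()

# "re-open" by creating a fresh object from the saved value
p = io.StringIO(contents)
print(p.read())            # Me, you and them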
The builtin open() creates a file object (i.e. a stream), but in your example, f is already a stream.
That's the reason why you get TypeError: invalid file
After the method close() has executed, any stream operation will raise ValueError.
And the documentation does not describe how to reopen a closed stream.
Maybe you should not close() the stream yet if you want to use (reopen) it again later.
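For example, a small sketch of rewinding the stream instead of closing it (an assumption about the intended use, not something from the question):

import io

f = io.StringIO('Me, you and them\n')
f.read(1)   # 'M'

f.seek(0)   # rewind instead of closing
f.read()    # 'Me, you and them\n' -- the whole buffer is still there

f.close()   # close only once you are completely done with it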
When you call f.close(), the in-memory buffer is discarded. You are essentially dereferencing something and then trying to use it; you are looking for data that no longer exists.
Here is what you could do instead:
import io
a = 'Me, you and them\n'
f = io.StringIO(a)
f.read(1)
f.close()
# Put the text from a, without the first char, into a new StringIO.
p = io.StringIO(a[1:])
# do some work with p.
I think your confusion comes from thinking of io.StringIO as a file on the block device. If you used open() instead of StringIO, then you would be correct in your example and you could reopen the file. StringIO is not a file; it is the idea of a file object held entirely in memory. A regular file object has a buffer too, but its data also exists physically on the block device. A StringIO is just a buffer, a staging area in memory for the data within it. When you call open(), a buffer is created, but the data still remains on the block device.
Perhaps this is more what you want:

fo = open('f.txt', 'w+')
fo.write('Me, you and them\n')
fo.seek(0)   # rewind so the read starts at the beginning
fo.read(1)
fo.close()

# reopen the now closed file f.txt
p = open('f.txt', 'r')
# do stuff with p
p.close()

Here we are writing the string to the block device, so that when we close the file, the information written to it remains after it is closed. Because this creates a file in the directory the program is run in, it is a good idea to give the file an extension, which is why it is named f.txt rather than just f.
This code is simplification of code in a Django app that receives an uploaded zip file via HTTP multi-part POST and does read-only processing of the data inside:
#!/usr/bin/env python

import csv, sys, StringIO, traceback, zipfile
try:
    import io
except ImportError:
    sys.stderr.write('Could not import the `io` module.\n')

def get_zip_file(filename, method):
    if method == 'direct':
        return zipfile.ZipFile(filename)
    elif method == 'StringIO':
        data = file(filename).read()
        return zipfile.ZipFile(StringIO.StringIO(data))
    elif method == 'BytesIO':
        data = file(filename).read()
        return zipfile.ZipFile(io.BytesIO(data))

def process_zip_file(filename, method, open_defaults_file):
    zip_file = get_zip_file(filename, method)
    items_file = zip_file.open('items.csv')
    csv_file = csv.DictReader(items_file)
    try:
        for idx, row in enumerate(csv_file):
            image_filename = row['image1']
            if open_defaults_file:
                z = zip_file.open('defaults.csv')
                z.close()
        sys.stdout.write('Processed %d items.\n' % idx)
    except zipfile.BadZipfile:
        sys.stderr.write('Processing failed on item %d\n\n%s'
                         % (idx, traceback.format_exc()))

process_zip_file(sys.argv[1], sys.argv[2], int(sys.argv[3]))
Pretty simple. We open the zip file and one or two CSV files inside the zip file.
What's weird is that if I run this with a large zip file (~13 MB), have it instantiate the ZipFile from a StringIO.StringIO or an io.BytesIO (perhaps anything other than a plain filename? I had similar problems in the Django app when trying to create a ZipFile from a TemporaryUploadedFile, or even from a file object created by calling os.tmpfile() and shutil.copyfileobj()), and have it open TWO csv files rather than just one, then it fails towards the end of processing. Here's the output that I see on a Linux system:
$ ./test_zip_file.py ~/data.zip direct 1
Processed 250 items.
$ ./test_zip_file.py ~/data.zip StringIO 1
Processing failed on item 242
Traceback (most recent call last):
File "./test_zip_file.py", line 26, in process_zip_file
for idx, row in enumerate(csv_file):
File ".../python2.7/csv.py", line 104, in next
row = self.reader.next()
File ".../python2.7/zipfile.py", line 523, in readline
return io.BufferedIOBase.readline(self, limit)
File ".../python2.7/zipfile.py", line 561, in peek
chunk = self.read(n)
File ".../python2.7/zipfile.py", line 581, in read
data = self.read1(n - len(buf))
File ".../python2.7/zipfile.py", line 641, in read1
self._update_crc(data, eof=eof)
File ".../python2.7/zipfile.py", line 596, in _update_crc
raise BadZipfile("Bad CRC-32 for file %r" % self.name)
BadZipfile: Bad CRC-32 for file 'items.csv'
$ ./test_zip_file.py ~/data.zip BytesIO 1
Processing failed on item 242
Traceback (most recent call last):
File "./test_zip_file.py", line 26, in process_zip_file
for idx, row in enumerate(csv_file):
File ".../python2.7/csv.py", line 104, in next
row = self.reader.next()
File ".../python2.7/zipfile.py", line 523, in readline
return io.BufferedIOBase.readline(self, limit)
File ".../python2.7/zipfile.py", line 561, in peek
chunk = self.read(n)
File ".../python2.7/zipfile.py", line 581, in read
data = self.read1(n - len(buf))
File ".../python2.7/zipfile.py", line 641, in read1
self._update_crc(data, eof=eof)
File ".../python2.7/zipfile.py", line 596, in _update_crc
raise BadZipfile("Bad CRC-32 for file %r" % self.name)
BadZipfile: Bad CRC-32 for file 'items.csv'
$ ./test_zip_file.py ~/data.zip StringIO 0
Processed 250 items.
$ ./test_zip_file.py ~/data.zip BytesIO 0
Processed 250 items.
Incidentally, the code fails under the same conditions but in a different way on my OS X system. Instead of the BadZipfile exception, it seems to read corrupted data and gets very confused.
This all suggests to me that I am doing something in this code that you are not supposed to do -- e.g.: call zipfile.open on a file while already having another file within the same zip file object open? This doesn't seem to be a problem when using ZipFile(filename), but perhaps it's problematic when passing ZipFile a file-like object, because of some implementation details in the zipfile module?
Perhaps I missed something in the zipfile docs? Or maybe it's not documented yet? Or (least likely), a bug in the zipfile module?
I might have just found the problem and the solution, but unfortunately I had to replace Python's zipfile module with a hacked one of my own (called myzipfile here).
$ diff -u ~/run/lib/python2.7/zipfile.py myzipfile.py
--- /home/msabramo/run/lib/python2.7/zipfile.py 2010-12-22 17:02:34.000000000 -0800
+++ myzipfile.py 2011-04-11 11:51:59.000000000 -0700
@@ -5,6 +5,7 @@
 import binascii, cStringIO, stat
 import io
 import re
+import copy
 try:
     import zlib # We may need its compression method
@@ -877,7 +878,7 @@
         # Only open a new file for instances where we were not
         # given a file object in the constructor
         if self._filePassed:
-            zef_file = self.fp
+            zef_file = copy.copy(self.fp)
         else:
             zef_file = open(self.filename, 'rb')
The problem in the standard zipfile module is that when it is passed a file object (rather than a filename), it uses that same file object for every call to the open method. This means that tell and seek are called on the same underlying file, so opening multiple members of the zip archive causes the file position to be shared, and multiple open calls end up stepping all over each other. In contrast, when passed a filename, open creates a new file object each time. My solution: when a file object is passed in, instead of using it directly, create a copy of it.
This change to zipfile fixes the problems I was seeing:
$ ./test_zip_file.py ~/data.zip StringIO 1
Processed 250 items.
$ ./test_zip_file.py ~/data.zip BytesIO 1
Processed 250 items.
$ ./test_zip_file.py ~/data.zip direct 1
Processed 250 items.
but I don't know if it has other negative impacts on zipfile...
EDIT: I just found a mention of this in the Python docs that I had somehow overlooked before. At http://docs.python.org/library/zipfile.html#zipfile.ZipFile.open, it says:
Note: If the ZipFile was created by passing in a file-like object as the first argument to the
constructor, then the object returned by open() shares the ZipFile’s file pointer. Under these
circumstances, the object returned by open() should not be used after any additional operations
are performed on the ZipFile object. If the ZipFile was created by passing in a string (the
filename) as the first argument to the constructor, then open() will create a new file
object that will be held by the ZipExtFile, allowing it to operate independently of the ZipFile.
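If patching zipfile is not an option, one workaround (my own sketch, not part of the original answer) is to give each member that must stay open its own ZipFile built over its own buffer, so no file pointer is shared; data here is assumed to be the raw zip bytes read in get_zip_file above:

import io
import zipfile

def open_member(data, member_name):
    # A fresh ZipFile over a fresh BytesIO per member means each
    # handle has its own file position and nothing is shared.
    zf = zipfile.ZipFile(io.BytesIO(data))
    return zf, zf.open(member_name)

items_zip, items_file = open_member(data, 'items.csv')
defaults_zip, defaults_file = open_member(data, 'defaults.csv')
# read from both independently, then close everything
items_file.close(); items_zip.close()
defaults_file.close(); defaults_zip.close()

This re-reads the central directory once per ZipFile, which costs a little extra work but avoids touching the standard library.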
What I did was update setuptools and then re-download it, and it works now:
https://pypi.python.org/pypi/setuptools/35.0.1
In my case, this solved the problem:
pip uninstall pillow
Could it be that you had the file open on your desktop? This has happened to me sometimes, and the solution was just to run the code without having the files open outside of the Python session.