How to stream (upload) a large amount of data with Requests in Python? - python-3.x

The requests module provides a high-level HTTP API. Using requests I'd like to send data via HTTP using a POST request. The documentation is very short about this, stating only that a "file-like object" should be provided, without stating clearly what exactly requests expects from that object. I have some binary data, but unfortunately it is generated on the fly and I do not have a file-like object. How could I implement a "file-like object" myself that conforms to the expectations of requests? The documentation is quite poor in that regard and I wasn't able to clarify this by looking into the source code of requests myself. Has anyone done this before using the requests API?

File-like object is a standard Python term for an object that behaves like a file. This means that if you have a file on disk, opening it gives you a file-like object that you can pass straight to Requests. If you have a more complex situation you will need to give us a full description of the form of your data so we can help you more explicitly.
EDIT: To address your comment, here is the code to send a binary file to a host using Requests.
import requests

url = 'http://SomeSite/post'
# open() in binary mode returns the file-like object that requests expects
files = {'files': ('mydata', open('mydata', mode='rb'), 'application/octet-stream')}
r = requests.post(url, files=files)
Opening the file with the Python open command creates the file-like object.
EDIT2: Whenever you open a file on disk, you create a file-like object in the process. However, Python supports other object types that act like files. Some examples are the standard stdin, stdout and stderr streams. In addition, pipes can be accessed using os.pipe and via subprocess.PIPE. These objects behave like files: they can be accessed with a subset of the file API, and those APIs behave the same way as they do on an object backed by a real file.
This is why they are called file-like: they expose the same APIs and act in the same way. You open, close, read and write a pipe in the same way as you do a file.
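For data that is generated on the fly there is no need to write it to disk first. As a minimal sketch (the URL and the producer function are placeholders), requests also accepts a generator for the data argument and will then send the body with chunked transfer encoding:
import requests

def generate_chunks():
    # Hypothetical producer that yields the generated binary data piece by piece.
    for i in range(1000):
        yield b'some generated bytes\n'

# Passing a generator to data= streams the body without building it in memory.
r = requests.post('http://SomeSite/post', data=generate_chunks())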

Related

How can I use read/write streams with a JSON file?

I have a rather large JSON file that stores user information, and when my server starts it loads the entire file into memory. Obviously, this is not ideal. I have looked into using read/write streams, but I can't seem to understand quite how they'd work.
The data in the JSON file is formatted as such:
"accountName": {
"favoriteColor": "blue"
}
The process currently goes in this order:
1. Server starts, and data.json is loaded into a variable (dataVar).
2. User johndoe logs in, and their data is used to make an object.
3. The user changes their object's data, and dataVar is updated.
4. Server autosaves to the data.json file with the new contents.
I want to continue being able to access user data as needed, without loading everything into memory. I assume there are stream equivalents of things like dataVar.johndoe, but I can't seem to find that information.
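One way to avoid loading everything is to keep the records in an on-disk key/value store and fetch each account only when it is needed. A minimal sketch using Python's standard shelve module (the database file name and the field names are placeholders taken from the question):
import shelve

# shelve behaves like a dict but keeps the data on disk,
# so only the accounts you touch are loaded into memory.
with shelve.open('data.db') as db:
    db['johndoe'] = {'favoriteColor': 'blue'}   # update one record
    user = db['johndoe']                        # load one record on demand
    print(user['favoriteColor'])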

Is there a way to use Splinter to screenshot a browser view directly into memory?

The intended way to take a screenshot via Splinter is pretty straightforward, and I understand that in the context of mimicking a web browser a screenshot basically means saving an image to a file. However, I was wondering if I could avoid that I/O concern by reading the screenshot directly into a Python PIL object when I invoke browser.screenshot(). The reason is that I will perform some processing on the image regardless, so saving it to disk and reading it back from disk seems like a step I could short-circuit.
browser = Browser()
screenshot_path = browser.screenshot('absolute_path/your_screenshot.png')
Something like
screenshot_pil = browser.screenshot('path_to', inmemory=True)
Not sure if I missed this in the documentation, but there is a function screenshot_as_png() that seems to do what I want; I'm just not sure how to access it through the namespace of a Browser object.
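One workaround, assuming Splinter exposes the underlying Selenium WebDriver as browser.driver, is to take the PNG bytes from Selenium directly and open them with PIL without ever touching the disk:
import io
from PIL import Image
from splinter import Browser

browser = Browser()
browser.visit('http://example.com')

# get_screenshot_as_png() is Selenium's in-memory screenshot call;
# browser.driver is assumed here to be the wrapped WebDriver instance.
png_bytes = browser.driver.get_screenshot_as_png()
screenshot_pil = Image.open(io.BytesIO(png_bytes))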

Flask: Get gzip filename sent from Postman

I am sending a gzip file from Postman to a Flask endpoint. I can take that binary file with request.data and read it, save it, upload it, etc.
My problem is that I can't get its name. How can I do that?
My gzip file is called "test_file.json.gz" and the file inside it is called "test_file.json".
How can I get either of those names?
Edit:
I'm reading the stream data into io.BytesIO(), but that object doesn't have a name attribute or anything similar, even though I can see the file name in the raw bytes if I just:
>>> print(request.data)
b'\x1f\x8b\x08\x08\xca\xb1\xd3]\x00\x03test_file.json\x00\xab\xe6RPP\xcaN\xad4T\xb2RP*K\xcc)M5T\xe2\xaa\x05\x00\xc2\x8b\xb6;\x16\x00\x00\x00'
Further to the comment, I think the code which handles your upload is relevant here.
See this answer regarding request.data:
request.data Contains the incoming request data as string in case it came with a mimetype Flask does not handle.
The recommended way to handle file uploads in flask is to use:
file = request.files['file']
file is then of type: werkzeug.datastructures.FileStorage.
file.stream is the stream, which can be read with file.stream.read() or simply file.read()
file.filename is the filename as specified on the client.
file.save(path) a method which saves the file to disk. path should be a string like '/some/location/file.ext'
source
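Putting that together, a minimal sketch of the recommended approach (the route name and the form field name 'file' are placeholders; in Postman, send the gzip under Body -> form-data with the field type set to File):
from flask import Flask, request

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']   # werkzeug.datastructures.FileStorage
    name = file.filename           # e.g. 'test_file.json.gz', as sent by the client
    data = file.read()             # the raw gzip bytes
    return f"received {name} ({len(data)} bytes)"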

Python: Universal XML parser

I'm trying to make a simple Python 3 program that reads weather information from an XML web source, converts it into a Python-readable object (maybe a dictionary) and processes it (for example, visualizes multiple observations in a graph).
The data source is the national weather service's (direct translation) XML file at the link provided in the code.
What's different from the typical XML-parsing questions on Stack Overflow is that there are repetitive tags without an in-tag identifier (the <station> tags in my example) and some with one (1st line, <observations timestamp="14568.....">). I would also like to try parsing it straight from the website, not from a local file. Of course, I could create a local temporary file too.
What I have so far is simply a loading script that gives a string containing the XML for both the forecast and the latest weather observations.
from urllib.request import urlopen

# Read 4-day forecast
forecast = urlopen("http://www.ilmateenistus.ee/ilma_andmed/xml/forecast.php").read().decode("iso-8859-1")
# Get current weather
observ = urlopen("http://www.ilmateenistus.ee/ilma_andmed/xml/observations.php").read().decode("iso-8859-1")
Shortly, I'm looking for as universal as possible way to parse XML to Python-readable object (such as dictionary/JSON or list) while preserving all of the information in XML-file.
P.S. I would prefer a standard Python 3 module such as xml, but I didn't manage to understand it.
Try xmltodict package for simple conversion of XML structure to Python dict: https://github.com/martinblech/xmltodict
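A minimal sketch of how that would look with the observations feed (the element names inside <station>, such as name and airtemperature, are assumptions about the feed's structure):
import xmltodict
from urllib.request import urlopen

observ = urlopen("http://www.ilmateenistus.ee/ilma_andmed/xml/observations.php").read().decode("iso-8859-1")
doc = xmltodict.parse(observ)

# Attributes become '@'-prefixed keys, repeated tags become lists.
print(doc["observations"]["@timestamp"])
for station in doc["observations"]["station"]:
    print(station["name"], station["airtemperature"])  # assumed element names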

Python3 urlopen read weirdness (gzip)

I'm fetching a URL from Schema.org. Its content type is "text/html".
Sometimes read() returns what I expect: b'<!DOCTYPE html> ....'
Sometimes read() returns something else: b'\x1f\x8b\x08\x00\x00\x00\x00 ...'
from urllib.request import urlopen
from urllib.error import URLError

try:
    with urlopen("http://schema.org/docs/releases.html") as f:
        txt = f.read()
except URLError:
    return
I've tried solving this with txt = f.read().decode("utf-8").encode() but this results in an error... sometimes: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
The obvious work-around is to test whether the first byte is 0x1f (the start of the gzip magic number) and treat the response accordingly.
My question is: Is this a bug or something else?
Edit
Related question. Apparently, sometimes I'm getting a gzipped stream.
Lastly
I solved this by adding the following code as proposed here
from zlib import decompress, MAX_WBITS

if 31 == txt[0]:  # 31 == 0x1f, the first byte of the gzip magic number
    txt = decompress(txt, 16 + MAX_WBITS)
The question remains; why does this return text/html sometimes and zipped some other times?
There are other questions in this category, but I cannot find an answer that addresses the actual cause of the problem.
Python's urllib.request.urlopen() cannot transparently handle compression. By default it also does not set the Accept-Encoding request header. Additionally, the interpretation of this situation according to the HTTP standard has changed over time.
As per RFC2616:
If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.
Unfortunately (for this use case), RFC 7231 changes this to:
If no Accept-Encoding field is in the request, any content-coding is considered acceptable by the user agent.
Meaning, when performing a request using urlopen() you can get a response in whatever encoding the server decides to use and the response will be conformant.
schema.org appears to be hosted by Google, i.e. it is most likely behind a distributed front-end load-balancer network, so the different answers you get might come from load balancers with slightly different configurations.
Google engineers have in the past advocated for the use of HTTP compression, so this may well be a conscious decision.
So as a lesson: when using urlopen() we need to set Accept-Encoding.
You are indeed receiving a gzipped response. You should be able to avoid it by:
from urllib import request

try:
    req = request.Request("http://schema.org/docs/releases.html")
    req.add_header('Accept-Encoding', 'identity;q=1')
    with request.urlopen(req) as f:
        txt = f.read()
except request.URLError:
    return