How do you download an image in the form of bytes? - python-3.x

I don't want the file to be saved to disk; the data just has to be returned as bytes so that it can later be passed to some other function.

Here is one way:
import requests
url = 'https://m.media-amazon.com/images/M/MV5BMTY5MTY3NjgxNF5BMl5BanBnXkFtZTcwMDExMTQyMw##._V1_SX1777_CR0,0,1777,987_AL_.jpg'
# Return data as a string
output = requests.get(url).text
# Return data as bytes
output = requests.get(url).content
You could also use urllib.request from the standard library (urllib2 was merged into it in Python 3).
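Because the bytes never leave memory, they can be handed straight to whatever needs them next. A minimal sketch, assuming Pillow is installed (the variable names are only for illustration):
import io
import requests
from PIL import Image  # assumes Pillow is available
url = 'https://m.media-amazon.com/images/M/MV5BMTY5MTY3NjgxNF5BMl5BanBnXkFtZTcwMDExMTQyMw##._V1_SX1777_CR0,0,1777,987_AL_.jpg'
resp = requests.get(url)
resp.raise_for_status()         # fail early on HTTP errors
image_bytes = resp.content      # raw bytes, never written to disk
img = Image.open(io.BytesIO(image_bytes))  # e.g. open the image entirely in memory
print(img.size)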

Related

How to write scraped data into a csv file via python?

I downloaded historical stock data via the following code.
url = "https://query1.finance.yahoo.com/v7/finance/download/RELIANCE.BO?period1=1577110559&period2=1608732959&interval=1d&events=history&includeAdjustedClose=true"
r = requests.get(url)
Then I tried to write it to a csv file with this code:
open('ril.csv').write(r.content)
But it raised the following error:
TypeError: write() argument must be str, not bytes
Modified code:
url = "https://query1.finance.yahoo.com/v7/finance/download/RELIANCE.BO?period1=1577110559&period2=1608732959&interval=1d&events=history&includeAdjustedClose=true"
r = requests.get(url)
open('ril.csv','wb').write(r.content)
The data was downloaded as bytes, so we need to open the file in binary mode ('wb') to write it.
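A slightly more defensive variant of the same fix, as a sketch (the raise_for_status call is an extra safety check, not part of the original answer):
import requests
url = "https://query1.finance.yahoo.com/v7/finance/download/RELIANCE.BO?period1=1577110559&period2=1608732959&interval=1d&events=history&includeAdjustedClose=true"
r = requests.get(url)
r.raise_for_status()  # surface HTTP errors instead of silently writing an error page
# 'wb' matches the bytes in r.content; alternatively open in 'w' mode and write r.text
with open('ril.csv', 'wb') as f:
    f.write(r.content)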

How to send a pickled object across a server with encoding? Python 3

I want to send a pickled, encoded version of an Account object to my server, then decode it at the server end and reinstate it as an object with the corresponding data. However, I am unsure how to convert it back from a string to the bytes data type so it can be unpickled.
On the clients end, this is essentially what happens:
command = 'insert account'
tempAccount = Account('Isabel', 'password')
pickledAcc = pickle.dumps(tempAccount)
clientCommand = f"{command},{pickledAcc}".encode('utf-8')
client.send(clientCommand)
However, on the server's side, it receives an empty string as the pickledAcc part.
I have simplified my code a bit, but I think the essentials are there; if you need more I can give it. I should also mention that I have used the proper length etiquette, i.e. sending a message beforehand to notify the server how long this message will be. And all of my server infrastructure works properly :)
Basically, I just need to know whether it is possible to encode the pickled Account object for sending, or whether this approach will never work.
The problem with the f-string line is that it inserts the __repr__ of pickledAcc instead of the real bytes, which will not give you the result you want.
for example:
command = "test"
pickledAcc = pickle.dumps("test_data")
clientCommand = f"{command},{pickledAcc}".encode('utf-8')
Now clientCommand will contain:
b"test,b'\\x80\\x03X\\t\\x00\\x00\\x00test_dataq\\x00.'"
As you can see, it is the textual representation of the bytes object (the b'...' form) that was encoded to UTF-8.
To solve this problem, I suggest base64-encoding the pickled data so it can safely be embedded in the string, then decoding it back on the server side:
Hope that helps.
Client side:
import base64
##--snip--##
pickledAcc = base64.b64encode(pickledAcc).decode()
clientCommand = f"{command},{pickledAcc}".encode('utf-8')
client.send(clientCommand)
Server Side:
import base64
##--snip--##
pickledAcc = base64.b64decode(pickledAcc)
pickledAcc = pickle.loads(pickledAcc)
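For completeness, here is a sketch of the whole round trip without a socket, using a tuple in place of the Account object so the snippet runs on its own; the split on the first comma is the part the server still has to do before the decode shown above:
import base64
import pickle
command = 'insert account'
pickledAcc = base64.b64encode(pickle.dumps(('Isabel', 'password'))).decode()
clientCommand = f"{command},{pickledAcc}".encode('utf-8')
# Server side: split once on the first comma, then reverse the encoding
received_command, encoded = clientCommand.decode('utf-8').split(',', 1)
account = pickle.loads(base64.b64decode(encoded))
print(received_command, account)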

Size of image in bytes, without saving to disk Python

I'm trying to get the size of an image that is sent via HTTP request encoded as base64. On the file system the image is a .png and is around 1,034,023 bytes; however, when I receive the image as base64 and compute its size in bytes, it is smaller, i.e. 840,734.
Is this correct, and is it because the .png compression on disk differs from the image loaded in memory? And if I want the size the image has on the file system, will I have to re-save the image to disk when I receive it?
To get the size of the image in bytes I have the following functions (both return the same value). I'm using Python 3.
from io import BytesIO
from PIL import Image
def image_size(imageb64):
    character_count = len(imageb64)
    padding_count = imageb64[-2:].count('=')  # count the '=' padding at the end of the base64 string
    count = (3 * (character_count / 4)) - padding_count
    print(f'Image size count: {count}')
def image_to_size_in_bytes(numpy_img):
    img = Image.fromarray(numpy_img)
    buffered = BytesIO()
    img.save(buffered, format='PNG')
    contents = buffered.getvalue()
    print(f'IMAGE SIZE: {len(contents)}')
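As a cross-check (a sketch, assuming imageb64 is the base64 string from the request): simply decoding it gives the exact number of bytes that were transferred, which is what the padding arithmetic above approximates.
import base64
def decoded_size(imageb64: str) -> int:
    # length in bytes of the data the base64 string encodes
    return len(base64.b64decode(imageb64))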

How to convert Hex to original file format?

I have a .tgz file that was formatted as shell code, it looks like this (Hex):
"\x1F\x8B\x08\x00\x44\x7A\x91\x4F\x00\x03\xED\x59\xED\x72.."
It was generated this way (python3):
import os
def main():
    dump_src = "MyPlugin.tgz"
    fc = ""
    try:
        with open(dump_src, 'rb') as fd:
            fcr = fd.read()
            for byte in bytearray(fcr):
                fc += "\\x{:02x}".format(byte)
    except:
        fcr = dump_src
        for byte in bytearray(fcr):
            fc += "\\x{:02x}".format(byte)
    print(fc)
    # failed attempt:
    fcback = bytes(int(fc[i+2:i+4], 16) for i in range(0, len(fc), 4))
    print(fcback)
if __name__ == "__main__":
    main()
How can I convert this back to the original tgz archive?
Edit: the failed attempt in the last section outputs this:
b'\x8b\x00\x10]\x03\x93o0\x85%\xe2!\xa4H\xf1Fi\xa7\x15\xf61&\x13N\xd9[\xfag\x11V\x97\xd3\xfb%\xf7\xe3\\\xae\xc2\xff\xa4>\xaf\x11\xcc\x93\xf1\x0c\x93\xa4\x1b\xefxj\xc3?\xf9\xc1\xe8\xd1\xd9\x01\x97qB"\x1a\x08\x9cO\x7f\xe9\x19\xe3\x9c\x05\xf2\x04a\xaa\x00A,\x15"RN-\xb6\x18K\x85\xa1\x11\x83\xac/\xffR\x8a\xa19\xde\x10\x0b\x08\x85\x93\xfc]\x8a^\xd2-T\x92\x9a\xcc-W\xc7|\xba\x9c\xb3\xa6V0V H1\x98\xde\x03#\x14\'\n 1Y\xf7R\x14\xe2#\xbe*:\xe0\xc8\xbb\xc9\x0bo\x8bm\xed.\xfd\xae\xef\x9fT&\xa1\xf4\xcf\xa7F\xf4\xef\xbb"8"\xb5\xab,\x9c\xbb\xfc3\x8b\xf5\x88\xf4A\x0ek%5eO\xf4:f\x0b\xd6\x1bi\xb6\xf3\xbf\xf7\xf9\xad\xb5[\xdba7\xb8\xf9\xcd\xba\xdd,;c\x0b\xaaT"\xd4\x96\x17\xda\x07\x87& \xceH\xd6\xbf\xd2\xeb\xb4\xaf\xbd\xc2\xee\xfc\'3zU\x17>\xde\x06u\xe3G\x7f\x1e\xf3\xdf\xb6\x04\x10A\x04\x10A\x04\x10A\x04\x10A\xff\x9f\xab\xe8(\x00'
And when I output it to a file (e.g. via python3 main.py > MyFile.tgz) the file is corrupted.
Since you know the format of the data (each byte is encoded as a string of 4 characters in the format "\xAB") it's easy to revert the conversion and get the original bytes again. It'll only take one line of Python code:
data = bytes(int(fc[i+2:i+4], 16) for i in range(0, len(fc), 4))
This uses:
range(start, stop, step) with step 4 to iterate in groups of 4 characters through your string
slicing to get each group of 2 hexadecimal digits
int(x, base) to convert the hexadecimal string to an integer
a generator expression to immediately pass the converted elements to:
bytes() to create a bytes object with the data
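A tiny worked example of that one-liner on a short formatted string (illustrative values only):
fc = "\\x1f\\x8b\\x08"
data = bytes(int(fc[i+2:i+4], 16) for i in range(0, len(fc), 4))
print(data)  # b'\x1f\x8b\x08'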
The variable data is now of type bytes, and you could write it directly to a file (to unpack with an external tool such as tar), or pass it to gzip.decompress() (to process it further in Python).
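For instance (a sketch, with a made-up output file name):
import gzip
with open('restored.tgz', 'wb') as f:  # hypothetical file name
    f.write(data)
tar_bytes = gzip.decompress(data)  # or strip the gzip layer in memory to get the raw tar bytes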
UPDATE (follow-up on the comments and updated question):
Firstly, I have tested the above code and it does result in the same bytes as the input. Are you really sure that the example output in your question is the actual result of the code in your question? Please try to be careful when copying code and/or output. A few remarks:
Your code is not properly formatted, so I cannot run it without making modifications. And when I have made modifications to the code, I might run different code than you do, yielding different results. So next time please copy-paste your exact (working, tested) code without modifications.
The format string in your code uses lowercase hexadecimal format, and your first example output uses uppercase. So that output cannot be from this code.
I don't have access to your file "MyPlugin.tgz", but when I test your code with another .tgz file (after fixing the IndentationErrors), my output is correct. It starts with \x1f\x8b as expected (this is the magic number in the gzip header). I can't explain why your output is different...
Secondly, it seems like you don't fully understand how bytes and string representations work. When you write print(fcback), a string representation of the Python object fcback (in this case a bytes object) is printed. The string representation of a bytes object is not the same as the binary data! When printing a bytes object, each byte that corresponds to a printable ASCII character is shown as that character, while other bytes are escaped (similar to the formatted string that your code generates). Also, it starts with b' and ends with '.
You cannot print binary data to your terminal and then pipe the output to a file. This will result in a different file. The correct way to write the data to a file is using file.write(data) in your Python code.
Here's a fully working example:
def binary_to_text(data):
    """Convert a bytes object to a formatted text string."""
    text = ""
    for byte in data:
        text += "\\x{:02x}".format(byte)
    return text
def text_to_binary(text):
    """Convert a formatted text string to a bytes object."""
    return bytes(int(text[i+2:i+4], 16) for i in range(0, len(text), 4))
def main():
    # Read the binary data from input file:
    with open('MyPlugin.tgz', 'rb') as input_file:
        input_data = input_file.read()
    # Convert binary to text (based on your original code):
    text = binary_to_text(input_data)
    print(text[0:100])
    # Convert the text back to binary:
    output_data = text_to_binary(text)
    print(output_data[0:100])
    # Write the binary data back to a file:
    with open('MyPlugin-restored.tgz', 'wb') as output_file:
        output_file.write(output_data)
if __name__ == '__main__':
    main()
Note that I only print the first 100 elements to keep the output short. Also notice that the second print-statement prints a much longer text. This is because the first print gets 100 characters (which are printed "as is"), while the second print gets 100 bytes (of which most bytes are escaped, causing the output to be longer).

python-snappy streaming data in a loop to a client

I would like to send multiple compressed arrays from a server to a client using python-snappy, but I cannot get it to work after the first array. Here is a snippet showing what is happening:
(sock is just the network socket that these are communicating through)
Server:
for i in range(n):  # number of arrays to send
    val = items[i][1]  # this is the array
    y = (json.dumps(val)).encode('utf-8')
    b = io.BytesIO(y)
    # snappy.stream_compress requires a file-like object as input, as far as I know.
    with b as in_file:
        with sock as out_file:
            snappy.stream_compress(in_file, out_file)
Client:
for i in range(n):  # same n as before
    data = ''
    b = io.BytesIO()
    # snappy.stream_decompress requires a file-like object to write to, as far as I know
    snappy.stream_decompress(sock, b)
    data = b.getvalue().decode('utf-8')
    val = json.loads(data)
val = json.loads(data) works only on the first iteration; afterwards it stops working. When I do a print(data), only the first iteration prints anything. I've verified that the server does flush and send all the data, so I believe the problem is in how I receive the data.
I could not find a different way to do this. I searched and the only thing I could find is this post which has led me to what I currently have.
Any suggestions or comments?
with doesn't do what you think; refer to its documentation. It calls sock.__exit__() after the block has executed, which is not what you intended.
# what you wrote
with b as in_file:
    with sock as out_file:
        snappy.stream_compress(in_file, out_file)
# what you meant
snappy.stream_compress(b, sock)
By the way:
The line data = '' is redundant because it is reassigned anyway.
Adding to #paul-scharnofske's answer:
Likewise, on the receiving side: stream_decompress doesn't quit until end-of-file, which means it will read until the socket is closed. So if you send multiple separate compressed chunks, it will read all of them before finishing, which is not what you intend. Bottom line: you need to add "framing" around each chunk so that the receiving end knows where one chunk ends and the next one starts. One way to do that (a sketch of both sides follows after these steps)... For each array to be sent:
Create a io.BytesIO object with the json-encoded input as you're doing now
Create a second io.BytesIO object for the compressed output
Call stream_compress with the two BytesIO objects (you can write into a BytesIO in addition to reading from it)
Obtain the len of the output object
Send the length encoded as a 32-bit integer, say, with struct.pack("!I", length)
Send the output object
On the receiving side, reverse the process. For each array:
Read 4 bytes (the length)
Create a BytesIO object. Receive exactly length bytes, writing those bytes to the object
Create a second BytesIO object
Pass the received object as input and the second object as output to stream_decompress
json-decode the resulting output object
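Putting those steps together, a minimal sketch of both sides might look like this (send_array, recv_exact and recv_array are made-up helper names; sock is assumed to be a connected socket, and snappy is the python-snappy module):
import io
import json
import struct
import snappy  # python-snappy
def send_array(sock, val):
    # Compress one array and prefix it with its compressed length (4 bytes, big-endian)
    raw = io.BytesIO(json.dumps(val).encode('utf-8'))
    compressed = io.BytesIO()
    snappy.stream_compress(raw, compressed)
    payload = compressed.getvalue()
    sock.sendall(struct.pack("!I", len(payload)) + payload)
def recv_exact(sock, n):
    # Read exactly n bytes from the socket
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf += chunk
    return buf
def recv_array(sock):
    # Read the 4-byte length, then exactly that many compressed bytes, then decompress
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    compressed = io.BytesIO(recv_exact(sock, length))
    decompressed = io.BytesIO()
    snappy.stream_decompress(compressed, decompressed)
    return json.loads(decompressed.getvalue().decode('utf-8'))
The length prefix is what lets the receiver hand stream_decompress a complete, bounded chunk instead of letting it read the socket until it closes.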
