Difference between io.StringIO and a string variable in Python (python-3.x)

I am new to Python.
Can anybody explain the difference between a string variable and io.StringIO? We can store characters in both.
For example:
String variable:
k = 'RAVI'
io.StringIO:
string_out = io.StringIO()
string_out.write('A sample string which we have to send to server as string data.')
string_out.getvalue()
If we print k or string_out.getvalue(), both will print the text:
print(k)
print(string_out.getvalue())

They are similar in that both str and StringIO represent strings; they just do it in different ways:
str: Immutable
StringIO: Mutable, file-like interface, which stores strs
A text-mode file handle (as produced by open("somefile.txt")) is also very similar to StringIO (both are "Text I/O"), with the latter allowing you to avoid using an actual file for file-like operations.
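A quick sketch of that difference (the strings here are made up purely for illustration): concatenating str values builds new objects, while a single StringIO buffer is written to in place.
import io

s = 'RAVI'              # str: every "change" creates a brand new string object
s = s + ' KUMAR'        # the original 'RAVI' object is left untouched

buf = io.StringIO()     # StringIO: one mutable, file-like buffer
buf.write('RAVI')
buf.write(' KUMAR')
print(buf.getvalue())   # RAVI KUMAR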

You can use io.StringIO() to simulate files. Since Python is dynamic with variable types, anything that accepts a file object will usually also accept an io.StringIO(), meaning you can have a "file" in memory whose contents you control without actually writing any temporary files to disk.
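For instance, here is a minimal sketch (the names are just illustrative) of handing an in-memory StringIO to csv.writer, which normally expects a file object:
import csv
import io

buf = io.StringIO()             # an in-memory "file"
writer = csv.writer(buf)        # csv.writer only needs an object with a write() method
writer.writerow(['sample', 42])
print(buf.getvalue())           # the CSV text, produced without touching the disk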

Related

What does "deallocated bytearray object has exported buffers" mean exactly

I am trying to run an encryption algorithm using AES256, but I get this error instead:
"deallocated bytearray object has exported buffers"
I can't seem to find any proper explanation of what the error itself actually means, and therefore I am having trouble debugging this. Can anyone explain?
For context, this seems to happen particularly with large files over 1 GB.
for root, dirs, files in os.walk(dirPath):
    for name in files:
        filePath = os.path.join(root, name)
        with open(filePath, 'rb') as _file:
            textStr = _file.read()
        encrypted = fernet.encrypt(textStr)
        with open(filePath, 'wb') as _file:
            _file.write(encrypted)
The above code is me trying to encrypt all files in a directory
It's referring to the buffer protocol, a way of making views of raw memory in Python. It's not frequently used at the Python layer; it usually shows up in C code, both in CPython's built-in modules and in third-party C extension modules. The easiest way to use it from Python itself is with the memoryview type.
My guess is that something in the code (yours or the module you're using) made a view of a bytearray object as a buffer, then decref-ed the bytearray to zero references before releasing the buffer.
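You can see the same kind of constraint from pure Python using memoryview. This is only an illustration of what an exported buffer does to a bytearray, not a reproduction of your exact error, which is raised from C code at deallocation time:
data = bytearray(b'some ciphertext')
view = memoryview(data)   # exports a buffer over the bytearray's memory
try:
    data += b'more'       # resizing while an export is live is not allowed
except BufferError as exc:
    print(exc)            # e.g. "Existing exports of data: object cannot be re-sized"
view.release()            # once the export is released...
data += b'more'           # ...the bytearray can be resized again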

How to write an Image to string in Julia?

I want to encode an image in my directory "x.png" to a String or Array{UInt8, 1}.
I am writing code in Julia to serialize an image using protobufs. It requires the image to be in an encoded String format.
In Python, it is done as follows. I am looking for similar functionality in Julia.
from PIL import Image
img = Image.open('x.png')
import io
output = io.BytesIO()
img.save(output, 'PNG')
img_string_data = output.getvalue()
output.close()
The output may be a String object or an Array{UInt8, 1}
In Julia you can achieve this by writing:
img_string_data = read("x.png")
img_string_data now is Vector{UInt8}. You could also write read("x.png", String) to get a String (which is not that useful though as it will probably mostly contain invalid characters).
There is one difference between the Julia solution and your Python solution. The Julia approach will store in img_string_data contents identical to what "x.png" holds at the binary level, while your Python solution will store an identical image that may differ at the binary level (i.e. PIL might change some bytes when it re-encodes the file).
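For comparison, the byte-for-byte Python equivalent of that Julia read call skips PIL entirely and reads the file in binary mode:
with open('x.png', 'rb') as f:
    img_string_data = f.read()   # bytes identical to x.png on disk, like Julia's read("x.png")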

Arabic text replaced with escape sequences when creating CSV files using python

I am trying to create a CSV file that contains Arabic tweets collected using tweepy for a project I am doing. Gathering the data works fine; however, when I write to the CSV file, all Arabic text is escaped with \xXXXX sequences
as follows:
b'#\xd8\xa7\xd9\x84\xd9\x8a\xd9\x88\xd9\x85_\xd8\xa7\xd9\x84\xd8\xb9\xd8\xa7\xd9\x84\xd9\x85\xd9\x8a_\xd9\x84\xd9\x84\xd8\xa7\xd8\xb9\xd8\xa7\xd9\x82\xd9\x87_2017 \xd8\xa7\xd9\x84\xd8\xa5\xd8\xb9\xd8\xa7\xd9\x82\xd8\xa9 \xd8\xa7\xd9\x84\xd8\xad\xd9\x82\xd9\x8a\xd9\x82\xd9\x8a\xd8\xa9 \xd8\xa7\xd8\xb9\xd8\xa7\xd9\x82\xd8\xa9 \xd8\xa7\xd9\x84\xd9\x81\xd9\x83\xd8\xb1 \xd9\x88\xd9\x84\xd9\x8a\xd8\xb3\xd8\xaa \xd8\xa7\xd8\xb9\xd8\xa7\xd9\x82\xd8\xa9
I looked at many previously asked questions, and all I could find were suggestions for Python 2 or answers similar to the one I am writing. When I was creating JSON files instead, I was using ensure_ascii=False, but I couldn't find anything similar for CSV. Below is my code:
with codecs.open('tweets.csv', 'a', encoding='utf-8') as file:
    fieldnames = ['tweet', 'country']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    data = {'tweet': status.text, 'country': status.place.full_name}
    writer.writerow(data)
I tried adding .encoding='utf-8' to status.text and status.place as well but that also didn't work. Any suggestions?
You have to make sure the Arabic string you have is decoded into UTF-8 before you write it. Assuming status.text is of type bytes, you should write text = status.text.decode('utf-8') (and maybe do the same for status.place.full_name). But if it is of type str, it won't have a decode() method. Either way, to avoid escape sequences in your file, it is a str object that should be written.
If you try to specify the encoding of a bytes object (like the one you presumably have) as 'utf-8', that won't work, because the text is already UTF-8-encoded bytes. So in order to get UTF-8 characters in the file, you must call decode() on the bytes object; that way the writer receives UTF-8 characters rather than UTF-8 bytes.
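Put together with your writer, it could look something like this (a sketch only; it assumes status is the tweepy status object from your code and keeps your field names):
import csv

with open('tweets.csv', 'a', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['tweet', 'country'])
    text = status.text
    if isinstance(text, bytes):        # decode only if tweepy actually handed us bytes
        text = text.decode('utf-8')
    writer.writerow({'tweet': text, 'country': status.place.full_name})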

Python 3: the way to write a string into a file in its entirety

I am a newbie in Python 3.
I have a question about writing a string into a file.
The below string is what I tried to write into a file.
ÀH \x10\x08\x81\x00 (in hex, c04820108810)
When I checked the file using the xxd command, I could see that there is a difference between the string and the file.
00000000: c380 4820 1008 c281 00 ..H .....
This is the code I wrote.
s = 'ÀH \x10\x08\x81\x00'
with open('test', 'w') as f:
    f.write(s)
The question is: how can I write this string into the file in its entirety?
It seems that you want to write binary data. In that case, you should use the bytes type instead of str, as this gives you full control over the binary content of the sequence.
When dealing with strings, you have to take into account that Python 3 strings are Unicode, so when you enter something like À, the file's encoding (UTF-8 by default on most systems) decides which bytes are actually written. You can always encode() a string to look at its bytes:
>>> 'ÀH \x10\x08\x81\x00'.encode()
b'\xc3\x80H \x10\x08\xc2\x81\x00'
You can convert this to hex using the binascii module for a more readable hex string of those bytes:
>>> import binascii
>>> binascii.hexlify('ÀH \x10\x08\x81\x00'.encode())
b'c38048201008c28100'
As you can see, this is the same that was written to your file. So Python already does the correct thing. It’s just that the input is not what you want it to be.
So instead, use a bytes string and write to the file in binary mode:
# use a bytes string
s = b'\xc0\x48\x20\x10\x88\x10'
# open the file in binary mode
with open('test', 'bw') as f:
    f.write(s)
Btw., if you look at the encoded string from the beginning, you can already see that you had a different encoding in mind than Python when you entered that string. You expected À to be 0xc0, which is somewhat correct since that is its Latin-1 representation. But if you look up its other representations, you can see that in UTF-8, which is what Python uses by default, it is 0xc380 instead, which is again the value we got when encoding it in Python.
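If Latin-1 is in fact the encoding you had in mind, another possible sketch is to keep the str but encode it explicitly instead of relying on the default:
s = 'ÀH \x10\x08\x81\x00'
with open('test', 'wb') as f:
    f.write(s.encode('latin-1'))   # À -> 0xc0 and \x81 -> 0x81, one byte per character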
You have to set the source encoding to UTF-8 and also use a raw string, because you have \ escape characters. So add the coding declaration and put r before your string to make it raw.
# -*- coding: utf-8 -*-
s = r'ÀH \x10\x08\x81\x00'
with open('test.txt', 'w') as f:
    f.write(s)

Unpickling from converted string in python/numpy

I have a ton of numpy ndarrays that are stored pickled as strings. That may have been a poor design choice, but it's what I did, and now the pickled strings seem to have been converted or something along the way; when I try to unpickle, I notice they are of type str and I get the following error:
TypeError: 'str' does not support the buffer interface
when I invoke
numpy.loads(bin_str)
where bin_str is the thing I'm trying to unpickle. If I print out bin_str it looks like
b'\x80\x02cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02c_codecs\nencode\nq\x03X\x01\x00\x00\ ...
continuing for some time, so the info seems to be there; I'm just not quite sure how to convert it into whatever string format numpy/pickle need. On a whim I tried
numpy.loads( bytearray(bin_str, encoding='utf-8') )
and
numpy.loads( bin_str.encode() )
which both throw an error _pickle.UnpicklingError: unpickling stack underflow. Any ideas?
PS: I'm on python 3.3.2 and numpy 1.7.1
Edit
I discovered that if I do the following:
open('temp.txt', 'wb').write(...)
return numpy.load( 'temp.txt' )
I get back my array, where ... denotes copying and pasting the output of print(bin_str) from another window. I've tried writing bin_str to a file directly to unpickle, but that doesn't work; it complains that TypeError: 'str' does not support the buffer interface. A few sane ways of converting bin_str to something that can be written directly to a binary file result in pickle errors when trying to read it back.
Edit 2
So I guess what's happened is that my binary pickle string ended up encoded inside of a normal string, something like:
"b'pickle'"
which is unfortunate and I haven't figured out how to deal with that, except this ridiculous and convoluted way to get it back:
open('temp.py', 'w').write('foo = ' + bin_str)
from temp import foo
numpy.loads( foo )
This seems like a very shameful solution to the problem, so please give me a better one!
It sounds like your saved strings are the reprs of the original bytes instances returned by your pickling code. That's a bit unfortunate, but not too bad. repr is intended to return a "machine friendly" representation of an object, and it can often be reversed by using eval:
import numpy as np
import pickle
# this part has already happened
orig_obj = np.array([1,2,3])
orig_pickle = pickle.dumps(orig_obj)
saved_str = repr(orig_pickle) # this was a mistake, but it's already done
# this is what you need to do to get something equivalent to orig_obj back
reconstructed_pickle = eval(saved_str)
reconstructed_obj = pickle.loads(reconstructed_pickle)
# test
if np.all(reconstructed_obj == orig_obj):
    print("It worked!")
Obligatory note that using eval can be dangerous: Be aware that eval can run any Python code it wants, so don't call it with untrusted data. However, pickle data has the same risks (a malicious Pickle string can run arbitrary code upon unpickling), so you're not losing much safety in this situation. I'm guessing that you trust your data in this case anyway.
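If the saved strings really are plain repr output, a somewhat safer variant is ast.literal_eval, which parses Python literals only instead of running arbitrary expressions (the unpickling step itself still carries the usual pickle risks):
import ast
import pickle

reconstructed_pickle = ast.literal_eval(saved_str)   # parses the b'...' literal without executing code
reconstructed_obj = pickle.loads(reconstructed_pickle)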
