Decompress gz compressed string using Python3.6 - python-3.x

I want to decompress the following gz compressed string using python3.6:
H4sIAAAAAAAA//NIzcnJVwjPL8pJAQBWsRdKCwAAAA==
The decompressed string is "Hello World"
I was able to decompress it using an online tool - http://www.txtwizard.net/compression - but I couldn't find a proper way to do it in Python.
I tried zlib and gzip, but they require bytes, not str. I also tried converting it using io.BytesIO(), but to no avail. My code is:
import gzip
import io
class SearchEvents:
    def decompressPayload():
        payload = "H4sIAAAAAAAA//NIzcnJVwjPL8pJAQBWsRdKCwAAAA=="
        payload_bytes = io.BytesIO(payload)
        print(gzip.decompress(payload_bytes))
SearchEvents.decompressPayload()
I am expecting "Hello World" as output. But I am getting the following error:
Traceback (most recent call last):
  File "SearchEvents.py", line 13, in <module>
    SearchEvents.decompressPayload()
  File "SearchEvents.py", line 10, in decompressPayload
    payload_bytes = io.BytesIO(payload)
TypeError: a bytes-like object is required, not 'str'
Is there any way to achieve what I want?

I want to decompress the following gz compressed string using python3.6:
...==
That's not a gzip-compressed string. At least, not until you Base64-decode it first.
>>> import base64, gzip
>>> gzip.decompress(base64.b64decode('H4sIAAAAAAAA//NIzcnJVwjPL8pJAQBWsRdKCwAAAA=='))
b'Hello World'

For stuff that needs bytes, give it bytes. Add the b prefix to make a bytes literal, e.g.:
gzip.decompress(b"H4sIAAAAAAAA//NIzcnJVwjPL8pJAQBWsRdKCwAAAA==")
This doesn't work, because that's not valid compressed data. It looks like it's base64 encoded though, so by combining it with binascii, you get:
import binascii
import gzip
gzip.decompress(binascii.a2b_base64(b"H4sIAAAAAAAA//NIzcnJVwjPL8pJAQBWsRdKCwAAAA=="))
Which produces b'Hello World'
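Both answers return bytes, while the question asks for a string. A minimal sketch of the full chain (Base64-decode, gunzip, then decode to text) might look like the following. The helper name decompress_b64_gzip and the UTF-8 default are assumptions for illustration, not part of either answer.

```python
import base64
import gzip


def decompress_b64_gzip(payload: str, encoding: str = "utf-8") -> str:
    """Base64-decode `payload`, gunzip the result, and decode it to text."""
    compressed = base64.b64decode(payload)  # str -> gzip-compressed bytes
    raw = gzip.decompress(compressed)       # gzip bytes -> original bytes
    return raw.decode(encoding)             # bytes -> str (encoding is assumed)


print(decompress_b64_gzip("H4sIAAAAAAAA//NIzcnJVwjPL8pJAQBWsRdKCwAAAA=="))
```

If the payload was produced with a different text encoding, pass it as the second argument.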

Related

Converting list of string to bytes in Python3

I have been trying to convert a list of string elements to bytes so that I can send it to the server.
Below is a snippet of my code:
ls_queue=list(q.queue)
print("Queue converted to list elements:::",ls_queue)
encoded_list=[x.encode('utf-8') for x in ls_queue]
print("Encoded list:::",encoded_list)
s.send(encoded_list)
The output I get is:
Encoded list::: [b'madhav']
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\tkinter\__init__.py", line 1883, in __call__
    return self.func(*args)
  File "Practice_Client_Server.py", line 149, in Word_Append_Window
    s.send(encoded_list)
TypeError: a bytes-like object is required, not 'list'
I can see that it is getting converted to bytes but it still gives the error while trying to encode and send. Can someone take a look as to what I am doing wrong here?
Thank you
You are sending a list object, but send expects a bytes object. You converted each element of the list to bytes, but not the list container itself. One option is to serialize the list as a JSON string and then encode that string to bytes, for example:
import json
l = ['foo', 'bar']
l_str = json.dumps(l)
l_bytes = l_str.encode('utf-8')
s.send(l_bytes)
Then you can read it on your server by doing the opposite:
reconstructed_l = json.loads(l_bytes.decode('utf-8'))
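To see both directions together, here is a small sketch that sends a list through a connected socket pair and rebuilds it on the other end. The helper names send_list and receive_list are hypothetical, and the single recv call assumes the whole message fits in one read; a real protocol would need some form of message framing.

```python
import json
import socket


def send_list(sock, items):
    """Serialize a list of strings as JSON and send it as UTF-8 bytes."""
    sock.sendall(json.dumps(items).encode("utf-8"))


def receive_list(sock, bufsize=4096):
    """Read one small message and rebuild the list (assumes a single recv suffices)."""
    return json.loads(sock.recv(bufsize).decode("utf-8"))


# socketpair() returns two already-connected sockets, standing in for client and server.
client, server = socket.socketpair()
try:
    send_list(client, ["foo", "bar"])
    received = receive_list(server)
finally:
    client.close()
    server.close()
print(received)
```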

a bytes-like object is required, not 'str': typeerror in compressed file

I am searching for a substring in a compressed file using the following Python script, and I am getting "TypeError: a bytes-like object is required, not 'str'". Can anyone help me fix this?
from re import *
import re
import gzip
import sys
import io
import os
seq={}
with open(sys.argv[1],'r') as fh:
    for line1 in fh:
        a=line1.split("\t")
        seq[a[0]]=a[1]
        abcd="AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"
        print(a[0],"\t",seq[a[0]])
count={}
with gzip.open(sys.argv[2]) as gz_file:
    with io.BufferedReader(gz_file) as f:
        for line in f:
            for b in seq:
                if abcd in line:
                    count[b] +=1
for c in count:
    print(c,"\t",count[c])
fh.close()
gz_file.close()
f.close()
The first input file contains:
TruSeq2_SE AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
The second file is a compressed text file. The line "if abcd in line:" raises the error.
The "BufferedReader" class gives you byte strings, not text strings, and you cannot directly compare the two in Python 3.
Since these strings use only a few ASCII characters and are not really text, you can work with byte strings throughout your code.
So, whenever you "open" a file (not gzip.open), open it in binary mode, i.e. open(sys.argv[1],'rb') instead of 'r'.
Also prefix your hardcoded string with a b so that Python uses a byte string internally: abcd=b"AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG". This will avoid a similar error on your if abcd in line, though the error message should be different from the one you presented.
Alternatively, use everything as text. This gives you more methods to work with the strings (Python 3's byte strings are somewhat crippled), nicer presentation of the data when printing, and should not be much slower. In that case, instead of the changes suggested above, include an extra line to decode each line fetched from your data file:
with io.BufferedReader(gz_file) as f:
    for line in f:
        line = line.decode("latin1")
        for b in seq:
(Besides the error, your program logic seems to be a bit faulty, as you don't actually use a variable string in your innermost comparison, just the fixed abcd value. I suppose you can fix that once you get rid of the errors.)
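As a sketch of the text-based route, gzip.open can decode for you when opened in "rt" mode, which removes the need for BufferedReader and the manual decode. The function below also compares each stored sequence rather than one fixed value, which is a guess at what the script intended. The function name and the dict-of-adapters argument are assumptions for illustration.

```python
import gzip


def count_lines_with_adapters(gz_path, adapters):
    """Count, per adapter name, how many lines of a gzip text file contain its sequence."""
    counts = {name: 0 for name in adapters}
    # "rt" makes gzip.open decode the data, so each line is a str rather than bytes.
    with gzip.open(gz_path, "rt", encoding="latin1") as handle:
        for line in handle:
            for name, sequence in adapters.items():
                if sequence in line:
                    counts[name] += 1
    return counts
```

Initializing every count to 0 up front also avoids the KeyError that count[b] +=1 would raise on an empty dict. When building the adapters dict from the tab-separated file, strip the trailing newline from each sequence, or no line will match.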

Python 3.2 TypeError - can't figure out what it means

I originally put this code through Python 2.7 but needed to move to Python 3.x because of work. I've been trying to figure out how to get this code to work in Python 3.2, with no luck.
import subprocess
cmd = subprocess.Popen('net use', shell=True, stdout=subprocess.PIPE)
for line in cmd.stdout:
    if 'no' in line:
        print (line)
I get this error
if 'no' in (line):
TypeError: Type str doesn't support the buffer API
Can anyone provide me with an answer as to why this is and/or some documentation to read?
Much appreciated.
Python 3 uses the bytes type in a lot of places where the encoding is not clearly defined. The stdout of your subprocess is a file object that yields bytes data. So you cannot check whether a str occurs within a bytes object, e.g.:
>>> 'no' in b'some bytes string'
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    'no' in b'some bytes string'
TypeError: Type str doesn't support the buffer API
What you need to do instead is test whether the bytes string contains another bytes string:
>>> b'no' in b'some bytes string'
False
So, back to your problem, this should work:
if b'no' in line:
    print(line)
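The other direction also works: have subprocess decode the output so that each line is a str, and keep the plain 'no' test. The sketch below assumes Python 3.5 or later for subprocess.run (the question targets 3.2, where Popen accepts the same universal_newlines flag). The helper name lines_containing and the stand-in command are illustrative only.

```python
import subprocess
import sys


def lines_containing(cmd, needle):
    """Run `cmd` and return the stdout lines (as str) that contain `needle`."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str using the locale encoding
    )
    return [line for line in result.stdout.splitlines() if needle in line]


# Stand-in command so the sketch runs anywhere; on Windows, try ["net", "use"] instead.
demo_cmd = [sys.executable, "-c", "print('no entries'); print('OK')"]
print(lines_containing(demo_cmd, "no"))
```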

How does one add string to tarfile in Python3

I have a problem adding a str to a tar archive in Python. In Python 2 I used this method:
fname = "archive_name"
params_src = "some arbitrarty string to be added to the archive"
params_sio = io.StringIO(params_src)
archive = tarfile.open(fname+".tgz", "w:gz")
tarinfo = tarfile.TarInfo(name="params")
tarinfo.size = len(params_src)
archive.addfile(tarinfo, params_sio)
It's essentially the same as what can be found here.
It worked well. However, after moving to Python 3 it broke and results in the following error:
  File "./translate_report.py", line 67, in <module>
    main()
  File "./translate_report.py", line 48, in main
    archive.addfile(tarinfo, params_sio)
  File "/usr/lib/python3.2/tarfile.py", line 2111, in addfile
    copyfileobj(fileobj, self.fileobj, tarinfo.size)
  File "/usr/lib/python3.2/tarfile.py", line 276, in copyfileobj
    dst.write(buf)
  File "/usr/lib/python3.2/gzip.py", line 317, in write
    self.crc = zlib.crc32(data, self.crc) & 0xffffffff
TypeError: 'str' does not support the buffer interface
To be honest, I have trouble understanding where it comes from, since I do not feed any str to the tarfile module except at the point where I construct the StringIO object.
I know the meanings of StringIO, str, bytes and such changed a bit from Python 2 to 3, but I do not see a mistake and cannot come up with better logic to solve this task.
I create the StringIO object precisely to provide buffer methods around the string I want to add to the archive, yet the error says that some str does not provide them. On top of that, the exception is raised around lines that seem to be responsible for checksum calculations.
Can someone please explain what I am misunderstanding, or at least give an example of how to add a simple str to a tar archive without creating an intermediate file on the file system?
When writing to a file, you need to encode your Unicode data to bytes explicitly; StringIO objects do not do this for you, since a StringIO is an in-memory text file. Use io.BytesIO() instead and encode:
params_sio = io.BytesIO(params_src.encode('utf8'))
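Adjust your encoding to your data, of course, and set tarinfo.size to the length of the encoded bytes rather than the length of the str, since the two differ for non-ASCII text.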
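Put together, a minimal sketch of adding a string to a gzip-compressed tar archive without an intermediate file could look like this. The helper name add_string_to_tar is hypothetical, and the archive is built in an in-memory buffer only so that the example leaves nothing on disk.

```python
import io
import tarfile


def add_string_to_tar(archive, member_name, text, encoding="utf-8"):
    """Add `text` to an open tarfile as a member, without a temporary file."""
    data = text.encode(encoding)
    info = tarfile.TarInfo(name=member_name)
    info.size = len(data)  # size in bytes, which can differ from len(text)
    archive.addfile(info, io.BytesIO(data))


# Build the archive in memory so the sketch leaves no files behind.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    add_string_to_tar(archive, "params", "some arbitrary string, with an é")

# Read the member back to confirm the round trip.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    restored = archive.extractfile("params").read().decode("utf-8")
print(restored)
```

To write to disk instead, replace the fileobj argument with a file name, as in the question's tarfile.open(fname+".tgz", "w:gz").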

Error when using zlib.compress function in Python 3.2

I'm importing zlib in my Python program. It works fine in Python 2.6 but shows an error when I try to run it in Python 3.2.
This is my code:
import zlib
s = 'sam'
print ("Your string length is",len(s))
t = zlib.compress(s)
print ("Your compressed string is",t)
print ("Your compressed string length is",len(t))
print ("Your decompressed string is",zlib.decompress(t))
print ("Crc32 is",zlib.crc32(t))
The error I get is this:
Your string length is 3
Traceback (most recent call last):
File "F:\workspace\samples\python\zip.py", line 4, in <module>
t = zlib.compress(s)
TypeError: 'str' does not support the buffer interface
But the above program works fine in Python 2.6. Should I use an alternative to zlib? Please help me.
Edit: I got it to work. It seems I needed to encode it. Here is the revised code:
import zlib
s = 'sam'
print ("Your string length is",len(s))
s=s.encode('utf-8')
t = zlib.compress(s)
print ("Your compressed string is",t)
print ("Your compressed string length is",len(t))
print ("Your decompressed string is",zlib.decompress(t))
print ("Crc32 is",zlib.crc32(t))
The str type in Python 3 is no longer a sequence of 8-bit characters, but a sequence of Unicode characters. You need to use the bytes type for binary data. You convert between strings and bytes by encoding/decoding.
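A short sketch of the encode/compress and decompress/decode round trip, wrapped in two helpers whose names (compress_text, decompress_text) and UTF-8 default are assumptions for illustration:

```python
import zlib


def compress_text(text, encoding="utf-8"):
    """Encode a str to bytes, then compress the bytes."""
    return zlib.compress(text.encode(encoding))


def decompress_text(blob, encoding="utf-8"):
    """Decompress to bytes, then decode the bytes back to a str."""
    return zlib.decompress(blob).decode(encoding)


packed = compress_text("sam")
print(type(packed).__name__, decompress_text(packed))
```

Note that zlib.decompress returns bytes, so the revised code in the question prints b'sam' unless the result is decoded as well.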

Resources