Python interpret bytearray as bytes - python-3.x

I have a question regarding the interpretation of a string as bytes
Within python, I have the situation that one variable contains e.g. this value
"bytearray(b'\x13\x02US')"
this is unfortunately due to the behavior of a module I am using. My question is, how could i get this string into bytes?
I have tried stripping away the "bytearray(b'" and the "')" at the end, and use .encode() as a function, but the result then is:
b'\\x13\\x02US'
Which clearly escapes the \ in order to prevent the interpretation as bytes.
How could i get this converted into
b'\x13\x02US'
instead though?
Thank you very much!

You could use .decode().replace('\\', '\'), this way it replaces the double slashes with single ones. Either attache it after your .encode() function or do it on your string seperately.

Related

JSONFormat.print() method encoding special characters and also adding extra slash

I need to convert a protobuf message to JSON string in java. For this I am using the below API as recommended by the docs (https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/util/JsonFormat.Printer.html)
String jsonString = JsonFormat.printer().includingDefaultValueFields().print(protobufMessage);
This is working fine for a simple string, however, when my string contains special characters like &, single quote etc. the gson.toJson() method inside JsonFormat is converting special characters to octal format. For example "A&BC" is converted to "A\u0026BC". Also, the resultant string has an extra backslash appended.
So finally "A&BC" is converted to the string "A\\u0026BC".
If it were "A\u0026BC" then I could have converted to a byte array and formed a string with it. But because of the additional backslash I am not able to do so.
Currently I am using protobuf version 3.7.1 and I tried to upgrade and check if any latest API is available, but it did not help. I searched online but did not find any references (a similar issue was reported for JSONFormat.printToString but this API is removed in a later version. https://github.com/carlomedas/protobuf-java-format/issues/16). Can someone please help here if you have come across this issue.
I think the problem might be that you're using that string to pass along, and it's getting parsed a 2nd time. If you use the printer, it will convert "A&BC" to "A\u0026BC". Then when Jackson parses that, it will append the 2nd backslash. To avoid this, you can use #JsonRawValue annotation to avoid being parsed with the 2nd backslash.

Concatenation with empty string raises ERR:INVALID DIM

In TI-BASIC, the + operation is overloaded for string concatenation (in this, if nothing else, TI-BASIC joins the rest of the world).
However, any attempt to concatenate involving an empty string raises a Dimension Mismatch error:
"Fizz"+"Buzz"
FizzBuzz
"Fizz"+""
Error
""+"Buzz"
Error
""+""
Error
Why does this occur, and is there an elegant workaround? I've been using a starting space and truncating the string when necessary (doesn't always work well) or using a loop to add characters one at a time (slow).
The best way depends on what you are doing.
If you have a string (in this case, Str1) that you need to concatenate with another (Str2), and you don't know if it is empty, then this is a good general-case solution:
Str2
If length(Str1
Str1+Str2
If you need to loop and add a stuff to the string each time, then this is your best solution:
Before the loop:
" →Str1
In the loop:
Str1+<stuff_that_isn't_an_empty_string>→Str1
After the loop:
sub(Str1,2,length(Str1)-1→Str1
There are other situations, too, and if you have a specific situation, then you should post a simplified version of the relevant code.
Hope this helps!
It is very unfortunate that TI-Basic doesn't support empty strings. If you are starting with an empty string and adding chars, you have to do something like this:
"?
For(I,1,3
Prompt Str1
Ans+Str1
End
sub(Ans,2,length(Ans)-1
Another useful trick is that if you have a string that you are eventually going to evaluate using expr(, you can do "("+Str1+")"→Str1 and then freely do search and replace on the string. This is a necessary workaround since you can't search and replace any text involving the first or last character in a string.

In Python 3, how can I convert ascii to string, *without encoding/decoding*

Python 3.6
I converted a string from utf8 to this:
b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5#xn--ssdcsrs-2e1xt16k.com.au'
I now want that chunk of ascii back into string form, so there is no longer the little b for bytes at the beginning.
BUT I don't want it converted back to UTF8, I want that same sequence of characters that you ses above in my Python string.
How can I do so? All I can find are ways of converting bytes to string along with encoding or decoding.
The (wrong) answer is quite simple:
chr(asciiCode)
In your special case:
myString = ""
for char in b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5#xn--ssdcsrs-2e1xt16k.com.au':
myString+=chr(char)
print(myString)
gives:
æ没æçµ#xn--ssdcsrs-2e1xt16k.com.au
Maybe you are also interested in the right answer? It will probably not please you, because it says you have ALWAYS to deal with encoding/decoding ... because myString is now both UTF-8 and ASCII at the same time (exactly as it already was before you have "converted" it to ASCII).
Notice that how myString shows up when you print it will depend on the implicit encoding/decoding used by print.
In other words ...
there is NO WAY to avoid encoding/decoding
but there is a way of doing it a not explicit way.
I suppose that reading my answer provided HERE: Converting UTF-8 (in literal) to Umlaute will help you much in understanding the whole encoding/decoding thing.
What you have there is not ASCII, as it contains for instance the byte \xe6, which is higher than 127. It's still UTF8.
The representation of the string (with the 'b' at the start, then a ', then a '\', ...), that is ASCII. You get it with repr(yourstring). But the contents of the string that you're printing is UTF8.
But I don't think you need to turn that back into an UTF8 string, but it may depend on the rest of your code.

How to convert between bytes and strings in Python 3?

This is a Python 101 type question, but it had me baffled for a while when I tried to use a package that seemed to convert my string input into bytes.
As you will see below I found the answer for myself, but I felt it was worth recording here because of the time it took me to unearth what was going on. It seems to be generic to Python 3, so I have not referred to the original package I was playing with; it does not seem to be an error (just that the particular package had a .tostring() method that was clearly not producing what I understood as a string...)
My test program goes like this:
import mangler # spoof package
stringThing = """
<Doc>
<Greeting>Hello World</Greeting>
<Greeting>你好</Greeting>
</Doc>
"""
# print out the input
print('This is the string input:')
print(stringThing)
# now make the string into bytes
bytesThing = mangler.tostring(stringThing) # pseudo-code again
# now print it out
print('\nThis is the bytes output:')
print(bytesThing)
The output from this code gives this:
This is the string input:
<Doc>
<Greeting>Hello World</Greeting>
<Greeting>你好</Greeting>
</Doc>
This is the bytes output:
b'\n<Doc>\n <Greeting>Hello World</Greeting>\n <Greeting>\xe4\xbd\xa0\xe5\xa5\xbd</Greeting>\n</Doc>\n'
So, there is a need to be able to convert between bytes and strings, to avoid ending up with non-ascii characters being turned into gobbledegook.
The 'mangler' in the above code sample was doing the equivalent of this:
bytesThing = stringThing.encode(encoding='UTF-8')
There are other ways to write this (notably using bytes(stringThing, encoding='UTF-8'), but the above syntax makes it obvious what is going on, and also what to do to recover the string:
newStringThing = bytesThing.decode(encoding='UTF-8')
When we do this, the original string is recovered.
Note, using str(bytesThing) just transcribes all the gobbledegook without converting it back into Unicode, unless you specifically request UTF-8, viz., str(bytesThing, encoding='UTF-8'). No error is reported if the encoding is not specified.
In python3, there is a bytes() method that is in the same format as encode().
str1 = b'hello world'
str2 = bytes("hello world", encoding="UTF-8")
print(str1 == str2) # Returns True
I didn't read anything about this in the docs, but perhaps I wasn't looking in the right place. This way you can explicitly turn strings into byte streams and have it more readable than using encode and decode, and without having to prefex b in front of quotes.
This is a Python 101 type question,
It's a simple question but one where the answer is not so simple.
In python3, a "bytes" object represents a sequence of bytes, a "string" object represents a sequence of unicode code points.
To convert between from "bytes" to "string" and from "string" back to "bytes" you use the bytes.decode and string.encode functions. These functions take two parameters, an encoding and an error handling policy.
Sadly there are an awful lot of cases where sequences of bytes are used to represent text, but it is not necessarily well-defined what encoding is being used. Take for example filenames on unix-like systems, as far as the kernel is concerned they are a sequence of bytes with a handful of special values, on most modern distros most filenames will be UTF-8 but there is no gaurantee that all filenames will be.
If you want to write robust software then you need to think carefully about those parameters. You need to think carefully about what encoding the bytes are supposed to be in and how you will handle the case where they turn out not to be a valid sequence of bytes for the encoding you thought they should be in. Python defaults to UTF-8 and erroring out on any byte sequence that is not valid UTF-8.
print(bytesThing)
Python uses "repr" as a fallback conversion to string. repr attempts to produce python code that will recreate the object. In the case of a bytes object this means among other things escaping bytes outside the printable ascii range.
TRY THIS:
StringVariable=ByteVariable.decode('UTF-8','ignore')
TO TEST TYPE:
print(type(StringVariable))
Here 'StringVariable' represented as a string. 'ByteVariable' represent as Byte. Its not relevent to question Variables..

python 3: how to make strip() work for bytes

I've contered a Python 2 code to Python 3.
In doing so, I've changed
print 'String: ' + somestring
into
print(b'String: '+somestring)
because I was getting the following error:
Can't convert 'bytes' object to str implicitly
But then now I can't implement string attributes such as strip(), because they are no longer treated as strings...
global name 'strip' is not defined
for
if strip(somestring)=="":
How should I solve this dilemma between switching string to bytes and being able to use string attributes? Is there a workaround?
Please help me out and thank you in advance..
There are two issues here, one of which is the actual issue, the other is confusing you, but not an actual issue. Firstly:
Your string is a bytes object, ie a string of 8-bit bytes. Python 3 handles this differently from text, which is Unicode. Where do you get the string from? Since you want to treat it as text, you should probably convert it to a str-object, which is used to handle text. This is typically done with the .decode() function, ie:
somestring.decode('UTF-8')
Although calling str() also works:
str(somestring, 'UTF8')
(Note that your decoding might be something else than UTF8)
However, this is not your actual question. Your actual question is how to strip a bytes string. And the asnwer is that you do that the same way as you string a text-string:
somestring.strip()
There is no strip() builtin in either Python 2 or Python 3. There is a strip-function in the string module in Python 2:
from string import strip
But it hasn't been good practice to use that since strings got a strip() method, which is like ten years or so now. So in Python 3 it is gone.
>>> b'foo '.strip()
b'foo'
Works just fine.
If what you're dealing with is text, though, you probably should just have an actual str object, not a bytes object.
I believe you can use the "str" function to cast it to a string
print str(somestring).strip()
or maybe
print str(somestring, "utf-8").strip()
However, if the object is already a string, you don't get a new one. So if you're not sure whether an object is a string and need it to be and call str(obj), you won't create another if it's already a string.
x='123'
id(x)
2075707536496
y=str(x)
id(y)
2075707536496

Resources