How to ignore/delete undefined characters on string decoded - python-3.x

I'm reading a bytes-like sequence of characters from a bus and I need to decode it into a string, but when I use the decode method the output shows undefined characters, and I need to delete/ignore them.
Thanks all for the help.
I have already tried decode(encoding='utf-8', errors='ignore'), and the same call with encoding='ascii', but I get the same result.
x = ser.read_until(b'\x03', None)
string = x.decode(encoding='utf-8', errors='ignore')
This is the actual result: xx423711B552000083x (x = undefined character)
And I expected to have: 423711B552000083
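No answer is recorded for this question. One possible approach, sketched under the assumption that the "undefined" characters are non-printable control bytes framing the payload (for example STX 0x02 and the ETX 0x03 terminator passed to read_until), is to filter them out after decoding. errors='ignore' only drops bytes that cannot be decoded; control characters are valid ASCII and UTF-8, so they survive it. The helper name strip_unprintable and the sample frame are hypothetical.

```python
def strip_unprintable(raw: bytes) -> str:
    """Decode raw as ASCII and keep only printable, non-whitespace characters."""
    # Bytes >= 0x80 are not ASCII and are dropped by errors="ignore".
    text = raw.decode("ascii", errors="ignore")
    # Control characters such as \x02 and \x03 decode fine, so filter them here.
    return "".join(ch for ch in text if ch.isprintable() and not ch.isspace())


# Hypothetical frame: two leading control bytes, the payload, then ETX.
frame = b"\x02\x00423711B552000083\x03"
print(strip_unprintable(frame))  # 423711B552000083
```

If the payload can legitimately contain spaces, the isspace check would need to be relaxed.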

Related

How to convert string with backslash to char

I have to convert a backslash string into char, but it seems that casting doesn't exist like in java:
String msg = (Char) new_msg
I have to convert string values like "\t", "\n", "\000"-"\255" to char.
I would first start questioning why you have a single character string in the first place, and whether you can be sure that that is what you actually have. But given that it is, you can get a char from a string, whether it's in the first position or any other, using either String.get, or the more convenient string indexing operator:
let s = "\t"
let c: char = String.get s 0
let c: char = s.[0]
But note that these will both raise an Invalid_argument exception if there is no character at that position, such as if the string is empty, for example.
As an addendum to glennsl's answer, both methods will raise an Invalid_argument exception if the string is empty. If you were writing a function to get the first char of a string, you might use the option type to account for this.
let first_char_of_string s =
  try Some s.[0]
  with Invalid_argument _ -> None

how to handle list that contains emoji in Python3

I've been writing a function that takes a list containing only emoji, converts each one to its utf-8 Unicode representation, and returns the resulting list. My current code seems to pass multiple arguments and raises an error. I'm new to handling emoji. Could you give me some tips?
main.py
def encode_emoji(emoji_list):
    result = []
    for i in range(len(emoji_list)):
        emoji = str(emoji_list[i])
        d_ord = format(ord(":{}:","#08x").format(emoji))
        result.append(str(d_ord))
        break
    return result
encode_emoji(["😀","😃","😄"])
Result of above code
Traceback (most recent call last):
  File "main.py", line 11, in <module>
    encode_emoji(["😀","😃","😄"])
  File "main.py", line 5, in encode_emoji
    d_ord = format(ord(":{}:","#08x").format(emoji))
TypeError: ord() takes exactly one argument (2 given)
I have no idea of how you intend to get the utf-8 encoding of an emoji with this line:
d_ord = format(ord(":{}:","#08x").format(emoji))
As the error message says, ord takes a single argument: a 1-character string, and returns an integer (the character's code point). Even if the call were rearranged so that the value returned by ord(emoji) was formatted with "#08x", the result would only be that code point written as a zero-padded hexadecimal number with a 0x prefix, not the utf-8 byte sequence for the emoji.
To encode some text into utf-8, just call the encode method of the string itself.
Also, in Python, one will almost never use the for ... in range(len(...)) pattern, since for is designed to iterate directly over any sequence or iterable.
Your code also has a stray break statement that would stop all processing after the first item.
Without using the list-comprehension syntax, a function to encode emoji as utf-8 byte strings is just:
def encode_emoji(emoji_list):
    result = []
    for part in emoji_list:
        # str.encode returns the utf-8 bytes for this emoji
        result.append(part.encode("utf-8"))
    return result
Once you get more acquainted with the language and understand comprehensions, it is just:
def encode_emoji(emoji_list):
    return [part.encode("utf-8") for part in emoji_list]
Now, given the "#08x" pattern in your code, it may be that you have misunderstood what utf-8 means, and are simply trying to write the emojis as valid HTML numeric character references, which will later be embedded in text that is encoded to utf-8.
In that case, you do have to call ord(emoji) to get its code point, then represent the resulting number as hexadecimal, and replace the leading 0x that Python's hex call yields with &#x, so that each result has the form &#x1f600;:
def encode_emoji(emoji_list):
    return [hex(ord(emoji)).replace("0x", "&#x") + ";" for emoji in emoji_list]
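The two readings discussed above (utf-8 bytes versus HTML numeric character references) can be compared side by side. This is a minimal sketch, not the answerer's code: the function names to_utf8_bytes and to_html_refs are made up for illustration, and it assumes every list item is a single code point. Emoji built from several code points, such as flags or skin-tone variants, would make ord raise a TypeError.

```python
import html


def to_utf8_bytes(emoji_list):
    # Each str is encoded to its utf-8 byte sequence.
    return [emoji.encode("utf-8") for emoji in emoji_list]


def to_html_refs(emoji_list):
    # Hexadecimal numeric character reference, e.g. "&#x1f600;".
    return ["&#x{:x};".format(ord(emoji)) for emoji in emoji_list]


emojis = ["😀", "😃", "😄"]
print(to_utf8_bytes(emojis))
print(to_html_refs(emojis))

# html.unescape turns the references back into the original characters.
print([html.unescape(ref) for ref in to_html_refs(emojis)] == emojis)
```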
TypeError: ord() takes exactly one argument (2 given)
I think the error is self-explanatory. That function takes one argument, but you are passing it two:
":{}:"
"#08x"
Here are some docs to read in case you need them.
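If the intent of the original line was to format each emoji's code point as hexadecimal, the two arguments belong to format rather than to ord. A minimal sketch, assuming that reading of the question (the function name emoji_code_points is made up for illustration):

```python
def emoji_code_points(emoji_list):
    # ord() receives exactly one argument; the format spec goes to format().
    return [format(ord(emoji), "#08x") for emoji in emoji_list]


print(emoji_code_points(["😀", "😃", "😄"]))
# ['0x01f600', '0x01f603', '0x01f604']
```

Note that this yields the Unicode code point, not the utf-8 encoding.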

How to convert a variable to a raw string?

If I have a string, "foo; \n", I can turn this into a raw string with r"foo; \n". If I have a variable x = "foo; \n", how do I convert x into a raw string? I tried y = rf"{x}" but this did not work.
Motivation:
I have a python string variable, res. I compute res as
big_string = """foo; ${bar}"""
from string import Template
t = Template(big_string)
res = t.substitute(bar="baz")
As such, res is a variable. I'd like to convert this variable into a raw string. The reason is I am going to POST it as JSON, but I am getting json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 620 (char 619). Previously, when testing I fixed this by converting my string to a raw string with: x = r"""foo; baz""" (keeping in line with the example above). Now I am not dealing with a big raw string. I am dealing with a variable that is a JSON representation of a string where I have replaced a single variable, bar above, with a list for a query, and now I want to convert this string into a raw string (e.g. r"foo; baz", yes I realize this is not valid JSON).
Update: As per this question I need a raw string. The question and answer flagged in the comments as duplicate do not work (res.encode('unicode_escape')).
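No answer is recorded here. As a hedged sketch: a raw string is only a way of spelling a literal in source code, so there is nothing to convert a variable into; r"foo; \n" is an ordinary str that happens to contain a backslash followed by n. If the goal is a valid JSON body, one common approach is to build the Python value first and let json.dumps do the escaping. The "query" key and the sample substitution value below are assumptions for illustration, not taken from the question.

```python
import json
from string import Template

# A raw literal produces a plain str; only the spelling in source differs.
assert r"foo; \n" == "foo; \\n"

template = Template("foo; ${bar}")
res = template.substitute(bar='line one\n"quoted" \\ backslash')

# json.dumps escapes newlines, quotes and backslashes inside the value.
payload = json.dumps({"query": res})
print(payload)

# Parsing the payload gives back exactly the original string.
assert json.loads(payload)["query"] == res
```

Substituting values into a hand-written JSON template, by contrast, leaves any quotes or newlines in the value unescaped, which is one plausible source of the JSONDecodeError.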

How do I format this from str to byte-like object?

Ahoy, I'm having trouble with decoding these filenames (They're encoded as base64). I know they need to be byte-like objects, but I can't for the life of me make it so. Please help, much love.
for filename in os.listdir('./Files'):
    name, typeId = base64.b64decode(filename.replace('.png', '')).split('_!_')
Error:
name, typeId = base64.b64decode(filename.replace('.png', '')).split('_!_')
TypeError: a bytes-like object is required, not 'str'
TypeError: a bytes-like object is required, not 'str'
You're probably going to run into this in two places:
1. b64decode(filename.replace('.png', ''))
As you've mentioned, b64decode works on bytes-like data, but filename is a str and filename.replace will also return a str. (On current Python 3 versions b64decode is documented to accept an ASCII-only str as well, so this call may not be the one that raises.)
2. .split('_!_')
Since b64decode returns bytes, you also have to pass a bytes-like object to split.
Try this:
for fname in os.listdir('./Files'):
    fname_bytes = os.fsencode(fname.replace('.png', ''))
    dec = base64.b64decode(fname_bytes)
    parts = dec.split(b"_!_")
To solve 1., you can use fsencode as noted in the os.listdir docs:
Note: To encode str filenames to bytes, use fsencode().
To solve 2., you can prefix a "b" to the string to make it a byte literal:
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type.
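Putting both fixes together, here is a self-contained sketch that can be run without a ./Files directory. The helper name parse_encoded_filename and the sample content "sword_!_42" are hypothetical; it also assumes the decoded payload is utf-8 text, so it is decoded to str before splitting.

```python
import base64
import os


def parse_encoded_filename(fname, sep="_!_"):
    """Return (name, type_id) from a base64-encoded file name with an extension."""
    stem, _ext = os.path.splitext(fname)  # drop ".png"
    raw = base64.b64decode(os.fsencode(stem))  # bytes in, bytes out
    name, type_id = raw.decode("utf-8").split(sep)  # str.split with a str separator
    return name, type_id


# Build a sample file name instead of reading a real directory.
sample = base64.b64encode(b"sword_!_42").decode("ascii") + ".png"
print(parse_encoded_filename(sample))  # ('sword', '42')
```

Staying in bytes and splitting on b"_!_", as in the answer above, works equally well; decoding first is only a convenience when the parts are needed as str.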

Python how to trim a bytes string

I want to trim a bytes string before an index found by locating $$$,
trimmed_bytes_stream = padded_bytes_stream[:padded_stream.index('$$$')]
but got an error:
TypeError: a bytes-like object is required, not 'str'
Are there equivalent methods on bytes objects to do that? Or do I have to convert the bytes to a string, use the string methods, and finally convert back to bytes after trimming?
Prefix your search item with b to make it a bytes literal:
trimmed_bytes_stream = padded_bytes_stream[:padded_stream.index(b'$$$')]
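bytes objects support most of the str search methods (index, find, partition, split), as long as the arguments are bytes too. A minimal sketch with made-up sample data; the helper name trim_before_marker is hypothetical:

```python
def trim_before_marker(data: bytes, marker: bytes = b"$$$") -> bytes:
    """Return everything before the first marker, or all of data if it is absent."""
    head, _sep, _tail = data.partition(marker)
    return head


padded = b"payload$$$\x00\x00\x00"

# index raises ValueError when the marker is missing ...
print(padded[:padded.index(b"$$$")])  # b'payload'

# ... while partition falls back to returning the whole input.
print(trim_before_marker(padded))  # b'payload'
print(trim_before_marker(b"no marker"))  # b'no marker'
```

Which behaviour is preferable depends on whether a missing marker should count as an error.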
