Getting error while encoding str in UTF8 format - string

This is working code.
line = 'line'
another_line = 'new ' + line
another_line.encode('utf-8')
output
b'new line'
Now I'm trying to figure out why the code below raises an error in Python 3, whereas in Python 2 it gives me the concatenated string.
line = 'line'
'new '+line.encode('utf-8')
TypeError: Can't convert 'bytes' object to str implicitly

As the error states, Python 3 will not implicitly (automatically) convert a bytes object to a string. The + operator sees a string on the left, so it wants a string on the right as well, and you need to do the conversion explicitly.
line = 'line'
print('new '+str(line.encode('utf-8')))
Note that this gives slightly different output: new b'line', because str() applied to a bytes object returns its repr.
If you want the exact same output then this works:
line = 'line'
print('new '.encode('utf-8')+line.encode('utf-8'))
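A minimal sketch of the two consistent options: keep both operands as str, or keep both as bytes. The helper names join_as_text and join_as_bytes are made up for illustration and do not come from the question.

```python
def join_as_text(prefix: str, raw: bytes, encoding: str = "utf-8") -> str:
    # Decode the bytes first so that both operands of + are str.
    return prefix + raw.decode(encoding)


def join_as_bytes(prefix: str, raw: bytes, encoding: str = "utf-8") -> bytes:
    # Encode the text first so that both operands of + are bytes.
    return prefix.encode(encoding) + raw


line = "line"
encoded = line.encode("utf-8")

print(join_as_text("new ", encoded))   # a str result
print(join_as_bytes("new ", encoded))  # a bytes result

# Mixing the two types is what raises the TypeError from the question.
try:
    "new " + encoded
except TypeError as exc:
    print("TypeError:", exc)
```

Decoding is usually the better fit when the result is meant to be shown as text, while encoding both sides suits data that is about to be written to a file or socket.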
From the docs
"The + (addition) operator yields the sum of its arguments. The arguments must either both be numbers or both be sequences of the same type. In the former case, the numbers are converted to a common type and then added together. In the latter case, the sequences are concatenated." and "Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side."

Related

basic string concatenation syntax confirmation

I'm new to BASIC.
Does the following line mean the string ":PULS:WIDT1 " concatenates with a string variable named Te?
":PULS:WIDT1 "&VAL$(Te);
Does the following line mean the string ":PULS:WIDT1 " concatenates with a string variable named Te?
The line is incomplete. Even if this were an accepted syntax and concatenation somehow took place, where would the result go?
The ampersand character (&) is indeed a string concatenation operator in BASIC, but so is the plus character (+). Not every BASIC allows both, but most will allow at least one.
Inserted in a syntactically correct statement, your expression
":PULS:WIDT1 "&VAL$(Te)
would concatenate the ":PULS:WIDT1 " string literal with the contents of an element of the string array VAL$() indexed by the numerical Te variable.
Although VAL happens to be the name of a built-in function, many BASICs don't mind that you name one or more user variables the same as a keyword.

pass regex group to function for substituting [duplicate]

I have a string S = '02143' and a list A = ['a','b','c','d','e']. I want to replace all those digits in 'S' with their corresponding element in list A.
For example, replace 0 with A[0], 2 with A[2] and so on. Final output should be S = 'acbed'.
I tried:
S = re.sub(r'([0-9])', A[int(r'\g<1>')], S)
However this gives an error ValueError: invalid literal for int() with base 10: '\\g<1>'. I guess it is considering backreference '\g<1>' as a string. How can I solve this especially using re.sub and capture-groups, else alternatively?
The reason re.sub(r'([0-9])',A[int(r'\g<1>')],S) does not work is that \g<1> (an unambiguous way of writing the first backreference, otherwise written as \1) only works when used inside the replacement string pattern. If you pass it to another function, that function will "see" just the literal string \g<1>, since the re module has no chance to evaluate it at that point. The re engine only expands it during a match, but the A[int(r'\g<1>')] part is evaluated before the re engine attempts to find a match.
That is why it is made possible to use callback methods inside re.sub as the replacement argument: you may pass the matched group values to any external methods for advanced manipulation.
See the re documentation:
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.
Use
import re
S = '02143'
A = ['a','b','c','d','e']
print(re.sub(r'[0-9]',lambda x: A[int(x.group())],S))
Note you do not need to capture the whole pattern with parentheses, you can access the whole match with x.group().
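The same callback idea can be sketched with a named function instead of a lambda, which leaves room for a bounds check. The names digit_to_letter and translate are hypothetical, and leaving out-of-range digits unchanged is an assumption, not something the question asks for.

```python
import re

A = ["a", "b", "c", "d", "e"]


def digit_to_letter(match):
    # match.group() is the whole matched text, here a single digit.
    index = int(match.group())
    # Assumption: digits with no corresponding element are left untouched.
    return A[index] if index < len(A) else match.group()


def translate(s):
    # re.sub calls digit_to_letter once per non-overlapping match.
    return re.sub(r"[0-9]", digit_to_letter, s)


print(translate("02143"))
```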

how to handle list that contains emoji in Python3

I've been writing a function that takes a list containing only emoji, converts each one to UTF-8 Unicode, and returns the resulting list. My current code seems to pass multiple arguments and returns an error. I'm new to handling emoji. Could you give me some tips?
main.py
def encode_emoji(emoji_list):
    result = []
    for i in range(len(emoji_list)):
        emoji = str(emoji_list[i])
        d_ord = format(ord(":{}:","#08x").format(emoji))
        result.append(str(d_ord))
        break
    return result
encode_emoji(["😀","😃","😄"])
Result of above code
Traceback (most recent call last):
File "main.py", line 11, in <module>
encode_emoji(["😀","😃","😄"])
File "main.py", line 5, in encode_emoji
d_ord = format(ord(":{}:","#08x").format(emoji))
TypeError: ord() takes exactly one argument (2 given)
I have no idea of how you intend to get the utf-8 encoding of an emoji with this line:
d_ord = format(ord(":{}:","#08x").format(emoji))
As the error message says, ord takes a single argument, a one-character string, and returns an integer. Even if the code above were rearranged so that the value returned by ord(emoji) was correctly formatted with the "#08x" format spec, the result would just be a hexadecimal representation of the emoji's code point, not the UTF-8 byte sequence for the emoji.
To encode some text into utf-8, just call the encode method of the string itself.
Also, in Python, one will almost never use the for... in range(len(...)) pattern, since for is designed to iterate directly over any sequence or iterable.
Your code also has a misplaced break statement that would stop processing after the first character.
Without using the list-comprehension syntax, a function to encode emoji as utf-8 byte strings is just:
def encode_emoji(emoji_list):
    result = []
    for part in emoji_list:
        result.append(part.encode("utf-8"))
    return result
Once you get more acquainted with the language and understand comprehensions, it is just:
def encode_emoji(emoji_list):
    return [part.encode("utf-8") for part in emoji_list]
Now, given the #08x pattern in your code, it may be that you have misunderstood what UTF-8 means and are simply trying to write the emoji as valid HTML character references, which will later be embedded in text that is encoded as UTF-8.
In that case, you do indeed have to call ord(emoji) to get its code point, then represent the resulting number as hexadecimal and replace the leading 0x that Python's hex call yields with &#x:
def encode_emoji(emoji_list):
    return [hex(ord(emoji)).replace("0x", "&#x") + ";" for emoji in emoji_list]
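To make the difference between the representations concrete, here is a small sketch that produces the UTF-8 bytes, the code point, and an HTML hexadecimal character reference side by side. The function names are made up for illustration, and the ord-based helpers assume each list item is a single code point; emoji built from several code points (for example with skin-tone modifiers) would make ord raise a TypeError.

```python
def to_utf8_bytes(emoji_list):
    # One bytes object per emoji, holding its UTF-8 encoding.
    return [emoji.encode("utf-8") for emoji in emoji_list]


def to_code_points(emoji_list):
    # "U+1F600"-style labels; assumes each item is a single code point.
    return ["U+{:04X}".format(ord(emoji)) for emoji in emoji_list]


def to_html_refs(emoji_list):
    # Hexadecimal numeric character references such as "&#x1f600;".
    return ["&#x{:x};".format(ord(emoji)) for emoji in emoji_list]


faces = ["😀", "😃", "😄"]
print(to_utf8_bytes(faces))
print(to_code_points(faces))
print(to_html_refs(faces))
```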
TypeError: ord() takes exactly one argument (2 given)
I think the error is self-explanatory. That function takes one argument, but you are passing it two:
":{}:"
"#08x"
Here are some docs to read in case you need them.

Confusion with valueError vs ValueType try/exception

I have a follow-up question to a post I saw on converting a str() input to an int() type. Based on the definitions of valueError and valueType, I would expect the valueType exception to have been used; however, it doesn't work (when I tried it). ValueError works, but I'm not sure why. Isn't int('some string') an example of a wrong type?
Link to the original post I'm referring to: Converting String to Int using try/except in Python
From the docs:
class int(x, base=10) Return an integer object constructed from a
number or string x, or return 0 if no arguments are given. If x is a
number, return x.__int__(). For floating point numbers, this truncates
towards zero.
If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in radix
base. Optionally, the literal can be preceded by + or - (with no space
in between) and surrounded by whitespace. A base-n literal consists of
the digits 0 to n-1, with a to z (or A to Z) having values 10 to 35.
The default base is 10. The allowed values are 0 and 2–36. Base-2, -8,
and -16 literals can be optionally prefixed with 0b/0B, 0o/0O, or
0x/0X, as with integer literals in code. Base 0 means to interpret
exactly as a code literal, so that the actual base is 2, 8, 10, or 16,
and so that int('010', 0) is not legal, while int('010') is, as well
as int('010', 8).
When you call the int() function on a string, it tries to parse the string as an integer literal in the base specified in the arguments (base 10 by default) and build an int object from it. If the string is not a valid literal in that base, it raises a ValueError, which terminates the program if it is not handled. The argument has an acceptable type (str); it is the value that is inappropriate, which is why the exception is ValueError rather than a type-related one such as TypeError. If you want to carry on regardless, you can put a try/except block in the code to catch the exception.
From the Docs:
exception ValueError Raised when a built-in operation or function
receives an argument that has the right type but an inappropriate
value, and the situation is not described by a more precise exception
such as IndexError.
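A short sketch of the distinction, using a hypothetical helper named parse_int: a string that is not an integer literal has the right type but the wrong value, while an argument such as None has the wrong type altogether.

```python
def parse_int(text, default=None):
    try:
        return int(text)
    except ValueError:
        # Right type (str), but the value is not an integer literal.
        return default


print(parse_int("42"))
print(parse_int("some string"))

# An argument of an unsupported type raises TypeError instead.
try:
    int(None)
except TypeError as exc:
    print("TypeError:", exc)
```

Catching only ValueError here is deliberate: a TypeError usually points to a programming mistake in the caller, so it is allowed to propagate.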

How do I concatenate a string stored in variable and a number in MATLAB

I am trying to read a tag from XML and then want to concatenate a number to it.
Firstly, I am saving the value of the string to a variable and trying to concatenate it
with the variable in the for loop. But it throws an error.
for i = 0:tag.getLength-1
node = tag.item(i);
disp([node.getTextContent]);
str=node.getTextContent;
str= strcat(str, num2str(i))
new_loads = cat(2,loads,[node.getTextContent]);
end
Error thrown is
Operands to the || and && operators must be
convertible to logical scalar values.
Error in strcat (line 83)
if ~isempty(str) && (str(end) == 0 ||
isspace(str(end)))
Error in SMERCGUI>pushbutton1_Callback (line 182)
str= strcat(str,' morning')
Error in gui_mainfcn (line 96)
feval(varargin{:});
Error in SMERCGUI (line 44)
gui_mainfcn(gui_State, varargin{:});
Error in
@(hObject,eventdata)SMERCGUI('pushbutton1_Callback',hObject,eventdata,guidata(hObject))
Error while evaluating uicontrol Callback
The error suggests that your string is not a string. It's not clear to me whether it's throwing an error at the strcat line, or at the later cat line.
At any rate, you generally cannot concatenate elements of incompatible types into a regular array; a cell array can hold them, a regular array cannot. So the line
new_loads = cat(2,loads,[node.getTextContent]);
is a likely source of trouble. Note that the 2 here is the dimension argument of cat, so the values actually being joined are loads and node.getTextContent, which is a string, or maybe a cell array or something else. I can't see what loads is, so I can't tell if that is involved in the problem.
Usually a good way to combine numbers and strings into a single string is
newString = sprintf('%s %d', oldString, number);
You can then use all the formatting tricks of printf to produce output exactly as you want. But before you do anything, make sure you understand the type of all the elements you are trying to string together. The easiest way to do this for all the elements in memory is
whos
Or if you just want it for one variable,
whos str
Or all variables starting with s:
whos s*
The output is self-explanatory. If you still can't figure it out after this, leave a comment and I'll try to help you out.
EDIT: Based on what I read at http://blogs.mathworks.com/community/2010/11/01/xml-and-matlab-navigating-a-tree/ , it is possible that you just need to cast your str variable to a MATLAB string (apparently it's a java.lang.String). So try adding
str = char(str);
before using str. It may be what you need.

Resources