Converting double backslash to single backslash in Python 3 - python-3.x

I have a string like so:
>>> t
'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
That I made using a function that converts unicode to the representative Python escape sequences. Then, when I want to convert it back, I can't get rid of the double backslash so that it is interpreted as unicode again. How can this be done?
>>> t = unicode_encode("
>>> t
'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
>>> print(t)
\u0048\u0065\u006c\u006c\u006f\u0020\u20ac\u0020\u00b0
>>> t.replace('\\','X')
'Xu0048Xu0065Xu006cXu006cXu006fXu0020Xu20acXu0020Xu00b0'
>>> t.replace('\\', '\\')
'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
Of course, I can't do this, either:
>>> t.replace('\\', '\')
File "<ipython-input-155-b46c447d6c3d>", line 1
t.replace('\\', '\')
^
SyntaxError: EOL while scanning string literal

Not sure if this is appropriate for your situation, but you could try using unicode_escape:
>>> t
'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
>>> type(t)
<class 'str'>
>>> enc_t = t.encode('utf_8')
>>> enc_t
b'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
>>> type(enc_t)
<class 'bytes'>
>>> dec_t = enc_t.decode('unicode_escape')
>>> type(dec_t)
<class 'str'>
>>> dec_t
'Hello € °'
Or in abbreviated form:
>>> t.encode('utf_8').decode('unicode_escape')
'Hello € °'
You take your string and encode it using UTF-8, and then decode it using unicode_escape.
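Here is a rough sketch of the same round trip wrapped in a helper. The name decode_escapes is made up for illustration, and encoding with ASCII plus the backslashreplace error handler (instead of UTF-8) is my own assumption, not part of the answer above. The reason for the swap: unicode_escape reads its input bytes as Latin-1, so the UTF-8 variant can garble a string that already contains non-ASCII characters next to the escape sequences.

```python
def decode_escapes(text):
    """Turn literal backslash-u sequences into the characters they name.

    Hypothetical helper for illustration only.
    """
    # 'backslashreplace' rewrites any non-ASCII character as an escape
    # sequence, so the bytes handed to unicode_escape are pure ASCII.
    return text.encode('ascii', 'backslashreplace').decode('unicode_escape')


t = '\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
print(decode_escapes(t))

# A string mixing a real non-ASCII character with an escaped one:
mixed = 'caf\u00e9 \\u20ac'
garbled = mixed.encode('utf_8').decode('unicode_escape')  # UTF-8 bytes read as Latin-1
clean = decode_escapes(mixed)
print(garbled)
print(clean)
```

Keep in mind that either variant also interprets other escapes such as \n or \t if they occur in the text, which may or may not be what you want.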

Since a backslash is an escape character and you are searching for two backslashes, you need to write four backslashes in the search literal and two in the replacement, i.e.:
t.replace("\\\\", "\\")
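This replaces every double backslash (r"\\" as a raw string) with a single backslash. The r prefix marks a raw string, in which a backslash is not treated as an escape character. So, for example, if you type print(r"\\") into IDLE or any Python script (or print r"\\" in Python 2), you will get \\, whereas print("\\") gives a single \. This means that every "\\" in an ordinary literal is really just one backslash. (Note that r"\" on its own is not a valid literal, because a raw string cannot end in a single backslash.)
user1632861 suggested that you use .replace("\\", ""), but this replaces every single backslash with nothing. Try the above method instead. :D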
In this case, however, it appears as though you are reading/receiving data, and you probably want to use the correct encoding and then decode to unicode (as the person above me suggested).
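To make the escaping rules above concrete, here is a small sketch. The sample strings are invented for illustration. It compares an ordinary literal with a raw one, then shows that the four-backslash replace only changes anything when the data really contains doubled backslashes.

```python
single = '\\'      # ordinary literal: one backslash character
double = r'\\'     # raw literal: two backslash characters
print(len(single), len(double))

s = r'a\\b\\c'                    # data that really contains doubled backslashes
fixed = s.replace('\\\\', '\\')   # four typed backslashes match two real ones
print(fixed)

t = '\\u0048'                     # like the question's data: single backslashes only
unchanged = t.replace('\\\\', '\\') == t
print(unchanged)
```

The last line prints True: with only single backslashes in the string, there is nothing for the replace to match.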

There is only one backslash in your string; a backslash is merely displayed as \\ in the repr. As you can see, when you use print(), there's only one backslash. So if you want to get rid of one of the two backslashes, you don't need to do anything, because it isn't there. If you want to get rid of the backslashes altogether, remove that single one, again writing \\ to represent one backslash: t.replace("\\", "")
So your string never had two backslashes in the first place, and that shouldn't be the problem.

Related

Why the print('\n', 'abc') in Python 3 is giving new line and an empty space?

I was just expecting a new line character and written below code.
>>> print('\n', 'abc')

 abc
>>>
But it gave single space also in front of the string abc, may I know why it added that space?
I am using Python 3.9 in Windows 10
If given more than one argument, print joins them with spaces, so you are effectively doing print("\n abc").
The default separator (sep) argument to the print function is a single space. Change it to an empty string to remove that leading space:
print('\n', 'abc', sep='')
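If you want to see exactly which characters print writes, one option is to capture the output in a buffer and look at its repr. This is only a sketch; the helper name printed is made up for illustration.

```python
import io


def printed(*args, **kwargs):
    """Return the text that print() would write for the given arguments."""
    buf = io.StringIO()
    print(*args, file=buf, **kwargs)
    return buf.getvalue()


print(repr(printed('\n', 'abc')))          # default sep=' ' sits between the arguments
print(repr(printed('\n', 'abc', sep='')))  # no separator
print(repr(printed('\nabc')))              # a single argument needs no separator
```

The first call yields '\n abc\n', the other two yield '\nabc\n'. The trailing '\n' comes from print's default end argument.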

How to replace single forward slashes with single backward slashes

Consider a python variable containing an arbitrary string with some forward slashes. I would like
to replace every forward slash in the string with a backward slash. These forward slashes appear
in the input string not in the context of a path separator.
I cannot find a way to do this replacement using python's string 'replace' method.
Using a single backslash as the second argument produces syntax error as the single backslash
escapes the terminating quote
>>> s = 'a26/n//3#5'
>>> s
'a26/n//3#5'
>>> s.replace('/', '\')
File "<stdin>", line 1
s.replace('/', '\')
^
SyntaxError: EOL while scanning string literal
Using two single backslashes in the replacement string produces two backslashes in output string
>>> s.replace('/', '\\')
'a26\\n\\\\3#5'
The replaced string should contain
a26\n\\3#5
The output you're seeing is the repr representation of the string.
>>> s = 'a26/n//3#5'
>>> s
'a26/n//3#5'
>>> s.replace('/', '\\')
'a26\\n\\\\3#5'
(This is the repr of the result: each '\' is displayed as '\\'.)
To get your expected output you should print the string:
>>> new_s = s.replace('/', '\\')
>>> print(new_s)
a26\n\\3#5
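A quick way to convince yourself that the replacement inserted single backslashes is to compare lengths and count the backslashes. The sketch below also uses chr(92), which is simply another way to write the backslash character without any escaping in the source.

```python
s = 'a26/n//3#5'
new_s = s.replace('/', '\\')    # '\\' in source code is one backslash

print(len(s), len(new_s))       # same length: one character in, one character out
print(new_s.count(chr(92)))     # chr(92) is the backslash character
print(new_s)

same = s.replace('/', chr(92)) == new_s
print(same)
```

Both lengths are 10 and the count is 3, matching the three forward slashes in the input.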

Convert string with "\x" character to float

I'm converting strings to floats using float(x). However for some reason, one of the strings is "71.2\x0060". I've tried following this answer, but it does not remove the bytes character
>>> s = "71.2\x0060"
>>> "".join([x for x in s if ord(x) < 127])
'71.2\x0060'
Other methods I've tried are:
>>> s.split("\\x")
['71.2\x0060']
>>> s.split("\x")
ValueError: invalid \x escape
I'm not sure why this string is not formatted correctly, but I'd like to get as much precision from this string and move on.
Going off of wim's comment, the answer might be this:
>>> s.split("\x00")
['71.2', '60']
So I should do:
>>> float(s.split("\x00")[0])
71.2
Unfortunately the POSIX group \p{XDigit} does not exist in the re module. To remove the hex control characters with regular expressions anyway, you can try the following.
import re
re.sub(r'[\x00-\x1F]', r'', '71.2\x0060') # or:
re.sub(r'\\x[0-9a-fA-F]{2}', r'', r'71.2\x0060')
Output:
'71.260'
'71.260'
r means raw. Take a look at the control characters up to hex 1F in the ASCII table: https://www.torsten-horn.de/techdocs/ascii.htm
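For what it's worth, here is a short sketch of what the string actually contains. "\x00" is a single NUL character (code point 0), which is why the ord(x) < 127 filter in the question keeps it: 0 is less than 127. The helper name parse_before_nul is made up, and whether you want 71.2 (drop everything after the NUL) or 71.260 (drop only the NUL) is an assumption you have to settle for your own data.

```python
s = "71.2\x0060"

print(len(s))                                       # 7 characters, not 10
print([hex(ord(c)) for c in s if ord(c) < 0x20])    # the control characters present


def parse_before_nul(text):
    """Hypothetical helper: parse the part before the first NUL as a float."""
    return float(text.split('\x00', 1)[0])


value = parse_before_nul(s)
print(value)
```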

Why do two backslashes appear in Python if I enter a backslash followed by a special character

I am new to python. So just out of curiosity, I wrote the below string in IDLE:
'Happy Birth\^day' and I got the output as 'Happy Birth\\^day'
From where does python add an extra backslash?
The extra backslash is not actually part of the string; it's just shown by the repr() function to indicate that it's a literal backslash. The Python interpreter uses the repr() function (which calls __repr__() on the object) when the result of an expression needs to be displayed:
>>> '\\'
'\\'
>>> print('\\')
\
>>> print('\\'.__repr__())
'\\'
So try -
print('Happy Birth\^day')
#Happy Birth\^day
And it will only print a single backslash.
That's just because of how the string is represented and the escaping role of \. The string doesn't actually contain a double backslash. repr() doubles it so that the displayed text is a valid string literal that recreates the same string.
Let's use the len function to check whether the second backslash exists:
>>> happy_birthday = 'Happy Birth\^day'
>>> len(happy_birthday)
16
>>> happy_birthday
'Happy Birth\\^day'
>>> print(happy_birthday)
Happy Birth\^day
As you can see, len counts only one backslash, even though the repr shows 17 characters between the quotes.
So Python is not actually adding it. It is only the representation that shows it like this, and the string will be correct when you use it.
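As a small sketch of the point about repr: the displayed form is itself a valid string literal, so evaluating it gives back the original string. I wrote the literal with '\\' here because recent Python versions emit a warning for unrecognized escapes such as '\^'; both spellings produce the same string.

```python
import ast

s = 'Happy Birth\\^day'      # one backslash in the actual string

print(len(s))                # characters in the string
print(len(repr(s)))          # displayed characters, including the two quotes

round_trip = ast.literal_eval(repr(s))   # parse the displayed literal back
print(round_trip == s)
```

This prints 16, then 19 (17 displayed characters plus the quotes), then True.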

Replace two backslashes with a single backslash

I want to replace a string with two backslashes with single backslashes. However replace doesn't seem to accept '\\' as the replacement string. Here's the interpreter output:
>>> import tempfile
>>> temp_folder = tempfile.gettempdir()
>>> temp_folder
'C:\\Users\\User\\AppData\\Local\\Temp'
>>> temp_folder.replace('\\\\', '\\')
'C:\\Users\\User\\AppData\\Local\\Temp'
BTW, I know that Windows paths need to contain either double backslashes or single forward slashes. I want to replace them anyway for display purposes.
Your output doesn't have double backslashes. What you are looking at is the repr() value of the string and that displays with escaped backslashes. Assuming your temp_folder would have double backslashes, you should instead use:
print(temp_folder.replace('\\\\', '\\'))
and that will show you:
C:\Users\User\AppData\Local\Temp
which also drops the quotes.
But your temp_folder is unlikely to have double backslashes, and this difference in display probably got you thinking that there are double backslashes in the return value from tempfile.gettempdir(). As @Jean-Francois indicated, there should not be (at least not on Windows). So you don't need to use .replace() at all, just print:
print(temp_folder)
This works for me
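text = input('insert text')
parts = text.split('\\')  # split on every single backslash
print(parts)
text2 = ''
for item in parts:
    if item != '':  # empty pieces come from consecutive backslashes
        text2 += item + '\\'  # note: this also leaves a trailing backslash
print(text2)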
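If the string really does contain runs of backslashes, a regular expression is another way to squeeze each run down to one without appending a trailing backslash. This is only a sketch; the helper name collapse_backslashes and the sample path are made up for illustration.

```python
import re


def collapse_backslashes(text):
    """Hypothetical helper: squeeze each run of backslashes down to one."""
    # A function is used as the replacement so that re.sub does not apply
    # its own escape processing to the backslash.
    return re.sub(r'\\+', lambda match: '\\', text)


path = r'C:\\Users\\\User\Temp'    # raw string: runs of two, three and one backslash
collapsed = collapse_backslashes(path)
print(collapsed)
```

This prints C:\Users\User\Temp.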
