I'm trying to delete 2 chars from the start of the str and 1 char from the end.
import urllib3
target_url="www.klimi.hys.cz/nalada.txt"
http = urllib3.PoolManager()
r = http.request('GET', target_url)
print(r.status)
print(r.data)
print()
And the output is
200
b'smutne'
I need the output to be only "smutne", without the " b' " and the " ' ".
When you have bytes, you'll need to decode them into a string with the appropriate encoding. For example, if you have ASCII characters as bytes, you can do:
>>> foo = b'mystring'
>>> print(foo)
b'mystring'
>>> print(foo.decode('ascii'))
mystring
Or, more commonly, you probably have UTF-8 encoded text (UTF-8 is a superset of ASCII, so this handles plain ASCII bytes too):
>>> print(foo.decode('utf-8'))
mystring
This will work if you have glyphs with accents and such.
More on Python encoding/decoding here: https://docs.python.org/3/howto/unicode.html
In the particular case of urllib3, r.data returns bytes that you'll need to decode in order to use as a string if that's what you want.
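A minimal sketch of that decoding step, using the byte string from the question in place of a live response so that it runs without a network connection. The helper name body_to_text and the UTF-8 default are assumptions made for illustration; they are not part of urllib3.

```python
def body_to_text(raw, encoding="utf-8"):
    """Decode a bytes payload (such as r.data) into a str."""
    return raw.decode(encoding)

raw = b"smutne"           # stands in for r.data from the question
text = body_to_text(raw)
print(text)               # prints the text itself, with no b'...' wrapper
```

Slicing characters off the printed form is not needed: the b'...' wrapper is only how Python displays a bytes object, not part of the data.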
I'm trying to perform a string replace on text that contains a backslash followed by a character. Can anyone guide me on how to handle a backslash followed by a character?
string= 'decimal("\t",0) Amount = 0;'
print(string.replace('\t','\0')) #tried: print(string.replace('"\\t"','"\\0"'))
expected output
decimal("\0",0) Amount = 0;
current output
decimal(" ",0) Amount = 0;
Given:
>>> string= 'decimal("\t",0) Amount = 0;'
You can write a literal backslash as '\\' and do:
>>> print(string.replace('\t','\\0'))
decimal("\0",0) Amount = 0;
Be careful not to confuse what is printed with the interpreter's representation of the string, which shows the double backslash form:
>>> string.replace('\t','\\0')
'decimal("\\0",0) Amount = 0;'
Also understand that '\t' is a single character; a tab. The character '\0' is also a single character; a NUL:
>>> len('\t')
1
>>> len('\0')
1
What you need is the two character string '\\0' which in turn will be printed as '\0' even though that is not a NUL:
>>> len('\\0')
2
>>> '\\0'=='\0'
False
As an alternative to '\\0' you can also use a raw string:
>>> '\\0'==r'\0'
True
Which you use is a matter of preference (I use the r form exclusively for regex for example and personally prefer \\ for applications like this) but it is a form you should know.
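Putting the pieces above together, here is a short sketch showing that both spellings produce the same two-character replacement. The variable names are illustrative only.

```python
source = 'decimal("\t",0) Amount = 0;'    # contains a real tab character

escaped = source.replace('\t', '\\0')     # backslash written as '\\'
raw_form = source.replace('\t', r'\0')    # same replacement as a raw string

print(escaped)
print(escaped == raw_form)
```

The result is one character longer than the input, because a single tab has been replaced by a backslash followed by a zero.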
The following works:
import re
text = "I\u2019m happy"
text_p = text
text_p = re.sub("[\u2019]","'",text_p)
print(text_p)
Output: I'm happy
This doesn't work:
training_data = pd.read_csv('train.txt')
import re
text = training_data['tweet_text'][0] # Assume that this returns a string "I\u2019m happy"
text_p = text
text_p = re.sub("[\u2019]","'",text_p)
print(text_p)
Output: I\u2019m happy
I tried running your code and got I'm happy returned from both the string and the list item when passing each into re.sub(...) as outlined in your question.
If you're just looking to parse (decode) the unicode characters you probably don't need to be using re. Something like the below could be used to parse the unicode characters without having to run re to check each possibility.
text = training_data['tweet_text'][0]
if type(text) == str:  # if value is str then encode to utf-8 byte string then decode back to str
    text = text.encode()
    text = text.decode()
elif type(text) == bytes:  # elif value is bytes just decode to str
    text = text.decode()
else:  # else printout to console if value is neither str or bytes
    print("Value not recognised as str or bytes!")
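One possible explanation, offered as an assumption because the contents of train.txt are not shown: the file may hold the six literal characters \u2019 rather than the single character U+2019, in which case the regex never matches. Under that assumption, and assuming the text is otherwise plain ASCII, the escape sequence can be decoded first:

```python
# Assumption: the CSV cell contains a literal backslash, 'u', and four hex digits.
text = "I\\u2019m happy"

# unicode_escape interprets the backslash sequences. The ASCII round trip is
# only safe when the text contains no other non-ASCII characters.
decoded = text.encode("ascii").decode("unicode_escape")

# The right single quotation mark is now a real character and can be replaced.
cleaned = decoded.replace("\u2019", "'")
print(cleaned)
```

If the column already contains the real U+2019 character, this step is unnecessary and the original re.sub call should work as written.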
I am using Python 3.5.2
I know that with print("\x00") (where 00 is the two-digit hex code of a character) I can print symbols by their hex code. But how can I print the number 500,000 (in hex: 7A120) when "\x00" takes only 2 hex digits?
To print a constant hexadecimal expression, you can prefix the number with 0x, and it will resolve to an int with the equivalent base 10 value, like so:
>>> print(0x7A120)
500000
If you want to print the value of a string that holds arbitrary hexadecimal digits, parse it with int:
>>> a = "7A120"
>>> print(int(a, 16))
500000
The second argument to int is the base to parse the string from, in this case base 16 (hex).
To print an integer in hexadecimal format, use the format operator, %:
>>> a = 0x7A120
>>> print("%x" % a)
7a120
You can change the "x" in "%x" to uppercase to print a through f in uppercase:
>>> b = 0xABCDEF
>>> print("%x" % b)
abcdef
>>> print("%X" % b)
ABCDEF
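The same conversions can also be written with format(), f-strings, or hex(). This is a small sketch of equivalent spellings, not a replacement for the % form above; the variable names are illustrative.

```python
n = 500000

lower = format(n, "x")     # hex digits without a prefix
upper = f"{n:X}"           # same digits, uppercase
padded = f"{n:08x}"        # zero-padded to 8 digits
prefixed = hex(n)          # includes the 0x prefix

print(lower, upper, padded, prefixed)
```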
I want to take a string such as:
'\\xeb\\x4d'
and turn it into:
b'\xeb\x4d'
If I do:
bytes('\\xeb\\x4d', 'utf-8')
I get:
b'\\xeb\\x4d'
I need something that does the following:
something('\\xeb\\x4d') == b'\xeb\x4d'
>>> a = '\\xeb\\x4d' # a Unicode string
>>> a.encode('latin1') # get a byte string
b'\\xeb\\x4d'
>>> a.encode('latin1').decode('unicode_escape') # unescape, get a Unicode string
'ëM'
>>> a.encode('latin1').decode('unicode_escape').encode('latin1') # get a byte string
b'\xebM'
>>> a.encode('latin1').decode('unicode_escape').encode('latin1') == b'\xeb\x4d'
True
Note that latin1 maps the first 256 code points of Unicode one-to-one onto bytes, so encoding any of those code points with latin1 gives a byte whose value equals the original code point.
a = '\\xeb\\x4d'
a = bytes(a, 'utf-8')
a = a.decode('unicode_escape').encode('latin1')
gives
b'\xebM'
because
'\x4d' == 'M'
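Wrapped up as the something() the question asks for, the approach could look like the sketch below. The function name unescape_to_bytes is hypothetical, and the final latin1 step assumes every escape stays in the \x00 to \xff range; a larger escape such as \u2019 would raise UnicodeEncodeError there.

```python
def unescape_to_bytes(escaped):
    """Turn text containing literal backslash escapes into real bytes."""
    # str -> bytes (1:1 via latin1), interpret the escapes, then back to bytes.
    return escaped.encode("latin1").decode("unicode_escape").encode("latin1")

result = unescape_to_bytes('\\xeb\\x4d')
print(result)
```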
import binascii
import struct

file_1 = (r'res\test.png')
with open(file_1, 'rb') as file_1_:
    file_1_read = file_1_.read()
file_1_hex = binascii.hexlify(file_1_read)
print ('Hexlifying test.png..')
pack = ("test.packet")
file_1_size_bytes = len(file_1_read)
print (("test.png is"),(file_1_size_bytes),("bytes."))
struct.pack( 'i', file_1_size_bytes)
file_1_size_bytes_hex = binascii.hexlify(struct.pack( '>i', file_1_size_bytes))
print (("Hexlifyed length - ("),(file_1_size_bytes_hex),(")."))
with open(pack, 'ab') as header_1_:
    header_1_.write(binascii.unhexlify(file_1_size_bytes_hex))
print (("("),(binascii.unhexlify(file_1_size_bytes_hex)),(")"))
with open(pack, 'ab') as header_head_1:
    header_head_1.write(binascii.unhexlify("0000020000000D007200650073002F00000074006500730074002E0070006E006700000000"))
print ("Header part 1 added.")
So this writes "0000020000000D007200650073002F00000074006500730074002E0070006E006700000000(00)" to the pack, unhexlified.
There's an extra "00" byte at the end. This is messing up everything I'm trying to do, because the packet's length is referred back to when loading it, and I have about 13 extra "00" bytes at the end of each string I write to the file. So in turn my file is 13 bytes longer than it should be. Not to mention that the header's byte length isn't being read properly, because the padding is off by 1 byte.
You seem to be saying that binascii.unhexlify does not really condense the input string. I have trouble believing that. Here is a minimal complete runnable example and the output I get with 3.4.2 on Win 7.
import binascii
import io
b = binascii.unhexlify(
"000000030000000100000000000000040041004E0049004D00000000000000")
print(b) # bytes
bf = io.BytesIO()
bf.write(b)
print(bf.getvalue())
>>>
b'\x00\x00\x00\x03\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x04\x00A\x00N\x00I\x00M\x00\x00\x00\x00\x00\x00\x00'
b'\x00\x00\x00\x03\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x04\x00A\x00N\x00I\x00M\x00\x00\x00\x00\x00\x00\x00'
Unhexlify has converted each pair of hex characters to the byte expected.
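A quick way to check this on any input is to compare lengths: the result should always be exactly half as long as the hex string. The sketch below uses a short made-up hex string rather than the header from the question, and also shows that a length field packed with struct is already bytes, so it does not need a hexlify/unhexlify round trip.

```python
import binascii
import struct

hex_text = "00410042"                  # illustrative input, not from the question
data = binascii.unhexlify(hex_text)

# Each pair of hex digits becomes exactly one byte.
print(len(hex_text), len(data))

# bytes.fromhex is an equivalent built-in conversion.
same = bytes.fromhex(hex_text)

# A big-endian 4-byte length can be written directly as packed bytes.
length_field = struct.pack(">i", 500)
round_trip = binascii.unhexlify(binascii.hexlify(length_field))
```

If the file still ends up longer than expected, the extra bytes are more likely coming from the hex string itself or from appending to a leftover file (mode 'ab') than from unhexlify.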