Python2.7: How to get correct Chinese character?


I am having a problem reading a Chinese string from an XML file.
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> from io import open
>>> file = open(u'/senti/cet_2.xml', encoding = u'utf-8')
>>> contents = file.read()
>>> contents
u'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<document>\n
<Topic>\u5584\u826f \u4e30\u5bcc \u9ad8\u8d35</Topic>\n <title T="\u5584\u826f\uff0c \u4e30\u5bcc\uff0c\u9ad8\u8d35">\n
However, the same code works as expected in Python 3.3:
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32
>>> file = open('/Senti/cet_2.xml', encoding = 'utf-8')
>>> contents = file.read()
>>> contents
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<document>\n
<Topic>善良 丰富 高贵</Topic>\n <title T="善良，丰富，高贵">\n
How can I get the correct string in Python 2.7?
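A note on the transcripts above, offered as a likely explanation rather than a confirmed diagnosis: the Python 2.7 result is already a correct unicode string. Typing a name at the interactive prompt echoes its repr(), and in Python 2.7 the repr of a unicode string escapes every non-ASCII character, while Python 3 shows the characters directly. A minimal sketch, using a shortened copy of the string from the question (it runs on Python 3; on 2.7 the print call assumes a console that can display Chinese):

```python
# -*- coding: utf-8 -*-
# Shortened copy of the value shown in the question (the u'' prefix is
# accepted by Python 2.7 and by Python 3.3+).
contents = u'<Topic>\u5584\u826f \u4e30\u5bcc \u9ad8\u8d35</Topic>'

# Echoing a name at the prompt shows repr(contents); Python 2.7 escapes
# non-ASCII characters there. print() writes the characters themselves.
print(contents)

# The escaped spelling and the literal characters are the same string.
same_string = (contents == u'<Topic>善良 丰富 高贵</Topic>')
print(same_string)
```

So in Python 2.7, print contents rather than echoing it at the prompt if the goal is to see the characters.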

Related

Why parse_qsl does not work for empty value?

I don't quite understand why no results are returned when the value is empty. Is there a way to get the key value pair when the value is empty? Thanks.
>>> urllib.parse.parse_qsl('a=b')
[('a', 'b')]
>>> urllib.parse.parse_qsl('a=')
[]

You can use the keep_blank_values parameter, which defaults to False. Which version of Python are you using? This is what I get with keep_blank_values=True on Python 3.8.2:
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import parse_qsl
>>> parse_qsl('a=b')
[('a', 'b')]
>>> parse_qsl('a=')
[]
>>> parse_qsl('a=', keep_blank_values=True)
[('a', '')]
>>>
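The same parameter is accepted by urllib.parse.parse_qs, which returns a dict of lists instead of a list of pairs. A short sketch with a made-up query string (the keys a, b and c are only for illustration):

```python
from urllib.parse import parse_qs, parse_qsl

query = 'a=&b=1&c='  # hypothetical query string with two empty values

# Default behaviour: keys with empty values are dropped.
default_pairs = parse_qsl(query)

# keep_blank_values=True keeps them as empty strings.
all_pairs = parse_qsl(query, keep_blank_values=True)
as_dict = parse_qs(query, keep_blank_values=True)

print(default_pairs)
print(all_pairs)
print(as_dict)
```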

Trimming the filename using _ and removing characters using python

If my file name is 5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav, the output should be 5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav. How can I trim the file name using Python?

There are several ways to do this, for instance:
import os
l=os.path.splitext("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav")
l[0].split("_")[0] + l[1]
I use os.path.splitext to separate the extension, if there is one.
Execution with and without '_' and with and without an extension:
pi@raspberrypi:~ $ python3
Python 3.7.3 (default, Dec 20 2019, 18:57:59)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>>
>>> def f(s):
...     l = os.path.splitext(s)
...     return l[0].split("_")[0] + l[1]
...
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843'
>>>
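As one more of the "several ways", here is a sketch of the same idea with pathlib instead of os.path. The function name trim_after_underscore is made up for this example. It assumes, as the answer above does, that everything from the first underscore up to the extension should be dropped.

```python
from pathlib import Path


def trim_after_underscore(filename):
    """Drop everything from the first '_' up to the extension (if any)."""
    p = Path(filename)
    # p.stem is the name without its last extension; p.suffix is that
    # extension (including the dot), or '' when there is none.
    return p.stem.split("_", 1)[0] + p.suffix


name = "5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav"
trimmed = trim_after_underscore(name)
print(trimmed)
```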

Unable to decode the base64 string - Python3.7

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import boto3
>>> import json
>>> import gzip
>>> import time
>>> from io import StringIO
>>> from base64 import b64decode
>>> import io
>>>
>>>
>>> orig_str = 'H4sIAAAAAAAAAL3U3WvbMBAA8H9F6Dk29yGdpL6FLO3DGCsk7GWUkaVKMeTD2G7LKP3fd0vTkm5ho52TF3O+Eyf/fLYe7Cq37ewmT3/U2Z7ZD8Pp8Nun8WQyvBjbgd3cr3OjaQgucYLoQhJNLzc3F83mttbKl8tRcb7c3Beaa59Kk67Js5XW8roqwMfFIjEHR9foKEMxm89z3enS9vZ7O2+quqs26/Nq2eWmtWdf7V09LxbaUTsVi20a7dW28fgur7tfax5sda392RM4n1wASYwAzjFr5JkjOSaUSKyXSHobUYIGEl0E3bur1N3NVkpAH6JohRwADJ7fh7Yns882BzUGUUqREgVLMRioZCwxlujZILFxQBI0b9AErb9s9RJGNsPRaHw5NZ8/2sfB/8mwV9lrzL5zp1LfqWTUqwy4DFI6KJ8Ivw/NE6fT0fiIQ3sF3bHePTUBIA4efUQWSRQjBErkMSTN6GNIEKeqlBIJsLjoDtJioP5pPuAhmu71HCbqk+Z7pf39EPGJwh+HyNFk0qvsn7/adnAnooUjfo/7I9wN7C2f49XjTwIzaQCLBwAA'
>>>
>>>
>>> print(b64decode(orig_str))
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xbd\xd4\xddk\xdb0\x10\x00\xf0\x7fE\xe896\xf7!\x9d\xa4\xbe\x85,\xed\xc3\x18+$\xece\x94\x91\xa5J1\xe4\xc3\xd8n\xcb(\xfd\xdfwK\xd3\x92na\xa3\x9d\x93\x17s\xbe\x13\'\xff|\xb6\x1e\xec*\xb7\xed\xec&O\x7f\xd4\xd9\x9e\xd9\x0f\xc3\xe9\xf0\xdb\xa7\xf1d2\xbc\x18\xdb\x81\xdd\xdc\xafs\xa3i\x08.q\x82\xe8B\x12M/77\x17\xcd\xe6\xb6\xd6\xca\x97\xcbQq\xbe\xdc\xdc\x17\x9ak\x9fJ\x93\xae\xc9\xb3\x95\xd6\xf2\xba*\xc0\xc7\xc5"1\x07G\xd7\xe8(C1\x9b\xcfs\xdd\xe9\xd2\xf6\xf6{;o\xaa\xba\xab6\xeb\xf3j\xd9\xe5\xa6\xb5g_\xed]=/\x16\xdaQ;\x15\x8bm\x1a\xed\xd5\xb6\xf1\xf8.\xaf\xbb_k\x1elu\xad\xfd\xd9\x138\x9f\\\x00I\x8c\x00\xce1k\xe4\x99#9&\x94H\xac\x97Hz\x1bQ\x82\x06\x12]\x04\xdd\xbb\xab\xd4\xdd\xcdVJ#\x1f\xa2h\x85\x1c\x00\x0c\x9e\xdf\x87\xb6\'\xb3\xcf6\x075\x06QJ\x91\x12\x05K1\x18\xa8d,1\x96\xe8\xd9 \xb1q#\x124o\xd0\x04\xad\xbfl\xf5\x12F6\xc3\xd1h|95\x9f?\xda\xc7\xc1\xff\xc9\xb0W\xd9k\xcc\xbes\xa7R\xdf\xa9d\xd4\xab\x0c\xb8\x0cR:(\x9f\x08\xbf\x0f\xcd\x13\xa7\xd3\xd1\xf8\x88C{\x05\xdd\xb1\xde=5\x01 \x0e\x1e}D\x16I\x14#\x04J\xe41$\xcd\xe8cH\x10\xa7\xaa\x94\x12\t\xb0\xb8\xe8\x0e\xd2b\xa0\xfei>\xe0!\x9a\xee\xf5\x1c&\xea\x93\xe6{\xa5\xfd\xfd\x10\xf1\x89\xc2\x1f\x87\xc8\xd1d\xd2\xab\xec\x9f\xbf\xdavp\'\xa2\x85#~\x8f\xfb#\xdc\r\xec-\x9f\xe3\xd5\xe3O\x023i\x00\x8b\x07\x00\x00'
>>>
>>>
The output is bytes, but I want it as text (a string). It should look something like this:
2 074939084796 eni-0d882207508141cd4 432.150.28.36 352.67.89.12 123 52782 17 1 76 1578627847 1578627896 ACCEPT OK
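One observation about the output above: the decoded bytes begin with \x1f\x8b, which is the gzip magic number. The base64 step therefore seems to have worked, and the payload probably still needs to be decompressed before it can be read as text. A minimal sketch under that assumption follows. The helper name decode_gzip_base64 and the sample text are made up for illustration, the sample round-trips a stand-in string rather than orig_str, and UTF-8 is assumed for the decompressed text.

```python
import gzip
from base64 import b64decode, b64encode


def decode_gzip_base64(encoded, encoding='utf-8'):
    """Base64-decode, gunzip, then decode the bytes to str."""
    raw = b64decode(encoded)
    # Gzip streams start with the two magic bytes 0x1f 0x8b.
    if raw[:2] != b'\x1f\x8b':
        raise ValueError('payload does not look gzip-compressed')
    return gzip.decompress(raw).decode(encoding)


# Stand-in data: compress and encode a short string, then reverse it.
sample_text = 'example log line'
sample_encoded = b64encode(gzip.compress(sample_text.encode('utf-8'))).decode('ascii')

decoded = decode_gzip_base64(sample_encoded)
print(decoded)
```

With the session above, the equivalent call would be decode_gzip_base64(orig_str).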

How to process ASCII in Python 3.6.5?

There is a log file to process, which comes from a machine at another company.
They are using Python 2.7, so the file is encoded as ASCII. But in my team, we are using Python 3.6.5.
When I run
fileopen = open(my_file, 'rb')
file_content = fileopen.read()
I find that file_content is automatically decoded as UTF-8. How do I handle this?

What you are seeing is an artifact of the way your editor displays the data. If you read a file in binary mode, you get the same bytes that were written. They are just bytes, not decoded as anything. Your editor appears to be decoding them for display, and doing so incorrectly.
Below I write a file in binary with Python 3.6 and read it back with 2.7 and 3.6:
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> open('test.txt','wb').write(b'\x8c\x8d\x91\x8c')
4
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> open('test.txt','rb').read()
'\x8c\x8d\x91\x8c'
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> open('test.txt','rb').read()
b'\x8c\x8d\x91\x8c'
In both cases the four bytes read are the same ones that were written.
Try displaying the value of the bytes instead of the strings:
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('test.txt','rb').read()
>>> for c in b:
... print(hex(c))
...
0x8c
0x8d
0x91
0x8c
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('test.txt','rb').read()
>>> for c in b:
... print hex(ord(c))
...
0x8c
0x8d
0x91
0x8c
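If text is what you need in Python 3, decode the bytes explicitly rather than relying on a viewer's guess. The sketch below assumes the log really is ASCII, as the question states. The file name example.log and its contents are invented for the demonstration.

```python
import os
import tempfile

# Write a small stand-in log file in binary mode (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), 'example.log')
with open(path, 'wb') as f:
    f.write(b'status=OK\n')

# Reading in 'rb' mode returns bytes; nothing is decoded automatically.
with open(path, 'rb') as f:
    raw = f.read()

# Decode explicitly. 'ascii' raises UnicodeDecodeError on any byte >= 0x80,
# which makes unexpected non-ASCII data visible instead of hiding it.
text = raw.decode('ascii')

# 'latin-1' maps each byte value 0-255 to one code point, so it never
# fails and round-trips losslessly; shown here with the four test bytes.
lossless = b'\x8c\x8d\x91\x8c'.decode('latin-1')

print(type(raw).__name__, repr(text))
```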

I am unable to get tensorflow to execute

The following is a snippet from my Python 3.5 interpreter. I don't understand why this produces the error listed below. Any suggestions?
Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, Tensorflow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
It then prints a list of internal errors, starting with:
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943]
