How to get the UTF-8 bytes of an emoji? - python-3.x

How would I get the actual UTF-8 bytes of an emoji in Python 3?
>>> '😋'
'😋'
>>> ?
\xF0\x9F\x98\x8B
And then, vice versa, how would I print the emoji from those bytes?
>>> print ('\xF0\x9F\x98\x8B')
'😋'
This was the behavior in Python 2.7 but not in 3, so I'm curious how to do it here.
Python 2.7.18 (default, Nov 13 2021, 06:17:34)
>>> '😋'
'\xf0\x9f\x98\x8b'
>>> print('\xf0\x9f\x98\x8b')
😋

You can use the string/bytes default decode / encode methods:
>>> '😀'.encode('utf-8')
b'\xf0\x9f\x98\x80'
>>> b'\xf0\x9f\x98\x80'.decode('utf-8')
'😀'
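As a small sketch of the round trip for the emoji from the question (the variable names here are illustrative, not from the original answer): in Python 3 a str holds code points, and encode/decode converts to and from the UTF-8 bytes.

```python
emoji = "😋"

# str -> UTF-8 bytes
utf8 = emoji.encode("utf-8")
print(utf8)                       # b'\xf0\x9f\x98\x8b'

# Format the bytes in the upper-case \xNN style used in the question
escaped = "".join(f"\\x{b:02X}" for b in utf8)
print(escaped)                    # \xF0\x9F\x98\x8B

# The Unicode code point (distinct from the UTF-8 bytes)
print(hex(ord(emoji)))            # 0x1f60b

# bytes -> str, and the same character written as a code point escape
from_bytes = b"\xf0\x9f\x98\x8b".decode("utf-8")
from_escape = "\U0001F60B"
print(from_bytes, from_escape)
```

Note that a plain str literal such as '\xF0\x9F\x98\x8B' in Python 3 is four separate code points, not one emoji; the b prefix is what makes it a bytes literal.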

Related

Trimming the filename using _ and removing characters using python

If my file name is 5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav, the output should be 5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav. How can I trim the file name using Python?
There are several ways to do this, for instance:
import os
l=os.path.splitext("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav")
l[0].split("_")[0] + l[1]
I use os.path.splitext to separate out the extension, if there is one.
Execution with and without '_' and with and without an extension:
pi@raspberrypi:~ $ python3
Python 3.7.3 (default, Dec 20 2019, 18:57:59)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>>
>>> def f(s):
...     l = os.path.splitext(s)
...     return l[0].split("_")[0] + l[1]
...
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843'
>>>
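As one of the other possible ways, here is a sketch using pathlib instead of os.path. The function name trim_name is hypothetical, and the approach assumes that everything from the first underscore up to the extension should be dropped.

```python
from pathlib import Path

def trim_name(name: str) -> str:
    """Drop everything from the first '_' onward, keeping the extension."""
    p = Path(name)
    # p.stem is the name without the final suffix; p.suffix is '' when absent
    return p.stem.split("_", 1)[0] + p.suffix

print(trim_name("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav"))
```

Like the os.path.splitext version, this leaves names without an underscore or without an extension unchanged apart from the trimming.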

I can't bigram a sentence with Python 3

I'm using Python 3 and I'm trying to bigram a sentence, but the interpreter gives me output that I can't understand.
~$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> from nltk import word_tokenize
>>> from nltk.util import ngrams
>>> text = "Hi How are you? i am fine and you"
>>> token=nltk.word_tokenize(text)
>>> bigrams=ngrams(token,2)
>>> bigrams
<generator object ngrams at 0x7ff1d81d2468>
>>> print (bigrams)
<generator object ngrams at 0x7ff1d81d2468>
What does "generator object ngrams at 0x7ff1d81d2468" mean?
Why can I neither inspect nor print the n-grams?
Generator objects are iterable, but only once. When print tries to display one, it shows the object's type and address rather than its items. You can convert the generator object into a list using
>>> bigrams=list(ngrams(token,2))
and then print its items using
>>> print(bigrams)
Since bigrams is now a list object, its items are printed instead of a description of the generator.
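The "only once" behavior can be seen without NLTK. In this sketch, bigrams_of is a hypothetical stand-in for a generator-based n-gram helper, not NLTK's implementation, and the tokens come from a plain str.split rather than word_tokenize.

```python
def bigrams_of(tokens):
    """Lazily yield adjacent token pairs."""
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right)

tokens = "Hi How are you".split()
gen = bigrams_of(tokens)

first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # nothing left to yield

print(first_pass)   # [('Hi', 'How'), ('How', 'are'), ('are', 'you')]
print(second_pass)  # []
```

This is why converting to a list once and reusing that list is the usual approach when the items need to be printed or iterated more than once.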

Pandas: sometimes pd.Series.isin is callable and sometimes it raises a "Series object is not callable" exception

The following code throws a "'Series' object is not callable" exception. I added a test to see if the isin method is callable: it returns True as expected. But I still get this exception:
```
data = {}
index_lst = []
our_lst = []
their_lst = []
for filename in glob.glob('data/tysonl.2018-05-2*.csv'):
    # filename = f"fred_2018-05-0{m}.csv"
    print(f"Processing {filename}")
    mpd = pd.read_csv(filename, usecols=['Date', 'hostname', 'primary owner', 'status'])
    # This is from
    # https://stackoverflow.com/questions/28133018/convert-pandas-series-to-datetime-in-a-dataframe
    datetime.date: df_date = pd.to_datetime(mpd["Date"]).dt.date[0]
    pd.core.series.Series: hostnames = mpd.hostname
    pd.core.series.Series: statuses = mpd.status
    pd.core.series.Series: owners = mpd['primary owner'] # Space in column name means has to be an index[] not attribute
    # I don't know why, but sometimes read_csv returns a numpy.ndarray in column primary owner. It happens in some CSV
    # files and not others. So I am doing this "hail Mary" conversion.
    if type(owners)!=type(pd.core.series.Series):
        print( f"owners should be of type pandas.core.series.Series but are actually {type(primary_owners)}. Converting")
        pd.core.series.Series: owners = pd.Series( primary_owners )
    assert isinstance( owners, pd.core.series.Series), "Tried to convert owners to a Series and failed"
    print(type(owners.isin), callable(owners.isin))
    ours=owners.isin(managers)
```
Which outputs:
```
Processing data/tysonl.2018-05-23-prod-quilt-status.csv
<class 'method'> True
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-70-235b583cc64f> in <module>()
21 assert isinstance( owners, pd.core.series.Series), "Tried to convert owners to a Series and failed"
22 print(type(owners.isin), callable(owners.isin))
---> 23 ours=owners.isin(managers)
24 # assert df_date not in index_lst, f"df_date {df_date} is already in the index_lst\n{index_lst}\nfilename is {filename}."
25 if dt_date not in index_lst:
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in isin(self, values)
2802 """
2803 result = algorithms.isin(_values_from_object(self), values)
-> 2804 return self._constructor(result, index=self.index).__finalize__(self)
2805
2806 def between(self, left, right, inclusive=True):
TypeError: 'Series' object is not callable
```
It drives me nuts that it is callable on the line before the .isin call, and yet python thinks it is not callable and raises the exception. I tried to reproduce the problem in a smaller, more concise environment, and I was UNABLE to do so.
jeffs@jeffs-desktop:/home/jeffs/learn_pandas (development) * $ python3
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
>>> s = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama','hippo'], name='animal')
>>> s.isin(['cow'])
0 False
1 True
2 False
3 False
4 False
5 False
Name: animal, dtype: bool
>>>
jeffs@jeffs-desktop:/home/jeffs/learn_pandas (development) * $ ipython
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import sys
In [2]: print(sys.version)
3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0]
In [3]: import pandas
In [4]: import pandas as pd
In [5]: s = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama','hippo'], name='animal')
In [6]: s.isin(['cow'])
Out[6]:
0 False
1 True
2 False
3 False
4 False
5 False
Name: animal, dtype: bool
In [7]:
Also as expected.
I cannot reproduce the problem at all outside of jupyter.
I am a pandas newbie. I'm also a Jupyter newbie. I kinda like Jupyter, but if I am going to have these kinds of strange errors, then I have to go back to using PyCharm, which is not that bad. I have no idea what other information might be useful or what to look for. In particular, I utterly fail to understand why callable(owners.isin) returns True and yet the exception is raised.
Thank you
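One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: Python's annotated assignment syntax is target: annotation = value. As the snippet is written, lines such as pd.core.series.Series: owners = ... put the pandas class on the target side, so they would assign to the attribute pd.core.series.Series instead of annotating owners. The traceback fails inside pandas at self._constructor(...), which would be consistent with the Series class having been replaced by a Series instance. The sketch below shows the language behavior with a hypothetical stand-in object (fake_module) and no pandas involved.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a module that exposes a class as an attribute.
fake_module = SimpleNamespace(Series=list)

owners = None  # defined up front so the annotation expression can be evaluated

# Reversed order: the attribute left of the colon is the assignment target,
# and "owners" is treated as the annotation.
fake_module.Series: owners = ["alice", "bob"]

print(fake_module.Series)            # ['alice', 'bob'] (the class was overwritten)
print(callable(fake_module.Series))  # False

# Intended order: variable name first, then the type.
names: list = ["alice", "bob"]
```

If this is what happened, writing the annotations as name: type = value and restarting the kernel (so the original class is reloaded) would be the things to try. A fresh interpreter session never runs the overwriting lines, which might also explain why the small reproductions behave normally.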

Why do strings returned by TensorFlow show up with a 'b' prefix in Python 3?

I just finished installing TensorFlow 1.3 on an RPi 3. When validating the installation (according to https://www.tensorflow.org/install/install_sources), a lowercase "b" somehow showed up. See this session:
root@raspberrypi:/home/pi# python
Python 3.4.2 (default, Oct 19 2014, 13:31:11)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
>>>
No, it's not a bug. Your installation is perfectly fine, this is the normal behavior.
The b before the string is due to TensorFlow's internal representation of strings.
TensorFlow represents strings as byte arrays, so when you "extract" them from the graph (TensorFlow's internal representation) into the Python environment using sess.run(hello), you get a bytes object and not a str.
You can verify this using the type function:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(type(sess.run(hello)))
results in <class 'bytes'> whilst if you do:
print(type('Hello, TensorFlow!'))
results in <class 'str'>
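If a str is wanted for display, the bytes value can be decoded. A minimal sketch, where raw is a stand-in for the value returned by sess.run(hello) and UTF-8 is assumed as the encoding:

```python
# Stand-in for the bytes object returned by sess.run(hello)
raw = b"Hello, TensorFlow!"

# Decode to a regular Python 3 string; UTF-8 is an assumption here
text = raw.decode("utf-8")

print(type(raw))   # <class 'bytes'>
print(type(text))  # <class 'str'>
print(text)        # Hello, TensorFlow!
```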

Computing the MD5 hash of an integer in python 3?

I need to compute a hash of an integer using python 3. Is there a cleaner and more efficient solution than the following?
>>> import hashlib
>>> N = 123
>>> hashlib.md5(str(N).encode("ascii")).hexdigest()
'202cb962ac59075b964b07152d234b70'
It seems weird to have to convert to a unicode string, then encode it to a byte array.
A cryptographic hash such as MD5 can only be applied to bytes. There are more efficient ways of encoding a number as bytes, but you still need to follow the contract.
>>> hashlib.md5(int(-123).to_bytes(8, 'big', signed=True)).hexdigest()
'fc1063e1bcb35f0d52cdceae4626c39b'
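A fixed width of 8 bytes raises OverflowError for integers that do not fit in 64 bits. As a sketch, the width can be derived from int.bit_length instead. The helper name md5_of_int is hypothetical, and the choice of big-endian signed bytes is an assumption carried over from the example above; the resulting digest differs from the one computed on the decimal string.

```python
import hashlib

def md5_of_int(n: int) -> str:
    """MD5 of an integer's big-endian two's-complement bytes."""
    # One extra bit for the sign, rounded up to whole bytes (at least 1)
    length = max(1, (n.bit_length() + 8) // 8)
    return hashlib.md5(n.to_bytes(length, "big", signed=True)).hexdigest()

small = md5_of_int(123)      # hashes the single byte 0x7b
large = md5_of_int(2**100)   # too large for 8 bytes, still works
print(small, large)
```

Because the byte width varies with the value, both sides of any comparison need to use the same encoding rule for the digests to match.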
Ignacio's answer is perfect, but in case you need the code to work with both python 2 and python 3, and if you have NumPy installed, then this works great:
>>> import numpy as np
>>> import hashlib
>>> N = 123
>>> hashlib.md5(np.int64(N)).hexdigest()
'f18b8dbefe02a0efce281deb55a209cd'
