I'm using Python 3 and I'm trying to build bigrams from a sentence, but the interpreter gives me output that I can't understand.
~$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> from nltk import word_tokenize
>>> from nltk.util import ngrams
>>> text = "Hi How are you? i am fine and you"
>>> token=nltk.word_tokenize(text)
>>> bigrams=ngrams(token,2)
>>> bigrams
<generator object ngrams at 0x7ff1d81d2468>
>>> print (bigrams)
<generator object ngrams at 0x7ff1d81d2468>
What does "generator object ngrams at 0x7ff1d81d2468" mean?
Why can I neither inspect nor print the n-grams?
Generator objects are iterable, but only once. When print tries to display one, it shows the object's type and memory address rather than the items it would produce. You can convert the generator object into a list using
>>> bigrams=list(ngrams(token,2))
and then print its items using
>>> print(bigrams)
Since bigrams is now a list object, its items are printed instead of a description of the object.
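The "only once" behaviour can be seen without nltk. The following is a minimal sketch that uses a hypothetical helper, bigram_generator, written only for illustration; it is not how nltk implements ngrams.

```python
def bigram_generator(tokens):
    """Lazily yield adjacent token pairs (hypothetical helper, not nltk's code)."""
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right)

tokens = ["Hi", "How", "are", "you"]
pairs = bigram_generator(tokens)

first_pass = list(pairs)   # iterating consumes the generator
second_pass = list(pairs)  # the generator is exhausted by now

print(first_pass)   # [('Hi', 'How'), ('How', 'are'), ('are', 'you')]
print(second_pass)  # []
```

If the bigrams are needed more than once, keeping the list (as above) avoids having to rebuild the generator.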
I don't quite understand why no results are returned when the value is empty. Is there a way to get the key-value pair when the value is empty? Thanks.
>>> urllib.parse.parse_qsl('a=b')
[('a', 'b')]
>>> urllib.parse.parse_qsl('a=')
[]
You can use the keep_blank_values parameter, which is set to False by default. By the way, which version of Python are you using? This is what I get with keep_blank_values=True on Python 3.8.2:
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import parse_qsl
>>> parse_qsl('a=b')
[('a', 'b')]
>>> parse_qsl('a=')
[]
>>> parse_qsl('a=', keep_blank_values=True)
[('a', '')]
>>>
If my file name is 5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav, the output should be 5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav. How can I trim the file name using Python?
There are several ways to do this, for instance:
import os
l=os.path.splitext("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav")
l[0].split("_")[0] + l[1]
I use os.path.splitext to separate the extension, if there is one, from the rest of the name.
Execution with and without '_' and with and without an extension:
pi@raspberrypi:~ $ python3
Python 3.7.3 (default, Dec 20 2019, 18:57:59)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>>
>>> def f(s):
... l = os.path.splitext(s)
... return l[0].split("_")[0] + l[1]
...
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843.wav'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843'
>>>
>>> f("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843")
'5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843'
>>>
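One caveat: f splits the whole string on "_", so a directory name containing an underscore would be cut as well. A possible variant, assuming the input may be a full path, splits off the directory first. The name trim_name is only illustrative.

```python
import os

def trim_name(path):
    """Drop everything from the first '_' in the file name, keeping directory and extension."""
    head, tail = os.path.split(path)      # directory part, file name
    stem, ext = os.path.splitext(tail)    # name without extension, extension
    return os.path.join(head, stem.split("_", 1)[0] + ext)

print(trim_name("5f0756fa-75bc-4c70-9ba8-fbd1b6a9f843_20200616T10_50_UTC.wav"))
print(trim_name(os.path.join("my_dir", "5f0756fa_20200616T10_50_UTC.wav")))
```

For a bare file name this behaves like f above; only inputs with a directory part differ.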
I have a list of integers and I thought I could use np.searchsorted() to perform a binary search for the closest integer. So I tried:
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> B = [0, 36, 75, 111, 162, 198, 237, 273]
>>> np.searchsorted(B, 210)
6
>>> B[np.searchsorted(B, 210)]
237
Shouldn't the closest neighbour of 210 be 198? Is there a native Python 3 library that does what I want? I could implement it myself, but I am looking for the fastest implementation.
I think np.argmin() is suitable for your purpose.
Try this code:
import numpy as np
B = np.array([1,2,3,4,5])
criterion = 4
ind = np.argmin(np.abs(B - criterion)) # find the index i, where i'th element is the closest to criterion
print(B[ind])
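For context, np.searchsorted returns the index at which the value would be inserted to keep the list sorted, which is why it gives 6 (the position of 237) for 210 rather than the nearest element. If the list is already sorted, a binary search can still be used by comparing the two neighbours of that insertion point. This is a sketch under that assumption; nearest_sorted is a hypothetical helper name.

```python
import numpy as np

def nearest_sorted(values, target):
    """Return the element of a sorted sequence closest to target (ties go to the smaller one)."""
    values = np.asarray(values)
    i = np.searchsorted(values, target)   # insertion index, found by binary search
    if i == 0:
        return values[0]                  # target is below the first element
    if i == len(values):
        return values[-1]                 # target is above the last element
    left, right = values[i - 1], values[i]
    return left if target - left <= right - target else right

B = [0, 36, 75, 111, 162, 198, 237, 273]
print(int(nearest_sorted(B, 210)))  # 198
```

Unlike the argmin approach, this does not scan the whole array, but it only gives correct results when the input is sorted.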
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import boto3
>>> import json
>>> import gzip
>>> import time
>>> from io import StringIO
>>> from base64 import b64decode
>>> import io
>>>
>>>
>>> orig_str = 'H4sIAAAAAAAAAL3U3WvbMBAA8H9F6Dk29yGdpL6FLO3DGCsk7GWUkaVKMeTD2G7LKP3fd0vTkm5ho52TF3O+Eyf/fLYe7Cq37ewmT3/U2Z7ZD8Pp8Nun8WQyvBjbgd3cr3OjaQgucYLoQhJNLzc3F83mttbKl8tRcb7c3Beaa59Kk67Js5XW8roqwMfFIjEHR9foKEMxm89z3enS9vZ7O2+quqs26/Nq2eWmtWdf7V09LxbaUTsVi20a7dW28fgur7tfax5sda392RM4n1wASYwAzjFr5JkjOSaUSKyXSHobUYIGEl0E3bur1N3NVkpAH6JohRwADJ7fh7Yns882BzUGUUqREgVLMRioZCwxlujZILFxQBI0b9AErb9s9RJGNsPRaHw5NZ8/2sfB/8mwV9lrzL5zp1LfqWTUqwy4DFI6KJ8Ivw/NE6fT0fiIQ3sF3bHePTUBIA4efUQWSRQjBErkMSTN6GNIEKeqlBIJsLjoDtJioP5pPuAhmu71HCbqk+Z7pf39EPGJwh+HyNFk0qvsn7/adnAnooUjfo/7I9wN7C2f49XjTwIzaQCLBwAA'
>>>
>>>
>>> print(b64decode(orig_str))
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xbd\xd4\xddk\xdb0\x10\x00\xf0\x7fE\xe896\xf7!\x9d\xa4\xbe\x85,\xed\xc3\x18+$\xece\x94\x91\xa5J1\xe4\xc3\xd8n\xcb(\xfd\xdfwK\xd3\x92na\xa3\x9d\x93\x17s\xbe\x13\'\xff|\xb6\x1e\xec*\xb7\xed\xec&O\x7f\xd4\xd9\x9e\xd9\x0f\xc3\xe9\xf0\xdb\xa7\xf1d2\xbc\x18\xdb\x81\xdd\xdc\xafs\xa3i\x08.q\x82\xe8B\x12M/77\x17\xcd\xe6\xb6\xd6\xca\x97\xcbQq\xbe\xdc\xdc\x17\x9ak\x9fJ\x93\xae\xc9\xb3\x95\xd6\xf2\xba*\xc0\xc7\xc5"1\x07G\xd7\xe8(C1\x9b\xcfs\xdd\xe9\xd2\xf6\xf6{;o\xaa\xba\xab6\xeb\xf3j\xd9\xe5\xa6\xb5g_\xed]=/\x16\xdaQ;\x15\x8bm\x1a\xed\xd5\xb6\xf1\xf8.\xaf\xbb_k\x1elu\xad\xfd\xd9\x138\x9f\\\x00I\x8c\x00\xce1k\xe4\x99#9&\x94H\xac\x97Hz\x1bQ\x82\x06\x12]\x04\xdd\xbb\xab\xd4\xdd\xcdVJ#\x1f\xa2h\x85\x1c\x00\x0c\x9e\xdf\x87\xb6\'\xb3\xcf6\x075\x06QJ\x91\x12\x05K1\x18\xa8d,1\x96\xe8\xd9 \xb1q#\x124o\xd0\x04\xad\xbfl\xf5\x12F6\xc3\xd1h|95\x9f?\xda\xc7\xc1\xff\xc9\xb0W\xd9k\xcc\xbes\xa7R\xdf\xa9d\xd4\xab\x0c\xb8\x0cR:(\x9f\x08\xbf\x0f\xcd\x13\xa7\xd3\xd1\xf8\x88C{\x05\xdd\xb1\xde=5\x01 \x0e\x1e}D\x16I\x14#\x04J\xe41$\xcd\xe8cH\x10\xa7\xaa\x94\x12\t\xb0\xb8\xe8\x0e\xd2b\xa0\xfei>\xe0!\x9a\xee\xf5\x1c&\xea\x93\xe6{\xa5\xfd\xfd\x10\xf1\x89\xc2\x1f\x87\xc8\xd1d\xd2\xab\xec\x9f\xbf\xdavp\'\xa2\x85#~\x8f\xfb#\xdc\r\xec-\x9f\xe3\xd5\xe3O\x023i\x00\x8b\x07\x00\x00'
>>>
>>>
The output is in bytes, while I want text (a string). It should look something like this:
2 074939084796 eni-0d882207508141cd4 432.150.28.36 352.67.89.12 123 52782 17 1 76 1578627847 1578627896 ACCEPT OK
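The decoded bytes above start with \x1f\x8b, which is the gzip magic number, so the payload is probably gzip-compressed after base64 decoding. One possible approach, assuming the decompressed data is UTF-8 text, is to decompress and then decode. The sketch below uses a small made-up sample (sample_text) rather than orig_str, so it does not claim what the real payload contains.

```python
import gzip
from base64 import b64decode, b64encode

# Hypothetical sample: build a base64-encoded, gzip-compressed string to round-trip.
sample_text = "hello flow log"
encoded = b64encode(gzip.compress(sample_text.encode("utf-8"))).decode("ascii")

raw = b64decode(encoded)            # still compressed bytes
assert raw[:2] == b"\x1f\x8b"       # gzip magic number, as in the output above

# Decompress, then decode the bytes to str (UTF-8 is an assumption).
decoded_text = gzip.decompress(raw).decode("utf-8")
print(decoded_text)  # hello flow log
```

Applied to the question, the same two steps would be gzip.decompress(b64decode(orig_str)).decode("utf-8").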
I just finished installing TensorFlow 1.3 on an RPi 3. When validating the installation (according to https://www.tensorflow.org/install/install_sources), a lowercase "b" somehow showed up. See this code:
root@raspberrypi:/home/pi# python
Python 3.4.2 (default, Oct 19 2014, 13:31:11)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
>>>
No, it's not a bug. Your installation is perfectly fine, this is the normal behavior.
The b before the string is due to TensorFlow's internal representation of strings.
TensorFlow represents strings as byte arrays, so when you "extract" them from the graph (TensorFlow's internal representation) into the Python environment using sess.run(hello), you get a bytes object and not a str.
You can verify this using the type function:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(type(sess.run(hello)))
This results in <class 'bytes'>, whereas
print(type('Hello, TensorFlow!'))
results in <class 'str'>
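If a str is needed, the bytes value can be decoded. The sketch below does not require TensorFlow: raw is a stand-in for the value returned by sess.run(hello), and UTF-8 is assumed as the encoding.

```python
# Stand-in for the bytes object returned by sess.run(hello)
raw = b'Hello, TensorFlow!'

# bytes -> str; UTF-8 is an assumption about the encoding
text = raw.decode('utf-8')

print(type(raw))   # <class 'bytes'>
print(type(text))  # <class 'str'>
print(text)        # Hello, TensorFlow!
```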