Python Numba - Convert DataFrame series object to numpy array - python-3.x

I have a pandas DataFrame with strings, and I am trying to use a set operation with Numba to get the unique characters in the column that contains the strings. Since Numba does not recognize pandas DataFrames, I need to convert the string column to a numpy array. However, once converted, the column shows the dtype as object. Is there a way to convert the pandas DataFrame column of strings to a normal array (not an object array)?
Please find the code below.
z = train.head(2).sentence.values  # train is a pandas DataFrame
z
Output:
array(["Explanation\nWhy the edits made under my username Hardcore Metallica Fan were reverted? They weren't vandalisms, just closure on some GAs after I voted at New York Dolls FAC. And please don't remove the template from the talk page since I'm retired now.89.205.38.27",
"D'aww! He matches this background colour I'm seemingly stuck with. Thanks. (talk) 21:51, January 11, 2016 (UTC)"],
dtype=object)
Python Numba code:
@njit
def set_(z):
    x = set(z.sum())
    return x

set_(z)
Output:
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
<ipython-input-51-9d5bc17d106b> in <module>()
----> 1 set_(z)
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/numba/dispatcher.py in _compile_for_args(self, *args, **kws)
342 raise e
343 else:
--> 344 reraise(type(e), e, None)
345 except errors.UnsupportedError as e:
346 # Something unsupported is present in the user code, add help info
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/numba/six.py in reraise(tp, value, tb)
656 value = tp()
657 if value.__traceback__ is not tb:
--> 658 raise value.with_traceback(tb)
659 raise value
660
TypingError: Failed at nopython (nopython frontend)
Internal error at <numba.typeinfer.ArgConstraint object at 0x7fbe66c01a58>:
--%<----------------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/numba/errors.py", line 491, in new_error_context
yield
File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/numba/typeinfer.py", line 194, in __call__
assert ty.is_precise()
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/numba/typeinfer.py", line 138, in propagate
constraint(typeinfer)
File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/numba/typeinfer.py", line 195, in __call__
typeinfer.add_type(self.dst, ty, loc=self.loc)
File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/numba/errors.py", line 499, in new_error_context
six.reraise(type(newerr), newerr, tb)
File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/numba/six.py", line 659, in reraise
raise value
numba.errors.InternalError:
[1] During: typing of argument at <ipython-input-50-566e4e12481d> (3)
--%<----------------------------------------------------------------------------
File "<ipython-input-50-566e4e12481d>", line 3:
def set_(z):
    x = set(z.sum())
    ^
This error may have been caused by the following argument(s):
- argument 0: Unsupported array dtype: object
This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.
To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile
If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new
Would anyone be able to help me in this regard?
Thanks & Best Regards
Michael
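For reference, a minimal sketch of one way to avoid the object dtype: convert the column to a fixed-width unicode array with astype (whether nopython mode then accepts the subsequent string operations depends on the Numba version, since Numba's string support is limited):
z = train.head(2).sentence.values.astype(str)  # dtype becomes '<U...' rather than object
z.dtype  # e.g. dtype('<U304'), sized to the longest string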

Related

TypeError while programming a discord bot

Ignoring exception in on_message
Traceback (most recent call last):
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/discord/client.py", line 343, in _run_event
await coro(*args, **kwargs)
File "main.py", line 162, in on_message
checo_fifa()
File "main.py", line 99, in checo_fifa
Debts['checo'] = Debts['checo']+1
TypeError: string indices must be integers
line 162 is:
if msg.startswith('$checo fifa'):
    checo_fifa()
    await message.channel.send('+1')
line 99 is:
def checo_fifa():
    Debts['checo'] = Debts['checo'] + 1
Debts is a dictionary containing the key 'checo' with a value of 0. There are some more usernames in the dictionary, as well as functions pointing to all of them. Weirdly enough, the program worked for a while, but now it only returns this traceback when the command is written in Discord.
Any help with this would be greatly appreciated.
You cannot add +1 to a string or a list, only to an integer. So you would have to convert it to an integer (if possible), e.g. with:
def checo_fifa():
    Debts['checo'] = int(Debts['checo']) + 1
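A self-contained demonstration of that fix (the Debts dictionary here is a stand-in for the one in main.py, with the value stored as a string):
Debts = {'checo': '0'}  # value stored as a string, so Debts['checo'] + 1 would fail

def checo_fifa():
    Debts['checo'] = int(Debts['checo']) + 1  # convert before adding

checo_fifa()
print(Debts['checo'])  # 1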

invalid literal for int() with base 10 Keras pad sequence

Traceback (most recent call last):
File "C:/Users/Lenovo/PycharmProjects/ProjetFinal/venv/turkish.py", line 45, in <module>
sequences_matrix = sequence.pad_sequences(sequences,maxlen=max_len)
File "C:\Users\Lenovo\PycharmProjects\ProjetFinal\venv\lib\site-packages\keras_preprocessing\sequence.py", line 96, in pad_sequences
trunc = np.asarray(trunc, dtype=dtype)
File "C:\Users\Lenovo\PycharmProjects\ProjetFinal\venv\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for int() with base 10:
windows loading yapıo sonra mavi ekran iste. simdi repair yapıyo bkalım olmazsa
gelirim size.
X = dft.text
You are passing raw text straight to pad_sequences; that is not how pad_sequences works.
Generally, every text is first converted into numbers using a given vocabulary, which can consist of characters or words depending on the task. Only then can it be padded.
Please refer to a tutorial on how to preprocess text first; the TensorFlow tutorials on text are a good place to start.
https://www.tensorflow.org/guide/keras/masking_and_padding
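A minimal sketch of that pipeline, assuming the Keras Tokenizer is acceptable for building the vocabulary (texts, num_words, and max_len here are placeholders, not from the original post):
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["first example sentence", "second one"]  # stand-in for dft.text
max_len = 150

# Build a word-level vocabulary and map each text to a list of integer ids
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(sequences := texts) if False else tokenizer.texts_to_sequences(texts)

# Padding now works because the sequences contain integers, not raw strings
sequences_matrix = pad_sequences(sequences, maxlen=max_len)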

Prgm_MAWS: "IndexError: list index out of range"

MAWS.py is a 4-year-old program that uses Amber/AmberTools as its calculation platform. Although I followed the program user guide instructions, I do not know how to debug and fix the "IndexError: list index out of range" raised at "raise self._value" below.
I replaced the force field (ff) specified in MAWS.py with a newer Amber ff.
As I am not familiar with Python code or with where the error could be generated, I recommend downloading the MAWS.py code from the GitHub repository: https://github.com/igemsoftware/Heidelberg_15
python3 MAWS_rev1.py -b 0.01 -i 200 -s 200 -l 15 -t 0.01 -f pdb -y HYBRID Prot1a.frcmod /home/bcramer/workdir-amber/Prot_1/
['DGN', 'DAN', 'DTN', 'DCN']
Choosing from candidates ...
Constructing Ligand/Aptamer complex ...
Constructing Ligand/Aptamer complex ...
etc..................
Loading Aptamer/Ligand complex ...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/bcramer/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/bcramer/miniconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "MAWS_rev1.py", line 1090, in initial
ligand_range = get_ligand_range(aptamer_top.topology)
File "MAWS_rev1.py", line 197, in get_ligand_range
return [get_ligand(topology)[0], len(get_ligand(topology))]
IndexError: list index out of range
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "MAWS_rev1.py", line 1357, in <module>
positions_and_Ntides = loop()
File "MAWS_rev1.py", line 1237, in loop
pos_Nt_S_task = pool.map(initial, alphabet)
File "/home/bcramer/miniconda3/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/bcramer/miniconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
IndexError: list index out of range
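Since the traceback shows get_ligand(topology) returning an empty list before it is indexed, one way to narrow this down is to inspect the topology just before the failing call. This is a hypothetical debugging probe, not part of MAWS; the attribute access assumes an OpenMM-style Topology object:
# Placed just before line 1090 of MAWS_rev1.py, where initial() fails
ligand = get_ligand(aptamer_top.topology)
print("ligand entries found:", len(ligand))
print("residue names present:", sorted({res.name for res in aptamer_top.topology.residues()}))
if not ligand:
    raise RuntimeError("get_ligand() matched nothing; the replacement "
                       "force field may use different residue names")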

Py4JError: An error occurred while calling o90.fit

I want to apply the random forest algorithm to a dataframe consisting of three columns, namely JournalID, IndexedJournalID (obtained using Spark's StringIndexer), and a feature vector. I used the code below to read the dataframe from a parquet file and apply a StringIndexer to the JournalID column to convert it to a categorical type.
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import IndexToString, StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.sql.functions import udf
from pyspark.ml.linalg import Vectors
from pyspark.ml.linalg import VectorUDT
df=spark.read.parquet('JouID-UBTFIDFVectors-server22.parquet')
labelIndexer = StringIndexer(inputCol="journalid", outputCol="IndexedJournalID")
labelsDF=labelIndexer.fit(df)
df1=labelsDF.transform(df)
# Convert sparse vectors to dense vectors; applied to the raw features column to get VectorUDT type
parse_ = udf(lambda l: Vectors.dense(l), VectorUDT())
df2 = df1.withColumn("featuresNew", parse_(df1["features"])).drop('features')
The new dataframe (df2) schema is as follows:
root
|-- journalid: string (nullable = true)
|-- indexedLabel: double (nullable = false)
|-- featuresNew: vector (nullable = true)
Then I split df2 into training and test sets and create a random forest classifier object as below:
(trainingData, testData) = df2.randomSplit([0.8, 0.2])
rf = RandomForestClassifier(labelCol="indexedLabel", featuresCol="featuresNew", numTrees=2 )
Finally, I apply the fit() method to the trainingData obtained above:
rfModel=rf.fit(trainingData)
With this I am able to train the model on 100 instances of the input dataframe. However, over the whole training data, this line gives the following error.
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 53652)
Traceback (most recent call last):
File "/data/sntps/code/conda3/lib/python3.6/socketserver.py", line 317, in _handle_request_noblock
self.process_request(request, client_address)
File "/data/sntps/code/conda3/lib/python3.6/socketserver.py", line 348, in process_request
self.finish_request(request, client_address)
File "/data/sntps/code/conda3/lib/python3.6/socketserver.py", line 361, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/data/sntps/code/conda3/lib/python3.6/socketserver.py", line 696, in __init__
self.handle()
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/pyspark/accumulators.py", line 235, in handle
num_updates = read_int(self.rfile)
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/pyspark/serializers.py", line 685, in read_int
raise EOFError
EOFError
----------------------------------------
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:41060)
Traceback (most recent call last):
File "/data/sntps/code/conda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-10-46d7488961c7>", line 1, in <module>
rfModel=rf.fit(trainingData)
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/pyspark/ml/base.py", line 132, in fit
return self._fit(dataset)
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 288, in _fit
java_model = self._fit_java(dataset)
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 285, in _fit_java
return self._java_obj.fit(dataset._jdf)
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o90.fit
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/sntps/code/conda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 1828, in showtraceback
stb = value._render_traceback_()
AttributeError: 'Py4JError' object has no attribute '_render_traceback_'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/sp/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 929, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
.(traceback...not writing due to space issue)
.
.
Py4JError: An error occurred while calling o90.fit
This error is not very descriptive, and hence it has become difficult for me to identify where I am going wrong. Any help would be much appreciated.
Input description:
The input dataframe contains 2,696,512 rows, and each row's feature vector is of length 262,144.
After going through a lot of related questions on Stack Overflow, I thought this might be happening because I was running it in a Jupyter notebook. So I later ran it on the command line using the spark-submit script, and I am not getting this error anymore. I don't know, though, why this error pops up when it is run in a Jupyter notebook.
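For reference, a minimal sketch of that spark-submit route (the script name and memory settings are placeholders, not from the original post; with ~2.7 million rows of 262,144-dimensional vectors, the driver and executors typically need generous memory):
spark-submit \
  --driver-memory 16g \
  --executor-memory 16g \
  train_random_forest.py  # hypothetical script containing the code above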

python 2/3 compatibility issue with exception

I wrote the following code, which works with Python 3:
try:
    json.loads(text)
except json.decoder.JSONDecodeError:
    (exception handling)
However, if I use Python 2, when json.loads throws the exception I get:
File "mycode.py", line xxx, in function
except json.decoder.JSONDecodeError:
AttributeError: 'module' object has no attribute 'JSONDecodeError'
And actually, https://docs.python.org/2/library/json.html doesn't mention any JSONDecodeError exception, while https://docs.python.org/3/library/json.html does.
How can I make the code run with both Python 2 and 3?
In Python 2, json.loads raises ValueError:
Python 2.7.9 (default, Sep 17 2016, 20:26:04)
>>> json.loads('#$')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
You can try to access json.decoder.JSONDecodeError; if that fails, you know you need to catch ValueError:
try:
    json_parse_exception = json.decoder.JSONDecodeError
except AttributeError:  # Python 2
    json_parse_exception = ValueError
Then
try:
    json.loads(text)
except json_parse_exception:
    (exception handling)
will work in either case.
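Alternatively, since json.JSONDecodeError is a subclass of ValueError in Python 3, simply catching ValueError works unchanged on both versions:
try:
    json.loads(text)
except ValueError:  # also catches JSONDecodeError on Python 3
    (exception handling)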
