How to use map with tuples in a TensorFlow 2 dataset?

I'm trying to map a tuple to a tuple in a dataset in TF 2 (see the code below). My output (also below) shows that the map function is called only once, and I can't seem to get at the tuple.
How do I get at the "a", "b", "c" from the input parameter, which is:
tuple Tensor("args_0:0", shape=(3,), dtype=string)
type <class 'tensorflow.python.framework.ops.Tensor'>
Edit: it seems that Dataset.from_tensor_slices produces the data all at once, which would explain why map is called only once. So I probably need to build the dataset some other way.
from __future__ import absolute_import, division, print_function, unicode_literals
from timeit import default_timer as timer
print('import tensorflow')
start = timer()
import tensorflow as tf
end = timer()
print('Elapsed time: ' + str(end - start),"for",tf.__version__)
import numpy as np
def map1(tuple):
    print("<<<")
    print("tuple",tuple)
    print("type",type(tuple))
    print("shape",tuple.shape)
    print("tuple 0",tuple[0])
    print("type 0",type(tuple[0]))
    print("shape 0",tuple.shape[0])
    # how do i get "a","b","c" from the input parameter?
    print(">>>")
    return ("1","2","3")
l=[]
l.append(("a","b","c"))
l.append(("d","e","f"))
print(l)
ds=tf.data.Dataset.from_tensor_slices(l)
print("ds",ds)
print("start mapping")
result = ds.map(map1)
print("end mapping")
$ py mapds.py
import tensorflow
Elapsed time: 12.002168990751619 for 2.0.0
[('a', 'b', 'c'), ('d', 'e', 'f')]
ds <TensorSliceDataset shapes: (3,), types: tf.string>
start mapping
<<<
tuple Tensor("args_0:0", shape=(3,), dtype=string)
type <class 'tensorflow.python.framework.ops.Tensor'>
shape (3,)
tuple 0 Tensor("strided_slice:0", shape=(), dtype=string)
type 0 <class 'tensorflow.python.framework.ops.Tensor'>
shape 0 3
>>>
end mapping

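The value or values returned by the map function (map1) determine the structure of each element in the returned dataset. [Ref]
In your case, result is a tf.data dataset, and there is nothing wrong with your code. Dataset.map traces the function once to build a graph, which is why the prints appear only once and the argument is a symbolic Tensor.
To check whether every tuple is mapped correctly, you can wrap map1 in tf.py_function so that it runs eagerly, and then iterate over every sample of the dataset as follows:
[Updated Code]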
import tensorflow as tf
def map1(tuple):
    print(tuple[0].numpy().decode("utf-8")) # Print first element of tuple
    return ("1","2","3")
l=[]
l.append(("a","b","c"))
l.append(("d","e","f"))
ds=tf.data.Dataset.from_tensor_slices(l)
# py_function runs map1 eagerly, so .numpy() is available inside it
ds = ds.map(lambda tpl: tf.py_function(map1, [tpl], [tf.string, tf.string, tf.string]))
for sample in ds:
    print(str(sample[0].numpy().decode()), sample[1].numpy().decode(), sample[2].numpy().decode())
Output:
a
1 2 3
d
1 2 3
Hope it will help.
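If the goal is only to get at the three components inside map, without leaving graph mode, one option is tf.unstack. The following is a minimal sketch, assuming every row has a fixed length of 3 as in the question; the names `rows`, `split_row`, `split_ds` and `decoded` are made up for illustration.

```python
import tensorflow as tf

rows = [("a", "b", "c"), ("d", "e", "f")]
ds = tf.data.Dataset.from_tensor_slices(rows)

def split_row(row):
    # row is a symbolic string tensor of shape (3,). Dataset.map traces this
    # function once, so this body does not run once per element in Python.
    first, second, third = tf.unstack(row)
    return first, second, third

split_ds = ds.map(split_row)

# Iterating the dataset eagerly yields concrete tensors that can be decoded.
decoded = [
    tuple(part.numpy().decode("utf-8") for part in sample)
    for sample in split_ds
]
print(decoded)
```

Each element of `split_ds` is a tuple of three scalar string tensors rather than one tensor of shape (3,).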

Related

How to add type hints to scikit-learn functions?

I have the following simple function:
def f1(y_true, y_pred):
    return {"f1": 100 * sklearn.metrics.f1_score(y_true, y_pred)}
According to the scikit-learn documentation, the arguments to f1_score can have the following types:
y_true: 1d array-like, or label indicator array / sparse matrix
y_pred: 1d array-like, or label indicator array / sparse matrix
and the output is of type:
float or array of float, shape = [n_unique_labels]
How do I add type hints to this function so that mypy doesn't complain?
I tried variations of the following:
Array1D = NewType('Array1D', Union[np.ndarray, List[np.float64]])
def f1(y_true: Union[List[float], Array1D], y_pred: Union[List[float], Array1D]) -> Dict[str, Union[List[float], Array1D]]:
    return {"f1": 100 * sklearn.metrics.f1_score(y_true, y_pred)}
but that gave errors.
This is the approach I use to avoid similar mypy issues. It takes advantage of numpy typing introduced in 1.20. The ArrayLike type covers List[float], so no need to worry about covering it explicitly.
Running mypy v0.971 with numpy v1.23.1 on this shows no issues.
from typing import List, Dict
import numpy as np
import numpy.typing as npt
import sklearn.metrics
def f1(y_true: npt.ArrayLike, y_pred: npt.ArrayLike) -> Dict[str, npt.ArrayLike]:
    return {"f1": 100 * sklearn.metrics.f1_score(y_true, y_pred)}
y_true_list: List[float] = [1, 0, 1, 0]
y_pred_list: List[float] = [1, 0, 1, 1]
y_true_np: npt.ArrayLike = np.array(y_true_list)
y_pred_np: npt.ArrayLike = np.array(y_pred_list)
assert f1(y_true_list, y_pred_list) == f1(y_true_np, y_pred_np)
Instead of
Array1D = NewType("Array1D", Union[np.ndarray, List[np.float64]])
you may use
Array1D = Union[np.ndarray, List[np.float64]]
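The difference matters because NewType expects a concrete, subclassable class as its second argument, while a plain assignment creates a type alias that can name any type expression, including a Union. Here is a small sketch of the alias form; `scaled_mean` is a hypothetical stand-in for the f1 function so that the example needs only numpy.

```python
from typing import Dict, List, Union

import numpy as np

# A plain assignment is a type alias: Array1D means exactly this Union.
Array1D = Union[np.ndarray, List[float]]

def scaled_mean(values: Array1D) -> Dict[str, float]:
    # Hypothetical helper (not from the question): accepts a list or an array.
    return {"mean": 100 * float(np.mean(values))}

from_list = scaled_mean([1.0, 0.0, 1.0, 0.0])
from_array = scaled_mean(np.array([1.0, 0.0, 1.0, 0.0]))
print(from_list, from_array)
```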

Cannot create a numpy array using numpy's `full()` method and a python list

I can create a numpy array from a python list as follows:
>>> a = [1,2,3]
>>> b = np.array(a).reshape(3,1)
>>> print(b)
[[1]
 [2]
 [3]]
However, I don't know what causes error in the following code:
Code :
>>> a = [1,2,3]
>>> b = np.full((3,1), a)
Error :
ValueError Traceback (most recent call last)
<ipython-input-275-1ab6c109dda4> in <module>()
1 a = [1,2,3]
----> 2 b = np.full((3,1), a)
3 print(b)
/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in full(shape, fill_value, dtype, order)
324 dtype = array(fill_value).dtype
325 a = empty(shape, dtype, order)
--> 326 multiarray.copyto(a, fill_value, casting='unsafe')
327 return a
328
<__array_function__ internals> in copyto(*args, **kwargs)
ValueError: could not broadcast input array from shape (3) into shape (3,1)
Even though the list a has 3 elements and I expect a 3x1 numpy array, full() fails to produce it.
I also read the NumPy broadcasting article, but it focuses on arithmetic operations, so I couldn't get anything useful from it.
It would be great if you could help me understand the difference between the two array-creation methods above, and the cause of the error.
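NumPy cannot broadcast the two shapes together because your list is interpreted as a 1-D array (np.array(a).shape is (3,)), which behaves like a 'row vector', while you are asking for a 'column vector' (shape (3, 1)). Broadcasting aligns shapes starting from the last axis, so the 3 elements would have to fit into an axis of length 1. If you are set on using np.full, you can shape your list as a column vector from the start: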
>>> import numpy as np
>>>
>>> a = [[1],[2],[3]]
>>> b = np.full((3,1), a)
Another option is to convert a into a numpy array ahead of time and add a new axis to match the desired output shape.
>>> a = [1,2,3]
>>> a = np.array(a)[:, np.newaxis]
>>> b = np.full((3,1), a)
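To see the broadcasting rule at work, here is a short sketch with a few target shapes. The variable names are illustrative only.

```python
import numpy as np

a = [1, 2, 3]

# Shape (3,) is aligned with the LAST axis of the target (3, 1).
# That axis has length 1, so three values cannot be broadcast into it.
try:
    np.full((3, 1), a)
    failed = False
except ValueError:
    failed = True

# The same list works when the last axis of the target has length 3:
# every row becomes a copy of a.
row_fill = np.full((2, 3), a)

# A (3, 1) column broadcasts across any number of columns.
col = np.array(a).reshape(3, 1)
col_fill = np.full((3, 2), col)

print(failed)
print(row_fill)
print(col_fill)
```

For a plain 3x1 array built from a list, np.array(a).reshape(3, 1) from the question is already enough; np.full is mainly useful when a fill value has to be repeated.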

Can't reshape my numpy array for training a KNN model

I try to train a KNN model using a Local Binary Pattern (LBP) descriptor.
My data is a numpy.array that should have shape (67, 26), but myarray.shape returns (67,).
I tried to reshape the array like:
myarray.reshape(-1, 26)
but it resulted in the following error:
ValueError: cannot reshape array of size 67 into shape (26)
Thank you so much.
As I'm not sure I've clearly understood your question, first I'm going to try to mock up your data:
In [101]: import numpy as np
In [102]: myarray = np.empty(shape=67, dtype=object)
In [103]: for i in range(len(myarray)):
     ...:     myarray[i] = np.random.rand(26)
Please, run the following code:
In [104]: type(myarray)
Out[104]: numpy.ndarray
In [105]: myarray.shape
Out[105]: (67,)
In [106]: myarray.dtype
Out[106]: dtype('O')
In [107]: type(myarray[0])
Out[107]: numpy.ndarray
In [108]: myarray[0].shape
Out[108]: (26,)
If you get the same results as above, numpy.stack should do the trick, as pointed out by @hpaulj in the comments:
In [109]: x = np.stack(myarray)
In [110]: type(x)
Out[110]: numpy.ndarray
In [111]: x.shape
Out[111]: (67, 26)
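np.stack only works when every inner array has the same shape, so it can help to check that first. The following is a sketch on the same mocked-up data, assuming the real data has the same layout; `row_shapes` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock-up: a 1-D object array whose 67 elements are arrays of length 26.
myarray = np.empty(shape=67, dtype=object)
for i in range(len(myarray)):
    myarray[i] = rng.random(26)

# Collect the distinct shapes of the inner arrays; np.stack raises a
# ValueError if there is more than one.
row_shapes = {row.shape for row in myarray}
print(row_shapes)

x = np.stack(myarray)
print(x.shape, x.dtype)
```

The stacked array has a numeric dtype instead of object, which is the form scikit-learn estimators such as a KNN classifier expect for their feature matrix.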

Semantic similarity to compare two columns in data frames using sklearn

I'm having trouble applying a function that compares two columns.
import nltk, string
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(tokenizer=normalize, stop_words='english')
def cosine_sim1(text1, text2):
    tfidf = vectorizer.fit_transform([text1, text2])
    return ((tfidf * tfidf.T).A)[0,1]
When I call the function:
cosine_sim1('like football', 'football')
the result is:
0.5797386715376657
My problem is applying that function to two columns of a dataframe to calculate the score for each row. Here is a small sample of the data:
d = pd.DataFrame({'A': ['my name is', 'i live in', 'i like football'], 'B': ['london is nice city', 'london city', 'football']})
I tried the following, but it raises an error:
def cosine_sim1(text1, text2):
    tfidf = vectorizer.fit_transform([text1(d['A']), text2(d['B'])])
    return ((tfidf * tfidf.T).A)[0,1]
d.apply(cosine_sim1, axis=1)
The error is:
TypeError: ("cosine_sim1() missing 1 required positional argument: 'text2'", 'occurred at index 0')
I believe it should be
def cosine_sim1(text1, text2):
    tfidf = vectorizer.fit_transform([text1, text2])
    return ((tfidf * tfidf.T).A)[0,1]
d.apply(lambda x: cosine_sim1(x.A, x.B), axis=1)
You are applying the function to the DataFrame, but apply with axis=1 passes each row as a single argument, so the two parameters you defined are never supplied.
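The calling convention can be checked in isolation. This is a minimal sketch that uses a hypothetical stand-in, `token_overlap` (a simple Jaccard score over word sets, not the TF-IDF cosine similarity from the question), so that it runs with pandas alone.

```python
import pandas as pd

d = pd.DataFrame({
    "A": ["my name is", "i live in", "i like football"],
    "B": ["london is nice city", "london city", "football"],
})

def token_overlap(text1, text2):
    # Hypothetical stand-in for cosine_sim1: Jaccard overlap of word sets.
    words1, words2 = set(text1.split()), set(text2.split())
    return len(words1 & words2) / len(words1 | words2)

# apply(axis=1) hands the function ONE argument (the row as a Series),
# so passing a two-parameter function directly raises TypeError.
try:
    d.apply(token_overlap, axis=1)
    raised = False
except TypeError:
    raised = True

# Unpacking the row in a lambda supplies both parameters.
scores = d.apply(lambda row: token_overlap(row["A"], row["B"]), axis=1)
print(raised)
print(scores.tolist())
```

Swapping `token_overlap` for cosine_sim1 gives the same call shape as the d.apply(lambda x: cosine_sim1(x.A, x.B), axis=1) line above.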

numpy code works in REPL, script says type error

Copy and pasting this code into the python3 REPL works, but when I run it as a script, I get a type error.
"""Softmax."""
scores = [3.0, 1.0, 0.2]
import numpy as np
from math import e
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    results = []
    x = np.transpose(x)
    for j in range(len(x)):
        exps = [np.exp(s) for s in x[j]]
        _sum = np.sum(np.exp(x[j]))
        softmax = [i / _sum for i in exps]
        results.append(softmax)
    final = np.vstack(results)
    return np.transpose(final)
    # pass # TODO: Compute and return softmax(x)
print(softmax(scores))
# Plot softmax curves
import matplotlib.pyplot as plt
x = np.arange(-2.0, 6.0, 0.1)
scores = np.vstack([x, np.ones_like(x), 0.2 * np.ones_like(x)])
plt.plot(x, softmax(scores).T, linewidth=2)
plt.show()
The error I get running the script via CLI is the following:
bash$ python3 softmax.py
Traceback (most recent call last):
  File "softmax.py", line 22, in <module>
    print(softmax(scores))
  File "softmax.py", line 13, in softmax
    exps = [np.exp(s) for s in x[j]]
TypeError: 'numpy.float64' object is not iterable
This kind of crap makes me so nervous about running interpreted code in production with libraries like these, seriously unreliable and undefined behaviour is totally unacceptable IMO.
At the top of your script, you define
scores = [3.0, 1.0, 0.2]
This is the argument in your first call of softmax(scores). When converted to a numpy array, scores is a 1-d array with shape (3,).
You pass scores into the function, and then it is converted to a numpy array by the call
x = np.transpose(x)
However, it is still 1-d, with shape (3,). The transpose function swaps dimensions, but it does not add a dimension to a 1-d array. In effect, transpose is a "no-op" when applied to a 1-d array.
Then, in the loop that follows, x[j] is a scalar of type numpy.float64, so it does not make sense to write [np.exp(s) for s in x[j]]. x[j] is a scalar, not a sequence, so you can't iterate over it.
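This behaviour is easy to confirm on its own. The sketch below checks the shape after the transpose and the type of one element, and shows one way (np.atleast_2d, chosen here for illustration) to get a real column whose rows can be iterated.

```python
import numpy as np

scores = [3.0, 1.0, 0.2]

# Transposing a 1-d input converts it to an array but leaves the shape alone.
x = np.transpose(scores)
print(x.shape)       # (3,)
print(type(x[0]))    # <class 'numpy.float64'>

# Promote to 2-d first, then transpose, to get a column of shape (3, 1).
column = np.atleast_2d(scores).T
print(column.shape)  # (3, 1)

# Each row of the column is now a length-1 array, so iterating over it works.
rows = [list(row) for row in column]
```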
In the bottom part of your script, you redefine scores as
x = np.arange(-2.0, 6.0, 0.1)
scores = np.vstack([x, np.ones_like(x), 0.2 * np.ones_like(x)])
Now scores is a 2-d array (scores.shape is (3, 80)), so you don't get an error when you call softmax(scores).
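One way to make the function accept both inputs is to drop the Python loop and reduce along axis 0, which exists for 1-d and 2-d arrays alike. This is a sketch and not the only possible fix; subtracting the maximum is an addition for numerical stability and does not change the result.

```python
import numpy as np

def softmax(x):
    """Softmax along axis 0, for a 1-d score vector or a 2-d array of columns."""
    x = np.asarray(x, dtype=float)
    # Shift by the per-column maximum so np.exp cannot overflow.
    shifted = x - np.max(x, axis=0, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=0, keepdims=True)

# 1-d input, as in the first call of the script.
single = softmax([3.0, 1.0, 0.2])
print(single)

# 2-d input, as in the plotting part of the script: one column per sample.
grid = np.arange(-2.0, 6.0, 0.1)
scores = np.vstack([grid, np.ones_like(grid), 0.2 * np.ones_like(grid)])
batch = softmax(scores)
print(batch.shape)
```

Every column of `batch` sums to 1, and so does `single`.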
