I'm doing a tutorial on using Keras, and in the tutorial there is a function called TimeseriesGenerator to split temporal data, as follows:
But when importing TimeseriesGenerator there is a warning that the function will soon be deprecated and that it is advised to use tf.data.Dataset instead. Question is how? I.e. how do I rewrite the code
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

for i in TimeseriesGenerator(range(10), range(10), length=3, batch_size=1):
    print(i)
(array([[0, 1, 2]]), array([3]))
(array([[1, 2, 3]]), array([4]))
(array([[2, 3, 4]]), array([5]))
(array([[3, 4, 5]]), array([6]))
(array([[4, 5, 6]]), array([7]))
(array([[5, 6, 7]]), array([8]))
(array([[6, 7, 8]]), array([9]))
using tf.data.Dataset instead?
And why did Google choose to make a rather good, straightforward function obsolete and its replacement complex?
I tried this:

from tensorflow.data import Dataset

ds = Dataset.from_tensor_slices(range(10))
# Apply the desired transformations
ds = ds.window(3, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda x: x.batch(3))

# Iterate over the dataset and print the elements
for i in ds.batch(1):
    print(i)
but it produces

tf.Tensor([[0 1 2]], shape=(1, 3), dtype=int32)
tf.Tensor([[1 2 3]], shape=(1, 3), dtype=int32)
tf.Tensor([[2 3 4]], shape=(1, 3), dtype=int32)
tf.Tensor([[3 4 5]], shape=(1, 3), dtype=int32)
tf.Tensor([[4 5 6]], shape=(1, 3), dtype=int32)
tf.Tensor([[5 6 7]], shape=(1, 3), dtype=int32)
tf.Tensor([[6 7 8]], shape=(1, 3), dtype=int32)
tf.Tensor([[7 8 9]], shape=(1, 3), dtype=int32)
which is obviously missing the targets that TimeseriesGenerator returned as the second array of each tuple.
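For reference, here is a minimal sketch of one way to reproduce the TimeseriesGenerator output with tf.data, pairing each window with the value that follows it (the slicing offsets are my own choice, not an official migration recipe):

import tensorflow as tf

data = list(range(10))
length = 3

# Input windows: every run of `length` consecutive values.
inputs = tf.data.Dataset.from_tensor_slices(data)
inputs = inputs.window(length, shift=1, drop_remainder=True)
inputs = inputs.flat_map(lambda w: w.batch(length))

# Targets: the value immediately after each window. zip stops at the
# shorter dataset, so the final window [7 8 9] (which has no following
# value) is dropped, matching TimeseriesGenerator.
targets = tf.data.Dataset.from_tensor_slices(data[length:])

for x, y in tf.data.Dataset.zip((inputs, targets)).batch(1):
    print(x.numpy(), y.numpy())

Alternatively, Keras ships a higher-level helper: tf.keras.utils.timeseries_dataset_from_array(data[:-1], data[length:], sequence_length=length, batch_size=1) builds the same (window, target) pairs directly.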
Example:
nums = [1,2,3,5,10,9,8,9,10,11,7,8,7]
I am trying to find the first index of each run of 3 or more numbers moving consecutively in the -1 or +1 direction.
So the desired output from the above nums would be:
[0,4,6,7]
I have tried
grplist = [list(group) for group in more_itertools.consecutive_groups(nums)]
output: [[1, 2, 3], [5], [10], [9], [8, 9, 10, 11], [7, 8], [7]]
It returns nested lists, but it only seems to go in the +1 direction, and it does not return the starting index.
from itertools import groupby
from operator import itemgetter
listindx = [list(j) for i, j in groupby(enumerate(nums), key=itemgetter(1))]
output: [[(0, 1)], [(1, 2)], [(2, 3)], [(3, 5)], [(4, 10)], [(5, 9)], [(6, 8)], [(7, 9)], [(8, 10)], [(9, 11)], [(10, 7)], [(11, 8)], [(12, 7)]]
This does not check for consecutive runs but it does return indices.
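One reading that reproduces the desired [0, 4, 6, 7] is: report the start index of every length-3 window whose two consecutive differences are both +1 or both -1 (which is why 7 appears alongside 6). A minimal sketch under that assumption:

nums = [1, 2, 3, 5, 10, 9, 8, 9, 10, 11, 7, 8, 7]

# A window starting at i qualifies when its two steps are equal and are +1 or -1.
starts = [
    i for i in range(len(nums) - 2)
    if nums[i + 1] - nums[i] == nums[i + 2] - nums[i + 1]
    and abs(nums[i + 1] - nums[i]) == 1
]
print(starts)  # [0, 4, 6, 7]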
I'm working on an ML project for which I'm using numpy arrays instead of pandas for faster computation.
When I intend to bootstrap, I wish to subset the columns from a numpy ndarray.
My numpy array looks like this:
np_arr =
[(187., 14.45 , 20.22, 94.49)
(284., 10.44 , 15.46, 66.62)
(415., 11.13 , 22.44, 71.49)]
And I want to index columns 1,3.
I have my columns stored in a list as ix = [1,3]
However, when I try to do np_arr[:, ix] I get an error saying "too many indices for array".
I also realised that when I print np_arr.shape I only get (3,), whereas I probably want (3,4).
Could you please tell me how to fix my issue?
Thanks!
Edit:
I'm creating my numpy object from my pandas dataframe like this:
def _to_numpy(self, data):
    v = data.reset_index()
    np_res = np.rec.fromrecords(v, names=v.columns.tolist())
    return np_res
The reason for your issue is that your np_arr is a 1-D array. Share your code snippet as well so the exact problem can be looked into. But in general, when dealing with 2-D numpy arrays, we do something like this:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
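With a genuine 2-D array like that, positional column indexing with the ix list from the question works as expected:

import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
ix = [1, 3]
print(a.shape)   # (3, 4)
print(a[:, ix])  # columns 1 and 3:
# [[ 2  4]
#  [ 6  8]
#  [10 12]]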
You have created a record array (also called a structured array). The result is a 1d array with named columns (fields).
To illustrate:
In [426]: df = pd.DataFrame(np.arange(12).reshape(4,3), columns=['A','B','C'])
In [427]: df
Out[427]:
A B C
0 0 1 2
1 3 4 5
2 6 7 8
3 9 10 11
In [428]: arr = df.to_records()
In [429]: arr
Out[429]:
rec.array([(0, 0, 1, 2), (1, 3, 4, 5), (2, 6, 7, 8), (3, 9, 10, 11)],
dtype=[('index', '<i8'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8')])
In [430]: arr['A']
Out[430]: array([0, 3, 6, 9])
In [431]: arr.shape
Out[431]: (4,)
to_records has an index parameter; df.to_records(index=False) eliminates the index field.
Or with your method:
In [432]:
In [432]: arr = np.rec.fromrecords(df, names=df.columns.tolist())
In [433]: arr
Out[433]:
rec.array([(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)],
dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<i8')])
In [434]: arr['A'] # arr.A also works
Out[434]: array([0, 3, 6, 9])
In [435]: arr.shape
Out[435]: (4,)
And multifield access:
In [436]: arr[['A','C']]
Out[436]:
rec.array([(0, 2), (3, 5), (6, 8), (9, 11)],
dtype={'names':['A','C'], 'formats':['<i8','<i8'], 'offsets':[0,16], 'itemsize':24})
Note that the str display of this array
In [437]: print(arr)
[(0, 1, 2) (3, 4, 5) (6, 7, 8) (9, 10, 11)]
shows a list of tuples, just as your np_arr. Each tuple is a 'record'. The repr display shows the dtype as well.
You can't have it both ways: either access columns by name, or make a regular numpy array and access columns by number. The named/record access makes most sense when columns are a mix of dtypes - string, int, float. If they are all float, and you want to do calculations across columns, it's better to use the numeric dtype.
In [438]: arr = df.to_numpy()
In [439]: arr
Out[439]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
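If you are stuck with an already-built record array, one option (a sketch; it assumes all fields share a numeric dtype, and the field names below are placeholders) is numpy.lib.recfunctions.structured_to_unstructured, which yields a plain 2-D array that accepts the ix = [1, 3] indexing from the question:

import numpy as np
from numpy.lib import recfunctions as rfn

# Field names are placeholders for illustration.
rec = np.rec.fromrecords(
    [(187., 14.45, 20.22, 94.49),
     (284., 10.44, 15.46, 66.62),
     (415., 11.13, 22.44, 71.49)],
    names=['f0', 'f1', 'f2', 'f3'])

arr = rfn.structured_to_unstructured(rec)  # plain float array, shape (3, 4)
ix = [1, 3]
print(arr[:, ix])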
I am trying to use NCHW, i.e. the channels-first data format, on my CPU. It is a max-pool layer as part of ResNet18.
MaxPooling2D(pool_size=[3, 3], strides=2, padding='same', data_format='channels_first')
And the error i am getting is:
InvalidArgumentError (see above for traceback): Default MaxPoolingOp only supports NHWC on device type CPU
[[Node: max_pooling2d_3/MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="SAME", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_normalization_51/cond/Merge)]]
Is there a way to fix this? I have also tried data_format="NCHW" but it gave the same error.
Can you please try with a simple model to debug the issue? This works on my system with CPU.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import MaxPooling2D

model = Sequential()
model.add(MaxPooling2D(pool_size=[3, 3], strides=2, padding='same',
                       data_format='channels_first', input_shape=(3, 224, 224)))
model.summary()

X = np.random.randn(1, 3, 224, 224)
Y = model.predict(X)
print(Y.shape)
# (1, 3, 112, 112)
pip install intel-tensorflow
solved the problem, but the training seems to be slower than before.
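If switching TensorFlow builds is not an option, a common workaround is to transpose to NHWC just around the op. A sketch, assuming TF 2.x (tf.nn.max_pool2d; the helper name is mine):

import tensorflow as tf

def max_pool_nchw_on_cpu(x):
    """Max-pool an NCHW tensor on CPU by round-tripping through NHWC."""
    x = tf.transpose(x, [0, 2, 3, 1])                            # NCHW -> NHWC
    x = tf.nn.max_pool2d(x, ksize=3, strides=2, padding='SAME')  # spatial pooling
    return tf.transpose(x, [0, 3, 1, 2])                         # NHWC -> NCHW

x = tf.random.normal((1, 3, 224, 224))
print(max_pool_nchw_on_cpu(x).shape)  # (1, 3, 112, 112)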
I am attempting to stride over the channel dimension, and the following code exhibits surprising behaviour. It is my expectation that tf.nn.max_pool and tf.nn.avg_pool should produce tensors of identical shape when fed the exact same arguments. This is not the case.
import tensorflow as tf

x = tf.get_variable('x', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)
ksize = (1, 2, 2, 2)
strides = (1, 2, 2, 2)
max_pool = tf.nn.max_pool(x, ksize, strides, padding='SAME')
avg_pool = tf.nn.avg_pool(x, ksize, strides, padding='SAME')
print(max_pool.shape)
print(avg_pool.shape)
This prints
$ python ex04/mini.py
(100, 16, 16, 32)
(100, 16, 16, 64)
Clearly, I am misunderstanding something.
The link https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues/19 states:
The first and last stride must always be 1,
because the first is for the image-number and
the last is for the input-channel.
Turns out this is really a bug.
https://github.com/tensorflow/tensorflow/issues/14886#issuecomment-352934112
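Until that is resolved, a 2x2x2 average pool can be computed exactly in two steps, because a mean of non-overlapping means equals the joint mean: pool spatially, then average adjacent channel pairs with a reshape. A sketch, assuming TF 2.x:

import tensorflow as tf

x = tf.fill((100, 32, 32, 64), 5.0)

# Spatial 2x2 average pooling with stride 2; channels are untouched.
spatial = tf.nn.avg_pool2d(x, ksize=2, strides=2, padding='SAME')  # (100, 16, 16, 64)

# Average adjacent channel pairs: view the 64 channels as (32, 2) and reduce.
n, h, w, c = spatial.shape
pooled = tf.reduce_mean(tf.reshape(spatial, (n, h, w, c // 2, 2)), axis=-1)
print(pooled.shape)  # (100, 16, 16, 32)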
I am having a small problem dealing with a Python Spark RDD. My RDD looks like
old_rdd = [(A1, Vector(V1)), (A2, Vector(V2)), (A3, Vector(V3)), ...].
I want to use flatMap, so as to get new rdd like:
new_rdd = [((A1, A2), (V1, V2)), ((A1, A3), (V1, V3))] and so on.
The problem is that flatMap flattens the tuples, producing something like [(A1, V1, A2, V2), ...]. Do you have any alternative suggestions, with or without flatMap()? Thank you in advance.
This is related to Explicit sort in Cartesian transformation in Scala Spark. However, I will suppose that you have already cleaned the RDD of duplicates, that the ids have some simple pattern to parse and identify, and for simplicity I will think in terms of lists instead of Vectors:
old_rdd = sc.parallelize([(1, [1, -2]), (2, [5, 7]), (3, [8, 23]), (4, [-1, 90])])

# cartesian provides all the permutations, but combinations are a subset of
# the permutations, so we need to filter.
combined_rdd = old_rdd.cartesian(old_rdd)
combinations = combined_rdd.filter(lambda pair: pair[0][0] < pair[1][0])
combinations.collect()

# The output will be...
# -----------------------------
# [((1, [1, -2]), (2, [5, 7])),
#  ((1, [1, -2]), (3, [8, 23])),
#  ((1, [1, -2]), (4, [-1, 90])),
#  ((2, [5, 7]), (3, [8, 23])),
#  ((2, [5, 7]), (4, [-1, 90])),
#  ((3, [8, 23]), (4, [-1, 90]))]

# Now we reshape each pair into the tuple layout you want
combinations = combinations.map(
    lambda pair: ((pair[0][0], pair[1][0]), (pair[0][1], pair[1][1]))).collect()
# The output will be...
# ----------------------
# [((1, 2), ([1, -2], [5, 7])),
# ((1, 3), ([1, -2], [8, 23])),
# ((1, 4), ([1, -2], [-1, 90])),
# ((2, 3), ([5, 7], [8, 23])),
# ((2, 4), ([5, 7], [-1, 90])),
# ((3, 4), ([8, 23], [-1, 90]))]