Passing a python list to keras model.fit - python-3.x

So right now I'm using keras and training my model works perfectly fine, but I have to pass my data as a numpy ndarray. So I have to convert my list of data to a numpy ndarray first and then pass it to keras for training. When I try to pass my Python list directly, even though it has the same shape as the numpy array, I get errors back. Is there any way to avoid numpy here, or am I stuck with it?

Can you explain your problem further? What error message are you getting, and does it occur during training or predicting?
Also, if you could post some code samples, that would help too.
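In the meantime, here is a minimal sketch of the conversion the question describes; model, train_x, and train_y are placeholder names:
import numpy as np

# train_x / train_y start as plain Python lists; fit expects array-like
# input, so convert them explicitly before training.
train_x = np.asarray(train_x, dtype=np.float32)
train_y = np.asarray(train_y, dtype=np.float32)
model.fit(train_x, train_y, epochs=10, batch_size=32)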

Related

Python XGBoost prediction discrepancies with DMatrix

I found there are 2 problems with xgboost predictions. I trained the model with XGBClassifier and tried to load the model using Booster for prediction, and I found:
1. Predictions are slightly different between xgb.Booster and xgb.XGBClassifier, see below.
2. Predictions are different between a list and a numpy array when using DMatrix, see below.
Some of the differences are quite big. I am not sure why this is happening, or which prediction should be the source of truth?
For the second problem, your data types can change when you convert a list to a numpy array (depending on the numpy version you're using). For example, on numpy 1.19.5, try converting the list ["1", 1] to a numpy array and see the result.
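A quick illustration of that coercion (the exact string width in the dtype may vary by numpy version):
import numpy as np

arr = np.array(["1", 1])
# The int is silently coerced to a string; on many numpy versions this
# prints: ['1' '1'] <U21
print(arr, arr.dtype)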

Numpy arrays used in training in TF1 Keras have much lower accuracy in TF2

I had a neural net in keras that performed well. With the deprecations that came with Tensorflow 2, I had to rewrite the model, and now it is giving me worse accuracy metrics.
My suspicion is that TF2 wants you to use their tf.data.Dataset structure to train models, and they give an example of how to go from numpy to tf.data.Dataset here.
So I did:
train_dataset = tf.data.Dataset.from_tensor_slices((X_train_deleted_nans, y_train_no_nans))
train_dataset = train_dataset.shuffle(SHUFFLE_CONST).batch(BATCH_SIZE)
Once the training starts I get this warning error:
2019-10-04 23:47:56.691434: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
Appending .repeat() to the creation of my tf.data.Dataset solved the error, as suggested by duysqubix in the solution posted here:
https://github.com/tensorflow/tensorflow/issues/32817#issuecomment-539200561
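Concretely, the fix amounts to the following (same placeholder names as the question):
train_dataset = tf.data.Dataset.from_tensor_slices((X_train_deleted_nans, y_train_no_nans))
# .repeat() makes the dataset loop indefinitely, so the iterator never
# hits "End of sequence" between epochs.
train_dataset = train_dataset.shuffle(SHUFFLE_CONST).batch(BATCH_SIZE).repeat()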

How to use keras with tensorboard but not through callbacks?

Want to use keras.backend.get_session() to log scalars and images to tensorboard. The session closes when I do this and keras errors out. I think there must be some way of getting the keras graph and doing things with it without breaking it.
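One possible approach, sketched with the TF1-style summary API that keras.backend.get_session() implies; this is untested against the asker's setup and the log directory is a placeholder:
import tensorflow as tf
from tensorflow import keras

# Borrow Keras's session and graph without closing them.
sess = keras.backend.get_session()
writer = tf.summary.FileWriter("./logs", graph=sess.graph)

# Log a scalar manually at a given step.
summary = tf.Summary(value=[tf.Summary.Value(tag="my_scalar", simple_value=0.5)])
writer.add_summary(summary, global_step=0)
writer.flush()  # flush rather than closing the session, so Keras keeps using it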

Loading dataset in UINT8 format - python

I was looking through the load_data() function in python that returns X_train, X_test, Y_train and Y_test, as in this link. As you can see, it is for the CIFAR10 and CIFAR100 datasets, and it returns the above-mentioned values as uint8 arrays.
I wanted to know: is there some other function like this for loading datasets stored locally on our system?
If so, please help me with its usage, and if not, please suggest an alternative.
Thanks in advance.
load_data() is not part of Python itself; it is defined in the keras.datasets.cifar10 module. To load the cifar dataset (or any other dataset), there can be many methods, depending upon how the dataset is packaged/formatted. Usually, the pandas module can be used for loading/saving/manipulating table-like data.
For cifar data, here is another example: loading an image from cifar-10 dataset
Here the author uses the pickle module to unpack the dataset, and then the PIL and numpy modules to load and manipulate individual images.
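A minimal sketch of that pickle-based approach, assuming the python-version CIFAR-10 archive has been downloaded and extracted locally (the batch file path is a placeholder):
import pickle
import numpy as np
from PIL import Image

# Each extracted batch file is a pickled dict with b"data" (uint8 array
# of shape (10000, 3072)) and b"labels" (list of 10000 ints).
with open("cifar-10-batches-py/data_batch_1", "rb") as f:
    batch = pickle.load(f, encoding="bytes")

data = batch[b"data"]
labels = batch[b"labels"]

# Each row stores the R, G, B planes of a 32x32 image; reshape to HWC.
img = data[0].reshape(3, 32, 32).transpose(1, 2, 0)
Image.fromarray(img).save("first_image.png")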

Training Keras model with Dask Array is very slow

I want to use Dask to read a large dataset and feed it to a Keras model. The data consists of audio files, and I am using a custom function to read them. I have applied delayed to this function, and I collect all of the files in a dask array, as:
x = da.stack([da.from_delayed(delayed(get_item_data)(fp, sr, mono, post_processing, data_shape), shape=data_shape, dtype=np.float32) for fp in df['path']])
(See the source)
To train the Keras model, I compute X and Y as above and pass them to the fit function.
However, the training is very slow. I have tried changing the chunksize and it is still very slow.
Could you tell me if I am doing something wrong when creating the array? Or any good practices for it?
Thanks
As far as I know, Keras doesn't have any built-in support for dask arrays, so I'm not sure what will happen when you provide a dask array directly to Keras functions. My guess is that it will automatically convert the dask array into a (possibly very large) numpy array.
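If that guess is right, one workaround is to materialize the array yourself, so the conversion happens once and its cost is visible. This is a sketch, not a recommendation; x follows the question's naming, while y_np and model are placeholders:
# .compute() is the standard dask call that turns a dask array into numpy.
X_np = x.compute()
model.fit(X_np, y_np, batch_size=32, epochs=10)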
