Spark deep learning Import error - apache-spark

I am trying to replicate a deep learning project from https://medium.com/linagora-engineering/making-image-classification-simple-with-spark-deep-learning-f654a8b876b8 . I am working with Spark version 1.6.3 and PySpark, and I have installed Keras and TensorFlow, but every time I try to import from sparkdl it throws an error. When I run this:
from sparkdl import readImages
I get this error:
File "C:\Users\HP\AppData\Local\Temp\spark-802a2258-3089-4ad7-b8cb-6815cbbb019a\userFiles-c9514201-07fa-45f9-9fd8-c8a3a0b4bf70\databricks_spark-deep-learning-0.1.0-spark2.1-s_2.11.jar\sparkdl\transformers\keras_image.py", line 20, in <module>
ImportError: cannot import name 'TypeConverters'
Can someone please help?

This is not a full fix, as I have not yet been able to import from sparkdl in Jupyter notebooks either. However, readImages is also available through ImageSchema in the pyspark.ml.image package (added in Spark 2.3), so you can import it with:
from pyspark.ml.image import ImageSchema
To use it:
imagesDF = ImageSchema.readImages("/path/to/imageFolder")
This gives you a DataFrame of the images with a column named "image".
You can add a label column like this:
labeledImageDF = imagesDF.withColumn("label", lit(0))
Remember to import lit from pyspark.sql.functions first:
from pyspark.sql.functions import lit
Hope this at least partially helps.
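The traceback in the question names a jar built for Spark 2.1 (the file name contains spark2.1), while the question reports Spark 1.6.3. As far as I know, TypeConverters only exists in pyspark.ml.param from Spark 2.0 onward, so a version mismatch is a plausible cause of the ImportError. Below is a minimal sketch of that check. The helper names are hypothetical, and the sparkX.Y naming pattern is assumed from the single jar name in the traceback.

```python
import re


def spark_version_from_jar(jar_name):
    """Return the (major, minor) Spark version embedded in a jar name, or None.

    Assumes the 'sparkX.Y' pattern seen in the traceback; other packages
    may name their jars differently.
    """
    match = re.search(r"spark(\d+)\.(\d+)", jar_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def running_spark_is_not_older(jar_name, running_version):
    """True if the running Spark (e.g. '1.6.3') is not older than the jar's target.

    This only rules out an older runtime; it does not prove compatibility.
    """
    target = spark_version_from_jar(jar_name)
    if target is None:
        return True  # nothing to compare against
    running = tuple(int(part) for part in running_version.split(".")[:2])
    return running >= target


jar = "databricks_spark-deep-learning-0.1.0-spark2.1-s_2.11.jar"
print(spark_version_from_jar(jar))
print(running_spark_is_not_older(jar, "1.6.3"))
```

In a live session, the running version could be taken from the SparkContext's version attribute rather than typed in by hand.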

Related

getting error 'tensorflow.python.ops.rnn_cell_impl' has no attribute '_linear'

I tried the line of code below, but it gives me the following error:
y = rnn_cell_impl._linear(slot_inputs, attn_size, True)
AttributeError: module 'tensorflow.python.ops.rnn_cell_impl' has no attribute '_linear'
I am currently using TensorFlow version 2.10. I tried the solutions I could find, such as
#from tensorflow.contrib.rnn.python.ops import core_rnn_cell
or
#from tensorflow.keras.layers import RNN
but neither of them works.
Can someone help me with this?

ModuleNotFoundError: No module named 'google.cloud.automl_v1beta1.proto'

I am trying to follow this tutorial on Google Cloud Platform,
https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/tables/census_income_prediction/getting_started_notebook.ipynb. However, I run into issues when I try to import the AutoML module, specifically with the two lines below:
# AutoML library.
from google.cloud import automl_v1beta1 as automl
import google.cloud.automl_v1beta1.proto.data_types_pb2 as data_types
The first line works, but for the second one I get the error: ModuleNotFoundError: No module named 'google.cloud.automl_v1beta1.proto'. It seems that for some reason there is no module called proto, and I cannot figure out how to resolve this. There are a couple of posts about not being able to find the module google.cloud, but in my case I can import automl_v1beta1 from google.cloud; it is only proto.data_types_pb2 from google.cloud.automl_v1beta1 that fails.
I think you can:
from google.cloud import automl_v1beta1 as automl
import google.cloud.automl_v1beta1.types as data_types
Or:
import google.cloud.automl_v1beta1 as automl
import google.cloud.automl_v1beta1.types as data_types
But (!) given the import errors, there may be other SDK changes that affect the code that follows.
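To see which of the two module paths your installed SDK actually provides, a minimal sketch like the following may help. The helper name first_importable is hypothetical; it relies only on the standard library's importlib.util.find_spec. Note that find_spec imports parent packages as a side effect.

```python
import importlib.util


def first_importable(candidates):
    """Return the first dotted module name that can be located, or None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ImportError:
            # Raised (as ModuleNotFoundError) when a parent package of
            # `name` is missing or is not a package.
            continue
    return None


candidates = [
    "google.cloud.automl_v1beta1.proto.data_types_pb2",  # import used in the tutorial
    "google.cloud.automl_v1beta1.types",  # import suggested in this answer
]
print(first_importable(candidates))
```

If this prints None, neither path is available, which would suggest the package is not installed in the interpreter you are running.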

Can anyone explain why this Plotly code works in Jupyter Notebook but shows an error when I run it in IntelliJ?

I saw a video on YouTube about plotting a DataFrame using the iplot method imported from the plotly.offline module.
I ran the code in IntelliJ but got an error saying:
'latin-1' codec can't encode characters in position 0-9: ordinal not in range(256)
I looked for a solution but couldn't find anything. Then I ran the same code in a Jupyter notebook and it worked just fine.
Can anyone explain this?
Source code:
import pandas as pd
import numpy as np
import chart_studio.plotly as py
from plotly.offline import *
import cufflinks as cf
init_notebook_mode(connected=True)
cf.go_offline()
df=pd.DataFrame(np.random.randn(50,4),columns=['a','b','c','d'])
df.iplot()

How can I solve cannot import name 'fetch_openml' from 'sklearn.datasets'

I'm learning sklearn, but I can't use fetch_openml(). It says,
ImportError: cannot import name 'fetch_openml' from 'sklearn.datasets'
In newer versions of sklearn (0.20 and later), it is even easier to fetch OpenML datasets. For example, you can import fetch_openml and load the MNIST dataset like this:
from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
print(X.shape, y.shape)
For more details, check the official example.
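If the import still fails, the installed sklearn is probably older than the release that introduced fetch_openml. Here is a minimal sketch of a version gate, assuming the 0.20 cutoff; the helper name mnist_loader_name is hypothetical.

```python
import re


def mnist_loader_name(sklearn_version):
    """Pick a loader name from a version string such as '0.19.2' or '1.3.0'.

    Assumes fetch_openml is available from 0.20 onward; older releases
    only have fetch_mldata.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % sklearn_version)
    version = (int(match.group(1)), int(match.group(2)))
    return "fetch_openml" if version >= (0, 20) else "fetch_mldata"


# In a real session you would pass sklearn.__version__ here.
print(mnist_loader_name("0.19.2"))
print(mnist_loader_name("1.3.0"))
```

If the gate reports fetch_mldata, upgrading sklearn is likely the simpler route than working around the old loader.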
You can use this:
from sklearn.datasets import fetch_openml
Apparently, fetch_mldata has been deprecated in newer sklearn. You can use load_digits instead, but note that it loads sklearn's small 8x8 digits dataset rather than the full MNIST data.
To solve this problem in Jupyter, follow these steps (this only works in sklearn versions that still include fetch_mldata):
Download the mnist-original file from https://osf.io/jda6s/
After downloading, copy the file into C:\Users\YOURUSERNAME\scikit_learn_data\mldata
Then, in the Jupyter notebook, run:
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('mnist-original')

Error when import Spark GaussianMixture

I get the following error
object GaussianMixture is not a member of package org.apache.spark.ml.clustering
when I try the following import from spark-shell
import org.apache.spark.ml.clustering.GaussianMixture
As this is part of Spark, I don't think any dependencies need to be added. Please help me with this issue.
I believe GaussianMixture is in the mllib package in your Spark version. Try this import instead:
import org.apache.spark.mllib.clustering.GaussianMixture
