Error when import Spark GaussianMixture - apache-spark

I get the following error
object GaussianMixture is not a member of package org.apache.spark.ml.clustering
when I try the following import from spark-shell:
import org.apache.spark.ml.clustering.GaussianMixture
As this is part of Spark, I don't think any dependencies need to be added. Please help me with this issue.

I believe GaussianMixture lives in the mllib package on your Spark version (the DataFrame-based ml.clustering.GaussianMixture was only added in Spark 2.0). Try to import:
import org.apache.spark.mllib.clustering.GaussianMixture

Related

ModuleNotFoundError: No module named 'google.cloud.automl_v1beta1.proto'

I am trying to follow this tutorial on Google Cloud Platform,
https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/tables/census_income_prediction/getting_started_notebook.ipynb, but I am running into issues when I try to import the AutoML module, specifically the two lines below:
# AutoML library.
from google.cloud import automl_v1beta1 as automl
import google.cloud.automl_v1beta1.proto.data_types_pb2 as data_types
The first line works, but for the second one I get the error: ModuleNotFoundError: No module named 'google.cloud.automl_v1beta1.proto'. For some reason there is no module called proto, and I cannot figure out how to resolve this. There are a couple of posts about not being able to find the module google.cloud; in my case I am able to import automl_v1beta1 from google.cloud, but not proto.data_types_pb2 from google.cloud.automl_v1beta1.
I think you can:
from google.cloud import automl_v1beta1 as automl
import google.cloud.automl_v1beta1.types as data_types
Or:
import google.cloud.automl_v1beta1 as automl
import google.cloud.automl_v1beta1.types as data_types
But (!) given these import errors, there may be other SDK changes that affect the code further down in the notebook.
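As a quick sanity check before changing anything else in the notebook, something like the following (just a diagnostic sketch, assuming the imports above succeed) shows where the installed package resolves from and which classes the types module actually exposes, so you can match them against what the tutorial references:
from google.cloud import automl_v1beta1 as automl
import google.cloud.automl_v1beta1.types as data_types

# Where does the installed package live, and what does the types module expose?
print(automl.__file__)
print([name for name in dir(data_types) if "DataType" in name])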

ImportError Faker Package Still

Trying to create dummy data from a dataset.
Looked for hours to see why I'm getting this ImportError. I have Faker 2.0.0 installed.
import unicodecsv as csv
from faker import Faker
from collections import defaultdict
ImportError: cannot import name 'Faker' from 'faker' (unknown location)
Still receiving this error message! I tried using solutions from other forum questions, to no avail. Anyone have suggestions?
I found it. If you are dealing with the same issue, look in the Scripts folder within your Python directory. You'll find an executable also named faker that shadows the package. Rename it and you'll be good to go.
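To confirm which faker Python is actually picking up before renaming anything, a small check like this (not from the original answer, just a diagnostic sketch) usually points straight at the shadowing file:
import faker

# If this prints something outside site-packages (for example a file in
# the Scripts folder or the current directory), that copy is shadowing
# the installed Faker package and causing the ImportError above.
print(getattr(faker, "__file__", None) or list(faker.__path__))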

Why the import "from tensorflow.train import Feature" doesn't work

That's probably a total noob question about Python module importing, but I can't understand why the following is valid:
> import tensorflow as tf
> f = tf.train.Feature()
> from tensorflow import train
> f = train.Feature()
But the following statement causes an error:
> from tensorflow.train import Feature
ModuleNotFoundError: No module named 'tensorflow.train'
Can somebody please explain to me why it doesn't work this way? My goal is to use shorter notation in the code, like this:
> example = Example(
      features=Features(feature={
          'x1': Feature(float_list=FloatList(value=feature_x1.ravel())),
          'x2': Feature(float_list=FloatList(value=feature_x2.ravel())),
          'y': Feature(int64_list=Int64List(value=label))
      })
  )
The TensorFlow version is 1.7.0.
Solution
Replace
from tensorflow.train import Feature
with
from tensorflow.core.example.feature_pb2 import Feature
Explanation
Remarks about TensorFlow's Aliases
In general, you have to remember that, for example:
from tensorflow import train
is actually an alias for
from tensorflow.python.training import training
You can easily check the real module name by printing the module. For the current example you will get:
from tensorflow import train
print (train)
<module 'tensorflow.python.training.training' from ....
Your Problem
In TensorFlow 1.7, you can't use from tensorflow.train import Feature, because the from ... import clause needs an actual module name (not an alias). Since train is only an alias, you get the ModuleNotFoundError shown above.
By doing
from tensorflow import train
print (train.Feature)
<class 'tensorflow.core.example.feature_pb2.Feature'>
you'll see the full path of the class behind train.Feature. Now you can use that import path, as shown in the solution above.
Note
In TensorFlow 1.9.0, from tensorflow.train import Feature will work, because tensorflow.train is an actual package, which you can therefore import. (This is what I see in my installed TensorFlow 1.9.0, as well as in the documentation, but not in the GitHub repository. It must be generated somewhere.)
Info about the path of the modules
You can find the complete module path in the docs. Every module page has a "Defined in" section; see, for example, the page for Module: tf.train.
I would advise against importing Feature (or any other object) from the non-public API, which is inconvenient (you have to figure out where Feature is actually defined), verbose, and subject to change in future versions.
As an alternative, I would suggest simply defining
import tensorflow as tf
Feature = tf.train.Feature
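Following that approach with the asker's snippet, the other protos can be aliased the same way, so the short notation works without touching the non-public API (the feature values below are just placeholders):
import tensorflow as tf

# Public-API aliases; same idea as Feature = tf.train.Feature above.
Example = tf.train.Example
Features = tf.train.Features
Feature = tf.train.Feature
FloatList = tf.train.FloatList
Int64List = tf.train.Int64List

example = Example(
    features=Features(feature={
        'x1': Feature(float_list=FloatList(value=[1.0, 2.0])),
        'y': Feature(int64_list=Int64List(value=[1])),
    })
)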

Spark deep learning Import error

I am trying to replicate a deep learning project from https://medium.com/linagora-engineering/making-image-classification-simple-with-spark-deep-learning-f654a8b876b8. I am working on Spark version 1.6.3 and have installed Keras and TensorFlow, but every time I try to import from sparkdl it throws an error. I am working in PySpark. When I run this:
from sparkdl import readImages
I get this error:
File "C:\Users\HP\AppData\Local\Temp\spark-802a2258-3089-4ad7-b8cb-
6815cbbb019a\userFiles-c9514201-07fa-45f9-9fd8-
c8a3a0b4bf70\databricks_spark-deep-learning-0.1.0-spark2.1-
s_2.11.jar\sparkdl\transformers\keras_image.py", line 20, in <module>
ImportError: cannot import name 'TypeConverters'
Can someone please help?
It's not a full fix, as I haven't yet been able to import things from sparkdl in Jupyter notebooks either, but:
readImages is a function in the pyspark.ml.image package (exposed through ImageSchema),
so to import it you need:
from pyspark.ml.image import ImageSchema
to use it:
imagesDF = ImageSchema.readImages("/path/to/imageFolder")
This will give you a DataFrame of the images, with an "image" column.
You can add a label column like so:
labeledImageDF = imagesDF.withColumn("label", lit(0))
but remember to import functions from pyspark.sql.functions in order to use lit:
from pyspark.sql.functions import *
Hope this at least partially helps
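Putting the pieces together, a minimal sketch of the workaround (assuming Spark 2.3 or later, where pyspark.ml.image is available; the image folder path is a placeholder) looks like this:
from pyspark.ml.image import ImageSchema
from pyspark.sql.functions import lit

# Read a folder of images into a DataFrame with an "image" struct column,
# then attach a constant label column.
imagesDF = ImageSchema.readImages("/path/to/imageFolder")
labeledImageDF = imagesDF.withColumn("label", lit(0))
labeledImageDF.printSchema()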

Pyspark reads csv - NameError: name 'spark' is not defined

I am trying to run the following code in Databricks in order to get a Spark session and use it to open a CSV file:
spark
fireServiceCallsDF = spark.read.csv('/mnt/sf_open_data/fire_dept_calls_for_service/Fire_Department_Calls_for_Service.csv', header=True, inferSchema=True)
And I get the following error:
NameError: name 'spark' is not defined
Any idea what might be wrong?
I have also tried to run:
from pyspark.sql import SparkSession
But got the following in response:
ImportError: cannot import name SparkSession
If it helps, I am trying to follow this example (you will understand better if you watch it from 17:30 on):
https://www.youtube.com/watch?v=K14plpZgy_c&list=PLIxzgeMkSrQ-2Uizm4l0HjNSSy2NxgqjX
I got it to work by using the following imports:
from pyspark import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext
I got the idea by looking into the PySpark code, since I found that reading CSV was working in the interactive shell.
Please note that the example code you are using is for Spark version 2.x.
"spark" and "SparkSession" are not available in Spark 1.x; the error messages you are getting point to a possible version issue (Spark 1.x).
Check the Spark version you are using.
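If you are on Spark 2.x (or after upgrading), a minimal sketch for creating the entry point yourself when the environment does not predefine spark would be:
from pyspark.sql import SparkSession

# Build (or reuse) the session that Spark 2.x shells normally predefine
# as `spark`; the app name is arbitrary.
spark = SparkSession.builder.appName("fire-service-calls").getOrCreate()

fireServiceCallsDF = spark.read.csv(
    '/mnt/sf_open_data/fire_dept_calls_for_service/Fire_Department_Calls_for_Service.csv',
    header=True, inferSchema=True)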
