Could not parse Master URL: 'spark.bluemix.net' - apache-spark

I'm trying to connect to IBM's Spark as a Service running on Bluemix from RStudio running on my desktop machine.
I have copied the config.yml from the automatically configured RStudio environment running on IBM's Data Science Experience:
default:
  method: "shell"
CS-DSX:
  method: "bluemix"
  spark.master: "spark.bluemix.net"
  spark.instance.id: "myinstanceid"
  tenant.id: "mytenantid"
  tenant.secret: "mytenantsecret"
  hsui.url: "https://cdsx.ng.bluemix.net"
I am attempting to connect like so:
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "1.6.2") # installed spark to '~/Library/Caches/spark/spark-1.6.2-bin-hadoop2.6'
spark_home = '~/Library/Caches/spark/spark-1.6.2-bin-hadoop2.6'
config = spark_config(file = "./config.yml", use_default = FALSE, config = "CS-DSX")
sc <- spark_connect(spark_home = spark_home, config = config)
The error:
17/03/07 09:36:19 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: 'spark.bluemix.net'
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2735)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2281)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
There are a few other questions on Stack Overflow with similar error messages, but they are not about connecting to the Spark service running on Bluemix.
Update 1
I've changed my config.yml to look like this:
default:
  method: "bluemix"
  spark.master: "spark://spark.bluemix.net:7070"
  spark.instance.id: "7a4089bf-3594-4fdf-8dd1-7e9fd7607be5"
  tenant.id: "sdd1-7e9fd7607be53e-39ca506ba762"
  tenant.secret: "6146a713-949f-4d4e-84c3-9913d2165b9e"
  hsui.url: "https://cdsx.ng.bluemix.net"
... and my connection code to look like this:
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "1.6.2")
spark_home = '~/Library/Caches/spark/spark-1.6.2-bin-hadoop2.6'
config = spark_config(file = "./config.yml", use_default = FALSE)
sc <- spark_connect(spark_home = spark_home, config = config)
However, the error is now:
Error in force(code) :
Failed during initialize_connection: java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:583)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2281)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sparklyr.Invoke$.invoke(invoke.scala:94)
...

The library tries to parse a URL, but you're giving it a hostname.
Try spark://spark.bluemix.net for spark.master.
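For illustration, Spark only accepts a master URL with an explicit scheme; here is a minimal Scala sketch of the distinction (the host comes from the question, while port 7077 is Spark's standalone default and an assumption here):

import org.apache.spark.{SparkConf, SparkContext}

// A bare hostname such as "spark.bluemix.net" fails Spark's master-URL parsing;
// a scheme-qualified URL such as "spark://host:port" parses.
val conf = new SparkConf()
  .setAppName("master-url-check")
  .setMaster("spark://spark.bluemix.net:7077") // assumed port
val sc = new SparkContext(conf)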

Please follow the blog post http://datascience.ibm.com/blog/access-ibm-analytics-for-apache-spark-from-rstudio/ to connect to Bluemix SparkaaS from DSX RStudio.

I received the following response from the engineering team:
The desktop version of RStudio doesn't currently support using the sparklyr package to connect to the Bluemix SparkaaS service.

Related

A request was made to load the default HttpClient provider but one could not be found on the classpath

I'm trying to delete some blobs from Azure Blob Storage using the azure-storage-blob library. My app is deployed on Databricks as a Spark job, and the code worked correctly on my local machine.
I get the error below:
IllegalStateException: A request was made to load the default HttpClient provider but one could not be found on the classpath. If you are using a dependency manager, consider including a dependency on azure-core-http-netty or azure-core-http-okhttp. Depending on your existing dependencies, you have the choice of Netty or OkHttp implementations. Additionally, refer to https://aka.ms/azsdk/java/docs/custom-httpclient to learn about writing your own implementation
My code:
val accountName: String = spark.conf.get("AZURE_BLOB_STORAGE_ACCOUNT_NAME")
val accountKey: String = spark.conf.get(s"fs.azure.account.key.$accountName.blob.core.windows.net")
val endpoint = "https://" + accountName + ".blob.core.windows.net"
val credential = new StorageSharedKeyCredential(accountName, accountKey)
val client = new BlobServiceClientBuilder().endpoint(endpoint).credential(credential).buildClient
val containerClient = client.getBlobContainerClient(containerName)
containerClient
  .listBlobsByHierarchy(s"$folderName/")
  .forEach(blob =>
    containerClient
      .getBlobClient(blob.getName)
      .deleteIfExists()
  )
Any idea how to resolve this problem?
Thank you
You probably have the following merge strategy configured in build.sbt:
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
Please check if META-INF/services directory is available in your jar.
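If it is missing, a merge strategy that keeps the service-loader registrations while still discarding the rest of META-INF could look like the sketch below (sbt-assembly slash syntax; the fallback case is an assumption to adapt to your build):

assembly / assemblyMergeStrategy := {
  // concatenate service-loader files so azure-core can discover its HttpClient provider
  case PathList("META-INF", "services", _*) => MergeStrategy.concat
  // discard the rest of META-INF as before
  case PathList("META-INF", _*) => MergeStrategy.discard
  // fallback -- an assumption; keep your existing default here
  case _ => MergeStrategy.first
}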

Databricks Spark connection issue over Simba JDBC

I am trying to connect to Databricks Spark from Perl code over Simba JDBC (the Databricks-recommended way). For reference, this is the JDBC driver: https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/jdbc/2.6.17/SimbaSparkJDBC42-2.6.17.1021.zip
So far I have managed to set up Perl and all the Perl-related module configuration, and I strongly believe the issue below has nothing to do with Perl.
I have the following code trying to connect to Databricks Spark.
Note: 'replaceme' in the password is the Databricks personal access token.
#!/usr/bin/perl
use strict;
use DBI;
my $user = "token";
my $pass = "replaceme";
my $host = "DBhost.azuredatabricks.net";
my $port = 9001;
my $url = "jdbc:spark://DBhost.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/853imaskedthis14/1005-imaskedthis-okra138;AuthMech=3;UID=token;PWD=replaceme"; # Get this URL from JDBC data src
my %properties = ('user'      => $user,
                  'password'  => $pass,
                  'host.name' => $host,
                  'host.port' => $port);
my $dsn = "dbi:JDBC:hostname=localhost;port=$port;url=$url";
my $dbh = DBI->connect($dsn, undef, undef,
                       { PrintError => 0, RaiseError => 1, jdbc_properties => \%properties })
    or die "Failed to connect: ($DBI::err) $DBI::errstr\n";
my $sql = qq/select * from table/;
my $sth = $dbh->prepare($sql);
$sth->execute();
my @row;
while (@row = $sth->fetchrow_array) {
    print join(", ", @row), "\n";
}
I end up with the following authentication issue and error from the Simba driver connecting to the Spark Thrift server:
failed: [Simba][SparkJDBCDriver](500164) Error initialized or created transport for authentication: Invalid status 21
Also, could not send response: com.simba.spark.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed). at ./perldatabricksconntest.pl line 18.
The logger recorded the Java stack trace below:
[Thread-1] 05:40:16,718 WARN - Error
java.sql.SQLException: [Simba][SparkJDBCDriver](500164) Error initialized or created transport for authentication: Invalid status 21
Also, could not send response: com.simba.spark.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed).
at com.simba.spark.hivecommon.api.HiveServer2ClientFactory.createTransport(Unknown Source)
at com.simba.spark.hivecommon.api.ServiceDiscoveryFactory.createClient(Unknown Source)
at com.simba.spark.hivecommon.core.HiveJDBCCommonConnection.establishConnection(Unknown Source)
at com.simba.spark.spark.core.SparkJDBCConnection.establishConnection(Unknown Source)
at com.simba.spark.jdbc.core.LoginTimeoutConnection.connect(Unknown Source)
at com.simba.spark.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
at com.simba.spark.jdbc.common.AbstractDriver.connect(Unknown Source)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
at com.vizdom.dbd.jdbc.Connection.handleRequest(Connection.java:417)
at com.vizdom.dbd.jdbc.Connection.run(Connection.java:211)
Caused by: com.simba.spark.support.exceptions.GeneralException: [Simba][SparkJDBCDriver](500164) Error initialized or created transport for authentication: Invalid status 21
Also, could not send response: com.simba.spark.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed).
... 11 more
Also, as per the Simba JDBC connector documentation, I have tried No Authentication mode, Username, and Username/Password; none of them work.
So I wonder where the authentication issue in the transport layer is. Note that I have already created a token and put it in the password section when initiating the jdbc:spark call.
You need to generate a personal access token and put it in place of the replaceme string in the JDBC URL. After that you don't need to specify the user & password fields in %properties.
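As an illustration, here is a minimal Scala sketch of the same connection made directly over JDBC. It assumes the Simba SparkJDBC42 driver jar is on the classpath and that the real token is supplied via a hypothetical DATABRICKS_TOKEN environment variable; the host and httpPath are the placeholders from the question:

import java.sql.DriverManager

object DatabricksJdbcCheck {
  def main(args: Array[String]): Unit = {
    // hypothetical env var holding the real personal access token
    val token = sys.env("DATABRICKS_TOKEN")
    val url = "jdbc:spark://DBhost.azuredatabricks.net:443/default;" +
      "transportMode=http;ssl=1;" +
      "httpPath=sql/protocolv1/o/853imaskedthis14/1005-imaskedthis-okra138;" +
      s"AuthMech=3;UID=token;PWD=$token"
    // the credentials travel in the URL, so no separate user/password properties are needed
    val conn = DriverManager.getConnection(url)
    try {
      val rs = conn.createStatement().executeQuery("SELECT 1")
      while (rs.next()) println(rs.getInt(1))
    } finally conn.close()
  }
}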

UnknownHostException for https url

I'm doing
HttpsURLConnection conn = (HttpsURLConnection) new URL("https", "www.sec.gov", 443, "/cgi-bin/browse-edgar?action=getcurrent&CIK=&type=SC%2013D&company=&dateb=&owner=include&start=0&count=40&output=atom").openConnection();
InputStream stream = conn.getInputStream();
but it fails with
java.net.UnknownHostException: www.sec.gov
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:289)
at java.base/sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:182)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:265)
at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:372)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1071)
at java.base/sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1069)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:795)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1068)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
at java.base/sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1512)
at java.base/sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1510)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:795)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1509)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
I can successfully ping the host www.sec.gov and curl the URL. Why is only my Java program failing? Please help.
It is working today without any change. I suspect a hostname DNS timeout may be the reason it didn't work before.
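If a transient DNS failure was the cause, note that the JVM caches failed lookups (for 10 seconds by default), so immediate retries keep failing even after DNS recovers. A hedged Scala sketch that shortens that negative cache and retries the lookup (the security property is standard JDK behaviour; the retry policy is an assumption):

import java.net.InetAddress
import java.security.Security

object DnsRetryCheck {
  def main(args: Array[String]): Unit = {
    // don't cache failed lookups; must be set before the first resolution attempt
    Security.setProperty("networkaddress.cache.negative.ttl", "0")

    val host = "www.sec.gov"
    var addr: Option[InetAddress] = None
    var tries = 0
    while (addr.isEmpty && tries < 3) { // 3 attempts, 2s apart -- an assumption
      try addr = Some(InetAddress.getByName(host))
      catch { case _: java.net.UnknownHostException => tries += 1; Thread.sleep(2000) }
    }
    println(addr.getOrElse(sys.error(s"could not resolve $host")))
  }
}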

Cannot connect to Apache Ignite on Azure Kubernetes from .NET Core app

I am new to Ignite and Kubernetes. I have a .NET Core 3.1 web application hosted on Azure Linux App Service.
I followed the instructions (Apache Ignite instructions on the official site) and got Apache Ignite running on Azure Kubernetes. I could create a sample table, and read-write actions worked successfully. Here is the screenshot of my successful tests in PowerShell.
Now I am trying to connect to Apache Ignite from my .NET Core web app, but I couldn't make it work.
My code is below. I tried to connect with both IgniteConfiguration and SpringCfgXml, but both produce errors.
private void Initialize()
{
    var cfg = GetIgniteConfiguration();
    _ignite = Ignition.Start(cfg);
    InitializeCaches();
}

public IgniteConfiguration GetIgniteConfiguration()
{
    var appSettingsJson = AppSettingsJson.GetAppSettings();
    var igniteNodes = appSettingsJson["AppSettings:IgniteNodes"];
    var nodeList = igniteNodes.Split(",");
    var config = new IgniteConfiguration
    {
        Logger = new IgniteLogger(),
        DiscoverySpi = new TcpDiscoverySpi
        {
            IpFinder = new TcpDiscoveryStaticIpFinder
            {
                Endpoints = nodeList
            },
            SocketTimeout = TimeSpan.FromSeconds(5)
        },
        IncludedEventTypes = EventType.CacheAll,
        CacheConfiguration = GetCacheConfiguration()
    };
    return config;
}
The first error I get:
Apache.Ignite.Core.Common.IgniteException HResult=0x80131500
Message=Java class is not found (did you set IGNITE_HOME environment variable?):
org/apache/ignite/internal/processors/platform/PlatformIgnition
Source=Apache.Ignite.Core
Also, I have no idea what I should set IGNITE_HOME to, or which username and secret to use for authentication.
Solution:
I finally connected to Ignite on Azure Kubernetes.
Here is my connection method.
public void TestConnection()
{
    var cfg = new IgniteClientConfiguration
    {
        Host = "MyHost",
        Port = 10800,
        UserName = "user",
        Password = "password"
    };
    using (IIgniteClient client = Ignition.StartClient(cfg))
    {
        var employeeCache1 = client.GetOrCreateCache<int, Employee>(
            new CacheClientConfiguration(EmployeeCacheName, typeof(Employee)));
        employeeCache1.Put(1, new Employee("Bilge Wilson", 12500, 1));
    }
}
To find the host IP, user name, and client secret, please check the images below.
Client Id and Secret
IP Addresses
Note: I didn't need to set the IGNITE_HOME or JAVA_HOME variables.
The simplest way is to download the Apache Ignite binary distribution (of the same version as the one you use), unzip it to a directory, and point the IGNITE_HOME environment variable or the IgniteConfiguration.IgniteHome configuration property to the absolute path of the unzipped apache-ignite-n.n.n-bin/ directory.
We support doing that automatically for Windows-hosted apps but not for Linux-based deployments.
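For comparison, the asker's working approach is Ignite's thin client, which talks to port 10800 and needs no IGNITE_HOME at all. A minimal sketch with Ignite's Java thin-client API (written in Scala here; host and credentials are placeholders):

import org.apache.ignite.Ignition
import org.apache.ignite.client.IgniteClient
import org.apache.ignite.configuration.ClientConfiguration

object ThinClientCheck {
  def main(args: Array[String]): Unit = {
    val cfg = new ClientConfiguration()
      .setAddresses("MyHost:10800") // placeholder host, default thin-client port
      .setUserName("user")          // placeholder credentials
      .setUserPassword("password")
    val client: IgniteClient = Ignition.startClient(cfg)
    try {
      val cache = client.getOrCreateCache[Int, String]("employee-cache")
      cache.put(1, "Bilge Wilson")
      println(cache.get(1))
    } finally client.close()
  }
}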

Connecting to BigQuery with Spark locally

I am trying to run code that reads from BigQuery and does some transformation using Spark. I am getting the error below:
Exception in thread "main" java.io.IOException: Error accessing: bucket: test-n, object: spark/output/wordcount
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.wrapException(GoogleCloudStorageImpl.java:1707)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1733)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1618)
at com.google.cloud.hadoop.gcsio.ForwardingGoogleCloudStorage.getItemInfo(ForwardingGoogleCloudStorage.java:214)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1094)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1422)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at com.spark.bigquery.App.main(App.java:67)
Caused by: com.google.api.client.auth.oauth2.TokenResponseException: 400 Bad Request
{
"error" : "invalid_grant",
"error_description" : "Robot is missing a project number."
}
I have configured the service account and email in the code, and placed the P12 file as well.
conf.set("fs.gs.project.id", projectId);
// Use service account for authentication. The service account key file is located at the path
// specified by the configuration property google.cloud.auth.service.account.json.keyfile.
conf.set(EntriesCredentialConfiguration.BASE_KEY_PREFIX +
EntriesCredentialConfiguration.ENABLE_SERVICE_ACCOUNTS_SUFFIX,
"true");
conf.set(EntriesCredentialConfiguration.BASE_KEY_PREFIX +
EntriesCredentialConfiguration.SERVICE_ACCOUNT_KEYFILE_SUFFIX,
"aesthetic-genre-216711-3a23f8112565.p12");
conf.set(EntriesCredentialConfiguration.BASE_KEY_PREFIX +
EntriesCredentialConfiguration.SERVICE_ACCOUNT_EMAIL_SUFFIX,
"reddevil.c06#gmail.com");
