Could not parse Master URL: 'spark.bluemix.net' - apache-spark

I'm trying to connect to IBM's Spark as a Service running on Bluemix from RStudio running on my desktop machine.
I have copied the config.yml from the automatically configured RStudio environment running on IBM's Data Science Experience:
default:
  method: "shell"
CS-DSX:
  method: "bluemix"
  spark.master: "spark.bluemix.net"
  spark.instance.id: "myinstanceid"
  tenant.id: "mytenantid"
  tenant.secret: "mytenantsecret"
  hsui.url: "https://cdsx.ng.bluemix.net"
I am attempting to connect like so:
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "1.6.2") # installed spark to '~/Library/Caches/spark/spark-1.6.2-bin-hadoop2.6'
spark_home = '~/Library/Caches/spark/spark-1.6.2-bin-hadoop2.6'
config = spark_config(file = "./config.yml", use_default = FALSE, config = "CS-DSX")
sc <- spark_connect(spark_home = spark_home, config = config)
The error:
17/03/07 09:36:19 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: 'spark.bluemix.net'
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2735)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2281)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
There are a few other questions on Stack Overflow with similar error messages, but they are not about connecting to the Spark service running on Bluemix.
Update 1
I've changed my config.yml to look like this:
default:
  method: "bluemix"
  spark.master: "spark://spark.bluemix.net:7070"
  spark.instance.id: "7a4089bf-3594-4fdf-8dd1-7e9fd7607be5"
  tenant.id: "sdd1-7e9fd7607be53e-39ca506ba762"
  tenant.secret: "6146a713-949f-4d4e-84c3-9913d2165b9e"
  hsui.url: "https://cdsx.ng.bluemix.net"
... and my connection code to look like this:
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "1.6.2")
spark_home = '~/Library/Caches/spark/spark-1.6.2-bin-hadoop2.6'
config = spark_config(file = "./config.yml", use_default = FALSE)
sc <- spark_connect(spark_home = spark_home, config = config)
However, the error is now:
Error in force(code) :
Failed during initialize_connection: java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:583)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2281)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sparklyr.Invoke$.invoke(invoke.scala:94)
...

The library tries to parse a URL, but you're giving it a hostname.
Try spark://spark.bluemix.net for spark.master.
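For illustration, Spark only accepts a master URL with an explicit scheme; here is a minimal Scala sketch of the distinction (the host comes from the question, while port 7077 is Spark's standalone default and an assumption here):

import org.apache.spark.{SparkConf, SparkContext}

// A bare hostname such as "spark.bluemix.net" fails Spark's master-URL parsing;
// a scheme-qualified URL such as "spark://host:port" parses.
val conf = new SparkConf()
  .setAppName("master-url-check")
  .setMaster("spark://spark.bluemix.net:7077") // assumed port
val sc = new SparkContext(conf)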

Please follow the blog post http://datascience.ibm.com/blog/access-ibm-analytics-for-apache-spark-from-rstudio/ to connect to Bluemix SparkaaS from DSX RStudio.

I received the following response from the engineering team:
The desktop version of RStudio doesn't currently support using the sparklyr package to connect to the Bluemix SparkaaS service.

Related

A request was made to load the default HttpClient provider but one could not be found on the classpath

I'm trying to delete some blobs from Azure Blob Storage using the azure-storage-blob library. My app is deployed on Databricks as a Spark job, and the code worked correctly on my local machine.
I get the error below:
IllegalStateException: A request was made to load the default HttpClient provider but one could not be found on the classpath. If you are using a dependency manager, consider including a dependency on azure-core-http-netty or azure-core-http-okhttp. Depending on your existing dependencies, you have the choice of Netty or OkHttp implementations. Additionally, refer to https://aka.ms/azsdk/java/docs/custom-httpclient to learn about writing your own implementation
My code:
val accountName: String = spark.conf.get("AZURE_BLOB_STORAGE_ACCOUNT_NAME")
val accountKey: String = spark.conf.get(s"fs.azure.account.key.$accountName.blob.core.windows.net")
val endpoint = "https://" + accountName + ".blob.core.windows.net"
val credential = new StorageSharedKeyCredential(accountName, accountKey)
val client = new BlobServiceClientBuilder().endpoint(endpoint).credential(credential).buildClient
val containerClient = client.getBlobContainerClient(containerName)
containerClient
  .listBlobsByHierarchy(s"$folderName/")
  .forEach(blob =>
    containerClient
      .getBlobClient(blob.getName)
      .deleteIfExists()
  )
Any idea how to resolve this problem?
Thank you
You probably have the following merge strategy configured in build.sbt:
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
Please check if META-INF/services directory is available in your jar.
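If it is missing, a merge strategy that keeps the service-loader registrations while still discarding the rest of META-INF could look like the sketch below (sbt-assembly slash syntax; the fallback case is an assumption to adapt to your build):

assembly / assemblyMergeStrategy := {
  // concatenate service-loader files so azure-core can discover its HttpClient provider
  case PathList("META-INF", "services", _*) => MergeStrategy.concat
  // discard the rest of META-INF as before
  case PathList("META-INF", _*) => MergeStrategy.discard
  // fallback -- an assumption; keep your existing default here
  case _ => MergeStrategy.first
}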

Databricks Spark connection issue over Simba JDBC

I am trying to connect to Databricks Spark from Perl code over Simba JDBC (the Databricks-recommended way). For reference, this is the JDBC driver: https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/jdbc/2.6.17/SimbaSparkJDBC42-2.6.17.1021.zip
So far I have managed to set up Perl and all the Perl-related module configuration, and I strongly believe the issue below has nothing to do with Perl.
I have the following code trying to connect to Databricks Spark.
Note: 'replaceme' in the password is the Databricks personal access token.
#!/usr/bin/perl
use strict;
use DBI;
my $user = "token";
my $pass = "replaceme";
my $host = "DBhost.azuredatabricks.net";
my $port = 9001;
my $url = "jdbc:spark://DBhost.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/853imaskedthis14/1005-imaskedthis-okra138;AuthMech=3;UID=token;PWD=replaceme"; # Get this URL from JDBC data src
my %properties = ('user'      => $user,
                  'password'  => $pass,
                  'host.name' => $host,
                  'host.port' => $port);
my $dsn = "dbi:JDBC:hostname=localhost;port=$port;url=$url";
my $dbh = DBI->connect($dsn, undef, undef,
                       { PrintError => 0, RaiseError => 1, jdbc_properties => \%properties })
    or die "Failed to connect: ($DBI::err) $DBI::errstr\n";
my $sql = qq/select * from table/;
my $sth = $dbh->prepare($sql);
$sth->execute();
my @row;
while (@row = $sth->fetchrow_array) {
    print join(", ", @row), "\n";
}
I end up with the following authentication issue and error from the Simba driver connecting to the Spark Thrift server:
failed: [Simba][SparkJDBCDriver](500164) Error initialized or created transport for authentication: Invalid status 21
Also, could not send response: com.simba.spark.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed). at ./perldatabricksconntest.pl line 18.
The logger recorded the Java stack trace below:
[Thread-1] 05:40:16,718 WARN - Error
java.sql.SQLException: [Simba][SparkJDBCDriver](500164) Error initialized or created transport for authentication: Invalid status 21
Also, could not send response: com.simba.spark.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed).
at com.simba.spark.hivecommon.api.HiveServer2ClientFactory.createTransport(Unknown Source)
at com.simba.spark.hivecommon.api.ServiceDiscoveryFactory.createClient(Unknown Source)
at com.simba.spark.hivecommon.core.HiveJDBCCommonConnection.establishConnection(Unknown Source)
at com.simba.spark.spark.core.SparkJDBCConnection.establishConnection(Unknown Source)
at com.simba.spark.jdbc.core.LoginTimeoutConnection.connect(Unknown Source)
at com.simba.spark.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
at com.simba.spark.jdbc.common.AbstractDriver.connect(Unknown Source)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
at com.vizdom.dbd.jdbc.Connection.handleRequest(Connection.java:417)
at com.vizdom.dbd.jdbc.Connection.run(Connection.java:211)
Caused by: com.simba.spark.support.exceptions.GeneralException: [Simba][SparkJDBCDriver](500164) Error initialized or created transport for authentication: Invalid status 21
Also, could not send response: com.simba.spark.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed).
... 11 more
Also, as per the Simba JDBC connector documentation, I have tried No Authentication mode, Username, and Username/Password; none of them work.
So I wonder where the authentication issue in the transport layer is. Note that I have already created a token and put it in the password section when initiating the jdbc:spark call.
You need to generate a personal access token and put it in place of the replaceme string in the JDBC URL. After that you don't need to specify the user & password fields in %properties.
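As an illustration, here is a minimal Scala sketch of the same connection made directly over JDBC. It assumes the Simba SparkJDBC42 driver jar is on the classpath and that the real token is supplied via a hypothetical DATABRICKS_TOKEN environment variable; the host and httpPath are the placeholders from the question:

import java.sql.DriverManager

object DatabricksJdbcCheck {
  def main(args: Array[String]): Unit = {
    // hypothetical env var holding the real personal access token
    val token = sys.env("DATABRICKS_TOKEN")
    val url = "jdbc:spark://DBhost.azuredatabricks.net:443/default;" +
      "transportMode=http;ssl=1;" +
      "httpPath=sql/protocolv1/o/853imaskedthis14/1005-imaskedthis-okra138;" +
      s"AuthMech=3;UID=token;PWD=$token"
    // the credentials travel in the URL, so no separate user/password properties are needed
    val conn = DriverManager.getConnection(url)
    try {
      val rs = conn.createStatement().executeQuery("SELECT 1")
      while (rs.next()) println(rs.getInt(1))
    } finally conn.close()
  }
}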

UnknownHostException for https url

I'm doing
HttpsURLConnection conn = (HttpsURLConnection) new URL("https", "www.sec.gov", 443, "/cgi-bin/browse-edgar?action=getcurrent&CIK=&type=SC%2013D&company=&dateb=&owner=include&start=0&count=40&output=atom").openConnection();
InputStream stream = conn.getInputStream();
but it fails with
java.net.UnknownHostException: www.sec.gov
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:289)
at java.base/sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:182)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:265)
at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:372)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1071)
at java.base/sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1069)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:795)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1068)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
at java.base/sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1512)
at java.base/sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1510)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:795)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1509)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
I can successfully ping the host www.sec.gov and curl the URL. Why is only my Java program failing? Please help.
It is working today without any change. I suspect a hostname DNS timeout may be the reason it didn't work before.
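If a transient DNS failure was the cause, note that the JVM caches failed lookups (for 10 seconds by default), so immediate retries keep failing even after DNS recovers. A hedged Scala sketch that shortens that negative cache and retries the lookup (the security property is standard JDK behaviour; the retry policy is an assumption):

import java.net.InetAddress
import java.security.Security

object DnsRetryCheck {
  def main(args: Array[String]): Unit = {
    // don't cache failed lookups; must be set before the first resolution attempt
    Security.setProperty("networkaddress.cache.negative.ttl", "0")

    val host = "www.sec.gov"
    var addr: Option[InetAddress] = None
    var tries = 0
    while (addr.isEmpty && tries < 3) { // 3 attempts, 2s apart -- an assumption
      try addr = Some(InetAddress.getByName(host))
      catch { case _: java.net.UnknownHostException => tries += 1; Thread.sleep(2000) }
    }
    println(addr.getOrElse(sys.error(s"could not resolve $host")))
  }
}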

Cannot connect to Apache Ignite on Azure Kubernetes from .NET Core app

I am new to Ignite and Kubernetes. I have a .NET Core 3.1 web application hosted on Azure Linux App Service.
I followed the instructions (Apache Ignite instructions on the official site) and got Apache Ignite running on Azure Kubernetes. I could create a sample table, and read-write actions worked successfully. Here is the screenshot of my successful tests in PowerShell.
Now I am trying to connect to Apache Ignite from my .NET Core web app, but I couldn't make it work.
My code is below. I tried to connect with both IgniteConfiguration and SpringCfgXml, but both produce errors.
private void Initialize()
{
    var cfg = GetIgniteConfiguration();
    _ignite = Ignition.Start(cfg);
    InitializeCaches();
}

public IgniteConfiguration GetIgniteConfiguration()
{
    var appSettingsJson = AppSettingsJson.GetAppSettings();
    var igniteNodes = appSettingsJson["AppSettings:IgniteNodes"];
    var nodeList = igniteNodes.Split(",");
    var config = new IgniteConfiguration
    {
        Logger = new IgniteLogger(),
        DiscoverySpi = new TcpDiscoverySpi
        {
            IpFinder = new TcpDiscoveryStaticIpFinder
            {
                Endpoints = nodeList
            },
            SocketTimeout = TimeSpan.FromSeconds(5)
        },
        IncludedEventTypes = EventType.CacheAll,
        CacheConfiguration = GetCacheConfiguration()
    };
    return config;
}
The first error I get:
Apache.Ignite.Core.Common.IgniteException HResult=0x80131500
Message=Java class is not found (did you set IGNITE_HOME environment variable?):
org/apache/ignite/internal/processors/platform/PlatformIgnition
Source=Apache.Ignite.Core
Also, I have no idea what I should set IGNITE_HOME to, or which username and secret to use for authentication.
Solution:
I finally connected to Ignite on Azure Kubernetes.
Here is my connection method.
public void TestConnection()
{
    var cfg = new IgniteClientConfiguration
    {
        Host = "MyHost",
        Port = 10800,
        UserName = "user",
        Password = "password"
    };
    using (IIgniteClient client = Ignition.StartClient(cfg))
    {
        var employeeCache1 = client.GetOrCreateCache<int, Employee>(
            new CacheClientConfiguration(EmployeeCacheName, typeof(Employee)));
        employeeCache1.Put(1, new Employee("Bilge Wilson", 12500, 1));
    }
}
To find the host IP, user name, and client secret, please check the images below.
Client Id and Secret
IP Addresses
Note: I didn't need to set the IGNITE_HOME or JAVA_HOME variables.
The simplest way is to download the Apache Ignite binary distribution (of the same version as the one you use), unzip it to a directory, and point the IGNITE_HOME environment variable or the IgniteConfiguration.IgniteHome configuration property to the absolute path of the unzipped apache-ignite-n.n.n-bin/ directory.
We support doing that automatically for Windows-hosted apps but not for Linux-based deployments.
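For comparison, the asker's working approach is Ignite's thin client, which talks to port 10800 and needs no IGNITE_HOME at all. A minimal sketch with Ignite's Java thin-client API (written in Scala here; host and credentials are placeholders):

import org.apache.ignite.Ignition
import org.apache.ignite.client.IgniteClient
import org.apache.ignite.configuration.ClientConfiguration

object ThinClientCheck {
  def main(args: Array[String]): Unit = {
    val cfg = new ClientConfiguration()
      .setAddresses("MyHost:10800") // placeholder host, default thin-client port
      .setUserName("user")          // placeholder credentials
      .setUserPassword("password")
    val client: IgniteClient = Ignition.startClient(cfg)
    try {
      val cache = client.getOrCreateCache[Int, String]("employee-cache")
      cache.put(1, "Bilge Wilson")
      println(cache.get(1))
    } finally client.close()
  }
}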

Connecting to BigQuery with Spark locally

I am trying to run code that reads from BigQuery and does some transformation using Spark. I am getting the error below:
Exception in thread "main" java.io.IOException: Error accessing: bucket: test-n, object: spark/output/wordcount
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.wrapException(GoogleCloudStorageImpl.java:1707)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1733)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1618)
at com.google.cloud.hadoop.gcsio.ForwardingGoogleCloudStorage.getItemInfo(ForwardingGoogleCloudStorage.java:214)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1094)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1422)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at com.spark.bigquery.App.main(App.java:67)
Caused by: com.google.api.client.auth.oauth2.TokenResponseException: 400 Bad Request
{
"error" : "invalid_grant",
"error_description" : "Robot is missing a project number."
}
I have configured the service account and email in the code, and placed the P12 file as well.
conf.set("fs.gs.project.id", projectId);
// Use service account for authentication. The service account key file is located at the path
// specified by the configuration property google.cloud.auth.service.account.json.keyfile.
conf.set(EntriesCredentialConfiguration.BASE_KEY_PREFIX +
EntriesCredentialConfiguration.ENABLE_SERVICE_ACCOUNTS_SUFFIX,
"true");
conf.set(EntriesCredentialConfiguration.BASE_KEY_PREFIX +
EntriesCredentialConfiguration.SERVICE_ACCOUNT_KEYFILE_SUFFIX,
"aesthetic-genre-216711-3a23f8112565.p12");
conf.set(EntriesCredentialConfiguration.BASE_KEY_PREFIX +
EntriesCredentialConfiguration.SERVICE_ACCOUNT_EMAIL_SUFFIX,
"reddevil.c06#gmail.com");
