Pass parameters to the jar when using spark launcher - apache-spark

I am trying to create an executable jar that uses a Spark launcher to run another jar containing a data transformation task (that jar creates the Spark session).
I need to pass Java parameters (some Java arrays) to the jar that is executed by the launcher.
import org.apache.spark.launcher.SparkLauncher
object launcher {
@throws[Exception]
// How do I pass parameters to spark_job_with_spark_session.jar
def main(args: Array[String]): Unit = {
val handle = new SparkLauncher()
.setAppResource("spark_job_with_spark_session.jar")
.setVerbose(true)
.setMaster("local[*]")
.setConf(SparkLauncher.DRIVER_MEMORY, "4g")
.launch()
}
}
How can I do that?

"need to pass java parameters (some java arrays)"
Launching through SparkLauncher is equivalent to executing spark-submit, so you cannot pass Java objects directly. Use app args
addAppArgs(String... args)
to pass application arguments as strings, and parse them in your app.
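One way to get an array across that boundary is to flatten it into a single string on the launcher side and rebuild it inside the launched application. The sketch below is only an illustration: the `--ids` flag, the comma-separated encoding, and the `ArrayArgCodec` helper are assumptions made for the example and are not part of Spark, which simply forwards the strings it is given.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper: encodes an int array as one string argument and decodes it again. */
final class ArrayArgCodec {

    private ArrayArgCodec() {}

    /** Launcher side: turn {1, 2, 3} into "1,2,3" so it fits in a single app argument. */
    static String encode(int[] values) {
        return Arrays.stream(values)
                .mapToObj(v -> Integer.toString(v))
                .collect(Collectors.joining(","));
    }

    /** Application side: turn "1,2,3" back into {1, 2, 3}. An empty string gives an empty array. */
    static int[] decode(String encoded) {
        if (encoded.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(encoded.split(","))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Application side: return the value that follows a flag in main's args, or null if absent. */
    static String valueOf(String[] args, String flag) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals(flag)) {
                return args[i + 1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // What the launcher would hand to addAppArgs(...)
        String[] appArgs = {"--ids", encode(new int[] {3, 5, 8})};
        // What the launched application would do with the args it receives
        int[] ids = decode(valueOf(appArgs, "--ids"));
        System.out.println(Arrays.toString(ids));
    }
}
```

On the launcher side the two strings in `appArgs` would be passed to addAppArgs; the launched jar's main method would then call `valueOf` and `decode` on its own args. For anything more structured than a flat array, a JSON string or a path to a file is probably a better fit than a hand-rolled delimiter.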

package com.meow.woof.meow_spark_launcher.app;
import com.meow.woof.meow_spark_launcher.common.TaskListener;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;
/**
*
* @author hahattpro
*/
public class ExampleSparkLauncherApp {
public static void main(String[] args) throws Exception {
SparkAppHandle handle = new SparkLauncher()
.setAppResource("/home/cpu11453/workplace/experiment/SparkPlayground/target/scala-2.11/SparkPlayground-assembly-0.1.jar")
.setMainClass("me.thaithien.playground.ConvertToCsv")
.setMaster("spark://cpu11453:7077")
.setConf(SparkLauncher.DRIVER_MEMORY, "3G")
.addAppArgs("--input" , "/data/download_hdfs/data1/2019_08_13/00/", "--output", "/data/download_hdfs/data1/2019_08_13/00_csv_output/")
.startApplication(new TaskListener());
handle.addListener(new SparkAppHandle.Listener() {
@Override
public void stateChanged(SparkAppHandle handle) {
System.out.println(handle.getState() + " new state");
}
@Override
public void infoChanged(SparkAppHandle handle) {
System.out.println(handle.getState() + " new state");
}
});
System.out.println(handle.getState().toString());
while (!handle.getState().isFinal()) {
//await until job finishes
Thread.sleep(1000L);
}
}
}
Here is example code that works.

Related

SparkListener does not work in Spark on YARN cluster mode?

My main goal is to get the appId after submitting a yarn-cluster job from Java code, so that it can be used for further business operations.
I added --conf=spark.extraListeners=Mylistener to the submit arguments.
While the SparkListener works when I use Spark in standalone mode, it does not work when I run Spark on a cluster over YARN. Is it possible for a SparkListener to work when running over YARN? If so, what steps should I take to enable that?
Here is the Mylistener class code:
public class Mylistener extends SparkListener {
private static Logger logger = LoggerFactory.getLogger(EnvelopeSparkListener.class);
@Override
public void onApplicationStart(SparkListenerApplicationStart sparkListenerApplicationStart) {
Option<String> appId = sparkListenerApplicationStart.appId();
EnvelopeSubmit.appId = appId.get();
logger.info("====================start");
}
@Override
public void onBlockManagerAdded(SparkListenerBlockManagerAdded blockManagerAdded) {
logger.info("=====================add");
}
}
Here is the Main class to submit the application:
public static void main(String[] args) {
String jarpath = args[0];
String childArg = args[1];
System.out.println("jarpath:" + jarpath);
System.out.println("childArg:" + childArg);
System.setProperty("HADOOP_USER_NAME", "hdfs");
String[] arg = {"--verbose=true", "--class=com.cloudera.labs.envelope.EnvelopeMain",
"--master=yarn", "--deploy-mode=cluster","--conf=spark.extraListeners=Mylistener","--conf","spark.eventLog.enabled=true", "--conf","spark.yarn.jars=hdfs://192.168.6.188:8020/user/hdfs/lib/*", jarpath, childArg};
SparkSubmit.main(arg);
}
If you just want to get the app id, you can simply read it from the Spark context inside the application:
logger.info(s"Application id: ${sparkSession.sparkContext.applicationId}")
Hope this answers your question!

org.testng.TestNGException: Method tearDown requires 1 parameters but 0 were supplied in the @Configuration annotation

I do not understand what is required as a parameter. Can anyone help me with this?
I have written the code below:
@Test(groups = "cucumber", description = "Runs Cucumber Feature", dataProvider = "features")
public void feature(CucumberFeatureWrapper cucumberFeature) throws Exception {
testNGCucumberRunner.runCucumber(cucumberFeature.getCucumberFeature());
}
@AfterMethod(alwaysRun = true)
public void tearDown(Scenario scenario) {
scenario.write("Finished Scenario");
if (scenario.isFailed()) {
String screenshotName = scenario.getName().replaceAll(" ", "_");
try {
File sourcePath =((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
File destinationPath = new File(System.getProperty("user.dir") + "/Screenshots/" + screenshotName + ".png");
Files.copy(sourcePath, destinationPath);
Reporter.addScreenCaptureFromPath(destinationPath.toString());
} catch (IOException e) {
}
driver.close();
}
}
And I am getting the error below:
FAILED CONFIGURATION: @AfterMethod tearDown
org.testng.TestNGException: Method tearDown requires 1 parameters but
0 were supplied in the @Configuration annotation.
You cannot pass a Cucumber Scenario object to a TestNG configuration method. The @AfterMethod method is called by TestNG, which cannot inject the Scenario object. For the list of objects that are injected automatically, refer to http://testng.org/doc/documentation-main.html#native-dependency-injection
Either use Cucumber's After annotation and pass the Scenario object:
@cucumber.api.java.After
public void tearDown(Scenario scenario)
Or use TestNG's AfterMethod annotation and pass the ITestResult object:
@org.testng.annotations.AfterMethod
public void tearDown(ITestResult result)

Commons Configuration2 ReloadingFileBasedConfiguration

I am trying to use Apache Commons Configuration 2 in my codebase:
import java.io.File;
import java.util.concurrent.TimeUnit;
import org.apache.commons.configuration2.PropertiesConfiguration;
import org.apache.commons.configuration2.builder.ConfigurationBuilderEvent;
import org.apache.commons.configuration2.builder.ReloadingFileBasedConfigurationBuilder;
import org.apache.commons.configuration2.builder.fluent.Parameters;
import org.apache.commons.configuration2.convert.DefaultListDelimiterHandler;
import org.apache.commons.configuration2.event.EventListener;
import org.apache.commons.configuration2.ex.ConfigurationException;
import org.apache.commons.configuration2.reloading.PeriodicReloadingTrigger;
import org.apache.commons.configuration2.CompositeConfiguration;
public class Test {
private static final long DELAY_MILLIS = 10 * 60 * 5;
public static void main(String[] args) {
// TODO Auto-generated method stub
CompositeConfiguration compositeConfiguration = new CompositeConfiguration();
PropertiesConfiguration props = null;
try {
props = initPropertiesConfiguration(new File("/tmp/DEV.properties"));
} catch (ConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
compositeConfiguration.addConfiguration( props );
compositeConfiguration.addEventListener(ConfigurationBuilderEvent.ANY,
new EventListener<ConfigurationBuilderEvent>()
{
@Override
public void onEvent(ConfigurationBuilderEvent event)
{
System.out.println("Event:" + event);
}
});
System.out.println(compositeConfiguration.getString("property1"));
try {
Thread.sleep(14*1000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// Have a script which changes the value of property1 in DEV.properties
System.out.println(compositeConfiguration.getString("property1"));
}
protected static PropertiesConfiguration initPropertiesConfiguration(File propsFile) throws ConfigurationException {
if(propsFile.exists()) {
final ReloadingFileBasedConfigurationBuilder<PropertiesConfiguration> builder =
new ReloadingFileBasedConfigurationBuilder<PropertiesConfiguration>(PropertiesConfiguration.class)
.configure(new Parameters().fileBased()
.setFile(propsFile)
.setReloadingRefreshDelay(DELAY_MILLIS)
.setThrowExceptionOnMissing(false)
.setListDelimiterHandler(new DefaultListDelimiterHandler(';')));
final PropertiesConfiguration propsConfiguration = builder.getConfiguration();
PeriodicReloadingTrigger trigger = new PeriodicReloadingTrigger(builder.getReloadingController(),
null, 1, TimeUnit.SECONDS);
trigger.start();
return propsConfiguration;
} else {
return new PropertiesConfiguration();
}
}
}
This is the sample code I am using to check whether automatic reloading works. However, when the underlying properties file is updated, the configuration does not reflect the change.
As per the documentation:
One important point to keep in mind when using this approach to reloading is that reloads are only functional if the builder is used as central component for accessing configuration data. The configuration instance obtained from the builder will not change automagically! So if an application fetches a configuration object from the builder at startup and then uses it throughout its life time, changes on the external configuration file become never visible. The correct approach is to keep a reference to the builder centrally and obtain the configuration from there every time configuration data is needed.
https://commons.apache.org/proper/commons-configuration/userguide/howto_reloading.html#Reloading_File-based_Configurations
This is different from what the old implementation was.
I was able to successfully execute your sample code by making two changes:
Make the builder available globally and access the configuration through the builder:
System.out.println(builder.getConfiguration().getString("property1"));
Add the listener to the builder:
builder.addEventListener(ConfigurationBuilderEvent.ANY, new EventListener<ConfigurationBuilderEvent>() {
public void onEvent(ConfigurationBuilderEvent event) {
System.out.println("Event:" + event);
}
});
Here is my sample program, in which I was able to demonstrate this:
import java.io.File;
import java.util.concurrent.TimeUnit;
import org.apache.commons.configuration2.PropertiesConfiguration;
import org.apache.commons.configuration2.builder.ConfigurationBuilderEvent;
import org.apache.commons.configuration2.builder.ReloadingFileBasedConfigurationBuilder;
import org.apache.commons.configuration2.builder.fluent.Parameters;
import org.apache.commons.configuration2.event.EventListener;
import org.apache.commons.configuration2.reloading.PeriodicReloadingTrigger;
public class TestDynamicProps {
public static void main(String[] args) throws Exception {
Parameters params = new Parameters();
ReloadingFileBasedConfigurationBuilder<PropertiesConfiguration> builder =
new ReloadingFileBasedConfigurationBuilder<PropertiesConfiguration>(PropertiesConfiguration.class)
.configure(params.fileBased()
.setFile(new File("src/main/resources/override.properties")));
PeriodicReloadingTrigger trigger = new PeriodicReloadingTrigger(builder.getReloadingController(),
null, 1, TimeUnit.SECONDS);
trigger.start();
builder.addEventListener(ConfigurationBuilderEvent.ANY, new EventListener<ConfigurationBuilderEvent>() {
public void onEvent(ConfigurationBuilderEvent event) {
System.out.println("Event:" + event);
}
});
while (true) {
Thread.sleep(1000);
System.out.println(builder.getConfiguration().getString("property1"));
}
}
}
The problem with your implementation is that reloading happens on the ReloadingFileBasedConfigurationBuilder object and is not propagated to a PropertiesConfiguration object that was obtained from it earlier.

TestNG Close Browsers after Parallel Test Execution

I want to close the browsers after all tests have completed. The problem is that I cannot close them: once a test has finished, the ThreadLocal driver no longer holds the driver, and the value it returns is null.
Below is my current code:
package demo;
import java.lang.reflect.Method;
import org.openqa.selenium.By;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
public class ParallelMethodTest {
private static ThreadLocal<dummy> driver;
private int input;
private int length;
@BeforeMethod
public void beforeMethod() {
System.err.println("Before ID" + Thread.currentThread().getId());
System.setProperty("webdriver.chrome.driver", "chromedriver.exe");
if (driver == null) {
driver = new ThreadLocal<dummy>();
}
if (driver.get()== null) {
driver.set(new dummy());
}
}
@DataProvider(name = "sessionDataProvider", parallel = true)
public static Object[][] sessionDataProvider(Method method) {
int len = 12;
Object[][] parameters = new Object[len][2];
for (int i = 0; i < len; i++) {
parameters[i][0] = i;
parameters[i][1]=len;
}
return parameters;
}
@Test(dataProvider = "sessionDataProvider")
public void executSessionOne(int input,int length) {
System.err.println("Test ID---" + Thread.currentThread().getId());
this.input=input;
this.length=length;
// First session of WebDriver
// find user name text box and fill it
System.out.println("Parameter size is:"+length);
driver.get().getDriver().findElement(By.name("q")).sendKeys(input + "");
System.out.println("Input is:"+input);
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
@AfterMethod
public void afterMethod() {
System.err.println("After ID" + Thread.currentThread().getId());
driver.get().close();
}
}
package demo;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.AfterClass;
public class dummy {
public WebDriver getDriver() {
return newDriver;
}
public void setNewDriver(WebDriver newDriver) {
this.newDriver = newDriver;
}
private WebDriver newDriver;
public dummy() {
newDriver = new ChromeDriver();
newDriver.get("https://www.google.co.in/");
}
@AfterClass
public void close(){
if(newDriver!=null){
System.out.println("In After Class");
newDriver.quit();
}
}
}
Thanks in Advance.
private static ThreadLocal<dummy> driver is declared at the class level, so a single variable is shared by every thread. Multiple threads are just setting and resetting the value of that same variable.
What you need to do is create a factory that returns a driver instance based on the parameters you pass to it. The logic can be anything, but in the general case the factory creates a new object only if one does not already exist, and otherwise returns the existing one. Declare and initialise the driver (from the factory) in your @Test methods.
Sample code for the factory would be something like:
static RemoteWebDriver firefoxDriver;
static RemoteWebDriver someOtherDriver;
// URL is assumed to be a String constant, defined elsewhere, that holds the remote hub address.
static synchronized RemoteWebDriver getDriver(String browser, String browserVersion, String platform, String platformVersion)
throws MalformedURLException
{
// Compare strings with equals(); == compares references, and 'firefox' is not a valid Java literal.
if ("firefox".equals(browser))
{
if (firefoxDriver == null)
{
DesiredCapabilities cloudCaps = new DesiredCapabilities();
cloudCaps.setCapability("browser", browser);
cloudCaps.setCapability("browser_version", browserVersion);
cloudCaps.setCapability("os", platform);
cloudCaps.setCapability("os_version", platformVersion);
cloudCaps.setCapability("browserstack.debug", "true");
cloudCaps.setCapability("browserstack.local", "true");
firefoxDriver = new RemoteWebDriver(new URL(URL),cloudCaps);
}
return firefoxDriver;
}
else
{
if (someOtherDriver == null)
{
DesiredCapabilities cloudCaps = new DesiredCapabilities();
cloudCaps.setCapability("browser", browser);
cloudCaps.setCapability("browser_version", browserVersion);
cloudCaps.setCapability("os", platform);
cloudCaps.setCapability("os_version", platformVersion);
cloudCaps.setCapability("browserstack.debug", "true");
cloudCaps.setCapability("browserstack.local", "true");
someOtherDriver = new RemoteWebDriver(new URL(URL),cloudCaps);
}
return someOtherDriver;
}
}
You have a concurrency issue: multiple threads can create a ThreadLocal instance because driver == null can evaluate to true on more than one thread when the tests run in parallel. As a result, some threads can execute driver.set(new dummy()); only for another thread to then replace driver with a new ThreadLocal instance.
In my experience it is simpler and less error prone to always use ThreadLocal as a static final to ensure that multiple objects can access it (static) and that it is only defined once (final).
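A minimal sketch of that pattern follows. It uses a stand-in `FakeDriver` class instead of a real WebDriver so that it runs without Selenium, and it uses `ThreadLocal.withInitial` (Java 8+) for lazy creation. The class names, the counters, and the `runInParallel` helper are illustrative assumptions and are not taken from the linked answers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for a browser driver; counts how many instances were created and closed. */
final class FakeDriver {
    static final AtomicInteger CREATED = new AtomicInteger();
    static final AtomicInteger CLOSED = new AtomicInteger();

    FakeDriver() { CREATED.incrementAndGet(); }

    void quit() { CLOSED.incrementAndGet(); }
}

final class DriverHolder {
    // static: shared by every test object; final: the ThreadLocal itself is created exactly once.
    private static final ThreadLocal<FakeDriver> DRIVER = ThreadLocal.withInitial(FakeDriver::new);

    private DriverHolder() {}

    /** Returns the calling thread's driver, creating it on first use. */
    static FakeDriver get() { return DRIVER.get(); }

    /** Quits the calling thread's driver and clears its slot. */
    static void quit() {
        DRIVER.get().quit();
        DRIVER.remove();
    }

    /** Runs `tasks` jobs on `threads` workers; each job opens, uses and quits its own driver. */
    static void runInParallel(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                FakeDriver first = get();
                FakeDriver second = get();   // same thread, so the same instance
                if (first != second) {
                    throw new IllegalStateException("driver changed within one test");
                }
                quit();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        runInParallel(12, 4);
        System.out.println("created=" + FakeDriver.CREATED.get() + " closed=" + FakeDriver.CLOSED.get());
    }
}
```

In a TestNG class, the test method would call `DriverHolder.get()` and the @AfterMethod method would call `DriverHolder.quit()`. Because the field is never reassigned, no thread can swap the ThreadLocal out from under another, and the `remove()` call keeps a pooled thread from reusing a driver that has already been quit.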
You can see my answers to the following Stack Overflow questions for related details and code samples:
How to avoid empty extra browser opens when running parallel tests with TestNG
Session not found exception with Selenium Web driver parallel execution of Data Provider test case
This is happening because you are creating the driver instance in the beforeMethod function, so its scope ends when that function ends.
So when your afterMethod starts, it gets null, because the webdriver instance has already been destroyed once beforeMethod completed.
Refer to the links below:
http://www.java-made-easy.com/variable-scope.html
What is the default scope of a method in Java?

Can I access Azure SQL Database in the driver method of a Hadoop job running in HDInsight?

I'd like to work on a Hadoop application that runs on HDInsight. In the driver method of my application, I need to get some information from Azure SQL Database. Is it possible to query Azure SQL Database in the driver method of my Hadoop job?
You can access Azure SQL Database using java.sql classes but you may need to add your headnode IP to your Database firewall rules.
package org.microsoft.andrewmoll.SqlExample;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
/**
* Hello world!
*
*/
public class SQLExample
{
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
//You should put some awesome map logic here
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
//You should put some awesome reducer logic here
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String jobName = getData();
System.out.println(jobName);
Job job = Job.getInstance(conf, jobName);
job.setJarByClass(SQLExample.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
public static String getData()
{
String driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
String url = "jdbc:sqlserver://<servername>.database.windows.net;DatabaseName=<dbname>";
String username = "DarthMoll";
String password = "Luke,Iamnotyourfather";
try {
/* Load database driver */
Class.forName(driver);
/* Establish database connection */
Connection con = DriverManager.getConnection(url, username, password);
/* Run query */
PreparedStatement stmt = con.prepareStatement("select top 1 * from dbo.SithWarriors");
/* Get return result */
ResultSet resultset = stmt.executeQuery();
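/* Move to the first row, then get the user's first name */
if (!resultset.next()) {
throw new IllegalStateException("Query returned no rows");
}
String result = resultset.getString("FirstName");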
/* Close result set */
resultset.close();
/* Close database connection */
con.close();
return result;
} catch (Exception e) {
e.printStackTrace();
}
return "Implement Some Throwable Here";
}
}
If possible, I suggest storing the data in a blob and using the Java SDK to access the data. Saves you from having to worry about the headnode IP address.

Resources