SparkSession doesn't shut down properly between unit tests - apache-spark

I have a few unit tests that need to have their own SparkSession. I extended SQLTestUtils and am overriding the beforeAll and afterAll functions that are used in many other Spark unit tests (from the source). I have a few test suites that look something like this:
class MyTestSuite extends QueryTest with SQLTestUtils {

  protected var spark: SparkSession = null

  override def beforeAll(): Unit = {
    super.beforeAll()
    spark = // initialize sparkSession...
  }

  override def afterAll(): Unit = {
    try {
      spark.stop()
      spark = null
    } finally {
      super.afterAll()
    }
  }

  // ... my tests ...
}
If I run one of these, it's fine, but if I run two or more, I get this error:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/jenkins/workspace/Query/apache-spark/sql/hive-thriftserver-cat-server/metastore_db.
But I thought that afterAll() was supposed to properly shut Spark down so that I could create a new one. Is this not right? How do I accomplish this?

One way to do this is to disable parallel test execution for your Spark app project, to make sure only one SparkSession instance is active at a time. In sbt syntax it would look like this:
project.in(file("your_spark_app"))
  .settings(parallelExecution in Test := false)
The downside is that this is a per-project setting and it would also affect the tests that would benefit from parallelization. A workaround would be to create a separate project for Spark tests.
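Separately, if your suites end up running sequentially in one JVM, it can help to make afterAll defensive so no stale session state leaks into the next suite (this won't help with truly concurrent suites, which is what the Derby lock error points at). A minimal sketch based on the suite from the question; clearActiveSession() and clearDefaultSession() are standard methods on the SparkSession companion object:

override def afterAll(): Unit = {
  try {
    if (spark != null) {
      spark.stop()
      // Drop Spark's cached session references so the next suite starts clean.
      SparkSession.clearActiveSession()
      SparkSession.clearDefaultSession()
      spark = null
    }
  } finally {
    super.afterAll()
  }
}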

Related

When are custom TableCatalogs loaded?

I've created a custom Catalog in Spark 3.0.0:
class ExCatalogPlugin extends SupportsNamespaces with TableCatalog
I've provided the configuration asking Spark to load the Catalog:
.config("spark.sql.catalog.ex", "com.test.ExCatalogPlugin")
But Spark never loads the plugin: during debugging, no breakpoints are ever hit inside the initialize method, and none of the namespaces it exposes are recognized. No error messages are logged either, and if I change the class name to an invalid one, still no errors are thrown.
I wrote a small test case similar to the test cases in the Spark code, and I am able to load the plugin if I call:
package org.apache.spark.sql.connector.catalog
....

class CatalogsTest extends FunSuite {
  test("EX") {
    val conf = new SQLConf()
    conf.setConfString("spark.sql.catalog.ex", "com.test.ExCatalogPlugin")
    val plugin: CatalogPlugin = Catalogs.load("ex", conf)
  }
}
Spark uses its normal lazy-loading techniques and doesn't instantiate the custom catalog plugin until it's needed.
In my case, referencing the plugin in either of two ways worked:
USE ex — this explicit USE statement causes Spark to look up the catalog and instantiate it.
I have a companion TableProvider defined as class DefaultSource extends SupportsCatalogOptions. This class has a hard-coded extractCatalog set to ex. If I create a reader for this source, it sees the name of the catalog provider and instantiates it. It then uses the catalog provider to create the table.
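To illustrate the first option, a minimal sketch (the builder boilerplate and local master are assumptions, not taken from the question):

val spark = SparkSession.builder()
  .master("local[*]") // assumed local test setup
  .config("spark.sql.catalog.ex", "com.test.ExCatalogPlugin")
  .getOrCreate()

// The explicit USE forces Spark to look the catalog up and instantiate it,
// calling ExCatalogPlugin.initialize(...) on first use.
spark.sql("USE ex")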

Liquibase usage for tests in JHipster

I want to repopulate the database after each test with Liquibase in a JHipster app. How can I set up a JUnit test to do it?
I see that initially Liquibase is run at application startup via:
@Bean
public SpringLiquibase liquibase(@Qualifier("taskExecutor") Executor executor,
        DataSource dataSource, LiquibaseProperties liquibaseProperties) {
    // Use liquibase.integration.spring.SpringLiquibase if you don't want Liquibase to start asynchronously
    SpringLiquibase liquibase = new SpringLiquibase();
    liquibase.setDataSource(dataSource);
    liquibase.setChangeLog("classpath:config/liquibase/master.xml");
    liquibase.setContexts(liquibaseProperties.getContexts());
    liquibase.setDefaultSchema(liquibaseProperties.getDefaultSchema());
    liquibase.setDropFirst(liquibaseProperties.isDropFirst());
    liquibase.setChangeLogParameters(liquibaseProperties.getParameters());
    if (env.acceptsProfiles(Profiles.of(JHipsterConstants.SPRING_PROFILE_NO_LIQUIBASE))) {
        liquibase.setShouldRun(false);
    } else {
        liquibase.setShouldRun(liquibaseProperties.isEnabled());
        log.debug("Configuring Liquibase");
    }
    return liquibase;
}
but I can't find a way to drop all tables and rerun all change sets.
Isn't it enough to use the @Transactional annotation on your tests?
In JHipster, Liquibase is mainly useful for building your schema and partially for loading some test data, but if your tests are transactional, data inserted or modified by your tests will be rolled back automatically after each test method.
Dropping the schema and re-creating it for each test would be much slower.
Even if you don't want to use transactional tests, it would be faster to delete table contents than to drop tables and re-create them.
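For illustration, a minimal sketch of such a transactional test (the repository, entity, and class names are hypothetical; JHipster's generated integration tests follow the same pattern):

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.transaction.annotation.Transactional;

@SpringBootTest
@Transactional // every change a test method makes is rolled back when it returns
public class UserRepositoryIT {

    @Autowired
    private UserRepository userRepository; // hypothetical Spring Data repository

    @Test
    public void insertIsRolledBackAfterTheTest() {
        userRepository.saveAndFlush(new User("alice")); // hypothetical entity
        // The insert is only visible inside this test's transaction; the
        // automatic rollback restores the Liquibase-loaded state afterwards.
    }
}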

How can I get an instance of the Cassandra session in TestCassandra

I am unable to figure out how to unit-test my class which accesses Cassandra. I might have to redesign!
I have created a Play Components class (https://www.playframework.com/documentation/2.6.x/api/scala/index.html#play.api.BuiltInComponentsFromContext) which creates a Cassandra session at application startup.
trait CassandraRepositoryComponents {
  def environment: Environment
  def configuration: Configuration
  def applicationLifecycle: ApplicationLifecycle
  ...

  lazy private val cassandraSession: Session = {
    val cluster = new Cluster.Builder()
      .addContactPoints(uri.hosts.toArray: _*)
      .withPort(uri.port)
      .withQueryOptions(new QueryOptions().setConsistencyLevel(defaultConsistencyLevel))
      .build()
    cluster.connect() // return the connected session from the block
  }
}
The session thus created is passed to my repo class when the repo class is instantiated:
class UsersRepository(utilities: HelperMethods, session: Session, tablename: String)
    extends CassandraRepository[UserKeys, User](session, tablename, List("bucket", "email")) {
  // UsersRepository doesn't use session directly. It passes the session to
  // CassandraRepository, which eventually calls session.execute to run queries...
}
I want to unit test UsersRepository. I am using embedded-cassandra to test it, but it seems embedded-cassandra doesn't provide a way to get an instance of the session it creates.
question1 - Is there a way I could get the session of Cassandra started by TestCassandra?
question2 - is there a better way for me to organise the classes?
question1 - Is there a way I could get the session of Cassandra started by TestCassandra?
There is no way to get the Session of the Cassandra instance started by TestCassandra.
You can use either com.github.nosan.embedded.cassandra.test.ClusterFactory or com.github.nosan.embedded.cassandra.test.CqlSessionFactory to create Cluster or CqlSession instances.
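If all the UsersRepository test needs is a Session pointed at the embedded node, another option is to build one with the plain Datastax driver, just like the trait in the question does. A minimal sketch, assuming the embedded node listens on localhost:9042 (use whatever address and port your TestCassandra is configured with; the helper instance and table name are placeholders):

import com.datastax.driver.core.{Cluster, Session}

val cluster: Cluster = Cluster.builder()
  .addContactPoint("localhost") // assumed host of the embedded node
  .withPort(9042)               // assumed port; match your embedded config
  .build()
val session: Session = cluster.connect()

// Hand the session to the repository under test.
val repo = new UsersRepository(new HelperMethods(), session, "users") // placeholder helper + table name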

Running tests in parallel using TestNG and Gradle

I am trying to run my TestNG tests in parallel but they seem to just be running single-threaded. I am trying to run them using IntelliJ 14.1.4 Community Edition with the default built in gradle wrapper and Java 1.8.0_45.
I've also tried using standalone gradle-2.5.
The test section of my current build.gradle file looks like:
test {
    systemProperties System.getProperties()
    useTestNG() {
        parallel 'tests'
        threadCount 3
    }
}
I've also tried:
test {
    systemProperties System.getProperties()
    useTestNG {
        options {
            parallel = 'tests'
            threadCount = 3
        }
    }
}
and:
test {
    systemProperties System.getProperties()
    useTestNG { options ->
        options.parallel = 'tests'
        options.threadCount = 3
    }
}
I needed to use 'methods' instead of 'tests' because I was only running one test class (using -Dtest.single=TestClassName) and expecting all the @Test methods inside it to run in parallel.
The relevant documentation:
parallel="methods": TestNG will run all your test methods in separate threads. Dependent methods will also run in separate threads but they will respect the order that you specified.
parallel="tests": TestNG will run all the methods in the same <test> tag in the same thread, but each <test> tag will be in a separate thread. This allows you to group all your classes that are not thread safe in the same <test> and guarantee they will all run in the same thread while taking advantage of TestNG using as many threads as possible to run your tests.
From: http://testng.org/doc/documentation-main.html#parallel-tests
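So for a single test class, mirroring the first snippet above with 'methods' swapped in:

test {
    systemProperties System.getProperties()
    useTestNG() {
        parallel 'methods' // run the @Test methods of each class in separate threads
        threadCount 3
    }
}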
It depends on the testng.xml you have made.
Refer to the links below:
http://howtodoinjava.com/2014/12/02/testng-executing-parallel-tests/
http://testng.org/doc/documentation-main.html#parallel-tests
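For reference, a minimal testng.xml using parallel="tests" might look like this (the suite, test, and class names are placeholders):

<suite name="ParallelSuite" parallel="tests" thread-count="3">
    <test name="GroupA">
        <classes>
            <class name="com.example.FirstTests"/>
        </classes>
    </test>
    <test name="GroupB">
        <classes>
            <class name="com.example.SecondTests"/>
        </classes>
    </test>
</suite>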

Quit driver after each Geb Spock test

I am running Geb/Spock tests in Sauce Connect, and I would prefer to have unique instances of the RemoteWebDriver per test. This way, the Sauce reports would be divided by test, which makes it easy to diagnose failures. I'm not concerned (right now) about the additional performance overhead, because as it stands running all our Geb tests via one RemoteWebDriver instance is not helpful at all - it takes a very long time to coordinate the results with the Sauce screenshots/screencasts, and when timeouts occur (which is a high possibility in a long running job over Sauce Connect) there is usually some test failure spillover.
I tried this in a class that extends GebReportingSpec:
def cleanup() {
    if (System.getProperty('geb.env')?.contains('sauce')) {
        setSauceJobStatus()
        driver.quit()
    }
}
And of course, I create a new RemoteWebDriver in the setup() method.
With this approach I get a unique Sauce Connect session per test, and the results are all beautifully organized in Sauce. HOWEVER, all the tests fail due to:
"org.openqa.selenium.remote.SessionNotFoundException: Session ID is null. Using WebDriver after calling quit()?"
It turns out that the cleanup() method in GebReportingSpec calls out to this method:
void report(String label = "") {
    browser.report(ReporterSupport.toTestReportLabel(_gebReportingSpecTestCounter, _gebReportingPerTestCounter++, _gebReportingSpecTestName.methodName, label))
}
Which throws this stack trace:
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:125)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:572)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:622)
at org.openqa.selenium.remote.RemoteWebDriver.getPageSource(RemoteWebDriver.java:459)
at geb.report.PageSourceReporter.getPageSource(PageSourceReporter.groovy:42)
at geb.report.PageSourceReporter.writePageSource(PageSourceReporter.groovy:38)
at geb.report.PageSourceReporter.writeReport(PageSourceReporter.groovy:29)
at geb.report.CompositeReporter.writeReport(CompositeReporter.groovy:31)
at geb.Browser.report(Browser.groovy:788)
at geb.spock.GebReportingSpec.report(GebReportingSpec.groovy:44)
at geb.spock.GebReportingSpec.cleanup(GebReportingSpec.groovy:39)
It's assuming that the WebDriver instance is still around when the GebReportingSpec cleanup() method is called, so that the reporting information can be prepared.
So, my approach is obviously not the "Geb way".... I'm wondering if anyone can clue me in on how to properly create a unique driver per Spock test?
Unfortunately you've hit a limitation of the GebReportingSpec implementation and the fixed order of execution of Spock's setup and cleanup methods in an inheritance hierarchy. What you should do is quit your browser in a method that overrides GebSpec.resetBrowser() instead of cleanup():
void resetBrowser() {
    def driver = browser.driver
    super.resetBrowser()
    if (System.getProperty('geb.env')?.contains('sauce')) {
        driver.quit()
    }
}
Getting a local reference to the driver and then calling the super method is important, because the super method clears the browser reference, which means you won't be able to get hold of the driver after that.
Also, you should not create a new RemoteWebDriver in setup(); instead, disable driver caching, which means a new driver will be created per driver request (a driver is requested per browser creation, and a new browser is created for each test) instead of a cached one being reused.
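A minimal sketch of that in GebConfig.groovy (cacheDriver is a standard Geb configuration property; the Sauce URL and capabilities below are placeholders, not your actual setup):

// GebConfig.groovy
import org.openqa.selenium.remote.DesiredCapabilities
import org.openqa.selenium.remote.RemoteWebDriver

cacheDriver = false // each new Browser gets a fresh driver instead of a cached one

driver = {
    // Placeholder endpoint/capabilities: point these at your Sauce Connect setup.
    def sauceUrl = new URL("http://username:access-key@localhost:4445/wd/hub")
    new RemoteWebDriver(sauceUrl, DesiredCapabilities.chrome())
}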
If you use browser.quit(), then you will get an exception and the test will fail. You can try the following snippet at the very beginning of your class; it should work just fine:
def setup() {
    browser.config.cacheDriver = false
    browser.driver = browser.config.driver
}

def cleanup() {
    browser.close()
}
Cheers!
