Distributed load testing using the Gatling tool - performance-testing

Can someone please tell me whether the open-source Gatling tool supports distributed load testing the way we implement it in JMeter with a master-slave setup?
Regards,
Chithra

Gatling distributed load testing is a feature of Gatling Enterprise (formerly Gatling FrontLine). With open-source Gatling, the documented workaround is to scale out manually: run the same simulation on several injector machines, collect the simulation.log files, and generate a single consolidated report with gatling.sh -ro.
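For context, a Gatling simulation is plain Scala. Below is a minimal sketch using the Gatling 3 DSL (the target URL and load profile are placeholder assumptions); under the manual scale-out approach, each injector machine would run this same class:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Minimal Gatling 3 simulation; the URL and user counts are placeholders
class BasicSimulation extends Simulation {
  val httpProtocol = http.baseUrl("http://example.com")

  val scn = scenario("Basic")
    .exec(http("home").get("/"))

  // Each injector generates a share of the total load; reports are merged afterwards
  setUp(scn.inject(rampUsers(100).during(1.minute))).protocols(httpProtocol)
}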

Related

How do we execute a load test on a Cassandra DB using JMeter 5.1? Are there any plugins available?

I am trying to load test a Cassandra DB. When I check for JMeter Cassandra plugins via the JMeter Plugins installation, there are 7 Cassandra samplers. I have a pretty good idea of the servers, keyspaces, and connection details.
There is limited help available in this regard, and what does turn up in searches targets JMeter 2.9.
It would be better to use the cassandra-stress tool for this purpose. There are several resources available; as a starting point, you can look at DataStax Academy or The Last Pickle blog.
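For example, a simple write smoke test against a single node might look like this (the node address and operation count are placeholders):

cassandra-stress write n=1000000 -node 10.0.0.1

cassandra-stress ships with Cassandra itself, so no extra JMeter plugins are needed.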
You can treat it just like any other database:
Download the DataStax Java Driver for Cassandra (with all of its dependencies) and drop it into the JMeter classpath
Restart JMeter to pick the libraries up
Add a JSR223 Sampler to your Test Plan
Put the following code into the "Script" area:
// Connect to the cluster (replace with your node's IP address or hostname)
def cluster = com.datastax.driver.core.Cluster.builder().addContactPoint("IP address or hostname of your Cassandra cluster").build()
// Open a session against your keyspace
def session = cluster.connect("your keyspace")
def results = session.execute("SELECT * FROM users")
log.info("Fetched " + results.all().size() + " rows") // "log" is provided by the JSR223 sampler
// Always release connections when done
session.close()
cluster.close()
More information: Cassandra Load Testing with Groovy

Can we configure Presto's database connector information from its GUI?

I am using Presto version 179, and I need to manually create a database.properties file in /etc/presto/catalog through the CLI.
Can I do the same from Presto's GUI?
Presto's built-in web interface does not provide any configuration capabilities.
Usually, such things are handled as part of deployment/configuration management on a cluster, so catalog configuration is provided by some external means, just like the Presto installation itself.
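For reference, a catalog file is a plain Java properties file read at server startup, so adding or changing one requires a restart. A minimal sketch of a hypothetical /etc/presto/catalog/mysql.properties (host and credentials are placeholders):

connector.name=mysql
connection-url=jdbc:mysql://db.example.com:3306
connection-user=presto
connection-password=secret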

What are the differences between Apache Spark and Apache Apex?

Apache Apex is an open-source, enterprise-grade, unified stream and batch processing platform. It is used in the GE Predix platform for IoT.
What are the key differences between these two platforms?
Questions
From a data science perspective, how is it different from Spark?
Does Apache Apex provide functionality like Spark MLlib? If we have to build scalable ML models on Apache Apex, how do we do it, and which language should we use?
Will data scientists have to learn Java to build scalable ML models? Does it have a Python API like PySpark?
Can Apache Apex be integrated with Spark, and can we use Spark MLlib on top of Apex to build ML models?
Apache Apex is an engine for processing streaming data. Others that try to achieve the same are Apache Storm and Apache Flink. The differentiating factor for Apache Apex is that it comes with built-in support for fault tolerance and scalability, and a focus on operability, which are key considerations in production use cases.
Comparing it with Spark: Apache Spark is actually a batch processing engine. If you consider Spark Streaming (which uses Spark underneath), then it is micro-batch processing. In contrast, Apache Apex is true stream processing, in the sense that an incoming record does NOT have to wait for the next record to be processed; a record is processed and sent to the next level of processing as soon as it arrives.
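To make the contrast concrete, here is a minimal Spark Streaming sketch in Scala (the socket source, local master, and 5-second interval are placeholder assumptions). Every Spark Streaming application must declare a batch interval, and records are buffered and processed together once per interval rather than one at a time:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal micro-batch sketch: the mandatory batch interval is what makes
// Spark Streaming micro-batch rather than record-at-a-time processing.
object MicroBatchSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MicroBatchSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5)) // records are grouped into 5-second batches

    val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
    lines.map(_.toUpperCase).print() // each 5-second batch is transformed and emitted together

    ssc.start()
    ssc.awaitTermination()
  }
}

In Apex, by contrast, an operator processes each tuple as soon as it arrives, with no batch interval in between.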
Currently, work is in progress to add support for integrating Apache Apex with machine learning libraries like Apache SAMOA and H2O; see https://issues.apache.org/jira/browse/SAMOA-49
Currently, it has support for Java and Scala:
https://www.datatorrent.com/blog/blog-writing-apache-apex-application-in-scala/
For Python, you may try using Jython. But I haven't tried it myself, so I am not very sure about it.
Integration with Spark may not be a good idea, considering they are two different processing engines. However, Apache Apex integration with machine learning libraries is in progress, as noted above.
If you have any other questions or requests for features, you can post them on the mailing list for Apache Apex users: https://mail-archives.apache.org/mod_mbox/incubator-apex-users/

Bluemix Spark Service

Firstly, I must admit that I am new to Bluemix and Spark; I just want to try out the Bluemix Spark service.
I want to perform a batch operation over, say, a billion records in a text file, and then process those records with my own set of Java APIs.
This is where I want to use the Spark service, to enable faster processing of the dataset.
Here are my questions:
Can I call Java code from Python? As I understand it, presently only Python boilerplate is supported. There are a few pieces of JNI beneath my Java API as well.
Can I perform the batch operation with the Bluemix Spark service, or is it just for interactive purposes?
Can I create something like a pipeline (where the output of one stage goes to another) with Bluemix, or do I need to code it myself?
I will appreciate any and all help with respect to the above queries.
Looking forward to some expert advice here.
Thanks.
The IBM Analytics for Apache Spark service is now available, and it allows you to submit Java code / a batch program with spark-submit, along with a notebook interface for both Python and Scala.
Earlier, the beta offering was limited to the interactive notebook interface.
Regards
Anup
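For reference, here is a minimal sketch of the kind of Scala batch program you could package as a jar and hand to spark-submit (the class name, word-count logic, and command-line paths are placeholder assumptions):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal batch job: word count over a large text file
object WordCountBatch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCountBatch"))
    val counts = sc.textFile(args(0)) // input path, e.g. the billion-record text file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile(args(1)) // output directory
    sc.stop()
  }
}

It would then be launched with something like spark-submit --class WordCountBatch wordcount.jar <input> <output>.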

Spark Streaming visualization

I am using Spark Streaming to stream data from a Kafka broker, and I am performing transformations on the data with Spark Streaming. Can someone suggest a visualization tool I can use to show real-time graphs and charts that update as data streams in?
You could store your results in Elasticsearch and then use Kibana to perform visualizations.
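A sketch of what the Elasticsearch leg could look like, assuming the elasticsearch-hadoop (elasticsearch-spark) connector is on the classpath; the socket source, addresses, and index name are placeholder assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.elasticsearch.spark._ // adds saveToEs to RDDs

object StreamToEs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamToEs")
      .set("es.nodes", "localhost:9200") // placeholder Elasticsearch address
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("localhost", 9999) // stand-in for the Kafka stream
    lines.foreachRDD { rdd =>
      // Index each record as a timestamped document that Kibana can chart over time
      rdd.map(l => Map("message" -> l, "ts" -> System.currentTimeMillis()))
         .saveToEs("sparkdemo/events") // hypothetical index/type
    }

    ssc.start()
    ssc.awaitTermination()
  }
}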
Apart from looking at Spark's own streaming UI tab, I highly recommend using Graphite sinks. A Spark Streaming job is a long-running application, so this can be really handy for monitoring purposes.
Using Graphite dashboards, you will kick-start monitoring of your Spark Streaming application in no time.
The best literature I know is here, in the monitoring section, and here too: https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/
It covers configuration and other details. You will find some ready-made dashboards in JSON format on various GitHub links, but again, I found these two posts the most useful in my production application.
I hope this helps you visualize and monitor the internals of your Spark Streaming application.
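For reference, the Graphite sink is enabled through Spark's conf/metrics.properties; a minimal sketch (host and port are placeholders):

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds

With this in place, the streaming job's metrics show up in Graphite and can be wired into dashboards.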
You can use WebSockets for building real-time streaming graphs.
As such, there are no BI tools for this, but there are JS libraries that can help in building real-time graphs: http://www.pubnub.com/blog/tag/d3-js/
Check out Lightning: A Data Visualization Server
http://lightning-viz.org/
The server is designed for making web-based interactive visualizations using D3. It is built for large data sets and continuously updating data streams.
You can use pro BI tools like Tableau, Power BI, or even MS Excel. For testing, I use MS Excel with a 1-minute auto-refresh.
You can also write Python code for this.
