How to control concurrent sessions per user in Apache Presto?

I'm wondering how to control/limit the maximum number of concurrent sessions per user querying Apache Presto when it is installed in distributed mode.
Thanks!
Is there a management console in Apache Presto that we can use to control the maximum number of concurrent queries per logged-in user?
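For reference, Presto does not ship a management console for this, but its resource groups can cap concurrency per user once a resource group manager is enabled (for example the file-based one configured in etc/resource-groups.properties). Below is a minimal sketch of what a per-user limit can look like in the resource groups JSON; the group names and numbers are only illustrative, not a recommendation:

    {
      "rootGroups": [
        {
          "name": "per-user",
          "softMemoryLimit": "100%",
          "hardConcurrencyLimit": 100,
          "maxQueued": 1000,
          "subGroups": [
            {
              "name": "${USER}",
              "softMemoryLimit": "10%",
              "hardConcurrencyLimit": 3,
              "maxQueued": 10
            }
          ]
        }
      ],
      "selectors": [
        { "group": "per-user.${USER}" }
      ]
    }

With a layout like this, each user's queries land in their own sub-group, and hardConcurrencyLimit bounds how many of that user's queries run at the same time.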

Related

If we create multiple Spark sessions using the newSession() method, how is the driver memory shared between them?

In my Spark application I am creating multiple (2 - 3) Spark sessions with the help of the newSession() method. While submitting the application, I am configuring spark.driver.memory to 24g.
How will this memory be distributed between the 2 Spark sessions if they are processing 2 different datasets in parallel? Thanks.
Sessions are used for configuration management, not for resource management or parallel in-application processing. There is no built-in mechanism for per-session resource allocation, and all sessions are part of the same application from the cluster manager's perspective.
That means first come, first served: there is no separation, and whichever session occupies resources first wins.
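As a quick illustration of that point (a minimal PySpark sketch; the app name is arbitrary), sessions created with newSession() share one SparkContext and therefore one driver JVM, so the 24g of spark.driver.memory is a single shared budget rather than something split per session:

    from pyspark.sql import SparkSession

    # One driver JVM, sized by spark.driver.memory (e.g. 24g) at submit time.
    spark = SparkSession.builder.appName("shared-driver-demo").getOrCreate()

    # newSession() gives a separate configuration and temp-view namespace...
    session_a = spark.newSession()
    session_b = spark.newSession()

    # ...but both are backed by the very same SparkContext, i.e. the same driver memory.
    print(session_a.sparkContext is session_b.sparkContext)  # True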

Is it possible to add Fair scheduler pools programmatically in Spark?

I'm developing an application where several users use the same SparkContext to launch their queries to a Spark Cluster.
As the Spark documentation states (https://spark.apache.org/docs/2.2.0/job-scheduling.html#fair-scheduler-pools), with the Fair scheduler, you can assign a different pool for every user and they'll get a fair share of the cluster resources but every pool will be set up with the default pool configuration (scheduling mode FIFO, weight 1, and minShare 0).
Given that we don't know in advance which users can connect to the application, we can't set up a configuration file for the fair scheduler pools for all the users.
So, in order to give a pool to every user dynamically and set up every pool with a FAIR scheduling mode, I think there might be 2 options:
Change the default pool behaviour in order to change the scheduling mode to FAIR. Is it possible? How?
Generate a scheduler pool dynamically and programmatically, so that a pool is added when a user connects to the application for the first time, and that pool is created with a FAIR scheduling mode. Is it possible? How?
Thanks in advance
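For option 2, a minimal PySpark sketch of the usual workaround (names like pool_<user> are illustrative): enable FAIR scheduling for the application and route each user's jobs to their own pool with a thread-local property. Note that pools created on the fly this way still get the default settings (FIFO within the pool, weight 1, minShare 0) unless they are declared in an allocation file via spark.scheduler.allocation.file; the pools themselves still share the cluster fairly with each other.

    from pyspark.sql import SparkSession

    # Enable the FAIR scheduler for the whole application.
    spark = (SparkSession.builder
             .appName("multi-user-app")
             .config("spark.scheduler.mode", "FAIR")
             .getOrCreate())
    sc = spark.sparkContext

    def run_query_for(user, df):
        # Route this thread's jobs into a per-user pool. Undeclared pools are
        # created on demand with default settings (FIFO, weight 1, minShare 0).
        sc.setLocalProperty("spark.scheduler.pool", "pool_" + user)
        try:
            return df.count()
        finally:
            sc.setLocalProperty("spark.scheduler.pool", "default")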

Spark: Writing to DynamoDB, limited write capacity

My use case is to write to DynamoDB from a Spark application. As I have limited write capacity for DynamoDB and do not want to increase it due to cost implications, how can I limit the Spark application to write at a regulated speed?
Can this be achieved by reducing the partitions to 1 and then executing foreachPartition()?
I already have auto-scaling enabled but don't want to increase it any further.
Please suggest other ways of handling this.
EDIT: This needs to be achieved when the Spark application is running on a multi-node EMR cluster.
Bucket scheduler
The way I would do this is to create a token bucket scheduler in your Spark application. The token bucket pattern is a common design for ensuring an application does not breach API limits. I have used this design successfully in very similar situations. You may find someone has written a library you can use for this purpose.
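A minimal sketch of that idea (pure Python, not tied to any particular DynamoDB client; the rate and the put_item_to_dynamodb helper are placeholders) is a small token bucket used inside foreachPartition, so each partition throttles itself to its share of the table's write capacity:

    import time

    class TokenBucket:
        """Allows at most `rate` operations per second, with a one-second burst."""
        def __init__(self, rate):
            self.rate = rate
            self.tokens = rate
            self.last = time.monotonic()

        def acquire(self, n=1):
            while True:
                now = time.monotonic()
                self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= n:
                    self.tokens -= n
                    return
                time.sleep((n - self.tokens) / self.rate)

    def write_partition(rows):
        # e.g. 100 WCU shared across 4 partitions -> ~25 writes/sec each (illustrative)
        bucket = TokenBucket(rate=25)
        for row in rows:
            bucket.acquire()
            put_item_to_dynamodb(row)  # placeholder for the actual DynamoDB write

    # df.rdd.foreachPartition(write_partition)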
DynamoDB retry
Another (less attractive) option would be to increase the number of retries on your DynamoDB connection. When your write does not succeed because the provisioned throughput is exceeded, you can essentially instruct your DynamoDB SDK to keep retrying for as long as you like. Details in this answer. This option may appeal if you want a 'quick and dirty' solution.
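If the writes go through boto3 inside the Spark tasks, a sketch of raising the retry behaviour looks like this (the retry numbers and table name are illustrative):

    import boto3
    from botocore.config import Config

    # Let the SDK back off and retry throttled writes instead of failing the task.
    dynamodb = boto3.resource(
        "dynamodb",
        config=Config(retries={"max_attempts": 20, "mode": "standard"}),
    )
    table = dynamodb.Table("my_table")  # hypothetical table name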

Tableau performance with 500 concurrent users

We are planning to use Tableau Server on an 8-core machine with 64 GB RAM for data visualization, as it has many rich features. Our idea is to use Spark SQL with Hive metadata as the data input, mostly with live queries on top of it.
Can anyone who has used the same architecture share their thoughts on how Tableau will behave with more than 500 concurrent users?

Maximum parallel queries in Cassandra

I'm new to Cassandra, and I have a very trivial question: how many parallel queries can I run without compromising performance? The queries are going to be like
Select data from table where id='asdasdasd';
It's a server in a datacenter; should it work properly with 3000 read queries? Sorry for the poor information, but it's all I have.
It all depends on the capacity of the servers where you have installed your Cassandra cluster, and on how you have configured the nodes.
There is a configuration parameter in cassandra.yaml called concurrent_reads.
Tune it to get a better read rate.
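For reference, that setting lives in cassandra.yaml on each node; the value below is only an illustrative starting point (the usual rule of thumb in the cassandra.yaml comments is 16 × number of data disks):

    # cassandra.yaml
    concurrent_reads: 32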
