Create a custom LoadBalancingPolicy for the Spark Cassandra Connector - apache-spark

I know that the spark-cassandra connector comes with its own default load-balancing policy implementation (DefaultLoadBalancingPolicy). How can I go about implementing my own custom load-balancing class? I want the application to use the WhiteListRoundRobin policy. What steps would I need to take? I'm still a newbie at working with Spark and Cassandra and I would appreciate any guidance on this. Thanks

You can look into the implementation of LocalNodeFirstLoadBalancingPolicy - basically you need to create (if it doesn't already exist) a class that implements LoadBalancingPolicy, and put your required load-balancing logic there.
Then you need to create a class implementing CassandraConnectionFactory that configures the Cassandra session with the required load-balancing implementation. The simplest way is to take the code of DefaultConnectionFactory and, instead of using LocalNodeFirstLoadBalancingPolicy, specify your load-balancing class.
Then specify that connection factory's class name in the spark.cassandra.connection.factory configuration property.
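For illustration, a minimal sketch of such a factory in Scala, assuming the connector 2.x API (where the factory's abstract method is createCluster(conf)) and the DataStax Java driver 3.x, which provides WhiteListPolicy and RoundRobinPolicy; the object name, host addresses and port are placeholders, not part of the connector:

import java.net.InetSocketAddress

import com.datastax.driver.core.Cluster
import com.datastax.driver.core.policies.{RoundRobinPolicy, WhiteListPolicy}
import com.datastax.spark.connector.cql.{CassandraConnectionFactory, CassandraConnectorConf}

// Hypothetical factory: pins the driver to a fixed set of hosts and
// round-robins across them. Host addresses and port are placeholders.
object WhiteListConnectionFactory extends CassandraConnectionFactory {

  private val allowedHosts = java.util.Arrays.asList(
    new InetSocketAddress("10.0.0.1", 9042),
    new InetSocketAddress("10.0.0.2", 9042))

  override def createCluster(conf: CassandraConnectorConf): Cluster =
    Cluster.builder()
      .addContactPointsWithPorts(allowedHosts)
      // WhiteListPolicy filters its child policy down to the listed hosts,
      // which gives the "whitelist + round robin" behaviour asked about above.
      .withLoadBalancingPolicy(new WhiteListPolicy(new RoundRobinPolicy(), allowedHosts))
      .build()
}

With that class on the application classpath, you would then set spark.cassandra.connection.factory to its fully qualified name (e.g. com.example.WhiteListConnectionFactory, if that's where you put it).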

Related

Transaction Manager and JDBC template to dynamically route to multiple data sources (physically different Amazon RDS connections) at runtime

We went through multiple examples on the internet and most forums referred to Spring's AbstractRoutingDataSource implementation, but the challenge we are facing is identifying unknown databases at runtime. The business requirement is such that we don't have the opportunity to change the application properties file by adding this new/unknown database and redeploying the code again.
For that we had gone ahead with the usage of Spring's DefaultTransactionDefinition - a programmatic implementation rather than an XML one.
Is there a way to implement an XML configuration of the same for an unknown/new data source without changing the application properties or redeploying the application?
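For context, the runtime-routing pattern around Spring's AbstractRoutingDataSource that the question refers to usually looks roughly like the sketch below; the class names, the lookup key, and the registerDataSource helper are hypothetical, and how the new database's URL and credentials are discovered is left to the application:

import org.springframework.jdbc.datasource.DriverManagerDataSource
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource

// Thread-local holder for "which database should this call hit".
object DbContext {
  private val current = new ThreadLocal[String]
  def set(key: String): Unit = current.set(key)
  def get: String = current.get
}

class RuntimeRoutingDataSource extends AbstractRoutingDataSource {

  private val targets = new java.util.concurrent.ConcurrentHashMap[Object, Object]()

  // Spring calls this on every connection request to pick the target data source.
  override def determineCurrentLookupKey(): Object = DbContext.get

  // Hypothetical hook: called when a previously unknown RDS instance shows up at runtime.
  def registerDataSource(key: String, url: String, user: String, password: String): Unit = {
    targets.put(key, new DriverManagerDataSource(url, user, password))
    setTargetDataSources(targets)
    afterPropertiesSet() // re-resolves the target map so the new key becomes routable
  }
}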

Dynamic configuration and persistent storage for config

I have the following two requirements in my NestJS application:
Dynamic configuration - allow users to change configuration values at runtime, and allow observing configuration changes and taking some action based on them.
Database storage - store configuration in a persistent store like Redis.
I have been looking into the #nestjs/config library but I couldn't find anything regarding this.
I was wondering if anybody has used it for such a purpose? In case it doesn't support these features, does the library provide any extension hooks? How easy or difficult would it be to implement this in the library or on top of it? Or is it simply not meant for such usage and I should look elsewhere?
I'm quite competent when it comes to Node.js and the JS/TS ecosystem, so I wouldn't mind implementing it myself if I get some clues. Would appreciate any thoughts on it.

How to write a custom Hadoop group mapping class

I have a use case where I would like to integrate HDFS with my application. User management is handled by the application. Now, on the HDFS side, to fetch the groups of a user we can use any of the predefined ways described here:
https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/GroupsMapping.html#Composite_Groups_Mapping
But in my case, since my application handles users and groups, is there a way to create a custom group mapping which talks to my application to get user and group details?
This component of Hadoop is entirely extensible. You simply need to write a custom implementation of the GroupMappingServiceProvider, which has just 3 methods - one to translate a user to a list of groups that user is in, and two for managing the caching of the mapping. All you need to do after implementing this interface is place a JAR with your custom implementation onto the classpath of your HDFS JVMs and then specify it in the hadoop.security.group.mapping configuration.
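A rough sketch of such an implementation (Scala here, but plain Java works the same way); MyAppGroupsMapping and fetchGroupsFromMyApp are placeholders for your own class and for whatever call your application exposes:

import java.util.{ArrayList => JArrayList, List => JList}

import org.apache.hadoop.security.GroupMappingServiceProvider

class MyAppGroupsMapping extends GroupMappingServiceProvider {

  // Translate a user name into the groups the application says the user is in.
  override def getGroups(user: String): JList[String] = {
    val groups = new JArrayList[String]()
    fetchGroupsFromMyApp(user).foreach(g => groups.add(g))
    groups
  }

  // This sketch keeps no cache of its own, so the two cache hooks are no-ops.
  override def cacheGroupsRefresh(): Unit = ()
  override def cacheGroupsAdd(groups: JList[String]): Unit = ()

  // Placeholder: call your application's user-management API (REST, RPC, ...) here.
  private def fetchGroupsFromMyApp(user: String): Seq[String] = Seq.empty
}

Then point hadoop.security.group.mapping (in core-site.xml) at the fully qualified class name, with the JAR on the HDFS daemons' classpath as described above.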

How to prevent other servers from connecting to a local hazelcast instance

I want to use Hazelcast as a local-only inter-JVM shared cache. Or, put another way, I want to run a secure/private instance.
Is this possible? If so, how?
If it matters, it will be Spring-managed.
The motivation is that confidential data will be stored in hazelcast and I want to protect it from external attacks.
You need to define your own group configuration credentials, which are required to connect to your cluster.
<hz:group name="dev" password="password"/>
Best practices:
Always define your own Hazelcast XML/Spring configuration instead of using the default one from the JAR file.
Prefer TCP/IP network configuration over multicast wherever possible, so your cluster won't collide with others (see the sketch after this list).
Define custom group credentials, as mentioned above.
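If you build the configuration programmatically instead of (or alongside) the Spring XML above, the same ideas look roughly like this; it's a sketch against the Hazelcast 3.x API, and the group name, password, and member address are placeholders:

import com.hazelcast.config.Config
import com.hazelcast.core.{Hazelcast, HazelcastInstance}

object LocalOnlyHazelcast {
  def newInstance(): HazelcastInstance = {
    val config = new Config()

    // Custom group credentials: only members/clients that know these can join.
    config.getGroupConfig.setName("dev")
    config.getGroupConfig.setPassword("password")

    // TCP/IP join instead of multicast, restricted to local members only,
    // so the cluster won't be discovered by (or collide with) other machines.
    val join = config.getNetworkConfig.getJoin
    join.getMulticastConfig.setEnabled(false)
    join.getTcpIpConfig.setEnabled(true)
    join.getTcpIpConfig.addMember("127.0.0.1")

    Hazelcast.newHazelcastInstance(config)
  }
}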

Choice of technical solution for handling and processing data for a Liferay project

I am researching to start a new project based on Liferay.
It relies on a system that will require its own data model and a certain agility and flexibility in data management, as well as in its visualization.
These are my options:
Using Liferay Expando fields and defining my own data models. I would have to build the entire view layer myself.
Using the Liferay ECMS, adding patches that create structures and hooks allowing me to define master-detail data models. This makes the view side much easier (Velocity templates), but it is perhaps the "dirtiest" way.
Generating the data layer and service access with Hibernate and Spring (using a service factory, for example).
Using Liferay Service Builder, which would be similar to the option of building the platform with Hibernate and Spring.
CRUD generation systems such as OpenXava or XMLPortletFactory.
And now my question: what is your advice? What advantages or disadvantages do you think each option would provide?
Thanks in advance.
I can't speak for the other CRUD generation systems but I can tell you about the Liferay approaches.
I would take a hybrid approach.
First, I would create the required data models as well as I can with the current requirements in Liferay Service Builder and maintain them there as much as possible. This would require that you rebuild and redeploy your plugin every time you change the data model, but it would greatly enhance performance compared to all the other Liferay approaches you've mentioned. Service Builder in that regard is much more rigid and cannot be changed via the GUI.
However, in the event that for some reason you cannot use Service Builder to redefine your data models and you need certain aspects of them to be changeable via the GUI, you can also use Expandos to extend the models you've created with Service Builder. So it is the best of both worlds.
As for the other option, using the ECMS would be a specialized case, and I would only take this approach if there is a particular requirement it satisfies (like integration with the ECMS).
With that said, Liferay provides you many different ways to create your application. It ultimately depends on how you're going to use your application.
