I want to use Hazelcast as a local-only inter-JVM shared cache. Put another way, I want to run a secure, private instance.
Is this possible? If so, how?
If it matters, it will be spring-managed.
The motivation is that confidential data will be stored in Hazelcast and I want to protect it from external attacks.
Define your own group credentials; anything that wants to connect to your cluster must then present them:
<hz:group name="dev" password="password"/>
Best practices:
Always define your own Hazelcast XML/Spring configuration instead of using the default one from the jar file.
Prefer TCP/IP network configuration over multicast wherever possible, so your cluster won't collide with others.
Define custom group credentials, as mentioned above.
We plan to use Cassandra 3.x and we want to allow our customers to connect to Cassandra directly for exporting the data into their data warehouses.
They will connect remotely via ODBC.
Is there any way to prevent a customer from executing huge or bad SELECT statements that put a high load on all nodes? We use an extra data center in our replication strategy that only customers can connect to, so the live system will not be affected. However, we also want to set up some workers that run on this shadow system. Most importantly, a connected remote client must not have any noticeable impact on other remote connections or on our local worker jobs. There is already a materialized view, and I want to force customers to fetch data by primary key only (i.e. disallow ALLOW FILTERING). It would also be great to limit the number of rows returned (e.g. to 1 million) to prevent a pull of all data.
Is there a best practice for this use case?
I know of BlackRock's video on multi-tenant strategy in C*, which advises using tenant_id in the schema. That is what we're doing already, but how can I ensure security/isolation for tenants/customers connected via ODBC? Or do I have to write my own API to handle security?
I would recommend exposing access via an API rather than ODBC. That gives you greater control over what is executed and lets you enforce tenant_id and other checks, such as limits. You could try using Cassandra's CQL parser to decompose the query and put all the required things back.
Theoretically, you could also use Apache Calcite, for example. It has a JDBC driver implementation that could be used, plus there is an existing Cassandra adapter that you could modify to accomplish your task (mapping authentication to tenant_ids, etc.), but this would be quite a lot of work.
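A minimal sketch of the kind of checks such an API layer could run before forwarding a customer's query. Everything here is illustrative: the `ExportQueryGuard` class, the rule that the query must contain a `tenant_id = ?` bind marker (which the API would bind to the authenticated tenant), and the regex matching are all assumptions. The string matching is only a stand-in for decomposing the query with a real CQL parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check for customer-supplied CQL; not part of Cassandra or any driver. */
final class ExportQueryGuard {
    private static final Pattern ALLOW_FILTERING =
            Pattern.compile("\\ballow\\s+filtering\\b", Pattern.CASE_INSENSITIVE);
    // Assumption: the API binds the authenticated tenant to this marker itself.
    private static final Pattern TENANT_BIND =
            Pattern.compile("\\btenant_id\\s*=\\s*\\?", Pattern.CASE_INSENSITIVE);
    private static final Pattern LIMIT =
            Pattern.compile("\\blimit\\s+(\\d+)", Pattern.CASE_INSENSITIVE);

    private final long maxRows;

    ExportQueryGuard(long maxRows) {
        this.maxRows = maxRows;
    }

    /** Returns the query with its LIMIT capped at maxRows, or throws if a rule is violated. */
    String check(String cql) {
        // Drop surrounding whitespace and a single trailing semicolon.
        String q = cql.trim().replaceAll(";\\s*$", "");
        if (q.indexOf(';') >= 0) {
            throw new IllegalArgumentException("multiple statements are not allowed");
        }
        if (!q.regionMatches(true, 0, "select ", 0, 7)) {
            throw new IllegalArgumentException("only SELECT statements are allowed");
        }
        if (ALLOW_FILTERING.matcher(q).find()) {
            throw new IllegalArgumentException("ALLOW FILTERING is not allowed");
        }
        if (!TENANT_BIND.matcher(q).find()) {
            throw new IllegalArgumentException("query must restrict on tenant_id = ?");
        }
        Matcher limit = LIMIT.matcher(q);
        if (!limit.find()) {
            return q + " LIMIT " + maxRows; // no LIMIT given: add the cap
        }
        String digits = limit.group(1);
        boolean tooLarge = digits.length() > 18 || Long.parseLong(digits) > maxRows;
        return tooLarge ? limit.replaceFirst("LIMIT " + maxRows) : q;
    }
}
```

With `new ExportQueryGuard(1_000_000)`, a query without a LIMIT gets one appended, a larger LIMIT is reduced to the cap, and a query containing ALLOW FILTERING is rejected before it reaches the cluster.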
There are multiple Node services currently deployed and running through pm2 in an AWS environment.
The maintenance difficulty I see in my current code base is that each of these Node services has a separate configuration file (config\app.json). Although most of the properties in these files are common to all the services, each property is repeated in every individual service. If any of these properties changes, I have to make the change in multiple places.
I would like to centralise the configurations across multiple node services. Is there a way to do that? Expectation is to have a centralised place for maintaining configurations. Any references would help.
I am not sure what your architecture looks like, but if you do not mind creating a small library or microservice that fetches configuration from a small NoSQL database such as Redis, which stores key-value pairs, that will give you a centralized place for your configuration.
The only configuration that then remains in each service is that of Redis itself, which you can supply when starting the service, for example as an environment variable or through something like yargs.
Then every service only has to make one API call to fill in its config JSON, in your case config/app.json.
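A rough sketch of the lookup such a library or microservice could perform, written in Java for illustration (the same shape carries over to a Node module). The `CentralConfig` class, the `"common"` scope name, and the in-memory map are assumptions; the map stands in for the Redis store described above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical central configuration lookup; the nested map stands in for Redis. */
final class CentralConfig {
    private static final String COMMON_SCOPE = "common"; // assumed name for shared properties

    private final Map<String, Map<String, String>> store = new HashMap<>();

    /** Stores a property under a scope: either "common" or a service name. */
    void put(String scope, String key, String value) {
        store.computeIfAbsent(scope, s -> new HashMap<>()).put(key, value);
    }

    /** Shared properties first; the service's own entries then override them. */
    Map<String, String> configFor(String service) {
        Map<String, String> merged =
                new HashMap<>(store.getOrDefault(COMMON_SCOPE, Collections.emptyMap()));
        merged.putAll(store.getOrDefault(service, Collections.emptyMap()));
        return merged;
    }
}
```

A shared property is then changed once under the common scope, and each service picks it up the next time it asks for its configuration.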
While exploring SF Reliable Services, I want to make sure that the following basic statements are true.
The Reliable Services default communication stack (DefaultStack) and the Reliable Actors communication stack (using ServiceProxy/ActorProxy) can only be used for communication inside the SF cluster. Clients from outside must use the WebAPI/WCF stacks.
ServicePartitionResolver, CommunicationClientFactory and ServicePartitionClient are already implemented inside DefaultStack. I don't have to worry about them if I only use DefaultStack.
Suppose a stateful service has more than one partition and I want, for example, to post an item for it to process. It is not SF's responsibility to decide which partition the posting client should use. I need to implement an algorithm myself that resolves the partition key or name, and use the result in the ServiceProxy constructor (for DefaultStack).
You're correct on all those points.
If you want to communicate outside Service Fabric you need to use something like an OwinCommunicationListener (see here).
You’d only have to implement those if you wanted to plug in your own communication stack.
Yep, you’d need to define the partition key when you’re creating a ServiceProxy.
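One way to resolve a partition key from an item is to hash a stable identifier. The sketch below is in Java for illustration only and makes two assumptions that are not from the discussion above: the service uses an Int64 range partition scheme spanning the whole 64-bit range, and FNV-1a is used as the hash (Service Fabric does not prescribe one). `PartitionKeys` and its methods are hypothetical names. The returned value is what would be passed as the partition key when creating the proxy.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that derives a stable 64-bit partition key from an item id. */
final class PartitionKeys {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    private PartitionKeys() {}

    /** 64-bit FNV-1a over the UTF-8 bytes; the same id always yields the same key. */
    static long keyFor(String itemId) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : itemId.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    /** Optional: fold the key into [0, partitionCount) when partitions are addressed by index. */
    static int partitionIndex(long key, int partitionCount) {
        return (int) Long.remainderUnsigned(key, partitionCount);
    }
}
```

Because the key depends only on the item id, every client that posts the same item ends up on the same partition without any shared lookup table.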
I am exploring the notion of using Hazelcast (or any other caching framework) to advertise services within a cluster. Ideally, when a cluster member departs, its services (or the objects advertising them) should be removed from the cache.
Is this at all possible?
It is possible for sure.
The question is which solution you prefer.
If the services can be stored in a map, you could create a map with a TTL of, say, a few minutes; each member then needs to refresh its services periodically to prevent them from expiring.
An alternative solution is to listen for member changes using a MembershipListener; once a member leaves, the services that belong to that member need to be removed from the map.
If you don't like either of these, you could create your own SPI-based implementation. The SPI is the lower-level infrastructure Hazelcast uses to build its distributed data structures. It is a lot more work, but also gives you a lot of flexibility.
So there are many solutions.
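The first two options can be sketched without Hazelcast to show the bookkeeping involved. This is a plain in-memory stand-in under stated assumptions: `ServiceRegistry` and its methods are hypothetical, the `ConcurrentHashMap` takes the place of a distributed map with a TTL, and `memberLeft` is what a membership listener callback would invoke when a member is removed. Time is passed in explicitly to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a distributed service registry. */
final class ServiceRegistry {
    private static final class Entry {
        final String memberId;
        final long expiresAt;

        Entry(String memberId, long expiresAt) {
            this.memberId = memberId;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> services = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ServiceRegistry(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Option 1: registers a service or refreshes its TTL; members call this periodically. */
    void advertise(String serviceName, String memberId, long nowMillis) {
        services.put(serviceName, new Entry(memberId, nowMillis + ttlMillis));
    }

    /** A service counts as available until its TTL has elapsed without a refresh. */
    boolean isAvailable(String serviceName, long nowMillis) {
        Entry e = services.get(serviceName);
        if (e == null) {
            return false;
        }
        if (e.expiresAt <= nowMillis) {
            services.remove(serviceName, e); // drop the stale entry
            return false;
        }
        return true;
    }

    /** Option 2: called when a member departs; removes everything it advertised. */
    void memberLeft(String memberId) {
        services.values().removeIf(e -> e.memberId.equals(memberId));
    }
}
```

The two options are not exclusive: removal on departure reacts quickly to a clean leave, while the TTL acts as a safety net if the departure event is missed.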
We want to use Milton WebDAV to transfer files in our web application, which will eventually be deployed in a cloud environment (most likely Azure) as IaaS.
We are aware that the WebDAV standard is stateless and hence should not create any problems with a cloud load balancer, but we are not sure about Milton and have a few questions:
1.) Does Milton implement WebDAV as specified, so that all communication remains stateless? I assume that it passes an authentication token with every request, but I am not sure where the token is stored on the server. Does it store it in the database, in some sort of cache, etc.?
2.) Does the locking mechanism work correctly if a load balancer is used and there are 5-6 servers handling the load? Again, where does the Milton server store the lock token?
Sorry for the late comment. The two most important aspects of WebDAV that affect load balancing are digest authentication tokens (nonce values) and lock tokens.
As the Resource implementor you get to control both of those. Lock tokens are typically stored in a database (you must implement the methods on LockableResource that do the persistence), so they will be shared across servers. It's not uncommon to use memory-based lock tokens, though, in which case you need to find some way to share that information across servers.
Digest nonces are only a consideration if you've implemented DigestResource. The default NonceProvider uses a simple HashMap so this will not be shared across servers. But the interface is trivial so you can easily implement a database store. If your load-balancing solution uses sticky sessions then that won't be an issue because clients will go to the server which has their nonce.
Note that Tomcat session replication won't help with the above issues, because WebDAV clients typically don't support cookies, so there is no Servlet session.
I have never used Milton WebDAV before but from the looks of it, it is used to modify and edit files on a server.
However, Azure's local storage is not shared. Each instance is a completely separate server. If you modify a file on one server, it will not be replicated to the next.
Azure works by uploading a deployment package. When a new instance needs to come up, it uses the deployment package and starts a completely new server.
From your perspective they don't share anything in common, and you will never know which server you are hitting.
If you have a shared file storage system behind them, it may be a different story. However, that scenario looks odd for Azure. Amazon EC2 with a shared EBS might do it, though.