Cassandra: customer data per keyspace

Problem: one of our new customers wants their data to be stored in their own country (due to legal regulations). However, our existing customers' data is spread across a few datacenters in different countries.
Question: how can we keep the new customer's data in its own country without changing the existing Cassandra architecture much?
Potential Solution #1: use a separate keyspace for this customer. The schemas would be identical between keyspaces, which adds complexity for data migration and so on. DataStax support confirmed that it is possible to configure a keyspace per region.
However, the Spring Data Cassandra version we use doesn't allow choosing the keyspace dynamically.
The only way seems to be to use CqlTemplate and either run USE blabla before every call or prefix the table with the keyspace (SELECT * FROM blabla.mytable), but that feels like a hack to me.
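For illustration only, a minimal sketch of the second variant of that hack, assuming Spring Data Cassandra 2.x import paths and a CqlTemplate bean; the table, column, and method names are made up, and the keyspace name is passed in per customer and concatenated because keyspace names cannot be bound as parameters:
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.cassandra.core.cql.CqlTemplate;
import org.springframework.stereotype.Repository;

@Repository
public class KeyspacePrefixingUserDao {

    @Autowired
    private CqlTemplate cqlTemplate;

    // Prefix the customer's keyspace in the statement instead of relying on
    // a session-wide "USE <keyspace>" call. Validate the keyspace name before
    // concatenating it, since it cannot be sent as a bound parameter.
    public List<String> findUsernames(String customerKeyspace) {
        return cqlTemplate.queryForList(
                "SELECT username FROM " + customerKeyspace + ".mytable", String.class);
    }
}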
Potential Solution #2: use a separate environment for the new client, but management rejected this.
Are there any other ways to achieve this goal?

Update 3
The example and explanation below are the same as in the GitHub repository.
Update 2
The example on GitHub is now working. The most future-proof solution seemed to be using repository extensions. I will update the example below soon.
Update
Note that the solution I originally posted had some flaws that I discovered during JMeter tests. The DataStax Java driver reference advises against setting the keyspace through the Session object; you have to set the keyspace explicitly in every query.
I've updated the GitHub repository and also changed the solution's description.
Be very careful though: if the session is shared by multiple threads,
switching the keyspace at runtime could easily cause unexpected query failures.
Generally, the recommended approach is to use a single session with no
keyspace, and prefix all your queries.
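To illustrate the quoted recommendation, a minimal sketch with the DataStax 3.x Java driver (the contact point, keyspace, table, and column names are made up): the Session is created without a keyspace and every statement is fully qualified.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class SingleSessionExample {

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // assumed local node
                .build();

        // No keyspace argument here, as the driver reference recommends.
        Session session = cluster.connect();

        // Each query names its keyspace explicitly instead.
        ResultSet rs = session.execute(
                "SELECT username FROM customer_a_ks.users WHERE username = ?", "john");
        System.out.println(rs.one());

        cluster.close();
    }
}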
Solution Description
I would set up a separate keyspace for this specific customer and add support for changing the keyspace in the application. We previously used this approach with an RDBMS and JPA in production, so I would say it can work with Cassandra as well. The solution was similar to the one below.
I will briefly describe how to prepare and set up Spring Data Cassandra to configure the target keyspace on each request.
Step 1: Preparing your services
First, I would define how the tenant ID is set on each request. In the case of a REST API, a good example is a specific HTTP header that carries it:
Tenant-Id: ACME
Similarly, with any remote protocol you can forward the tenant ID on every message. For example, if you're using AMQP or JMS, you can forward it inside a message header or property.
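As a rough sketch (the queue name and property name are my own, not from the original setup), forwarding the tenant ID as a JMS message property with Spring's JmsTemplate could look like this; note that JMS property names cannot contain a hyphen, so the HTTP-style Tenant-Id becomes TenantId:
import org.springframework.jms.core.JmsTemplate;

public class TenantAwareSender {

    private final JmsTemplate jmsTemplate;

    public TenantAwareSender(JmsTemplate jmsTemplate) {
        this.jmsTemplate = jmsTemplate;
    }

    public void send(String payload, String tenantId) {
        // Attach the tenant ID as a message property so the consumer can
        // pick the right keyspace, mirroring the HTTP header above.
        jmsTemplate.convertAndSend("user-events", payload, message -> {
            message.setStringProperty("TenantId", tenantId);
            return message;
        });
    }
}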
Step 2: Getting tenant ID in application
Next, you should capture the incoming header on each request in your controllers. You can use a ThreadLocal or a request-scoped bean (a ThreadLocal sketch follows the request-scoped example below).
@Component
@Scope(scopeName = "request", proxyMode = ScopedProxyMode.TARGET_CLASS)
public class TenantId {

    private String tenantId;

    public void set(String id) {
        this.tenantId = id;
    }

    public String get() {
        return tenantId;
    }
}
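For completeness, the ThreadLocal alternative mentioned above could look roughly like this (my sketch, not part of the original repository); it has to be cleared after each request because servlet containers pool their worker threads:
public final class TenantContext {

    private static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    private TenantContext() {
    }

    public static void set(String tenantId) {
        CURRENT_TENANT.set(tenantId);
    }

    public static String get() {
        return CURRENT_TENANT.get();
    }

    // Call this in a finally block or a servlet filter after the request,
    // otherwise pooled threads leak the previous tenant.
    public static void clear() {
        CURRENT_TENANT.remove();
    }
}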
@RestController
public class UserController {

    @Autowired
    private UserRepository userRepo;

    @Autowired
    private TenantId tenantId;

    @RequestMapping(value = "/userByName")
    public ResponseEntity<String> getUserByUsername(
            @RequestHeader("Tenant-ID") String tenantId,
            @RequestParam String username) {
        // Setting the tenant ID
        this.tenantId.set(tenantId);
        // Finding user
        User user = userRepo.findOne(username);
        return new ResponseEntity<>(user.getUsername(), HttpStatus.OK);
    }
}
Step 3: Setting tenant ID in data-access layer
Finally, you should extend the repository implementation and set the keyspace according to the tenant ID:
public class KeyspaceAwareCassandraRepository<T, ID extends Serializable>
        extends SimpleCassandraRepository<T, ID> {

    private final CassandraEntityInformation<T, ID> metadata;
    private final CassandraOperations operations;

    @Autowired
    private TenantId tenantId;

    public KeyspaceAwareCassandraRepository(
            CassandraEntityInformation<T, ID> metadata,
            CassandraOperations operations) {
        super(metadata, operations);
        this.metadata = metadata;
        this.operations = operations;
    }

    private void injectDependencies() {
        SpringBeanAutowiringSupport
                .processInjectionBasedOnServletContext(this,
                        getServletContext());
    }

    private ServletContext getServletContext() {
        return ((ServletRequestAttributes) RequestContextHolder.getRequestAttributes())
                .getRequest().getServletContext();
    }

    @Override
    public T findOne(ID id) {
        injectDependencies();
        CqlIdentifier primaryKey = operations.getConverter()
                .getMappingContext()
                .getPersistentEntity(metadata.getJavaType())
                .getIdProperty().getColumnName();
        Select select = QueryBuilder.select().all()
                .from(tenantId.get(),
                        metadata.getTableName().toCql())
                .where(QueryBuilder.eq(primaryKey.toString(), id))
                .limit(1);
        return operations.selectOne(select, metadata.getJavaType());
    }

    // All other overrides should be similar
}
@SpringBootApplication
@EnableCassandraRepositories(repositoryBaseClass = KeyspaceAwareCassandraRepository.class)
public class DemoApplication {
    ...
}
Let me know if there are any issues with the code above.
Sample code in GitHub
https://github.com/gitaroktato/spring-boot-cassandra-multitenant-example
References
Example: Using JPA interceptors
Spring @RequestHeader example
Spring request-scoped beans
DataStax Java Driver Reference

After much back and forth, we decided not to do dynamic keyspace resolution within the same JVM.
The decision was made to have a dedicated Jetty/Tomcat instance per keyspace and to define at the nginx routing level which server a request should be redirected to (based on the companyId from the request URL).
For example, all our endpoints contain /companyId/<value>, so based on that value we can redirect the request to the proper server, which uses the correct keyspace.

The advice to use two keyspaces is correct.
If the question is about having just two keyspaces, why not configure two keyspaces:
for the region-dependent client - write to both;
for the others - write to one (main) keyspace only.
No data migration will be required.
Here is a sample showing how to configure Spring repositories to hit different keyspaces:
http://valchkou.com/spring-boot-cassandra.html#multikeyspace
The choice of repository can be a simple if/else (a configuration sketch follows the snippet below):
if (Arrays.asList(1, 2, 3).contains(org)) {
    // region-dependent customer: write to both keyspaces
    repoA.save(entity);
    repoB.save(entity);
} else {
    repoA.save(entity);
}
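A minimal configuration sketch of the two-repository idea, assuming Spring Data Cassandra on top of the DataStax 3.x driver; the keyspace names main_ks and region_ks and the contact point are made up, and the linked article shows the full setup:
import com.datastax.driver.core.Cluster;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.cassandra.core.CassandraTemplate;

@Configuration
public class MultiKeyspaceConfig {

    @Bean
    public Cluster cluster() {
        return Cluster.builder().addContactPoint("127.0.0.1").build();
    }

    // Template bound to the main keyspace, used for every customer.
    @Bean
    public CassandraTemplate mainTemplate(Cluster cluster) {
        return new CassandraTemplate(cluster.connect("main_ks"));
    }

    // Template bound to the region-dependent keyspace, used only for
    // customers whose data must stay in that region.
    @Bean
    public CassandraTemplate regionTemplate(Cluster cluster) {
        return new CassandraTemplate(cluster.connect("region_ks"));
    }
}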

Related

Why is data access tightly coupled to the Service base in ServiceStack

I'm curious why the decision was made to couple the Service base class in ServiceStack to data access (via the Db property)? With web services it is very popular to use a Data Repository pattern to fetch the raw data from the database. These data repositories can be used by many services without having to call a service class.
For example, let's say I am supporting a large retail chain that operates across the nation. There are a number of settings that will differ across all stores like tax rates. Each call to one of the web services will need these settings for domain logic. In a repository pattern I would simply create a data access class whose sole responsibility is to return these settings. However in ServiceStack I am exposing these settings as a Service (which it needs to be as well). In my service call the first thing I end up doing is newing up the Setting service and using it inside my other service. Is this the intention? Since the services return an object I have to cast the result to the typed service result.
ServiceStack's convenience ADO.NET IDbConnection Db property allows you to quickly create database-driven services (i.e. the most popular kind) without the overhead and boilerplate of creating a repository, if preferred. As ServiceStack services are already testable and the DTO pattern provides a clean, endpoint-agnostic web service interface, there's often not a lot of value in wrapping and proxying "one-off" data access in a separate repository.
But at the same time there's nothing forcing you to use the base.Db property (which has no effect if unused). The unit testing example on the wiki shows the use of either base.Db or the repository pattern:
public class SimpleService : Service
{
    public IRockstarRepository RockstarRepository { get; set; }

    public List<Rockstar> Get(FindRockstars request)
    {
        return request.Aged.HasValue
            ? Db.Select<Rockstar>(q => q.Age == request.Aged.Value)
            : Db.Select<Rockstar>();
    }

    public RockstarStatus Get(GetStatus request)
    {
        var rockstar = RockstarRepository.GetByLastName(request.LastName);
        if (rockstar == null)
            throw HttpError.NotFound("'{0}' is no Rockstar".Fmt(request.LastName));

        var status = new RockstarStatus
        {
            Alive = RockstarRepository.IsAlive(request.LastName)
        }.PopulateWith(rockstar); //Populates with matching fields

        return status;
    }
}
Note: returning an object or a strongly-typed DTO response like RockstarStatus has the same effect in ServiceStack, so if preferred you can return a strongly-typed response and avoid any casting.

How to specify and organize OXM_METADATA_SOURCE in glassfish v4 MOXy Provider?

I am a fan of both Glassfish and MOXy, and it's good news for me that MOXy has been bundled into Glassfish v4.
I have read and tried a few MOXy examples on the internet, and I like the dynamic OXM_METADATA_SOURCE part, since when providing RESTful services the "client perspective" is much more flexible than the domain classes.
So here is the problem:
Different RESTful services can have different views of the same domain classes, which is a very common case in my work. So there can be a lot of OXM binding metadata files for each service. And, as we know, a single OXM metadata file can only correspond to a single Java package, so there will be many more OXM metadata files to maintain.
Back to JAX-RS: are there any frameworks, design patterns, or best practices for managing the mapping between the set of OXM metadata files and the service itself?
You can try a new feature called Entity Filtering, which was introduced in Jersey 2.3. Even though Entity Filtering is not based on OXM_METADATA_SOURCE, you can achieve your goal with it.
Let's assume you have the following domain class (the annotations are custom entity-filtering annotations):
public class Project {

    private Long id;
    private String name;
    private String description;

    @ProjectDetailedView
    private List<Task> tasks;

    @ProjectAnotherDetailedView
    private List<User> users;

    // ...
}
And, of course, some JAX-RS resources, i.e.:
@Path("projects")
@Produces("application/json")
public class ProjectsResource {

    @GET
    @Path("{id}")
    public Project getProject(@PathParam("id") final Long id) {
        return ...;
    }

    // ...
}
Now we have two detailed views defined via annotations on the domain class, plus the resource class. If you annotate the getProject resource method with:
@ProjectDetailedView - the returned entity would contain id, name, description AND the list of tasks from Project
@ProjectAnotherDetailedView - the returned entity would contain id, name, description AND the list of users from Project
If you leave the resource method un-annotated, the resulting entity would contain only id, name, and description.
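For example, selecting the first view only requires annotating the method already shown in ProjectsResource; this fragment is a sketch, and findProject is a hypothetical lookup method:
@GET
@Path("{id}")
@ProjectDetailedView
public Project getProject(@PathParam("id") final Long id) {
    // The response will now contain id, name, description and the tasks list.
    return findProject(id); // hypothetical lookup
}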
You can find more information about Entity Filtering in the User Guide or you can directly try it in our example: entity-filtering.
Note 1: Entity Filtering works only with JSON media type (via MOXy) at the moment. Support for other media types / providers is planned to be added in the future.
Note 2: Jersey 2.3 is not integrated into any (promoted) build of GF 4.0. The next Jersey version that should be part of GF 4.0 is 2.4. We plan to release 2.4 in the next few weeks.

NServiceBus Unit of Work For Multitenancy with Custom ORM

Here are my parameters:
Simple NServiceBus Saga implementation using the default builder
In-house ORM on top of SQL Server
Multitenancy - I have two ASP.NET MVC 4 domains running on the same website, each with their own databases
We configure our ORM using a static method like so:
public class EndpointConfig: IConfigureThisEndpoint, IWantCustomInitialization {
    public void Init() {
        var bus = Configure.With()
            .AutofacBuilder()
            .UnicastBus().LoadMessageHandlers().DoNotAutoSubscribe()
            .XmlSerializer()
            .MsmqTransport().IsTransactional(true).PurgeOnStartup(false)
            .MsmqSubscriptionStorage()
            .Sagas().RavenSagaPersister().InstallRavenIfNeeded()
            .UseInMemoryTimeoutPersister()
            .CreateBus()
            .Start();

        SlenderConfiguration.Init(bus);
    }
}
public class SlenderConfiguration {
    private static ORMScope scope { get; set; }

    public static void Init(IBus bus)
    {
        ORMConfig.GetScope = () =>
        {
            var environment = "dev";
            if (bus.CurrentMessageContext.Headers.ContainsKey("Environment"))
                environment = bus.CurrentMessageContext.Headers["Environment"];

            if (scope == null)
                scope = new SlenderScope(ConfigurationManager.ConnectionStrings[environment].ConnectionString);

            return scope;
        };
    }
}
This works fine in our single-tenant Beta environment - it's fine for that static scope to get re-used because the environment header is always the same for a given deployment.
It's my understanding that this won't work for the multitenant situation described above, because NServiceBus will reuse threads across messages. The same scope would then be used, causing problems if the message was intended for a different environment.
What I think I want is a single scope per message, but I'm really not sure how to get there.
I've seen Unit Of Work Implementation for RavenDB, and the unit of work implementation in the full duplex sample, but I'm not sure that's the right path.
I've also seen the DependencyLifecycle enum, but I'm not sure how I can use that to resolve the scope given the way I have to set up the GetScope func.
Obviously I have no idea what's going on here. Any suggestions?
If you need to do something on a per-message basis, consider using message mutators (IMutateIncomingMessages) in addition to your unit-of-work management with some thread-static state.

Plugin bypassed when Entity queried from console application

My plugin encrypts/decrypts a field. It works on the field within a CRM form.
From my console application, a retrieve bypasses my plugin, i.e. it retrieves the encrypted value directly from the database without running the plugin. When debugging, breakpoints in the plugin are hit when the field is accessed from a form, but they are not hit when it is accessed from my console program.
I'm surprised that my plugin isn't invoked from a program. It bypasses my business rules.
Here is how I'm accessing the entity and the field from a program:
private static OrganizationServiceProxy service = null;
private static OrganizationServiceContext orgSvcContext = null;

public static void RetrieveSSNs()
{
    var query = orgSvcContext.CreateQuery("bpa_consumer");
    foreach (Entity consumer in query)
    {
        if (consumer.Attributes.Contains("bpa_ssn"))
        {
            string ssn = consumer["bpa_ssn"].ToString();
            Console.WriteLine(string.Format("Consumer \"{0}\" has SSN {1}", consumer.Attributes["bpa_name"], ssn));
        }
        else
        {
            Console.WriteLine(string.Format("Consumer \"{0}\" doesn't have a SSN", consumer.Attributes["bpa_name"]));
        }
    }
}
I'm guessing you have the plugin registered on the Retrieve message? If so, add another identical registration on RetrieveMultiple. This should get your plugin to execute in your foreach. I should warn you that this is an extremely dangerous thing to do from a performance standpoint, though...
If you are concerned about performance, my recommendation is to put the encrypted data into a separate entity with a lookup back. Using this method, CRM only has to execute the Retrieve/RetrieveMultiple plug-in when a user needs to access the encrypted data, not every time a user accesses the primary record. This will also make it easier to secure the encrypted data.
It turns out that you must register your plugin for the RetrieveMultiple event when you query for a collection of entities.

Must I expose the aggregate children as public properties to implement persistence ignorance?

I'm very glad that I found this website recently; I've learned a lot here.
I'm from China, and my English is not so good, but I will try to express what I want to say.
Recently, I've started learning about Domain-Driven Design, and I'm very interested in it. I plan to develop a forum website using DDD.
After reading lots of threads here, I understood that persistence ignorance is a good practice.
Currently, I have two questions that I've been thinking about for a long time.
Should the domain object interact with repository to get/save data?
If the domain object doesn't use repository, then how does the Infrastructure layer (like unit of work) know which domain object is new/modified/removed?
For the second question, here is some example code.
Suppose I have a User class:
public class User
{
    public Guid Id { get; set; }
    public string UserName { get; set; }
    public string NickName { get; set; }

    /// <summary>
    /// A Roles collection which represents the current user's owned roles.
    /// I don't want to expose it through a public property here.
    /// Instead, I use the methods below.
    /// </summary>
    //public IList<Role> Roles { get; set; }
    private List<Role> roles = new List<Role>();

    public IList<Role> GetRoles()
    {
        return roles;
    }

    public void AddRole(Role role)
    {
        roles.Add(role);
    }

    public void RemoveRole(Role role)
    {
        roles.Remove(role);
    }
}
Based on the above User class, suppose I get a user from the IUserRepository and add a Role to it:
IUserRepository userRepository;
User user = userRepository.Get(Guid.NewGuid());
user.AddRole(new Role() { Name = "Administrator" });
In this case, I don't know how the repository or unit of work can know that the user has a new role.
I think a truly persistence-ignorant ORM framework should support POCOs, and the persistence framework should automatically know about any changes that occur on the POCO itself, even when the object's state is changed through methods (AddRole, RemoveRole) as in the above example.
I know a lot of ORMs can automatically persist the changes if I use the Roles property, but sometimes I don't like that approach for performance reasons.
Could anyone give me some ideas about this? Thanks.
This is my first question on this site. I hope my English can be understood.
Any answers will be much appreciated.
Should the domain object interact with repository to get/save data?
No, it should not. The reason for that is simple: encapsulation. If we take everything persistence-related out of our domain model, we can describe our domain much more clearly.
If the domain object doesn't use repository, then how does the Infrastructure layer (like unit of work) know which domain object is new/modified/removed?
The simplest version is: it doesn't. You retrieve the aggregate and, after the operation on it has completed, save it back as a whole:
var user = users.Find(guid);
user.AssignRole(Role.Administrator);
users.Save(user);
I personally rely on NHibernate; it tracks changes itself. If I optimize queries with proper eager/lazy loading, save changes only at the end of the HTTP request, don't forget about transactions, and use caching, there is no performance penalty. But it comes at a price: it takes some knowledge to handle all that.
One more thing: think twice before using domain-driven design to develop a forum. This approach only fits complex and not-yet-understood business domains; it's overkill for simple applications.
And another thing: stop being ashamed of your English. It will get better in no time. :)
