CouchDB V2.0 required hardware and software configuration - couchdb

I want to deploy CouchDB v2 to manage a database of about 30 terabytes. Can you suggest a minimal hardware configuration?
- Number of servers
- Number of nodes
- Number of clusters
- Replication factor
- Disk size per CouchDB instance
- etc.
Thanks!

You want 3 servers minimum due to quorum. Beyond that, I would recommend at least 2 clusters of 3 nodes. If you want to be geographically dispersed, then you want a cluster of 3 in each location. I think those are the basic rules.
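As a rough sizing sketch of the above, here is the per-node disk math, assuming the CouchDB default of 3 copies of every shard and some headroom for compaction (which temporarily needs space for a second copy of a shard file). The headroom factor is an assumption you should tune:

```python
# Rough capacity sketch for a 30 TB CouchDB 2.x database.
# Assumptions (adjust for your deployment): 3 copies of each shard
# (the CouchDB default n=3), and a 2x headroom factor so compaction
# has room to rewrite shard files.

def per_node_storage_tb(data_tb, copies=3, nodes=3, compaction_headroom=2.0):
    """Raw disk needed per node, in TB."""
    total = data_tb * copies * compaction_headroom
    return total / nodes

# 30 TB of data, 3 copies, spread over a 3-node cluster:
print(per_node_storage_tb(30, copies=3, nodes=3))  # 60.0 TB per node
```

Spreading the same data over more nodes (e.g. 2 clusters of 3) lowers the per-node figure proportionally, which is another argument for not keeping 30 TB in a single small cluster.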

If it's a single database of 30 TB, I think you need some way to cut it down. Here are some ideas:
- Look at the nature of the documents stored in it and see whether you can move the frequently accessed document types into a different DB, changing the application to use it.
- As suggested by fred above, use 3 servers and multiple clusters.
- Backup and recovery: if the database is 30 TB, the backup will take the same space, and you normally want the backup in a different datacenter. Replicating 30 TB will take a lot of time.
- Read the CouchDB docs on how deletion happens; you might want to use filtered replication, which will again take more space.
Keeping the above points in mind, you might want 3 servers, as suggested by fred, to run CouchDB for your business, plus more servers to maintain backups and handle document deletions over the long term.
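The filtered replication mentioned above can be expressed as a CouchDB 2.x `_replicator` document using a Mango `selector`. This is a sketch; the hostnames, database names, and the `type` field are hypothetical examples:

```python
import json

# Sketch of a CouchDB 2.x _replicator document that replicates only a
# subset of documents (here, those with a hypothetical "type" field)
# to a backup cluster. Hosts and database names are made up.
replication_doc = {
    "source": "http://couch-prod:5984/bigdb",
    "target": "http://couch-backup:5984/bigdb_hot",
    "selector": {"type": "frequently_accessed"},  # Mango selector: only matching docs
    "continuous": True,  # keep the target in sync as new changes arrive
}

# You would POST this JSON to the _replicator database, e.g.:
#   curl -X POST http://couch-prod:5984/_replicator \
#        -H 'Content-Type: application/json' -d @replication.json
print(json.dumps(replication_doc, indent=2))
```

Note that a continuous replication of this kind tracks deletions too, which ties into the point above about how deletion behaves across replicas.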

Related

Data analysis of prod Scylla DB without hitting prod?

Is there a way to extract data from a Scylla database for use in data analysis without directly querying from the production DB?
I want to do large queries against the data but don't want to take down production
The typical way folks accomplish this is to build an “analytics DC” in the same physical DC.
So if (one of) your prod DCs is named “dc-west”, you would create a new one named “dc-west-analytics” or something like that. Once the new DC's nodes are up, change the keyspace to replicate to it. Run a repair on the new DC, and it should be ready for use.
On the app side or wherever the queries are running from, make sure it uses the LOCAL consistency levels and points to “dc-west-analytics” as its “local” DC.
In ScyllaDB Enterprise, a feature called Workload Prioritization allows you to assign CPU and I/O shares to your analytics and production workloads, isolating them from each other.
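The steps above boil down to one keyspace change plus a consistency-level choice on the analytics client. Here is a sketch with the CQL spelled out; the keyspace name, DC names, and replication factors are hypothetical:

```python
# Sketch of adding an analytics DC (names are hypothetical). After the
# "dc-west-analytics" nodes join the cluster, extend the keyspace's
# replication to cover the new DC, then run a full repair on it
# (e.g. `nodetool repair -full` on the new DC's nodes).
alter_keyspace = (
    "ALTER KEYSPACE my_keyspace WITH replication = "
    "{'class': 'NetworkTopologyStrategy', "
    "'dc-west': 3, 'dc-west-analytics': 3};"
)

# On the analytics app side, pin the driver's "local" DC to the new DC
# and use a LOCAL consistency level, so reads never touch prod replicas:
local_dc = "dc-west-analytics"
consistency = "LOCAL_QUORUM"  # or LOCAL_ONE for cheaper reads

print(alter_keyspace)
```

The key point is the `LOCAL_*` consistency level: without it, a quorum read from the analytics DC could still pull replicas from `dc-west` and defeat the isolation.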

Azure Cosmos DB: How to create read replicas for a specific container

In Azure Cosmos DB, is it possible to create multiple read replicas at a database / container / partition key level to increase read throughput? I have several containers that will need more than 10K RU/s per logical partition key, and re-designing my partition key logic is not an option right now. Thus, I'm thinking of replicating data (eventual consistency is fine) several times.
I know Azure offers global distribution with Cosmos DB, but what I'm looking for is replication within the same region and ideally not a full database replication but a container replication. A container-level replication will be more cost effective since I don't need to replicate most containers and I need to replicate the others up to 10 times.
A few options are available though:
- Within the same region there is no replication option, but you could use the Change Feed to replicate to another DB (with the re-design in mind) purely for serving read queries.
- It might be a better idea to use the serverless option (in preview at the time of writing) or the autoscale option. You can also look at provisioned throughput and reserve the provisioned RUs for 1 or 3 years, paying monthly just as you would in the PAYG model but with a large discount.
- Another option is to run a latency test from a VM in the region where your main DB and app run, find the closest region by latency (ms), and if the latency is bearable, use global replication to that region and start using it. I use this tool for latency tests, but run it from a VM in the region where your app/DB is running.
My guess is your queries are all cross-partition and each query is consuming a ton of RU/s. Unfortunately, there is no feature in Cosmos DB that helps in the way you're asking.
Your options are to create more containers, use the change feed to replicate data to them in-region, and then add some sort of routing mechanism to route requests in your app. You will of course only get eventual consistency with this, so if you have high-concurrency needs this won't work. Your only real option then is to address the scalability issues in your design today.
Something that can help is the live data migrator. You can use it to keep a second container in sync with your original, which will let you eventually migrate off your first design to one that scales better.
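The app-side routing mechanism mentioned above can be as simple as a round-robin dispatcher over the replica containers. A minimal sketch, with hypothetical container names (the change feed keeps the replicas in sync separately):

```python
import itertools

# Sketch of an app-side router over read-replica containers that are
# kept in sync via the change feed. Container names are hypothetical.
# Writes go to the primary; reads are spread round-robin over replicas.
class ReplicaRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._reads = itertools.cycle(replicas)

    def container_for_write(self):
        return self.primary

    def container_for_read(self):
        # Eventual consistency: a replica may briefly lag the primary.
        return next(self._reads)

router = ReplicaRouter("orders", ["orders-replica-1", "orders-replica-2"])
print(router.container_for_read())  # orders-replica-1
print(router.container_for_read())  # orders-replica-2
```

Each replica container gets its own RU/s budget, which is what spreads a hot logical partition's read load beyond the 10K RU/s ceiling.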

How do I determine the number of Node Types, Number of nodes and VM size in Service Fabric cluster for a relatively simple but high throughput API?

I have an ASP.NET Core 2.0 Web API with relatively simple logic (a simple select on a SQL Azure DB returning about 1000-2000 records; no joins, aggregates, functions, etc.). I have only one GET API, which is called from an Angular SPA. Both are deployed in Service Fabric as stateless services, hosted in Kestrel as self-hosted executables.
Considering the number of users and how often they refresh, I've determined there will be around 15,000 requests per minute, in other words 250 req/sec.
I'm trying to understand the different settings when creating my Service Fabric cluster.
I want to know:
How many Node Types? (I've determined as Front-End, and Back-End)
How many nodes per node type?
What is the VM size I need to select?
I have read the Azure documentation on cluster capacity planning. While I understand the concepts, I don't have a frame of reference to determine the actual values I need to provide for the above questions.
In most places where you read about cluster planning, they will suggest that this subject is part science and part art, because there is no easy answer. It's hard to answer because it depends a lot on the complexity of your application; without knowing its internals, we can only guess at a solution.
Based on your questions, the best guidance I can give you is: measure first, measure again, measure... plan later. Your application might be memory intensive, network intensive, CPU- or disk-bound, and so on; the only way to find the best configuration is to understand it.
To understand your application before making any decision on the SF structure, you could simply deploy a small cluster with multiple node types containing one node of each VM size and measure your application's behavior on each of them. Then add more nodes, span multiple instances of your service across those nodes, and see which configuration is the best fit for each service.
1.How many Node Types?
I like to map node types 1:1 to roles in your application, but that is not a law; it depends on how much resource each service consumes. If a service consumes enough resources to keep a single VM (node) busy (memory, CPU, disk, IO), it is a good candidate for its own node type. In other cases there are lightweight services for which provisioning an entire VM (node) would be a waste of resources, for example scheduled jobs, backups, and so on. In that case you could provision a set of machines shared by these services. One important thing to keep in mind when you share a node type among multiple services is that they will compete for resources (memory, CPU, network, disk), so the performance measures you took for each service in isolation may no longer hold and the services may require more resources; the only option is to test them together.
Another point is the number of replicas. Having a single instance of your service is not reliable, so you will have to create replicas (the right number is described in the next answer). You then end up with the service load split across multiple nodes, which can leave a node type underutilized; that is when you would consider co-locating services on the same node type.
2.How many nodes per node type?
As stated before, it will depend on your service resource consumption, but a very basic rule is a minimum of 3 per node type.
Why 3?
Because 3 is the lowest number at which you can do a rolling update and still guarantee a majority (quorum) of nodes/services/instances running.
1 node: if you have a service running 1 instance on a node type of 1 node, when you deploy a new version of the service you have to bring that instance down before the new one comes up, so you have no instance to serve the load during the upgrade.
2 nodes: similar to 1 node, but in this case only 1 node keeps running during the upgrade. In case of failure, you have no failover to handle the load until the new instance comes up. It is worse if you are running a stateful service, because you have only one copy of your data during the upgrade, and in case of failure you might lose data.
3 nodes: during an update you still have 2 nodes available. When the one being updated comes back, the next one is taken down and you still have 2 nodes running. In case of failure of one node, the other nodes can support the load until a new node is deployed.
3 nodes does not mean your cluster will be highly reliable; it means the chances of failure and data loss are lower. You might be unlucky and lose 2 nodes at the same time. As suggested in the docs, in production it is better to keep the number of nodes at 5 or more, and plan to have a majority of nodes/services available. I would recommend 5, 7 or 9 nodes in cases where you really need higher uptime (99.9999...%).
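The 1/2/3-node reasoning above is just majority arithmetic: during a rolling upgrade one node is always down, and the cluster still needs a quorum of the remaining nodes. A quick sketch of that math:

```python
# Availability math behind the 1/2/3-node discussion above: during a
# rolling upgrade one node is down, and the cluster must still keep a
# majority (quorum) of nodes up, possibly with extra failures on top.

def quorum(nodes):
    """Smallest majority of `nodes`."""
    return nodes // 2 + 1

def survives_rolling_upgrade(nodes, extra_failures=0):
    """True if a majority remains while 1 node is upgrading and
    `extra_failures` other nodes have failed."""
    return nodes - 1 - extra_failures >= quorum(nodes)

print(survives_rolling_upgrade(1))                    # False: nothing left to serve
print(survives_rolling_upgrade(3))                    # True: 2 of 3 still up
print(survives_rolling_upgrade(3, extra_failures=1))  # False: 1 of 3 is no majority
print(survives_rolling_upgrade(5, extra_failures=1))  # True: 3 of 5 still up
```

This is also why 5+ nodes is the production recommendation: 5 nodes can lose one node *during* an upgrade and still hold quorum, which 3 nodes cannot.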
3.What is the VM size I need to select?
As said before, only measurements will give this answer.
Observations:
These recommendations do not take into account planning for the primary node type. It is recommended to have at least 5 nodes in the primary node type: that is where the SF system services are placed, and they are responsible for managing the cluster, so they must be highly reliable, otherwise you risk losing control of your cluster. If you plan to share these nodes with your application services, keep in mind that your services might impact them, so you have to monitor them continuously for any impact.

Need help in setting production environment for 300 million records elasticsearch

I am new to Elasticsearch. I am doing a PoC to set up a production environment, and I need help with this.
1) What production parameters do we need to consider when setting up the environment?
2) What watermarks need to be set up for a production-ready environment?
There are two kinds of processes: live servers (optimized for latency, e.g. 20 to 40 milliseconds per request) and batch process servers (optimized for throughput, e.g. in 1 hour one server will serve 200 transactions).
The live tier will have 8 dedicated server nodes; the batch process tier will have 12 servers.
How do I distribute requests between the live servers and the batch nodes so as not to compromise live-server performance while a batch is in progress? How do I scale up the application without compromising performance?
Live server transactions: 250K/hour on a single server (we have 8 online servers).
Batch process: 1M/hour on a single server (we have 8 batch servers).
What requirements does the above scenario impose when setting up the production environment?
Firstly I would say definitely read the Elasticsearch: The Definitive Guide chapter on Production Deployment:
https://www.elastic.co/guide/en/elasticsearch/guide/current/deploy.html
There are far too many factors that contribute to needed settings and deployment topology to provide an exhaustive answer. Indexing and serving 300 million tweets is a lot different than 300 million science papers. Fulltext search is very different from numeric aggregations and analytics. Really the only way to know for sure is for you to test it (and monitor it!), using as close to real data and access patterns as you can.
Also, I'm a little bit confused between your "batch" and "live" servers, but you have several options for mixed workloads. For the best isolation, use two completely separate clusters. Alternatively, if you have separate indices for "batch" and "live" but want to be able to move an index from "live" to "batch", you can use Shard Allocation Filtering to control which servers each index's shards go on. This separates the data, and in many scenarios it is sufficient. The nice thing is that those rules can be changed dynamically, and ES will move your shards around to match.
Depending on your workloads, the client (coordinator) role might have trouble if a "live" request hits a "batch" client node. In that case it can be useful to have one or more client-only (no data) nodes allocated to serving just the "live" requests (direct your "live" application to these nodes), and others allocated to serving just the "batch" requests (direct your "batch" application to these nodes). Your options here also depend on the type of client.
Unfortunately your question is pretty generic, and the answer has to be: test it with your specific workload.
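To make Shard Allocation Filtering concrete: you tag nodes with a custom attribute in `elasticsearch.yml` (e.g. `node.attr.box_type: live` or `node.attr.box_type: batch` on 5.x+; older versions use `node.box_type`), then pin an index's shards to a tier via its settings. The attribute name `box_type` is a conventional example, not required:

```python
import json

# Sketch of a Shard Allocation Filtering settings update. The custom
# node attribute "box_type" is an example name; nodes must be tagged
# with it in elasticsearch.yml for the filter to match anything.
settings_body = {
    "index.routing.allocation.require.box_type": "batch"
}

# PUT this to /<index>/_settings; ES relocates the shards to match:
#   curl -X PUT 'http://localhost:9200/myindex/_settings' \
#        -H 'Content-Type: application/json' \
#        -d '{"index.routing.allocation.require.box_type": "batch"}'
print(json.dumps(settings_body))
```

Because this is a dynamic index setting, moving an index from the "live" tier to the "batch" tier is just another PUT with a different attribute value; ES migrates the shards in the background.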

Whether replicationFactor=2 makes sense in SolrCloud?

We are trying to build our SolrCloud servers. We want to increase the replicationFactor, but don't want to set it to 3 as we have a lot of data.
So I am wondering whether it makes sense to set replicationFactor to 2, and what the impact is: will this cause problems for replica leader election, such as split brain?
Thanks!
replicationFactor will not affect whether a split-brain situation arises. The cluster details are stored in ZooKeeper; as long as you have a working ZooKeeper ensemble, Solr will not have this issue. This means you should make sure you have 2F+1 ZooKeeper nodes (minimum 3).
From zookeeper documentation:
For the ZooKeeper service to be active, there must be a majority of
non-failing machines that can communicate with each other.
To create a deployment that can tolerate the failure of F machines,
you should count on deploying 2xF+1 machines.
Here are some links explaining it further:
http://lucene.472066.n3.nabble.com/SolrCloud-and-split-brain-tp3989857p3989868.html
http://lucene.472066.n3.nabble.com/Whether-replicationFactor-2-makes-sense-tp4300204p4300206.html
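The 2F+1 rule from the quoted ZooKeeper documentation, as a quick calculation:

```python
# The ZooKeeper ensemble-sizing rule quoted above: to tolerate F
# failed machines, a majority of the ensemble must still be running,
# so you deploy 2F+1 machines.

def ensemble_size(tolerated_failures):
    """Machines needed to tolerate the given number of failures."""
    return 2 * tolerated_failures + 1

def tolerated_failures(ensemble):
    """Failures an ensemble of this size can survive with a majority."""
    return (ensemble - 1) // 2

print(ensemble_size(1))       # 3: the minimum sensible ensemble
print(tolerated_failures(4))  # 1: a 4th node adds no fault tolerance
print(ensemble_size(2))       # 5: tolerates 2 failures
```

Note the even-size case: a 4-node ensemble tolerates no more failures than a 3-node one, which is why ZooKeeper ensembles are always deployed with an odd number of nodes.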
