OpenCart *Multistore with shared DB* and 1000 concurrent req/store VS "1000 Separate Stores" with 1 concurrent request/store - OpenCart 2.x

I'm planning to run a multistore setup using OpenCart with more than 1000 stores.
I realized that with Multistore, all the stores share a single database.
The other option is to install each store separately, each with its own database.
Which is better for about 1000 stores with 100 products each and 1 concurrent request per store: "Multistore with a shared DB" or "separate stores with separate databases"?
If each store had its own database, a search query within a store would be a simple query. But if all the stores share a single database, does that search query stay as simple?
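For concreteness, here is roughly what I expect a per-store product search to look like against the shared OpenCart 2.x schema. This is only a sketch written as Node.js; the table names (oc_product, oc_product_description, oc_product_to_store) are my assumptions based on the default schema, so check them against your installation.

// Sketch: a per-store product search against a shared multistore database.
// Table/column names are assumptions from the default OpenCart 2.x schema.
const mysql = require('mysql2/promise');

async function searchProducts(storeId, term) {
  const conn = await mysql.createConnection({
    host: 'localhost', user: 'oc_user', password: 'secret', database: 'opencart'
  });

  // The only difference from a single-store query is the extra join and filter
  // on store_id; with an index covering store_id it stays a cheap lookup.
  const [rows] = await conn.execute(
    `SELECT p.product_id, pd.name
       FROM oc_product p
       JOIN oc_product_description pd ON pd.product_id = p.product_id
       JOIN oc_product_to_store p2s   ON p2s.product_id = p.product_id
      WHERE p2s.store_id = ?
        AND pd.name LIKE ?`,
    [storeId, `%${term}%`]
  );

  await conn.end();
  return rows;
}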

Related

MongoDB indexing IdAccount field in order to make searches by account efficient

Currently I store data in MongoDB like so:
1 database per customer
each customer has about 40 collections
a total of 50 customers (there are 40*50=2000 collections)
database size is 3 GB for all 50 customers
My MongoDB service crashes with the error "Too many open files". The question MongoDB "Too Many Open Files"? Raise the limit shows how to solve the problem, but I am still getting crashes.
The error "Too many open files" means that the MongoDB process has too many files open and the OS is complaining. This is how I know MongoDB is using a lot of files:
Obtain the process id of MongoDB with service mongodb status; the process id is shown in the output.
Then, to list the files used by MongoDB, I run lsof -a -p <ProcessId>
When I run that command I see 1010 files being used by that process!
The more customer databases I create, the larger that number becomes! So I guess my solution is to combine all the databases into one. If I do that, I will have to add an AccountId field to all my collections. If I make that change, what index should I assign to AccountId so that my searches stay efficient? For example, I would like to get all PurchaseOrders where IdAccount=34 quickly. Is this change something you would recommend? Should I merge all 50 databases into one?
PS: On a different Linux computer I created a MongoDB instance with only 1 database and 40 collections. I filled the 40 collections with 6 GB of data (double what I have now). MongoDB used only 200 files even though this database is twice as big!
The same day I posted this question, I combined all the databases into one. Moreover, I added the following indexes:
db.CollectionA.createIndex({Id_Account:1})
db.CollectionB.createIndex({Id_Account:1})
// etc...
To verify that my queries remain as efficient as before, I run:
db.getCollection('CollectionA').find({"Id_Account":28}).explain("executionStats")
That query returns the execution stats: it tells you how many documents were examined and how many matched. With NO index, every find({"Id_Account":28}) would scan the whole collection.
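As a rough illustration of what to look for in that explain output (the field names come from the standard executionStats document; the collection name and account id are just placeholders):

// Sketch: confirming in the mongo shell that the Id_Account index is used.
var stats = db.getCollection('CollectionA')
    .find({ "Id_Account": 28 })
    .explain("executionStats");

// With the index the winning plan contains an IXSCAN stage and only the
// matching documents are examined; without it the plan is a COLLSCAN and
// totalDocsExamined equals the size of the whole collection.
printjson({
  stage: stats.queryPlanner.winningPlan.inputStage
      ? stats.queryPlanner.winningPlan.inputStage.stage   // e.g. "IXSCAN"
      : stats.queryPlanner.winningPlan.stage,             // e.g. "COLLSCAN"
  nReturned: stats.executionStats.nReturned,
  totalDocsExamined: stats.executionStats.totalDocsExamined
});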
Mongo has not crashed so far, and it never keeps more than 300 files open. From now on I will always use a single database instead of multiple databases.

Loopback - Bulk Insert with a lot of records

I am currently running into memory usage issues when bulk inserting a lot of records. I'm attempting to grab data from an external API, format it into an array that matches the structure of one of my tables, and then insert it using the create method on my model. Is there a LoopBack way to queue up records for insertion (so that they can be inserted in chunks) so I'm not killing my server? It would be great not to have to hack around with timers and such.
Postgres is my backend DB, if that matters.
How many columns are you inserting? Have you added indexes to any of them? If you're doing any upserts, indexes can speed up updates from minutes to seconds.
If the bottleneck is the Node server, does the external API support pagination? If so, you can insert one page at a time and wait for a response from the database before inserting the next page of data.
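For the chunking itself, something along these lines is a common pattern. This is only a sketch: MyModel and CHUNK_SIZE are placeholders, and it assumes a LoopBack version whose Model.create accepts an array and returns a promise.

// Sketch: insert records in fixed-size chunks, waiting for each chunk to be
// written before sending the next, so the server never buffers everything at once.
const CHUNK_SIZE = 500; // tune to taste

async function bulkInsertInChunks(MyModel, records) {
  for (let i = 0; i < records.length; i += CHUNK_SIZE) {
    const chunk = records.slice(i, i + CHUNK_SIZE);
    // Awaiting each create applies back-pressure between chunks.
    await MyModel.create(chunk);
  }
}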

Is it good to use different collections in a database in MongoDB

I am going to do a project using Node.js and MongoDB. We are designing the database schema and are not sure whether we should use different collections or a single collection to store the data, because each approach has its own pros and cons.
If we use a single collection, then whenever the database is queried the whole collection will be loaded into memory, which eats into RAM. If we use different collections, we need to write different queries to retrieve the data. Using one collection makes retrieval easy, while using different collections makes the application faster. We are confused about whether to use a single collection or multiple collections. Please guide me on which one is better.
Usually you use different collections for different things. For example, when you have users and articles in the system, you usually create a "users" collection for users and an "articles" collection for articles. You could create one collection called "objects" or something like that and put everything there, but that would mean adding some kind of type field and using it for searches and storage. You can use a single collection in the database, but it would make usage more complicated. Of course it would let you load the entire collection at once, but whether or not that is relevant to the performance of your application is something that would have to be profiled and tested to know the performance impact for your particular use case.
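To make the trade-off concrete, here is a small sketch of the two layouts in the mongo shell (the collection and field names are only illustrative):

// Option 1: one collection per kind of thing -- queries stay simple.
db.users.insert({ name: "Alice", email: "alice@example.com" });
db.articles.insert({ title: "Hello", authorName: "Alice" });
db.articles.find({ authorName: "Alice" });

// Option 2: a single "objects" collection with a type discriminator --
// every document needs a type field and every query has to filter on it.
db.objects.insert({ type: "user", name: "Alice", email: "alice@example.com" });
db.objects.insert({ type: "article", title: "Hello", authorName: "Alice" });
db.objects.find({ type: "article", authorName: "Alice" });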
Usually, developers create different collections for different things. For example, for post management, people create a 'post' collection and save the posts there, and the same goes for users and so on.
Using a different collection for each purpose is good practice.
MongoDB is great at scaling horizontally. It can shard a collection across a dynamic cluster to produce a fast, queryable collection of your data.
So having a smaller collection size is not really a pro, and I am not sure where the theory that it is comes from; it isn't true in SQL and it isn't true in MongoDB. The performance of sharding, if done well, should be comparable to the performance of querying a single small collection of data (with a small overhead). If it isn't, then you have set up your sharding wrong.
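For reference, spreading a collection across a cluster boils down to commands like these (a sketch only; the database/collection names and the hashed shard key are placeholders, and shards must already have been added to the cluster):

// Run against a mongos of an existing sharded cluster.
sh.enableSharding("mydb");                               // allow sharding for the database
sh.shardCollection("mydb.users", { userId: "hashed" });  // distribute documents by hashed userId
sh.status();                                             // inspect the chunk distribution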
MongoDB is not great at scaling vertically; as @Sushant quoted, the ns size of MongoDB would be a serious limitation here. One thing that quote does not mention is that index size and count also affect the ns size, which is why it says:
By default MongoDB has a limit of approximately 24,000 namespaces per database. Each namespace is 628 bytes, the .ns file is 16MB by default.
Each collection counts as a namespace, as does each index. Thus if every collection had one index, we can create up to 12,000 collections. The --nssize parameter allows you to increase this limit (see below).
Be aware that there is a certain minimum overhead per collection -- a few KB. Further, any index will require at least 8KB of data space as the b-tree page size is 8KB. Certain operations can get slow if there are a lot of collections and the meta data gets paged out.
So you won't be able to handle it gracefully if your users exceed the namespace limit, and it won't perform well as your user base grows.
UPDATE
For MongoDB 3.0 or above using the WiredTiger storage engine, this limit no longer applies.
Yes, personally I think having multiple collections in a DB keeps it nice and clean. The only thing I would worry about is the size of the collections. Collections are used by a lot of developers to cut their db up into, for example, posts, comments, users.
Sorry about my grammar and lack of explanation; I'm on my phone.

How to manage big data in MongoDB collections

I have a collection called data which is the destination of all the documents sent from many devices every n seconds.
What is the best practice for keeping the collection usable in production without it overflowing with documents?
How could I "clean" the collection and save its content in another one? Is that the right approach?
Thank you in advance.
You cannot overflow; if you use sharding, you have almost unlimited space.
https://docs.mongodb.com/manual/reference/limits/#Sharding-Existing-Collection-Data-Size
Those are the limits for a single shard, and you have to start sharding before reaching them.
It depends on your architecture; however, the worst-case limit of 8.192 exabytes (or 8,192,000 terabytes) is unreachable for most apps, even big-data ones, if you multiply the number of shards possible in a cluster by the maximum collection size on one of them.
See also:
What is the max size of collection in mongodb
MongoDB is a good database for storing large collections. You can take the steps below for better performance.
Replication
Replication means copying your data several times, either on a single server or across multiple servers.
It gives you a redundant copy of your data whenever you insert data into your db.
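For reference, a minimal replica set is set up roughly like this (a sketch; the host names and the set name rs0 are placeholders, and each member must be a mongod started with --replSet rs0):

// Run once in the mongo shell connected to one of the members.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "db3.example.com:27017" }
  ]
});
rs.status();   // check that a primary has been elected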
Embedded documents vs. references
Try to build your collections with references, i.e. link related documents by reference in your db rather than embedding everything in one document.
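A tiny sketch of what that looks like for the device data in the question (collection and field names are only illustrative): instead of embedding every reading inside one ever-growing document, each reading stores a reference back to its device:

// Embedded: the device document grows without bound as readings arrive.
db.devices.insert({
  _id: "device-42",
  readings: [ { t: ISODate("2016-01-01T00:00:00Z"), value: 3.1 } /* , ... */ ]
});

// Referenced: readings live in their own collection and point back to the device,
// so single documents stay small and old readings are easy to archive or drop.
db.readings.insert({ deviceId: "device-42", t: ISODate("2016-01-01T00:00:10Z"), value: 3.4 });
db.readings.createIndex({ deviceId: 1, t: 1 });   // keeps per-device, time-ordered queries cheap
db.readings.find({ deviceId: "device-42" }).sort({ t: -1 }).limit(10);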

Potential issue with Couchbase paging

It may be too much turkey over the holidays, but I've been thinking about a potential problem that we could have with Couchbase.
Currently we paginate based on time, but I think a similar issue could occur with other values used for paging, for example an atomic counter. I'll try to explain as best I can; this would only occur in a load-balanced environment.
For example, say we have 4 load-balanced servers storing data to our Couchbase cluster, and we currently sort our records based on timestamps. If any of the 4 servers writing the data starts to lag behind the others, then our pagination could miss records when retrieving on the client side. In a SQL DB, an auto-increment value or timestamp, for example, can be created when the record is stored to the DB, which avoids this kind of issue. With a NoSQL DB like Couchbase, you define the value you will retrieve on before it is stored to the DB. So what I am getting at is that if there is a delay in storing to the DB, and you are retrieving in a paginated fashion while this delay has occurred, you run the real possibility of missing data. Since we are paging, that data may never be viewed.
Interested in what other thoughts people have on this.
EDIT:
Response to Andrew:
For example, a Facebook- or Pinterest-type app is storing data to a DB, with many load-balanced frontend servers writing to the db. If for some reason a write is delayed, it's a non-issue with a SQL DB because a timestamp or auto-increment value is assigned when the data is actually stored. There will be no missing data when paging: asking for 1-7 gives you only data that is already stored in the DB, and 7-* will contain anything that was delayed, because an auto-increment value is not created for a record until it is actually stored.
In Couchbase it's different: you actually get your auto-increment value (the atomic counter) first and then save the record. So for example, say a record is going to be stored as atomic counter number 4. For some reason this write is delayed in storing to the DB. Other servers are grabbing 5, 6, 7 and storing that data just fine. The client now asks for all data between 1 and 7; 4 is still not stored. Then the next paging request is 7 to *. 4 will never be viewed.
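To make the write path concrete, here is a rough sketch of what each app server does (written against the Couchbase Node SDK 2.x style API; the bucket, key scheme and error handling are simplified, and the exact calls should be checked against the SDK version in use):

// Sketch: the counter is claimed first, so if the subsequent insert is slow or
// fails, a gap appears in the sequence even though higher numbers already exist.
const couchbase = require('couchbase');
const bucket = new couchbase.Cluster('couchbase://localhost').openBucket('default');

function storePost(doc, done) {
  // Step 1: claim the next sequence number (say it returns 4).
  bucket.counter('counter::posts', 1, { initial: 1 }, (err, res) => {
    if (err) return done(err);
    const seq = res.value;
    // Step 2: write the document under that number. If this write lags, other
    // app servers can store 5, 6 and 7 first, and a pager that has already
    // read past 4 will never come back for this document.
    bucket.insert('post::' + seq, doc, done);
  });
}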
Is there a way around this? Can it be modelled differently in CB, or is this just a potential weakness in CB when you need to page results? As I mentioned, our paging is timestamp-sensitive.
Michael,
Couchbase is an eventually consistent database with respect to views. It is ACID with respect to documents. There are durability interfaces that let you manage this. This means that you can rest assured you won't lose data and that indexes will catch up eventually.
In my experience with Couchbase, you need to expect that the nodes will never be in-sync. There are many things the database is doing, such as compaction and replication. The most important thing you can do to enhance performance is to put your views on a separate spindle from the data. And you need to ensure that your main data spindles across your cluster can sustain between 3-4 times your ingestion bandwidth. Also, make sure your main document key hashes appropriately to distribute the load.
It sounds like you are discussing a situation where the data exists in your system for less time than it takes to be processed through the view system. If you are removing data that fast, you need either a bigger cluster or faster disk arrays. Of the two choices, I would expand the size of your cluster. I like to think of Couchbase as building a RAIS, Redundant Array of Independent Servers. By expanding the cluster, you reduce the coincidence of hotspots and gain disk bandwidth. My ideal node has two local drives, one each for data and views, and enough RAM for my working set.
Anon,
Andrew
