I'm just getting started with CouchDB and looking for some best practices. My current project is a CMS/Wiki-like tool that contains many pages of content. So far, this seems to fit well with CouchDB. The next thing I want to do is track every time a page on the site is accessed.
Each access log should contain the timestamp, the URI of the page that was accessed, and the UUID of the user who accessed it. What is the best way to structure this access log information in CouchDB? It's likely that any given page will be accessed up to 100 times per day.
A couple thoughts I've had so far:
1 CouchDB document per page that contains ALL access logs.
1 CouchDB document per log.
If it's one document per log, should all the logs be in their own CouchDB database to keep the main DB cleaner?
Definitely not the first option. CouchDB uses append-only storage, so each time you update a document, a new revision with the same ID is written. If a page gets 100 hits in a day, that's 100 new revisions of its log document, and your database will quickly get huge. So it's better to use your second option.
As for a separate database for the logs, it depends on your data and how you plan to use it. If you decide to keep all your data in the same place, you can create a separate view just for your logs.
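If it helps, here is a minimal sketch of the one-document-per-log idea together with a view over those documents; field names such as type, uri and userId are my assumptions, not anything CouchDB requires:

```javascript
// One small document per access (field names are illustrative):
const accessLog = {
  type: 'access_log',
  timestamp: new Date().toISOString(),
  uri: '/wiki/getting-started',
  userId: '7f9c2ba4-e88f-11e9-a1b3-0242ac120002'
};

// A design document whose view keys log documents by [uri, timestamp], so a
// single page's accesses can be listed, or counted via the _count reduce,
// whether the logs live in the main database or in a database of their own.
const designDoc = {
  _id: '_design/logs',
  views: {
    by_page: {
      map: function (doc) {
        if (doc.type === 'access_log') {
          emit([doc.uri, doc.timestamp], null);
        }
      }.toString(),
      reduce: '_count'
    }
  }
};
```

Querying the by_page view with startkey=["/wiki/getting-started"], endkey=["/wiki/getting-started", {}] and reduce=false lists that page's accesses in time order; with reduce=true you get just the count.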
I have a Firebase Storage bucket set up for the primary purpose of storing users' profile pictures. Fetching the profile picture of the currentUser is simple, as I know the .uid. However, fetching the profile pictures for other users is not so straightforward, as that first requires a query to my actual database (in this case a graph database) before I can even begin fetching their images. This process is aggravated by my backend having a three-tier architecture.
So my current process is this:
get request to Node.js backend
Node.js queries graph database
Node.js sends data to frontend
frontend iteratively fetches profile pictures using the other users' uids
What seems slow is the fact that my frontend has to wait for the other uids before it can even begin fetching the images. Is this unavoidable? Ideally, the images would be fetched concurrently with the info about the users.
The title here is "Firebase fetching other user's Images efficiently", but you're using a non-Firebase database, which makes it a little difficult.
The way I believe you could handle this in Firebase/Firestore would be to have duplicate data (pretty common with NoSQL databases).
Example:
Say you have a timeline feed. You probably wouldn't query the list of posts and then query user info for each of the posts. Instead, I would keep a list of timeline posts for a given UID (the customer accessing the system right now), and that list would include all the details needed to display the feed without another query: the users' names, the post description, and a link to their pictures based on a known bucket and directory structure plus the UIDs. Something like gs://<my-bucket>/user-images/<a-uid>.jpg. Again, I don't have much exposure to graph databases, so I'm not sure how applicable the technique is there, but I believe it could work the same way.
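For what it's worth, a rough client-side sketch of that with the modular Firebase JS SDK might look like the following; the user-images/<uid>.jpg layout is just the assumed convention from above:

```javascript
// Rough sketch: once the backend response already contains the uids (or the
// duplicated user data), the picture URLs can be derived from a known bucket
// path and fetched in parallel, with no extra database round trip.
import { initializeApp } from 'firebase/app';
import { getStorage, ref, getDownloadURL } from 'firebase/storage';

const app = initializeApp({ /* your Firebase config */ });
const storage = getStorage(app);

// Resolve all profile-picture URLs concurrently from the assumed
// "user-images/<uid>.jpg" convention.
async function profilePictureUrls(uids) {
  return Promise.all(
    uids.map((uid) => getDownloadURL(ref(storage, `user-images/${uid}.jpg`)))
  );
}
```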
I have created a logic app with the goal of pulling data from a container within cosmosdb (with a query), looping over the results and then pushing this data into CRM (or Common Data Service). When the data is pushed to CRM, an ID will be generated. I wish to then update cosmosdb with this new ID. Here is what I have so far:
This next step queries the data within our Cosmos DB database and selects all IDs with a length greater than 15. (This tells us that the ID is not yet within the CRM database.)
Then we loop over the results and push this into CRM (Dynamics365 or the Common Data Service)
Dilemma: The first part of this process appears to be correct, however, I want to make sure that I am on the right track with this. Furthermore, once the data is successfully pushed to CRM, CRM automatically generates an ID for each record. How would I then update cosmosDB with the newly generated IDs?
Any suggestion is appreciated
Thanks
I see a red flag in your approach here: the query with length(c.id) > 15. This is not something I would do. I don't know how big your database is going to be, but it is generally not very performant to run high volumes of cross-partition queries, especially if the database is going to keep growing.
Cosmos DB already provides an awesome streaming capability, so rather than doing this in a batch I would use Change Feed to accomplish whatever you're doing here in your Logic App. This will likely give you better control of the process and allow you to get the id back out of your CRM app to insert back into Cosmos DB.
Because you will be writing back to Cosmos DB, you will need a flag so you can ignore that write-back when it shows up in the Change Feed again.
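To illustrate (this is not your Logic App, and names like crmId and pushToCrm are placeholders), a Change Feed consumer written as an Azure Functions Cosmos DB trigger in Node.js could look roughly like this, assuming a cosmosDBTrigger binding named documents in function.json:

```javascript
// Sketch of a Change Feed consumer that pushes new items to CRM and writes the
// generated ID back to Cosmos DB. The "crmId" field doubles as the flag that
// lets us skip the change produced by our own write-back.
const { CosmosClient } = require('@azure/cosmos');

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const container = client.database('mydb').container('mycontainer');

module.exports = async function (context, documents) {
  for (const doc of documents) {
    // Already has a CRM ID: this change came from our own update below, skip it.
    if (doc.crmId) continue;

    // Placeholder call to Dynamics 365 / Dataverse that returns the new record's ID.
    const crmId = await pushToCrm(doc);

    // Second argument is the item's partition key value; adjust to your container.
    await container.item(doc.id, doc.partitionKey).replace({ ...doc, crmId });
  }
};

async function pushToCrm(doc) {
  // Call the Dataverse Web API here and return the generated ID.
  return 'crm-id-placeholder';
}
```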
I'm trying to choose between two patterns or maybe even another one that I have yet to consider for handling logging in my application.
I have a nodejs express server serving clients in an auto scaling group.
The goal is to ideally be able to see each user's activity very easily so that I can troubleshoot in production.
Approach 1, centralized logging using ELK, querying on certain JSON fields such as customerId, requestId, etc.
Approach 2, create a log file per customer and query each file as needed.
In both approaches, log files will be rotated.
Creating a log file per customer just doesn't feel right to me, especially when considering scenarios with millions of customers. But in terms of performance, which is better:
locate the right file among a million based on customer ID, then query that much smaller file for the information you need,
OR
query a centralized log store, filtering results based on customerId etc.?
Is one approach significantly better in performance than the other? What is the best practice in the industry at the moment for this scenario and is there a better approach to consider?
Lastly, AWS services seem to charge based on the size of the data you are querying. As such, would one approach be more cost effective than the other?
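To make Approach 1 a bit more concrete, here is a minimal sketch of structured JSON logging from Express with customerId / requestId on every line, assuming pino as the logger; the x-customer-id header is purely illustrative:

```javascript
// Minimal sketch of Approach 1: one centralized JSON log stream where every
// line carries customerId and requestId, so ELK (or CloudWatch Logs Insights)
// can filter a single customer's activity without per-customer files.
const express = require('express');
const pino = require('pino');
const { randomUUID } = require('crypto');

const app = express();
const logger = pino();

app.use((req, res, next) => {
  // Child logger carries the IDs into every subsequent log line for this request.
  req.log = logger.child({
    requestId: randomUUID(),
    customerId: req.get('x-customer-id') // however you identify the customer
  });
  next();
});

app.get('/orders', (req, res) => {
  req.log.info({ path: req.path }, 'listing orders');
  res.json([]);
});

app.listen(3000);
```

With log lines shaped like this, rotation and shipping stay the same no matter how many customers there are; the fan-out happens at query time rather than at file-creation time.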
I have a scenario where there are multiple (~1000 - 5000) databases being created dynamically in CouchDB, similar to the "one database per user" strategy. Whenever a user creates a document in any DB, I need to hit an existing API and update that document. This need not be synchronous. A short delay is acceptable. I have thought of two ways to solve this:
Option 1:
Continuously listen to the changes feed of the _global_changes database.
Get the name of the database which was updated from the feed.
Call the /{db}/_changes API with the last seq (stored in Redis).
Fetch the changed document, call my external API and update the document (a rough sketch of this flow follows below).
Option 2:
Continuously replicate all databases into a single database.
Listen to the /_changes feed of this database.
Fetch the changed document, call my external API and update the document in the original database (I can easily keep track of which document originally belongs to which database).
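A rough sketch of Option 1, assuming CouchDB's /_db_updates endpoint (which is served from _global_changes), Node 18+ global fetch and node-redis; callExternalApi and the apiProcessed flag are placeholders for the existing API and for whatever marks a document as already handled:

```javascript
// Follow the global database-updates feed, then pull each updated database's
// own _changes feed from the last seq stored in Redis, call the external API
// for every changed document, and write the updated document back.
const { createClient } = require('redis');

const COUCH = 'http://admin:password@localhost:5984';
const redis = createClient();

async function processDb(dbName) {
  const since = (await redis.get(`seq:${dbName}`)) || '0';
  const res = await fetch(
    `${COUCH}/${dbName}/_changes?include_docs=true&since=${encodeURIComponent(since)}`
  );
  const changes = await res.json();

  for (const row of changes.results) {
    if (!row.doc || row.deleted) continue;
    // Skip documents we've already enriched so our own write-back isn't
    // reprocessed (the flag name is an assumption).
    if (row.doc.apiProcessed) continue;

    const updated = await callExternalApi(row.doc); // placeholder for the existing API
    await fetch(`${COUCH}/${dbName}/${encodeURIComponent(updated._id)}`, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ ...updated, apiProcessed: true }) // must carry the current _rev
    });
  }
  // Only advance the checkpoint once every document in this batch was handled.
  await redis.set(`seq:${dbName}`, changes.last_seq);
}

async function main() {
  await redis.connect();
  let since = (await redis.get('seq:_db_updates')) || 'now';
  for (;;) {
    const res = await fetch(
      `${COUCH}/_db_updates?feed=longpoll&since=${encodeURIComponent(since)}`
    );
    const feed = await res.json();
    for (const row of feed.results) {
      if (row.type === 'updated' || row.type === 'created') {
        await processDb(row.db_name);
      }
    }
    since = feed.last_seq;
    await redis.set('seq:_db_updates', since);
  }
}

async function callExternalApi(doc) {
  return doc; // placeholder
}

main();
```

Because the seq checkpoints in this sketch only advance after a batch has been handled, a crash simply means re-reading from the last stored seq, which gives at-least-once processing for the external API call.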
Questions:
Does any of the above make sense? Will it scale to 5000 databases?
How do I handle failures? It is critical that the API be hit for all documents.
Thanks!
I'm working on an e-commerce website project. I want to count views on each product and display the count on the single product display page. I know it can be easily implemented by adding a counter in the Express routes and then writing it to the database.
But it will be a burden on the DB connection if, for each view, I need to connect to the DB and increment the counter.
I have a second solution, but I'm not sure if it is better, since I don't have any experience in this area.
The solution is: use a variable to count the number of views for each item, and send a query every day to record this variable, or dump it into a JSON file every X minutes/hours.
What is the best way to count these views without sacrificing the performance of the website?
Any suggestions?
I would store the counter against each endpoint in a Redis server. It's in-memory, so reads and writes are fast. And you can persist it to disk too.
Check out this redis client for Node.js.
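For illustration, with node-redis (v4) the counter can be as small as the following; the views:<productId> key naming is just an assumption:

```javascript
// Count a view per product with an atomic Redis INCR, so concurrent requests
// across several server instances never lose counts.
const express = require('express');
const { createClient } = require('redis');

const app = express();
const redis = createClient();

app.get('/products/:id', async (req, res) => {
  const views = await redis.incr(`views:${req.params.id}`);
  res.json({ productId: req.params.id, views });
});

redis.connect().then(() => app.listen(3000));
```

If the totals still need to end up in your main database, a periodic job can read these keys and flush them in one batch, which is essentially your second idea but with the counts held in Redis instead of an in-process variable, so they survive restarts and work across multiple instances.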