There is a SQL table that contains bids.
When the first bid is inserted into the table, a countdown starts. After some time, for instance 5 minutes, I must aggregate all the data and find the max price across the bids.
I wonder how to trigger this event and send a message to the Node service that should handle it.
Another direction: the service polls the DB every second, compares startDate and endDate, and computes the aggregate.
Which approach should I choose?
What about creating a UNIX cron task when a bid is inserted into the DB?
So the bidding continues for the time configured in the script, in my case 5 minutes. After that, no one can submit a bid anymore.
Then I need to select all participants who made bids and find the max price across them.
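For illustration, here is a rough sketch of the timer-based direction I am considering in Node (node-postgres, the bids table, and its price/auction_id columns are assumptions on my part):

const { Pool } = require('pg');          // assumed: node-postgres
const pool = new Pool();

const BIDDING_WINDOW_MS = 5 * 60 * 1000; // 5 minutes, as configured in the script

// Called by whatever inserts the first bid of an auction.
function startCountdown(auctionId) {
  setTimeout(async () => {
    // Once the window closes, aggregate: max price across all bids.
    const { rows } = await pool.query(
      'SELECT MAX(price) AS max_price FROM bids WHERE auction_id = $1',
      [auctionId]
    );
    // Hand the result to whatever handles the end of bidding.
    console.log('Winning price for auction %s: %s', auctionId, rows[0].max_price);
  }, BIDDING_WINDOW_MS);
}

The obvious weakness is that an in-process timer is lost if the Node service restarts, which is partly why I am also considering the cron or per-second polling variants against startDate/endDate.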
I have time series data, for example myData, that I display in my UI; it can be shown by Day, Week, Month, or Year.
What is the better way to store this in MongoDB? Should I create separate collections for it, like:
myDataDay
myDataWeek
...
or is it better to store it in one collection with Day, Week, Month, and Year keys?
How could this impact performance?
You will need to answer the following questions:
What are the number and type of parallel queries you send to the database?
Are there other fields that the data will be searched on?
Are 90% of all queries within the range of the last year/month/day/hour, or something else?
If you split the data between many collections, the logic on the application side becomes more complex; on the other hand, if you keep everything in the same collection, at some point your database will become bigger and more difficult to maintain...
You may take a look at the special collection types dedicated to time series data, but in general it very much depends on the amount of data and the distribution you expect...
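As a rough sketch of that last point, with MongoDB 5.0+ and the Node driver a dedicated time series collection looks roughly like this (the collection, field, and database names are just placeholders, not your schema):

const { MongoClient } = require('mongodb');

async function setup() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('mydb');

  // One collection for the raw points; MongoDB buckets them internally.
  await db.createCollection('myData', {
    timeseries: { timeField: 'timestamp', metaField: 'source', granularity: 'hours' },
  });

  // Day/Week/Month/Year views then become an aggregation, not separate collections.
  const perDay = await db.collection('myData').aggregate([
    { $group: { _id: { $dateTrunc: { date: '$timestamp', unit: 'day' } }, total: { $sum: '$value' } } },
    { $sort: { _id: 1 } },
  ]).toArray();

  await client.close();
  return perDay;
}

Swapping unit: 'day' for 'week', 'month', or 'year' gives the other views from the same stored data.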
How can we do relational update queries in MongoDB?
We have the following scenario:
In a bank, clients' deposits are managed. Once a certain period of time has passed, the bank returns the deposit to the client plus the accrued interest; a client can have several deposits.
We created the clients collection, with the client's name and the amount available to withdraw, and the deposits collection, with the amount and the interest rate, joined to the client model through a clientId field holding the client's id.
Every 24 hours the bank updates all user accounts: if the deposit's creation date is less than two years old, the user's interest is updated; if the date is exactly 2 years old, a new field is added to the deposit (expired: true), and in the client's collection the available field is increased by what was already accumulated, the interest, plus the amount of the deposit.
To attempt a solution I fetch all the deposits, store them in an object, and iterate over it with map. Inside the map I try to update the clients that have expired deposits; however, only the last element the map iterates over is being updated.
What would be the best solution to this problem? I should clarify that I am using Mongoose and Node.js.
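One direction I am considering, sketched here with made-up model and field names (Deposit, Client, amount, accruedInterest, available), is to collect every update first and send them in bulk instead of firing them inside the map; is this the right pattern?

const mongoose = require('mongoose');
const Deposit = mongoose.model('Deposit'); // assumed: models registered elsewhere
const Client = mongoose.model('Client');

const TWO_YEARS_MS = 2 * 365 * 24 * 60 * 60 * 1000;

async function settleExpiredDeposits() {
  const cutoff = new Date(Date.now() - TWO_YEARS_MS);
  const expired = await Deposit.find({ createdAt: { $lte: cutoff }, expired: { $ne: true } });

  // Build every update instead of firing them unawaited inside a map().
  const clientOps = expired.map((d) => ({
    updateOne: {
      filter: { _id: d.clientId },
      update: { $inc: { available: d.amount + d.accruedInterest } },
    },
  }));
  const depositOps = expired.map((d) => ({
    updateOne: { filter: { _id: d._id }, update: { $set: { expired: true } } },
  }));

  if (expired.length) {
    await Client.bulkWrite(clientOps);
    await Deposit.bulkWrite(depositOps);
  }
}

My suspicion is that the update calls inside my current map are not being awaited, which alone could explain why only the last one appears to go through.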
I'm wondering about best practice for keeping a database as tidy as possible. The database is PostgreSQL, accessed by Express.js/Node. It is for a kids' chores app that I'm working on, and it has the following schema:
CHILDREN
id
name
points
rate
created_at
updated_at
user_id
TASKS
id
description
value
days (boolean array, e.g. [0,0,0,0,0,0,0])
periods (boolean array, e.g. [0,0])
created_at
updated_at
user_id
FINISHED TASKS
id
task_id
child_id
completed (boolean)
created_at
updated_at
period (boolean)
day (int (0-6))
For every individual finished task, a row is created in the database. With only 400 children doing chores, there are already around 800 rows being added each day to the FINISHED TASKS table.
I have two questions:
Is there a more efficient way of storing FINISHED TASKS, for example one row per full day per child or similar?
With scale I'm going to end up with potentially tens of thousands of rows per day. Is this acceptable for an app like this?
Having a child table related to a task table through an intermediate bridge table is the common way of doing this. My experience with large hospital applications is that once tables grow to millions of rows and performance degrades, the application typically archives the "finished tasks" into a separate archive table. You would maybe end up with two tables: one called 'active tasks' containing tasks where 'completed' is false, and once a task is finished, its row is moved into the archived 'finished tasks' table.
Depending on how much effort you want to put into future proofing the application, this could be done now to prevent having to revisit this.
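A rough sketch of that archive step with node-postgres, assuming an archive table (here called finished_tasks_archive) that has the same columns as FINISHED TASKS:

const { Pool } = require('pg');
const pool = new Pool();

// Move completed rows into the archive table in a single statement.
async function archiveFinishedTasks() {
  await pool.query(`
    WITH moved AS (
      DELETE FROM finished_tasks
      WHERE completed = true
      RETURNING *
    )
    INSERT INTO finished_tasks_archive
    SELECT * FROM moved
  `);
}

Run from a nightly job, this keeps the active table small no matter how many rows accumulate over the years.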
What do you recommend in the following scenario:
I have an Azure table called Users whose columns are:
PrimaryKey
RowKey
Timestamp
FirstName
LastName
Email
Phone
Then there are different types of tasks for each user; let's call them TaskType1 and TaskType2.
Both task types have common columns but also have type-specific columns, like this:
PrimaryKey (this is the same as the Users PrimaryKey, to find all tasks belonging to one user)
RowKey
Timestamp
Name
DueDate
Description
Priority
then TaskType1 has additional columns:
EstimationCompletionDate
IsFeasible
and TaskType2 has its own specific column:
EstimatedCosts
I know I can store both types in the same table and my question is:
If I use different tables for TaskType1 and TaskType2, what will be the impact on transaction costs? My guess is that if I have 2 tables, one for each task type, and I then issue a query like "get me all tasks whose PartitionKey equals a specific user's PartitionKey from the Users table", I will have to run 2 queries, one per type (because a user can have both task types), which means more transactions. Instead, if both task types are in the same table, it will be 1 query (within the limit of 1000 entities before pagination), because I will get all the rows where the PartitionKey is the user's PartitionKey, so the partition is not split, which means 1 transaction, right?
So did I understand correctly that I will have more transactions if I store the tasks in different tables?
Your understanding is completely correct. Having the tasks split into 2 separate tables would mean 2 separate queries and thus 2 transactions (let's keep the more-than-1000-entities case out of the equation for now). Though transaction cost is one reason to keep them in the same table, there are other reasons too:
By keeping them in the same table, you would be making full use of the schema-less nature of Azure Table Storage.
2 tables means 2 network calls. Though the service is highly available, you would need to take into consideration the scenario where the call to the 1st table succeeds but the call to the 2nd table fails. How would your application behave in that scenario? Do you discard the result from the 1st table as well? Keeping them in just one table saves you from this scenario.
Assuming you have a scenario in your application where a user could subscribe to both Task 1 and Task 2 simultaneously: if you keep them in the same table, you can make use of Entity Group Transactions, as both entities (one for Task 1 and the other for Task 2) will have the same PartitionKey (i.e. the user id). If you keep them in separate tables, you would not be able to take advantage of entity group transactions.
One suggestion I would give is to have a "TaskType" attribute in your Tasks table. That way you would also have an easier way of filtering by task type.
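A rough sketch of that single-table query using the @azure/data-tables package from Node, filtering on the shared PartitionKey and the suggested TaskType attribute (the connection string, table name "Tasks", and attribute name are assumptions here):

const { TableClient } = require('@azure/data-tables');

const client = TableClient.fromConnectionString(
  process.env.AZURE_STORAGE_CONNECTION_STRING,
  'Tasks'
);

// One query against one partition: both task types for a user come back together.
async function tasksForUser(userPartitionKey) {
  const tasks = [];
  const entities = client.listEntities({
    queryOptions: { filter: `PartitionKey eq '${userPartitionKey}'` },
  });
  for await (const entity of entities) {
    tasks.push(entity); // entity.TaskType tells you whether it is TaskType1 or TaskType2
  }
  return tasks;
}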
I need to model a list of items which is sorted by the time of the last update of each item.
Consider, for instance, a user task list. Each user has a list of tasks and each task has a due date. Tasks can be added to the list, but the due date of a task can also change after it has been added.
That is, a task which is in the 3rd position in User A's task list may have to be moved to the 1st, as a result of the task's due date being updated.
What I have right now is the following CF:
CREATE TABLE UserTasks (
  user_id uuid,
  task_id timeuuid,
  new_due_date timestamp,
  PRIMARY KEY (user_id, task_id)
);
I understand that I cannot sort on 'new_due_date' unless it is made part of the key.
But if it is part of the key then it cannot be updated; the row has to be deleted and recreated instead.
My concern in doing so is that if a task exists in the task lists of 100,000 users, then I need to perform 100,000 select/delete/insert sequences,
whereas if I could sort on new_due_date it would be 100,000 updates.
Any suggestions would be greatly appreciated.
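For illustration, the re-keyed alternative I am weighing looks roughly like this, sketched with the Node cassandra-driver (the keyspace, contact point, and data center name are placeholders):

const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'tasks',
});

// CREATE TABLE UserTasks (
//   user_id      uuid,
//   new_due_date timestamp,
//   task_id      timeuuid,
//   PRIMARY KEY (user_id, new_due_date, task_id)
// );
// Rows now sort by new_due_date, but changing a due date becomes a delete + insert:

async function moveTask(userId, taskId, oldDueDate, newDueDate) {
  await client.batch([
    { query: 'DELETE FROM UserTasks WHERE user_id = ? AND new_due_date = ? AND task_id = ?',
      params: [userId, oldDueDate, taskId] },
    { query: 'INSERT INTO UserTasks (user_id, new_due_date, task_id) VALUES (?, ?, ?)',
      params: [userId, newDueDate, taskId] },
  ], { prepare: true });
}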
Well, one option, if you use PlayOrm with Cassandra, is that you can partition by user_id and query for the UserTasks of a user. If you query where time > 0 and time < MAX, it returns a cursor (reading in batchSize rows at a time) and you can traverse the cursor in reverse order or in plain order. This solution scales infinitely with the number of users, but only scales to millions of tasks per user, which may be OK; I don't know your domain well enough.
Dean