How to create a appropriate database model for the IM

How to create a appropriate database model for the IM - core-data

recently we're developing the IM feature for our app. And we would save the chat record with core data. The strategy we make are:
every account has a separate sqlite file.
every chat has a separate table (dynamic created, refer to this article ), however, the table structure is the same. such as,
sender_id
msg_id
content
msg_send_time
...
If we put all the chat message in a table, and we fetch the records by "fromid and toid" to get a specific dialog records. However, if we have thousands of thousands message in this table, we doubt the fetch request would be very slow. so we create a specific table for each dialog.
So, is there any better solution for this problem?

Creating "tables" for conversations dynamically is a very bad idea. This will create so much overhead that it will make your code completely inefficient.
Instead, use single entity (not table, mind you, Core Data is not a database) to capture the messages. Filter by user IDs.
This will perform without a glitch with 100.000s of messages, far more than should be stored or displayed on a mobile device.

Related

Are client side joins permissable in Cassandra if client drills down on datapoint?

I have this structure with about 1000 data points in a list on the website:
Datapoint1:
Datapoint2:
...
Datapoint1000:
With each datapoint containing 6 fields of information.
Each datapoint can be opened to reveal an additional 2-3x of information in sublist.
Would making a new request upon the user clicking on one of my datapoints be considered bad practice in Cassandra? Should I just go ahead and get it all in one go?

Should I just go ahead and get it all in one go?
Definitely not.
Would making a new request upon the user clicking on one of my datapoints be considered bad practice in Cassandra?
That's absolutely the way you should do it. Cassandra is great at writing large amounts of data, but not so great a returning large amounts of data. More, small key-based queries are definitely the way to go.

It is possible to do the JOINs on the client side but as a general proposition, queries which require joins indicate that you possibly didn't design the data model correctly.
You need to model your data such that (a) each application query (b) maps to a single table. If you need to do a client-side JOIN then you need to query the database multiple times to get the data required by your app. It will work but it's not efficient so affects the performance of the app and the database.
To illustrate with an example, let's say you app needs to display a customer's list of orders. The table design would need to be partitioned by customer with (clustered) multiple rows of orders:
CREATE TABLE orders_by_customerid (
customerid text,
orderid text,
orderdate timestamp,
ordertotal decimal,
...
PRIMARY KEY (customerid, orderid)
)
You would retrieve the list of orders for a customer with:
SELECT ... FROM orders_by_customerid WHERE customerid = ?
By default, the driver or Stargate API your app is using would page the results so only the first 100 rows (for example) will be returned instead of retrieving thousands of rows in a single pass. Note that the page size is configurable. Cheers!

Best way to run a script for large userbase?

I have users stored in postgresql database (~10 M) and i want to send all of them emails.
Currently i have written a nodejs script which basically fetches users 1000 at a time (Offset and limit in sql) and queues the request in rabbit MQ. Now this seems clumsy to me, as if the node process fails at any time i have to restart the process (i am currently keeping track of number of users skipped per query, and can restart back at the previous number skipped found from logs). This might lead to some users receiving duplicate email and some not receiving any. I can create a new table with new column indicating whether email has been to that person or not, but in my current situation i cant do so. Neither can i create a new table nor can i add a new row to existing table. (Seems to me like idempotent problem?).
How would you approach this problem? Do you think compound indexes might help. Please explain.

The best way to handle this is indeed to store who received an email, so there's no chance of doing it twice.
If you can't add tables or columns to your existing database, just create a new database for this purpose. If you want to be able to recover from crashes, you will need to store who got the email somewhere so if you are given hard restrictions on not storing this in your main database, get creative with another storage mechanism.

Redis and Postgresql synchronization (online users status)

In an NodeJS application I have to maintain a "who was online in the last N minutes" state. Since there is potentially thousands of online users - for performance reasons - I decided to not update my Postgresql user table for this task.
I choosed to use Redis to manage the online status. It's very easy and efficient.
But now I want to make complex queries to the user table, sorted by the online status.
I was thinking of creating a online table filled every minute from a Redis snapshot, but I'm not sure it's the best solution.
Following the table filling, will the next query referencing the online table take a big hit caused by the new indexes creation or loading?
Does anyone know a better solution?

I had to solve almost this exact same issue, but I took a different approach because I Didn't like the issues caused by trying to mix Redis and Postgres.
My solution was to collect the online data in a queue (Zero MQ in my case) but any queueing system should work, or a stream processing facility like Amazon Kinesis (The alternative I looked at.) I then inserted the data in batches into a second table (not the users table). I don't delete or update that table, only inserts and queries are allowed.
Doing things this way preserved the ability to do joins between the last online data and the users table without bogging down the database or creating many updates on the user tables. It has the side effect of giving us a lot of other useful data.
One thing to note that I have though about when thinking of other solutions to this problem is that your users table in transactional data(OLTP) while the latest online information is really analytics data (OLAP), so if you have a data warehouse, data lake, big data, or whatever term of the week you want to use for storing this type of data and querying against it that may be a better solution.

Robot's Tracker Threads and Display

Application: The purposed application has an tcp server able to handle several connections with the robots.
I choosed to work with database/ no files, so i'm using a sqlite db to save information about the robots and their full history, models of robots, tasks, etc...
The robots send us several data like odometry, tasks information, and so on...
I create a thread for every new robot's connection to handle the messages and update the informations of the robots on the database. Now lets start talk about my problems:
The application got to show information about the robots in realtime, and I was thinking about using QSqlQueryModel, set the right query and the show it on a QTableView but then I got to some problems/ solutions to think about:
Problem number 1: There are informations to show on the QTableView that are not on the database: I have the current consumption on the database and the actual charge on the database in capacity, but I want to show also on my table the remaining battery time, how can I add that column with the right behaviour (math implemented) in my TableView.
Problem number 2: I will be receiving messages each second for each robot, so, updating the db and the the gui(loading the query) may not be the best solution when I have a big number of robots connected? Is it better to update the table, and only update the db each minute or something like this? If I use this method I cant work with the table with the QSqlQueryModel to update the tables, so what is the approach that you recommend me to use?
Thanks
SancheZ

I have run into similar problem before; my conclusion was QSqlQueryModel is not the best option for display purposes. You may want some processing on query results, or you may want to create, remove, change display data based on the result for a fancier gui. I think best is to implement your own delegates and override the view related methods - setData, setEditor
This way you have the control over all your columns and direct union of raw data and its display equivalent (i.e. EditData, UserData).
Yes, it is better if you update your view real-time and run a batch execute at lower frequency to update the big data. In general app is the middle layer and db is a bottom layer for data monitoring, unless you use db in memory shared cache.
EDIT: One important point, you cannot run updates in multiple threads (you can, but sqlite blocks the thread until it gets the lock) so it is best to run update from a single thread

Processing a stream in Node where action depends on asynchronous calls

I am trying to write a node program that takes a stream of data (using xml-stream) and consolidates it and writes it to a database (using mongoose). I am having problems figuring out how to do the consolidation, since the data may not have hit the database by the time I am processing the next record. I am trying to do something like:
on order data being read from stream
look to see if customer exists on mongodb collection
if customer exists
add the order to the document
else
create the customer record with just this order
save the customer
My problem is that two 'nearby' orders for a customer cause duplicate customer records to be written, since the first one hasn't been written before the second one checks to see if it there.
In theory I think I could get around the problem by pausing the xml-stream, but there is a bug preventing me from doing this.

Not sure that this is the best option, but using async queue was what I ended up doing.
At the same time as I was doing that a pull request for xml-stream (which is what I was using to process the stream) that allowed pausing was added.

Is there a unique field on the customer object in the data coming from the stream? You could add a unique restriction to your mongoose schema to prevent duplicates at the database level.
When creating new customers, add some fallback logic to handle the case where you try to create a customer but that same customer is created by another save at the same. When this happens try the save again but first fetch the other customer first and add the order to the fetched customer document

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to create a appropriate database model for the IM - core-data

Related

Are client side joins permissable in Cassandra if client drills down on datapoint?

Best way to run a script for large userbase?

Redis and Postgresql synchronization (online users status)

Robot's Tracker Threads and Display

Processing a stream in Node where action depends on asynchronous calls

Categories

Resources