Query friends who are online with getstream.io

What is the best way to retrieve a user's online friends when each user has around 100 friends on average?
According to the documentation, I can query x number of users by their ids. If I can query 100 users at a time, I should do that, but I wasn't able to find the maximum number of users per query.
This answer seems to suggest that I can create a hidden channel for each user containing the user's friends, and query the channel to see who's online. What is the maximum number of members per channel?
And what should I do if each user has around 1000 friends on average?
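
For reference, a minimal sketch of the batched id-query approach using the stream-chat Python SDK. The batch size of 100, the friend_ids list, and the availability of presence data on a server-side query are assumptions; the actual per-query limit should be confirmed in Stream's documentation.

# Hypothetical sketch: query friends in batches and keep those marked online.
# The 100-per-query limit is an assumption, not a documented value.
from stream_chat import StreamChat

client = StreamChat(api_key="API_KEY", api_secret="API_SECRET")

def online_friends(friend_ids, batch_size=100):
    online = []
    for i in range(0, len(friend_ids), batch_size):
        batch = friend_ids[i:i + batch_size]
        # query_users takes a Mongo-style filter; the "presence" option asks
        # for online status (verify this works for server-side queries)
        resp = client.query_users({"id": {"$in": batch}}, presence=True)
        online.extend(u for u in resp["users"] if u.get("online"))
    return online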

Related

Searching for users in a graph database (neo4j), where there can be multiple (similar, not duplicate) instances of a single user

I am working on a metadata search project built on a graph database.
The graph database is created by combining multiple datasets that contain information about users (purchase history, CRM data, etc.) and holds attributes like user name, age, date of birth, email, SSN, etc.
So for a single user there can be multiple instances in the database (one from CRM, one from purchase history, etc.). I want to build a search system that takes some user metadata as a query and returns information combined from all instances of that user.
Is resolving records online (while executing a query, rather than resolving all the records in advance) a good idea?
Is a graph database like neo4j a good choice for this task?
If yes, what is the optimal way of achieving this in neo4j?
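
To illustrate what query-time resolution could look like, here is a hypothetical sketch with the neo4j Python driver; the User label, the email/ssn properties, and the "last value wins" merge are all assumptions about the schema, not a recommendation from any answer.

# Hypothetical sketch: resolve all instances of a user at query time
# by matching on identifying metadata, then combine them client-side.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def resolve_user(email, ssn):
    query = (
        "MATCH (u:User) "
        "WHERE u.email = $email OR u.ssn = $ssn "
        "RETURN collect(u) AS instances"
    )
    with driver.session() as session:
        record = session.run(query, email=email, ssn=ssn).single()
        instances = [dict(node) for node in record["instances"]]
    # merge the per-source instances into one combined view (last value wins)
    combined = {}
    for inst in instances:
        combined.update(inst)
    return combined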

How to query price items for a customer, in stripe?

I am trying the following:
stripe.Product.list(limit=3)
however this is quite generic and I don't find any parameter to limit the query to a certain customer.
I appreciate any hint.
Products and Prices are designed to be reusable and aren't restricted to a single customer, so it doesn't really make sense to list either of them by Customer ID.
If instead you're looking for a specific transaction that is linked to a Customer, you'll want to list either Subscriptions or PaymentIntents by Customer ID.
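
For example, with the Stripe Python library (cus_123 is a placeholder Customer ID), a sketch might look like:

# Sketch: list Customer-scoped objects instead of Products/Prices.
import stripe

stripe.api_key = "sk_test_..."  # placeholder key

# Subscriptions for one customer; each subscription item carries its Price
subs = stripe.Subscription.list(customer="cus_123", limit=10)
for sub in subs.auto_paging_iter():
    for item in sub["items"]["data"]:
        print(item["price"]["id"])

# PaymentIntents for the same customer
intents = stripe.PaymentIntent.list(customer="cus_123", limit=10)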

ArangoDB: having a few large collections vs. a lot small collections

I have a question regarding performance/best practice:
Scenario: I have a user collection and a chatbot collection. There can be a lot of users (let's say 100-1000) in the user collection. Each user can have multiple chatbots (around 10 per user).
Option A: I create an edge collection to define the connection user -> chatbot. In the end I would have 1 user collection, 1 chatbot collection (containing all chatbots from all users) and 1 edge collection (containing the mappings from a user to its chatbots).
Option B: I create a separate chatbot collection for each user, to have all chatbots of a specific user in one place. The chatbot collection name would be e.g. user_xyz(user._key)_chatbots. So if I need all chatbots of a user with the _key 'abc', I would check the collection user_abc_chatbots. In this case I don't need an edge collection for the connection user -> chatbot. In the end I would have 1 user collection and a lot of user_xyz_chatbots collections (depending on how many users I have - can be 100-1000 as I wrote before).
Now my question: Which is the better option, also regarding performance? Imagine I have to get all (or one specific) chatbot of a user each time I receive a request.
Would be awesome if you could give me feedback from your experience/thoughts :)
Looking at the numbers you posted, i.e. 100-1000 users and about 10 chatbots per user, this would mean just 1,000 to 10,000 chatbots in total.
For this dimension of data, I would say it makes more sense to store all chatbots in a single collection, and use an (indexed) attribute to store the user id for each chatbot. This is a 1:n relationship (1 user mapped to n chatbots).
That way you can easily and still quickly find all chatbots mapped to a particular user, and this setup will also allow you to run analyses across all users or all chatbots easily.
This would be much more difficult to achieve if the chatbots of each user were located in different collections.
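
A sketch of that 1:n setup with the python-arango driver (the database name, credentials, and the user_id attribute are placeholders):

# Sketch: one chatbots collection, an indexed user_id attribute,
# and an AQL lookup for a single user's chatbots.
from arango import ArangoClient

db = ArangoClient().db("mydb", username="root", password="pw")
chatbots = db.collection("chatbots")
chatbots.add_persistent_index(fields=["user_id"])  # makes the filter below fast

cursor = db.aql.execute(
    "FOR c IN chatbots FILTER c.user_id == @uid RETURN c",
    bind_vars={"uid": "abc"},
)
user_chatbots = list(cursor)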
In addition, if the same chatbots can be mapped to multiple users, it may actually make sense to use three collections:
one collection for users
one collection for chatbots
and one mapping collection between users and chatbots
This would be an n:m relationship, in which each user can still be mapped to any number of chatbots, but if multiple users are mapped to the same chatbot, the data of each chatbot does not need to be stored redundantly.
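
Reusing the db handle from the sketch above, reading through such a mapping collection in AQL could look roughly like this (collection and attribute names are assumptions; an edge collection with _from/_to would work similarly):

# Sketch: fetch all chatbots mapped to a user via an n:m mapping collection.
cursor = db.aql.execute(
    """
    FOR m IN user_chatbots
      FILTER m.user_id == @uid
      FOR c IN chatbots
        FILTER c._key == m.chatbot_key
        RETURN c
    """,
    bind_vars={"uid": "abc"},
)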
I would only recommend using separate chatbot collections per user if each chatbot has an individual data structure, separate from all the others, that needs special indexing or querying. In that case it may make sense to separate the different chatbots.
However, having too many collections (here it would be at most 1,000) also isn't great, because each collection has a small overhead even when empty. That overhead amortizes much better with fewer, more frequently used collections than with many collections that are rarely used.

Is the twissandra data model efficient one ?

Help me please,
I am new to the Cassandra world, so I need some advice.
I am trying to design a data model for a Cassandra DB.
In my project I have:
- users which can follow each other,
- articles which can be related to many topics.
Each user can follow many topics.
So the goal is to build an aggregated feed where a user will get:
articles from all topics he follows +
articles from all friends he follows +
his own articles.
I have searched for similar tasks and found the twissandra example project.
As I understood it, in that example we store only the ids of tweets in the timeline; when we need the timeline, we fetch the ids of the tweets and then fetch each tweet by id in a separate non-blocking request. After collecting all the tweets, we return the list of tweets to the user.
So my question is: is this efficient?
Making ~41 requests to the DB to get one page of tweets?
And my second question is about followers.
When someone creates a tweet, we get all of his followers and put the tweet id into each of their timelines,
but what if a user has thousands of followers?
It means that to create just one tweet we have to write (1 + followers_count) times to the DB?
twissandra is more of a toy example. It will work for some workloads, but you will possibly need to partition the data further (break up huge rows).
Essentially, though, yes, it is fairly efficient - it can be made more so by including the content in the timeline, but depending on requirements that may be a bad idea (if you need deleting/editing). The writes should be a non-issue; 20k writes/sec/node is reasonable provided you have adequate systems.
If I understand your use case correctly, you will probably be fine with a twissandra-like schema, but be sure to test it with expected workloads. Keep in mind that at a certain scale everything gets a little more complicated (i.e. if you expect millions of articles you will need further partitioning, see https://academy.datastax.com/demos/getting-started-time-series-data-modeling).
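
To make the twissandra-style schema concrete, here is a rough sketch with the DataStax Python driver. The keyspace, table, and column names are illustrative, a tweets table keyed by tweet_id is assumed, and the answer's suggestion of including content in the timeline would simply add a column here.

# Sketch: denormalized timeline, fan-out on write, parallel non-blocking reads.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("twitter")
session.execute("""
    CREATE TABLE IF NOT EXISTS timeline (
        user_id   text,
        tweet_id  timeuuid,
        author_id text,
        PRIMARY KEY (user_id, tweet_id)
    ) WITH CLUSTERING ORDER BY (tweet_id DESC)
""")

insert = session.prepare(
    "INSERT INTO timeline (user_id, tweet_id, author_id) VALUES (?, ?, ?)")

def fan_out(author_id, tweet_id, follower_ids):
    # (1 + followers_count) writes per tweet, as the question observes;
    # async execution keeps this cheap on the client side
    futures = [session.execute_async(insert, (f, tweet_id, author_id))
               for f in follower_ids + [author_id]]
    for fut in futures:
        fut.result()

def read_timeline(user_id, page_size=40):
    # first request: one page of tweet ids from the user's timeline partition...
    rows = session.execute(
        "SELECT tweet_id FROM timeline WHERE user_id = %s LIMIT %s",
        (user_id, page_size))
    # ...then one non-blocking request per tweet (the ~41 requests asked about)
    lookup = session.prepare("SELECT * FROM tweets WHERE tweet_id = ?")
    futures = [session.execute_async(lookup, (row.tweet_id,)) for row in rows]
    return [fut.result().one() for fut in futures]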

Database Design for "Likes" in a social network (MongoDB)

I'm building a photo/video sharing social network using MongoDB. The social network has a feed, profiles and a follower model. I basically followed a similar approach to this article for my "social feed" design. Specifically, I used the fan-out on write with bucket approach when users posts stories.
My issue is when a user "likes" a story. I'm currently also using the fan-out on write approach that basically increments/decrements a story's "like count" for every user's feed. I think this might be a bad design since users "like" more frequently than they post. Users can quickly saturate the server by liking and unliking a popular post.
What design pattern do you guys recommend here? Should I use fan-out on read? Keep using fan-out on write with background workers? If the solution is background workers, what approach do you recommend for implementing them? I'm using Node.js.
Any help is appreciated!
Thanks,
Henri
I think the best approach is:
1. increment/decrement a counter in your database to keep track of the number of likes
2. insert each like into a collection called 'like' as a single document, tracking the id of the user who liked the story and the id of the liked story.
Then, if you just need the number of likes, you can read the counter, which is really fast; if instead you need to know who the likes came from, you can query the 'like' collection by story id and get the ids of all users who liked the story.
The documents I am talking about in the 'like' collection would look like this:
{
  "_id": "dfggsdjtsdgrhtd",
  "story_id": "ertyerdtyfret",
  "user_id": "sdrtyurertyuwert"
}
You can store the counter in the story's document itself:
{
  ...
  "likes": 56
}
You can also keep track of the last likes in the story's document (for example the last 1,000; only the last ones, because MongoDB documents are limited to 16 MB, and if your application scales a lot you would run into problems storing a potentially unlimited amount of data in a single document). With this approach you can still easily query the 'like' collection to get the most recent likes.
When someone unlikes a story you can simply remove the like document from the 'like' collection, or, as a better approach (e.g. if you send a notification when someone's story is liked), just record in that document that it was unliked. That way, if the story is liked again by the same user, you can check that the like was already inserted and avoid sending another notification.
Example - first-time insert:
{
  "_id": "dfggsdjtsdgrhtd",
  "story_id": "ertyerdtyfret",
  "user_id": "sdrtyurertyuwert",
  "active": true
}
When unliked, update to this:
{
  "_id": "dfggsdjtsdgrhtd",
  "story_id": "ertyerdtyfret",
  "user_id": "sdrtyurertyuwert",
  "active": false
}
Whenever a like is added, check whether there is an existing document with the same story id and the same user id. If there is, and active is false, the user has already liked and unliked the story before, so when it is liked again you know not to send the notification a second time.
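
A rough sketch of that flow with PyMongo (the asker uses Node.js, but Python keeps it short here; send_notification is a hypothetical stub, and names follow the example documents above):

# Sketch: counter + per-like documents, with the active flag used to
# avoid duplicate notifications on re-likes.
from pymongo import MongoClient

db = MongoClient()["social"]

def send_notification(story_id, user_id):
    pass  # hypothetical stub: notify the story's owner

def like(story_id, user_id):
    # find_one_and_update returns the document as it was BEFORE the update,
    # or None when the upsert inserted it (i.e. a first-ever like)
    previous = db.like.find_one_and_update(
        {"story_id": story_id, "user_id": user_id},
        {"$set": {"active": True}},
        upsert=True,
    )
    if previous is None:
        send_notification(story_id, user_id)
    if previous is None or not previous.get("active"):
        db.stories.update_one({"_id": story_id}, {"$inc": {"likes": 1}})

def unlike(story_id, user_id):
    res = db.like.update_one(
        {"story_id": story_id, "user_id": user_id, "active": True},
        {"$set": {"active": False}},
    )
    if res.modified_count:  # only decrement if the like really was active
        db.stories.update_one({"_id": story_id}, {"$inc": {"likes": -1}})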
