I've created a feed reference and fetched the followers like so:
var admin = client.feed('user', 'admin');
const res = await admin.followers();
But the returned result contains paginated data.
How can I count the total number of followers?
Will this feature be available, or is there a rough estimate for it on the roadmap?
Is there any other recommended architecture to get this total count when working with Stream?
Looks like this is not supported yet.
Dwight Gunning wrote on GitHub on 3 May 2018:
Thanks for the interest. This is still on our long-range backlog.
https://github.com/GetStream/stream-django/issues/42
It's now supported with the followStats() function:
// get follower and following stats of the feed
client.feed('user', 'me').followStats()
// get follower and following stats of the feed but also filter with given slugs
// count by how many timelines follow me
// count by how many markets are followed
client.feed('user', 'me').followStats({followerSlugs: ['timeline'], followingSlugs: ['market']})
Which returns something like:
{
results: {
followers: { count: 1529, feed: 'user:me' },
followings: { count: 81, feed: 'user:me' }
},
duration: '1.92ms'
}
Here is the API documentation for it:
https://getstream.io/activity-feeds/docs/node/following/?language=javascript#reading-follow-stats
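For example, given the response shape above, the total follower count can be read directly off the results object (a minimal sketch):
const { results } = await client.feed('user', 'me').followStats();
console.log(results.followers.count);  // total number of followers
console.log(results.followings.count); // total number of feeds this feed follows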
I have two models: a video model and a global statistics model. The video model stores an array of strings for tags. The global statistics model stores an array of tagCountSchema subdocuments, each containing a tag and a count.
I am writing a function that deletes and rebuilds the global statistics document using data from the video documents. This includes rebuilding the list of unique tags and their counts in the global statistics document.
const videoSchema = new mongoose.Schema({
tags: [{ type: String }],
});
const tagCountSchema = new mongoose.Schema({
tag: { type: String, required: true },
count: { type: Number, default: 1 },
}, { _id: false });
const statisticSchema = new mongoose.Schema({
is: { type: String, default: 'global' },
tags: [tagCountSchema],
});
const Statistic = mongoose.model('Statistic', statisticSchema );
const Video = mongoose.model('Video', videoSchema );
// Rebuild the statistics document
let statistics = await Statistic.findOne({ is: 'global' });
let videos = await Video.find({});
let map = statistics.tags.map(e => e.tag);
for (let video of videos) {
for (let tag of video.tags) {
const index = map.indexOf(tag);
if (index === -1) {
statistics.tags.push({ tag: tag, count: 1 });
map.push(tag);
} else {
statistics.tags[index].count++;
}
}
}
await statistics.save();
However, the use of indexOf() in the function above makes rebuilding the statistics take a very long time. Since videos have a lot of unique tags, the array of unique tags on the global statistics document becomes really long, and since indexOf() needs to be called for each tag of each video, the function takes a long time to complete.
I tested a version of this function that stored tags as an Object in the database and used Object.keys to update tags in the statistics document. This was an order of magnitude faster, but I have come to realize that storing tag names directly as object keys in the database would cause issues if a tag name were illegal to use as a database key.
It is also technically possible that I could stringify the tags object to store it, but that is not convenient for how this function is used in other places in my code. As the function loops through videos it also updates similar statistics for other documents (such as uploader), which I have left out of the code for simplicity's sake. This would mean it would need to stringify and parse the object for every video.
How can I improve the speed of this function?
Maybe your approach is not quite right.
It would be simpler if you updated your statistics as you register your videos.
This way you will avoid the index-building problem. Or you can use a queue to update your data.
This is the way I would do it (see the sketch below).
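For illustration, a rough sketch of what the incremental approach could look like with the Statistic model from the question; the registerVideoTags helper and the two-step update are my own assumptions, not code from the question:
// Hypothetical helper, called whenever a new video is registered,
// instead of rebuilding the whole statistics document later.
async function registerVideoTags(tags) {
  for (const tag of tags) {
    // Try to increment the count of an already-known tag
    const res = await Statistic.updateOne(
      { is: 'global', 'tags.tag': tag },
      { $inc: { 'tags.$.count': 1 } }
    );
    // If no entry matched, push a new one (matchedCount assumes Mongoose 6+ / driver 4+)
    if (res.matchedCount === 0) {
      await Statistic.updateOne(
        { is: 'global' },
        { $push: { tags: { tag, count: 1 } } },
        { upsert: true }
      );
    }
  }
}
Note that the two updates are not atomic together, so two concurrent registrations of a brand-new tag could insert it twice; a queue, as mentioned above, would sidestep that.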
I have come across a similar problem in my line of work and had to get a little creative to avoid that indexOf (in my case find) function, because, as you already know, it's a hugely expensive operation.
On the other hand, as you may know, looking up keys of an object is pretty much instant. So I would rewrite the code that builds the statistics document like so:
// Build a lookup from tag to its index in statistics.tags
const map = {};
statistics.tags.forEach((e, i) => {
  map[e.tag] = i;
});

for (let video of videos) {
  for (let tag of video.tags) {
    if (tag in map) {
      // Known tag: bump its count via the cached index
      statistics.tags[map[tag]].count++;
    } else {
      // New tag: append it and remember its index
      // (statistics.tags.length - 1 avoids re-scanning the object's keys)
      statistics.tags.push({ tag, count: 1 });
      map[tag] = statistics.tags.length - 1;
    }
  }
}
This will significantly speed up your nested loop.
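A small variation on the same idea, if you want to avoid plain-object keys entirely (for example tags literally named "__proto__" or "constructor"), is to use a Map for the in-memory index; a minimal sketch:
// Build the tag -> index lookup with a Map instead of a plain object
const map = new Map(statistics.tags.map((e, i) => [e.tag, i]));

for (const video of videos) {
  for (const tag of video.tags) {
    if (map.has(tag)) {
      statistics.tags[map.get(tag)].count++;
    } else {
      map.set(tag, statistics.tags.length);
      statistics.tags.push({ tag, count: 1 });
    }
  }
}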
I am running an iOS app where I display a list of users that are currently online.
I have an API endpoint where I return 10 (or N) users randomly, so that you can keep scrolling and always see new users. Therefore I want to make sure I don't return a user that I have already returned before.
I cannot use a cursor or a normal pagination as the users have to be returned randomly.
I tried 2 things, but I am sure there is a better way:
At first, what I did was send the IDs of the users that had already been seen as parameters of the request.
ex:
But if the user keeps scrolling and has gone through 200 profiles, then the list gets long and it doesn't look clean.
Then, in the database, I tried adding a field "online_profiles_already_sent" to each user, where I would store an array of the IDs that were already sent to them (I am using MongoDB).
I can't figure out how to do it in a better/cleaner way
EDIT:
I found a way to do it with MySQL, using RAND(seed)
but I can't figure out if there is a way to do the same thing with Mongo
PHP MySQL pagination with random ordering
Thank you :)
I think the only way that you will be able to guarantee that users see unique users every time is to store the list of users that have already been seen. Even in the RAND example that you linked to, there is a possibility of intersection with a previous user list, because RAND won't necessarily exclude previously returned users.
Random Sampling
If you do want to go with random sampling, consider Random record from MongoDB, which suggests using an aggregation and the $sample operator. The implementation would look something like this:
const {
MongoClient
} = require("mongodb");
const
DB_NAME = "weather",
COLLECTION_NAME = "readings",
MONGO_DOMAIN = "localhost",
MONGO_PORT = "32768",
MONGO_URL = `mongodb://${MONGO_DOMAIN}:${MONGO_PORT}`;
(async function () {
const client = await MongoClient.connect(MONGO_URL),
db = await client.db(DB_NAME),
collection = await db.collection(COLLECTION_NAME);
const randomDocs = await collection
.aggregate([{
$sample: {
size: 5
}
}])
.map(doc => {
return {
id: doc._id,
temperature: doc.main.temp
}
})
.toArray(); // materialize the cursor so the docs can be iterated below
randomDocs.forEach(doc => console.log(`ID: ${doc.id} | Temperature: ${doc.temperature}`));
client.close();
}());
Cache of Previous Users
If you go with maintaining a list of previously viewed users, you could write an implementation using the $nin filter and store the _id of previously viewed users.
Here is an example using a weather database of mine, returning entries 5 at a time until all have been printed:
const {
MongoClient
} = require("mongodb");
const
DB_NAME = "weather",
COLLECTION_NAME = "readings",
MONGO_DOMAIN = "localhost",
MONGO_PORT = "32768",
MONGO_URL = `mongodb://${MONGO_DOMAIN}:${MONGO_PORT}`;
(async function () {
const client = await MongoClient.connect(MONGO_URL),
db = await client.db(DB_NAME),
collection = await db.collection(COLLECTION_NAME);
let previousEntries = [], // Track ids of things we have seen
empty = false;
while (!empty) {
const findFilter = {};
if (previousEntries.length) {
findFilter._id = {
$nin: previousEntries
}
}
// Get items 5 at a time
const docs = await collection
.find(findFilter, {
limit: 5,
projection: {
main: 1
}
})
.map(doc => {
return {
id: doc._id,
temperature: doc.main.temp
}
})
.toArray();
// Keep track of already seen items
previousEntries = previousEntries.concat(docs.map(doc => doc.id));
// Are we still getting items?
console.log(docs.length);
empty = !docs.length;
// Print out the docs
docs.forEach(doc => console.log(`ID: ${doc.id} | Temperature: ${doc.temperature}`));
}
client.close();
}());
I have encountered the same issue and can suggest an alternate solution.
TL;DR: Grab all Object IDs in the collection on first landing, randomize them using NodeJS, and use them later on.
Disadvantage: slow first landing if you have millions of records
Advantage: subsequent executions are probably quicker than the other solutions
Let's get into the detailed explanation :)
To explain better, I will make the following assumptions.
Assumptions:
Assume the programming language used is NodeJS
The solution works for other programming languages as well
Assume you have 4 total objects in your collection
Assume the pagination limit is 2
Steps:
On first execution:
Grab all Object IDs
Note: I have considered performance; this execution takes a split second for a collection of 10,000 documents. If you are solving a million-record problem, then maybe use some form of partition logic first / use the other solution listed.
db.getCollection('my_collection').find({}, {_id:1}).map(function(item){ return item._id; });
OR
db.getCollection('my_collection').find({}, {_id:1}).map(function(item){ return item._id.valueOf(); });
Result:
ObjectId("FirstObjectID"),
ObjectId("SecondObjectID"),
ObjectId("ThirdObjectID"),
ObjectId("ForthObjectID"),
Randomize the retrieved array using NodeJS (see the shuffle sketch below)
Result:
ObjectId("ThirdObjectID"),
ObjectId("SecondObjectID"),
ObjectId("ForthObjectID"),
ObjectId("FirstObjectID"),
Store this randomized array:
If this is a server-side script that randomizes pagination for each user, consider storing it in a Cookie / Session
I suggest a Cookie (with an expiry tied to browser close) for scaling purposes
On each retrieval:
Retrieve the stored array
Grab the page's items (e.g. the first 2 items)
Find the objects for those items using find with $in:
db.getCollection('my_collection')
.find({"_id" : {"$in" : [ObjectId("ThirdObjectID"), ObjectId("SecondObjectID")]}});
Using NodeJS, sort the retrieved objects based on the retrieved pagination items, as sketched below
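Since find with $in does not guarantee that results come back in the order of the IDs you pass in, the re-sorting step could look roughly like this in NodeJS (a sketch; it assumes db is a connected driver Db instance and ObjectId comes from the mongodb package):
const pageIds = [ObjectId("ThirdObjectID"), ObjectId("SecondObjectID")]; // current page slice
const docs = await db.collection('my_collection')
  .find({ _id: { $in: pageIds } })
  .toArray();

// Restore the stored random order, which $in does not preserve
const order = new Map(pageIds.map((id, i) => [id.toString(), i]));
docs.sort((a, b) => order.get(a._id.toString()) - order.get(b._id.toString()));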
There you go! A randomized MongoDB query for pagination :)
Does Getstream support "seen" and "unseen" posts?
Essentially, I'd like to be able to show the user the number of new posts that have been posted to the feed since the last time they visited it.
After they see the new posts on the feed, reset the number of unseen posts to 0.
I'm aware that the notification feed has similar capabilities, but best-practices-wise it doesn't seem like a good idea to use that instead of a flat feed (maybe I'm wrong).
UPDATE SCENARIO
Every user has a (global_feed_notifications:user_uuid) that follows (global_feed_flat:1)
A user adds an activity to their (user_posts_flat:user_uuid)
The activity has a to:["global_feed_flat:1"]
The expectation is that (global_feed_notifications:user_uuid) would receive the activity as an unseen and unread notification due to a fanout.
UPDATE
The scenario failed.
export function followDefaultFeedsOnStream(userapp){
const streamClient = stream.connect(STREAM_KEY, STREAM_SECRET);
const globalFeedNotifications = streamClient.feed(feedIds.globalFeedNotifications, userapp);
globalFeedNotifications.follow(feedIds.globalFeedFlat, '1');
}
export function addPostToStream(userapp, post){
const streamClient = stream.connect(STREAM_KEY, STREAM_SECRET);
const userPosts = streamClient.feed(feedIds.userPosts, userapp);
//expansion point: if posts are allowed to be friends only,
//calculate the value of the 'to' field from post.friends_only or post.private
const activity = {
actor: `user:${userapp}`,
verb: 'post',
object: `post:${post.uuid}`,
post_type: post.post_type,
foreign_id: `foreign_id:${post.uuid}`,
to: [`${feedIds.globalFeedFlat}:1`],
time: new Date()
}
userPosts.addActivity(activity)
.then(function(response) {
console.log(response);
})
.catch(function(err) {
console.log(err);
});
}
UPDATE
Well I'm not sure what happened but it suddenly started working after a day.
Unread and unseen counts are only supported on notification feeds. You could set the aggregation format to {{ id }} to avoid any actual grouping but still leverage the power of unread and unseen indicators.
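For completeness, a rough sketch of reading those counters from a notification feed with the JavaScript client; the 'notification' feed group name and the option names follow Stream's documented notification feed read API, so treat the exact shape as an assumption:
const notifications = streamClient.feed('notification', userapp);

// Reading the feed returns unseen/unread counters alongside the activity groups
const response = await notifications.get({ limit: 10 });
console.log(response.unseen, response.unread);

// Marking everything as seen resets the unseen counter for subsequent reads
await notifications.get({ limit: 10, mark_seen: true });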
I have a collection, say, "Things":
{ id: 1, creator: 1, created: Today }
{ id: 2, creator: 2, created: Today }
{ id: 3, creator: 2, created: Yesterday }
I'd like to create a query that'll return each Thing created by a set of users, but only their most recently created thing.
What would this look like? I can search my collection with an array of creators and it works just fine, but how can I also get only the most recently created object per user?
Thing.find({ _creator: { "$in": creatorArray } })...
You cannot find, sort and pick the most recent in just a single find() query. But you can do it using aggregation:
Match all the records where the creator is among the ones we are looking for.
Sort the records in descending order based on the created field.
Group the documents based on the creator.
Pick each creator's first document from the group, which will also be their latest.
Project the required fields.
snippet:
Thing.aggregate([
{$match:{"creator":{$in:[1,2]}}},
{$sort:{"created":-1}},
{$group:{"_id":"$creator","record":{$first:"$$ROOT"}}},
{$project:{"_id":0,
"id":"$record.id",
"creator":"$record.creator",
"created":"$record.created"}}
], function(err,data){
})
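With the three sample documents above and creatorArray [1, 2], this would return something like (order within the result is not guaranteed):
[
  { "id": 1, "creator": 1, "created": Today },
  { "id": 2, "creator": 2, "created": Today }
]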
I have a collection with feeds. The documents are structured something like this:
{
_id: '123',
title: 'my title',
openedBy: ['321', '432', '543'] // ID of users
}
Then I have users:
{
_id: '321',
friends: ['432'] // ID of users
}
What I would like to accomplish is to get the number of friends who have opened the feeds fetched by the user. I do this now with a mapReduce, passing in the friends of the user fetching the feeds. I do not think I am doing it correctly, as my reduce just returns the first emit and I have to convert the result back to a normal find result in the callback:
db.collection(collectionName).mapReduce(function () {
var openedByFriendsLength = 0;
for (var x = 0; x < friends.length; x++) {
if (this.openedBy.indexOf(friends[x]) >= 0) {
openedByFriendsLength++;
}
}
emit(this._id, {
title: this.title,
openedByLength: this.openedBy.length,
openedByFriendsLength: openedByFriendsLength
});
}, function (key, emits) {
return emits[0];
}, {
out: 'getFeeds',
scope: {
friends: user.friends
},
}, function (err, collection) {
collection.find().toArray(function (err, feeds) {
// Convert the _id / value to a normal find result
var resultFeeds = [];
for (var x = 0; x < feeds.length; x++) {
resultFeeds.push(feeds[x].value);
resultFeeds[resultFeeds.length - 1]._id = feeds[x]._id;
}
callback(err, resultFeeds);
});
});
I have looked at aggregation, but I cannot quite figure out how to do the same thing. Or is the structure of the documents here all wrong?
Thanks for any reply!
You ask how to do the calculation using the aggregation framework. In general the aggregation framework performs better than map-reduce. You can find documentation on the Aggregation Framework here: http://docs.mongodb.org/manual/aggregation/.
I understand that the calculation you want is, given a user, to find all feeds where that user is contained in the openedBy array, and then find the number of distinct friends of that user that are contained in those openedBy arrays. Do I have that correct?
Aggregation, like map-reduce, only operates on one collection at a time, so the first step is to obtain the list of friends for the user from the users collection, for example:
friends = db.users.findOne({_id:user}).friends
Then we can perform the following aggregation on the feeds collection to do the calculation:
db.feeds.aggregate([
{$match: {openedBy: user}},
{$unwind: '$openedBy'},
{$match: {openedBy: {$in: friends}}},
{$group: {_id: '$openedBy'}},
{$group: {_id: 0, count: {$sum: 1}}}
])
The aggregate command specifies a list of processing steps that work much like a Unix pipeline, passing streams of documents from one stage of the pipeline to the next.
The first step in the pipeline, $match, takes as input all documents in the collection and selects only those where the user is contained in the openedBy array.
The second step, $unwind, takes each input document and produces multiple output documents, one for each member of the openedBy array; each output document contains an openedBy field whose value is a single user. These will be users that opened the same feeds as the given user. This step allows later steps of the pipeline to perform aggregation operations on the individual values of the openedBy array.
The third step, $match, filters those documents to pass only the ones where the openedBy user is a friend of the given user. However a given friend may be represented more than once in this stream, so aggregation will be needed to eliminate the duplicates.
The fourth step, $group, performs an aggregation, generating one output document for each value of the openedBy field. This will be the set of unique friends, without duplication, of the given user who have opened a feed that the user opened. The _id field will be the friend user id.
The final step, another $group, counts the number of documents generated by the preceding step. It outputs a single document, with an _id of 0 (you could use any value you want here), and with a count field that contains the final count that you wished to calculate, for example:
{ "result" : [ { "_id" : 0, "count" : 2 } ], "ok" : 1 }
I hope this answer is helpful! Let me know if you have further questions.
Bruce