We are storing users and friends (relationships) in Redis sets.
This is probably easy, but we can't figure out how to get results back in pages.
Example: when showing a logged-in user's friends, we need the first 20 results, then on the following click the next 20 results, and so on. We don't really care about the order, provided we don't get repeated data across subsequent queries.
We prefer plain sets over sorted sets, as sets let us use a cheap SINTER for other queries.
What would the recommended approach be? Storing them as both sets and sorted sets sounds a bit redundant.
You can paginate through a Set using SSCAN, but note that it may return the same element more than once. Alternatively, Sorted Sets are the best fit for this kind of task. Lastly, Lists can also work, but LRANGE gets expensive as the offset grows.
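A minimal sketch of both options with redis-py (the key names are invented for illustration):

import redis

r = redis.Redis()

# Option A: walk the plain Set with SSCAN. COUNT is only a hint, the
# cursor is opaque, and elements may come back more than once, so
# de-duplicate client-side if repeats matter.
cursor, members = r.sscan("user:42:friends", cursor=0, count=20)

# Option B: mirror the Set in a Sorted Set scored by, say, the time the
# friendship was created; ZRANGE then gives stable, duplicate-free pages.
first_page = r.zrange("user:42:friends:by-date", 0, 19)
second_page = r.zrange("user:42:friends:by-date", 20, 39)

Keeping both structures does double the writes, but the plain Set stays available for SINTER while the Sorted Set handles pagination.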
I am working on a system where I need fast filtering queries. Basically, it is a set of about 50 different fields (booleans, amounts, codes, and dates), much like a web-shop filter.
There are roughly 10,000,000 items.
At the moment I am using MSSQL, with one big table and various indexes, plus a few separate tables in the cases where I found it much faster to join than to filter the result within the one table.
I usually get a response time of around 1 second on a fairly fast server.
I am considering ArangoDB for this and wonder which approach is best. Is it better to keep some of the "flags" in separate collections and join, or is it more efficient to put everything in the same document and index the flag fields (a rough sketch of that variant follows below)? Or would there be any benefit in using the graph/edge feature and making a link back to the same object (or to an object representing the code, for instance)?
The reason I am considering ArangoDB is that I plan to move to a more complex model and will most likely use the graph features in the future, even if the first priority is to bring the system up to its current level of features at a similar speed.
Any thoughts?
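For illustration only, the everything-in-one-document variant might look like this with python-arango; the connection details, collection, and field names are all assumptions:

from arango import ArangoClient

# hypothetical connection and collection
db = ArangoClient(hosts="http://localhost:8529").db("shop")
items = db.collection("items")

# one persistent index covering the flag fields that are filtered together
items.add_persistent_index(fields=["in_stock", "price"])

# a typical web-shop-style filter; ArangoDB can answer it from the index
cursor = db.aql.execute(
    """
    FOR i IN items
      FILTER i.in_stock == true AND i.price <= @max_price
      LIMIT 0, 20
      RETURN i
    """,
    bind_vars={"max_price": 100},
)
results = list(cursor)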
I have an application with posts. Those posts are shown in the home view in descending order by creation date.
I want to implement a more complex sorting strategy based on, for example, posts from users who have more posts, or posts that have more likes or views. Nothing complex, simple things, with some randomness: say I take the 100 most-liked posts and pick 10 of them at random.
I don't want to do this in the main query, since I don't want to affect its performance. I am using MongoDB, and it would require a $lookup, which wouldn't be advisable in the most critical query of the app.
What would be the best approach to implement this?
I thought of doing all those calculations with, for example, AWS Lambda or MongoDB Atlas triggers, every 30 seconds, and storing the resulting information in the database, where the query could consume it (see the sketch below).
That way, every 30 seconds, the first 30 posts, say, would be updated according to the criteria.
I don't really know whether this is a good approach or not. I need something simple, but it should be able to "mix" all the posts and show first the ones that comply with the criteria.
Thanks!
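For what it's worth, the periodic precomputation could be sketched with PyMongo like this; the collection names, field names, and schedule are assumptions:

from pymongo import MongoClient

db = MongoClient().app

def refresh_ranked_posts():
    # run every 30 seconds from a Lambda, an Atlas trigger, or cron:
    # take the 100 most-liked posts and keep 10 of them at random
    sampled = list(db.posts.aggregate([
        {"$sort": {"likes": -1}},
        {"$limit": 100},
        {"$sample": {"size": 10}},
    ]))
    db.ranked_posts.delete_many({})
    if sampled:
        db.ranked_posts.insert_many(sampled)

# the hot home-view query then reads the precomputed collection as-is,
# with no $lookup or sorting cost
featured = list(db.ranked_posts.find())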
TL;DR: which of the three options below is the most efficient for paginating with Redis?
I'm implementing a website with multiple user-generated posts, which are saved in a relational DB and then copied to Redis in the form of Hashes with keys like site:{site_id}:post:{post_id}.
I want to perform simple pagination queries against Redis in order to implement lazy-load pagination (i.e. the user scrolls down and we send an Ajax request to the server asking for the next batch of posts) in a Pinterest-style interface.
I then created a Set to keep track of the published post IDs, with keys like site:{site_id}:posts. I chose Sets because I don't want duplicate IDs in the collection, and a simple SADD on every DB update handles that quickly (no need to check whether the ID exists).
Well, as Sets aren't ordered, I'm weighing the pros and cons of the options I have for paginating:
1) Using the SSCAN command to paginate my already-implemented Sets
In this case, I could persist the returned SCAN cursor in the user's session and send it back to the server on the next request. This doesn't seem reliable with multiple users accessing and updating the database: at some point the cursor would become invalid and return weird results (unless there is some caveat that I'm missing).
2) Refactor my Sets to use Lists or Sorted Sets instead
Then I could paginate using LRANGE or ZRANGE. A List seems to be the most performant and natural option for my use case: it's perfect for pagination and ordering by date, but I simply can't check for a single item's existence without walking the whole list. Sorted Sets seem to combine the advantages of both Sets and Lists, but consume more server resources.
3) Keep using regular sets and store the page number as part of the key
It would be something like site:{site_id}:{page_number}:posts. This was the recommended way before the SCAN commands were introduced.
So, the question is: which one is the most efficient / simplest approach? Is there any other recommended option not listed here?
"Best" is best served subjective :)
I recommend you go with the 2nd approach, but definitely use Sorted Sets over Lists. Not only do they make sense for this type of job (see ZRANGE), they're also more efficient in terms of complexity compared to LRANGE-ing a List.
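A quick redis-py illustration, assuming the score is the post's creation timestamp and reusing the question's key scheme:

import time
import redis

r = redis.Redis()

# on publish: ZADD is idempotent per member, so IDs can't be duplicated,
# just like with SADD
r.zadd("site:1:posts", {"post:123": time.time()})

# page n (0-based) of 20 posts, newest first: O(log(N)+M) per call
def posts_page(site_id, n, per_page=20):
    start = n * per_page
    return r.zrevrange(f"site:{site_id}:posts", start, start + per_page - 1)

# and unlike a List, a single-item existence check stays cheap
exists = r.zscore("site:1:posts", "post:123") is not None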
In CouchDB, does reduce still get called if the map result is empty? If so, are both keys and values empty?
My use case (and hopefully there's a better way to do this):
I send a query to my cluster, and I require both the list of items and the count of items returned (which the map doesn't seem to provide… it only gives me the total count of the view, not of the filtered view result). I then call reduce to get the count in a separate query.
Sometimes the view result is empty, which makes reduce return null. I could check for this null, but I doubt that's the correct approach in the CouchDB world.
Edit: it turns out the ORM I'm using does support a way to do this.
The reduce function isn't called when there are no rows.
The easiest way to achieve your goal is to just do the map and, back in your code, read the length of the rows array that CouchDB returns.
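A sketch against CouchDB's HTTP API, where the database, design document, and view names are placeholders:

import requests

# query the view with reduce switched off; CouchDB always returns a
# rows array, which is simply empty when nothing matches
resp = requests.get(
    "http://localhost:5984/mydb/_design/items/_view/by_type",
    params={"reduce": "false", "key": '"book"'},
).json()

rows = resp["rows"]    # the filtered items
count = len(rows)      # the count the separate reduce query was for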
The reduce function being called on an empty map result was actually a bug that I helped fix many months ago. I believe it was patched in 1.2. If you are up for using 1.1, the bug may still exist and be usable.
Is it possible to transform the returned data from a Find query in MongoDB?
As an example, I have first and last fields to store a user's first and last name. In certain queries, I wish to return the first name and last initial only (e.g. 'Joe Smith' returned as 'Joe S'). In MySQL, a SUBSTRING() function could be used on the field in the SELECT statement.
Are there data transformations or string functions in Mongo like there are in SQL? If so, can you please provide an example of their usage? If not, is there a recommended method of transforming the data aside from looping through the returned objects?
It is possible to do just about anything server-side with MongoDB. The reason you will usually hear "no" is that you sacrifice too much speed for it to make sense under ordinary circumstances. One of the main forces behind PyMongo, Mike Dirolf of 10gen, has a good blog post on using server-side JavaScript with PyMongo: http://dirolf.com/2010/04/05/stored-javascript-in-mongodb-and-pymongo.html. His example stores a JavaScript function that returns the sum of two fields, but you could easily modify it to return the first letter of your user name field. The gist would be something like:
db.system_js.first_letter = "function (x) { return x.charAt(0); }"
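Calling it from PyMongo looked roughly like this in that era; note that system_js rode on the eval command, which has since been removed from MongoDB, so treat this as a period piece rather than current advice:

from pymongo import MongoClient

db = MongoClient().mydb

# store the function server-side (the same thing the shell line does)
db.system_js.first_letter = "function (x) { return x.charAt(0); }"

# invoke it; under the hood this used the old eval command
print(db.system_js.first_letter("Smith"))  # -> "S"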
Understand first, though, that MongoDB is made to be really good at retrieving your data, not at processing it. The recommendation (see, for example, 50 Tips and Tricks for MongoDB Developers by Kristina Chodorow, from O'Reilly) is to do what Andrew tersely alluded to above: store a first-letter field and return that instead. Any processing can be done more efficiently in the application.
But if you feel that even querying for the full name before returning fullname[0] from your 'view' is too much of a risk, remember that you don't need to do everything the fastest possible way. I avoided map-reduce in MongoDB for a while because of all the public concerns about its speed. Then I ran my first map-reduce and twiddled my thumbs for 0.1 seconds as it processed 80,000 10 KB documents. I realize that in the scheme of things that's tiny, but it illustrates that just because server-side processing would be a painful performance hit for a massive website doesn't mean it would matter to you. In my case, I imagine it would take me longer to migrate to Hadoop than to just eat that 0.1 seconds every now and then. Good luck with your site.
The question you should ask yourself is why you need that data. If you need it for display purposes, do that in your view code. If you need it for query purposes, then do as Andrew suggested and store it as an extra field on the object. Mongo doesn't provide server-side transformations (and where it does, you usually don't want to use them); the answer is usually not to treat your data as you would in a relational DB, but to use the more flexible nature of the data store to pre-bake your data into the formats you're going to be using, as sketched below.
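In PyMongo terms, the pre-baking amounts to something like this (the field names are just for illustration):

from pymongo import MongoClient

db = MongoClient().mydb

# write path: compute the display form once, when the document is saved
user = {"first": "Joe", "last": "Smith"}
user["short_name"] = f"{user['first']} {user['last'][0]}"
db.users.insert_one(user)

# read path: no transformation needed, and short_name is also queryable
doc = db.users.find_one({"first": "Joe"}, {"short_name": 1, "_id": 0})
print(doc["short_name"])  # -> "Joe S"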
If you can provide more information on how this data should be used, then we might be able to answer a little more usefully.