Insert docs in sorted order on mongodb - node.js

In MongoDB, I want to insert data in sorted order based on some field.
The way I'm doing it now is to compare the new data with the data already in the collection before inserting, and then insert it at that particular position. Is insertion at a particular position possible in MongoDB using node.js?

You can't insert a doc at a specific spot in the collection. Even if you could, it wouldn't matter, because you can't rely on the natural order of MongoDB documents staying consistent: docs can move over time as the documents in a collection are updated.
Instead, create an index on the field(s) you need your docs sorted on and then include a sort clause in your queries to efficiently retrieve the docs in that order.
Example in the shell:
// Create the index (do this once)
db.test.ensureIndex({someField: 1})
// Sorted query
db.test.find().sort({someField: 1})
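Since the question is specifically about node.js, roughly the same thing with the official mongodb driver might look like the sketch below (connection string, db/collection names and someField are placeholders; newer drivers and shells use createIndex rather than ensureIndex):

const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const coll = client.db('mydb').collection('test');

  // Create the index once (a no-op if it already exists)
  await coll.createIndex({ someField: 1 });

  // Insert in any order...
  await coll.insertMany([{ someField: 3 }, { someField: 1 }, { someField: 2 }]);

  // ...and read back sorted; the index makes the sort efficient
  const sorted = await coll.find().sort({ someField: 1 }).toArray();
  console.log(sorted);

  await client.close();
}

main().catch(console.error);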

Related

MongoDB, should number fields be indexed?

I'm trying to get a proper understanding of using MongoDB to optimise queries. In this case it's for fields that would hold an integer. So say I have a collection
with two fields, value and cid, where value will store data of type string and cid will store data of type number.
I intend to write queries that will search for records by matching the fields value and cid. The expectation is also that the saved records for this collection will get very large, so queries could benefit from MongoDB indexes. It makes sense to me to index the value field, which holds strings. But I wonder whether the cid field requires indexing, or is okay as is, given that it will be holding integers.
I'm asking because I was going through a code base with this exact scenario and I can't figure out why the number field was not indexed. Hoping my question makes sense.
Regardless of datatypes, generally speaking all queries should use an index. If you use a sort predicate, you can assist the database by having a compound index covering both the equality portion of the query (the filter predicate) and the sorting portion (the sort predicate). MongoDB recommends following the index strategy referred to as the E.S.R. (Equality, Sort, Range) rule - see Performance Best Practices for the E.S.R. rule.
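For illustration only, a compound index on the two fields from the question could be created in the shell like this (the collection name is an assumption):

// Equality fields first; indexing works the same whether the values are strings or numbers
db.mycollection.createIndex({ cid: 1, value: 1 })

// A query filtering on both fields can then use the index
db.mycollection.find({ cid: 42, value: "foo" })

If your queries also sort on another field, the E.S.R. rule says to place that sort field after the equality fields in the compound index.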

node.js and postgres bulk upsert or another pattern?

I am using Postgres, NodeJS and Knex.
I have the following situation:
A database table with a unique field.
In NodeJS I have an array of objects and I need to:
a. Insert a new row, if the table does not contain the unique id, or
b. Update the remaining fields, if the table does contain the unique id.
From my knowledge I have three options:
Do a query for each array item to check whether it exists in the database and, based on the response, do an update or insert. This costs resources because there's a call for each array item plus an insert or update.
Delete all rows whose ids are in the array and then perform an insert. This would mean only 2 operations, but the autoincrement field will keep growing.
Perform an upsert, since Postgres 9.5 supports it. Bulk upsert seems to work and there's only one call to the database.
Looking through the options I am aware of, upsert seems the most reasonable one, but does it have any drawbacks?
Upsert is a common way.
Another way is to use separate insert/update operations, which will most likely be faster:
Determine the existing rows:
select id from t where id in (object-ids) (*)
Update the existing rows using the (*) result.
Filter the array by the (*) result and bulk insert the new rows.
See more details for the same question here.
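For what it's worth, a minimal sketch of the upsert option with Knex might look like the following (table and column names are placeholders, and onConflict/merge needs a reasonably recent Knex version):

// Bulk upsert on Postgres >= 9.5: INSERT ... ON CONFLICT ... DO UPDATE
// 'items' is the array of objects; 'my_table' and 'unique_id' are placeholders
async function bulkUpsert(knex, items) {
  return knex('my_table')
    .insert(items)
    .onConflict('unique_id') // the column with the unique constraint
    .merge();                // update the remaining fields when the id already exists
}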

MongoDB API pagination

Imagine a situation where a client has a feed of objects with limit 10.
When the next 10 are required, it sends a request with skip 10 and limit 10.
But what if some new objects were added to (or deleted from) the collection since the 1st request with offset == 0?
Then on the 2nd request (with offset == 10) the response may have the wrong object order.
Sorting on creation time does not work here, because I have some feeds that are formed by sorting on some numeric field.
You can add a time field like created_at or updated_at. It must be updated whenever the document is created or modified, and the field must be unique.
Then query the DB for the range of time using $gte and $lte, along with a sort on this time field.
This ensures that any changes made outside the time window will not get reflected in the pagination, provided the time field does not have duplicates. Most probably, if you include microtime, duplicates won't happen.
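As a rough sketch with the Node.js driver (createdAt is a placeholder field name and is assumed to be unique):

// First page: newest 10 documents
const first = await collection.find().sort({ createdAt: -1 }).limit(10).toArray();

// Next page: everything strictly older than the last item already shown,
// so documents added after the first request cannot shift the window
const last = first[first.length - 1];
const next = await collection
  .find({ createdAt: { $lt: last.createdAt } })
  .sort({ createdAt: -1 })
  .limit(10)
  .toArray();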
It really depends on what you want the result to be.
If you want the original objects in their original order regardless of Delete and Add operations then you need to make a copy of the list (or at least of the order) and then page through that. Copy every Id to a new collection that doesn't change once the page has loaded and then paginate through that.
Alternatively, and perhaps more likely, what you want is to see the next 10 after the last one in the current set, including any Delete or Add operations that have taken place since. For this, you can use the sorted order in which you are viewing them and a filter: $gt whatever the last item was. BUT that doesn't work when there are duplicates in the field on which you are sorting. To get around that you will need to index on that field PLUS some other field which is unique per record, for example the _id field. Now you can take the last record in the first set and look for records that are $eq the indexed value and $gt the _id, OR simply $gt the indexed value.
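A hedged sketch of that last query, using a hypothetical numeric score field as the sort key and assuming a compound index on { score: 1, _id: 1 }:

// lastDoc is the final record of the page the client already has
const nextPage = await collection
  .find({
    $or: [
      { score: lastDoc.score, _id: { $gt: lastDoc._id } }, // same score, later _id
      { score: { $gt: lastDoc.score } }                    // strictly greater score
    ]
  })
  .sort({ score: 1, _id: 1 })
  .limit(10)
  .toArray();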

Select TTL for an element in a map in Cassandra

Is there any way to select TTL value for an element in a map in Cassandra with CQL3?
I've tried this, but it doesn't work:
SELECT TTL (mapname['element']) FROM columnfamily
Sadly, I'm pretty sure the answer is that it is not possible as of Cassandra 1.2 and CQL3. You can't query individual elements of a collection. As this blog entry says, "You can only retrieve a collection in its entirety". I'd really love to have the capability to query for collection elements, too, though.
You can still set the TTL for individual elements in a collection. I suppose if you wanted to be assured that a TTL is some value for your collection elements, you could read the entire collection and then update the collection (the entire thing or just a chosen few elements) with your desired TTL. Or, if you absolutely needed to know the TTL for individual data, you might just need to change your schema from collections back to good old dynamic columns, for which the TTL query definitely works.
Or, a third possibility could be that you add another column to your schema that holds the TTL of your collection. For example:
CREATE TABLE test (
key text PRIMARY KEY,
data map<text, text>,
data_ttl text
) WITH ...
You could then keep track of the TTL of the entire map column 'data' by always updating column 'data_ttl' whenever you update 'data'. Then, you can query 'data_ttl' just like any other column:
SELECT ttl(data_ttl) FROM test;
I realize none of these solutions are perfect... I'm still trying to figure out what will work best for me, too.
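If it helps to see that third option end to end from Node.js, here is a rough sketch using the cassandra-driver package (the driver choice, table/column names and the one-hour TTL are all assumptions; the question itself is driver-agnostic). The idea is simply to write data_ttl with the same TTL as the map, so its TTL can stand in for the map's:

const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'mykeyspace'
});

async function updateWithTtl(key, element, value) {
  const ttl = 3600; // one hour, for example
  // TTLs can still be written per statement; they just can't be read per map element
  await client.execute(
    "UPDATE test USING TTL ? SET data[?] = ?, data_ttl = ? WHERE key = ?",
    [ttl, element, value, 'marker', key],
    { prepare: true }
  );
}

async function readTtl(key) {
  // ttl() works on the plain text column, approximating the map's remaining TTL
  const rs = await client.execute(
    "SELECT ttl(data_ttl) AS remaining FROM test WHERE key = ?",
    [key],
    { prepare: true }
  );
  return rs.rows[0]['remaining'];
}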

How to retrieve search results from two fields in lucene index, giving one query?

Suppose I search for a query in Field A, and I want to retrieve the corresponding fields B and C from my index; how should I go about it? I am using Lucene 3.6.0.
The results of your query will be returned as a set of documents, not fields. Once you've got a document, you can load whichever field contents you're interested in.
One thing that's probably worth watching out for is to ensure that your fields have been "stored".
Good luck,
