CouchDB key always matches - couchdb

I'm looking to query my CouchDB in such a way that some of the fields in a document can be wildcards that match any key request.
Example:
function(doc) {
emit(doc.some_field, doc);
}
A query with ?key=100 should match both the document whose some_field is 100 and any document whose some_field is a wildcard value such as *.
Is this possible? Is there a hack to do that?

As per the CouchDB documentation you can do:
?startkey="key"&endkey="key\ufff0"
to match key*.
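For example, here is a minimal sketch of building that prefix query from JavaScript. It assumes a view named by_field in a design document named app (both placeholder names) and an environment with fetch, such as Node 18+ or a browser:
// The view is assumed to do: emit(doc.some_field, null);
const prefix = "key";
const params = new URLSearchParams({
  startkey: JSON.stringify(prefix),           // "key"
  endkey: JSON.stringify(prefix + "\ufff0")   // "key\ufff0" collates after every "key..." string
});
fetch(`http://localhost:5984/mydb/_design/app/_view/by_field?${params}`)
  .then((res) => res.json())
  .then((body) => console.log(body.rows.map((row) => row.key)));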

From the CouchDB wiki:
CouchDB actually stores the [key, docid] pair as the key in the btree. This means that:
you always know which document the key and value came from (it's exposed as the 'id' field in the view result)
view rows with equal keys sort by increasing docid.
So I don't think that wildcard fields used as part of a key are possible, because view rows are stored sorted. Suppose they were possible: a row with a wildcard key would have to be returned for every key range, i.e. it would have to appear everywhere in the index. But in a sorted btree every row has exactly one position, between a neighbour with a smaller key and a neighbour with a greater key, so any key range that does not cover that position would miss the wildcard row.

Related

How to get the last 10 inserted records in descending order using DynamoDB

I am new to Amazon DynamoDB. I want the last 10 inserted records in descending order.
DynamoDB can sort data only by the sort key attribute. The ScanIndexForward option can be used to sort the data in ascending or descending order.
Please note that the ordering applies within a single partition key only. It will not sort all the items in the table and give you the last 10 records overall; the sort operation is performed for one specific partition key.
ScanIndexForward
Specifies the order for index traversal: if true (default), the traversal is performed in ascending order; if false, the traversal is performed in descending order.
Sort key definition and example:
A composite partition-sort key is indexed as a partition key element and a sort key element. This multi-part key maintains a hierarchy between the first and second element values. For example, a composite partition-sort key could be a combination of “UserID” (partition) and “Timestamp” (sort). Holding the partition key element constant, you can search across the sort key element to retrieve items. This would allow you to use the Query API to, for example, retrieve all items for a single UserID across a range of timestamps.
Sounds like you are using the DynamoDB example here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.NodeJs.01.html
The sample data does not have insertion timestamps.
Another catch is that DynamoDB can only sort by the Sort Key; otherwise you need to perform the sorting in code.
So if your Partition Key is the Year, and the Sort Key is the Title, you need to:
Introduce an attribute which provides you with a timestamp of creation.
Create the table with an LSI of this attribute, or create a GSI using the new attribute as your Sort Key.
Now you can use query!
The Query API has an option to:
Sort by the Sort Key in descending order (using the ScanIndexForward parameter)
Limit the number of items returned (using the Limit parameter)
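Putting these together, here is a minimal sketch with the AWS SDK for Node.js. The table, index and attribute names (Movies, CreatedAtIndex, year, createdAt) are placeholders, not part of the original question:
// Sketch only: assumes an LSI/GSI named CreatedAtIndex whose Sort Key is the createdAt attribute.
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

const params = {
  TableName: 'Movies',
  IndexName: 'CreatedAtIndex',
  KeyConditionExpression: '#y = :y',    // ordering applies within this one partition only
  ExpressionAttributeNames: { '#y': 'year' },
  ExpressionAttributeValues: { ':y': 2017 },
  ScanIndexForward: false,              // descending by Sort Key, i.e. newest first
  Limit: 10                             // evaluate at most 10 items
};

docClient.query(params, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Items);
});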
The answer by Abhaya Chauhan is mostly correct, though there is one inaccuracy. The Limit parameter does not actually limit the number of items returned, but rather the number of items evaluated (regardless of whether they match the search criteria).
Thus if you set a Limit of 10 and also apply a FilterExpression, you might get anywhere between 0 and 10 items back. See the docs below for more info:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Limit
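If that is a concern, one way around it is to keep paging with ExclusiveStartKey until you have collected 10 items or there is nothing left to read. A sketch, reusing docClient and the placeholder params from the example above:
// Collect up to 10 items across pages; LastEvaluatedKey is undefined on the last page.
async function latestTen(params) {
  const items = [];
  let lastKey;
  do {
    const req = { ...params };
    if (lastKey) req.ExclusiveStartKey = lastKey; // resume where the previous page stopped
    const page = await docClient.query(req).promise();
    items.push(...page.Items);
    lastKey = page.LastEvaluatedKey;
  } while (lastKey && items.length < 10);
  return items.slice(0, 10);
}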

DynamoDB with sort?

I'm very new to the DynamoDB concept, so forgive me if my question is a bit stupid.
I have a file that looks like this:
Appel,www.appel.com,www.cnn.com,www.bla.com....
Blabla,www.test.com,www.fox.com,www.bla.com.....
test,www.test.com,www.fox.com,www.bla.com...
www.appel.com,300
www.cnn.com,400
and so on. In short, each line is either:
1: a word and all the URLs it appears in
2: a URL and its number of appearances
What I need to do is query DynamoDB so that, given a word, the output is the list of that word's URLs sorted by number of appearances.
For example, for this file, the word Appel would give the output:
www.cnn.com,www.appel.com,www.bla.com....
I have tried creating two tables, 'Invert-index' and 'rank': the first maps a word to its list of URLs, the second maps a URL to its rank. But I can't find a way to make the query without sorting myself.
So first: is this DynamoDB structure (the two tables) correct?
And is there a way to query the DB and get the results sorted?
In order to rely on DynamoDB to sort your data you have to use a Range Key. That means that, to meet your requirements, the number of appearances has to be part of the Range Key.
The Hash Key could then be the word (e.g. Appel or Blabla), and lastly you can store the URLs as a string array in each record.
From the documentation:
Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order; otherwise, the results are returned in order of ASCII character code values. By default, the sort order is ascending. To reverse the order use the ScanIndexForward parameter set to false.
Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
You can find more information about the available key types on DynamoDB on the links below:
When to use what primary key type
What is the use of a hash range in a dynamodb table
Q: If I use the number of appearances as the range key, how can I store the String array? Each value has a different number, so if each record has a primary key (word), a range key (number) and a value (string array), what is the number in this case?
In that case I would recommend composing the Range Key from two fields (number and url) using a separator character (e.g. '#'). Your final table structure would be:
Hash Key : <Word>
Range Key : <AppearanceNumber>#<Url>
Your Range Key would be of the String type which would still work to sort your data as the <AppearanceNumber> is the prefix.
As an example by querying by the <Word>'Appel' you would get the following results:
Appel,900#www.appel.com
Appel,800#www.cnn.com
Appel,700#www.bla.com
Notice that you can still have the url and the appearanceNumber as separate fields in your table in case you want to minimize processing on your application side.
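One caveat: since the Range Key is a String, the appearance number should be zero-padded to a fixed width, otherwise "900" would sort after "1000" lexicographically. Here is a minimal sketch with the AWS SDK for Node.js; the table name (InvertedIndex) and the attribute names are placeholders:
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Zero-pad so that string comparison matches numeric order, e.g. "0000000400".
const pad = (n) => String(n).padStart(10, '0');

// One item per (word, url) pair.
function putEntry(word, url, appearances) {
  return docClient.put({
    TableName: 'InvertedIndex',
    Item: {
      word: word,                              // Hash Key
      rankUrl: `${pad(appearances)}#${url}`,   // Range Key: <AppearanceNumber>#<Url>
      url: url,                                // kept as separate fields as well
      appearances: appearances
    }
  }).promise();
}

// All URLs for a word, most frequent first.
function urlsFor(word) {
  return docClient.query({
    TableName: 'InvertedIndex',
    KeyConditionExpression: '#w = :w',
    ExpressionAttributeNames: { '#w': 'word' },
    ExpressionAttributeValues: { ':w': word },
    ScanIndexForward: false                    // descending by the composite Range Key
  }).promise().then((res) => res.Items.map((item) => item.url));
}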

CouchDB view collation sorted by date

I am using a CouchDB database.
I can get all documents by category and paginate the results with a key like ["category","document_id"] and a query like startkey=["category","document_id"]&endkey=["category",{}].
Now I want to sort those results by date to have latest documents first.
I tried a lot of keys such as ["category","date","document_id"]
but nothing works (or I can't get it working).
I would use something like
startkey=["queried_category","queried_date","queried_document_id"]&endkey=["queried_category"]
but ignore the "queried_date" key part (sort but do not take documents where "document_id" > "queried_document_id")
EDIT:
Example :
With a key like :
startkey=["apple","2012-12-27","ZZZ"]&endkey=["apple",{}]&descending=true
I will have (and it is the normal behavior)
"apple","2012-12-27","ABC"
"apple","2012-05-01","EFG"
...
"apple","2012-02-13","ZZZ"
...
But the result set I want should start with
"apple","2012-02-13","ZZZ"
Emit the category and the timestamp as the key (you don't need the document_id):
emit([doc.category, doc.timestamp]);
And then filter on the category:
?startkey=[":category"]&endkey=[":category",{}]
You must understand that this is only a sort, so you need the startkey to be before the first row, and the endkey to be after the last row.
Last but not least, don't forget to use a representation of the timestamp that sorts correctly.
The problem with paginating by timestamp instead of doc ID is that the timestamp is not unique. That's why you will have problems paging with Aurélien's solution.
I would stay with what you tried, but use the timestamp as a number (standard UNIX milliseconds since 1970). You can reverse the order of a single numeric field just by multiplying it by -1:
emit([doc.category, -doc.timestamp, doc._id]);
This way the result, sorted ascending, will be ordered according to your needs:
first dates descending,
then document ids ascending.
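Here is a sketch of what that map function and query could look like, assuming each document carries a category field and a numeric timestamp in milliseconds:
function (doc) {
  if (doc.category && typeof doc.timestamp === "number") {
    // Negating the timestamp makes the default ascending sort return newest first.
    emit([doc.category, -doc.timestamp, doc._id], null);
  }
}
// Query one category, newest first (default ascending order):
//   ?startkey=["apple"]&endkey=["apple",{}]
// For the next page, set startkey to the full key of the last row of the
// previous page and add &skip=1 (the key is unique because it ends with _id).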

Fetching key range with common prefix in Cassandra

I want to fetch all rows having a common prefix using the Hector API. I played with RangeSuperSlicesQuery a bit but didn't find a way to get it working properly. Do key range parameters work with wildcards etc.?
Update: I used ByteOrderedPartitioner instead of RandomPartitioner and it works fine with that. Is this the expected behavior?
Yes, that's the expected behavior. In RandomPartitioner, rows are stored in the order of the MD5 hash of their keys, so to get a meaningful range of keys, you need to use an order preserving partitioner like ByteOrderedPartitioner.
However, there are downsides to using ByteOrderedPartitioner or OrderPreservingPartitioner that you can usually avoid with a slightly different data model and RandomPartitioner.
To elaborate on the above answer, you should consider using column names as your "common prefix" instead of the key. Then you can either use a column slice to get all column names in a certain range, or you could use a secondary index then do an indexed slice for all keys with that column name.
Column slice example:
Key (without prefix)
<prefix1> : <data>
<prefix2> : <data>
...
Secondary index example:
Key (with or without prefix)
"prefix" : <the_prefix> <-- this column is indexed
otherCol1 : <data>
...

How to choose Azure Table PartitionKey and RowKey for a table that already has a unique attribute

My entity is a key-value pair. 90% of the time I'll be retrieving the entity based on the key, but 10% of the time I'll also do a reverse lookup, i.e. search by value and get the key.
The key and value both are guaranteed to be unique and hence their combination is also guaranteed to be unique.
Is it correct to use Key as PartitionKey and Value as RowKey?
I believe this will also ensure that my data is perfectly load balanced between servers, since the PartitionKey is unique.
Are there any problems in the above decision?
Under any circumstances, is it practical to have a hard-coded partition key, i.e. all rows having the same partition key, while keeping the RowKey unique?
Is it doable? Yes, but depending on the size of your data, I'm not so sure it's a good idea. When you query on the PartitionKey, Table Storage can go directly to the exact partition and retrieve all your records. If you query on the RowKey alone, Table Storage has to check whether the row exists in every partition of the table. So if you have 1000 key-value pairs, searching by your key will read a single partition/row; searching by your value alone will read all 1000 partitions!
I faced a similar problem and solved it in two ways:
Have two different tables, one with your key as PartitionKey and the other with your value as PartitionKey (a sketch of this follows below). Storage is cheap, so duplicating the data shouldn't cost much.
(What I finally did) If you're effectively returning single entities based on a unique key, just stick them in blobs (partitioned and pivoted as in point 1), because you don't need to traverse a table, so don't.
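Here is a sketch of the first option with the azure-storage Node SDK. The table names and the sample key/value are placeholders, both tables are assumed to already exist, and the connection string is taken from the environment:
const azure = require('azure-storage');
const entGen = azure.TableUtilities.entityGenerator;
const tableSvc = azure.createTableService(); // uses AZURE_STORAGE_CONNECTION_STRING

// Forward table: look up the value by its key.
const forward = {
  PartitionKey: entGen.String('my-key'),
  RowKey: entGen.String('my-value')
};
// Reverse table: look up the key by its value.
const reverse = {
  PartitionKey: entGen.String('my-value'),
  RowKey: entGen.String('my-key')
};

tableSvc.insertEntity('KeyToValue', forward, (err) => {
  if (err) throw err;
  tableSvc.insertEntity('ValueToKey', reverse, (err2) => {
    if (err2) throw err2;
    // A lookup in either direction is now a single-partition point query:
    tableSvc.retrieveEntity('ValueToKey', 'my-value', 'my-key', (err3, entity) => {
      if (!err3) console.log(entity.RowKey._); // -> 'my-key'
    });
  });
});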
