DynamoDb with sort? - node.js

I'm very new to DynamoDB, so forgive me if my question is a bit stupid.
I have a file that looks like this:
Appel,www.appel.com,www.cnn.com,www.bla.com....
Blabla,www.test.com,www.fox.com,www.bla.com.....
test,www.test.com,www.fox.com,www.bla.com...
www.appel.com,300
www.cnn.com,400
and so on. In short, each line is one of:
1: a word and all the URLs it appears in
2: a URL and its number of appearances
What I need to do is query DynamoDB so that, given a word, the output is the list of URLs sorted by number of appearances.
For example, for this file and the word appel, the output is:
www.cnn.com,www.appel.com,www.bla.com....
I have tried to create two tables, 'Invert-index' and 'rank': the first for the word and its list of URLs, and the second for the URL and its rank. But I can't find a way to make the query without sorting the results myself.
So first: is the DynamoDB structure (the two tables) correct?
Second: is there a way to query the DB and sort the results?

To rely on DynamoDB to sort your data, you have to use a Range Key. To meet your requirements, the number of appearances therefore has to be part of the Range Key.
The Hash Key could then be the word (e.g. Appel or Blabla), and lastly you can store the URLs as a string array in each record.
From the documentation:
Query results are always sorted by the range key. If the data type of
the range key is Number, the results are returned in numeric order;
otherwise, the results are returned in order of ASCII character code
values. By default, the sort order is ascending. To reverse the order
use the ScanIndexForward parameter set to false. Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
You can find more information about the available key types on DynamoDB on the links below:
When to use what primary key type
What is the use of a hash range in a dynamodb table
Q: If I use the number of appearances as the range key, how can I store the string array? Each value in it has a different number, so if each record has a primary key (word), a range key (number) and a value (string array), what is the number in this case?
In that case I would recommend composing the Range Key from two fields (number and URL) using a separator character (e.g. '#'). Your final table structure would be:
Hash Key : <Word>
Range Key : <AppearanceNumber>#<Url>
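Your Range Key would be of the String type, which still sorts your data because <AppearanceNumber> is the prefix. Strings are compared character by character, however, so the number must be zero-padded to a fixed width (e.g. 0000000900); otherwise 1000 would sort before 900.
As an example, querying for the <Word> 'Appel' with ScanIndexForward set to false would return the following results: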
Appel,900#www.appel.com
Appel,800#www.cnn.com
Appel,700#www.bla.com
Notice that you can still have the url and the appearanceNumber as separate fields in your table in case you want to minimize processing on your application side.
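The composite key described above can be sketched in Node.js as follows. This is only an illustration under stated assumptions: the helper names, the table name `InvertedIndex`, the attribute name `Word` and the padding width of 10 digits are all hypothetical, and the parameter object follows the shape accepted by the DocumentClient `query` call without actually sending a request.

```javascript
// Assumed fixed width for the zero-padded appearance count.
const COUNT_WIDTH = 10;

// Build a sort key such as "0000000400#www.cnn.com".
// Zero-padding keeps string order consistent with numeric order.
function buildRangeKey(appearances, url) {
  return String(appearances).padStart(COUNT_WIDTH, '0') + '#' + url;
}

// Split a sort key back into the count and the URL.
function parseRangeKey(rangeKey) {
  const separator = rangeKey.indexOf('#');
  return {
    appearances: Number(rangeKey.slice(0, separator)),
    url: rangeKey.slice(separator + 1),
  };
}

// Query parameters that return one word's URLs, highest count first.
function buildQueryParams(word) {
  return {
    TableName: 'InvertedIndex',                  // assumed table name
    KeyConditionExpression: '#w = :word',
    ExpressionAttributeNames: { '#w': 'Word' },  // assumed hash key attribute
    ExpressionAttributeValues: { ':word': word },
    ScanIndexForward: false,                     // descending sort-key order
  };
}
```

Passing the result of `buildQueryParams('Appel')` to a query call and mapping each returned range key through `parseRangeKey` would yield the URLs already ordered by appearance count.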

Related

solr query to sort result in descending order on basis of price

I am a complete beginner in Solr and I am trying to query my data. I am trying to find documents with name=plant and sort them by price, highest first.
In my schema both name and price are of text type.
For example, let's say the data is:
name:abc, price:25;
name:plant, price:35;
name:plant,price:45; //1000 other data
My Approach
/query?q=(name:"Plant")&stopwords=true
The above gives me the plant results, but I am not sure how to sort the results using the price field.
Any help will be appreciated.
You can use the sort parameter to achieve the sorting.
Your query would look like q=(name:"Plant")&sort=price desc
The sort parameter arranges search results in either ascending (asc)
or descending (desc) order. The parameter can be used with either
numerical or alphabetical content. The directions can be entered in
either all lowercase or all uppercase letters (i.e., both asc or ASC).
Solr can sort query responses according to document scores or the
value of any field with a single value that is either indexed or uses
DocValues (that is, any field whose attributes in the Schema include
multiValued="false" and either docValues="true" or indexed="true" – if
the field does not have DocValues enabled, the indexed terms are used
to build them on the fly at runtime), provided that:
the field is non-tokenized (that is, the field has no analyzer and its
contents have been parsed into tokens, which would make the sorting
inconsistent), or
the field uses an analyzer (such as the KeywordTokenizer) that
produces only a single term.
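As a rough Node.js sketch, the request from the answer above could be assembled like this. The function name is hypothetical, and the `/query` path is taken from the question. Note that a numeric ordering assumes the price field is declared with a numeric type in the schema; with a text type the values would not sort as numbers.

```javascript
// Build a Solr request path that filters by name and sorts by a field.
// `direction` is expected to be "asc" or "desc".
function buildSolrQuery(name, sortField, direction) {
  const params = new URLSearchParams();
  params.set('q', `name:"${name}"`);
  params.set('sort', `${sortField} ${direction}`);
  // URLSearchParams takes care of escaping quotes, colons and spaces.
  return '/query?' + params.toString();
}

const path = buildSolrQuery('Plant', 'price', 'desc');
```

Here `path` holds the escaped form of `q=name:"Plant"&sort=price desc`, ready to be appended to the Solr core's base URL.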

How to get last inserted 10 records in descending order using dynamodb

I am new to Amazon DynamoDB. I want to get the last 10 inserted records in descending order using DynamoDB.
DynamoDB can sort data only by the sort key attribute. The ScanIndexForward option can be used to return the data in ascending or descending order.
Please note that the ordering applies within a specific partition key only. It will not sort all the items in the table and give you the last 10 records.
ScanIndexForward
Specifies the order for index traversal: If true (default), the
traversal is performed in ascending order; if false, the traversal is
performed in descending order.
Sort key definition and example:
A composite partition-sort key is indexed as a partition key element
and a sort key element. This multi-part key maintains a hierarchy
between the first and second element values. For example, a composite
partition-sort key could be a combination of “UserID” (partition) and
“Timestamp” (sort). Holding the partition key element constant, you
can search across the sort key element to retrieve items. This would
allow you to use the Query API to, for example, retrieve all items for
a single UserID across a range of timestamps.
Sounds like you are using the DynamoDB example here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.NodeJs.01.html
The sample data does not have insertion timestamps.
Another catch is that DynamoDB can only sort by the Sort Key; otherwise you need to perform the sorting in code.
So if your Partition Key is the Year and the Sort Key is the Title, you need to:
Introduce an attribute which provides you with a timestamp of creation.
Create the table with an LSI of this attribute, or create a GSI using the new attribute as your Sort Key.
Now you can use query!
The Query API has options to:
Sort by the Sort Key in descending order (using the ScanIndexForward parameter)
Limit the number of items returned (using the Limit parameter)
The answer by Abhaya Chauhan is mostly correct, though there is one inaccuracy. The Limit parameter does not limit the number of items returned; it limits the number of items evaluated, regardless of whether they match a filter expression.
Thus if you set a Limit of 10 and also apply a filter, you might get anywhere between 0 and 10 items. See the docs below for more info:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Limit
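Because Limit caps the items evaluated rather than the items returned, one way to reliably collect the newest N items is to keep paging until enough have arrived. The sketch below assumes a hypothetical `runQuery` function that accepts Query parameters and resolves to an object with `Items` and `LastEvaluatedKey`, as a DynamoDB Query response does; the base parameters (table, index, key condition) are left to the caller.

```javascript
// Collect up to `wanted` items in descending sort-key order,
// following LastEvaluatedKey until enough items are gathered.
async function queryLatest(runQuery, baseParams, wanted) {
  const items = [];
  let startKey;
  do {
    const params = {
      ...baseParams,
      ScanIndexForward: false,        // newest (largest sort key) first
      Limit: wanted - items.length,   // evaluate only as many as still needed
    };
    if (startKey) params.ExclusiveStartKey = startKey;
    const page = await runQuery(params);
    items.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey; // undefined once the partition is exhausted
  } while (startKey && items.length < wanted);
  return items.slice(0, wanted);
}
```

With the AWS SDK, `runQuery` could be a thin wrapper around the client's query call; it is injected here so the paging logic can be exercised without a live table.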

How to retrieve item closest to another item in DynamoDB?

I have a DynamoDB table where the sort key has a numeric value.
I need to retrieve the first item whose value is lower than the one I have.
I have gone through http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html#API_UpdateItem_Examples docs but I can see no way to:
- sort the output
- limit the result to 1 entry
Is there any way to actually achieve what I want with DynamoDB?
EDIT:
According to this: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html
The results are sorted by the sort key, and when it's numeric they are returned in numeric order (ascending by default, descending with ScanIndexForward set to false). Which is great, but I still can't find any way to get only a single result [I don't want to "pay" for a full table scan in some cases].
Are you searching for the next item which has a lower sort key within the same Partition Key?
In that case, you can use Query as you've found, with a key condition on the sort key, descending order, and a Limit of 1. This will not scan the entire table.
Alternatively, if you wish to search across partitions, unfortunately a Table Scan is the only way to do this.
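The single-partition case can be sketched as a set of Query parameters. The table name `Measurements` and the attribute names `DeviceId` and `Reading` are made up for illustration; only the parameter names themselves come from the Query API.

```javascript
// Parameters for "the closest item below `currentValue`" in one partition.
function buildPredecessorQuery(partitionValue, currentValue) {
  return {
    TableName: 'Measurements',                       // assumed table name
    KeyConditionExpression: '#pk = :pk AND #sk < :current',
    ExpressionAttributeNames: {
      '#pk': 'DeviceId',                             // assumed partition key
      '#sk': 'Reading',                              // assumed numeric sort key
    },
    ExpressionAttributeValues: {
      ':pk': partitionValue,
      ':current': currentValue,
    },
    ScanIndexForward: false,  // walk downward from just below currentValue
    Limit: 1,                 // stop after the first item
  };
}
```

Since the condition is on the key itself and no filter expression is involved, a Limit of 1 here means at most one item is read and returned.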

Fetching key range with common prefix in Cassandra

I want to fetch all rows having a common prefix using the Hector API. I played with RangeSuperSlicesQuery a bit but didn't find a way to get it working properly. Do key range parameters work with wildcards etc.?
Update: I used ByteOrderedPartitioner instead of RandomPartitioner and it works fine with that. Is this the expected behavior?
Yes, that's the expected behavior. In RandomPartitioner, rows are stored in the order of the MD5 hash of their keys, so to get a meaningful range of keys, you need to use an order preserving partitioner like ByteOrderedPartitioner.
However, there are downsides to using ByteOrderedPartitioner or OrderPreservingPartitioner that you can usually avoid with a slightly different data model and RandomPartitioner.
To elaborate on the above answer, you should consider using column names as your "common prefix" instead of the key. Then you can either use a column slice to get all column names in a certain range, or you could use a secondary index then do an indexed slice for all keys with that column name.
Column slice example:
Key (without prefix)
<prefix1> : <data>
<prefix2> : <data>
...
Secondary index example:
Key (with or without prefix)
"prefix" : <the_prefix> <-- this column is indexed
otherCol1 : <data>
...

CouchDB key always matches

I'm looking to query my CouchDB in such a way that some of the fields in a document can be wildcards that match any requested key.
Example:
function(doc) {
  emit(doc.some_field, doc);
}
?key=100 would match both the documents whose some_field is 100 and those whose some_field is a wildcard such as *.
Is this possible? Is there a hack to do that?
As per the CouchDB documentation you can do:
?startkey="key"&endkey="key\ufff0"
to match key*.
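A small Node.js sketch of building that range follows. The function name is hypothetical; it relies on the fact that CouchDB expects view keys as JSON values, so the strings are JSON-encoded before being URL-escaped.

```javascript
// Build the query string for "all view keys starting with `prefix`".
function buildPrefixRange(prefix) {
  const params = new URLSearchParams();
  // View keys are JSON values, so string keys need their quotes.
  params.set('startkey', JSON.stringify(prefix));
  // "\ufff0" sorts after ordinary characters, closing the prefix range.
  params.set('endkey', JSON.stringify(prefix + '\ufff0'));
  return params.toString();
}

const range = buildPrefixRange('key');
```

The resulting string can be appended after `?` to a view URL. Note that this matches a prefix on the query side only; it does not make a stored key act as a wildcard.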
From Couchdb wiki:
CouchDB actually stores the
[key,docid] pair as the key in the
btree. This means that:
you always know which document the key and value came from (it's exposed as the 'id' field in the view result)
view rows with equal keys sort by increasing docid.
So I don't think wildcard fields used as part of a key are possible, because view rows are sorted. Suppose they were possible. Then whenever you queried a key range from a view, rows with a wildcard would have to be returned for any range, which means they would have to be everywhere. But that's impossible in a sorted index: a row with a wildcard sits at one position, between a row with a smaller key and a row with a greater key.
