I have a requirement to do range searches on the _key column. However, one of the ArangoDB blog posts says that the _key column cannot be used for range queries or sort operations. What can we do in this case? Can we add a skiplist index on the _key column?
You cannot use the _key attribute for range searches in the current version of ArangoDB (3.4.x). The primary index is not considered sorted, even though with RocksDB the index actually is sorted. This will change in v3.5.0 (it is already implemented in the devel branch).
Adding a skiplist index to the collection over the _key attribute will have no effect.
The only way to get indexed range lookups in your collections is to keep a separate field with a suitable index on it (for example a skiplist index), which then allows range searches.
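To illustrate why a separate, properly indexed field helps, here is an in-memory sketch that does not touch ArangoDB at all. It assumes your keys are numeric strings and copies each _key into a hypothetical numeric field `keyNum`; a sorted array stands in for the skiplist index, and a range lookup becomes two binary searches instead of a full scan. (Comparing the raw strings would put "10" before "2", which is another reason to store a typed copy.)

```javascript
// In-memory illustration only: a sorted array stands in for a skiplist
// index on a hypothetical numeric field `keyNum`. No ArangoDB API is used.

// Copy the string _key into a numeric field that can be range-searched.
function withRangeField(docs) {
  return docs.map((d) => ({ ...d, keyNum: Number(d._key) }));
}

// Build the "index": documents sorted by keyNum.
function buildSortedIndex(docs) {
  return [...docs].sort((a, b) => a.keyNum - b.keyNum);
}

// Binary search: first position where the monotone predicate becomes true.
function firstWhere(index, pred) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (pred(index[mid])) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Range query min <= keyNum <= max without scanning every document.
function rangeQuery(index, min, max) {
  const start = firstWhere(index, (d) => d.keyNum >= min);
  const end = firstWhere(index, (d) => d.keyNum > max);
  return index.slice(start, end);
}

const docs = withRangeField([
  { _key: "10", name: "a" },
  { _key: "2", name: "b" },
  { _key: "33", name: "c" },
  { _key: "7", name: "d" },
]);
const index = buildSortedIndex(docs);
console.log(rangeQuery(index, 5, 20).map((d) => d._key)); // [ '7', '10' ]
```

In a real collection you would keep `keyNum` in sync on every insert and let the database index maintain the ordering.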
The MongoDB documentation lists six index types:
Single Field Index
Compound Index
Multikey index
Geospatial index
Text index
Hashed index
The documentation also lists four index properties:
Unique Indexes
Partial indexes
Sparse Indexes
TTL Indexes
My questions are:
Can any index type have any index property?
Can an index type have more than one index property?
According to the docs, MongoDB creates a unique index on the _id field during the creation of a collection. Does this mean that when I search by _id, MongoDB does not do a collection scan but instead uses the _id index to execute the query efficiently? Or is the default _id index just for uniqueness? Does a unique index property always support faster queries?
I am using MongoDB via mongoose. When defining a schema in node.js, does setting unique: true on a field imply that the field is indexed, so that searches on it are efficient rather than a collection scan?
Can materialized views be indexed in MongoDB? If so how?
The MongoDB documentation states that MongoDB provides a number of different index types to support specific types of data and queries. But there is no explanation of what index properties are. How would you define index properties?
Can any index type have any index property?
Can an index type have more than one index property?
You can test yourself and find out.
Does this mean when I search by Id MongoDB does not do a collection scan but instead uses the id index to execute the query efficiently?
Yes.
Does a unique index property always support faster queries?
Uniqueness refers to a restriction on data which can be placed in the field which is indexed. Both unique and non-unique indexes allow fast retrieval of data queried by indexed fields.
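A conceptual model of that distinction (not MongoDB internals, which use B-tree indexes; a `Map` stands in here): the index maps a field value to the documents holding it, a unique index only adds a check when writing, and the lookup path is the same either way. The class name `FieldIndex` and the sample fields are hypothetical.

```javascript
// Conceptual model only: value -> documents. A unique index adds a
// write-time check; reads are identical for unique and non-unique.
class FieldIndex {
  constructor(field, { unique = false } = {}) {
    this.field = field;
    this.unique = unique;
    this.entries = new Map(); // field value -> array of docs
  }

  insert(doc) {
    const value = doc[this.field];
    const bucket = this.entries.get(value) ?? [];
    // Uniqueness is a constraint enforced when data is written.
    if (this.unique && bucket.length > 0) {
      throw new Error(`duplicate key: ${this.field}=${value}`);
    }
    bucket.push(doc);
    this.entries.set(value, bucket);
  }

  // Same lookup regardless of the unique flag.
  find(value) {
    return this.entries.get(value) ?? [];
  }
}

const byEmail = new FieldIndex("email", { unique: true });
const byCity = new FieldIndex("city");
const people = [
  { email: "a@example.com", city: "Oslo" },
  { email: "b@example.com", city: "Oslo" },
];
for (const p of people) {
  byEmail.insert(p);
  byCity.insert(p);
}
console.log(byEmail.find("b@example.com").length); // 1
console.log(byCity.find("Oslo").length); // 2
```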
Can materialized views be indexed in MongoDB?
If you are talking about https://docs.mongodb.com/manual/core/materialized-views/, "materialized views" in MongoDB are orthogonal to indexes. You can add indexes on what this page refers to as "output collection" (the argument to $merge) if you wish to query the "materialized view" efficiently.
MongoDB provides a number of different index types to support specific types of data and queries.
Geospatial index supports geo queries. Text index supports text search. Other indexes are general-purpose.
I have a large number of records indexed on some startDateTime field, and want to select aggregates (SUM and COUNT) on all records grouped by WEEKOFYEAR(startDateTime) (i.e., EXTRACT(WEEK FROM startDateTime)). Can I put a secondary index on EXTRACT(WEEK FROM startDateTime)? Or, even better, will the query use an index on startDateTime appropriately to optimize a request grouped by WEEK?
See this similar question about MySQL indices. How would this be handled in the Cloud Spanner world?
Secondary indexes on generated columns (i.e., EXTRACT(WEEK FROM startDateTime)) are not supported yet. If you have a covering index that includes all the columns required for the query (i.e., startDateTime and the other columns required for grouping and aggregation), the planner will use that covering index instead of the base table, but the aggregation is likely to be a hash aggregation. Unless you aggregate over a very long period of time, it should not be a big problem (though I admit it is not ideal).
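Roughly what a hash aggregation over the covering index amounts to, as an in-memory sketch: one pass over the rows, each row bucketed by its week in a hash table that accumulates SUM and COUNT. The `amount` field is hypothetical, and the week key here is the ISO 8601 week of the UTC timestamp, which may not match what EXTRACT(WEEK ...) returns.

```javascript
// ISO 8601 week key ("YYYY-Www") of a Date, computed in UTC.
function isoWeekKey(date) {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const day = d.getUTCDay() || 7; // Monday=1 ... Sunday=7
  // Move to the Thursday of this week; its year is the ISO week-year.
  d.setUTCDate(d.getUTCDate() + 4 - day);
  const isoYear = d.getUTCFullYear();
  const week = Math.ceil(((d - Date.UTC(isoYear, 0, 1)) / 86400000 + 1) / 7);
  return `${isoYear}-W${String(week).padStart(2, "0")}`;
}

// Hash aggregation: week key -> { sum, count }, in a single pass.
function aggregateByWeek(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = isoWeekKey(row.startDateTime);
    const g = groups.get(key) ?? { sum: 0, count: 0 };
    g.sum += row.amount;
    g.count += 1;
    groups.set(key, g);
  }
  return groups;
}

const weekRows = [
  { startDateTime: new Date("2024-01-01T09:00:00Z"), amount: 5 },
  { startDateTime: new Date("2024-01-07T23:00:00Z"), amount: 3 },
  { startDateTime: new Date("2024-01-08T00:00:00Z"), amount: 4 },
];
// 2024-W01: sum 8, count 2; 2024-W02: sum 4, count 1
console.log(aggregateByWeek(weekRows));
```

The hash table holds one entry per distinct week, so its size grows with the length of the aggregated period, which is why a short period is less of a concern.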
If you want to restrict the aggregated time range, you need to spell it out in terms of startDateTime (i.e., you need to convert the min/max datetime to the same type as startDateTime).
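As a sketch of spelling a week range out in terms of startDateTime: the hypothetical helper below turns an ISO 8601 week into a half-open UTC interval. It assumes startDateTime is a UTC timestamp and that your weeks are ISO weeks (Monday start); the resulting values would be bound to a predicate such as `startDateTime >= @start AND startDateTime < @end` (parameter names are also assumptions).

```javascript
const DAY_MS = 86400000;

// Half-open UTC interval [start, end) covering ISO 8601 week `week`
// of ISO week-year `isoYear`.
function isoWeekBoundsUtc(isoYear, week) {
  // Week 1 is the week that contains January 4th.
  const jan4Day = new Date(Date.UTC(isoYear, 0, 4)).getUTCDay() || 7; // Monday=1 ... Sunday=7
  // Date.UTC normalizes a day <= 0 into late December of the previous year.
  const week1Monday = Date.UTC(isoYear, 0, 4 - (jan4Day - 1));
  const start = new Date(week1Monday + (week - 1) * 7 * DAY_MS);
  const end = new Date(start.getTime() + 7 * DAY_MS);
  return { start, end };
}

// Weeks 1..4 of 2024: lower bound from the first week, upper from the last.
const firstWeek = isoWeekBoundsUtc(2024, 1);
const lastWeek = isoWeekBoundsUtc(2024, 4);
console.log(firstWeek.start.toISOString(), lastWeek.end.toISOString());
// 2024-01-01T00:00:00.000Z 2024-01-29T00:00:00.000Z
```

Because the bounds are plain timestamps, a predicate on startDateTime itself can use the index on that column, which a predicate on the extracted week cannot.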
Hope this helps.
I am reading "A deep look at the CQL WHERE clause". Several statements confuse me, so I have posted five questions (Q1 - Q5). Any comments are welcome.
Thanks
Q1: What does "secondary index queries" mean? What does "the query is using a secondary index" mean?
I think secondary index queries == queries that use a secondary index.
But does "secondary index queries" mean queries on a table where a secondary index exists, queries where all queried columns are indexed, or queries where at least one of the queried columns is indexed?
Single column slice restrictions are allowed only on the last clustering column being restricted.
Q2: Single column slice restrictions mean >, >=, <=, <?
Direct queries on secondary indices support only =, CONTAINS or CONTAINS KEY restrictions.
Q3: The indexed columns can be restricted only by =, CONTAINS, and CONTAINS KEY?
CONTAINS and CONTAINS KEY restrictions can only be used on collections when the query is using a secondary index.
Q4: Can CONTAINS be used on any non-indexed clustering column? But when a column has a secondary index, can CONTAINS only be used on that column if it is a collection type?
Regular columns can be restricted by =, >, >=, <= and <, CONTAINS or CONTAINS KEY restrictions if the query is a secondary index query.
IN restrictions are not supported.
Q5: What does "regular columns" mean? The article also says "Single column slice restrictions are allowed only on the last clustering column being restricted." If a column has a secondary index, can it be restricted by =, >, >=, <= and <, CONTAINS or CONTAINS KEY (but not IN) even if it is not the last clustering column?
Q1: A secondary index is what all the examples create with 'CREATE INDEX'; a secondary index query is a query that uses such an index.
Q2: Yes, all of those inequality operators produce a slice of the query.
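A conceptual sketch of why a slice is only allowed on the last clustering column being restricted: rows in a partition are kept sorted by their clustering columns, so equality on earlier columns plus a slice on the last restricted one selects one contiguous run, while a slice on an earlier column followed by a restriction on a later one generally does not. The columns `c1` and `c2` are hypothetical, and this models ordering only, not Cassandra's storage engine.

```javascript
// One partition's rows, kept sorted by clustering columns (c1, c2).
function sortByClustering(rows) {
  return [...rows].sort((a, b) => a.c1 - b.c1 || a.c2 - b.c2);
}

// Positions, in clustering order, of the rows matching a predicate.
function matchingPositions(sorted, pred) {
  const positions = [];
  sorted.forEach((row, i) => {
    if (pred(row)) positions.push(i);
  });
  return positions;
}

// A "slice" occupies one contiguous run, readable with a start and end bound.
function isContiguous(positions) {
  return positions.every((p, i) => i === 0 || p === positions[i - 1] + 1);
}

const partition = sortByClustering([
  { c1: 1, c2: 3 }, { c1: 1, c2: 7 }, { c1: 1, c2: 9 },
  { c1: 2, c2: 5 }, { c1: 2, c2: 8 }, { c1: 3, c2: 5 },
]);

// Allowed shape: c1 = 1 AND c2 > 5 (slice on the last restricted column).
const ok = matchingPositions(partition, (r) => r.c1 === 1 && r.c2 > 5);
// Rejected shape: c1 > 1 AND c2 = 5 (slice on an earlier column).
const bad = matchingPositions(partition, (r) => r.c1 > 1 && r.c2 === 5);

console.log(isContiguous(ok)); // true  (positions 1, 2)
console.log(isContiguous(bad)); // false (positions 3, 5)
```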
Q3: Yes, essentially CONTAINS will look inside a collection and look for that particular item. It only works if the column has a secondary index otherwise Cassandra would have to scan every collection to check.
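A conceptual sketch of that difference for a query like `WHERE tags CONTAINS 'blue'` on a hypothetical set column `tags`: without an index every collection has to be inspected, while an index that maps each element to the row keys containing it answers with one lookup. This is a model of the idea, not Cassandra's actual secondary index implementation.

```javascript
// Hypothetical table: row key -> set<text> column `tags`.
const tagRows = new Map([
  ["k1", new Set(["red", "blue"])],
  ["k2", new Set(["green"])],
  ["k3", new Set(["blue", "green"])],
]);

// Index: collection element -> set of row keys containing it.
function buildContainsIndex(rows) {
  const idx = new Map();
  for (const [key, tags] of rows) {
    for (const tag of tags) {
      if (!idx.has(tag)) idx.set(tag, new Set());
      idx.get(tag).add(key);
    }
  }
  return idx;
}

// Without an index: look inside every row's collection.
function containsByScan(rows, value) {
  const keys = [];
  for (const [key, tags] of rows) if (tags.has(value)) keys.push(key);
  return keys;
}

// With the index: a single lookup.
function containsByIndex(idx, value) {
  return [...(idx.get(value) ?? [])];
}

const tagIndex = buildContainsIndex(tagRows);
console.log(containsByScan(tagRows, "blue")); // [ 'k1', 'k3' ]
console.log(containsByIndex(tagIndex, "blue")); // [ 'k1', 'k3' ]
```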
Q4/5: Questions are a bit confusing. Regular columns in this context will be non-partition key columns.
In mongodb, I want to insert the data in sorted order based on some field.
The way I am doing it now: before each insertion, I compare the new data with the data already in the collection and then insert it at the matching position. Is inserting at a particular position possible in MongoDB using node.js?
You can't insert a doc at a specific spot in the collection. Even if you could, it wouldn't matter because you can't rely on the natural order of MongoDB documents staying consistent as they can move over time as the docs in a collection are updated.
Instead, create an index on the field(s) you need your docs sorted on and then include a sort clause in your queries to efficiently retrieve the docs in that order.
Example in the shell:
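```javascript
// Create the index (do this once); createIndex replaces the deprecated ensureIndex
db.test.createIndex({someField: 1})
// Sorted query: the index lets MongoDB return documents in someField order
db.test.find().sort({someField: 1})
```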
Is there any way to select TTL value for an element in a map in Cassandra with CQL3?
I've tried this, but it doesn't work:
```
SELECT TTL (mapname['element']) FROM columnfamily
```
Sadly, I'm pretty sure the answer is that it is not possible as of Cassandra 1.2 and CQL3. You can't query individual elements of a collection. As this blog entry says, "You can only retrieve a collection in its entirety". I'd really love to have the capability to query for collection elements, too, though.
You can still set the TTL for individual elements in a collection. I suppose if you wanted to be assured that a TTL is some value for your collection elements, you could read the entire collection and then update the collection (the entire thing or just a chosen few elements) with your desired TTL. Or, if you absolutely needed to know the TTL for individual data, you might just need to change your schema from collections back to good old dynamic columns, for which the TTL query definitely works.
Or, a third possibility could be that you add another column to your schema that holds the TTL of your collection. For example:
```
CREATE TABLE test (
    key text PRIMARY KEY,
    data map<text, text>,
    data_ttl text
) WITH ...
```
You could then keep track of the TTL of the entire map column 'data' by always updating column 'data_ttl' whenever you update 'data'. Then, you can query 'data_ttl' just like any other column:
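One way to make sure 'data_ttl' never drifts from 'data' is to write both in the same UPDATE with the same TTL. The helper below is a hypothetical sketch that only builds the CQL text and bind parameters (executing it needs a Cassandra driver, not shown); storing the TTL as a string in 'data_ttl' is an arbitrary choice, since only that cell's TTL is ever queried.

```javascript
// Hypothetical helper: one UPDATE that writes the whole map `data` and
// the companion column `data_ttl` with the same TTL, so ttl(data_ttl)
// reflects the TTL of the map. Only builds the statement; does not run it.
function buildMapUpdateWithTtl(key, data, ttlSeconds) {
  if (!Number.isInteger(ttlSeconds) || ttlSeconds <= 0) {
    throw new RangeError("ttlSeconds must be a positive integer");
  }
  // SET data = ? replaces the entire map, so every element gets this TTL.
  const query =
    `UPDATE test USING TTL ${ttlSeconds} ` +
    "SET data = ?, data_ttl = ? WHERE key = ?";
  // Bind values in the order of the ? markers.
  const params = [data, String(ttlSeconds), key];
  return { query, params };
}

const built = buildMapUpdateWithTtl("row1", { a: "x" }, 3600);
console.log(built.query);
// UPDATE test USING TTL 3600 SET data = ?, data_ttl = ? WHERE key = ?
```

If you instead update single elements (data['a'] = ...), those elements get their own TTLs and 'data_ttl' only tracks the most recent write.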
```
SELECT ttl(data_ttl) FROM test;
```
I realize none of these solutions are perfect... I'm still trying to figure out what will work best for me, too.