If you keep pressing the tab key on a list, you end up with a lot of levels.
Can we limit it to one level more than the last one used?
Or at least to a maximum number of levels (4, for instance)?
In my current application I have set up a search for my (web-)shop items. Until now I have always worked with dynamic mappings, and I have now hit the default index.mapping.total_fields.limit of 1000.
What I want to do is reduce the total number of fields in the mapping, simply by putting a new mapping in which I set dynamic to false on most of the unnecessary properties.
Somehow, when doing this, not only can't I reduce the number of total fields, I also get the 1000-total-fields-limit error while putting the new mapping to the items index. Is there a way to refresh the mapping on an existing index without having to recreate the index with the correct mappings?
Thanks in advance
No, the mapping is only applied when an index is created. If you want to change the mapping of any field, you will need to create a new index, or reindex your current index into a newly created index with the correct mapping.
Alternatively, you can increase the limit on total mapping fields, but having too many fields can impact performance.
PUT your_index/_settings
{
"index.mapping.total_fields.limit": 5000
}
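If you go the reindex route instead, a minimal sketch could look like the following; the new index name items_v2 and the example fields are assumptions, and the exact mapping syntax depends on your Elasticsearch version:
PUT items_v2
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "name":  { "type": "text" },
      "price": { "type": "double" }
    }
  }
}

POST _reindex
{
  "source": { "index": "items" },
  "dest": { "index": "items_v2" }
}
Once the reindex finishes, you can point your application (or an index alias) at items_v2.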
How do the following two Cassandra limitations interplay with one another?
Cells in a partition: ~2 billion (2^31); single column value size: 2 GB (1 MB is recommended) [1]
Collection values may not be larger than 64KB. [2]
Are collections laid out inside a single column, and hence ought one to limit the size of the entire collection to 1 MB?
[1] https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refLimits.html
[2] https://wiki.apache.org/cassandra/CassandraLimitations
A collection is a single column value with:
a single value inside limited to 64 KB (max value of an unsigned short)
the number of items in the collection limited to 64K (max value of an unsigned short)
The 1 MB is a recommendation, not a hard limit; you can go higher if you need to, but as always, test before production. But since you can have 2^16 items at up to 2^16 bytes each, a maximally filled collection would break the 2 GB limit per cell.
But collections should be kept small for performance reasons anyway, as they are always read in their entirety. Updates to collections are not very fast either.
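For illustration, here is a minimal CQL sketch; the table and column names are made up. The second table shows the usual alternative when the data can grow beyond what a collection should hold:
-- Collection column: read and written as a unit, so keep it small
-- (each element at most 64 KB, at most ~64K elements).
CREATE TABLE articles (
    title text PRIMARY KEY,
    tags set<text>
);

-- Clustering-row alternative: each tag is its own row, so the
-- collection limits do not apply and entries can be read selectively.
CREATE TABLE article_tags (
    title text,
    tag text,
    PRIMARY KEY (title, tag)
);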
I need to use Cassandra to store an inverted index, in which words and their frequencies in articles are stored as follows:
word, article_title, frequency
The number of unique words is about 40M, and the number of Cassandra nodes is 2.
Which is better to use as the partition key: the first character of the word, or the word itself?
And what about the primary key?
TL;DR: With regard to your query, I would definitely say to use the word as the partition key.
If you used only the first character, you would have only 26 partitions. You do not want that; if nothing else, you will get hot-spotting. Some rows will be pretty short, since not many words start with a given letter, and others will be very, very long, maybe even beyond the point at which they are performant to use. Yes, Cassandra has a two-billion-columns-per-row limit, but the recommendation is to keep the size of a row in the millions. You also do not want to access all the words starting with 'A' if you want only 'AIRPORT'.
You need a high-cardinality, as-random-as-possible partition key so that the rows are easily dispersed throughout the cluster. On the other hand, it has to reflect your access patterns. In your case, you want to see stats for a word or a set of words, and accessing by partition/primary key is basically as fast as it gets with Cassandra.
As for the clustering key, it is more or less obvious: you could use the article title, or, what I would actually do, use an article identifier (a UUID or such) as the clustering key. Article titles might change (a typo fix?), and you certainly do not want to iterate through all your rows changing the title.
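As a minimal CQL sketch of this (table and column names are made up):
-- word is the partition key, the article identifier is the clustering key
CREATE TABLE inverted_index (
    word text,
    article_id uuid,
    frequency int,
    PRIMARY KEY (word, article_id)
);

-- Fetch the stats for a single word:
SELECT article_id, frequency
FROM inverted_index
WHERE word = 'airport';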
Do we lose performance or gain significant disk usage if I define lots of columns but each row uses only a few of them, in different combinations?
You will not lose performance. In fact, it's actually quite efficient (depending on your use case), since empty columns don't take up any space. Whether or not you'll "gain" significant disk size is subjective; it's more about the disk space you'll be saving compared to the alternative method you would use to avoid empty columns.
"having columns without values is virtually free in Cassandra"
http://www.datastax.com/dev/blog/cql3_collections
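As a small illustration (a hypothetical table, assuming CQL), each row only stores the columns it actually sets:
-- A wide table where rows populate different subsets of columns
CREATE TABLE user_profiles (
    user_id uuid PRIMARY KEY,
    email text,
    phone text,
    twitter text,
    website text
);

-- Columns left unset take up no space on disk for that row:
INSERT INTO user_profiles (user_id, email)
VALUES (uuid(), 'a@example.com');

INSERT INTO user_profiles (user_id, phone, website)
VALUES (uuid(), '555-0100', 'https://example.com');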
I am trying to define a Cassandra schema for the following use case: each unique set of users defines a group. The query pattern requires a quick way to find whether a group exists based on a set of users as input.
Since there is very little information given, I will make some best-case assumptions here. I am assuming there is a unique way of identifying a user, using a fixed-length n-bit hash (let's call it uid). I am also assuming that the maximum number of users (MAX) in a group is such that MAX < 64*1024*8 / n, because Cassandra has a 64 KB limit on key length. In real terms this means that if you have up to 32K users, you could form any group up to the maximum number of users.
Given the above, I would say that a sorted concatenation of the uids would be an easy way to identify the group, and the group can be keyed as such.
In that case, a single lookup by the sorted, concatenated key formed from the query's set of users would give you the answer if you get a hit.
Let's say
key of G1 = u04,u08,u10,u12;
key of G2 = u01,u11,u12;
...
key of GN = u09,uxx,uyy;
If searching whether a group containing users u04, u08, and u03 exists, simply create the key "u03,u04,u08" and try to find a hit in the "Groups" column family, as in the sketch below.
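A minimal CQL sketch of that lookup, with hypothetical table and column names:
-- One row per group, keyed by the sorted, comma-separated uids
CREATE TABLE groups (
    member_key text PRIMARY KEY,
    group_id uuid
);

-- Existence check for the user set {u03, u04, u08}:
-- if this returns a row, the group exists.
SELECT group_id FROM groups WHERE member_key = 'u03,u04,u08';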
If you are working with a larger user set, with more users per group, then a different approach may be needed.
EDIT: Can you give a sense of the maximum number of users that may form a group? I assume your client would have to pass a list of all those users as part of the query.