Azure Table Storage Rowkey Query not returning correct entities - azure

I have an Azure table storage with a lot of entities. When I query for entities with RowKey (which is of the data type "Double") less than 8888 by using the query "RowKey le '8888'", I also get those entities with RowKey greater than 8888.

Even if you are storing a Double data type in RowKey, it gets stored as a String (both PartitionKey and RowKey are string data types). Thus the behavior you are seeing is correct, because in string comparison "21086" is smaller than "8888".
What you need to do is make these strings of equal length by pre-padding them with 0 (so your RowKey values will be 000021086 and 000008888, for example). When you then query with the padded value ("RowKey le '000008888'"), entities with larger RowKey values will no longer be returned.
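The lexicographic comparison and the zero-padding fix can be illustrated with a minimal Python sketch (the padding width of 9 digits is an arbitrary choice for illustration):

```python
# Azure Table keys are strings, so "le"/"ge" filters compare lexicographically.
# "21086" sorts before "8888" because '2' < '8' at the first character.
assert "21086" < "8888"

# Zero-padding both keys to a fixed width restores numeric ordering.
def pad(n, width=9):
    return str(n).zfill(width)

assert pad(21086) == "000021086"
assert pad(8888) == "000008888"
assert not (pad(21086) <= pad(8888))  # 21086 no longer compares "less than" 8888

# The query filter must use the same padded form:
filter_str = f"RowKey le '{pad(8888)}'"
```

Note the padding width must be chosen up front to cover the largest value you will ever store, and every write must use it consistently.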

Related

Need to build a complex query for Azure Table to count number of rows

I am trying to run a complex query against an Azure Table where I want to count the number of rows for a given deviceID for a specific dateiot and timeiot. Is this possible?
Your query would be something like:
PartitionKey eq 'aaaa' and dateiot eq 'bbbb' and deviceID eq 'cccc' and timeiot eq 'dddd'
Two things though:
This query will do a complete partition scan, and a single request may return incomplete data. Your code should be able to handle continuation tokens to get all data matching the query.
Table queries do not support count functionality, so you will get the entities back; you will need to add up the entities yourself to get the total count matching the query.
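The continuation-token loop described above can be sketched in Python; `query_page` here is a hypothetical stand-in for whatever paged query call your table SDK exposes, since the real call depends on your client library:

```python
def count_entities(query_page, filter_str):
    """Count entities matching filter_str by following continuation tokens.

    query_page(filter_str, token) is assumed to return (entities, next_token),
    with next_token=None once the last page has been served.
    """
    count, token = 0, None
    while True:
        entities, token = query_page(filter_str, token)
        count += len(entities)  # no server-side count(): add up page sizes
        if token is None:
            return count

# A fake two-page backend, just to show the shape of the loop:
def fake_query_page(filter_str, token):
    pages = {None: (["e1", "e2", "e3"], "page2"), "page2": (["e4"], None)}
    return pages[token]

total = count_entities(fake_query_page, "PartitionKey eq 'aaaa'")
```

The important point is that the loop must keep requesting pages until the service stops returning a continuation token; stopping after the first page silently undercounts.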

spark thrift server issue: Length field is empty for varchar fields

I am trying to read data from Spark Thrift Server using SAS. In the table definition through DBeaver, I see that the Length field is empty only for fields with the VARCHAR data type. I can see the length in the Data Type field as varchar(32), but that doesn't suffice for my purpose, as the SAS application taps into the Length field. Since this field is not populated, SAS defaults to the max size and as a result it becomes extremely slow. The Length field is populated in Hive.

Cassandra how to filter hex values in blob field

Consider the following table:
CREATE TABLE associations (
    someHash blob,
    someValue int,
    someOtherField text,
    PRIMARY KEY (someHash, someValue)
) WITH CLUSTERING ORDER BY (someValue ASC);
The inserts to this table have someHash as a hex value, like 0xA0000000000000000000000000000001, 0xA0000000000000000000000000000002, etc.
If a query needs to find all rows with 0xA0000000000, what's the recommended Cassandra way to do it?
The main problem with your query is that it does not take into account limitations of Cassandra, namely:
someHash is a partition key column
The partition key columns [in WHERE clause] support only two operators: = and IN (i.e. exact match)
In other words, your schema is designed in such a way, that effectively query should say: "let's retrieve all possible keys [from all nodes], let's filter them (type not important here) and then retrieve values for keys that match predicate". This is a full-scan of some sort and is not what Cassandra is best at. You can try using UDFs to do some data transformation (trimming someHash), but I would expect it to work well only with trivial amounts of data.
Golden rule of Cassandra is "query first": if you have such a use-case, schema should be designed accordingly - sub-key you want to query by should be actual partition key (full someHash value can be part of clustering key).
BTW, same limitation applies to most maps in programming: you can't do lookup by part of key (because of hashing).
Following your 0xA0000000000 example directly:
You could split up someHash into 48 bits (6 bytes) and 80 bits (10 bytes) parts.
PRIMARY KEY ((someHash_head, someHash_tail), someValue)
The IN will then have 16 values, from 0xA00000000000 to 0xA0000000000F.

How is the RowKey formed in the WADMetrics*** table?

I'm looking to understand the data returned by WADMetricsPT1HP10DV2S20160415 table inside Azure's storage account.
It has the PartitionKey, RowKey and EntityProperties. PartitionKey, I understand. It translates to the resource Id of my resource inside the storage account. However, I partially understand the RowKey. An example RowKey is:
:005CNetworkInterface:005CPacketsReceived__2519410355999999999
I understand the first part, which is a metric name. But what I don't understand is the number/digits that follow. I am assuming it to be a timestamp, but can't say for sure.
I was attempting to use the RowKey filter, but due to this added wrinkle it's almost impossible to generate the RowKey and use it as a filter. Does anyone know how to generate the numbers/digits in order to create a RowKey filter?
If you're curious to know about what 2519410355999999999 number is, it is essentially reverse ticks derived by subtracting the event date/time ticks from the max date/time ticks value. To derive date/time value from this number, here's what you can do:
long reverseTicks = 2519410355999999999;
long maxTicks = DateTime.MaxValue.Ticks;
var dateTime = new DateTime(maxTicks - reverseTicks, DateTimeKind.Utc);
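To go the other way and build a RowKey filter from a date/time, you can compute the reverse ticks yourself. A Python sketch, assuming .NET's convention that a tick is a 100-nanosecond interval since 0001-01-01 and that `DateTime.MaxValue.Ticks` is 3155378975999999999:

```python
from datetime import datetime

MAX_TICKS = 3155378975999999999  # DateTime.MaxValue.Ticks in .NET

def to_ticks(dt):
    """100-nanosecond intervals since 0001-01-01T00:00:00 (UTC assumed)."""
    delta = dt - datetime(1, 1, 1)
    return (delta.days * 86400 + delta.seconds) * 10**7 + delta.microseconds * 10

def reverse_ticks(dt):
    return MAX_TICKS - to_ticks(dt)

# The digits in the example RowKey decode back to 2016-04-21 19:00:00 UTC:
assert reverse_ticks(datetime(2016, 4, 21, 19, 0, 0)) == 2519410355999999999

# Because reverse ticks sort newest-first, a time-range filter on RowKey
# uses the reverse ticks of the range's end as the lower bound and vice versa.
```

The reverse-tick scheme exists precisely so that the most recent metric rows sort first within a partition.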

How to choose Azure Table PartitionKey and RowKey for a table that already has a unique attribute

My entity is a key value pair. 90% of the time i'll be retrieving the entity based on key but 10% of the time I'll also do a reverse lookup i.e. I'll search by value and get the key.
The key and value both are guaranteed to be unique and hence their combination is also guaranteed to be unique.
Is it correct to use Key as PartitionKey and Value as RowKey?
I believe this will also ensure that my data is perfectly load balanced between servers since the PartitionKey is unique.
Are there any problems in the above decision?
Under any circumstance, is it practical to have a hard-coded partition key, i.e. all rows have the same partition key and the RowKey is kept unique?
Is it doable? Yes, but depending on the size of your data, I'm not so sure it's a good idea. When you query on PartitionKey, Table Storage can go directly to the exact partition and retrieve all your records. If you query on RowKey alone, Table Storage has to check whether the row exists in every partition of the table. So if you have 1000 key-value pairs, searching by your key will read a single partition/row; if you search via your value alone, it will read all 1000 partitions!
I faced a similar problem and solved it in two ways:
Have two different tables: one with your key as the PartitionKey, the other with your value as the PartitionKey. Storage is cheap, so duplicating data shouldn't cost much.
(What I finally did) If you're effectively returning single entities based on a unique key, just stick them in blobs (partitioned and pivoted as in point 1), because you don't need to traverse a table, so don't.
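The two-table idea amounts to writing each pair twice with the keys swapped, so either lookup direction becomes a point query. A minimal Python sketch (entity shapes follow the Table service's PartitionKey/RowKey convention; the table names are hypothetical):

```python
def make_entities(key, value):
    """Build the two entities for a key<->value pair: one per lookup direction."""
    forward = {"PartitionKey": key, "RowKey": value}   # goes in table 'PairsByKey'
    reverse = {"PartitionKey": value, "RowKey": key}   # goes in table 'PairsByValue'
    return forward, reverse

fwd, rev = make_entities("user-42", "a1b2c3")

# Either lookup is now a direct partition hit instead of a full-table scan:
assert fwd["PartitionKey"] == rev["RowKey"] == "user-42"
assert fwd["RowKey"] == rev["PartitionKey"] == "a1b2c3"
```

Since the two writes go to different tables, they cannot share a batch transaction; your code has to tolerate (or repair) a failure between the two inserts.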