Windows Azure WADLogs Query is very slow - azure

I've noticed that querying a WAD table, e.g. WADLogs, is very slow. It takes up to 5 minutes to return 10 records.
Yes, the WAD tables are very large in our scenario. Still, I wasn't expecting it to be this slow. It takes ages to troubleshoot production issues.
My questions:
Could anyone please share the best way to manage the WAD tables so that queries are faster?
Is there any way to optimize the WAD tables?
Are there best practices for what should and should not be done when logging to WAD?
Are there any best practices for purging, backing up, etc.?
Thank you.

Gaurav Mantri has a post explaining how to query WAD tables in a performant manner. The bottom line is that you need to query on PartitionKey and RowKey to avoid a performance-killing table scan. The PartitionKey for the WAD tables contains the TickCount in a slightly encoded form and an appropriately constructed value can be used for range queries.

Thanks, Neil, for the link.
Summary:
Use the PartitionKey attribute, which is indexed by Table Storage.
Where,
PartitionKey = "0" + DateTime.UtcNow.AddDays(-1.0).Ticks
Usage for REST API query ($filter) criteria:
PartitionKey ge '0634012319404982640'
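For anyone building the key outside .NET, here is a minimal Python sketch of the same idea. It assumes the key format described above ("0" followed by the .NET tick count, where a tick is 100 nanoseconds counted from 0001-01-01 UTC). The function names are made up for illustration and are not part of any Azure SDK.

```python
from datetime import datetime, timedelta, timezone

# .NET DateTime.Ticks counts 100-nanosecond intervals since 0001-01-01T00:00:00.
DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def to_dotnet_ticks(moment: datetime) -> int:
    """Convert a timezone-aware datetime to .NET ticks using integer math."""
    delta = moment.astimezone(timezone.utc) - DOTNET_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


def wad_partition_key(moment: datetime) -> str:
    """Hypothetical helper: "0" followed by the tick count, as in the summary above."""
    return "0" + str(to_dotnet_ticks(moment))


def wad_filter_since(moment: datetime) -> str:
    """Hypothetical helper: a $filter expression for rows at or after `moment`."""
    return f"PartitionKey ge '{wad_partition_key(moment)}'"


if __name__ == "__main__":
    one_day_ago = datetime.now(timezone.utc) - timedelta(days=1)
    print(wad_filter_since(one_day_ago))
```

Adding a second bound (PartitionKey lt '...') built the same way should keep the query inside a bounded time window rather than reading everything up to the newest entry.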

Related

How can we dynamically sort and group data from DynamoDB?

I am new to DynamoDB, and I see that data is sorted by the sort key value. Is there a way to sort data by any attribute within an item, irrespective of the sort key? I am using Node.js with DynamoDB for my project.
May I know how I can achieve this?
Thank you
So long as you don't need to re-partition your data, the first way to do this is to create an LSI (local secondary index) with your new sort key.
But unlike SQL, you can't do ad hoc queries. When you design your table, you already have to know a lot about the queries and transactions you will run. In general, you shouldn't need to create an LSI every time you want to search. I recommend watching this.

How to Partition Database Table in Azure Data Explorer?

I started exploring ADX a few days ago. I imported my data from Azure SQL to ADX using an ADF pipeline, but queries against that data take a long time. Looking for a workaround, I researched table data partitioning, and I am now much clearer on partition types and tricks.
The problem is that I couldn't find any sample (Kusto syntax) that shows how to define partitioning on ADX database tables. Can anyone please help me with this syntax?
The partition operator is probably what you are looking for:
T | partition by Col1 ( top 10 by MaxValue )
T | partition by Col1 { U | where Col2=toscalar(Col1) }
ADX doesn't currently have the notion of partitioning a table, though it may be added in the future.
That said, with the limited technical information currently provided, it's somewhat challenging to understand how you concluded that partitioning your table is required and is the appropriate solution, as opposed to the many other directions that ADX does allow you to pursue.
If you would be willing to detail what actions you're performing, the characteristics of your data and schema, and which parts are performing slower than expected, that may help in providing a more meaningful and helpful answer.
[If you aren't keen on exposing that information publicly, it's OK to open a support ticket with these details (through the Azure portal).]
(Update: the functionality has been available for a while now. Read more at https://yonileibowitz.github.io/blog-posts/data-partitioning.html)

Derby: Full Text Search

I have a gigantic data set of more than 2500000000 records distributed among 10 tables in Derby. Two columns, "floraNfauna" and "locations", are common to every table. I have to find a particular "floraNfauna" found at particular "locations", so I use a "select" query with "like", e.g. "select * from tables where floraNfauna like('%fish%') and locations like('%shallow water bodies%')", and it takes days to fetch the results, which sometimes number fewer than 1000. After searching, I found that "full text search" would be the best and fastest approach. Can you help me with an example?
Derby integrates nicely with Lucene, which is a full-text search engine.
Read more about that here: http://wiki.apache.org/db-derby/LuceneIntegration
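To illustrate why a full-text engine helps here, below is a minimal Python sketch of an inverted index, the general data structure that engines such as Lucene are built around. This is not Lucene's API or Derby's integration, only a toy version of the idea, and all names in it are made up for illustration. Each word maps to the set of row ids containing it, so a search looks up a few sets instead of scanning every row the way like('%fish%') has to.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(rows: List[str]) -> Dict[str, Set[int]]:
    """Map each lowercase word to the ids of the rows that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(row_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of rows containing every word in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    rows = ["silver fish", "river otter", "fish eagle"]
    index = build_index(rows)
    print(sorted(search(index, "fish")))
```

Note that this matches whole words, so unlike like('%fish%') it would not match "catfish"; real engines handle such cases through their analyzers and query syntax.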
First, you must consider indexing your table. Here is an SO link that should help explain why to index a DB table.
More about adding indexes to a table.
Second, if you are using a centralized database, then definitely consider upgrading your server hardware configuration.
Thanks, hope it helps.

About Azure Table Secondary Index

I know secondary indexes are not here yet: they are on the wish list and "planned".
I would like to get some ideas (or information from a reliable source) about the upcoming secondary indexes.
1st question: I noticed MS planned "secondary indexes": does that mean we can create as many indexes as we want on one table?
2nd question: The current index is "PartitionKey+RowKey". If the answer to the above is no, will the secondary index be "RowKey+PartitionKey", or is there a good chance that we can customize it?
I would like to get some ideas because I am currently designing a table. Since there won't be much data at the beginning, I think I can wait for the secondary index feature without creating multiple tables for now.
Please share your ideas or any sources you have, thanks.
There's currently no information on secondary indexes, other than what's written at the site you referenced. So, there's no way to answer either of your two questions.
Several customers I work with who use Table Storage have taken the multiple-table approach to provide additional indexing. For those requiring extensive index coverage, that data has typically found its way into SQL Azure (or a combination of SQL Azure + Table Storage).
As a Windows Azure MVP, I don't have any information about secondary indexes in the table service. If we need more indexes in the table service but don't want to use SQL Azure (not just because of the pricing...), then I would denormalize my data, which means splitting the same data into more than one table, each with a different row key acting as the index.
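The multiple-table approach described in these answers can be sketched roughly as follows. This is an in-memory Python illustration using plain dictionaries in place of real tables; the entity fields (Region, Id, Email) and the function name are hypothetical and chosen only for the example.

```python
from typing import Dict, Tuple

Entity = Dict[str, str]
# Each "table" is keyed by (PartitionKey, RowKey), like Table Storage.
Table = Dict[Tuple[str, str], Entity]


def insert_customer(by_id: Table, by_email: Table, customer: Entity) -> None:
    """Write the same entity twice, once per lookup pattern."""
    # Primary copy: RowKey is the customer id.
    by_id[(customer["Region"], customer["Id"])] = dict(customer)
    # Second copy: RowKey is the email, acting as a hand-rolled secondary index.
    by_email[(customer["Region"], customer["Email"])] = dict(customer)


if __name__ == "__main__":
    by_id: Table = {}
    by_email: Table = {}
    insert_customer(
        by_id,
        by_email,
        {"Region": "EU", "Id": "42", "Email": "ann@example.com", "Name": "Ann"},
    )
    # Both lookups are direct key hits; neither needs a scan.
    print(by_id[("EU", "42")]["Name"])
    print(by_email[("EU", "ann@example.com")]["Name"])
```

The trade-off is that every insert, update, and delete has to touch both copies, and keeping them in sync becomes the application's job.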
This question is now two years old. And still no sign of secondary indexes in Azure Table Storage. My guess is that it is now very unlikely to ever eventuate.
Azure Cosmos DB provides the Table API for applications that are written for Azure Table storage and that need capabilities like:
Automatic Secondary Indexing
From: Introduction to Azure Cosmos DB: Table API
So if you are willing to move your tables over to Cosmos, then you will get all the indexing you could ever want.

Can I query any attribute in a Windows Azure Tablestorage row?

Sorry if this sounds like a rather dumb question, but I would like to do a "select" on data from a Windows Azure table. I tried the following and it worked:
from status in _statusTable.GetAll()
where status.RowKey.StartsWith(name)
I then tried
from status in _statusTable.GetAll()
where status.Description.StartsWith(name)
This one gave me nothing. Can anyone explain to me if or how I can query on properties that are not part of the RowKey or PartitionKey?
You can query on any property, but the types of query supported are limited, e.g. StartsWith isn't supported. Also, if you aren't querying on PartitionKey and RowKey, then there are some very important performance issues to understand, and you always need to be aware of continuation tokens (ContinuationToken): almost any query result can contain one.
You can see the sorts of queries supported by looking at the REST API: http://msdn.microsoft.com/en-us/library/dd894031.aspx - it's pretty limited (but quick as a result):
Equal
GreaterThan
GreaterThanOrEqual
LessThan
LessThanOrEqual
NotEqual
If you need to do more, then:
you can mimic things like StartsWith("Fred") by doing a GreaterThanOrEqual("Fred") and LessThan("Free")
or client-side filtering will work, but that means pulling back all the rows from storage, which could be a lot of data and which could be computationally and transactionally expensive!
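The StartsWith trick above can be sketched in Python as follows. It assumes ordinal (code point) string comparison and a prefix that contains no single quotes. The function names are made up for illustration and are not part of any storage SDK.

```python
from typing import Tuple


def prefix_range(prefix: str) -> Tuple[str, str]:
    """Return (lower, upper) such that lower <= value < upper matches the prefix."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    # Bump the last character by one code point: "Fred" -> "Free".
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


def starts_with_filter(prop: str, prefix: str) -> str:
    """Hypothetical helper: build a $filter expression emulating StartsWith."""
    lower, upper = prefix_range(prefix)
    return f"{prop} ge '{lower}' and {prop} lt '{upper}'"


if __name__ == "__main__":
    print(starts_with_filter("Description", "Fred"))
```

On a property other than PartitionKey or RowKey this still scans, so the range only reduces what comes back over the wire; on RowKey within a known partition it becomes an efficient range query.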
What does GetAll() do? StartsWith isn't supported by WA tables, so I'm assuming GetAll pulls all the data locally, and so your query is done over objects in memory. If so, this has nothing to do with Windows Azure, so I'd take a look at whether your data looks like you expect it to.
