Querying Azure Table Storage for null values - azure

Does anyone know the proper way to query Azure Table Storage for a null value? From what I've read, it's possible (although there is a bug which prevents it on development storage). However, I keep getting the following error when I do so on the live cloud storage:
One of the request inputs is not valid.
This is a dumbed-down version of the LINQ query that I've put together.
var query = from fooBar in fooBarSVC.CreateQuery<FooBar>("FooBars")
            where fooBar.PartitionKey == kPartitionID
               && fooBar.Code == kfooBarCode
               && fooBar.Effective_Date <= kFooBarDate.ToUniversalTime()
               && (fooBar.Termination_Date > kFooBarDate.ToUniversalTime() || fooBar.Termination_Date == null)
            select fooBar;
If I run the query without checking for null, it works fine. I know a possible solution would be to run a second query on the collection that this query brings back. I don't mind doing that if I need to, but would like to know if I can get this approach to work first.
Anyone see anything obvious I'm doing wrong?

The problem is that because Azure Table Storage does not have a schema, the null column simply doesn't exist, which is why your query is not valid. There is no such thing as a null column in table storage. You could store an empty string instead if you really have to.
The more fundamental issue, though, is that Azure Table Storage is not built to be queried by any columns other than the partition key and row key. Every query against one of these non-key columns is a table scan, and once you have a lot of data you are going to see a very high rate of query timeouts. I would suggest setting up a manual index for these types of queries: for example, you could store the same data in the same table but with different values for the row key (a sketch follows below). Ultimately, if your app is not getting very high usage, I would just use SQL Azure, as it will be much more flexible for the types of queries you are doing.
Update: Azure has a great guide on table storage design that I would recommend reading. http://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/
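To make the manual index idea concrete, here is a minimal sketch using the newer Azure.Data.Tables client. The FooBars table and the Code/Effective_Date properties are borrowed from the question; connectionString, naturalRowKey and effectiveDateUtc are placeholder names, and the index-row layout is only an illustration:
// Write the entity once under its natural key...
var table = new Azure.Data.Tables.TableClient(connectionString, "FooBars");
var entity = new Azure.Data.Tables.TableEntity(kPartitionID, naturalRowKey)
{
    ["Code"] = kfooBarCode,
    ["Effective_Date"] = effectiveDateUtc
};
table.AddEntity(entity);

// ...and again under a RowKey that encodes the columns you query on, so that
// the lookup becomes a key range scan instead of a full table scan.
var indexRow = new Azure.Data.Tables.TableEntity(kPartitionID, $"{kfooBarCode}_{effectiveDateUtc:yyyyMMddHHmmss}")
{
    ["SourceRowKey"] = naturalRowKey
};
table.AddEntity(indexRow);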

I just had this problem and found a nice little ninja-trick to actually test for nulls. Although I'm using the Azure Storage interface directly, I'm 90% sure it will work for LINQ too if you do the same.
Here's what I did to check if Price (Int32?) is null:
not (Price lt 0 or Price gt 0)
I'm guessing that in your case you can do the same in LINQ by testing whether fooBar.Termination_Date is less than or greater than DateTime.UtcNow, for example. Something like this:
var query = from fooBar in fooBarSVC.CreateQuery<FooBar>("FooBars")
            where fooBar.PartitionKey == kPartitionID
               && fooBar.Code == kfooBarCode
               && fooBar.Effective_Date <= kFooBarDate.ToUniversalTime()
               && (fooBar.Termination_Date > kFooBarDate.ToUniversalTime()
                   || !(fooBar.Termination_Date < DateTime.UtcNow
                        || fooBar.Termination_Date > DateTime.UtcNow))
            select fooBar;

For a string column called MyColumn I was able to type: not(MyColumn gt '')
Mike S's answer above put me on the right path.

For strings, we can compare against the empty string.
IsNotBlank(value)
can be written as:
(Value gt '')

Using the Azure Tables client library for .NET to query for null Guid values.
In the sample code, the property's name is MyColumn.
var filter = Azure.Data.Tables.TableClient
.CreateQueryFilter($"not(MyColumn gt {Guid.Empty})");
The TableClient.CreateQueryFilter method will create the filter:
not(MyColumn gt guid'00000000-0000-0000-0000-000000000000')
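As a possible usage sketch, the generated filter string can be passed straight to TableClient.Query (connectionString and the MyTable table name are placeholders); the same trick works for missing string values with not(MyColumn gt '') as noted in the answers above:
var tableClient = new Azure.Data.Tables.TableClient(connectionString, "MyTable");
foreach (var entity in tableClient.Query<Azure.Data.Tables.TableEntity>(filter))
{
    // Matches entities where MyColumn is missing ("null") or equal to Guid.Empty.
    Console.WriteLine(entity.RowKey);
}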

Related

[Shopware6]: How can I add SQL Filter to Criteria?

The criteria are already quite powerful, yet I came across a case I can't seem to replicate with the criteria object.
I needed to filter out all entries that are no longer relevant time-wise.
In a world where you could mix SQL with the field definition, it would look like this:
...->addFilter(
new RangeFilter('DATEDIFF(NOW(), INTERVAL createdAt DAY)', [RangeFilter::LTE => 1])
)
Unfortunately that doesn't work in our world.
When I pass the criteria to a search function, I only get:
"DATEDIFF(NOW(), INTERVAL createdAt DAY)" is not a field on xyz
I tried to do it with ->addExtensions and several other experiments, but I couldn't get it to work. I resorted to using the queryBuilder from Doctrine with queryParts, but the data I'm getting is not very clean and is not mapped to an ORM entity like it should be.
Is it possible to write a criteria that incorporates native SQL filtering?
The DAL is intentionally designed not to accept raw SQL statements; that is a core concept of the abstraction. Since the DAL offers extensibility for third-party extensions, it should be preferred over raw SQL in most cases. I would suggest writing a lightweight query that only fetches the IDs using your SQL query, and then using these pre-filtered IDs to fetch the complete data sets through the DAL.
$ids = (new QueryBuilder($connection))
    ->select(['LOWER(HEX(id))'])
    ->from('product')
    ->where('...')
    ->execute()
    ->fetchFirstColumn();

$criteria = new Criteria($ids);
This should offer the best of both worlds: the freedom of raw SQL and the extensibility features of the DAL.
In your specific case you could also just take the current day, subtract the number of days that should have passed, and use this threshold date to compare against the creation date:
$now = new \DateTimeImmutable();
$dateInterval = new \DateInterval('P1D');
$thresholdDate = $now->sub($dateInterval);

// filter to get all entries with a creation date greater than now - 1 day
$filter = new RangeFilter(
    'createdAt',
    [RangeFilter::GTE => $thresholdDate->format(Defaults::STORAGE_DATE_TIME_FORMAT)]
);

NodeJS - azure-storage-node - how to retrieve the addition of two columns and apply a filtering condition

Sorry for being a newbie to NodeJS and table queries; my question is:
How can I create a query using the Node.js package "azure-storage-node" that selects the sum/addition of the two columns 'start' and 'period', and takes the whole row if the addition is greater than a threshold? My tries, which didn't work, look something like this:
var query = new azure.TableQuery();
total = query.select(['start']) + query.select(['period']);
query.where('total > ?' , 50000);
or maybe something like this:
var query = new azure.TableQuery()
.where('start + period gt 50000');
but it throws an error about '+'.
Thanks
What you're trying to accomplish is not possible with Azure Tables, at least as of today, as Azure Tables has limited querying support and there is no support for computed columns (if I may say so).
There are two possible solutions:
Have an attribute called total in your entities that contains the value, i.e. start + period. Calculate this value when you're inserting or updating the entity and store it at that time.
Do this filtering on the client side. For this you will need to download all related entities and then apply this filtering on the client side on the data that you fetched.

DocumentDB performance when using pagination

I have working pagination code which works great with Azure Search and SQL, but when using it with DocumentDB it takes up to 60 seconds to load.
We believe it's a latency issue, but I can't find a workaround to speed it up.
Any documentation or ideas on where to start looking?
public PagedList(IQueryable<T> superset, int pageNumber, int pageSize, string sortExpression = null)
{
    if (pageNumber < 1)
        throw new ArgumentOutOfRangeException("pageNumber", pageNumber, "PageNumber cannot be below 1.");
    if (pageSize < 1)
        throw new ArgumentOutOfRangeException("pageSize", pageSize, "PageSize cannot be less than 1.");

    // set source to blank list if superset is null to prevent exceptions
    TotalItemCount = superset == null ? 0 : superset.Count();

    if (superset != null && TotalItemCount > 0)
    {
        Subset.AddRange(pageNumber == 1
            ? superset.Skip(0).Take(pageSize).ToList()
            : superset.Skip((pageNumber - 1) * pageSize).Take(pageSize).ToList()
        );
    }
}
While the LINQ provider for DocumentDB translates .Take() into a "TOP" SQL clause under certain circumstances, DocumentDB has no equivalent for Skip. So I'm a little surprised it works at all, but I suspect that the provider is rerunning the query from scratch to simulate Skip. In the comments here is a discussion led by a DocumentDB product manager on why they chose not to implement SKIP. tl;dr: it doesn't scale for NoSQL databases. I can confirm this with MongoDB (which does have skip functionality): later pages simply scan and throw away earlier documents, and the later in the list you go, the slower it gets. I suspect that the LINQ implementation is doing something similar, except client-side.
DocumentDB does have a mechanism for getting documents in chunks, but it works a bit differently than SKIP: it uses a continuation token. You can even set a maxPageSize; however, there is no guarantee that you'll get that number back.
I recommend that you implement a client-side cache of your own and use a fairly large maxPageSize. Let's say each page in your UI is 10 rows and your cache currently has 27 rows in it. If the user selects page 1 or page 2, you have enough rows to render the result from the data already cached. If the user selects page 7, then you know that you need at least 70 rows in your cache. Use the last continuation token to get more until you have at least 70 rows in your cache and then render rows 61-70. On the plus side, continuation tokens are long-lived so you can use them later based upon user input.
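As a rough sketch of that approach with the classic DocumentDB .NET SDK (the MyDoc type, database and collection names, and neededRowCount are placeholders, not something from the question):
// Fill a client-side cache page by page until enough rows are available to render.
var options = new FeedOptions { MaxItemCount = 100, RequestContinuation = continuationToken };
var query = client.CreateDocumentQuery<MyDoc>(
        UriFactory.CreateDocumentCollectionUri("myDb", "myCollection"), options)
    .AsDocumentQuery(); // apply any Where() filters before calling AsDocumentQuery()

var cache = new List<MyDoc>();
while (query.HasMoreResults && cache.Count < neededRowCount)
{
    FeedResponse<MyDoc> page = await query.ExecuteNextAsync<MyDoc>();
    cache.AddRange(page);
    continuationToken = page.ResponseContinuation; // long-lived; keep it for later requests
}
// Once at least 70 rows are cached, render e.g. rows 61-70 from the cache.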

Referencing external doc in CouchDB view

I am scraping a 90K record database using JSON-RPC and I am trying to put in some basic error checking. I want to start by scraping the database twice using two different settings and adding a prefix to the second scrape. This way I can check to ensure that the two settings are not producing different records (due to dropped updates, etc.). I wanted to implement the comparison using a view which compares each document from the first scrape with its twin produced by the second scrape and then emits the names of records with a difference between them.
However, I cannot quite figure out how to pull another doc into the view; everything I have read only discusses referencing external docs via the emit() function, which is too late to permit me to compare it. In the example below, the lookup() function would grab the referenced document.
Is this just not possible?
function(doc) {
  if (doc._id.slice(0, 1) !== '$' && doc._id.slice(0, 1) !== '_') {
    var otherDoc = lookup('$test' + doc._id);
    if (otherDoc) {
      var keys = doc.value.keys();
      var same = true;
      keys.forEach(function(key) {
        if ((key.slice(0, 1) !== '_') && (key.slice(0, 1) !== '$') && (key !== 'expires')) {
          if (!Object.equal(otherDoc[key], doc[key])) {
            same = false;
          }
        }
      });
      if (!same) {
        emit(doc._id, 1);
      }
    }
  }
}
Context
You are correct that this is not possible in CouchDB. The whole point of the map function is that it must be deterministic, otherwise you lose all the other nice benefits of a pre-calculated index.
This is why you cannot access external resources in the map function, whether they be other records or the clock. Any time you run a map you must always get the same result if you put the same record into it. Since there are no relationships between records in CouchDB, you cannot promise that this is possible.
Solution
However, you can still achieve your end goal, just by different means. Some possibilities...
Assuming there is some meaningful numeric value in each doc, you could use a view to take the sum of all those values and group them by which import you did ({key: <batch id>, value: <meaningful number>}). Then compare the two numbers in your client or the browser to see if they match.
A brute force approach would be to use a view to pair the docs that should match. Each doc is on a different row, but they're grouped by a common field. Then iterate through the entire index comparing the pairs. This would certainly be the quickest to code and doesn't depend on your application or data.
Implement a validation function to enforce a schema on your data. Just be warned that this will reduce your write throughput since each written record will be piped out of Erlang and into the JS engine. Also, this is only applicable if you're worried about properly formed records instead of their precise content, which might not be the case.
Instead of your different batch jobs creating different docs, have them place them into the same doc. The structure might look like this: { "_id": "something meaningful", "batch_one": { ..data.. }, "batch_two": { ..data.. } } Then your validation function could compare them or you could create a view that indexes all the docs that don't match. All depends on where in your pipeline you want to do the error checking and correction.
Personally I like the last option better, but only if you don't plan to use the database as-is in production. I.e., you wouldn't want to carry around all that extra data in each record.
Hope that helps.
Cheers.

Astyanax key range query

I'm trying to write a query which will paginate through all rows in a column family using the Astyanax client and RowSliceQuery.
keyspace.prepareQuery(COLUMN_FAMILY).getKeyRange(null, null, null, null, 100);
I've done this successfully using Hector, where the first call is made with null start and end keys. After retrieving the first page I use the last key from the result to query for the second page, and so on. This is the code for the first page using Hector:
HFactory.createRangeSlicesQuery(keyspace,
        LongSerializer.get(), new CompositeSerializer(), BytesArraySerializer.get())
    .setColumnFamily(COLUMN_FAMILY)
    .setRange(null, null, false, 100)
    .setRowCount(100);
Now when I try to do this with Astyanax I get errors about null and non-null keys and tokens. I'm not sure what tokens do in this query. I am also able to use allRows(), but I would like to do this using a key range query as it gives me more flexibility.
Does anybody have an example of a key range query using Astyanax? I cannot find one either in the "getting started" documentation or anywhere else on the net.
Thanks!
Anton
What you are referring to is the getRowRange method:
keyspace.prepareQuery(CF_STANDARD1)
.getRowRange(startKey, endKey, startToken, endToken, count)
Note, however, that this works only when the ByteOrderedPartitioner is used. Since Cassandra uses the Murmur3Partitioner by default, this will usually not work, and using an index instead is recommended. Astyanax also provides the reverse index search recipe, which takes advantage of a second column family that stores your keys as columns, allowing efficient range searches on the original data.
Check this sample code; I hope it will help you with the paging.
IndexQuery<String, String> query = keyspace
.prepareQuery(CF_STANDARD1).searchWithIndex()
.setRowLimit(10).autoPaginateRows(true).addExpression()
.whereColumn("Index2").equals().value(42);
Best,
