Cosmos DB REST API ORDER BY with partitioning - Azure

I'm using the REST API with SQL for Cosmos DB and need to return a query's results ordered by a timestamp (stored as a numeric UNIX timestamp). I'm trying to do this with a simple ORDER BY.
e.g. SELECT * FROM requests c ORDER BY c.timestamp
However, with partitioning I get the error:
"Cross partition query with TOP/ORDER BY or aggregate functions is not
supported."
In the collection settings the indexing precision for strings is set to -1, a suggestion I found elsewhere, but the error is still thrown.
If I remove the x-ms-documentdb-query-enablecrosspartition header, or set it to false, I get:
"Cross partition query is required but disabled. Please set
x-ms-documentdb-query-enablecrosspartition to true, specify
x-ms-documentdb-partitionkey, or revise your query to avoid this
exception."
Has anyone had any success doing this via the SQL REST API?
Thanks.

I reproduced your issue on my side.
However, based on this official statement, the Java and Node.js SDKs support TOP and ORDER BY queries on partitioned collections. I tested the same query via the SDK and it works.
Updated answer:
I used the Fiddler tool to observe the requests from the SDK, and I found that three requests are involved.
One:
When I run the SDK code above, the first request is as below, and it receives exactly the same error the REST call did. However, the SDK retries for me to get the _rid property of the partition.
Two:
I did not find any clear official explanation of this. However, after reading this article, I believe the cross partition here refers to physical partitions, not logical partitions. So this request fetches the "_rid" of the physical partition your data is stored in, along with the PartitionKeyRanges.
Three:
The SDK then sends an updated request for me with an additional header, x-ms-documentdb-partitionkeyrangeid, and the query results are returned correctly. Please notice the updated SQL in the last request.
I think you could emulate the requests the SDK makes to fulfill your needs.
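For reference, here is a minimal sketch in Python of that three-step flow, using the requests library. make_auth_headers is a hypothetical helper standing in for the Cosmos DB master-key HMAC signature described in the REST auth docs, and the account and collection paths are assumptions to adapt.

import requests

ACCOUNT = "https://your-account.documents.azure.com"  # assumed account URL
COLL = "dbs/your-db/colls/requests"                   # assumed collection path

# Steps 1 and 2: the first query fails with the cross-partition error, so
# read the physical partition key ranges of the collection instead.
resp = requests.get(
    f"{ACCOUNT}/{COLL}/pkranges",
    headers=make_auth_headers("GET", "pkranges", COLL),  # hypothetical helper
)
range_ids = [r["id"] for r in resp.json()["PartitionKeyRanges"]]

# Step 3: run the ORDER BY query once per partition key range, passing the
# range id in the x-ms-documentdb-partitionkeyrangeid header.
query = {"query": "SELECT * FROM requests c ORDER BY c.timestamp", "parameters": []}
pages = []
for range_id in range_ids:
    headers = make_auth_headers("POST", "docs", COLL)  # hypothetical helper
    headers.update({
        "Content-Type": "application/query+json",
        "x-ms-documentdb-isquery": "true",
        "x-ms-documentdb-query-enablecrosspartition": "true",
        "x-ms-documentdb-partitionkeyrangeid": range_id,
    })
    r = requests.post(f"{ACCOUNT}/{COLL}/docs", json=query, headers=headers)
    pages.append(r.json()["Documents"])

# Each per-range page is already ordered, so a final client-side merge by
# c.timestamp reproduces the global ordering the SDK gives you.
results = sorted((d for page in pages for d in page), key=lambda d: d["timestamp"])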

Spark results accessible through API

We would really like some input on how the results of a Spark query can be made accessible to a web application. Given that Spark is widely used in industry, I would have thought this would have lots of answers/tutorials, but I didn't find anything.
Here are a few options that come to mind:
Spark results are saved in another DB (perhaps a traditional one), and a request for the query returns the new table name for access through a paginated query. That seems doable, although a bit convoluted, as we need to handle the completion of the query.
Spark results are pumped into a messaging queue, from which a socket-server-like connection is made.
What confuses me is that other connectors to Spark, like those for Tableau, use something like JDBC and should have all the data (not the top 500 that we typically get via Livy or other REST interfaces to Spark). How do those connectors get all the data through a single connection?
Can someone with expertise help in that sense?
The standard way, I think, would be to use Livy, as you mention. Since it's a REST API, you wouldn't expect a JSON response containing the full result (it could be gigabytes of data, after all).
Rather, you'd use pagination with ?from=500 and issue multiple requests to get the number of rows you need. A web application only needs to display or visualize a small part of the data at a time anyway.
But from what you mentioned in your comment to Raphael Roth, you didn't mean to call this API directly from the web app (with good reason). So you'll have an API layer that is called by the web app and which then invokes Spark. In that case, you can still use Livy plus pagination to achieve what you want, unless you specifically need the full result available. If you do need the full result generated on the backend, you could design the Spark queries to materialize the result (ideally to cloud storage), and then all your API layer needs to do is read from the storage where Spark writes the results.
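To make that concrete, here is a hedged sketch of such an API layer in Python, submitting one page at a time as a Livy statement. The session id, table name, and the LIMIT/OFFSET pagination strategy are illustrative assumptions (OFFSET needs a recent Spark; on older versions you would page by filtering on a sort key instead).

import time
import requests

LIVY = "http://livy-host:8998"   # assumed Livy server
SESSION_ID = 0                   # assumed existing interactive session

def fetch_page(offset, size=500):
    """Run one page of the query as a Livy statement and poll until it finishes."""
    code = (
        'spark.sql("SELECT * FROM requests ORDER BY ts '
        f'LIMIT {size} OFFSET {offset}").toJSON().collect()'
    )
    r = requests.post(f"{LIVY}/sessions/{SESSION_ID}/statements", json={"code": code})
    stmt_id = r.json()["id"]

    # Poll the statement; a production API layer would add a timeout and back-off.
    while True:
        stmt = requests.get(f"{LIVY}/sessions/{SESSION_ID}/statements/{stmt_id}").json()
        if stmt["state"] in ("available", "error"):
            # The rows come back under output -> data -> "text/plain".
            return stmt["output"]
        time.sleep(1)

first_page = fetch_page(offset=0)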

Import data from Clio to Azure database using API v4

Let me start out by saying I am a SQL Server database expert, not a coder, so making API calls is certainly not an everyday task for me.
Having said that, I am trying to use Azure Data Factory's Copy Data tool to import data from Clio into an Azure SQL Server database. I have had some limited success: data is copied over using the API and inserted into the target table, but paging really seems to be an issue. I am testing this with the billable_clients call, and the first 25 records with the fields I specify are inserted, along with the paging record. As I understand it, the billable_clients call is eligible for bulk actions, which may be the solution, although I've not been able to figure out how that works. The URL I am calling is below:
https://app.clio.com/api/v4/billable_clients.json?fields=id,unbilled_hours,name
Using Postman, I've tried to make the same call while adding X-BULK: true to the headers, but that returns no results. If anyone can shed some light on how the X-BULK header flag is used when making a call, or if anyone has experience loading Clio data into a SQL Server database, I'd love some feedback on your methods.
If any additional information regarding my attempts or setup would help please let me know.
Thanks!
You need to download the JSON files with the Bulk API and then load them into the DB.
It isn't possible to insert the data directly.
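If the bulk download route proves awkward, a simpler alternative is to follow the paging record from page to page and insert as you go. Below is a hedged sketch in Python using requests and pyodbc; the paging shape (meta -> paging -> next), the target table, and the connection string are assumptions to adapt to your setup.

import pyodbc
import requests

TOKEN = "your-clio-oauth-token"  # assumed OAuth bearer token
url = "https://app.clio.com/api/v4/billable_clients.json?fields=id,unbilled_hours,name"

rows = []
while url:
    body = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"}).json()
    rows.extend((c["id"], c["unbilled_hours"], c["name"]) for c in body["data"])
    # Follow the next-page link from the paging record until none is returned
    # (assumed shape: meta -> paging -> next).
    url = body.get("meta", {}).get("paging", {}).get("next")

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=your-server.database.windows.net;"
                      "DATABASE=your-db;UID=your-user;PWD=your-password")
cur = conn.cursor()
cur.fast_executemany = True  # batch the inserts instead of row-by-row round trips
cur.executemany(
    "INSERT INTO dbo.BillableClients (Id, UnbilledHours, Name) VALUES (?, ?, ?)",
    rows,
)
conn.commit()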

Syncing Problems with Xamarin Forms and Azure Easy Tables

I've been working on a Xamarin.Forms application in Visual Studio using Azure for the backend for a while now, and I've come across a really strange issue.
Please note that I am following the methods mentioned in this blog.
For some strange reason, the PullAsync() method seems to have some bizarre problems. Any data that I create and sync will only be pulled by PullAsync() from that solution. What I mean is that if I create another solution that accesses the exact same backend, it can create/sync its own data, but it will not bring over the data generated by the other solution, even though both seem to have exactly the same access. This appears to be some kind of security feature/issue, but I can't quite make sense of it.
Has anyone else encountered this? Is there a workaround? This could potentially cause problems down the road if I ever want to create another solution that accesses the same system/data for whatever reason.
For some strange reason, the PullAsync() method seems to have some bizarre problems. Any data that I create and sync will only be pulled by PullAsync() from that solution.
According to the tutorial you provided, I found that the related PullAsync uses Incremental Sync.
await coffeeTable.PullAsync("allCoffees", coffeeTable.CreateQuery());
Incremental Sync:
the first parameter to the pull operation is a query name that is used only on the client. If you use a non-null query name, the Azure Mobile SDK performs an incremental sync. Each time a pull operation returns a set of results, the latest updatedAt timestamp from that result set is stored in the SDK local system tables. Subsequent pull operations retrieve only records after that timestamp.
Here is my test, you could refer to it for a better understanding of Incremental Sync:
Client : await todoTable.PullAsync("todoItems-02", todoTable.CreateQuery());
The client SDK checks whether there is a record whose id equals deltaToken|{table-name}|{query-id} in the __config table of your SQLite local store.
If there is no such record, the SDK sends a request like the following to pull your records:
https://{your-mobileapp-name}.azurewebsites.net/tables/TodoItem?$filter=(updatedAt%20ge%20datetimeoffset'1970-01-01T00%3A00%3A00.0000000%2B00%3A00')&$orderby=updatedAt&$skip=0&$top=50&__includeDeleted=true
Note: the $filter would be set as (updatedAt ge datetimeoffset'1970-01-01T00:00:00.0000000+00:00')
If there is such a record, the SDK picks up its value as the latest updatedAt timestamp and sends the request as follows:
https://{your-mobileapp-name}.azurewebsites.net/tables/TodoItem?$filter=(updatedAt%20ge%20datetimeoffset'2017-06-26T02%3A44%3A25.3940000%2B00%3A00')&$orderby=updatedAt&$skip=0&$top=50&__includeDeleted=true
Per my understanding, if you issue the same logical query with the same (non-null) query ID from different mobile clients, you need to make sure the local DB is newly created by each client. Also, if you want to opt out of incremental sync, pass null as the query ID; in that case, all records are retrieved on every call to PullAsync, which is potentially inefficient. For more details, you could refer to How offline synchronization works.
Additionally, you could leverage Fiddler to capture the network traces when you invoke PullAsync, in order to troubleshoot your issue.
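To make the mechanics concrete, here is a hedged sketch in Python that emulates what the SDK does: look up the stored delta token in the local store, then issue the corresponding pull request. The local store filename, the id/value column names in __config, and the ZUMO-API-VERSION header are assumptions to adjust to your setup.

import sqlite3
import requests

BACKEND = "https://your-mobileapp-name.azurewebsites.net"  # assumed backend URL
QUERY_ID = "todoItems-02"

# Look up the stored delta token (assumed id/value columns in __config).
conn = sqlite3.connect("localstore.db")  # assumed local store filename
row = conn.execute(
    "SELECT value FROM __config WHERE id = ?",
    (f"deltaToken|TodoItem|{QUERY_ID}",),
).fetchone()

# Fall back to the epoch when no delta token has been stored yet.
since = row[0] if row else "1970-01-01T00:00:00.0000000+00:00"

resp = requests.get(
    f"{BACKEND}/tables/TodoItem",
    params={
        "$filter": f"(updatedAt ge datetimeoffset'{since}')",
        "$orderby": "updatedAt",
        "$skip": "0",
        "$top": "50",
        "__includeDeleted": "true",
    },
    headers={"ZUMO-API-VERSION": "2.0.0"},  # required by Azure Mobile Apps
)
print(resp.json())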

How to abort query on demand using Neo4j drivers

I am making a search engine application using Node.js and Neo4j that allows users to submit a graph traversal query via a web-based user interface. I want to give users the option to cancel a query after it has been submitted (i.e. if a user decides to change the query parameters). Thus, I need a way to abort a query using either a command from a Node.js-to-Neo4j driver or via Cypher query.
After a few hours of searching, I haven't been able to find a way to do this using any of the Node.js-to-Neo4j drivers. I also can't seem to find a Cypher statement that kills a running query. Am I overlooking something, or is this not possible with Neo4j? I am currently using Neo4j 2.0.4, but I am willing to upgrade to a newer version if it has query-killing capabilities.
Starting from Neo4j version 2.2.0, it's possible to kill queries from the UI and via the REST interface. I don't know whether existing Node.js drivers support this feature, but if not, you can still achieve the same functionality by making HTTP requests to the REST interface.
If you run a query in the browser in version 2.2.x or later, you will notice that there is an (X) close link at the top-right corner of the area where the queries are executed and displayed.
You can achieve the same results by wrapping your queries in transactions and rolling them back. In fact, I just opened the web inspector to see how the browser UI was canceling the running queries.
I think this would be the recommended approach:
Begin a transaction with the Cypher query you want to run. The transaction will be assigned an identifier: a POST to http://localhost:7474/db/data/transaction returns a response with Location: http://localhost:7474/db/data/transaction/7.
If you want to cancel the query, delete the transaction using the identifier you got in step 1, i.e. DELETE http://localhost:7474/db/data/transaction/7.
You will find more info about the Transactional Cypher HTTP Endpoint and examples in the official docs.
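A minimal sketch of that begin/rollback flow, shown here in Python with requests for brevity (the same two HTTP calls work from Node.js with any HTTP client):

import requests

BASE = "http://localhost:7474/db/data/transaction"

# Step 1: begin a transaction that runs the (possibly long-running) query.
resp = requests.post(
    BASE,
    json={"statements": [{"statement": "MATCH (n) RETURN count(n)"}]},
)
tx_url = resp.headers["Location"]  # e.g. http://localhost:7474/db/data/transaction/7

# Step 2: to cancel, roll the transaction back by deleting it.
requests.delete(tx_url)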
Update: node-neo4j seems to support transaction handling, including rollback. See docs.

SQL Azure - Timeout on query

I have set up an Azure website with a SQL Azure back-end. I used a migration tool to populate a single table with 80,000 rows of data. During the data migration I could access the new data via the website without any issues. Since the migration completed, I keep getting an exception: [Win32Exception (0x80004005): The wait operation timed out].
This exception suggests that my database queries are taking more than 30 seconds to return, and querying the database from Visual Studio confirms it. I have indexes on my filter columns, and on my local SQL database the same queries take less than a second. Each row does contain a varchar(max) column that stores JSON, so a fair amount of data is held in each row, but this shouldn't really affect query performance.
Any input that could help me solve this issue would be much appreciated.
I seem to have gotten around the query timeout issues for now. What appeared to do the trick was updating the SQL Server statistics:
EXEC sp_updatestats;
Another performance enhancement that worked well was enabling JSON compression on my Azure website.
