I am making a search engine application using Node.js and Neo4j that allows users to submit a graph traversal query via a web-based user interface. I want to give users the option to cancel a query after it has been submitted (e.g. if a user decides to change the query parameters). Thus, I need a way to abort a query, either with a command from a Node.js-to-Neo4j driver or via a Cypher query.
After a few hours of searching, I haven't been able to find a way to do this using any of the Node.js-to-Neo4j drivers. I also can't seem to find a Cypher query that kills a running query. Am I overlooking something, or is this not possible with Neo4j? I am currently using Neo4j 2.0.4, but I am willing to upgrade to a newer version if it has query-killing capabilities.
Starting from Neo4j version 2.2.0, it's possible to kill queries from the UI and via the REST interface. I don't know whether existing Node.js drivers support this feature, but if not, you can still achieve the same functionality by making HTTP requests to the REST interface.
If you run a query in the browser in version 2.2.x or later, you will notice that there is an (X) close link at the top-right corner of the area where the queries are executed and displayed.
You can achieve the same results by wrapping your queries in transactions and rolling them back. In fact, I just opened the web inspector to see how the browser UI was canceling the running queries.
I think this would be the recommended approach:
Begin a transaction with the Cypher query you want to run. The transaction will be assigned an identifier, e.g. a POST to http://localhost:7474/db/data/transaction responds with Location: http://localhost:7474/db/data/transaction/7
If you want to cancel the query, delete the transaction using the identifier you got in step 1, e.g. DELETE http://localhost:7474/db/data/transaction/7
You will find more info about the Transactional Cypher HTTP Endpoint and examples in the official docs.
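As a rough sketch of those two steps with Node's built-in http module (assuming Neo4j's transactional endpoint on localhost:7474; error handling kept minimal):

var http = require('http');
var url = require('url');

// Step 1: begin a transaction containing the query; the Location
// response header holds the transaction URL, e.g. .../transaction/7
function beginQuery(cypher, callback) {
  var body = JSON.stringify({ statements: [{ statement: cypher }] });
  var req = http.request({
    host: 'localhost',
    port: 7474,
    path: '/db/data/transaction',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Accept': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  }, function (res) {
    callback(null, res.headers.location);
  });
  req.on('error', callback);
  req.end(body);
}

// Step 2: DELETE the transaction to roll it back and abort the query
function cancelQuery(transactionUrl, callback) {
  var req = http.request({
    host: 'localhost',
    port: 7474,
    path: url.parse(transactionUrl).path,
    method: 'DELETE'
  }, function (res) {
    callback(null, res.statusCode);
  });
  req.on('error', callback);
  req.end();
}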
Update: node-neo4j seems to support transaction handling, including rollback. See docs.
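If I'm reading the node-neo4j docs correctly, the equivalent flow with the driver would look roughly like this (treat the exact method names as an assumption to verify against those docs):

var neo4j = require('neo4j');
var db = new neo4j.GraphDatabase('http://localhost:7474');

// Begin a transaction and run the (placeholder) query inside it
var tx = db.beginTransaction();
tx.cypher({
  query: 'MATCH (n) RETURN count(n)'
}, function (err, results) {
  // handle results, or confirm with tx.commit(...)
});

// Later, if the user cancels:
tx.rollback(function (err) {
  // transaction rolled back, query aborted
});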
Related
I am integrating Elasticsearch with my Nest application; posting my workflow and question below.
Flow:
The front-end updates an entity, so the back-end updates both the primary storage (MySQL) and the cache storage (Elasticsearch).
Immediately afterwards, the front-end issues a GET request for the updated entity, so the back-end fetches the entity from Elasticsearch and sends the response to the front-end.
Now the issue is that Elasticsearch sometimes returns stale data because the index has not been refreshed yet.
Solutions:
We can use the refresh: 'wait_for' option in the update API.
We can trigger the refresh API to refresh the indices.
The question is: after the update API call, if we trigger the refresh API without waiting for its result, and the front-end requests the data in the meantime, will Elasticsearch wait for the refresh operation to complete, or will it serve the old data?
As you already know, Elasticsearch provides near-real-time search: a document that has just been indexed is not visible to search calls until a refresh has happened on the index.
Now, as you mentioned, you can solve the issue by using wait_for with your index/update operation. The only downside is that it's resource-intensive, and if you do it frequently on a large index it can cause severe performance issues in your cluster.
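For reference, here is roughly what that looks like with the official Elasticsearch JavaScript client (the index, id and fields are placeholders, and the exact request shape varies slightly between client versions):

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function updateEntity() {
  // refresh: 'wait_for' makes the call block until the change is
  // visible to search, so the follow-up GET sees fresh data
  await client.update({
    index: 'entities',   // placeholder index name
    id: '42',            // placeholder document id
    refresh: 'wait_for',
    body: { doc: { name: 'updated name' } }
  });
}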
Your second option will not work either: the front-end can still query Elasticsearch before the refresh finishes, and it will get stale data.
I'm relatively new to web development and have been using ArangoDB for most of that limited experience. I have a basic understanding of Node.js and creating Express-based CRUD apps with ArangoDB as the database.
I'm getting to a point, though, where I'd like the ability to query the database from the client. Say I have a datalist-type element where the user types words into a search bar. I'd like to query the database from there, rather than having to fetch the entire collection up front just to build the datalist. I have not found a single mention, though, of querying the database from the client side, and I can't imagine that this is not possible. Surely when I type into Wikipedia's search bar and it offers me suggestions, I didn't just receive the entire list of Wikipedia articles upon loading the page? Please steer me in the right direction; I don't know how to tackle this problem.
Have a look at how to build dynamic forms; this will allow you to perform AJAX-style calls from the browser window to a back-end REST API service. Your back-end web service gathers the data for the response (from ArangoDB if required) and responds with it, most likely in JSON format.
Your UI can then take that response and dynamically update components in the DOM, so that the user sees the data injected into the page without a page reload taking place.
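As a small sketch of that shape, assuming an Express back end and the arangojs driver (the collection and field names are made up for illustration):

const express = require('express');
const { Database, aql } = require('arangojs');

const db = new Database({ url: 'http://localhost:8529' });
const app = express();

app.get('/api/search', async (req, res) => {
  const term = req.query.q || '';
  try {
    // Only return the handful of matches the datalist needs,
    // never the whole collection
    const cursor = await db.query(aql`
      FOR doc IN items
        FILTER CONTAINS(LOWER(doc.name), LOWER(${term}))
        LIMIT 10
        RETURN { name: doc.name }
    `);
    res.json(await cursor.all());
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000);

The datalist in the browser would then call this endpoint with the current search term (e.g. fetch('/api/search?q=wiki')) and rebuild its options from the JSON response.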
https://www.pluralsight.com/search?q=ajax is a great place to start.
Alternatively you can have a look at free content like https://www.youtube.com/watch?v=tNKD0kfel6o
We would really like to get some input here about how the results from a Spark query can be made accessible to a web application. Given how widely Spark is used in the industry, I would have thought that this part would have lots of answers/tutorials, but I didn't find anything.
Here are a few options that come to mind
Spark results are saved in another DB (perhaps a traditional one) and a query request returns the new table name, which is then accessed through paginated queries. That seems doable, although a bit convoluted, as we need to handle the completion of the query.
Spark results are pumped into a messaging queue, from which a socket-server-like connection is made.
What confuses me is that other connectors to Spark, like those for Tableau that use something like JDBC, seem to have access to all the data (not just the top 500 rows we typically get via Livy or other REST interfaces to Spark). How do those connectors get all the data through a single connection?
Can someone with expertise help in that sense?
The standard way I think would be to use Livy, as you mention. Since it's a REST API you wouldn't expect to get a JSON response containing the full result (could be gigabytes of data, after all).
Rather, you'd use pagination with ?from=500 and issue multiple requests to get the number of rows you need. A web application would only need to display or visualize a small part of the data at a time anyway.
But from what you mentioned in your comment to Raphael Roth, you didn't mean to call this API directly from the web app (with good reason). So you'll have an API layer that is called by the web app and which then invokes Spark. But in this case, you can still use Livy+pagination to achieve what you want, unless you specifically need to have the full result available. If you do need the full results generated on the backend, you could design the Spark queries so they materialize the result (ideally to cloud storage) and then all you need is to have your API layer access the storage where Spark writes the results.
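To make the pagination idea concrete, here is a rough sketch against a hypothetical API layer (the endpoint and the from/size parameters are illustrative, not Livy's actual contract; global fetch assumes Node 18+ or a browser):

// Fetch one page of query results from the API layer
async function fetchPage(from, size) {
  const res = await fetch(`http://api.example.com/results?from=${from}&size=${size}`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json(); // assumed to be an array of rows
}

// The web app only ever asks for the slice it needs to render
(async () => {
  const page = await fetchPage(500, 100);
  console.log(page.length, 'rows');
})();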
We've got an application in Django running against a PGSQL database. One of the functions we've grown to support is real-time messaging to our UI when data is updated in the backend DB.
So... for example we show the contents of a customer table in our UI, as records are added/removed/updated from the backend customer DB table we echo those updates to our UI in real-time via some redis/socket.io/node.js magic.
Currently we've rolled our own solution for this entire thing using overloaded save() methods on the Django table models. That actually works pretty well for our current needs, but as tables continue to grow into gigabytes of data, it is starting to slow down on some of the larger ones: our engine has to dig through the currently 'subscribed' UIs and work out which updates need to be messaged to which clients.
Curious what other options might exist here. I believe MongoDB and other NoSQL-type engines support some constructs like this out of the box, but I'm not finding an exact hit when Googling for better solutions.
Currently we've rolled our own solution for this entire thing using overloaded save() methods on the Django table models.
Instead of working at the app level, you might want to work at the lower database level.
Add a PostgreSQL trigger after row insertion, and use pg_notify to notify external apps of the change.
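For example, a sketch of the trigger setup, run here from Node with the pg client (the table, function and trigger names are made up; on PostgreSQL 11+ you can write EXECUTE FUNCTION instead of EXECUTE PROCEDURE):

const { Client } = require('pg');

// Notify the 'channelName' channel with the new row as JSON
// whenever a customer row is inserted or updated; the channel
// name must match the one passed to addChannel below
const ddl = `
  CREATE OR REPLACE FUNCTION notify_customer_change() RETURNS trigger AS $$
  BEGIN
    PERFORM pg_notify('channelName', row_to_json(NEW)::text);
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  DROP TRIGGER IF EXISTS customer_change_trigger ON customer;
  CREATE TRIGGER customer_change_trigger
    AFTER INSERT OR UPDATE ON customer
    FOR EACH ROW EXECUTE PROCEDURE notify_customer_change();
`;

(async () => {
  const client = new Client({ connectionString: 'postgres://username@localhost/database' });
  await client.connect();
  await client.query(ddl);
  await client.end();
})();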
Then in NodeJS:
var PGPubsub = require('pg-pubsub');

// Connection string format: postgres://user@host/database
var pubsubInstance = new PGPubsub('postgres://username@localhost/database');

// The channel name must match the first argument of pg_notify
pubsubInstance.addChannel('channelName', function (channelPayload) {
  // Handle the notification and its payload
  // If the payload was JSON it has already been parsed for you
});
See the pg-pubsub documentation for more details and examples.
And you can do the same in Python: https://pypi.python.org/pypi/pgpubsub/0.0.2.
Finally, you might want to use data partitioning in PostgreSQL. Long story short, PostgreSQL already has everything you need :)
Couch has a REST interface.
This means that data-updates are exclusive to PUT calls.
I'm investigating ways to implement humble analytics counters, and came across the features of CouchDB, Sofa and CouchApp, which are kind of cool given my strong JavaScript orientation.
However, most web-analytics services end up making count-update calls by requesting some resource, usually via an IMG or SCRIPT tag.
Is there a way I can use CouchApp to use GET requests to perform my counts? Would that be an abuse of the architecture? I mean, not everything in Couch is REST, e.g. the administration parts are not.
I'd be very happy to hear what the experts have to say :)
** Edited **
I just noted that CouchDB and Sofa are shipped with a Mochiweb web-server!
Maybe there's a way I could hook on that?
Fork or plugin idea
If you are an Erlang programmer (or you're looking for a new project to learn Erlang), then you definitely can write anything you want as a plugin/extension to CouchDB. The smallest example I know of is Die CouchDB, my proof of concept, which adds one endpoint that will simply stop the server.
https://github.com/iriscouch/die_couchdb
You could in principle write a plugin or fork of CouchDB to handle GET requests and do anything with them.
Note about REST architecture
I am not super familiar with analytics implementations, but the point of REST and HTTP is that GET requests have no side effects and are idempotent (running 50 of them is no different from running one).
The upshot is that proxies can and will cache many GET responses, in both standard and nonstandard ways. That seems incompatible with user-tracking and data-gathering techniques; however, maybe analytics tools still think the benefits outweigh the costs.
For most people, it's probably easier to use external tools.
Log idea
One trick is to GET anything from Couch, and then check the corresponding entry in the Couch log. You can read the log by querying /_log as an admin. The log will show the user's IP address, request path, and any query parameters.
For example
$ curl -X GET http://localhost:5984/?userid=abcde\&windowsize=1024x768\&color=blue
{"couchdb":"Welcome","version":"1.1.0"}
$ curl localhost:5984/_log | grep userid
[Mon, 23 May 2011 00:34:54 GMT] [info] [<0.1409.0>] 127.0.0.1 - - 'GET' /?userid=abcde&windowsize=1024x768&color=blue 200
Next you can process that log entry and re-insert into your actual analytics database yourself.
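A rough sketch of that processing step (the admin credentials and the log-format regex here are assumptions; global fetch needs Node 18+):

// Pull the log as an admin and extract the query parameters
const auth = 'Basic ' + Buffer.from('admin:password').toString('base64');

(async () => {
  const res = await fetch('http://localhost:5984/_log', {
    headers: { Authorization: auth }
  });
  const text = await res.text();

  for (const line of text.split('\n')) {
    // Match entries like: 'GET' /?userid=abcde&color=blue 200
    const match = line.match(/'GET' (\/\?\S+) \d+/);
    if (!match) continue;
    const params = new URL('http://x' + match[1]).searchParams;
    if (params.has('userid')) {
      // Re-insert into your actual analytics database here
      console.log(params.get('userid'), params.get('windowsize'));
    }
  }
})();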
Wrapper idea
A final solution is to run a simple reverse proxy which converts your GET requests into whatever you need. Node.js is getting popular for tasks like that, but you can use any web platform you prefer: PHP, ASP, JSP, whatever you already know.
You simply respond to the GET request and do whatever you need on the server side, such as inserting the relevant information into your analytics DB.
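A bare-bones sketch of that idea in Node (the recordHit stub stands in for the analytics insert; everything here is illustrative):

const http = require('http');

http.createServer((req, res) => {
  const params = new URL(req.url, 'http://localhost').searchParams;

  // Record the hit on the side (stubbed here)
  recordHit(Object.fromEntries(params));

  // Respond with a 1x1 transparent GIF so the URL can sit in an IMG tag
  const gif = Buffer.from(
    'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7',
    'base64'
  );
  res.writeHead(200, { 'Content-Type': 'image/gif', 'Cache-Control': 'no-store' });
  res.end(gif);
}).listen(8080);

function recordHit(fields) {
  // e.g. insert `fields` into CouchDB or any other analytics store
  console.log('hit:', fields);
}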
Good luck!