GeoMesa Accumulo CRUD data operations using WFS (GeoServer - Accumulo)

I have created a GeoMesa Accumulo datastore and can query features from the command line. Now I want to perform data operations using the Open Geospatial Consortium's (OGC) Web Feature Service (WFS) for creating, modifying, and exchanging vector-format geographic information. I don't want to create a proxy client or deal with Thrift to operate on Accumulo storage programmatically. What other techniques are there to insert data and read it using filters against a GeoMesa Accumulo store?

GeoMesa integrates with GeoServer to support such use cases.
Using WFS to read data is a very common use case. To write data to a layer in GeoServer, you'll want to check out WFS Transactions (also called WFS-T). I've used both successfully with GeoMesa and GeoServer.
Check out http://www.geomesa.org/documentation/user/architecture.html#geomesa-and-geoserver for background about GeoMesa and GeoServer. For setting up GeoMesa's Accumulo DataStore with GeoServer, see http://www.geomesa.org/documentation/user/accumulo/install.html#installing-geomesa-accumulo-in-geoserver. General GeoMesa-GeoServer usage is documented here: http://www.geomesa.org/documentation/user/geoserver.html.
For GeoServer's WFS details, I'd suggest reading through http://docs.geoserver.org/latest/en/user/services/wfs/reference.html and checking out the demos which come with GeoServer (http://docs.geoserver.org/latest/en/user/configuration/demos/index.html#demos).
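To make the read/write flow concrete, here is a rough Python sketch of a WFS GetFeature read with a CQL filter and a WFS-T Insert. It assumes a local GeoServer and a made-up workspace/layer (myws:mylayer) with made-up attributes (geom, dtg, name); the URL, credentials, layer name, and namespace URI are all placeholders to replace with your own setup.

```python
# Sketch: reading and writing a GeoServer layer (backed by GeoMesa Accumulo)
# over WFS / WFS-T. Layer name, attributes, namespace URI, URL, and
# credentials are hypothetical placeholders.
import requests

GEOSERVER = "http://localhost:8080/geoserver"
WFS_URL = f"{GEOSERVER}/wfs"
AUTH = ("admin", "geoserver")

# --- Read: WFS GetFeature with a CQL filter, returned as GeoJSON ---
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "myws:mylayer",
    "outputFormat": "application/json",
    "cql_filter": "BBOX(geom, -80, 30, -70, 40) "
                  "AND dtg DURING 2024-01-01T00:00:00Z/2024-01-02T00:00:00Z",
}
resp = requests.get(WFS_URL, params=params, auth=AUTH)
resp.raise_for_status()
print(len(resp.json()["features"]))

# --- Write: WFS-T (Transaction) Insert with a GML-encoded feature ---
# The myws namespace URI must match the workspace namespace configured in
# GeoServer; axis order (lat/lon vs lon/lat) depends on the srsName form.
insert_xml = """<wfs:Transaction service="WFS" version="1.1.0"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:myws="http://example.com/myws">
  <wfs:Insert>
    <myws:mylayer>
      <myws:geom>
        <gml:Point srsName="EPSG:4326"><gml:pos>-75.0 40.0</gml:pos></gml:Point>
      </myws:geom>
      <myws:dtg>2024-01-01T12:00:00Z</myws:dtg>
      <myws:name>example-feature</myws:name>
    </myws:mylayer>
  </wfs:Insert>
</wfs:Transaction>"""
resp = requests.post(WFS_URL, data=insert_xml,
                     headers={"Content-Type": "text/xml"}, auth=AUTH)
resp.raise_for_status()
print(resp.text)  # check the response for a non-zero totalInserted count
```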

Related

How to process large .kryo files for graph data using TinkerPop/Gremlin

I am new to Apache TinkerPop.
I have done some basic things like installing the TinkerPop Gremlin Console, creating a graph .kryo file, loading it in the Gremlin Console, and executing some basic Gremlin queries. All good so far.
But I wanted to check how we can process .kryo files that are very large, say more than 1000 GB. If I create a single .kryo file, loading it in the console (or via code) does not seem feasible to me.
Is there any way to deal with graph data that is this large?
Basically, I have some graph data stored in Amazon Neptune; I want to export it, store it in files (e.g. .kryo), and process it later with Gremlin queries. Thanks in advance.
Rather than use Kryo, which is Java-specific, I would recommend something more language-agnostic such as CSV files; they are also easy to stream (see the sketch after the links below). If you are using Amazon Neptune, you can use the Neptune Export tool to export your data as CSV files.
Documentation
Git repo
Cloud Formation template
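To give a feel for why CSV helps at this scale, here is a small sketch that streams vertex/edge CSVs line by line instead of materialising one huge file in memory. The file names and the ~id/~label/~from/~to column headers follow the Gremlin CSV load format that Neptune export uses, but treat the exact headers as an assumption to verify against your own export.

```python
# Sketch: stream large edge CSVs (Neptune export / Gremlin load format)
# instead of loading a single huge .kryo file. Column names (~label,
# ~from, ~to) and file names are assumptions -- check your export.
import csv
from collections import defaultdict

def count_edges_by_label(edge_csv_path):
    """Stream the edge file once and tally edge labels without holding the graph in memory."""
    counts = defaultdict(int)
    with open(edge_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["~label"]] += 1
    return counts

def neighbours_of(vertex_id, edge_csv_path):
    """Yield ids of vertices directly reachable from vertex_id (a one-hop out() traversal)."""
    with open(edge_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["~from"] == vertex_id:
                yield row["~to"]

if __name__ == "__main__":
    print(count_edges_by_label("edges.csv"))
    print(list(neighbours_of("v1", "edges.csv")))
```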

Is it possible to create a graph database using AQL in Arangodb?

It seems the options to create a graph within ArangoDB are:
The Web Interface
Arangosh using the general-graph module
The provided drivers using the object-based API
The HTTP API
Is it possible to create the necessary components to build a graph using AQL?
For background, I am trying to assess options for bootstrapping graphs in different environments and potentially performing migrations in production environments.
No, at the moment AQL is only a DML (data manipulation language), not a DDL (data definition language).
To create a graph, please use one of the other methods you listed.
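If you end up scripting this (e.g. for bootstrapping or migrations), here is a rough sketch using the python-arango driver, i.e. the "provided drivers" route; the host, credentials, graph name, and collection names are placeholders, and the exact driver API may differ between versions.

```python
# Sketch: creating a named graph through a driver instead of AQL.
# Host, credentials, graph and collection names are placeholders.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("_system", username="root", password="passwd")

# Create the graph and its edge definition if it doesn't exist yet.
if not db.has_graph("social"):
    db.create_graph(
        "social",
        edge_definitions=[{
            "edge_collection": "knows",
            "from_vertex_collections": ["persons"],
            "to_vertex_collections": ["persons"],
        }],
    )
```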

Geolocation App Google Cloud

I have no experience with geolocation-based apps and want to build one with a backend written in Node.js, running on Google Cloud.
My main problem is how to design the database and which one to use (Bigtable or Datastore). The main query is to find places within a given radius of a location. I have read a lot about geohashes, but the Node.js libraries aren't very good right now.
So what do you recommend for choosing and designing the database?
If you want to store the data in relational format, perform frequent joins between location/coordinates, and the amount of data being processed is relatively small (under ~50 GB), then go for Google Cloud SQL.
Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It has great integration with most of the Apache projects.
If there is no requirement for the data to be in relational format, and frequent inserts and updates are required on huge amounts of data, go for Google Cloud Datastore. The querying process is slightly different and can be difficult for a newcomer to understand.
You can also use Google BigQuery, which processes TBs of data within a few seconds, if frequent inserts and updates are not required. It is more of an analytical data store.
Have a look at the following URL for better insights: https://cloud.google.com/storage-options/
Google has also announced Cloud Spanner, which is a relational database service that offers strong consistency and speed (still in beta). It is at an early stage, but it could change the SQL vs NoSQL trade-off.
All of the above databases have querying libraries written for Node.js.
GeoMesa, an Apache licensed open source suite of tools that enables large-scale geospatial analytics, works with Cloud Bigtable. I don't know how well this will interact with node.js, but it's worth considering a framework like GeoMesa since it will likely enable you to focus more on your core product.
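Whichever store you pick, the "places within a radius" query usually boils down to a coarse pre-filter on the indexed columns (bounding box or geohash prefix) followed by an exact distance check in application code. Here is a minimal, store-agnostic sketch of that pattern; it is in Python for brevity even though your backend is Node.js, and the candidate lookup against the database is left abstract.

```python
# Sketch: radius filtering for a geolocation query. The coarse candidate
# lookup (bounding box / geohash prefix scan against Bigtable, Datastore,
# Cloud SQL, ...) is left abstract; this shows only the exact-distance step.
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bounding_box(lat, lon, radius_m):
    """Rough lat/lon box containing the radius -- use it for the indexed pre-filter."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / max(math.cos(math.radians(lat)), 1e-12)
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

def places_within(center, radius_m, candidates):
    """Exact filter over candidate rows fetched with the bounding-box query."""
    lat0, lon0 = center
    return [p for p in candidates
            if haversine_m(lat0, lon0, p["lat"], p["lon"]) <= radius_m]

# Example usage with in-memory candidates standing in for a DB scan.
places = [{"name": "cafe", "lat": 52.52, "lon": 13.405},
          {"name": "park", "lat": 52.60, "lon": 13.50}]
print(places_within((52.52, 13.40), 2_000, places))
```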

How to query a geomesa-accumulo feature using the command line

I ingested data into GeoMesa Accumulo using SFTs and converters. The data was ingested successfully and I can visualise it using the GeoServer plugin. I want to filter feature data from the command line, but I am not able to find any commands to do so. Please correct me if I am wrong, but I want to query the feature data set just like you would in an RDBMS.
The GeoMesa command line tools support querying via the 'export' command. This command uses CQL (which is the same query language that GeoServer supports).
Check out these links for more about the GeoMesa export command.
http://www.geomesa.org/documentation/user/accumulo/commandline_tools.html#export
http://www.geomesa.org/documentation/user/accumulo/examples.html#exporting-features
For more about CQL, see the GeoTools (http://docs.geotools.org/latest/userguide/library/cql/cql.html) and GeoServer documentation (http://docs.geoserver.org/stable/en/user/tutorials/cql/cql_tutorial.html, and http://docs.geoserver.org/latest/en/user/filter/ecql_reference.html).
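As a rough illustration of scripting an export with a CQL filter, something along these lines can work; the flag names and connection parameters are assumptions that vary between GeoMesa versions, so confirm them with the export command's --help output and the docs linked above.

```python
# Sketch: invoking the GeoMesa Accumulo 'export' command with a CQL filter
# from a script. Flag names, connection parameters, and catalog/feature
# names are placeholders/assumptions -- check your GeoMesa version's help.
import subprocess

cmd = [
    "geomesa", "export",
    "-u", "accumulo_user",
    "-p", "accumulo_password",
    "-i", "my_instance",      # Accumulo instance name (placeholder)
    "-z", "zoo1,zoo2,zoo3",   # ZooKeeper hosts (placeholder)
    "-c", "my_catalog",       # GeoMesa catalog table (placeholder)
    "-f", "my_feature",       # simple feature type name (placeholder)
    "-q", "BBOX(geom, -80, 30, -70, 40) AND name = 'foo'",  # CQL filter
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)
```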

NoSQL database: ArangoDB

I have been looking for a database that can be embedded and is also file-based, like SQLite.
I wanted a NoSQL type of database with this kind of feature.
The language is Python, and ArangoDB has bindings for Python and many other languages.
I am finding conflicting facts about ArangoDB.
In some cases I have seen articles say it is not an embedded DB, or can't be embedded, and then others that imply it is embedded.
Also, the website says that it stores its data in a special binary format, and then I see an article saying it's mainly an in-memory database.
So it's been very confusing.
1) So the question is: can this database run embedded in a Python app?
If not, and it runs as a separate process or server, can this be set up and managed from Python with "zero configuration" on the part of the user, for the sake of deploying a desktop app based on it?
2) Does the database data get stored on disk?
So that is it!
No, you can't embed ArangoDB the way you embed SQLite.
ArangoDB offers the Foxx framework, which you can use to implement RESTful microservices in JavaScript close to the database core, much like you would use Python with SQLite. With AQL, ArangoDB also offers a query language, just as SQLite does with SQL.
There are currently several Python drivers available that give you comfortable access to ArangoDB from Python.
The ArangoDB download page offers several packages that you could use to deploy ArangoDB alongside your app. We offer a Windows zip package that you can install yourself without user interaction; for Linux distributions you'd probably want to use the respective package for that distribution. Easy deployability is one of our core goals.
Regarding the database and your data itself, this gets persisted to disk. This works via memory-mapped files. However, the indexes and other structures are built up during startup, which is why we refer to ourselves as mostly in-memory.
Regular access to ArangoDB (and Foxx) is done via the HTTP interface, and you get JSON documents as responses. The drivers abstract that interface for you. If you implement Foxx apps, you may need to formulate requests on your own.
ArangoDB data files aren't intended to be moved across machines, though it may work as long as you have the same OS and architecture on both sides. The proper way of doing this is to use arangodump on the first machine and arangorestore on the second. The dumps are mostly JSON inside (one JSON document per line), so they're portable and even simple to load in Python - you could even invoke the dump facility directly from Python and prepare an email for the user with the content.
The most sustainable way of running ArangoDB would be as a service; please note that you may need elevated privileges to register and re/start services on Windows. The service then binds a TCP port, which you can access from other nodes in the network.
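Since access goes over HTTP with JSON responses, here is a quick sketch of querying from Python without a driver, using the cursor endpoint and an AQL query; the host, credentials, database, and collection names are placeholders.

```python
# Sketch: talking to ArangoDB's HTTP interface directly from Python.
# Runs an AQL query through the cursor API; host, credentials, database,
# and collection names are placeholders.
import requests

ARANGO = "http://localhost:8529"
AUTH = ("root", "passwd")

resp = requests.post(
    f"{ARANGO}/_db/_system/_api/cursor",
    json={
        "query": "FOR doc IN places FILTER doc.city == @city RETURN doc",
        "bindVars": {"city": "Berlin"},
        "batchSize": 100,
    },
    auth=AUTH,
)
resp.raise_for_status()
body = resp.json()
print(body["result"])   # first batch of matching documents
print(body["hasMore"])  # True if more batches remain (fetch them via the cursor id)
```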
