Turn off JSON deserialization of results from Spanner - google-cloud-spanner

We pull many large-ish payloads from Spanner, and the current performance bottleneck is deserializing the results from Spanner into objects in memory. We don't need these objects, as the result is just going to pass through to another application. I haven't seen anything in the SDK documentation about this, but I assume there is a way to plug in a custom deserializer, or to turn deserialization off altogether. We currently use the JavaScript and Java SDKs, so any advice on either is appreciated. Thanks!
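For reference, a minimal sketch of the read path with the Java client (table and column names are illustrative): the rows arrive through a streaming ResultSet, and the cost we want to avoid is materializing each row into objects before the value is forwarded.

```java
import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.ResultSet;
import com.google.cloud.spanner.Statement;

public class PassThroughReader {
    // Sketch only: iterate the streaming ResultSet and forward each value
    // as-is instead of binding rows to application objects.
    static void streamPayloads(DatabaseClient client) {
        try (ResultSet rs = client.singleUse()
                .executeQuery(Statement.of("SELECT payload FROM events"))) { // hypothetical table
            while (rs.next()) {
                String payload = rs.getString("payload"); // hypothetical column
                // hand the raw value straight to the downstream application here
            }
        }
    }
}
```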

Related

Using ArangoDB to implement a queue

Is there a way to implement a thread-safe concurrent queue store using ArangoDB?
I read an article from RocksDB claiming that a scalable, persistent queue service can be implemented "easily" on top of a KV store. Does this apply to ArangoDB as well? I read somewhere that Arango uses RocksDB as the storage engine for its KV store, so I was wondering if someone has already tried this.
Thanks!
I have tried this, but for whatever reason (probably a bone-headed implementation on my part) I ran into resource contention issues (deadlocks), even at low usage rates.
ArangoDB does indeed use RocksDB as the default storage engine (MMFiles is deprecated) but doesn't expose RocksDB internals other than a few knobs to tweak for performance tuning. If you want something VERY similar to the RocksDB-based solution, ArangoDB is probably not what you are looking for, but ArangoDB does provide a sort of K/V solution.
Since ArangoDB only supports two "collection" types ("document" and "edge"), a K/V-store is really an implementation method, not an option you choose. Their idea is to use the native "_key" attribute (present in every document, unique, automatically indexed) with a single "value" attribute, creating a document like:
```json
{
  "_key": "my awesome key name",
  "value": "supercool"
}
```
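In the Java driver, that K/V pattern is just a keyed insert and lookup (a rough sketch; the collection name is hypothetical):

```java
import com.arangodb.ArangoCollection;
import com.arangodb.ArangoDatabase;
import com.arangodb.entity.BaseDocument;

public class KvExample {
    static void demo(ArangoDatabase db) {
        ArangoCollection kv = db.collection("kv"); // hypothetical collection
        BaseDocument doc = new BaseDocument("my awesome key name");
        doc.addAttribute("value", "supercool");
        kv.insertDocument(doc);

        // _key is automatically indexed, so this is a direct primary-index lookup
        BaseDocument fetched = kv.getDocument("my awesome key name", BaseDocument.class);
        System.out.println(fetched.getAttribute("value"));
    }
}
```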
Part of my use-case was to create a queue of "nonce" tokens that I would pick up when a request came in, to act as a sort of cheap resource governor. However, the queue quickly became overwhelmed when I pushed query intervals below one second, giving me deadlocks when it tried to access/lock tokens that were still being written.
Again, I believe this could have been sorted out, but the project went in a different direction and I never ended up troubleshooting it to completion.
Use transactions. For more details, check how ArangoDB implements Foxx queues (see the foxx/queues code on GitHub).
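As a rough sketch of that route (collection name hypothetical, query call as in the 6.x Java driver): a single AQL statement executes as its own transaction, so a token can be claimed and removed atomically, without a separate read-then-delete race.

```java
import com.arangodb.ArangoCursor;
import com.arangodb.ArangoDatabase;
import com.arangodb.entity.BaseDocument;

public class TokenQueue {
    // Claim one token atomically; returns null when the queue is empty.
    static BaseDocument claimToken(ArangoDatabase db) {
        String aql = "FOR t IN tokens LIMIT 1 REMOVE t IN tokens RETURN OLD";
        ArangoCursor<BaseDocument> cursor = db.query(aql, BaseDocument.class);
        return cursor.hasNext() ? cursor.next() : null;
    }
}
```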

NodeJS storing large object - JSON file vs Database

I am loading a few big JSON payloads from a 3rd-party API on server startup, writing them to .json files (around 150 MB each), and loading one into an object whenever I need to use it.
The thing is, I am not sure this is the right or most efficient way to do it. Should I use a database instead? If so, which one would you recommend?
Thanks.
Glad to answer your question.
Modern databases can already keep up with data of this size, so in this case size would not be the issue.
Performance, however, still depends on how the application uses the data and for what purpose.
For example, some applications need content caching; most databases have this built in, although there are also applications where it won't apply.
This question also discusses the comparison of disk storage and database storage; there are lots of good answers in there, and I hope it helps.

Does the Datastore API for NodeJS use `distinct on` by default?

https://cloud.google.com/datastore/docs/concepts/queries#datastore-distinct-on-query-nodejs
When reading the documentation about querying entities, I noticed that keys-only queries and projection queries without a distinct on clause are considered small operations, which, according to the quota and pricing documentation, are free.
However, looking at the examples for the different languages on that page, several (C#, Java, PHP, etc.) support a way of telling the query to explicitly perform a distinct on operation, but there doesn't seem to be a way to specify this directly in NodeJS, even though it can significantly affect cost.
What am I missing?
I don't think the NodeJS API uses distinct by default, though you should be able to do a simple test to confirm. Looking through the examples tells me that the NodeJS API uses slightly different terminology and calls it groupBy for fetching distinct results. Here is the link to the API documentation.
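For comparison, this is roughly what the explicit clause looks like in the Java client (kind and property names are illustrative), which appears to be what the Node client's groupBy corresponds to:

```java
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.ProjectionEntity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;

public class DistinctOnExample {
    public static void main(String[] args) {
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
        // Projection query with an explicit "distinct on" clause.
        Query<ProjectionEntity> query = Query.newProjectionEntityQueryBuilder()
                .setKind("Task")            // hypothetical kind
                .setProjection("category")
                .setDistinctOn("category")
                .build();
        QueryResults<ProjectionEntity> results = datastore.run(query);
        while (results.hasNext()) {
            System.out.println(results.next().getString("category"));
        }
    }
}
```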

Java SOAP client optimisations and async client

I need to consume a third-party web service exposed over SOAP, but the response contains very deep object graphs and the response time is very high: 40-60 seconds.
JAXB marshalling and unmarshalling also add significant latency on top of that. Is there a way to reduce the latency using Protobuf/Thrift? Also, some recent application modules have been migrated to Vert.x. While CXF has an async HTTP client built in, is there an equivalent module in Vert.x, and what is the advantage of using it over the CXF client?
You're describing two separate issues here.
1) The third party service takes 40-60 seconds to respond.
Most likely there is no way for you to speed up the response, so you have to deal with it as it is; the client you choose just depends on your application. Vert.x may help in this regard because it is asynchronous by design, so the long wait does not have to block a thread.
2) The object graph of the response is large and JAXB deserialization has serious overhead.
Most likely Thrift or Protocol Buffers won't help you much, because they are completely different technologies from SOAP/XML and you can't change the wire format of a third-party service. The issue you are having is probably that JAXB reads the entire message into memory and then creates a complete object graph for it, regardless of how much of the data you actually need. If you don't need all of it, you should investigate something like the Streaming API for XML (StAX) that is part of the Java platform. It allows you to parse an XML message without creating an object model for data that you do not need.
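For example, a minimal StAX scan that pulls just one repeated element out of a large SOAP response without building the rest of the object graph (the element name is hypothetical):

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class SoapResponseScanner {
    // Stream over the XML and extract only the <OrderId> values;
    // nothing else in the payload is materialized.
    static void printOrderIds(InputStream soapResponse) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(soapResponse);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "OrderId".equals(reader.getLocalName())) {
                    System.out.println(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
    }
}
```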

Azure Table Storage design question: Is it a good idea to use 1 table to store multiple types?

I'm just wondering if anyone who has experience with Azure Table Storage could comment on whether it is a good idea to use one table to store multiple types.
The reason I want to do this is so I can use transactions. However, I also want to get a sense of the development experience: would this approach be easy or messy to work with? So far I'm using Azure Storage Explorer to assist development, and viewing multiple types in one table has been messy.
To give an example, say I'm designing a community blogging site. If I store all blog posts, categories and comments in one table, what problems would I encounter? On the other hand, if I don't, how do I ensure some consistency between category and post, for example (assume one post can have one category)?
Or are there other approaches people take to get around this problem with table storage?
Thank you.
If your goal is to have perfect consistency, then using a single table is a good way to go about it. However, I think you are probably going to make things more difficult for yourself and get very little reward. The reason I say this is that table storage is extremely reliable. Transactions are great if you are dealing with very important data, but in most cases, such as a blog, I think you would be better off either 1) allowing for some very small percentage of inconsistent data or 2) handling failures in a more manual way.
The biggest issue you will have with storing multiple types in the same table is serialization. Most of the current table storage SDKs and utilities were designed to handle a single type. That said, you can certainly handle multiple schemas, either manually (i.e. deserializing your objects into a master object that contains all possible properties) or by interacting directly with the REST services (i.e. not going through the Azure SDK). If you use the REST services directly, you have to handle serialization yourself, so you can deal with the multiple types more efficiently, but the trade-off is that you are doing everything manually that is normally handled by the Azure SDK.
There really is no right or wrong way to do this. Both situations will work, it is just a matter of what is most practical. I personally tend to put a single schema per table unless there is a very good reason to do otherwise. I think you will find table storage to be reliable enough without the use of transactions.
You may want to check out the Windows Azure Toolkit. We designed that toolkit to simplify some of the more common Azure tasks.
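On the multiple-schema point, a rough sketch of the property-bag approach, assuming the later Java storage SDK's DynamicTableEntity (table, keys and property names are all hypothetical). Sharing a partition key across the different kinds is what makes entity group transactions possible:

```java
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.table.CloudTable;
import com.microsoft.azure.storage.table.DynamicTableEntity;
import com.microsoft.azure.storage.table.EntityProperty;
import com.microsoft.azure.storage.table.TableOperation;

public class MixedKindTable {
    public static void main(String[] args) throws Exception {
        CloudStorageAccount account =
                CloudStorageAccount.parse(System.getenv("AZURE_STORAGE_CONNECTION_STRING"));
        CloudTable table = account.createCloudTableClient().getTableReference("blogdata");
        table.createIfNotExists();

        // Same partition key for a post and its comment; the row key encodes the kind.
        DynamicTableEntity post = new DynamicTableEntity("post-123", "post");
        post.getProperties().put("Title", new EntityProperty("Hello world"));
        post.getProperties().put("Category", new EntityProperty("general"));

        DynamicTableEntity comment = new DynamicTableEntity("post-123", "comment-001");
        comment.getProperties().put("Author", new EntityProperty("alice"));
        comment.getProperties().put("Body", new EntityProperty("Nice post!"));

        table.execute(TableOperation.insertOrReplace(post));
        table.execute(TableOperation.insertOrReplace(comment));
    }
}
```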
