Optimizing GraphQL queries by including fields only if they hold a value? - node.js

I have no idea if this is a good idea or really bad practice, so that's what I'd like to know.
Say I have a GraphQL server which sends back stuff like this:
[
{
name: "John"
age: 55
isSingle: false
hasChildren: false
lovesMusic: false # imagine more fields like these
},
# ...loads of records of the same format here
]
Say the file is large and there are lots of records: wouldn't it be a good idea to strip out the fields whose value is false and only send back isSingle / hasChildren / lovesMusic when they are true, in order to save on bandwidth? Or is this a bad idea that makes everything prone to error?
So in this example, I'd just send back:
[
{
name: "John"
age: 55
},
...
]
I am worried because I have large GraphQL objects coming back from my queries, and much of it is just fields that aren't filled in and that I thus won't need in my frontend.

To my understanding, the whole point of GraphQL is to receive exactly the specified set of types and fields: not more, not less.
Altering the results in some middleware, or deleting fields containing 0, null, false, or undefined, would violate the GraphQL specification. There's no point in using a convention if you are going to violate it anyway.
Nevertheless, there are some options to consider for performance and bandwidth optimization:
using shorter names
You could use shorter field names in your GraphQL schema. You could even come up with a compression scheme of your own: for example, replace logical names with single characters and map them back to regular names on the front-end.
{
name: "John"
age: 55
isSingle: false
hasChildren: false
lovesMusic: false
}
would become something like this:
{
n: "John"
a: 55
s: f
hC: f
lM: f
}
Then you could map the field names and values back on the front-end, sacrificing some performance of course. You just need a dictionary object in JavaScript which stores the shorter field names as keys and the longer ones as values:
const adict = {
"hC": "hasChildren",
"lM": "lovesMusic",
"n": "name",
...
}
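Mapping back on the front-end could then be a small helper like this (a sketch; the expand function and the 'f'/'t' value encoding are made up for illustration):
function expand(record) {
  const out = {};
  Object.keys(record).forEach(function (shortKey) {
    const value = record[shortKey];
    const longKey = adict[shortKey] || shortKey; // fall back to the short name
    // Undo the hypothetical value encoding: 'f' -> false, 't' -> true.
    out[longKey] = value === 'f' ? false : value === 't' ? true : value;
  });
  return out;
}

// expand({ n: "John", a: 55, hC: "f" })
// -> { name: "John", age: 55, hasChildren: false }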
Although this approach will shrink your data and save some bandwidth, it is not the best practice application-wise: it will make your code less flexible and less readable.
external libraries
There are quite a few external libraries that could help you reduce the data load, the most popular being MessagePack and Protocol Buffers. These are very efficient, cleverly built technologies, but researching and integrating them into your solution may take some time.
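For a rough idea of what that integration looks like, here is a sketch with msgpack-lite (one of several MessagePack implementations for node.js):
var msgpack = require('msgpack-lite');

// Encode a record into a compact binary buffer before sending it...
var buffer = msgpack.encode({ name: 'John', age: 55, isSingle: false });

// ...and decode it back into a plain object on the receiving side.
var record = msgpack.decode(buffer);
console.log(record.name); // "John"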

Related

Using Input types for query arguments

Is it bad practice to use GraphQL input types for query arguments, like it's commonly done in mutations? For example:
createPost(input: CreatePostInput!): CreatePostPayload
I see that a lot of APIs use separate queries to fetch an entity by different fields, for example:
userByEmail(email: String!): User
userByName(name: String!): User
as opposed to
user(email: String, name: String): User
This makes sense, but some queries end up requiring more than one argument, for example paginated results. One might need to modify the start/end cursor, results per page, ordering, and so on, plus the main query might need more than one argument to find the entities, so these queries end up with 5-6 different arguments.
The question is, why don't people use input types for these?
To the answer below, I can't help but wonder why people deem this
query ($category: String, $perPage: Number, $page: Number, $sortBy: String) {
posts(category: $category, perPage: $perPage, page: $page, sortBy: $sortBy) {
...
}
}
friendlier than this
query($input: PostQueryInput) {
posts(input: $input) {
...
}
}
Is it because input types can only contain primitives? I find it really confusing why it's better in one case and worse in another.
I know that people are not forced to do this, you can do it as you like, but in the majority of GraphQL APIs I think it is not done like this, and I wonder why that might be; people must have a reason not to do this.

Is it possible to update complex data structures in chrome.storage without re-saving everything?

If I want to store persistent data from a chrome extension, I write an object to chrome.storage:
chrome.storage.sync.set({key: value});
As far as I know this means I have to store all my extension's data in a single object. This is fine, but what if I want to update only one field in that data structure? Imagine my chrome.storage.sync is
{
friends: [
{ name: "Billy", age: 28 }
],
enemies: [
{ name: "Penny", age: 18 },
{ name: "Mortimer", age: 51 }
]
}
Now suppose I want to go in and just update Mortimer's age. I have to reconstruct the entire above object, modify the appropriate field, and write the whole object back to storage. Is there no way I can just update a single field?
The assumption in your question is not correct: "this means I have to store all my extension's data in a single object."
You may use as many keys as you wish: for example, one key for the list of friends, and one key for each friend, mapped to the first list by some unique id.
That said, this kind of partial update is much easier with IndexedDB or the deprecated Web SQL, since with chrome.storage you could still have some large objects that would need a full re-save, like the list of friends.
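For example, giving each person a top-level key of their own lets you rewrite just that one entry (the key naming scheme here is made up):
// Initial save: one key per person instead of one big object.
chrome.storage.sync.set({
  'enemy_penny': { name: 'Penny', age: 18 },
  'enemy_mortimer': { name: 'Mortimer', age: 51 }
});

// Later: update Mortimer's age without touching any other key.
chrome.storage.sync.get('enemy_mortimer', function (items) {
  var mortimer = items['enemy_mortimer'];
  mortimer.age = 52;
  chrome.storage.sync.set({ 'enemy_mortimer': mortimer });
});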

How do I define Sequelize.STRING length?

I want to define the length of a datatype in Sequelize.
Here is my source code:
var Profile = sequelize.define('profile', {
public_id: Sequelize.STRING,
label: Sequelize.STRING
})
It creates a table profiles with a public_id column of datatype varchar(255).
I would like public_id to be varchar(32).
I searched the docs and Stack Overflow but couldn't find an answer.
How can I do that?
As it is mentioned in the documentation, use:
Sequelize.STRING(32)
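Applied to the model from the question, that looks like this:
var Profile = sequelize.define('profile', {
  public_id: Sequelize.STRING(32), // varchar(32)
  label: Sequelize.STRING          // varchar(255), the default
});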
First, I think you need to rethink your design just a little bit. The basic point is that length constraints should be meaningful, not just there to save space. PostgreSQL does not store 'A'::varchar(10) any differently than it does 'A'::text (both are stored as variable length text strings, only as long as the value stored, along with a length specifier and some other metadata), so you should use the longest size that can work for you, and use the lengths for substantive enforcement rather than to save space. When in doubt, don't constrain. When you need to make sure it fits on a mailing label, constrain appropriately.
Secondly, Dankohn's answer above:
var Profile = sequelize.define('PublicID', {
  public_id: {
    type: Sequelize.STRING,
    validate: { len: [0, 32] }
  }
});
is how you would then add such enforcement on the application side. Again, such enforcement should be based on what you know you need, not just what seems like a good idea at the time; and while it is generally easier to relax constraints than to tighten them, for string length it's really a no-brainer to do things the other way.
As for using such in other applications, you'd probably want to look up the constraint info in the system catalogs, which gets you into sort of advanced territory.

How dangerous is a mongo query which is fed directly from a URL query string?

I am playing around with node.js, express, and mongoose.
For the sake of getting something up and running right now, I am passing the Express query string object directly to a Mongoose find function. What I am curious about is how dangerous this practice would be in a live app. I know that an RDBMS would be extremely vulnerable to SQL injection. Aside from the good advice of "sanitize your inputs", how evil is this code:
app.get('/query', function (req, res) {
models.findDocs(req.query, function (err, docs) {
res.send(docs);
});
});
Meaning that a GET request to http://localhost:8080/query?name=ahsteele&status=a would just shove the following into the findDocs function:
{
name: 'ahsteele',
status: 'a'
}
This feels icky for a lot of reasons, but how unsafe is it? What's the best practice for passing query parameters to mongodb? Does express provide any out of the box sanitization?
As far as injection being a problem like it is with SQL, the risk is significantly lower... albeit theoretically possible via an unknown attack vector.
The data structures and protocol are binary and API driven rather than leveraging escaped values within a domain-specific-language. Basically, you can't just trick the parser into adding a ";db.dropCollection()" at the end.
If it's only used for queries, it's probably fine... but I'd still caution you to use a tiny bit of validation:
Ensure only alphanumeric characters (filter or invalidate nulls and anything else you wouldn't normally accept)
Enforce a max length (like 255 characters) per term
Enforce a max length of the entire query
Strip special parameter names starting with "$", like "$where" & such
Don't allow nested arrays/documents/hashes... only strings & ints
Also, keep in mind, an empty query returns everything. You might want a limit on that return value. :)
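A minimal sketch of those checks as a filter in front of the query (the sanitizeQuery helper is made up; adjust the rules to what you actually accept):
function sanitizeQuery(query) {
  var clean = {};
  Object.keys(query).forEach(function (key) {
    var value = query[key];
    if (key.charAt(0) === '$') return;           // strip operator names like $where
    if (typeof value !== 'string') return;       // no nested arrays/documents
    if (value.length > 255) return;              // max length per term
    if (!/^[a-zA-Z0-9]*$/.test(value)) return;   // alphanumeric only
    clean[key] = value;
  });
  return clean;
}

app.get('/query', function (req, res) {
  models.findDocs(sanitizeQuery(req.query), function (err, docs) {
    res.send(docs);
  });
});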
Operator injection is a serious problem here, and I would recommend you at least encode/escape certain characters, more specifically the $ symbol: http://docs.mongodb.org/manual/faq/developers/#dollar-sign-operator-escaping
If users are allowed to put a $ symbol at the beginning of strings or elements within your $_GET or $_POST or whatever, they will quickly use that to: http://xkcd.com/327/ and you will be a goner, to say the least.
As far as I know, Express doesn't provide any out-of-the-box sanitization. You can either write your own middleware or do some basic checks in your own logic. And as you said, the case you mention is a bit risky.
But for ease of use, the required types built into Mongoose models at least give you default sanitization and some control over what gets in and what doesn't.
E.g. something like this:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var Person = new Schema({
title : { type: String, required: true }
, age : { type: Number, min: 5, max: 20 }
, meta : {
likes : [String]
, birth : { type: Date, default: Date.now }
}
});
Check this for more info: http://mongoosejs.com/docs/2.7.x/docs/model-definition.html
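A quick sketch of that enforcement in action (the model registration and variable names here are illustrative):
var PersonModel = mongoose.model('Person', Person);

var doc = new PersonModel({ age: 42 }); // no title, and age is above the max of 20
doc.save(function (err) {
  // err is set: "title" is required and 42 fails the max validator,
  // so the invalid input never reaches MongoDB.
  console.log(err ? 'rejected by schema validation' : 'saved');
});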

CouchDB: Single document vs "joining" documents together

I'm trying to decide on the best approach for a CouchApp (no middleware). Since there are similarities to my idea, let's assume we have a Stack Overflow page stored in CouchDB. In essence it consists of the actual question on top, answers, and comments. Those are basically three layers.
There are two ways of storing it: either within a single document containing a suitable JSON representation of the data, or with each part of the entry stored in a separate document and combined later through a view (similar to this: http://www.cmlenz.net/archives/2007/10/couchdb-joins).
Now, both approaches may be fine, yet both have massive downsides from my current point of view. Storing a busy document (many changes by multiple users are expected) as a single entity would cause conflicts. If user A stores his/her changes to the document, user B would receive a conflict error once he/she finishes typing his/her update. I imagine it's possible to fix this without the user's knowledge by re-downloading the document before retrying.
But what if the document is rather big? I expect them to become rather bloated over time, which would put a noticeable delay on the save process, especially if the retry has to happen multiple times because many users are updating the document at the same time.
Another problem I see is editing. Every user should be allowed to edit his/her own contributions. Now, if they're stored within one document, it might be hard to write a solid auth handler.
Ok, now let's look at the multiple-documents approach. Question, answers, and comments would be stored in their own documents. Advantage: only the actual owner of a document can cause conflicts, something that won't happen too often. Being rather small elements of the whole, re-downloading wouldn't take much time. Furthermore, the auth routine should be quite easy to realize.
Now here's the downside. The single document is really easy to query and display. Having a lot of unsorted snippets lying around seems messy, since I haven't managed to get a view to present me with a 100% ready-to-use JSON object containing the entire item in an ordered and structured format.
I hope I've been able to communicate the actual problem. I'm trying to decide which solution is more suitable for me, and which problems are easier to overcome. I imagine the first solution to be the prettier one in terms of storage and querying, yet the second the more practical one, solvable through better key management within the view (I'm not entirely into the principle of keys yet).
Thank you very much for your help in advance :)
Go with your second option. It's much easier than having to deal with the conflicts. Here are some example docs showing how I might structure the data:
{
_id: 12345,
type: 'question',
slug: 'couchdb-single-document-vs-joining-documents-together',
markdown: 'Im tryting to decide the best approach for a CouchApp (no middleware). Since there are similarities to...' ,
user: 'roman-geber',
date: 1322150148041,
'jquery.couch.attachPrevRev' : true
}
{
_id: 23456,
type: 'answer',
question: 12345,
markdown: 'Go with your second option...',
user : 'ryan-ramage',
votes: 100,
date: 1322151148041,
'jquery.couch.attachPrevRev' : true
}
{
_id: 45678,
type: 'comment',
question: 12345,
answer: 23456,
markdown : 'I really like what you have said, but...' ,
user: 'somedude',
date: 1322151158041,
'jquery.couch.attachPrevRev' : true
}
To store revisions of each one, I would store the old versions as attachments on the doc being edited. If you use the jQuery client for CouchDB, you get this for free by setting jquery.couch.attachPrevRev = true. See Versioning docs in CouchDB by jchris.
Create a view like this
fullQuestion : {
map : function(doc) {
if (doc.type == 'question') emit([doc._id, null, null], null);
if (doc.type == 'answer') emit([doc.question, doc._id, null], null);
if (doc.type == 'comment') emit([doc.question, doc.answer, doc._id], null);
}
}
And query the view like this
http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=['12345']&endkey=['12345',{},{}]&include_docs=true
(Note: I have not URL-encoded this query, to keep it readable.)
This will get you all of the related documents for the question that you need to build the page. The only thing is that they will not be sorted by date. You can sort them on the client side (in JavaScript).
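Since every doc carries a numeric date field, that client-side sort is a one-liner (a minimal sketch):
results.rows.sort(function (a, b) {
  return a.doc.date - b.doc.date; // oldest first
});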
EDIT: Here is an alternative option for the view and query
Based on your domain, you know some facts. You know an answer can't exist before its question existed, and a comment on an answer can't exist before the answer existed. So let's make a view that might make it faster to build the display page, respecting the order of things:
fullQuestion : {
map : function(doc) {
if (doc.type == 'question') emit([doc._id, doc.date], null);
if (doc.type == 'answer') emit([doc.question, doc.date], null);
if (doc.type == 'comment') emit([doc.question, doc.date], null);
}
}
This will keep all the related docs together, and keep them ordered by date. Here is a sample query
http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=['12345']&endkey=['12345',{}]&include_docs=true
This will get back all the docs you will need, ordered from oldest to newest. You can now zip through the results, knowing that the parent objects will be before the child ones, like this:
function addAnswer(doc) {
$('.answers').append(answerTemplate(doc));
}
function addCommentToAnswer(doc) {
$('#' + doc.answer).append(commentTemplate(doc));
}
$.each(results.rows, function(i, row) {
if (row.doc.type == 'question') displayQuestionInfo(row.doc);
if (row.doc.type == 'answer') addAnswer(row.doc);
if (row.doc.type == 'comment') addCommentToAnswer(row.doc)
})
So then you don't have to perform any client-side sorting.
Hope this helps.
