Custom Computed Etag for Express.js - node.js

I'm working on a simple local image server that provides images to a web application along with some JSON. The web application has pagination that sends a GET request such as "/images?page=X&limit=200" to an express.js server, which returns the JSON files in a single array. I want to take advantage of the browser's internal caching, so that if a user goes back to a previous page, express.js returns an ETag and the browser can reuse its cached copy. How can this be achieved with express.js? For this application, I really just want the computation of the ETag to take in three parameters: the page, the directory, and the limit (it doesn't need to consider the whole JSON body). Also, this application is for local use only, so I want the server to do the heavy lifting, since I figured it would be faster than the browser. I did see https://www.npmjs.com/package/etag, which seems promising, but I'm not sure how to use it with express.js.
Here's a boilerplate of the express.js code I have below:
var express = require('express');
var app = express();
var fs = require('fs');
var path = require('path');
app.get('/', async (req, res) => {
  let files = [];
  // readdirSync returns an array of file names (strings)
  let directory = fs.readdirSync("mypath");
  let page = parseInt(req.query.page, 10);
  let limit = parseInt(req.query.limit, 10);
  for (let i = 0; i < limit; ++i) {
    let name = directory[i + page * limit];
    if (!name) break; // past the last file in the directory
    files.push(new Promise((resolve, reject) => {
      fs.readFile(path.join("mypath", name), (err, data) => {
        if (err) return reject(err);
        // format the data so easy to use for UI
        resolve(JSON.parse(data));
      });
    }));
  }
  let results = await Promise.all(files);
  // compute an etag here and attach it to the results.
  res.send(results);
});
app.listen(3000);

When your server sends an ETag to the client, it must also be prepared to check the ETag that the client sends back to the server in the If-None-Match header in a subsequent "conditional" request.
If it matches, the server shall respond with status 304; otherwise there is no benefit in using ETags.
var serverEtag = "<compute from page, directory and limit>";
var clientEtag = req.get("If-None-Match");
if (clientEtag === serverEtag) res.status(304).end();
else {
  // Your code from above
  res.set("ETag", serverEtag);
  res.send(results);
}
The computation of the serverEtag could be based on the time of the last modification in the directory, so that it changes whenever any of the images in that directory changes. Importantly, this could be done without carrying out the fs.readFile statements from your code.
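As a rough illustration of that idea, here is a minimal sketch that derives the ETag from the directory's modification time together with the page and the limit. The helper name computeEtag is hypothetical, and the sketch assumes that the directory's mtime is a good enough change signal: on most file systems it typically changes when files are added, removed or renamed, but not necessarily when an existing file is rewritten in place.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: builds a strong ETag from the directory path, its
// last-modified time, the page and the limit. No image file is read.
function computeEtag(directory, page, limit) {
  const mtimeMs = fs.statSync(directory).mtimeMs;
  const hash = crypto
    .createHash('sha1')
    .update([directory, mtimeMs, page, limit].join('|'))
    .digest('hex');
  // ETag values are quoted strings on the wire.
  return '"' + hash + '"';
}
```

In the route, serverEtag would then be something like computeEtag("mypath", page, limit), computed before any fs.readFile call so that a matching If-None-Match header can be answered with 304 right away.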

Related

Node.js Express Temporary File Serving

I'm trying to do a reverse image search using googlethis on an image the user uploads. It supports reverse image searching, but only with a Google-reachable image URL. Currently, I upload the image to file.io, which deletes it after it gets downloaded.
This is the current application flow:
User POSTs file -> Server uploads file to file.io -> Google downloads the file -> Server does things with the reverse image search
However, I want to skip the middleman and have Google download files directly from the server:
User POSTs file -> Server serves file at unique URL -> Google downloads the file -> Server deletes the file -> Server does things with the reverse image search
I've looked at Serving Temporary Files with NodeJs but it just shows how to serve a file at a static endpoint. If I added a route to /unique-url, the route would stay there forever (a very slow memory leak! Probably! I'm not really sure!)
The only way I can think of is to save each file with a UUID and add a parameter: /download?id=1234567890, which would probably work, but if possible, I want to do things in memory.
So:
How do I do this using normal files?
How do I do this in-memory?
Currently working (pseudo) code:
app.post('/', (req, res) => {
  const imagePath = saveImageTemporarily(req)
  const tempUrl = uploadToFileIo(imagePath)
  const reverseImageResults = reverseGoogleSearch(tempUrl)
  deleteFile(imagePath)
  doThingsWithResults(reverseImageResults).then((result) => { res.send(result) })
})
The other answer is a good one if you are able to use Redis -- it offers lots of helpful features like setting a time-to-live on entries so they're disposed of automatically. But if you can't use Redis...
The basic idea here is that you want to expose a (temporary) URL like example.com/image/123456 from which Google can download an image. You want to store the image in memory until after Google accesses it. So it sounds like there are two (related) parts to this question:
Store the file in memory temporarily
Rather than saving it to a file, why not create a Buffer holding the image data? Once you're done with it, release your reference to the buffer and the Node garbage collector can dispose of it.
let image = Buffer.from(myImageData);
// do something with the image
image = null; // the buffer is now eligible for garbage collection
Serve the file when Google asks for it
This is a straightforward route which determines which image to serve based on a route parameter. The query parameter you mention will work, and there's nothing wrong with that. Or you could do it as a route parameter:
app.get('/image/:id', (req, res) => {
  const id = req.params.id;
  res.status(200).send(/* send the image data here */);
});
Putting it all together
It might look something like this:
// store image buffers here
const imageStore = {};
app.post('/image', (req, res) => {
  // get your image data here; there are a number of ways to do this,
  // so I leave it up to you
  const imageData = req.body;
  // and generate the ID however you want
  const imageId = generateUuid();
  // save the image in your store
  imageStore[imageId] = imageData;
  // return the image ID to the client
  res.status(200).send(imageId);
});
app.get('/image/:id', (req, res) => {
  const imageId = req.params.id;
  const image = imageStore[imageId];
  // unknown ID, or the image was already served and deleted
  if (!image) return res.sendStatus(404);
  // I don't know off the top of my head how to correctly send an image
  // like this, so I'll leave it to you to figure out. You'll also need to
  // set the appropriate headers so Google recognizes that it's an image
  res.status(200).send(image);
  // done sending? delete it!
  delete imageStore[imageId];
});
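If Redis is not available, the time-to-live behaviour can be approximated in process. The following is a minimal sketch, not a production store: the class name OneShotStore, its put/take methods and the 60-second default are all hypothetical. Each entry can be taken once and is dropped on its own if nobody fetches it in time.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory store: an entry is handed out at most once and
// expires after ttlMs if it is never requested.
class OneShotStore {
  constructor(ttlMs = 60000) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  // Stores a Buffer with its content type and returns a fresh random ID.
  put(buffer, contentType) {
    const id = crypto.randomUUID();
    const timer = setTimeout(() => this.entries.delete(id), this.ttlMs);
    timer.unref(); // do not keep the process alive just for cleanup
    this.entries.set(id, { buffer, contentType, timer });
    return id;
  }

  // Returns the entry and removes it, or undefined if missing or expired.
  take(id) {
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    clearTimeout(entry.timer);
    this.entries.delete(id);
    return { buffer: entry.buffer, contentType: entry.contentType };
  }
}

// Possible use inside the GET route (assumes an Express app and a shared store):
//   const entry = store.take(req.params.id);
//   if (!entry) return res.sendStatus(404);
//   res.type(entry.contentType).send(entry.buffer);
```

Because take removes the entry before it is sent, a second download of the same URL gets a 404, which matches the "serve once, then delete" flow described in the question.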
I would use Redis as the in-memory DB, and on the server I would convert the image to base64 to store it in Redis.
In Redis, you can also set a TTL on the images.
Check my code below:
import fs from 'fs'
import { nanoid } from 'nanoid'
function base64_encode(file) {
  // read binary data (readFileSync already returns a Buffer)
  const bitmap = fs.readFileSync(file);
  // convert binary data to base64 encoded string
  return bitmap.toString('base64');
}
app.post('/', async (req, res) => {
  const client = redisClient;
  const imagePath = saveImageTemporarily(req)
  //const tempUrl = uploadToFileIo(imagePath)
  const base64str = base64_encode(imagePath);
  const id = nanoid()
  await client.set(id, JSON.stringify({
    id,
    image: base64str
  }));
  const reverseImageResults = reverseGoogleSearch(JSON.parse(await client.get(id)).image)
  await client.del(id);
  doThingsWithResults(reverseImageResults).then((result) => {
    res.send(result)
  })
})

Using Firebase function as a proxy server

I built an app with Vue.js which is hosted on Firebase. I recently added dynamic rendering with Rendertron to improve SEO, and I'm hosting Rendertron on Heroku. The Rendertron client works well.
In order to send requests coming from bots like Googlebot to Rendertron and receive a compiled HTML file, I used a Firebase function. It checks the user agent: if it's a bot, it sends the request to the Rendertron link; if it's not, it fetches the app and sends the result back.
Here's the function code:
const functions = require('firebase-functions');
const express = require('express');
const fetch = require('node-fetch');
const url = require('url');
const app = express();
const appUrl = 'khbich.com';
const renderUrl = 'https://khbich-render.herokuapp.com/render';
function generateUrl(request){
  return url.format({
    protocol:request.protocol,
    host:appUrl,
    pathname:request.originalUrl
  });
}
function detectBot(userAgent){
  let bots = [
    "googlebot",
    "bingbot",
    "facebookexternalhit",
    "twitterbot",
    "linkedinbot",
    "facebot"
  ]
  const agent = userAgent.toLowerCase()
  for(let bot of bots){
    if(agent.indexOf(bot)>-1){
      console.log('bot-detected',bot,agent)
    }
  }
}
app.get('*', (req,res)=>{
  let isBot = detectBot(req.headers['user-agent']);
  if(isBot){
    let botUrl= generateUrl(req);
    fetch(`${renderUrl}/${botUrl}`)
      .then(res => res.text())
      .then(body=>{
        res.set('Cache-Control','public','max-age=300','s-maxage=600')
        res.set('Vary','User-Agent');
        res.send(body.toString())
      })
  }
  else{
    fetch(`https://${appUrl}`)
      .then(res=>res.text())
      .then(body=>{
        res.send(body.toString())
      })
  }
});
I used the function as an entry point for Firebase Hosting, so it's invoked whenever someone opens the app.
I checked the Firebase dashboard to see if it's working, and I noticed that it crashed for exceeding the requests-per-100-seconds quota. I didn't have many users when I checked, yet the function invocations reached 370 calls in one minute.
I don't see why I had such a large number of calls all at once. I'm thinking that, since I fetch the website when the user agent is not a bot, the function may be re-invoked each time, causing an infinite loop of invocations, but I don't know if that's really the cause.
If it is an infinite loop, how can I send users to the URL they entered without re-invoking the function? Will a redirect work?
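One detail in the code above is worth isolating: detectBot logs a match but never returns a value, so isBot is always undefined and every request takes the non-bot branch, which fetches the app URL again. Below is a minimal sketch of a detector that returns a boolean, reusing the bot list from the question. The name isBotUserAgent is hypothetical, and this on its own does not settle whether the re-fetch is what drives the invocation count.

```javascript
// Same substrings as in the question, matched case-insensitively.
const BOTS = [
  'googlebot',
  'bingbot',
  'facebookexternalhit',
  'twitterbot',
  'linkedinbot',
  'facebot',
];

// Hypothetical replacement for detectBot: returns true or false, and
// tolerates a missing User-Agent header.
function isBotUserAgent(userAgent) {
  const agent = (userAgent || '').toLowerCase();
  return BOTS.some((bot) => agent.includes(bot));
}
```

With a boolean result, the bot branch in the route would actually be reachable; the non-bot branch would still need a way to return the app without requesting a URL that is routed back to the same function.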

NodeJs Flow Process

I recently started working with Node.js, but I got confused about the single-thread concept while using global or var variables across different files that are included using require. Below is my code.
var express = require("express");
var app = express();
var mysql = require("mysql");
var comm_fun = require("./common_functions");
var global_res = ''; // variable to send the response back to browser from any function
var global_req = ''; // variable to save request data
app.listen(1234, function(req, res){
  console.log("server started");
  global_res = res;
  global_req = req;
  mysql = ''; // code to have mysql connection in this variable
});
Now I can use the mysql, global_res and global_req variables in the different files that are included. But will this affect the values for another request, since they are global?
For example:
request 1 has value 1 for "global_req",
at the same time request 2 comes in,
request 2 has value 2 for "global_req".
Will these two requests collide at any point in time? Can "global_req" be overwritten from 1 to 2 at some point because the second request has arrived, or are the two requests separate, so that they never collide?
Thanks,
Yes, and here's a very simple scenario:
Request 1 comes in, sets global ID to 1
Request 1 then performs some IO-bound operation (e.g. DB query)
Meanwhile request 2 comes in and sets the global ID to 2
The IO-bound operation completes and request 1 continues but the global ID is now 2 and not 1
The argument about Node being single-threaded is only applicable if your application is purely CPU-bound. However, given that the majority of Node applications are IO-bound, they behave, in effect, as if they were multi-threaded: while one request waits on IO, the event loop runs others. The problem lies in the fact that we can't guarantee the order in which callbacks will return, and with that in mind it's fairly simple to create a race condition, e.g.:
let global_id = 0;
...
app.use((req, res, next) => { global_id++; next(); });
app.post('/create', async (req, res, next) => {
  try {
    // another request can change global_id while this query is awaited
    const exists = await db.query(`SELECT id FROM table WHERE id = ${global_id}`);
    if (!exists) {
      await db.query(`INSERT INTO ....`);
    }
    res.end();
  } catch (e) {
    return next(e);
  }
});
If 2 requests hit the /create endpoint simultaneously, it would be very unlikely that both would succeed (at least correctly).
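The four-step scenario above can be reproduced without Express or a database. This is a minimal sketch in which fakeIo stands in for the DB query and handleRequest for a route handler; both names, and the delays, are made up for illustration.

```javascript
// A module-level variable shared by every "request".
let globalId = 0;

// Simulated IO-bound operation (stand-in for a DB query).
const fakeIo = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Each request stores its ID globally, waits on IO, then reads it back.
async function handleRequest(id, ioMs) {
  globalId = id;
  await fakeIo(ioMs); // yields to the event loop, so other requests can run
  return { expected: id, seen: globalId };
}

// Request 1 starts a slow operation; request 2 arrives before it finishes.
function demo() {
  return Promise.all([handleRequest(1, 20), handleRequest(2, 5)]);
}
```

Calling demo() resolves to [{ expected: 1, seen: 2 }, { expected: 2, seen: 2 }]: by the time request 1 resumes, the global already holds request 2's value. Keeping per-request data on req or in local variables avoids this.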

How to properly use dataloaders across multiple users?

In caching per request the following example is given that shows how to use dataloaders in express.
function createLoaders(authToken) {
  return {
    users: new DataLoader(ids => genUsers(authToken, ids)),
  }
}
var app = express()
app.get('/', function(req, res) {
  var authToken = authenticateUser(req)
  var loaders = createLoaders(authToken)
  res.send(renderPage(req, loaders))
})
app.listen()
I'm confused about passing authToken to the genUsers batch function. How should a batch function be composed so that it uses authToken and returns the corresponding results for each user?
What the example is saying is that genUsers should use the credentials of the current request's user (identified by their auth token) to ensure they can only fetch data that they're allowed to see. Essentially, the loader gets initialised at the start of the request and discarded at the end, and is never recycled between requests.
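A batch function receives the array of keys collected by the loader and is expected to resolve to an array of values (or Error instances) of the same length and in the same order as the keys. Below is a minimal sketch of genUsers under that contract. The in-memory USERS table, the fetchUsersByIds function and the rule that a token only sees rows marked with it are hypothetical stand-ins for a real backend.

```javascript
// Hypothetical data: each row records which token is allowed to read it.
const USERS = [
  { id: 1, name: 'Ada', visibleTo: 'token-a' },
  { id: 2, name: 'Bob', visibleTo: 'token-a' },
  { id: 3, name: 'Cy', visibleTo: 'token-b' },
];

// Hypothetical backend call: returns only the permitted rows, in an order
// that does not match the requested IDs.
async function fetchUsersByIds(authToken, ids) {
  return USERS
    .filter((u) => ids.includes(u.id) && u.visibleTo === authToken)
    .reverse();
}

// Batch function: one value (or Error) per key, in the same order as the keys.
async function genUsers(authToken, ids) {
  const rows = await fetchUsersByIds(authToken, ids);
  const byId = new Map(rows.map((u) => [u.id, u]));
  return ids.map(
    (id) => byId.get(id) || new Error(`User ${id} not found or not permitted`)
  );
}
```

Since authToken is captured when createLoaders runs, every load issued through that request's loader is filtered with the same credentials, which is why the loader should not be shared between users.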

node express routing decision based on user-agent

Trying to figure out a way of supplying better data to social media (Open Graph data). Basically, when Facebook, Twitter or Pinterest asks for information about a link on my page, I want to provide them with OG information that depends on the link, instead of sending them the empty page (OK, it sends JavaScript that they don't run).
I tried using prerender and similar, but can't get that to run properly. But I also realised that I would rather get the express router to identify the request and serve a static page based on it.
As a first step, I need to get the user agent information:
So I thought I would add express-useragent, and that seems to work on my test site, but it does not seem like Facebook's scraper ever gets past it. I can see it tries to get a picture, but it never updates the OG data or the index. (The code below should work as an example.)
var express = require('express');
var router = express.Router();
var useragent = require('express-useragent');
//Set up log
var cfgBunyan = require('../config/bunyan')
var log = cfgBunyan.dbLogger('ROUTE')
router.use(useragent.express());
/* GET home page. */
router.get('/', function(req, res, next) {
  console.log(req.useragent);
  res.render('index');
});
router.get('/share/:service', function(req, res, next) {
  res.render('index');
});
router.get('/pages/:name', function (req,res, next){
  log.info('/pages/'+req.params.name)
  res.render('pages/'+req.params.name);
});
router.get('/modals/:name', function (req,res, next){
  res.render('modals/'+req.params.name);
});
router.get('/page/:name', function (req,res, next){
  res.render('index');
});
module.exports = router;
I can also run the Google test scraper, which gives me the following source:
source: 'Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)' }
So has anyone figured out an easy way to direct Facebook and Twitter to another route? Or is sitting and checking the different sources the right way?
OK, so I managed to figure out a potential solution.
Basically, I created a function called isBot, which I call in a similar way to how authentication middleware works. The request is sent to isBot, which checks whether:
1. ?_escaped_fragment_= is present in the URL (Google and some others use that)
2. the user agent is a known bot (thanks prerender.io, I borrowed the list from the .htaccess for your service)
The setup is simple enough.
Add express-useragent to your router, just to be able to get info from the header (you don't have to; Rob was right):
//var useragent = require('express-useragent'); // Not needed nor used
//router.use(useragent.express()); // Thought this was required, it is not
Then in any route you want to check for bots add isBot:
router.get('/', isBot ,function(req, res, next) {
Then add the function below. It does a lot of logging using bunyan, as I want to have statistics; you can remove any line that starts with log.info and it should still work, or add bunyan, or just change those lines to console.log. It's just output.
If the code decides the request isn't from a bot, it just renders as normal.
function isBot (req, res, next){
  var isBotTest = false;
  var botReq = "";
  var botID = ""; //Just so we know why we think it is a bot
  //Entries are lower case and used as regular expressions against the lower-cased user agent
  var knownBots = ["baiduspider", "facebookexternalhit", "twitterbot", "rogerbot", "linkedinbot", "embedly|quora link preview", "howyoubot", "outbrain", "pinterest", "slackbot", "vkshare", "w3c_validator"];
  log.info({http_user_agent: req.get('User-Agent')});
  //log.info({user_source: req.useragent.source}); //For debug, what's the HTTP_USER_AGENT, think this is the same
  log.info({request_url: req.url}); //For debug, we want to know if there are any options
  /* Let's start with ?_escaped_fragment_=, this seems to be a standard, if we have this as part of the request,
  it should be either a search engine or a social media site asking for open graph rich sharing info
  */
  var urlRequest = req.url
  var pos = urlRequest.search("\\?_escaped_fragment_=")
  if (pos != -1) {
    botID = "ESCAPED_FRAGMENT_REQ";
    isBotTest = true; //It says it's a bot, so we believe it, let's figure out if it has a request before or after
    var reqBits = urlRequest.split("?_escaped_fragment_=")
    console.log(reqBits[1].length)
    if (reqBits[1].length == 0) { //If 0 length, any request is in front
      botReq = reqBits[0];
    } else {
      botReq = reqBits[1];
    }
  } else { //OK, so it did not tell us it was a bot request, but maybe it is anyway
    //The header may be missing, and bots vary the case (e.g. "Twitterbot")
    var userAgent = (req.get('User-Agent') || "").toLowerCase();
    for (var i = 0; i < knownBots.length; i++) {
      if (userAgent.search(knownBots[i]) != -1) {
        isBotTest = true;
        botReq = urlRequest;
        botID = knownBots[i];
      }
    }
  }
  if (isBotTest == true) {
    log.info({botID: botID, botReq: botReq});
    //send something to bots
  } else {
    log.info("We don't think this is one of those bots any more")
    return next();
  }
}
Oh, and currently it does not respond to the bot requests. If you want to do that, just add a res.render or res.send at the line that says //send something to bots
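As one possible thing to send at that line, here is a minimal sketch that builds a small static page carrying Open Graph tags. The helper names escapeHtml and renderOgPage are hypothetical, and the four properties used (og:title, og:description, og:image, og:url) are just a common subset, not something the original code prescribes.

```javascript
// Escape text so it is safe inside HTML text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: returns a minimal HTML page with one <meta> tag per
// Open Graph field that was provided.
function renderOgPage(og) {
  const tags = ['title', 'description', 'image', 'url']
    .filter((key) => og[key])
    .map((key) => `<meta property="og:${key}" content="${escapeHtml(og[key])}">`)
    .join('\n');
  return [
    '<!DOCTYPE html>',
    '<html><head>',
    `<title>${escapeHtml(og.title || '')}</title>`,
    tags,
    '</head><body></body></html>',
  ].join('\n');
}
```

Inside isBot, the bot branch could then end with something like res.send(renderOgPage({ title: ..., url: ... })), with the values looked up from botReq.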

Resources