PubSub preventing app engine from running - node.js

I'm working on a Google Cloud project that involves pulling data from Pub/Sub, splitting each packet into several pieces, and submitting each individual piece to Datastore. The Datastore and splitting parts work just fine, but when it comes to getting the message from Pub/Sub, the page just keeps loading with no result. I've let it run for a while now and the request does not time out.
I tried wrapping subscription.on inside a setInterval call so that I could switch it off and on over a span of time, but this does not prevent the app from stalling.
Could this be happening because subscription.on runs continuously, never exiting and so never allowing Express (the framework generating the App Engine page) to render the page? If so, how can I avoid this? What other approach could I take to successfully get data from Pub/Sub?
I'm willing to provide more details if necessary. Thanks in advance.
EDIT: including the Pub/Sub call:
var psMessage = '';
subscription.on('message', function(message) {
    psMessage = message;
    message.ack();
});
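One way to keep the route responsive is to attach the listener once at startup, outside any request handler, and have the Express route simply report the most recent message. This is a minimal sketch of that idea; the subscription and app objects and the /latest route are assumptions based on the question:
var latest = null;

subscription.on('message', function(message) {
    latest = message.data.toString(); // keep only the payload
    message.ack();
});

app.get('/latest', function(req, res) {
    // returns immediately and never blocks on the Pub/Sub stream
    res.send(latest || 'No message received yet.');
});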

Related

Send request progress to client side via nodejs and express

I am using the contentful-export library in my Express app like so:
const express = require('express');
const app = express(); // express must be called to create the app
...
app.get('/export', (req, res, next) => { // 'rex' was a typo for 'res'
    const contentfulExport = require('contentful-export');
    const options = {
        ...
    };
    contentfulExport(options).then((result) => {
        res.send(result);
    });
});
Now, this does work, but the method takes a while and logs status/progress messages to the Node console. I would like to keep the user updated as well. Is there a way I can send those console progress messages to the client?
This is my first time using Node/Express, so any help would be appreciated. I'm not sure whether this already has an answer, since I'm not entirely sure what to call it.
Looking at the documentation for contentful-export, I don't think this is possible. The way this usually works in Node is that you have an object (contentfulExport in this case); you call a method on that object, and the same object is also an EventEmitter. That way you get a hook to react to fired events.
// pseudo code
someLibrary.on('someEvent', (event) => { /* do something */ });
someLibrary.doLongRunningTask()
    .then(/* ... */);
This is not documented for contentful-export so I assume that there is no way to hook into the log messages that are sent to the console.
Your question has another tricky angle, though. The code you shared includes a single endpoint (/export). If you want to display updates or show progress, you'd probably need a second endpoint that serves information about the progress of your long-running task (which you cannot access with contentful-export anyway).
The way this is usually handled is that you kick off the long-running task via one HTTP endpoint and then use another endpoint that serves progress information via polling or a WebSocket connection, as in the sketch below.
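A minimal sketch of the two-endpoint pattern, assuming a task that actually reports progress (which contentful-export does not expose); runLongTask and its onProgress callback are hypothetical:
let progress = { state: 'idle', percent: 0 };

app.get('/export', (req, res) => {
    progress = { state: 'running', percent: 0 };
    runLongTask({
        onProgress: (percent) => { progress.percent = percent; }
    }).then(() => { progress.state = 'done'; });
    res.send('Export started'); // respond right away instead of waiting
});

// the client polls this endpoint to display progress
app.get('/export/status', (req, res) => {
    res.json(progress);
});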
Sorry that I can't give a proper solution, but due to the limitations of contentful-export I don't think there is a clean or easy way to show the progress of the exported data.
Hope that helps. :)

API that will continuously return data

Beginner here. I'm using the Firebase Realtime Database, and I need my API to continuously return values as they are added. See my code below.
apiCalls.get('/api/getallusers', function(req, res) {
    userFunc.getAllUsers(function(err, result) {
        if (err) return res.status(500).send('internal server error!');
        res.status(200).write(JSON.stringify(result));
        res.end();
        return res;
    });
});
This returns the error
Error [ERR_STREAM_WRITE_AFTER_END]: write after end
but if I remove res.end() it shows one record and then keeps loading until the page times out.
Is what I'm doing possible, or are there different ways to do it?
I'm also using Firebase Cloud Functions for this API.
UPDATE:
I uploaded the API, but it does not return anything...
Here is the link: https://us-central1-testproject-e6819.cloudfunctions.net/api1/api/getUser
I tried axios and EventSource.
Firebase Functions logs the values but does not return them.
If you're viewing the API response like a web page, your browser buffers the data it has received until there's enough of it to form a fuller page. Your browser is expecting content that ends, not an endless stream of data.
You should remove .end() if you expect to be able to continue to write to the output stream.
Also, I recommend using the Server-Sent Events (SSE) protocol for this. https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events It provides a nice standards-based abstraction that makes it very easy to handle event streams client-side.
const eventSource = new EventSource('https://api.example.com/someApi');
eventSource.addEventListener('userupdate', (e) => {
    console.log(e.data);
});
Server-side, there are a couple of Express-based middlewares to make this even easier than it already is.
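Even without middleware, a bare SSE endpoint in Express is short. This is a sketch under the assumption of a timer-driven event source; the 'userupdate' event name matches the client snippet above:
app.get('/someApi', (req, res) => {
    res.set({
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive'
    });
    res.flushHeaders();

    // push an event every few seconds; a real app would write when data changes
    const timer = setInterval(() => {
        res.write('event: userupdate\n');
        res.write(`data: ${JSON.stringify({ at: Date.now() })}\n\n`);
    }, 3000);

    req.on('close', () => clearInterval(timer)); // stop when the client disconnects
});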
Operations in Cloud Functions must be relatively short-lived and end deterministically. There is no way to keep a connection open from Cloud Functions to the client.
Instead, consider what triggers the need to send new data. For example, if the trigger is a new user registering, you can trigger your Cloud Function from Firebase Authentication. The function could then write to the Realtime Database (or Cloud Firestore), and your client/app listens to the database for realtime updates. That way you're using all the pieces of Firebase as they're designed: Cloud Functions for short-lived work triggered by events in the system, and the Realtime Database or Cloud Firestore for sending realtime updates.
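A minimal sketch of that pattern, assuming the firebase-functions v1 API; the /users database path is an arbitrary choice for illustration:
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.onUserCreated = functions.auth.user().onCreate((user) => {
    // write the new user to the Realtime Database; clients subscribed
    // to /users receive the update in realtime
    return admin.database().ref(`/users/${user.uid}`).set({
        email: user.email || null,
        createdAt: admin.database.ServerValue.TIMESTAMP
    });
});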
If that doesn't work for your use-case, you'll need a runtime environment that allows you to keep processes alive. Something like App Engine flex, Kubernetes, or many other options come to mind for that.

Node app that fetches, processes, and formats data for consumption by a frontend app on another server

I currently have a frontend-only app that fetches 5-6 different JSON feeds, grabs the necessary data from each of them, and then renders a page based on that data. I'd like to move the data fetching and processing part of the app to a server-side Node application that outputs one simple JSON file, which the frontend app can fetch and easily render.
There are two noteworthy complications for this project:
1) The new backend app will have to live on a different server than its frontend counterpart
2) Some of the feeds change fairly often, so I'll need the backend processing to constantly check for changes (every 5-10 seconds). Currently, with the frontend-only app, the browser fetches the latest versions of the feeds on load. I'd like to replicate this behavior as closely as possible.
My thought process for solving this took me in two directions:
The first is to set up an Express application that uses setTimeout to periodically check for new data to process. This data is then sent as the response to a simple GET request:
const express = require('express');
let app = express();
const port = process.env.PORT || 3000; // port was undefined in the original

let processedData = {};

const getData = () => {...} // returns a promise that fetches and processes data

/* use an immediately invoked function with setTimeout to fetch the data
 * when the program starts and then once every 5 seconds after that */
(function refreshData() {
    getData().then((data) => { // getData is a function, so it must be called
        processedData = data;
    });
    setTimeout(refreshData, 5000);
})();

app.get('/', (req, res) => {
    res.send(processedData);
});

app.listen(port, () => {
    console.log(`Started on port ${port}`);
});
I would then make a simple GET request from the client (after properly adjusting CORS headers) to get the JSON object.
My questions about this approach are pretty generic: Is this even a good solution to this problem? Will this drive up hosting costs based on processing / client GET requests? Is setTimeout a good way to have a task run repeatedly on the server?
The other solution I'm considering is setting up an AWS Lambda function that writes the resulting JSON to an S3 bucket. The minimum interval for scheduling an AWS Lambda function appears to be 1 minute, however. I imagine I could set up 3 or 4 identical Lambda functions and offset them by 10-15 seconds, but that seems so hacky that it makes me physically uncomfortable.
Any suggestions / pointers / solutions would be greatly appreciated. I am not yet a super experienced backend developer, so please ELI5 wherever you deem fit.
A few pointers.
Use cron tasks for periodic processing of data. This is far preferable, especially if you are formatting a lot of data; see the sketch below.
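One option (an assumption; the answer doesn't name a library) is node-cron, run at startup alongside the Express app. getData and processedData refer to the question's code, and the six-field expression with a seconds slot runs the job every 10 seconds:
const cron = require('node-cron');

cron.schedule('*/10 * * * * *', () => {
    getData().then((data) => {
        processedData = data; // same cache the GET handler reads from
    });
});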
Don't set up multiple Lambda functions for the same task. It's going to be messy to maintain all those functions.
After processing/fetching the feed, you can store the JSON file on your own server or in S3. Note that if it's S3, you are paying for, and waiting on, a network operation. You can read the file from your Express app and just send the response back to your clients.
Depending on the file size and the load on your server, you might want to add a caching server so that you can cache the response until new JSON data is available.
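As a small sketch of that idea, setting a Cache-Control header lets a reverse proxy or CDN in front of the app cache the response until new data is due (the 5-second max-age is an assumption matching the question's refresh interval):
app.get('/', (req, res) => {
    res.set('Cache-Control', 'public, max-age=5'); // cache until the next refresh
    res.json(processedData);
});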

How do I use the foreach method in MongoDB to do scraping/API calls without getting blacklisted by sites?

I have about 20 documents currently in my collection (and I'm planning to add many more, probably in the hundreds). I'm using the MongoDB Node.js client's collection.forEach() method to iterate through each one and, based on the document's records, go to 3 different endpoints: two APIs (Walmart and Amazon) and one website scrape (name not relevant). Each document contains the relevant data to execute the requests, and then I update the documents with the returned data.
The problem I'm encountering is that the Walmart API and the website scrape will not return data toward the end of the iteration. Or at least my database is not getting updated. My assumption is that the forEach method fires off a bunch of simultaneous requests, and either I'm bumping up against some arbitrary limit of simultaneous requests allowed by the endpoint, or the endpoints simply can't handle this many requests and ignore anything beyond their "request capacity." I've run some of the documents that were not updating through the same code, but in a different collection that contained just a single document, and they did update, so I don't think it's bad data inside the document.
I'm running this on Heroku (and locally for testing) using Node.js. Results are similar both on the Heroku instance and locally.
If my assumption is correct, I need a better way to structure this so that there is some separation between requests, or so that it only processes x records in a single pass.
It sounds like you need to throttle your outgoing web requests. There's a fantastic node module for doing this called limiter. The code looks like this:
var RateLimiter = require('limiter').RateLimiter;
var limiter = new RateLimiter(1, 1000); // 1 token per 1000 ms

var throttledRequest = function() {
    limiter.removeTokens(1, function() {
        console.log('Only prints once per second');
    });
};

throttledRequest();
throttledRequest();
throttledRequest();
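Applied to the question's setup, a hedged sketch might reuse the limiter above and wait for a token before each outgoing request inside the cursor iteration (assuming the driver's cursor forEach); callWalmartApi is a placeholder for the actual request logic:
collection.find().forEach(function(doc) {
    limiter.removeTokens(1, function() {
        // at most one outgoing request per second, so the endpoints aren't flooded
        callWalmartApi(doc, function(err, data) {
            if (!err) collection.updateOne({ _id: doc._id }, { $set: { walmart: data } });
        });
    });
});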

Async profiling nodejs server to review the code?

We encountered a performance problem on our Node.js server, which handles 100k IPs every day.
Now we want to review the code and find the bottleneck.
@jfriend00: from what we can see now, the problem seems to be DB access and file access, but we don't know what logic causes this access.
We are still looking for good ways to do async profiling of a Node.js server.
Here's what we tried:
Nodetime
This works for us to some extent. It can give the execution time of code down to specific lines. However, we can't locate the problem, because the server works asynchronously and no stack or call information can be determined.
Async-profiling
This works with async code and is said to be the first of its kind.
The problem is that we've integrated its JS code with our server-side code:
var AsyncProfile = require('async-profile');
AsyncProfile.profile(function () {
    ///// OUR SERVER-SIDE CODE RESIDES HERE
    setTimeout(function () {
        // doAsyncStuff
    });
});
We can only record the profile of a single server execution for one request. Can we use this code with things like forever? We have no idea about this.
dtrace
This is too general for us to locate the problem in our Node.js code.
Do you have any idea on profiling nodejs server code? Any hints or suggestions are appreciated. Thanks.
