Play Enumerator hanging - node.js

I'm using Play 2.1.2 and I'm having some trouble with an Enumerator and I'm seeking ideas on how to debug this.
I'm trying to stream some S3 data through my server. I can get an InputStream from the Amazon SDK for my S3 file (getObject(bucket, key).getObjectContent()). I then turn that InputStream into an Enumerator[Array[Byte]] using Enumerator.fromStream.
All this type checks and on my local development machine it all works perfectly. When I formulate my Result in Play, I just return Ok.stream(enum).
The problem comes when I deploy this to a production server. The very first time I request the file, it works just fine and I get the whole file. But subsequent times it frequently gets part way through (different amounts each time) and then gets "stuck". I wrapped the Enumerator as follows to be able to log whether the enumeration completed:
val wrapped = enum.onDoneEnumerating { println("Contents fully enumerated"); }
Ok.stream(wrapped);
As expected, on my development machine (and the first time through on the production machines), I get the message "Contents fully enumerated". But after that, the production machine will start the download of the file, but it doesn't finish (in both the HTTP sense and in the Enumerator sense).
I'm not sure how to debug this. Obviously, fromStream does some magic and I don't know how to figure out what is happening between chunks. I thought this might be a thread pool issue so I wrapped the whole response in a future { blocking { ... } } block, but it didn't appear to make any difference.
I'm trying to avoid the hassle of creating a local temporary file from S3 and then building my Enumerator from that. Using the fromStream to create an Enumerator seemed like the elegant way to do this...if it worked.
Suggestions?

OK, so I think I figured this out. It turns out that on the Play side, things appear to be working. I tried all kinds of variations (different ways of constructing the enumerator, creating temporary files, etc). It didn't really matter.
What did matter was the proxy I was using. I'm using node-http-proxy and if I make a request on the server behind the proxy, I get the correct response (directly from Play). If I make the request on the server outside the proxy, I get an incorrect (empty) response. So it looks like the proxy is "dropping" the response.
It appears the issue is that the response is chunked by the stream call and this is causing the problem with the proxy. If I reformulate my response to be:
SimpleResult( header = ResponseHeader(200), body = enum)
Then play uses the Enumerator to construct a complete response (not streamed) and things work again. Of course, it is stupid to have to form the complete response in this case, but it does work. Hopefully I can find a better solution than this in the long term, but this seems to work for now.

Related

Tracking Discord with GA4

Bots are amazing, unless you're Google Analytics
After many months of learning to host my own Discord bot, I finally figured it out! I now have a node server running on my localhost that sends and receives data from my Discord server; it works great. I can do all kinds of the things I want to with my Discord bot.
Given that I work with analytics everyday, one project I want to figure out is how to send data to Google Analytics (specifically GA4) from this node server.
NOTE: I have had success in sending data to my Universal Analytics property. However, as awesome as that was to finally see pageviews coming into, it was equally heartbreaking to recall that Google will be getting rid of Universal Analytics in July of this year.
I have tried the following options:
GET/POST requests to the collect endpoint
This option presented itself as impossible from the get-go. In order to send a request to the collection endpoint, a client_id must be sent along with the request itself. And this client_id is something that must be generated using Google's client id algorithm. So, I can't just make one up.
If you consider this option possible, please let me know why.
Install googleapis npm package
At first, I thought I could just install the googleapis package and be ready to go, but that idea fell on its face immediately too. With this package, I can't send data to GA, I can only read with it.
Find and install a GTM npm package
There are GTM npm packages out there, but I quickly found out that they all require there to be a window object, which is something my node server would not have because it isn't a browser.
How I did this for Universal Analytics
My biggest goal is to do this without using Python, Java, C++ or any other low level languages. Because, that route would require me to learn new languages. Surely it's possible with NodeJS alone... no?
I eventually stumbled upon the idea of actually hosting a webpage as some sort of pseudo-proxy that would send data from the page to GA when accessed by something like a page scraper. It was simple. I created an HTML file that has Google Tag Manager installed on it, and all I had to do was use the puppeteer npm package.
It isn't perfect, but it works and I can use Google Tag Manager to handle and manipulate input, which is wonderful.
Unfortunately, this same method will not work for GA4 because GA4 automatically excludes all identified bot traffic automatically, and there is no way to turn that setting off. It is a very useful feature for GA4, giving it quite a bit more integrity than UA, and I'm not trying to get around that fact, but it is now the Bane of my entire goal.
https://support.google.com/analytics/answer/9888366?hl=en
Where to go from here?
I'm nearly at the end of my wits on figuring this one out. So, either an npm package exists out there that I haven't found yet, or this is a futile project.
Does anyone have any experience in sending data from NodeJS to GA4? (or even GTM?) How did you do it?
...and this client_id is something that must be generated using Google's client id algorithm. So, I can't just make one up...
Why, of course you can. GA4 generates it pretty much the same as UA does. You don't need anything from google to do it.
Besides, instead of mimicking just requests to the collect endpoint, you may just wanna go the MP route right away: https://developers.google.com/analytics/devguides/collection/protocol/ga4 The links #dockeryZ gave, work perfectly fine. Maybe try opening them in incognito, or in a different browser? Maybe you have a plugin blocking analytics urls.
Moreover, you don't really need to reinvent the bicycle. Node already has a few packages to send events to GA4, here's one looking good: https://www.npmjs.com/package/ga4-mp?activeTab=readme
Or you can just use gtag directly to send events. I see a lot of people doing it even on the front-end: https://www.npmjs.com/package/ga-gtag Gtag has a whole api not described in there. Here's more on gtag: https://developers.google.com/tag-platform/gtagjs/reference Note how the library allows you to set the client id there.
The only caveat there is that you'll have to track client ids and session ids manually. Shouldn't be too bad though. Oh, and you will have to redefine the concept of a pageview, I guess. Well, the obvious one is whenever people post in the chan that is different from the previous post in a session. Still, this will have to be defined in the code.
Don't worry about google's bot traffic detection. It's really primitive. Just make sure your useragent doesn't scream "bot" in it. Make something better up.

How HTTP response is generated

I'm fairly new to programming and this question is about making sure I get the HTTP protocol correctly. My issue is that when I read about HTTP request/response, it looks like it needs to be in a very specific format with a status code, HTTP version number, headers, a blank line followed by the body.
However, after creating a web app with nodejs/express, I never once had to actually write code that made an HTTP response in this format (I'm assuming, although I don't know for sure that other frameworks like ruby on rails or python/Django are the same). In the express app, I just set up the route handlers to render the appropriate pages, when a request was made to that route.
Is this because express is actually putting the response in the correct HTTP format behind the scenes? In other words, if I looked at the expressJS code, would there be something in that code that actually makes an HTTP response in the HTTP format?
My confusion is that, it seems like the HTTP request/response format is so important but somehow I never had to write any code dealing with it for a node/express application. Maybe this is the entire point of a framework like express... to take out the details so that developers can deal with business logic. And if that is correct, does anyone ever write web apps without a framework to do this. Would you then be responsible for writing code that puts the server's response into the exact HTTP format?
I'm fairly new to programming and this question is about making sure I get the HTTP protocol correctly. My issue is that when I read about HTTP request/response, it looks like it needs to be in a very specific format with a status code, HTTP version number, headers, a blank line followed by the body.
Just to give you an idea, there are probably hundreds of specifications that have something to do with the HTTP protocol. They deal with not only the protocol itself, but also with the data format/encoding for everything you send including headers and all the various content types you can send, authentication schemes, caching, status codes, URL decoding, etc.... You can see some of the specifications involved just by looking here: https://www.w3.org/Protocols/.
Now a simple request and a simple text response could get away with only knowing a few of these specifications, but life is not always that simple.
Is this because express is actually putting the response in the correct HTTP format behind the scenes? In other words, if I looked at the expressJS code, would there be something in that code that actually makes an HTTP response in the HTTP format?
Yes, there would. A combination of Express and the HTTP library that is built into node.js handle all the details of the specification for you. That's the advantage of using a library/framework. They even handle different versions of the protocol and feedback from thousands of other developers have helped them to clean up edge case bugs. A good library/framework allows you to still control any detail about the response (headers, content types, status codes, etc..) without making you have to go through the detail work of actually creating the exact response. This is a good thing. It lets you write code faster and lets you ride on the shoulders of others who have already figured out minutiae details that have nothing to do with the logic of your app.
In fact, one could say the same about the TCP protocol below the HTTP protocol. No regular app developer wants to write their own TCP stack. Instead, you just want a working TCP stack that you can use that's already been tuned and debugged for you.
However, after creating a web app with nodejs/express, I never once had to actually write code that made an HTTP response in this format (I'm assuming, although I don't know for sure that other frameworks like ruby on rails or python/Django are the same). In the express app, I just set up the route handlers to render the appropriate pages, when a request was made to that route.
Yes, this is a good thing. The framework did the detail work for you. You just call res.setHeader(), res.status(), res.cookie(), res.send(), res.json(), etc... and Express makes the entire response for you.
And if that is correct, does anyone ever write web apps without a framework to do this. Would you then be responsible for writing code that puts the server's response into the exact HTTP format?
If you didn't use a framework or library of any kind and were programming at the raw TCP level, then yes you would be responsible for all the details of the HTTP protocol. But, hardly anybody other than library developers ever does this because frankly it's just a waste of time. Every single platform has at least one open source library that does this already and even if you were working on a brand new platform, you could go get an open source body of code and port it to your platform much quicker than you could write all this yourself.
Keep in mind that one of the HUGE advantages of node.js is that there's an enormous body of open source code (mostly in NPM and Github) already prepackaged to work with node.js. And, because node.js is server-side where code memory isn't usually tight and where code just comes from the local hard disk at server init time, there's little downside to grabbing a working and tested package that does what you already need, even if you're only going to use 5% of the functionality in the package. Or, worst case, clone an existing repository and modify it to perfectly suit your needs.
Is this because express is actually putting the response in the
correct HTTP format behind the scenes?
Yes, exactly, HTTP is so ubiquitous that almost all programming languages / frameworks handle the actual writing and parsing of HTTP behind the scenes.
Does anyone ever write web apps without a framework to do this. Would
you then be responsible for writing code that puts the server's
response into the exact HTTP format?
Never (unless you're writing code that needs very low level tweaking of HTTP code or something)

Modifying content delivered by node-http-proxy

Due to some limitations about the web services I am proxying, I have to inject some JS code so that it allows the iframe to access the parent window and perform some actions.
I have built a proxy system with node-http-proxy which works pretty nicely. However I have spent unmeasurable hours trying to modify the content (on my own, using harmon as well, etc) that is being sent to the user without any success. I have found some articles and even some questions here but all of them are outdated and are not useful anymore.
I was wondering if someone can give me an actual example about how to do this, because I am unable to do it and maybe it is just that it is impossible to do at this point?
I haven't tried harmon, but I did try cheerio and it works.
However, I used http-mitm-proxy and not node-http-proxy.
If you are using http-mitm-proxy, you need to return a promise in the response handler. Otherwise, the proxy continues to send the original response without picking up your changes.
I have recently written another proxy at:
https://github.com/noeltimothy/noelsproxy
I'm going to add response handling to this soon. This one uses a callback mechanism, which means it wont return the response until the caller signals it to.
You should be able to use 'cheerio' and alter the content in JQuery style.

Output something once per request

I'm creating a module that exports a method that can may be called several times by any code in node.js using it. The method will be called usually from views and it will output some html/css/js. Some of this html/css/js however only needs to be output once per page so I'd like to output it only the first time the module is called per request. I can accomplish doing it the first time the module is called ever but again the method of my module can be called several times across several requests for the time the server is up so I specifically want to run some specific code only once per page.
Furthermore, I want to do this while requiring the user to pass as little to my method as possible. If they pass the request object when creating the server I figure I can put a variable in there that will tell me if my method was already called or not. Ideally though I'd like to avoid even that. I'm thinking something like the following from within my module:
var http = require('http');
http.Server.on('request', function(request, response){
console.log('REQUEST EVENT FIRED!');
// output one-time css
});
However this doesn't work, I assume it's because I'm not actually pointing to the Server emitter that was/may have been created in the script that was originally called. I'm new to node.js so any ideas, clues or help is greatly appreciated. Thanks!
Setting a variable on the request is an accepted pattern. Or on the response, if you don't even want to pass the request to your function.
One more thing you can do is indeed, like you write, have the app add a middleware and have that middleware either output that thing.
I'm not sure if I completely understand your "problem" but what you are trying to achieve seems to me like building a web application using Node.js. I think you should use one of the web frameworks that are available for Node so you can avoid reinventing the wheel (writing routing, static files serving etc. yourself).
Express framework is a nice place to start. You can find tons of tutorials around the internet and it has strong community: http://expressjs.com/

With ExpressJS or Node, Is there an easy way to read an external image into memory and serve it?

I'm using an external service to create images. I'd like my users to be able to hit my API and ask for the image. Then my Express server would retrieve it from the external service, then serve it to the user. Sort of like a proxy I suppose, but not exactly.
Is there an easy way to do this, preferably one that doesn't involve downloading the image to the hard drive, then reading it back in and serving it?
Using the request library, I was able to come up with this:
var request = require("request");
exports.relayImage = function(req, res){
request(req.params.url).pipe(res);
}
That seems to work. If there is a more efficient way to do this (meaning on server resources, not in terms of lines of code), speak up!
What you are doing is exactly what you should be doing, and is the most efficient method. Using pipe, the data is sent as it comes in, requiring no additional resources than are needed to buffer and transmit.
Also be mindful of content type and other response headers that you may want to relay. Finally, realize that you've effectively built an open proxy where anyone can request anything they want through your servers. This is a bit dangerous, so be sure to lock it down in your final application.
You should be able to use the http module to make a request to the external image service with a callback that returns the image as the response. It won't write to disk unless you explicitly tell it to.

Resources