Cloud Run 502 Bad Gateway - Protocol Error - node.js

I'm currently working on a microservice architecture where one particular microservice has a specific mechanism:
Receiving a request saying the client needs some data
Sending status 202 (Accepted) to the client
Generating the data and saving it to a Redis instance
Receiving a request asking whether the data is ready
If the data is not ready in the Redis instance: sending status 102 to the client
If the data is ready in the Redis instance: sending it back
The first point works fine with this kind of code:
res.sendStatus(202) // acknowledge immediately
processData(req) // then generate the data and save it to Redis
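For reference, the second point (the status check, items 4-6 above) would look roughly like the following. This is only a minimal sketch, assuming Express and a node-redis v4 client; the route and key names are placeholders, not the actual implementation:
const express = require('express')
const { createClient } = require('redis')

const app = express()
const redisClient = createClient()

// Placeholder polling route: the client asks whether its data is ready.
app.get('/data/:id', async (req, res) => {
  const value = await redisClient.get(req.params.id)
  if (value === null) {
    res.sendStatus(102) // not ready yet
  } else {
    res.status(200).json(JSON.parse(value)) // ready: send it back
  }
})

// node-redis v4 needs an explicit connect before serving requests.
redisClient.connect().then(() => app.listen(8080))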
But for the second point, I get different behavior locally and when hosted on Cloud Run.
Locally, the second request is not handled while the first one's processing has not finished, and I presumed that was normal from a threading perspective.
Is there something that could make Express handle the second request while the first one's response has already been sent to the client but its processing has not ended?
But considering that Google Cloud Run is based on instances and auto-scaling, I thought: well, the first instance is locked because its process has not ended? No problem! A new one will come up and handle the other request, which will then check the status of the key in the Redis instance.
It seems that I was wrong! When I make the call to check the status of the data, if the data is not yet ready, Cloud Run sends me back this 502 Bad Gateway error:
upstream connect error or disconnect/reset before headers. reset reason: protocol error
However, I never set any response status to 502, so it seems that either Cloud Run or Express is sending it itself.
My only option would be to split my Cloud Run service into a Cloud Function plus a Cloud Run service: the Cloud Run service would trigger the process in the Cloud Function. I'm pretty short on time, so if I have no other option I will do that, but I'd hope to manage without introducing a new Cloud Function.
Do you have any explanation for why it doesn't work, either locally or on Cloud Run?
My own hypotheses don't convince me, and I can't find any truth in them:
Maybe a client can't make two requests at the same time: that doesn't seem logical.
Maybe Express can't handle several requests at the same time: that doesn't seem logical to me either.
Any clues that seem more plausible?

Related

Does AWS ApiGateway Socket Split Messages at 4 KB?

I'm trying to implement a WebRTC signalling server using AWS ApiGateway with JavaScript, in order to connect two clients (Client A: a web browser; Client B: Unreal Engine's PixelStreaming plugin).
(Note: for this implementation I'm not using the signalling server implementation provided by Unreal; I'm creating a replacement implementation using AWS ApiGateway.)
First, I successfully connect both clients to the API Gateway WebSocket server and set up the required configuration.
Then I generate the WebRTC "offer" from the browser (Client A) and send it via Lambda, using the method ApiGatewayManagementApiClient::PostToConnectionCommand (JS AWS SDK v3), to the receiver's ConnectionID (Client B). I expect the receiver (Client B) to generate an "answer" when it gets the offer. Instead of an answer from Client B, I get the following error:
Error: Failed to parse SS message ...<message-chunk>...
In the logs I get this error twice, but with different chunks of my offer (first the beginning of the JSON, then somewhere in the middle). This led me to believe that the message got split, so I tried removing parts of the JSON (shortening it) until I stopped getting the error. Doing this, I found that the error disappears when the exact length of the JSON message is 4096 chars (4 KB). As soon as I send even one byte over this, I get the error (twice).
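For reference, the sending side is roughly the following. This is a minimal sketch of the Lambda, assuming the AWS SDK for JavaScript v3; the endpoint URL and the event payload shape are placeholders:
const {
  ApiGatewayManagementApiClient,
  PostToConnectionCommand,
} = require('@aws-sdk/client-apigatewaymanagementapi')

// Placeholder endpoint: the WebSocket API's connection management URL.
const client = new ApiGatewayManagementApiClient({
  endpoint: 'https://my-api-id.execute-api.eu-west-1.amazonaws.com/production',
})

exports.handler = async (event) => {
  // Hypothetical payload: the receiver's ConnectionID and the offer JSON.
  const { receiverConnectionId, offer } = JSON.parse(event.body)
  await client.send(new PostToConnectionCommand({
    ConnectionId: receiverConnectionId,
    Data: JSON.stringify(offer), // the ~7 KB WebRTC offer
  }))
  return { statusCode: 200 }
}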
According to the AWS documentation, there is a maximum frame size of 32 KB. My full message is around 7 KB, so I expected it to be sent at once. Unfortunately, this doesn't seem to be the case: my message is split at the 4 KB mark.
I have a working implementation of the exact same case in NodeJS, with a custom server built on the ws library. In that implementation everything works correctly: my 7 KB offer gets delivered in one piece, I receive an answer, and the connection is initiated without errors. I send the "offer" and then receive the "answer" back.
I've used wscat to connect to both servers (the NodeJS one and the ApiGateway one), assuming the role of Client B, and I got both "offers". Then I compared them and they are exactly the same. (In wscat the message doesn't look split, but I assume wscat concatenates the chunks before showing it.)
Based on that, I can assert that I'm configuring the clients properly and that the "offer" is good enough to receive an "answer". The only difference is that if I send the offer from NodeJS I receive an answer and everything works, but if I send exactly the same offer from AWS ApiGateway I get an error that the data cannot be parsed (because it is split into 4 KB chunks).
I have spent 3 days trying to resolve this issue, and I'm certain that everything I send, and the way I send it, is correct. So I have come to the conclusion that for some reason I'm hitting a limit in AWS API Gateway or Lambda (I saw that Lambda has a 4 KB limit on environment variables, but I'm not using them, so that should not be related) which causes my message to be split into parts. This makes Unreal's Pixel Streaming Plugin try to deserialize the JSON in chunks instead of the whole JSON message at once, which results in the two parse errors.
I cannot change the code of the client to first wait for all the chunks to arrive and then process them. I have to use it "as is". (It is a plugin, and it is written in C++.)
I also tried to relay the API Gateway socket message via a NodeJS client, creating a NodeJS WebSocket server, connecting the Unreal Engine plugin to that server, and forwarding the message through it to the plugin (AWS -> NodeJS Client -> NodeJS Server -> Unreal, instead of AWS -> Unreal). This variant works correctly without problems, and I receive an answer. The problem is that it introduces an additional NodeJS client and server, which defeats the purpose of using API Gateway in the first place...
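That relay is roughly the following sketch, using the ws package with a placeholder URL and port:
const WebSocket = require('ws')

// Client connection to the API Gateway WebSocket (placeholder URL)...
const upstream = new WebSocket('wss://my-api-id.execute-api.eu-west-1.amazonaws.com/production')

// ...and a local WebSocket server that the Unreal plugin connects to.
const server = new WebSocket.Server({ port: 8888 })

server.on('connection', (unreal) => {
  // Each upstream message arrives here as a whole, so it can be
  // forwarded to the plugin as a single text frame.
  upstream.on('message', (data) => unreal.send(data.toString()))
})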
So my question is: is it possible that AWS API Gateway (or Lambda) is splitting my message into 4 KB chunks, and if so, can this limit somehow be increased so that I receive the whole message at once?
I can provide code if needed, but I don't think the problem is related to the concrete implementation.

Why is an Azure VM not receiving an HTTP GET response?

I've encountered an interesting problem when trying to make an HTTP request from an Azure VM. It appears that when the request is run from this VM, the response never arrives. I tried using custom C# code that makes an HTTP request, and Postman. In both cases we can see in the logs on the target API side that the response has been sent, but no data is received on the origin VM. The exact same C# request and Postman request work outside of this VM on multiple networks and machines. The only tool that actually works for this request on the VM side is curl in a Bash terminal, but that is not an option given the current requirements.
Tried on multiple Azure VM sizes, on Windows 10 and Windows Server 2019.
The target API is hosted on-premise, and it takes around 5 minutes for the data to be sent back. The payload is very small, but due to the computing performed on the API side it takes a while to generate. Modifying this API is not an option.
So, to be clear: the requests are perpetually stuck until the timeout on the client side is reached (if one was configured). Does anybody know what could be the reason for this?
If these transfers take longer than 4 minutes without keep-alives, Azure will typically close the connection.
You should be able to see this by monitoring the connection with Wireshark.
TCP timeouts can be configured when using a Load Balancer, but you can also try adding keep-alives in your API server, if possible.
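If the API stack permits it, keep-alives can be enabled at the TCP level per connection. As an illustration only, here is a Node sketch (not the asker's actual on-premise stack), where the port and handler are placeholders:
const http = require('http')

const server = http.createServer((req, res) => {
  // ... the long-running (~5 minute) work happens here ...
})

server.on('connection', (socket) => {
  // Send TCP keep-alive probes every 30 s so the connection is not
  // considered idle while the response is being computed.
  socket.setKeepAlive(true, 30000)
})

server.listen(8080)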

Alternative to GraphQL long polling on an Express server for a large request?

Objective
I need to show a big table of data in my React web app frontend.
My backend is an Express server with a GraphQL layer and a few "normal" endpoints.
My server gets data from various sources, including an external API, which is the data source for my current task.
My server has a database that I can use freely. I cannot directly access the external API from my front end.
The data all comes from the external API I mentioned. In fact, it comes from multiple similar calls to the same endpoint with many different IDs. Each of those individual calls takes a while to return but doesn't risk timing out.
Current Solution
My naive implementation: I do one GraphQL query whose resolver makes all the API calls to the external service in parallel and waits for them all to complete using Promise.all(). It then returns one big array containing all the data I need, which my server passes back to the frontend.
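In code, that resolver looks roughly like this sketch, where fetchFromExternalApi and the ids argument stand in for the real external calls:
// Sketch of the naive resolver; fetchFromExternalApi is a placeholder.
const resolvers = {
  Query: {
    myBigTableData: async (_, { ids }) => {
      // Fire all the external calls in parallel and wait for every one.
      const rows = await Promise.all(ids.map((id) => fetchFromExternalApi(id)))
      return rows // the big array, available only once ALL calls resolve
    },
  },
}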
Problem With Current Solution
Unfortunately, this sometimes leaves my frontend hanging for too long, and the request times out (it takes longer than 2 minutes).
Proposed Solution
Is there a better way than manually implementing long polling in GraphQL?
This is my main plan for a solution at the moment:
Frontend sends a request to my server
Server returns a 200 and starts hitting the external API, and sets a flag in the database
Server stores the result of each API call in the database as it completes
Meanwhile, the frontend shows a loading screen and keeps making the same GraphQL query for an entity like MyBigTableData which will tell me how many of the external API calls have returned
When they've all returned, the next time I ask for MyBigTableData, the server will send back all the data (a rough sketch of this flow follows below).
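Here is that sketch of steps 2-3, assuming Express for the trigger endpoint; db, ids, and fetchFromExternalApi are hypothetical placeholders for the database and the external API calls:
app.post('/big-table-data/start', async (req, res) => {
  res.sendStatus(200) // step 2: acknowledge right away
  await db.set('bigTableData:pending', ids.length) // the "in progress" flag
  for (const id of ids) {
    fetchFromExternalApi(id) // step 3: save each result as it completes
      .then((row) => db.saveRow('bigTableData', row))
      .then(() => db.decrement('bigTableData:pending'))
  }
})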
Question
Is there a better alternative to GraphQL long polling on an Express server for this large request that I have to do?
An alternative that comes to mind is to not use GraphQL and instead use a standard HTTP endpoint, but I'm not sure that really makes much difference.
I also see that HTTP/2 has multiplexing which could be relevant. My server currently runs HTTP/1.1 and upgrading is something of an unknown to me.
I see here that Keep-Alive, which sounds like it could be relevant, is unusable in Safari, which is bad as many of my users use Safari to access the frontend.
I can't use WebSockets because of technical constraints. I don't want to set a ridiculously long timeout on my client either (and I'm not sure that's even possible).
I discovered that GraphQL has polling built in: https://www.apollographql.com/docs/react/data/queries/#polling
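With Apollo Client, that built-in polling would look roughly like this; the query's progress fields and the two components are hypothetical:
import { gql, useQuery } from '@apollo/client'

// Hypothetical query exposing how many external calls have finished.
const MY_BIG_TABLE_DATA = gql`
  query MyBigTableData {
    myBigTableData {
      completedCalls
      totalCalls
    }
  }
`

function BigTableLoader() {
  // Re-run the query every 2 seconds until all the calls have completed.
  const { data } = useQuery(MY_BIG_TABLE_DATA, { pollInterval: 2000 })
  const progress = data && data.myBigTableData
  if (!progress || progress.completedCalls < progress.totalCalls) {
    return <LoadingScreen /> // hypothetical loading component
  }
  return <BigTable /> // hypothetical table component
}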
In the end, I made a REST polling system.

Error 503 on Cloud Run for a simple messaging application

I have built a simple Python/Flask app for sending automatic messages in Slack and Telegram after receiving a POST request in the form of:
response = requests.post(
    url='https://my-cool-endpoint.a.run.app/my-app/api/v1.0/',
    json={'message': msg_body, 'urgency': urgency, 'app_name': app_name},
    auth=(username, password))
or even a similar curl request. It works well on localhost, as well as in a containerized application. However, after deploying it to Cloud Run, the requests keep resulting in the following 503 error:
POST 503 663 B 10.1 s python-requests/2.24.0 The request failed because either the HTTP response was malformed or connection to the instance had an error.
Does it have anything to do with a Flask timeout or something like that? I really don't understand what is happening, because the response doesn't (and shouldn't) take more than a few seconds (usually less than 5 s).
Thank you all.
--EDIT
Problem solved after thinking about AhmetB's reply. I found that I was setting the host to the public IP address of the SQL instance, and that is not how it works when you deploy to Cloud Run. For it to work, you must replace host with unix_socket and set its path (on Cloud Run, the Cloud SQL socket is exposed under /cloudsql/<INSTANCE_CONNECTION_NAME>).
Thank you all! This question is closed.

Poco::Net::HTTPSClientSession receiveResponse always truncated abnormally

Encountered a very weird issue.
I have two VMs, running CentOS Linux.
The server side has a REST API (using a non-Poco socket), and one of the APIs responds to a POST.
On the client side, I use the POCO library to call the REST API.
If the returned message is long, it gets truncated at 176 KB, 240 KB, or 288 KB.
Same code, same environment, running on the server side: good.
On the client VM, using Python to make the REST call: good.
It ONLY fails if I use the same, known-good code on the client VM.
When the message gets truncated, the HTTPS status code still returns 200.
On the server side, I logged the response message that I sent every time. Everything looks normal.
I have tried a whole bunch of things, like:
setting the socket timeout and the receiveResponse timeout to an hour
waiting for 2 seconds after I send the request but before I call the receive
setting the receive buffer big enough
trying a whole bunch of approaches to make sure the receive stream is empty, with no more data
It just does not work.
Has anyone had a similar issue? I've started pulling my hair out... Please talk to me, anything... before I go bald.
