Pubsub latency reaching minutes - node.js

I've been working on a project that uses Google Cloud Pub/Sub on the Node.js flexible runtime, and for some reason I've had some pretty crazy latency that has been getting worse over time. At first, only messages of a certain kind would sometimes experience heavy latency. Now, after a few more days of work, all messages are delayed by several minutes regardless of type. It has reached the point where testing is impossible because the delays are so long.
What can be done to alleviate this? I have not changed the code that receives the messages, only the function that handles them. I'm receiving them through the subscription.on() function.
Please let me know what information I can provide to help reproduce the error. Thanks in advance.
Edit: I was able to get it working again by deleting the topic and subscriptions and recreating them. Latency is now minimal, but I'm still not sure what caused the issue. Maybe a large backlog of unprocessed messages? Any ideas on how to avoid this in the future would help.
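One common way a Pub/Sub backlog builds up over time is a handler that, on some code path (an exception, an early return), never settles the message. As far as I know, the Node.js client keeps extending the lease on messages it holds but has not acked, and those messages count against the subscriber's flow-control limit, so new messages can end up waiting behind them. That is a hypothesis, not a diagnosis of this post. A minimal sketch, assuming the callback passed to subscription.on('message', ...) receives an object with ack() and nack() methods; the `AckableMessage` interface and `withAck` helper are hypothetical names, not part of the client library:

```typescript
// Hypothetical minimal shape of a received message. The real client passes a
// richer object; ack() and nack() are the only methods this sketch relies on.
interface AckableMessage {
  id: string;
  data: string;
  ack(): void;
  nack(): void;
}

type Handler = (msg: AckableMessage) => void | Promise<void>;

// Wraps a handler so every message is settled exactly once:
// ack() after the handler finishes, nack() if it throws or rejects.
function withAck(handler: Handler): (msg: AckableMessage) => Promise<void> {
  return async (msg: AckableMessage): Promise<void> => {
    try {
      await handler(msg);
    } catch (err) {
      // nack() asks for redelivery instead of silently holding the message.
      console.error(`handler failed for message ${msg.id}:`, err);
      msg.nack();
      return;
    }
    // ack() sits outside the try so a failure inside ack() is not turned into a nack().
    msg.ack();
  };
}
```

It would be registered as subscription.on('message', withAck(async (message) => { /* existing handling */ })). A message that fails every time will simply be redelivered over and over, so this is usually paired with some limit on retries.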

Related

Express App, diagnosing latency spikes in EC2 instances in Beanstalk

We have a medium-sized Express API that performs well as long as incoming traffic is relatively stable, but when requests increase quickly, latency tends to briefly shoot through the roof.
This image shows three latency spikes that clearly correlate with spikes in incoming traffic.
The app is perfectly capable of handling a larger volume of requests with low latency; this only happens during a spike.
Also of note, the DB does not struggle or show any correlating spikes at all, and CPU usage is only moderately affected, rarely going much above 50%-60% even at peak times.
Load testing shows the same issue, whether run from a local machine or against another AWS QA environment, so it is probably not network-related.
Given that the DB and CPU are rarely strained, and that the app performs well under sustained higher traffic (requests fulfilled in <100 ms), how would you go about diagnosing the root cause of this specific issue with traffic spikes?
We have tried pinpointing poorly performing code bottlenecks, but the savings have been negligible. We have also worked with our devops folks to see whether config changes could smooth this out, but giving us larger instances and adjusting auto-scaling haven't really helped with this particular problem.
Here's hoping seasoned eyes can give us some clues. Our assumption is still that some code improvements will help; maybe this is a typical issue in Node/Express apps. Happy to give additional details. Thanks in advance.
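One pattern that fits "DB and CPU look fine, latency spikes only on bursts" is requests queueing inside the app for a fixed-size resource such as a database connection pool: waiting for a free slot uses no CPU and does not show up as database load. This is only a hypothesis to check against the real app. A toy model, assuming a hypothetical pool of `poolSize` slots and a constant service time per request, shows how the same number of requests can see very different tail latency depending on how they arrive:

```typescript
// Toy model: requests served by a fixed-size pool (for example, DB connections).
// Every request holds one slot for `serviceMs`; a request that arrives while all
// slots are busy waits for the earliest one to free up. Returns each request's
// latency (wait + service), in arrival order.
function simulatePool(arrivalsMs: number[], poolSize: number, serviceMs: number): number[] {
  const freeAt: number[] = new Array<number>(poolSize).fill(0); // when each slot is next free
  const latencies: number[] = [];
  const ordered = arrivalsMs.slice().sort((a, b) => a - b);
  for (const arrival of ordered) {
    // Pick the slot that becomes free earliest.
    let slot = 0;
    for (let i = 1; i < poolSize; i++) {
      if (freeAt[i] < freeAt[slot]) slot = i;
    }
    const start = Math.max(arrival, freeAt[slot]); // wait if that slot is still busy
    freeAt[slot] = start + serviceMs;
    latencies.push(freeAt[slot] - arrival);
  }
  return latencies;
}

function summarize(latencies: number[]): { mean: number; max: number } {
  const total = latencies.reduce((sum, x) => sum + x, 0);
  return { mean: total / latencies.length, max: Math.max(...latencies) };
}

// 100 requests either spread out (one every 5 ms) or arriving all at once,
// with a pool of 10 slots and 20 ms of work per request.
const steady: number[] = [];
for (let i = 0; i < 100; i++) steady.push(i * 5);
const burst = new Array<number>(100).fill(0);
console.log("steady", summarize(simulatePool(steady, 10, 20))); // mean 20, max 20
console.log("burst", summarize(simulatePool(burst, 10, 20))); // mean 110, max 200
```

The total work is identical in both runs; only the arrival pattern differs. In the real app, the thing to measure would be time spent waiting (for example pool acquire time or event-loop delay) as opposed to time spent working.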

Azure Functions - Concurrency Issue

I'm planning a project and working through all the potential issues I might face. One that I keep running into, which might be specific to my project, is concurrency. From my understanding, Azure Functions scale out under demand, which is exactly what I'm looking for, but that causes a problem when it comes to concurrency. Let me explain the scenario:
An HTTP-triggered Azure Function does the following:
1. Gets the client's available credit; if it is zero, auto-charges the client's card.
2. Deducts credit from the client for the request.
3. Processes the request and returns the result to the client.
The issue I see is in getting the available credit and auto-charging the card. Because multiple instances of the function may run at once, I might auto-charge the card multiple times, and on top of that, reading and deducting the credit could also be affected.
I want the scaling of Azure Functions but can't figure out a way around these concurrency issues. Any insight or pointers in the right direction would be very much appreciated.
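A common way around this check-then-act race is optimistic concurrency: make the write of the new balance conditional on the record not having changed since it was read, and retry when it has. Many stores offer this, for example ETag/If-Match conditional updates or an SQL `UPDATE ... WHERE version = ...`; the in-memory `VersionedStore` below is a hypothetical stand-in for whichever store the function actually uses. The sketch also only charges the card after the instance has won the write, so two instances that both saw zero credit cannot both top up:

```typescript
interface Account {
  credit: number;
  version: number; // bumped on every successful write
}

// Hypothetical in-memory store with a conditional write, standing in for a
// database that supports ETag/If-Match or a version column.
class VersionedStore {
  private accounts = new Map<string, Account>();

  constructor(initialCredit: Record<string, number>) {
    for (const [id, credit] of Object.entries(initialCredit)) {
      this.accounts.set(id, { credit, version: 0 });
    }
  }

  // Returns a snapshot, like a row read by one function instance.
  read(id: string): Account {
    const account = this.accounts.get(id);
    if (!account) throw new Error(`unknown account ${id}`);
    return { ...account };
  }

  // Writes only if nobody else has written since `seen` was read.
  writeIfUnchanged(id: string, seen: Account, credit: number): boolean {
    const current = this.read(id);
    if (current.version !== seen.version) return false;
    this.accounts.set(id, { credit, version: current.version + 1 });
    return true;
  }
}

// Deducts `cost` credit. If the balance seen is too low, it tops up by
// `topUpAmount` (assumed >= cost) and charges the card. Retries when another
// instance wrote first.
function deductCredit(
  store: VersionedStore,
  id: string,
  cost: number,
  topUpAmount: number,
  chargeCard: (id: string, amount: number) => void,
  maxAttempts = 5,
): void {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const seen = store.read(id);
    const needsTopUp = seen.credit < cost;
    const newCredit = seen.credit + (needsTopUp ? topUpAmount : 0) - cost;
    if (store.writeIfUnchanged(id, seen, newCredit)) {
      // Only the instance whose write succeeded charges the card.
      if (needsTopUp) chargeCard(id, topUpAmount);
      return;
    }
    // Another instance changed the balance first: re-read and try again.
  }
  throw new Error(`could not update credit for ${id} after ${maxAttempts} attempts`);
}
```

For example, starting from zero credit with a cost of 1 and a top-up of 10, eleven calls charge the card twice and leave 9 credit. One trade-off: if the charge fails after the write, the credit has already been granted, so a real implementation would need to reverse it or record the top-up as pending; passing an idempotency key to the payment provider, if it supports one, also guards against the function itself being retried.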

Google app engine terrible latency with specific pubsub messages

I'm working on a project in Google Cloud App Engine (Node.js flexible runtime), and while I've had a pretty good experience with it so far, I've recently run into a problem where the engine sometimes does not respond to a specific Pub/Sub notification. Sometimes the results appear after a few minutes, but often I have to redeploy the app and lose any messages I had queued up before. Interestingly, when I send a Pub/Sub message through a different subscription, the engine responds to it well. The backlogged messages are then handled, but sending the problematic message again still does not work.
I'm not really sure how to solve this issue. The logs show no evidence that App Engine received the message from Pub/Sub. And even if the messages do reach their destination given enough time, waiting this long will hurt the project overall.
I'm willing to provide more information to help reproduce the error.
Thanks in advance
Edit: The issue has become significantly more severe. Please see my updated post for its current extent.
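If one particular message keeps failing (for example, its payload makes the handler throw), each nack or expired deadline leads Pub/Sub to redeliver it, and it can keep crowding out other work. Pub/Sub can do this bookkeeping server-side with a dead-letter topic and a maximum number of delivery attempts. The sketch below is a hypothetical in-process version of the same idea, keyed on the message ID (which stays the same across redeliveries of one published message); the `AckableMessage` shape and `PoisonMessageGuard` class are illustrative names, not client-library APIs:

```typescript
// Hypothetical minimal message shape: only the fields this sketch needs.
interface AckableMessage {
  id: string;
  data: string;
  ack(): void;
  nack(): void;
}

// Counts failures per message ID and "parks" a message once it has failed
// `maxAttempts` times, so a single bad payload stops being redelivered.
// Counts live in memory, so they reset on redeploy and are per instance.
class PoisonMessageGuard {
  private failures = new Map<string, number>();
  readonly parked: AckableMessage[] = [];

  constructor(
    private readonly maxAttempts: number,
    private readonly work: (msg: AckableMessage) => void,
  ) {}

  handle(msg: AckableMessage): void {
    try {
      this.work(msg);
    } catch (err) {
      const count = (this.failures.get(msg.id) ?? 0) + 1;
      if (count >= this.maxAttempts) {
        // Give up: keep it (in memory here) for inspection and ack it
        // so Pub/Sub stops redelivering it.
        this.failures.delete(msg.id);
        this.parked.push(msg);
        console.error(`parking message ${msg.id} after ${count} failures:`, err);
        msg.ack();
      } else {
        this.failures.set(msg.id, count);
        msg.nack(); // ask for redelivery
      }
      return;
    }
    this.failures.delete(msg.id);
    msg.ack();
  }
}
```

In a real deployment the parked messages would go somewhere durable rather than an in-memory array, which is one reason the server-side dead-letter topic is usually the simpler option.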

Can I charge my clients for debugging the code I have developed for them?

I charge my clients hourly. Sometimes they come back with an error or bug in the code and ask me to resolve it. That takes time, sometimes 2-3 hours. Most clients think it should not be charged because it was my fault and I should fix it for free. Is that so? It's almost impossible to write 100% error-free code.
To me, it depends. Is the product you sold working as described in the contract? If not, you can't decently ask for more money, since you didn't do your job in the first place. You should test your software and debug it for free. It's true that no software is bug-free, but that isn't the customer's fault, and as long as you didn't explicitly state that debugging has a cost, I don't think it's okay to charge for it. (Be sure not to let them add features by pretending they're bugs, though!)

NodeJS and Socket.IO to keep track of user visit time High performance

I've built an application with Node and Socket.IO to keep track of when a visitor arrives on a page and when they leave. When the visitor leaves, I store the time they spent on the page in Redis. That's all the application has to do.
Here's the thing: the application needs to support ~15k concurrent connections, but I'm getting a lot of handshake errors when the benchmark reaches around 10,000 concurrent visitors. I don't know exactly why. Does anyone have experience with this kind of problem?
I also tried scaling the application across multiple processes using the RedisStore backend for node, but haven't had much success.
A number of things could be causing this. Check your system error logs for any errors that might indicate where the problem actually is.
This question has some good information, and the limit they were hitting is close to the one you are hitting. It's worth a look:
https://serverfault.com/questions/10852/what-limits-the-maximum-number-of-connections-on-a-linux-server
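The bookkeeping the question describes (note when a visitor connects, compute the time on the page when they disconnect, persist it) can be kept independent of Socket.IO and Redis, which makes it easy to test and keeps per-connection state down to a single number. A minimal sketch; `VisitTracker` and its `persist` callback are hypothetical, and in the real app `onConnect`/`onDisconnect` would be called from Socket.IO's connection and disconnect events, with `persist` doing the Redis write:

```typescript
// Tracks how long each connected socket has been on the page.
// `persist` is a hypothetical callback (a Redis write in the real app).
class VisitTracker {
  private connectedAt = new Map<string, number>();

  constructor(private readonly persist: (socketId: string, durationMs: number) => void) {}

  onConnect(socketId: string, nowMs: number = Date.now()): void {
    this.connectedAt.set(socketId, nowMs);
  }

  // Returns the visit duration, or undefined for a socket that was never registered.
  onDisconnect(socketId: string, nowMs: number = Date.now()): number | undefined {
    const start = this.connectedAt.get(socketId);
    if (start === undefined) return undefined;
    this.connectedAt.delete(socketId); // free per-connection state promptly
    const durationMs = nowMs - start;
    this.persist(socketId, durationMs);
    return durationMs;
  }

  get activeCount(): number {
    return this.connectedAt.size;
  }
}
```

Passing `nowMs` explicitly is only there to make the logic testable; in production the `Date.now()` defaults would be used.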
