Persistent Spotify 429 errors - with ridiculous retry-after suggestion of 76,000s (about 21hr) - node.js

I am working on an application that uses the Spotify Web API to build and maintain playlists for the user based on a given recipe (basically just a JSON that represents a logic scheme). Currently the application is in development mode. I use delays of about 400ms between each API call I make, and I also waited 7.5s whenever I got the occasional 429 error (too many requests).
Anyway, I recently made it so that all of the playlist recipes get rebuilt in an infinite loop. So the process is always running and making API calls about every 100ms, in order to keep all of the playlists up-to-date based on the recipes. However, after letting this loop run for about 10 minutes, I started persistently getting 429s even after retrying after 7.5s and longer.
Apparently the 429 responses contain a header called 'retry-after' which is how long Spotify suggests waiting before making another call (as I said, before I was just using a fixed 7.5s delay on 429s). I am seeing that the value I am receiving for 'retry-after' is on the order of about 76,000s (21 hours).
But I thought that the rate limits are enforced over a 30s window...
(see https://developer.spotify.com/documentation/web-api/guides/rate-limits/) So why is my 'retry-after' header so high?
This is mostly a design philosophy question so the code itself I think is mostly irrelevant but if you'd like to take a look it's available here: https://github.com/jakefoglia/Smart-Playlist-Manager
site/SPM-core/maintainer.js : contains the 'infinite loop'
site/SPM-core/spotify_api_hook.js : contains most of the API calls

The 30s window is presented in the documentation only as an example, not as a description of how the API actually works. As you correctly say, the Retry-After header (its value is in seconds) is all the information you need to decide how long to wait before making the next call.
Each time your app "violates" the rate limit by making a request too early, it gets "punished" with a longer delay period, and since the app apparently never consulted the header and repeatedly violated the limit, the delay grew this high. This, however, did not result in a shutdown, block, rejection, or anything similar, because the header only suggests the duration of a delay rather than enforcing it.
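For reference, here is a minimal sketch (not taken from the linked repo) of a call wrapper that honors the header, using Node 18+'s built-in fetch; the spotifyGet helper name, the fallback delay, and the token handling are illustrative only:

    // Minimal sketch: retry on 429 by sleeping for whatever Retry-After suggests.
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    async function spotifyGet(url, accessToken, maxRetries = 5) {
      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const res = await fetch(url, {
          headers: { Authorization: `Bearer ${accessToken}` },
        });
        if (res.status !== 429) return res;

        // Spotify reports the suggested wait in whole seconds; fall back to 5s if absent.
        const retryAfterSec = Number(res.headers.get('retry-after')) || 5;
        await sleep(retryAfterSec * 1000);
      }
      throw new Error(`Still rate limited after ${maxRetries} retries: ${url}`);
    }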

Related

How to measure how much time it takes for an SSR Next server to send back the HTML, and why?

Currently I have a Next application with SSR using getInitialProps that takes too long to deliver the HTML, depending on the complexity of the app (I'm seeing long "Waiting for server response" times in Chrome's network tab).
I want to figure out what is adding so much time (sometimes it takes several seconds), so I'm trying to:
Find out how much time the server takes from receiving the GET request for the page to sending back the HTML
Get a clear picture of what's happening during SSR and how much time each part takes, because at the moment it's a black box for me.
I tried suggested improvements: code splitting, lazy loading components, code improvements, etc.
I tried using the Server Timing API to measure the requests performed in getInitialProps, to narrow down that part of the process. But it doesn't help with the rendering process and other Next processes that might add to the response time.
I tried using the Node.js profiler for Chrome via NODE_OPTIONS='--inspect' next dev. This is the closest I got to what I wanted, but I can't tell where the server sends the response back, or what each Activity corresponds to. Some documentation could be helpful.
I tried middleware. Not sure if I got something wrong, but I can't measure the time from start to finish.
One observation is that other, simpler pages have faster response times, but the time they take is still extremely long (1-2 orders of magnitude more than expected).
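A rough sketch of one way to at least bracket the total server-side time (request in to HTML out), assuming a custom Express server wrapping Next; the port and log format are illustrative, and this gives only the overall number, not a breakdown of Next's internals:

    // Rough sketch: log time from request received to response flushed.
    const express = require('express');
    const next = require('next');

    const app = next({ dev: process.env.NODE_ENV !== 'production' });
    const handle = app.getRequestHandler();

    app.prepare().then(() => {
      const server = express();

      server.use((req, res, nextMiddleware) => {
        const start = process.hrtime.bigint();
        res.on('finish', () => {
          const ms = Number(process.hrtime.bigint() - start) / 1e6;
          console.log(`${req.method} ${req.url} served in ${ms.toFixed(1)}ms`);
        });
        nextMiddleware();
      });

      server.all('*', (req, res) => handle(req, res));
      server.listen(3000);
    });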

Chrome Extension performance difference - alarm + history vs storage + history

My extension (Manifest V3) needs to track the number of times a set of websites is visited, either during the whole day or during certain time windows, and then perform an action if the visit count exceeds a limit.
There are two ways I could think of implementing this:
alarm + history: Create an alarm that runs every 5 mins, search the history for the required websites and count the visits. If the count exceeds the limit perform an action
storage + history: Add a listener to chrome.history.onVisited. If the visited site is from the required list, increment the visit count in storage. If the storage count exceeds the limit, perform an action (a rough sketch of this listener is shown below).
Which of the above approaches has the least impact on Chrome's browsing performance? Or is there another API I can use to achieve the same thing?
I would like my extension to consume the least amount of the user's battery :)
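Sketch of approach 2 in a Manifest V3 service worker (the site list, storage key, and limit are placeholders; it assumes the "history" and "storage" permissions):

    // background.js (service worker): count visits to tracked sites via chrome.history.
    const TRACKED = ['https://example.com/', 'https://news.ycombinator.com/'];
    const LIMIT = 10;

    chrome.history.onVisited.addListener(async (item) => {
      if (!TRACKED.some((site) => item.url.startsWith(site))) return;

      const { visitCount = 0 } = await chrome.storage.local.get('visitCount');
      const updated = visitCount + 1;
      await chrome.storage.local.set({ visitCount: updated });

      if (updated > LIMIT) {
        // perform the action here (notification, redirect, etc.)
      }
    });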
In approach 1, the extension will do a lot of unnecessary work when the user isn't using the browser.
In approach 2, the extension's background script will restart more often if the user navigates a lot but pauses between navigations for longer than the service worker's lifetime (30 seconds by default), which is a typical interaction pattern.
In both cases, the bigger inherent problem of ManifestV3 for an extension such as yours that observes user activity is not what the extension does itself, but the very large overhead of restarting the background worker, which is automatically terminated 30 seconds after the last observed event (or 5 minutes if you use waitUntil). Such pauses in user activity are typical when browsing and interacting, so for many users the worker will restart hundreds of times a day. Starting the worker takes 50-100ms and stresses the CPU, memory, and disk for that entire duration, while the time actually spent in a simple observation extension's code is just 1-2ms.
In other words, an extension that observes user activity, such as yours, is inherently 25-100 times less efficient in ManifestV3 than it would be in ManifestV2 with a persistent background script.
Solutions.
Prolong the service worker's lifetime to reduce the number of restarts, as shown here. To avoid wasting memory for users who keep the browser open without using it for hours, you can dynamically adjust the lifetime duration by measuring and averaging the intervals between events, or offer an option to set the duration in your extension's UI. Hopefully, in the future the browser will do this automatically, but it may take years before that feature is actually implemented, and even then it will still likely restart the background script far too often.
Use chrome.webNavigation events with a URL filter for your sites so that the background script wakes up only when these specific URLs are visited. If the URLs are configured by the user, you will need to unregister the listener first (e.g. by making the listener a named global function), then register it with the new URL filter. You may still need to prolong the worker's lifetime if these URLs are visited a lot.
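As an illustration of this suggestion (the host names are placeholders, and it needs the webNavigation permission), re-registering a named listener with a new URL filter could look roughly like this:

    // Wake the service worker only for navigations to the configured hosts.
    function onVisit(details) {
      if (details.frameId !== 0) return; // ignore iframes
      // count the visit, check the limit, perform the action...
    }

    function registerListener(hosts) {
      // Named global function, so it can be unregistered and re-registered
      // when the user changes the configured sites.
      if (chrome.webNavigation.onCompleted.hasListener(onVisit)) {
        chrome.webNavigation.onCompleted.removeListener(onVisit);
      }
      chrome.webNavigation.onCompleted.addListener(onVisit, {
        url: hosts.map((host) => ({ hostEquals: host })),
      });
    }

    registerListener(['example.com', 'news.ycombinator.com']);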

Delay the execution of an expressJS method for 30 days or more

Can the execution of an expressJS method be delayed for 30 days or more just by using setTimeout?
Let's say I want to create an endpoint /sendMessage that sends a message to my other app after a timeout of 30 days. Will my expressJS method execution last long enough to fire this message after this delay?
If your server runs continuously for 30 days or more, then a setTimeout()-based approach can work, with one caveat: a single setTimeout() delay is capped at 2^31 - 1 ms (roughly 24.8 days), so a 30-day delay has to be chained from shorter timeouts. But it is probably not smart to rely on the fact that your server never, ever has to restart.
There are 3rd-party programs/modules designed explicitly for this. If you don't want to use one of them, what I have done in the past is write each future firing time into a JSON file and set a timer for it with setTimeout(). When a timer successfully fires, I remove that time from the JSON file.
So, at any point in time, the JSON file always contains a list of times in the future that I want timers to fire for. Any timer that fires is immediately removed from the JSON file.
Anytime my server starts up, I read the times from the JSON file and reconfigure the setTimeout() for each one.
This way, even if my server restarts, I won't lose any of the timers.
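A minimal sketch of that scheme (the file name and the stored timer format are made up, and the chaining works around setTimeout()'s ~24.8-day cap mentioned above):

    const fs = require('fs');

    const FILE = './timers.json';
    const MAX_DELAY = 2 ** 31 - 1; // setTimeout() caps out at ~24.8 days

    const loadTimers = () => {
      try { return JSON.parse(fs.readFileSync(FILE, 'utf8')); } catch { return []; }
    };
    const saveTimers = (timers) => fs.writeFileSync(FILE, JSON.stringify(timers));

    function schedule(fireAt, fire) {
      const delay = fireAt - Date.now();
      if (delay > MAX_DELAY) {
        // Too far out for a single setTimeout(); re-schedule when the cap expires.
        setTimeout(() => schedule(fireAt, fire), MAX_DELAY);
        return;
      }
      setTimeout(() => {
        fire();
        saveTimers(loadTimers().filter((t) => t !== fireAt)); // drop the fired time
      }, Math.max(delay, 0));
    }

    // Persist a new future firing time, then arm it.
    function addTimer(fireAt, fire) {
      saveTimers([...loadTimers(), fireAt]);
      schedule(fireAt, fire);
    }

    // On startup, re-arm every timer still in the file.
    loadTimers().forEach((fireAt) => schedule(fireAt, () => { /* send the message */ }));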
In case you were wondering, because of the way nodejs creates timers, it costs you essentially nothing to have a bunch of future timers configured. Nodejs keeps the timers in a sorted linked list and the event loop just checks the time for the next timer to fire, the one at the front of the sorted list; the rest of the timers are not looked at until they reach the front. This means the only time it costs anything to have lots of future timers is when inserting a new timer into the sorted list; there is no ongoing cost in the event loop to having lots of pending timers present.

Why is Python consistently struggling to keep up with constant generation of asyncio tasks?

I have a Python project with a server that distributes work to one or more clients. Each client is given a number of assignments which contain parameters for querying a target API. This includes a maximum number of requests per second they can make with a given API key. The clients process the response and send the results back to the server to store into a database.
Both the server and clients use Tornado for asynchronous networking. My initial implementation for the clients relied on the PeriodicCallback to ensure that n-number of calls to the API would occur. I thought that this was working properly as my tests would last 1-2 minutes.
I added some telemetry to collect statistics on performance and noticed that the clients were actually having issues after almost exactly 2 minutes of runtime. I had set the API requests to 20 per second (the maximum allowed by the API itself), which the clients could reliably hit. However, after 2 minutes performance would fluctuate between 12 and 18 requests per second. The number of active tasks steadily increased until it hit the maximum number of active assignments (100) given by the server, and the HTTP request time to the API was reported by Tornado to go from 0.2-0.5 seconds to 6-10 seconds. Performance is steady if I only do 14 requests per second. Anything higher than 15 requests will experience issues 2-3 minutes after starting. Logs can be seen here. Notice how the "Active Queries" column is steady until 01:19:26. I've truncated the log to demonstrate the issue.
I believed the issue was the use of a single process on the client to handle both communication to the server and the API. I proceeded to split the primary process into several different processes. One handles all communication to the server, one (or more) handles queries to the API, another processes API responses into a flattened class, and finally a multiprocessing Manager for Queues. The performance issues were still present.
I thought that, perhaps, Tornado was the bottleneck and decided to refactor. I chose aiohttp and uvloop. I split the primary process in a similar manner to that in the previous attempt. Unfortunately, performance issues are unchanged.
I took both refactors and enabled them to split work into several querying processes. However, no matter how much you split the work, you still encounter problems after 2-3 minutes.
I am using both Python 3.7 and 3.8 on MacOS and Linux.
At this point, it does not appear to be a limitation of a single package. I've thought about the following:
Python's asyncio library cannot handle more than 15 coroutines/tasks being generated per second
I doubt that this is true given that different libraries claim to be able to handle several thousand messages per second simultaneously. Also, we can hit 20 requests per second just fine at the start with very consistent results.
The API is unable to handle more than 15 requests from a single client IP
This is unlikely as I am not the only user of the API and I can request 20 times per second fairly consistently over an extended period of time if I over-subscribe processes to query from the API.
There is a system configuration causing the limitation
I've tried both MacOS and Debian, which yield the same results. It's possible that it's a *nix problem.
Variations in responses cause a backlog which grows linearly until it cannot be tackled fast enough
Sometimes response times from the API fluctuate between 0.2 and 1.2 seconds. The number of active tasks returned by asyncio.all_tasks remains consistent in the telemetry data. If this were true, we wouldn't be consistently encountering the issue at the same time every run.
We're overtaxing the hardware with the number of tasks generated per second and causing thermal throttling
Although CPU temperatures spike, neither MacOS nor Linux report any thermal throttling in the logs. We are not hitting more than 80% CPU utilization on a single core.
At this point, I'm not sure what's causing it and have considered refactoring the clients into a different language (perhaps C++ with Boost libraries). Before I dive into something so foolish, I wanted to ask if I'm missing something simple.
Conclusion
Performance appears to vary wildly depending on time of day. It's likely to be the API.
How this conclusion was made
I created a new project to demonstrate the capabilities of asyncio and determine if it's the bottleneck. This project takes two websites, one to act as the baseline and the other is the target API, and runs through different methods of testing:
Spawn one process per core, pass a semaphore, and query up to n-times per second
Create a single event loop and create n-number of tasks per second
Create multiple processes with an event loop each to distribute the work, with each loop performing (n-number / processes) tasks per second
(Note that spawning processes is incredibly slow and often commented out unless using high-end desktop processors with 12 or more cores)
The baseline website would be queried up to 50 times per second. asyncio could complete 30 tasks per second reliably for an extended period, with each task completing their run in 0.01 to 0.02 seconds. Responses were very consistent.
The target website would be queried up to 20 times per second. Sometimes asyncio would struggle despite circumstances being identical (JSON handling, dumping response data to queue, returning immediately, no CPU-bound processing). However, results varied between tests and could not always be reproduced. Responses would be under 0.4 seconds initially but quickly increase to 4-10 seconds per request. 10-20 requests would return as complete per second.
As an alternative method, I chose a parent URI for the target website. This URI wouldn't involve a large query against their database but would instead be served a static JSON response. Responses bounced between 0.06 seconds and 2.5-4.5 seconds. However, 30-40 responses would be completed per second.
Splitting requests across processes with their own event loop would decrease response time in the upper-bound range by almost half, but still took more than one second each to complete.
The inability to reproduce consistent results every time from the target website would indicate that it's a performance issue on their end.

"Resequencing" messages after processing them out-of-order

I'm working on what's basically a highly-available distributed message-passing system. The system receives messages from someplace over HTTP or TCP, performs various transformations on them, and then sends them to one or more destinations (also using TCP/HTTP).
The system has a requirement that all messages sent to a given destination are in-order, because some messages build on the content of previous ones. This limits us to processing the messages sequentially, which takes about 750ms per message. So if someone sends us, for example, one message every 250ms, we're forced to queue the messages behind each other. This eventually introduces intolerable delay in message processing under high load, as each message may have to wait for hundreds of other messages to be processed before it gets its turn.
In order to solve this problem, I want to be able to parallelize our message processing without breaking the requirement that we send them in-order.
We can easily scale our processing horizontally. The missing piece is a way to ensure that, even if messages are processed out-of-order, they are "resequenced" and sent to the destinations in the order in which they were received. I'm trying to find the best way to achieve that.
Apache Camel has a thing called a Resequencer that does this, and it includes a nice diagram (which I don't have enough rep to embed directly). This is exactly what I want: something that takes out-of-order messages and puts them in-order.
But, I don't want it to be written in Java, and I need the solution to be highly available (i.e. resistant to typical system failures like crashes or system restarts) which I don't think Apache Camel offers.
Our application is written in Node.js, with Redis and Postgresql for data persistence. We use the Kue library for our message queues. Although Kue offers priority queueing, the featureset is too limited for the use-case described above, so I think we need an alternative technology to work in tandem with Kue to resequence our messages.
I was trying to research this topic online, and I can't find as much information as I expected. It seems like the type of distributed architecture pattern that would have articles and implementations galore, but I don't see that many. Searching for things like "message resequencing", "out of order processing", "parallelizing message processing", etc. turns up solutions that mostly just relax the "in-order" requirement based on partitions or topics or whatnot. Alternatively, they talk about parallelization on a single machine. I need a solution that:
Can handle processing on multiple messages simultaneously in any order.
Will always send messages in the order in which they arrived in the system, no matter what order they were processed in.
Is usable from Node.js
Can operate in an HA environment (i.e. multiple instances of it running on the same message queue at once without inconsistencies).
Our current plan, which makes sense to me but which I cannot find described anywhere online, is to use Redis to maintain sets of in-progress and ready-to-send messages, sorted by their arrival time. Roughly, it works like this:
When a message is received, that message is put on the in-progress set.
When message processing is finished, that message is put on the ready-to-send set.
Whenever there's the same message at the front of both the in-progress and ready-to-send sets, that message can be sent and it will be in order.
I would write a small Node library that implements this behavior with a priority-queue-esque API using atomic Redis transactions. But this is just something I came up with myself, so I am wondering: Are there other technologies (ideally using the Node/Redis stack we're already on) that are out there for solving the problem of resequencing out-of-order messages? Or is there some other term for this problem that I can use as a keyword for research? Thanks for your help!
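For what it's worth, here is a rough sketch of that plan using ioredis and two sorted sets scored by arrival time. The key names and sendToDestination are assumptions, and a real implementation would want a Lua script or WATCH/MULTI so that the check-and-pop step is atomic across multiple instances:

    const Redis = require('ioredis');
    const redis = new Redis();

    // Hypothetical downstream sender; replace with the real HTTP/TCP delivery.
    async function sendToDestination(msgId) {
      console.log('sending', msgId);
    }

    // Called when a message arrives: record it, scored by arrival timestamp.
    async function onReceived(msgId, arrivalTs) {
      await redis.zadd('in-progress', arrivalTs, msgId);
    }

    // Called when a worker finishes processing a message.
    async function onProcessed(msgId, arrivalTs) {
      await redis.zadd('ready-to-send', arrivalTs, msgId);
      await drain();
    }

    // Send messages for as long as the oldest in-progress message is also ready.
    async function drain() {
      for (;;) {
        const [oldestInProgress] = await redis.zrange('in-progress', 0, 0);
        const [oldestReady] = await redis.zrange('ready-to-send', 0, 0);
        if (!oldestInProgress || oldestInProgress !== oldestReady) return;

        await sendToDestination(oldestReady);
        await redis.zrem('in-progress', oldestReady);
        await redis.zrem('ready-to-send', oldestReady);
      }
    }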
This is a common problem, so there are surely many solutions available. This is also quite a simple problem, and a good learning opportunity in the field of distributed systems. I would suggest writing your own.
You're going to have a few problems building this, namely
1: Guaranteed order of messages
2: Exactly-once delivery
You've found number 1, and you're solving this by resequencing them in redis, which is an ok solution. The other one, however, is not solved.
It looks like your architecture is not geared towards fault tolerance, so currently, if a server crashes, you restart it and continue with your life. This works fine when processing all requests sequentially, because then you know exactly when you crashed, based on what the last successfully completed request was.
What you need is either a strategy for finding out what requests you actually completed, and which ones failed, or a well-written apology letter to send to your customers when something crashes.
If Redis is not sharded, it is strongly consistent. It will fail and possibly lose all data if that single node crashes, but you will not have any problems with out-of-order data, or data popping in and out of existence. A single Redis node can thus hold the guarantee that if a message is inserted into the to-process set, and then into the done set, no node will see the message in the done set without it also being in the to-process set.
How I would do it
Using Redis seems like too much fuss, assuming that the messages are not huge, that losing them is OK if a process crashes, and that running them more than once, or even running multiple copies of a single request at the same time, is not a problem.
I would recommend setting up a supervisor server that takes incoming requests, dispatches each to a randomly chosen slave, stores the responses, and puts them back in order before sending them on. You said you expected the processing to take 750ms. If a slave hasn't responded within, say, 2 seconds, dispatch the request again to another randomly chosen node after waiting a random 0-1 seconds. The first one to respond is the one we're going to use. Beware of duplicate responses.
If the retry request also fails, double the maximum wait time. After 5 failures or so, each waiting up to twice as long (or any multiple greater than one) as the previous one, we probably have a permanent error, so we should probably ask for human intervention. This algorithm is called exponential backoff, and it prevents a sudden spike in requests from taking down the entire cluster. Without the random interval, retrying after exactly n seconds would probably DoS your own cluster every n seconds until it dies, if it ever gets a big enough load spike.
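A small sketch of that dispatch-and-retry loop (processOn, the 2-second timeout, and the 5-attempt limit are made-up placeholders):

    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    async function dispatchWithRetry(message, slaves, maxAttempts = 5) {
      let maxJitterMs = 1000; // first retry happens within 0-1s
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const slave = slaves[Math.floor(Math.random() * slaves.length)];
        try {
          // processOn() is a hypothetical call that rejects if the slave
          // does not answer within the given timeout.
          return await processOn(slave, message, { timeoutMs: 2000 });
        } catch (err) {
          // Wait a random interval, doubling the cap each time:
          // exponential backoff with jitter.
          await sleep(Math.random() * maxJitterMs);
          maxJitterMs *= 2;
        }
      }
      throw new Error('Giving up after repeated failures; flag for human intervention');
    }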
There are many ways this could fail, so make sure this system is not the only place data is stored. However, this will probably work 99+% of the time, it's probably at least as good as your current system, and you can implement it in a few hundred lines of code. Just make sure your supervisor is using asynchronous requests so that you can handle retries and timeouts. Javascript is by nature single-threaded, so this is slightly trickier than normal, but I'm confident you can do it.
