Azure Function Consumption streaming large file silently error 500 - azure

I am using Azure Functions, and have stumbled upon an issue.
When requesting large amount of data from an external source, it seems the stream is shut. I have simplified the example as simple as I can below it will silently fail, and return 500. By silent I mean there are no errors in Application Insights that I can see, and no way to know the issue.
This works locally as an azure function.
Is there some sort of limit on data that can be read? It doesn't take much memory (70mb or so locally).
So really banging my head against a wall the last two days on this one. Any help appreciated!
[FunctionName("FeedDownloads")]
public HttpResponseMessage Run([HttpTrigger(AuthorizationLevel.Anonymous, "get" )]HttpRequestMessage req,
ILogger log)//, [FromQuery]string format = "", [FromQuery]bool debug = false)
{
System.Net.HttpWebRequest webRequest = (System.Net.HttpWebRequest)System.Net.WebRequest.Create("{large 1.5GB gzipped file}");
webRequest.AutomaticDecompression = System.Net.DecompressionMethods.GZip;
var webRequestResponse = webRequest.GetResponse();
var res = req.CreateResponse(HttpStatusCode.OK);
res.Content = new StreamContent(webRequestResponse.GetResponseStream());
res.Content.Headers.ContentType = new MediaTypeHeaderValue(webRequestResponse.ContentType);
return res;
}

Azure Functions are not meant for long term communication with the client devices. Large, long-running functions can cause unexpected timeout issues.
There are several other problems that you must take a note of before anything. And this is an excerpt from official documentation, functions-best-practices
. Functions should be stateless and idempotent if possible. Associate any required state information with your data. For example, an order being processed would likely have an associated state member. A function could process an order based on that state while the function itself remains stateless.

Related

Appropriate solution for long running computations in Azure App Service and .NET Core 3.1?

What is an appropriate solution for long running computations in Azure App Service and .NET Core 3.1 in an application that has no need for a database and no IO to anything outside of this application ? It is a computation task.
Specifically, the following is unreliable and needs a solution.
[Route("service")]
[HttpPost]
public Outbound Post(Inbound inbound)
{
Debug.Assert(inbound.Message.Equals("Hello server."));
Outbound outbound = new Outbound();
long Billion = 1000000000;
for (long i = 0; i < 33 * Billion; i++) // 230 seconds
;
outbound.Message = String.Format("The server processed inbound object.");
return outbound;
}
This sometimes returns a null object to the HttpClient (not shown). A smaller workload will always succeed. For example 3 billion iterations always succeeds. A bigger number would be nice specifically 240 billion is a requirement.
I think in the year 2020 a reasonable goal in Azure App Service with .NET Core might be to have a parent thread count to 240 billion with the help of 8 child threads so each child counts to 30 billion and the parent divides an 8 M byte inbound object into smaller objects inbound to each child. Each child receives a 1 M byte inbound and returns to the parent a 1 M byte outbound. The parent re-assembles the result into a 8 M byte outbound.
Obviously the elapsed time will be 12.5%, or 1/8, or one-eighth, of the time a single thread implementation would need. The time to cut-up and re-assemble objects is small compared to the computation time. I am assuming the time to transmit the objects is very small compared to the computation time so the 12.5% expectation is roughly accurate.
If I can get 4 or 8 cores that would be good. If I can get threads that give me say 50% of the cycles of a core, then I would need may be 8 or 16 threads. If each thread gives me 33% of the cycles of a core then I would need 12 or 24 threads.
I am considering the BackgroundService class but I am looking for confirmation that this is the correct approach. Microsoft says...
BackgroundService is a base class for implementing a long running IHostedService.
Obviously if something is long running it would be better to make it finish sooner by using multiple cores via System.Threading but this documentation seems to mention System.Threading only in the context of starting tasks via System.Threading.Timer. My example code shows there is no timer needed in my application. An HTTP POST will serve as the occasion to do work. Typically I would use System.Threading.Thread to instantiate multiple objects to use multiple cores. I find the absence of any mention of multiple cores to be a glaring omission in the context of a solution for work that takes a long time but may be there is some reason Azure App Service doesn't deal with this matter. Perhaps I am just not able to find it in tutorials and documentation.
The initiation of the task is the illustrated HTTP POST controller. Suppose the longest job takes 10 minutes. The HTTP client (not shown) sets the timeout limit to 1000 seconds which is much more than 10 minutes (600 seconds) in order for there to be a margin of safety. HttpClient.Timeout is the relevant property. For the moment I am presuming the HTTP timeout is a real limit; rather than some sort of non-binding (fake limit) such that some other constraint results in the user waiting 9 minutes and receiving an error message. A real binding limit is a limit for which I can say "but for this timeout it would have succeeded". If the HTTP timeout is not the real binding limit and there is something else constraining the system, I can adjust my HTTP controller to instead have three (3) POST methods. Thus POST1 would mean start a task with the inbound object. POST2 means tell me if it is finished. POST3 means give me the outbound object.
What is an appropriate solution for long running computations in Azure App Service and .NET Core 3.1 in an application that has no need for a database and no IO to anything outside of this application ? It is a computation task.
Prologue
A few years ago a ran in to a pretty similar problem. We needed a service that could process large amounts of data. Sometimes the processing would take 10 seconds, other times it could take an hour.
At first we did it how your question illustrates: Send a request to the service, the service processes the data from the request and returns the response when finished.
Issues At Hand
This was fine when the job only took around a minute or less, but anything above this, the server would shut down the session and the caller would report an error.
Servers have a default of around 2 minutes to produce a response before it gives up on the request. It doesn't quit the processing of the request... but it does quit the HTTP session. It doesn't matter what parameters you set on your HttpClient, the server is the one that delegates how long is too long.
Reasons For Issues
All this is for good reasons. Server sockets are extremely expensive. You have a finite amount to go around. The server is trying to protect your service by severing requests that are taking longer than a specified time in order to avoid socket starvation issues.
Typically you want your HTTP requests to take only a few milliseconds. If they are taking longer than this, you will eventually run in to socket issues if your service has to fulfil other requests at a high rate.
Solution
We decided to go the route of IHostedService, specifically the BackgroundService. We use this service in conjunction with a Queue. This way you can set up a queue of jobs and the BackgroundService will process them one at a time (in some instances we have service processing multiple queue items at once, in others we scaled horizontally producing two or more queues).
Why an ASP.NET Core service running a BackgroundService? I wanted to handle this without tightly-coupling to any Azure-specific constructs in case we needed to move out of Azure to some other cloud service (back in the day we were contemplating this for other reasons we had at the time.)
This has worked out quite well for us and we haven't seen any issues since.
The work flow goes like this:
Caller sends a request to the service with some parameters
Service generates a "job" object and returns an ID immediately via 202 (accepted) response
Service places this job in to a queue that is being maintained by a BackgroundService
Caller can query the job status and get information about how much has been done and how much is left to go using this job ID
Service finishes the job, puts the job in to a "completed" state and goes back to waiting on the queue to produce more jobs
Keep in mind your service has the capability to scale horizontally where there would be more than one instance running. In this case I am using Redis Cache to store the state of the jobs so that all instances share the same state.
I also added in a "Memory Cache" option to test things locally if you don't have a Redis Cache available. You could run the "Memory Cache" service on a server, just know that if it scales then your data will be inconsistent.
Example
Since I'm married with kids, I really don't do much on Friday nights after everyone goes to bed, so I spent some time putting together an example that you can try out. The full solution is also available for you to try out.
QueuedBackgroundService.cs
This class implementation serves two specific purposes: One is to read from the queue (the BackgroundService implementation), the other is to write to the queue (the IQueuedBackgroundService implementation).
public interface IQueuedBackgroundService
{
Task<JobCreatedModel> PostWorkItemAsync(JobParametersModel jobParameters);
}
public sealed class QueuedBackgroundService : BackgroundService, IQueuedBackgroundService
{
private sealed class JobQueueItem
{
public string JobId { get; set; }
public JobParametersModel JobParameters { get; set; }
}
private readonly IComputationWorkService _workService;
private readonly IComputationJobStatusService _jobStatusService;
// Shared between BackgroundService and IQueuedBackgroundService.
// The queueing mechanism could be moved out to a singleton service. I am doing
// it this way for simplicity's sake.
private static readonly ConcurrentQueue<JobQueueItem> _queue =
new ConcurrentQueue<JobQueueItem>();
private static readonly SemaphoreSlim _signal = new SemaphoreSlim(0);
public QueuedBackgroundService(IComputationWorkService workService,
IComputationJobStatusService jobStatusService)
{
_workService = workService;
_jobStatusService = jobStatusService;
}
/// <summary>
/// Transient method via IQueuedBackgroundService
/// </summary>
public async Task<JobCreatedModel> PostWorkItemAsync(JobParametersModel jobParameters)
{
var jobId = await _jobStatusService.CreateJobAsync(jobParameters).ConfigureAwait(false);
_queue.Enqueue(new JobQueueItem { JobId = jobId, JobParameters = jobParameters });
_signal.Release(); // signal for background service to start working on the job
return new JobCreatedModel { JobId = jobId, QueuePosition = _queue.Count };
}
/// <summary>
/// Long running task via BackgroundService
/// </summary>
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while(!stoppingToken.IsCancellationRequested)
{
JobQueueItem jobQueueItem = null;
try
{
// wait for the queue to signal there is something that needs to be done
await _signal.WaitAsync(stoppingToken).ConfigureAwait(false);
// dequeue the item
jobQueueItem = _queue.TryDequeue(out var workItem) ? workItem : null;
if(jobQueueItem != null)
{
// put the job in to a "processing" state
await _jobStatusService.UpdateJobStatusAsync(
jobQueueItem.JobId, JobStatus.Processing).ConfigureAwait(false);
// the heavy lifting is done here...
var result = await _workService.DoWorkAsync(
jobQueueItem.JobId, jobQueueItem.JobParameters,
stoppingToken).ConfigureAwait(false);
// store the result of the work and set the status to "finished"
await _jobStatusService.StoreJobResultAsync(
jobQueueItem.JobId, result, JobStatus.Success).ConfigureAwait(false);
}
}
catch(TaskCanceledException)
{
break;
}
catch(Exception ex)
{
try
{
// something went wrong. Put the job in to an errored state and continue on
await _jobStatusService.StoreJobResultAsync(jobQueueItem.JobId, new JobResultModel
{
Exception = new JobExceptionModel(ex)
}, JobStatus.Errored).ConfigureAwait(false);
}
catch(Exception)
{
// TODO: log this
}
}
}
}
}
It is injected as so:
services.AddHostedService<QueuedBackgroundService>();
services.AddTransient<IQueuedBackgroundService, QueuedBackgroundService>();
ComputationController.cs
The controller used to read/write jobs looks like this:
[ApiController, Route("api/[controller]")]
public class ComputationController : ControllerBase
{
private readonly IQueuedBackgroundService _queuedBackgroundService;
private readonly IComputationJobStatusService _computationJobStatusService;
public ComputationController(
IQueuedBackgroundService queuedBackgroundService,
IComputationJobStatusService computationJobStatusService)
{
_queuedBackgroundService = queuedBackgroundService;
_computationJobStatusService = computationJobStatusService;
}
[HttpPost, Route("beginComputation")]
[ProducesResponseType(StatusCodes.Status202Accepted, Type = typeof(JobCreatedModel))]
public async Task<IActionResult> BeginComputation([FromBody] JobParametersModel obj)
{
return Accepted(
await _queuedBackgroundService.PostWorkItemAsync(obj).ConfigureAwait(false));
}
[HttpGet, Route("computationStatus/{jobId}")]
[ProducesResponseType(StatusCodes.Status200OK, Type = typeof(JobModel))]
[ProducesResponseType(StatusCodes.Status404NotFound, Type = typeof(string))]
public async Task<IActionResult> GetComputationResultAsync(string jobId)
{
var job = await _computationJobStatusService.GetJobAsync(jobId).ConfigureAwait(false);
if(job != null)
{
return Ok(job);
}
return NotFound($"Job with ID `{jobId}` not found");
}
[HttpGet, Route("getAllJobs")]
[ProducesResponseType(StatusCodes.Status200OK,
Type = typeof(IReadOnlyDictionary<string, JobModel>))]
public async Task<IActionResult> GetAllJobsAsync()
{
return Ok(await _computationJobStatusService.GetAllJobsAsync().ConfigureAwait(false));
}
[HttpDelete, Route("clearAllJobs")]
[ProducesResponseType(StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status401Unauthorized)]
public async Task<IActionResult> ClearAllJobsAsync([FromQuery] string permission)
{
if(permission == "this is flakey security so this can be run as a public demo")
{
await _computationJobStatusService.ClearAllJobsAsync().ConfigureAwait(false);
return Ok();
}
return Unauthorized();
}
}
Working Example
For as long as this question is active, I will maintain a working example you can try out. For this specific example, you can specify how many iterations you would like to run. To simulate long-running work, each iteration is 1 second. So, if you set the iteration value to 60, it will run that job for 60 seconds.
While it's running, run the computationStatus/{jobId} or getAllJobs endpoint. You can watch all the jobs update in real time.
This example is far from a fully-functioning-covering-all-edge-cases-full-blown-ready-for-production example, but it's a good start.
Conclusion
After a few years of working in the back-end, I have seen a lot of issues arise by not knowing all the "rules" of the back-end. Hopefully this answer will shed some light on issues I had in the past and hopefully this saves you from having to deal with said problems.
One option could be to try out Azure Durable Functions, which are more oriented to long-running jobs that warrant checkpoints and state as against attempting to finish within the context of the triggering request. It also has the concept of fan-out/fan-in, in case what you're describing could be divided into smaller jobs with an aggregated result.
If just raw compute is the goal, Azure Batch might be a better option since it facilitates that scaling.
I assume the actual work that needs be done is something other than iterating over a loop doing nothing, so in terms of possible parallelization I can't offer much help right now. Is the work CPU intensive or IO related?
When it comes to long running work in an Azure App Service, one of the option is to use a Web Job. A possible solution would be to post the request for computation to a queue (Storage Queue or Azure Message Bus Queues). The webjob then processes those messages and possibly puts a new message on another queue that the requester can use to handle the results.
If the time needed for processing is guaranteed to be less than 10 minutes you could replace the Web Job with an Queue Triggered Azure Function. It is a serverless offering on Azure with great scaling possibilities.
Another option is indeed using a Service Worker or an instance of an IHostingService and do some queue processing there.
Since you're saying that your computation succeeds at fewer iterations, a simple solution is to simply save your results periodically and resume the computation.
For example, say you need to perform 240 Billion iterations, and you know that the highest number of iterations to perform reliably is 3 Billion iterations, I would set up the following:
A slave, that actually performs the task (240Billion iterations)
A master that periodically received input from the slave about progress.
The slave can periodically send a message to the master (say once every 2billion iterations ?). This message could contain whatever is relevant to resume the computation should the computation be interrupted.
The master should keep track of the slave. If the master determines that the slave has died / crashed / whatever, the master should simply create a new slave which should resume computation from the last reported position.
How exactly you implement the master and slave is a matter of your personal preference.
Rather than have a single loop perform 240 billion iterations, if you can split your computation across nodes, I would try to simultaneously compute the solution in parallel across as many nodes as possible.
I personally use node.js for multicore projects. Although you are using asp.net, I include this example of node.js to illustrate the architecture that works for me.
Node.js on multi-core machines
https://dzone.com/articles/multicore-programming-in-nodejs
As Noah Stahl has mentioned in his answer, Azure Durable Functions and Azure Batch seem like options to help you achieve your goal on your platform. Please see his answer for more details.
The standard answer is to use asynchronous messaging. I have a blog series on the topic. This is particularly the case since you're already in Azure.
You already have an Azure web app service, but now you want to run code outside of a request - "request-extrinsic code". The proper way to run that code is in a separate process - Azure Functions or Azure WebJobs are a good match for Azure webapps.
First, you want a durable queue. Azure Storage Queues are a good fit since you're in Azure anyway. Then your webapi can just write a message into the queue and return. The important part here is that this is a durable queue, not an in-memory queue.
Meanwhile, the Azure Function / WebJob is processing that queue. It will pick up the work from the queue and execute it.
The final piece of the puzzle is the completion notification. This is a pretty common approach:
I can adjust my HTTP controller to instead have three (3) POST methods. Thus POST1 would mean start a task with the inbound object. POST2 means tell me if it is finished. POST3 means give me the outbound object.
To do this, your background processor should save the "in-progress" / "complete/result" state somewhere where the webapi process can access it. If you already have a shared database (and it makes sense to keep results), then this may be the easiest choice. I would also consider using Azure Cosmos DB, which has a nice time-to-live setting so the background service can inject the results that are "good for 24 hours" or whatever, after which they're automatically cleaned up.

Azure functions with Javascript runtime slow to read httptrigger requests

I am sending data (4MB) as gipped request to azure functions with javascript runtime and HttpTrigger. In the function, I decompress the data and process it. It takes 6-7 Seconds to run the code in function but the round trip of request takes almost 60 Seconds. I understand that it takes some time to upload the request but I didn't expect such huge delay. How can I debug where the time is going?
It is not cold start issue as request takes 60 Seconds consistently.
As far as I'm aware Azure Functions doesn't yet support attaching the App Insights Profiler, however you could add your own telemetry.
This won't necessarily help if the time is being spent inside the Azure functions runtime, but it could help alleviate if the bottleneck is during de-compression / processing:
https://learn.microsoft.com/en-us/azure/application-insights/application-insights-custom-operations-tracking?toc=/azure/azure-monitor/toc.json#outgoing-dependencies-tracking
The general approach for custom dependency tracking is to:
Call the TelemetryClient.StartOperation (extension) method that fills the DependencyTelemetry properties that are needed for correlation and some other properties (start time stamp, duration).
Set other custom properties on the DependencyTelemetry, such as the name and any other context you need.
Make a dependency call and wait for it.
Stop the operation with StopOperation when it's finished.
Handle exceptions.
Example from the docs:
public async Task RunMyTaskAsync()
{
using (var operation = telemetryClient.StartOperation<DependencyTelemetry>("task 1"))
{
try
{
var myTask = await StartMyTaskAsync();
// Update status code and success as appropriate.
}
catch(...)
{
// Update status code and success as appropriate.
}
}
}

Why can't EventData.GetBytes() be called before sending?

I'm working with Azure Event Hubs and initially when sending data to try and calculate batch size I had code similar to the below that would call EventData.GetBytes
EventHubClient client;//initialized before the relevant code
EventData curr = new EventData(data);
//Setting a partition key, and other operations.
long itemLength = curr.GetBytes().LongLength;
client.SendAsync(curr);
Unfortunately I would receive an exception in the SDK code.
The message body cannot be read multiple times. To reuse it store the value after reading.
While removing the ultimately unnecessary call to GetBytes meant that I could send messages, the rationale for this exception to occur is rather puzzling. Calling GetBytes() twice in a row is an easy way to reproduce the same exception, but a single call will mean that the EventData cannot be sent successfully.
It seems likely that underneath a Message is used and this is set to throw an exception if called more than once as Message.GetBody documents; however, there is no documentation to this effect in EventData's methods GetBodyStream, GetBody w/serializer, GetBody, or GetBytes.
I imagine this should either be documented, or corrected since currently it is an unpleasant surprise in a separate thread.
Have you tried using EventData.SerializedSizeInBytes to get the size? that is a much more accurate way to get the size for batching calculation.

Trying to batch AddMessage to an Azure Queue

I've got about 50K messages I wish to add to an azure queue.
I'm not sure if the code I have is safe. It feels/smells bad.
Basically, give a collection of POCO's, serialize the POCO to some json, then add that json text to the queue.
public void AddMessage(T content)
{
content.ShouldNotBe(null);
var json = JsonConvert.SerializeObject(content);
var message = new CloudQueueMessage(json);
Queue.AddMessage(message);
}
public void AddMessages(ICollection<T> contents)
{
contents.ShouldNotBe(null);
Parallel.ForEach(contents, AddMessage);
}
Can someone tell me what I should be doing to fix this up -- and most importantly, why?
I feel that the Queue might not be thread safe, in this scenario.
A few things I have observed regarding Parallel.ForEach and dealing with Azure Storage (my experience has been with uploading blobs/blocks in parallel):
Azure storage operations are Network (IO) based operations and not processor intensive operations. If I am not mistaken, Parallel.ForEach is more suitable for processor intensive applications.
Another thing we noticed with uploading a large number of blobs (or blocks) using Parallel.ForEach is that we started to get a lot of Timeout exceptions and actually slowed down the entire operation. I believe the reason for this is when you iterate over a collection with large number of items using this approach, you're essentially handling the control to underlying framework which decides how to deal with that collection. In this case, a lot of Context Switching will take place which slows down the operation. Not sure how this would work in your scenario considering the payload is smaller.
My recommendation would be have the application control the number of parallel threads it can spawn. A good criteria would be the number of logical processor. Another good criteria would be the number of ports IE can open. So you would spawn that many number of parallel threads. Then you could either wait for all threads to finish to spawn next set of parallel threads or start a new thread as soon as one task finishes.
Pseudo Code:
ICollection<string> messageContents;
private void AddMessages()
{
int maxParallelThreads = Math.Min(Environment.ProcessorCount, messageContents.Count);
if (maxParallelThreads > 0)
{
var itemsToAdd = messageContents.Take(maxParallelThreads);
List<Task> tasks = new List<Task>();
for (var i = 0; i < maxParallelThreads; i++)
{
tasks.Add(Task.Factory.StartNew(() =>
{
AddMessage(itemsToAdd[i]);
RemoveItemFromCollection();
}));
}
Task.WaitAll(tasks.ToArray());
AddMessages();
}
}
Your code looks fine to me at a high level. Gaurav's additions make sense, so you have more controls over the parallel processing of your requests. Make sure you add some form of retry logic, and perhaps setting the DefaultConnectionLimit to something greater than its default value (which is 2). You may also consider adding multiple Azure Queues across multiple storage accounts if you hit a form of throttling, depending on the type of errors you are getting.
For anyone looking to add a large number of non-POCO/string messages in bulk/batch to a queue, an alternate/better solution will be to add the list of messages as a single message or blob, and then in a queue/blob trigger traverse & add each message to a [seperate] queue.
var maxDegreeOfParallelism = Math.Min(Environment.ProcessorCount,cloudQueueMessageCollection.Count());
var parallelOptions=new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
Parallel.ForEach(cloudQueueMessageCollection, parallelOptions,
async (m) => await AddMessageAsync(queue, connectionStringOrKey, m));

Multi requests to Youtube API V2 using Groovy

I have a list of youtube videos from different playlists and I need to check if these videos are still valid (they are around 1000). What I am doing at the moment it is hitting Youtube using its API v2 and Groovy with this simple script:
import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.Method.GET
http = new HTTPBuilder('http://gdata.youtube.com')
myVideoIds.each { id ->
if (!isValidYoutubeUrl(id)) {
// do stuff
}
}
boolean isValidYoutubeUrl (id) {
boolean valid = true
http.request(GET) {
uri.path = "feeds/api/videos/${id}"
headers.'User-Agent' = 'Mozilla/5.0 Ubuntu/8.10 Firefox/3.0.4'
response.failure = { resp ->
valid = false
}
}
valid
}
but after a few seconds it starts to return 403 for any single id (it may be due to the fact it is running too many requests closely). The problem is reduced if I insert something like Thread.sleep(3000). Is there a better solution than just delaying the requests?
In V2 of the API, there are time-based limits on how many requests you can make, but they aren't a hard and fast limit (that is, it depends somewhat on many under-the-hood factors and may not always be the same limit). Here's what the documentation says:
The YouTube API enforces quotas to prevent problems associated with
irregular API usage. Specifically, a too_many_recent_calls error
indicates that the API servers have received too many calls from the
same caller in a short amount of time. If you receive this type of
error, then we recommend that you wait a few minutes and then try your
request again.
You can avoid this by putting in a sleep like you do, but you'd want it to be 10-15 seconds or so.
It's more important, though, to implement batch processing. With this, you can make up to 50 requests at once (this counts as 50 requests against your overall request per day quota, but only as one against your per time quota). Batch processing with v2 of the API is a little involved, as you make a POST request to a batch endpoint first, and then based on those results you can send in the multiple requests. Here's the documentation:
https://developers.google.com/youtube/2.0/developers_guide_protocol?hl=en#Batch_processing
If you use v3 of the API, batch processing becomes quite a bit easier, as you just send 50 IDs at a time in the request. Change:
http = new HTTPBuilder('http://gdata.youtube.com')
to:
http = new HTTPBuilder('https://www.googleapis.com')
Then set your uri.path to
youtube/v3/videos?part=id&max_results=key={your API key}&id={variable here that represents 50 YouTube IDs, comma separated}
For 1000 videos, then, you'll only need to make 20 calls. Any video that doesn't come back in the list doesn't exist anymore (if you need to get video details, change the part parameter to be id,snippet,contentDetails or something appropriate for your needs.
Here's the documentation:
https://developers.google.com/youtube/v3/docs/videos/list#id

Resources