We have a simple ETL process to extract data from an API to a Document DB which we would like to implement using functions. In brief, the process is to take a ~16,500 line file, extract an ID from each line (Function 1), build a URL for each ID (Function 2), hit an API using the URL (Function 3), store the response in a document DB (Function 4). We are using queues for inter-function communication and are seeing problems with timeouts in the first function while doing this.
Function 1 (index.js)
module.exports = function (context, odsDataFile) {
context.log('JavaScript blob trigger function processed blob \n Name:', context.bindingData.odaDataFile, '\n Blob Size:', odsDataFile.length, 'Bytes');
const odsCodes = [];
odsDataFile.split('\n').map((line) => {
const columns = line.split(',');
if (columns[12] === 'A') {
odsCodes.push({
'odsCode': columns[0],
'orgType': 'pharmacy',
});
}
});
context.bindings.odsCodes = odsCodes;
context.log(`A total of: ${odsCodes.length} ods codes have been sent to the queue.`);
context.done();
};
function.json
{
"bindings": [
{
"type": "blobTrigger",
"name": "odaDataFile",
"path": "input-ods-data",
"connection": "connecting-to-services_STORAGE",
"direction": "in"
},
{
"type": "queue",
"name": "odsCodes",
"queueName": "ods-org-codes",
"connection": "connecting-to-services_STORAGE",
"direction": "out"
}
],
"disabled": false
}
Full code here
This function works fine when the number of ID's is in the 100's but times out when it is in the 10's of 1000's. The building of the ID array happens in milliseconds and the function completes but the adding of the items to the queue seems to take many minutes and eventually causes a timeout at the default of 5 mins.
I am surprised that the simple act of populating the queue seems to take such a long time and that the timeout for a function seems to include the time for tasks external to the function (i.e. queue population). Is this to be expected? Are there more performant ways of doing this?
We are running under the Consumption (Dynamic) Plan.
I did some testing of this from my local machine and found that it takes ~200ms to insert a message into the queue, which is expected. So if you have 17k messages to insert and are doing it sequentially, the time will take:
17,000 messages * 200ms = 3,400,000ms or ~56 minutes
The latency may be a bit quicker when running from the cloud, but you can see how this would jump over 5 minutes pretty quickly when you are inserting that many messages.
If message ordering isn't crucial, you could insert the messages in parallel. Some caveats, though:
You can't do this with node -- it'd have to be C#. Node doesn't expose the IAsyncCollector interface to you so it does it all behind-the-scenes.
You can't insert everything in parallel because the Consumption plan has a limit of 250 network connections at a time.
Here's an example of batching up the inserts 200 at a time -- with 17k messages, this took under a minute in my quick test.
public static async Task Run(string myBlob, IAsyncCollector<string> odsCodes, TraceWriter log)
{
List<Task> tasks = new List<Task>();
string[] lines = myBlob.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
int skip = 0;
int take = 200;
IEnumerable<string> batch = lines.Skip(skip).Take(take);
while (batch.Count() > 0)
{
await AddBatch(batch, odsCodes);
skip += take;
batch = lines.Skip(skip).Take(take);
}
}
public static async Task AddBatch(IEnumerable<string> lines, IAsyncCollector<string> odsCodes)
{
List<Task> tasks = new List<Task>();
foreach (string line in lines)
{
tasks.Add(odsCodes.AddAsync(line));
}
await Task.WhenAll(tasks);
}
As other answers have pointed out, because Azure Queues does not have a batch API you should consider an alternative such as Service Bus Queues. But if you are sticking with Azure Queues you need to avoid outputting the queue items sequentially, i.e. some form of constrained parallelism is necessary. One way to achieve this is to use the TPL Dataflow library.
One advantage Dataflow has to using batches of tasks and doing a WhenAll(..) is that you will never have a scenario where a batch is almost done and you are waiting for one slow execution to complete before starting the next batch.
I compared inserting 10,000 items with task batches of size 32 and dataflow with parallelism set to 32. The batch approach completed in 60 seconds, while dataflow completed in almost half that (32 seconds).
The code would look something like this:
using System.Threading.Tasks.Dataflow;
...
var addMessageBlock = new ActionBlock<string>(async message =>
{
await odscodes.AddAsync(message);
}, new ExecutionDataflowBlockOptions { SingleProducerConstrained = true, MaxDegreeOfParallelism = 32});
var bufferBlock = new BufferBlock<string>();
bufferBlock.LinkTo(addMessageBlock, new DataflowLinkOptions { PropagateCompletion = true });
foreach(string line in lines)
bufferBlock.Post(line);
bufferBlock.Complete();
await addMessageBlock.Completion;
Related
Basically, each of the clients ---that have a clientId associated with them--- can push messages and it is important that a second message from the same client isn't processed until the first one is finished processing (Even though the client can send multiple messages in a row, and they are ordered, and multiple clients sending messages should ideally not interfere with each other). And, importantly, a job shouldn't be processed twice.
I thought that using Redis I might be able to fix this issue, I started with some quick prototyping using the bull library, but I am clearly not doing it well, I was hoping someone would know how to proceed.
This is what I tried so far:
Create jobs and add them to the same queue name for one process, using the clientId as the job name.
Consume jobs while waiting large random amounts of random time on 2 separate process.
I tried adding the default locking provided by the library that I am using (bull) but it locks on the jobId, which is unique for each job, not on the clientId .
What I would want to happen:
One of the consumers can't take the job from the same clientId until the previous one is finished processing it.
They should be able to, however, get items from different clientIds in parallel without problem (asynchronously). (I haven't gotten this far, I am right now simply dealing with only one clientId)
What I get:
Both consumers consume as many items as they can from the queue without waiting for the previous item for the clientId to be completed.
Is Redis even the right tool for this job?
Example code
// ./setup.ts
import Queue from 'bull';
import * as uuid from 'uuid';
// Check that when a message is taken from a place, no other message is taken
// TO do that test, have two processes that process messages and one that sets messages, and make the job take a long time
// queue for each room https://stackoverflow.com/questions/54178462/how-does-redis-pubsub-subscribe-mechanism-works/54243792#54243792
// https://groups.google.com/forum/#!topic/redis-db/R09u__3Jzfk
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
export async function sleep(ms: number): Promise<void> {
return new Promise((resolve) => {
setTimeout(resolve, ms);
});
}
export interface JobData {
id: string;
v: number;
}
export const queue = new Queue<JobData>('messages', 'redis://127.0.0.1:6379');
queue.on('error', (err) => {
console.error('Uncaught error on queue.', err);
process.exit(1);
});
export function clientId(): string {
return uuid.v4();
}
export function randomWait(minms: number, maxms: number): Promise<void> {
const ms = Math.random() * (maxms - minms) + minms;
return sleep(ms);
}
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
// eslint-disable-next-line #typescript-eslint/ban-ts-comment
//#ts-ignore
queue.LOCK_RENEW_TIME = 5 * 60 * 1000;
// ./create.ts
import { queue, randomWait } from './setup';
const MIN_WAIT = 300;
const MAX_WAIT = 1500;
async function createJobs(n = 10): Promise<void> {
await randomWait(MIN_WAIT, MAX_WAIT);
// always same Id
const clientId = Math.random() > 1 ? 'zero' : 'one';
for (let index = 0; index < n; index++) {
await randomWait(MIN_WAIT, MAX_WAIT);
const job = { id: clientId, v: index };
await queue.add(clientId, job).catch(console.error);
console.log('Added job', job);
}
}
export async function create(nIds = 10, nItems = 10): Promise<void> {
const jobs = [];
await randomWait(MIN_WAIT, MAX_WAIT);
for (let index = 0; index < nIds; index++) {
await randomWait(MIN_WAIT, MAX_WAIT);
jobs.push(createJobs(nItems));
await randomWait(MIN_WAIT, MAX_WAIT);
}
await randomWait(MIN_WAIT, MAX_WAIT);
await Promise.all(jobs)
process.exit();
}
(function mainCreate(): void {
create().catch((err) => {
console.error(err);
process.exit(1);
});
})();
// ./consume.ts
import { queue, randomWait, clientId } from './setup';
function startProcessor(minWait = 5000, maxWait = 10000): void {
queue
.process('*', 100, async (job) => {
console.log('LOCKING: ', job.lockKey());
await job.takeLock();
const name = job.name;
const processingId = clientId().split('-', 1)[0];
try {
console.log('START: ', processingId, '\tjobName:', name);
await randomWait(minWait, maxWait);
const data = job.data;
console.log('PROCESSING: ', processingId, '\tjobName:', name, '\tdata:', data);
await randomWait(minWait, maxWait);
console.log('PROCESSED: ', processingId, '\tjobName:', name, '\tdata:', data);
await randomWait(minWait, maxWait);
console.log('FINISHED: ', processingId, '\tjobName:', name, '\tdata:', data);
} catch (err) {
console.error(err);
} finally {
await job.releaseLock();
}
})
.catch(console.error); // Catches initialization
}
startProcessor();
This is run using 3 different processes, which you might call like this (Although I use different tabs for a clearer view of what is happening)
npx ts-node consume.ts &
npx ts-node consume.ts &
npx ts-node create.ts &
I'm not familir with node.js. But for Redis, I would try this,
Let's say you have client_1, client_2, they are all publisher of events.
You have three machines, consumer_1,consumer_2, consumer_3.
Establish a list of tasks in redis, eg, JOB_LIST.
Clients put(LPUSH) jobs into this JOB_LIST, in a specific form, like "CLIENT_1:[jobcontent]", "CLIENT_2:[jobcontent]"
Each consumer takes out jobs blockingly (RPOP command of Redis) and process them.
For example, consumer_1 takes out a job, content is CLIENT_1:[jobcontent]. It parses the content and recognize it's from CLIENT_1. Then it wants to check if some other consumer is processing CLIENT_1 already, if not, it will lock the key to indicate that it's processing CLIENT_1.
It goes on to set a key of "CLIENT_1_PROCESSING" , with content as "consumer_1", using the Redis SETNX command (set if the key not exists), with an appropriate timeout. For example, the task norally takes one minute to finish, you set a timeout of the key of five minutes, just in case consumer_1 crashes and holds on the lock indefinitely.
If the SETNX returns 0, it means it fails to acquire the lock of CLIENT_1 (someone is already processing a job of client_1). Then it returns the job (a value of "CLIENT_1:[jobcontent]")to the left side of JOB_LIST, by using Redis LPUSH command.Then it might wait a bit (sleep a few seconds), and RPOP another task from the right side of the LIST. If this time SETNX returns 1, consumer_1 acquires the lock. It goes on to process job, after it finishes, it deletes the key of "CLIENT_1_PROCESSING", releasing the lock. Then it goes on to RPOP another job, and so on.
Some things to consider:
The JOB_LIST is not fair,eg, earlier jobs might be processed later
The locking part is a bit rudimentary, but will suffice.
----------update--------------
I've figured another way to keep tasks in order.
For each client(producer), build a list. Like "client_1_list", push jobs into the left side of the list.
Save all the client names in a list "client_names_list", with values "client_1", "client_2", etc.
For each consumer(processor), iterate the "client_names_list", for example, consumer_1 get a "client_1", check if the key of client_1 is locked(some one is processing a task of client_1 already), if not, right pop a value(job) from client_1_list and lock client_1. If client_1 is locked, (probably sleep one second) and iterate to the next client, "client_2", for example, and check the keys and so on.
This way, each client(task producer)'s task is processed by their order of entering.
EDIT: I found the problem regarding BullJS is starting jobs in parallel on one processor: We are using named jobs and where defining many named process functions on one queue/processor. The default concurrency factor for a queue/processor is 1. So the queue should not process any jobs in parallel.
The problem with our mentioned setup is if you define many (named) process-handlers on one queue the concurrency is added up with each process-handler function: So if you define three named process-handlers you get a concurrency factor of 3 for given queue for all the defined named jobs.
So just define one named job per queue for queues where parallel processing should not happen and all jobs should run sequentially one after the other.
That could be important e.g. when pushing a high number of jobs onto the queue and the processing involves API calls that would give errors if handled in parallel.
The following text is my first approach of answering the op's question and describes just a workaround to the problem. So better just go with my edit :) and configure your queues the right way.
I found an easy solution to operators question.
In fact BullJS is processing many jobs in parallel on one worker instance:
Let's say you have one worker instance up and running and push 10 jobs onto the queue than possibly that worker starts all processes in parallel.
My research on BullJS-queues gave that this is not intended behavior: One worker (also called processor by BullJS) should only start a new job from the queue when its in idle state so not processing a former job.
Nevertheless BullJS keeps starting jobs in parallel on one worker.
In our implementation that lead to big problems during API calls that most likely are caused by t00 many API calls at a time. Tests gave that when only starting one worker the API calls finished just fine and gave status 200.
So how to just process one job after the other once the previous is finished if BullJS does not do that for us (just what the op asked)?
We first experimented with delays and other BullJS options but thats kind of workaround and not the exact solution to the problem we are looking for. At least we did not get it working to stop BullJS from processing more than one job at a time.
So we did it ourself and started one job after the other.
The solution was rather simple for our use case after looking into BullJS API reference (BullJS API Ref).
We just used a for-loop to start the jobs one after another. The trick was to use BullJS's
job.finished
method to get a Promise.resolve once the job is finished. By using await inside the for-loop the next job gets just started immediately after the job.finished Promise is awaited (resolved). Thats the nice thing with for-loops: Await works in it!
Here a small code example on how to achieve the intended behavior:
for (let i = 0; i < theValues.length; i++) {
jobCounter++
const job = await this.processingQueue.add(
'update-values',
{
value: theValues[i],
},
{
// delay: i * 90000,
// lifo: true,
}
)
this.jobs[job.id] = {
jobType: 'socket',
jobSocketId: BackgroundJobTasks.UPDATE_VALUES,
data: {
value: theValues[i],
},
jobCount: theValues.length,
jobNumber: jobCounter,
cumulatedJobId
}
await job.finished()
.then((val) => {
console.log('job finished:: ', val)
})
}
The important part is really
await job.finished()
inside the for loop. leasingValues.length jobs get started all just one after the other as intended.
That way horizontally scaling jobs across more than one worker is not possible anymore. Nevertheless this workaround is okay for us at the moment.
I will get in contact with optimalbits - the maker of BullJS to clear things out.
I have an EventHub configured in Azure, also a consumer group for reading the data. It was working fine for some days. Suddenly, I see there is a delay in incoming data(around 3 days). I use Windows Service to consume data in my server. I have around 500 incoming messages per minute. Can anyone help me out to figure this out ?
It might be that you are processing them items too slow. Therefore the work to be done grows and you will lag behind.
To get some insight in where you are in the event stream you can use code like this:
private void LogProgressRecord(PartitionContext context)
{
if (namespaceManager == null)
return;
var currentSeqNo = context.Lease.SequenceNumber;
var lastSeqNo = namespaceManager.GetEventHubPartition(context.EventHubPath, context.ConsumerGroupName, context.Lease.PartitionId).EndSequenceNumber;
var delta = lastSeqNo - currentSeqNo;
logWriter.Write(
$"Last processed seqnr for partition {context.Lease.PartitionId}: {currentSeqNo} of {lastSeqNo} in consumergroup '{context.ConsumerGroupName}' (lag: {delta})",
EventLevel.Informational);
}
the namespaceManager is build like this:
namespaceManager = NamespaceManager.CreateFromConnectionString("Endpoint=sb://xxx.servicebus.windows.net/;SharedAccessKeyName=yyy;SharedAccessKey=zzz");
I call this logging method in the CloseAsync method:
public Task CloseAsync(PartitionContext context, CloseReason reason)
{
LogProgressRecord(context);
return Task.CompletedTask;
}
logWriter is just some logging class I have used to write info to blob storage.
It now outputs messages like
Last processed seqnr for partition 3: 32780931 of 32823804 in consumergroup 'telemetry' (lag: 42873)
so when the lag is very high you could be processing events that have occurred a long time ago. In that case you need to scale up/out your processor.
If you notice a lag you should measure how long it takes to process a given number of item. You can then try to optimize performance and see whether it improves. We did it like:
public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> events)
{
try
{
stopwatch.Restart();
// process items here
stopwatch.Stop();
await CheckPointAsync(context);
logWriter.Write(
$"Processed {events.Count()} events in {stopwatch.ElapsedMilliseconds}ms using partition {context.Lease.PartitionId} in consumergroup {context.ConsumerGroupName}.",
EventLevel.Informational);
}
}
It seems like my kafka node consumer:
var kafka = require('kafka-node');
var consumer = new Consumer(client, [], {
...
});
is fetching way too many messages than I can handle in certain cases.
Is there a way to limit it (for example accept no more than 1000 messages per second, possibly using the pause api?)
I'm using kafka-node, which seems to have a limited api comparing to the Java version
In Kafka, poll and process should happen in a coordinated/synchronized way. Ie, after each poll, you should process all received data first, before you do the next poll. This pattern will automatically throttle the number of messages to the max throughput your client can handle.
Something like this (pseudo-code):
while(isRunning) {
messages = poll(...)
for(m : messages) {
process(m);
}
}
(That is the reason, why there is not parameter "fetch.max.messages" -- you just do not need it.)
I had a similar situation where I was consuming messages from Kafka and had to throttle the consumption because my consumer service was dependent on a third party API which had its own constraints.
I used async/queue along with a wrapper of async/cargo called asyncTimedCargo for batching purpose.
The cargo gets all the messages from the kafka-consumer and sends it to queue upon reaching a size limit batch_config.batch_size or timeout batch_config.batch_timeout.
async/queue provides saturated and unsaturated callbacks which you can use to stop the consumption if your queue task workers are busy. This would stop the cargo from filling up and your app would not run out of memory. The consumption would resume upon unsaturation.
//cargo-service.js
module.exports = function(key){
return new asyncTimedCargo(function(tasks, callback) {
var length = tasks.length;
var postBody = [];
for(var i=0;i<length;i++){
var message ={};
var task = JSON.parse(tasks[i].value);
message = task;
postBody.push(message);
}
var postJson = {
"json": {"request":postBody}
};
sms_queue.push(postJson);
callback();
}, batch_config.batch_size, batch_config.batch_timeout)
};
//kafka-consumer.js
cargo = cargo-service()
consumer.on('message', function (message) {
if(message && message.value && utils.isValidJsonString(message.value)) {
var msgObject = JSON.parse(message.value);
cargo.push(message);
}
else {
logger.error('Invalid JSON Message');
}
});
// sms-queue.js
var sms_queue = queue(
retryable({
times: queue_config.num_retries,
errorFilter: function (err) {
logger.info("inside retry");
console.log(err);
if (err) {
return true;
}
else {
return false;
}
}
}, function (task, callback) {
// your worker task for queue
callback()
}), queue_config.queue_worker_threads);
sms_queue.saturated = function() {
consumer.pause();
logger.warn('Queue saturated Consumption paused: ' + sms_queue.running());
};
sms_queue.unsaturated = function() {
consumer.resume();
logger.info('Queue unsaturated Consumption resumed: ' + sms_queue.running());
};
From FAQ in the README
Create a async.queue with message processor and concurrency of one (the message processor itself is wrapped with setImmediate function so it will not freeze up the event loop)
Set the queue.drain to resume() the consumer
The handler for consumer's message event to pause() the consumer and pushes the message to the queue.
As far as I know the API does not have any kind of throttling. But both consumers (Consumer and HighLevelConsumer) have a 'pause()' function. So you could stop consuming if you get to much messages. Maybe that already offers what you need.
Please keep in mind what's happening. You send a fetch request to the broker and get a batch of message back. You can configure the min and max size of the messages (according to the documentation not the number of messages) you want to fetch:
{
....
// This is the minimum number of bytes of messages that must be available to give a response, default 1 byte
fetchMinBytes: 1,
// The maximum bytes to include in the message set for this partition. This helps bound the size of the response.
fetchMaxBytes: 1024 * 1024,
}
I was facing the same issue, initially fetchMaxBytes value was
fetchMaxBytes: 1024 * 1024 * 10 // 10MB
I just chanbed it to
fetchMaxBytes: 1024
It worked very smoothly after the change.
I have an async method that gets api data from a server. When I run this code on my local machine, in a console app, it performs at high speed, pushing through a few hundred http calls in the async function per minute. When I put the same code to be triggered from an Azure WebJob queue message however, it seems to operate synchronously and my numbers crawl - I'm sure I am missing something simple in my approach - any assistance appreciated.
(1) .. WebJob function that listens for a message on queue and kicks off the api get process on message received:
public class Functions
{
// This function will get triggered/executed when a new message is written
// on an Azure Queue called queue.
public static async Task ProcessQueueMessage ([QueueTrigger("myqueue")] string message, TextWriter log)
{
var getAPIData = new GetData();
getAPIData.DoIt(message).Wait();
log.WriteLine("*** done: " + message);
}
}
(2) the class that outside azure works in async mode at speed...
class GetData
{
// wrapper that is called by the message function trigger
public async Task DoIt(string MessageFile)
{
await CallAPI(MessageFile);
}
public async Task<string> CallAPI(string MessageFile)
{
/// create a list of sample APIs to call...
var apiCallList = new List<string>();
apiCallList.Add("localhost/?q=1");
apiCallList.Add("localhost/?q=2");
apiCallList.Add("localhost/?q=3");
apiCallList.Add("localhost/?q=4");
apiCallList.Add("localhost/?q=5");
// setup httpclient
HttpClient client =
new HttpClient() { MaxResponseContentBufferSize = 10000000 };
var timeout = new TimeSpan(0, 5, 0); // 5 min timeout
client.Timeout = timeout;
// create a list of http api get Task...
IEnumerable<Task<string>> allResults = apiCallList.Select(str => ProcessURLPageAsync(str, client));
// wait for them all to complete, then move on...
await Task.WhenAll(allResults);
return allResults.ToString();
}
async Task<string> ProcessURLPageAsync(string APIAddressString, HttpClient client)
{
string page = "";
HttpResponseMessage resX;
try
{
// set the address to call
Uri URL = new Uri(APIAddressString);
// execute the call
resX = await client.GetAsync(URL);
page = await resX.Content.ReadAsStringAsync();
string rslt = page;
// do something with the api response data
}
catch (Exception ex)
{
// log error
}
return page;
}
}
First because your triggered function is async, you should use await rather than .Wait(). Wait will block the current thread.
public static async Task ProcessQueueMessage([QueueTrigger("myqueue")] string message, TextWriter log)
{
var getAPIData = new GetData();
await getAPIData.DoIt(message);
log.WriteLine("*** done: " + message);
}
Anyway you'll be able to find usefull information from the documentation
Parallel execution
If you have multiple functions listening on different queues, the SDK will call them in parallel when messages are received simultaneously.
The same is true when multiple messages are received for a single queue. By default, the SDK gets a batch of 16 queue messages at a time and executes the function that processes them in parallel. The batch size is configurable. When the number being processed gets down to half of the batch size, the SDK gets another batch and starts processing those messages. Therefore the maximum number of concurrent messages being processed per function is one and a half times the batch size. This limit applies separately to each function that has a QueueTrigger attribute.
Here is a sample code to configure the batch size:
var config = new JobHostConfiguration();
config.Queues.BatchSize = 50;
var host = new JobHost(config);
host.RunAndBlock();
However, it is not always a good option to have too many threads running at the same time and could lead to bad performance.
Another option is to scale out your webjob:
Multiple instances
if your web app runs on multiple instances, a continuous WebJob runs on each machine, and each machine will wait for triggers and attempt to run functions. The WebJobs SDK queue trigger automatically prevents a function from processing a queue message multiple times; functions do not have to be written to be idempotent. However, if you want to ensure that only one instance of a function runs even when there are multiple instances of the host web app, you can use the Singleton attribute.
Have a read of this Webjobs SDK documentation - the behaviour you should expect is that your process will run and process one message at a time, but will scale up if more instances are created (of your app service). If you had multiple queues, they will trigger in parallel.
In order to improve the performance, see the configurations settings section in the link I sent you, which refers to the number of messages that can be triggered in a batch.
If you want to process multiple messages in parallel though, and don't want to rely on instance scaling, then you need to use threading instead (async isn't about multi-threaded parallelism, but making more efficient use of the thread you're using). So your queue trigger function should read the message from the queue, the create a thread and "fire and forget" that thread, and then return from the trigger function. This will mark the message as processed, and allow the next message on the queue to be processed, even though in theory you're still processing the earlier one. Note you will need to include your own logic for error handling and ensuring that the data wont get lost if your thread throws an exception or can't process the message (eg. put it on a poison queue).
The other option is to not use the [queuetrigger] attribute, and use the Azure storage queues sdk API functions directly to connect and process the messages per your requirements.
I have a Node.js timerTrigger Azure function that processes a collection and queues the processing results for further processing (by a Node.js queueTrigger function).
The code is something like the following:
module.exports = function (context, myTimer) {
collection.forEach(function (item) {
var items = [];
// do some work and fill 'items'
var toBeQueued = { items: items };
context.bindings.myQueue = toBeQueued;
});
context.done();
};
This code will only queue the last toBeQueued and not each one I'm trying to queue.
Is there any way to queue more than one item?
Update
To be clear, I'm talking about queueing a toBeQueued in each iteration of forEach, not just queueing an array. Yes, there is an issue with Azure Functions because of which I cannot queue an array, but I have a workaround for it; i.e., { items: items }.
Not yet, but we'll address that within the week, stay tuned :) You'll be able to pass an array to the binding as you're trying to do above.
We have an issue tracking this in our public repo here. Thanks for reporting.
Mathewc's answer is the correct one wrt Node.
For C# you can today by specifying ICollector<T> as the type of your output queue parameter.
Below is an example I have of two output queues, one of which I add via a for loop.
public static void Run(Item inbound, DateTimeOffset InsertionTime, ICollector<Item> outbound, ICollector<LogItem> telemetry, TraceWriter log)
{
log.Verbose($"C# Queue trigger function processed: {inbound}");
telemetry.Add(new LogItem(inbound, InsertionTime));
if(inbound.current_generation < inbound.max_generation)
{
for(int i = 0; i < inbound.multiplier; i++) {
outbound.Add(Item.nextGen(inbound));
}
}
}