How to read X messages at a time in a KafkaJS consumer - Node.js

I have a situation where, to achieve better performance, I have to read multiple Kafka messages at a time. I searched the internet and found that KafkaJS has a batch functionality where messages can be read in batches. The problem is that I am not able to configure it to receive only a maximum of X messages at a time.
Code that I found:
await consumer.run({
  eachBatchAutoResolve: true,
  eachBatch: async ({
    batch,
    resolveOffset,
    heartbeat,
    commitOffsetsIfNecessary,
    uncommittedOffsets,
    isRunning,
    isStale,
  }) => {
    for (let message of batch.messages) {
      console.log({
        topic: batch.topic,
        partition: batch.partition,
        highWatermark: batch.highWatermark,
        message: {
          offset: message.offset,
          key: message.key.toString(),
          value: message.value.toString(),
          headers: message.headers,
        }
      })
      resolveOffset(message.offset)
      await heartbeat()
    }
  },
})
Environment: Node
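As far as I know, KafkaJS does not expose a "max N messages per batch" option; the closest built-in knobs are the byte-based limits on the consumer (maxBytes, maxBytesPerPartition, maxWaitTimeInMs). One workable approach is to keep eachBatch but process batch.messages in fixed-size chunks yourself. A minimal sketch, where CHUNK_SIZE and processChunk are hypothetical names for the desired X and your own handler:
const CHUNK_SIZE = 10 // hypothetical: the X messages you want to handle together

const consumer = kafka.consumer({
  groupId: 'my-group',
  maxBytesPerPartition: 1048576, // optional: byte-based cap per partition fetch
})

await consumer.run({
  eachBatchAutoResolve: false, // resolve offsets manually per chunk
  eachBatch: async ({ batch, resolveOffset, heartbeat, isRunning, isStale }) => {
    for (let i = 0; i < batch.messages.length; i += CHUNK_SIZE) {
      if (!isRunning() || isStale()) break
      const chunk = batch.messages.slice(i, i + CHUNK_SIZE)
      await processChunk(chunk) // hypothetical: handle X messages at once
      resolveOffset(chunk[chunk.length - 1].offset) // mark the chunk as processed
      await heartbeat()
    }
  },
})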

Related

RxJS non-blocking polling

I'm working with the Amazon Transcribe service, and the SDK doesn't have any way to notify you when a transcription job has finished.
So we need to poll for that information. Nowadays we have the following working code...
import { nanoid } from "nanoid";

const jobId = nanoid();
await amazonTrascribeClient
  .startTranscriptionJob({
    IdentifyMultipleLanguages: true,
    TranscriptionJobName: jobId,
    Media: {
      MediaFileUri: "s3://file-location",
    },
    Subtitles: {
      OutputStartIndex: 1,
      Formats: ["vtt", "srt"],
    },
    OutputBucketName: `file-location`,
    OutputKey: `transcriptions/${jobId}/`,
  })
  .promise();
// HELP HERE:
const callIntervalFunc = () => {
  const callInverval = setInterval(async () => {
    const { TranscriptionJob } = await amazonTrascribeClient
      .getTranscriptionJob({ TranscriptionJobName: jobId })
      .promise();
    if (
      ["COMPLETED", "FAILED"].includes(TranscriptionJob.TranscriptionJobStatus)
    ) {
      clearInterval(callInverval);
      // Persist in database etc...
    }
  }, 2000);
};
callIntervalFunc();
But as you can see it's extremely expensive and doesn't work in concurrency mode, since it locks the thread. The objective is to poll the information without blocking the event loop; some people said to use fire and forget, but I have no idea where I should start.
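Since the title mentions RxJS, here is a minimal sketch of the same polling loop expressed as an observable, assuming the amazonTrascribeClient and jobId from the snippet above and RxJS 7:
import { interval, from, firstValueFrom } from "rxjs";
import { exhaustMap, takeWhile, last } from "rxjs/operators";

const finishedJob$ = interval(2000).pipe(
  // exhaustMap ignores ticks while a request is still in flight
  exhaustMap(() =>
    from(
      amazonTrascribeClient
        .getTranscriptionJob({ TranscriptionJobName: jobId })
        .promise()
    )
  ),
  // keep polling while the job is running; `true` also emits the terminal result
  takeWhile(
    ({ TranscriptionJob }) =>
      !["COMPLETED", "FAILED"].includes(TranscriptionJob.TranscriptionJobStatus),
    true
  ),
  last()
);

const { TranscriptionJob } = await firstValueFrom(finishedJob$);
// Persist in database etc...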

Discord.js bot random presence and status not showing nor changing when the bot starts

I've been trying to make a random bot presence/status change using Discord.js v13 that changes every 15 minutes. The problem I'm facing with my code is that the custom status and presence don't show when I first start the bot; I have to wait 15 minutes for them to show up and start changing.
Here is the code:
client.on("ready", async () => {
let servers = await client.guilds.cache.size
let servercount = await client.guilds.cache.reduce((a,b) => a+b.memberCount, 0 )
const statusArray = [
{
type: 'WATCHING',
content: `${servers} servers`,
status: 'online'
},
{
type: 'PLAYING',
content: `with my ${servercount} friends`,
status: 'online'
}
];
async function pickPresence() {
const option = Math.floor(Math.random() * statusArray.length);
try {
await client.user.setPresence({
activities: [
{
name: statusArray[option].content,
type: statusArray[option].type,
url: statusArray[option].url
},
],
status: statusArray[option].status
});
} catch (error) {
console.error(error);
}
}
setInterval(pickPresence, 1000*60*15);
});
Any ideas as to why it doesn't work instantly when I start the bot?
setInterval actually waits for the specified delay (15 minutes) before executing the code in the function for the first time. So all you need to do is simply add pickPresence() on the line before the setInterval.
pickPresence();
setInterval(pickPresence, 1000*60*15);

Consumer stops consuming messages from a specific topic after some time of running

Environment Information
docker image based on node:12.13.1-alpine
Node Version : 12.13.1
node-rdkafka version : latest
The code snippet below works fine, but sometimes it stops reading messages from a specific Kafka partition (we have about 20 topics, 5 partitions each, all following the same pattern). We are not getting any errors. After a service restart and rebalance, consuming continues as usual. What tuning should be done to deal with those stuck partitions?
Throughput is low, about 150 messages per minute across all topics; each message is a small JSON payload with some details (~500 KB). We are running 10 pods for this specific service.
import { ConsumerStream, createReadStream } from 'node-rdkafka';

const kafkaConsumer = createConsumerStream(shutdown, config.kafka.topics);

kafkaConsumer.on('data', async (rawMessage) => {
  const {
    topic, partition, offset, value
  } = rawMessage;
  try {
    await processKafkaMessage(rawMessage);
    kafkaConsumer.consumer.commit({
      topic: topic,
      partition: partition,
      offset: offset + 1
    });
  } catch (err) {
    logger.error('Failed to process inbound kafka message');
  }
});

export const createConsumerStream = (shutdown, topics: Array<string>): ConsumerStream => {
  const globalConfig = {
    'metadata.broker.list': ['kafka:9092'],
    'group.id': 'my_group_1',
    'enable.auto.commit': false,
    'partition.assignment.strategy': 'roundrobin',
    'topic.metadata.refresh.interval.ms': 30 * 100,
    'batch.num.messages': 100000,
    'queued.max.messages.kbytes': 10000,
    'fetch.message.max.bytes': 10000,
    'fetch.max.bytes': 524288000,
    'retry.backoff.ms': 200,
    retries: 5
  };
  const topicConfig = { 'auto.offset.reset': 'earliest' };
  const streamOptions = {
    topics: topics,
    waitInterval: batchMaxTime, // defined elsewhere in the service config
    fetchSize: batchMaxSize // defined elsewhere in the service config
  };
  const stream: ConsumerStream = createReadStream(globalConfig, topicConfig, streamOptions);
  stream.on('error', (err) => {
    logger.error('Error in kafka consumer stream', {
      error_msg: err.message,
      error_name: err.name
    });
  });
  stream.consumer.on('event.error', (err) => {
    if (err.stack === 'Error: Local: Broker transport failure') return;
    logger.error('Error in kafka consumer');
    stream.emit('rd-kafka-error', err);
  });
  stream.consumer.on('rebalance', ({ message }, assignment) => {
    logger.info('Rebalance event', { assigned_topics: assignment });
  });
  return stream;
};
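Not a tuning recommendation as such, but one way to see which partitions are lagging or stuck is to enable librdkafka statistics ('statistics.interval.ms' in globalConfig) and log per-partition consumer lag; a rough sketch against the same stream as above:
// add to globalConfig: 'statistics.interval.ms': 30000
stream.consumer.on('event.stats', (eventData) => {
  const stats = JSON.parse(eventData.message);
  for (const [topicName, topicStats] of Object.entries(stats.topics || {})) {
    for (const [partitionId, partitionStats] of Object.entries(topicStats.partitions || {})) {
      // consumer_lag of -1 means unknown; a steadily growing lag points at a stuck partition
      logger.info('Partition lag', {
        topic: topicName,
        partition: partitionId,
        consumer_lag: partitionStats.consumer_lag
      });
    }
  }
});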

My API needs time to process a request; how can I use React + SWR to continue checking on the status?

I have an endpoint in my API that initiates a process on AWS. This process takes time. It can last several seconds or minutes depending on the size of the request. As of right now, I'm rebuilding my app to use swr. However, before this new update with swr I created a recursive function that would call itself with a timeout and continuously ping the API to request the status of my AWS process, only exiting once the response had the appropriate type.
I'd like to dump that recursive function because, well ... it was kinda hacky. Though, I'm still getting familiar with swr and I'm not a NodeJS API building master so I'm curious what thoughts come to mind in regards to improving the pattern below.
Ideally, the lowest-hanging fruit would be to set up swr in some way to handle the incoming response and keep pinging if the response isn't type: "complete", but I'm not sure how I'd do that. It pretty much just pings once and shows me whatever status it found at that time.
any help is appreciated!
tldr;
how can I set up swr to continually ping the API until my content is finished loading?
Part of my API that sends out responses based on how far along the AWS process is:
if (serviceResponse !== undefined) {
  // * task is not complete
  const { jobStatus: serviceJobStatus } = serviceResponse.serviceJob;
  if (serviceJobStatus.toLowerCase() === 'in_progress') {
    return res.status(200).send({ type: 'loading', message: serviceJobStatus });
  }
  if (serviceJobStatus.toLowerCase() === 'queued') {
    return res.status(200).send({ type: 'loading', message: serviceJobStatus });
  }
  if (serviceJobStatus.toLowerCase() === 'failed') {
    return res.status(400).send({ type: 'failed', message: serviceJobStatus });
  }
  // * task is complete
  if (serviceJobStatus.toLowerCase() === 'completed') {
    const { serviceFileUri } = serviceResponse.serviceJob?.Data;
    const { data } = await axios.get(serviceFileUri as string);
    const formattedData = serviceDataParser(data.results);
    return res.status(200).send({ type: 'complete', message: formattedData });
  }
} else {
  return res.status(400).send({ type: 'error', message: serviceResponse });
}
}
my current useSWR hook:
const { data: rawServiceData } = useSwr(
  serviceEndpoint,
  url => axios.get(url).then(r => r.data),
  {
    onSuccess: data => {
      if (data.type === 'complete') {
        dispatch(
          setStatus({
            type: 'success',
            data: data.message,
            message: 'service has been successfully generated.',
            display: 'support-both',
          })
        );
        dispatch(setRawService(data.message));
      }
      if (data.type === 'loading') {
        dispatch(
          setStatus({
            type: 'success',
            data: data.message,
            message: 'service job is in progress.',
            display: 'support-both',
          })
        );
      }
    },
  }
);
After some digging around, I figured I'd use the refreshInterval option that comes with swr. I am toggling a boolean in state on my component:
while the request is 'loading', the boolean in state is false;
once the job is 'complete', the boolean in state is set to true;
a ternary within my hook sets the refreshInterval to 0 (default: off) or 3000.
const [serviceJobComplete, setServiceJobComplete] = useState(false);

const { data: serviceData } = useSwr(
  serviceEndpoint,
  url => axios.get(url).then(r => r.data),
  {
    revalidateIfStale: false,
    revalidateOnFocus: false,
    revalidateOnReconnect: false,
    refreshInterval: serviceJobComplete ? 0 : 3000,
    ...
    // other options
  }
);
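For completeness, here is a minimal sketch of how that boolean could be flipped from onSuccess, assuming the same type field the API above returns:
const [serviceJobComplete, setServiceJobComplete] = useState(false);

const { data: serviceData } = useSwr(
  serviceEndpoint,
  url => axios.get(url).then(r => r.data),
  {
    refreshInterval: serviceJobComplete ? 0 : 3000,
    onSuccess: data => {
      // stop polling once the API reports a terminal state
      if (data.type === 'complete' || data.type === 'failed') {
        setServiceJobComplete(true);
      }
    },
  }
);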
helpful resources:
https://github.com/vercel/swr/issues/182
https://swr.vercel.app/docs/options

How to test sending a Kafka message with Jest?

import { CompressionTypes } from 'kafkajs';

// `producer` is assumed to be a KafkaJS producer created elsewhere via kafka.producer()
export const produce = async (topic: string, url: string): Promise<void> => {
  await producer.connect();
  await producer.send({
    topic,
    compression: CompressionTypes.GZIP,
    messages: [{ key: `key-${Math.random() * 100000}`, value: url }],
  });
  await producer.disconnect();
};
I have a Kafka send-message helper like this. How can I test this function with Jest? The problem is that to run these tests we would have to run the Kafka and ZooKeeper services. What could I do?
I tried something like:
describe("Helper kafka", () => {
describe("produce", () => {
it("Pass 1 url should return 1", () => {
expect(produce("testing", "URL")).toHaveBeenCalled();
});
});
});
It fails because the service is not running.
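One option is to not start a broker at all and mock the producer module with Jest. A sketch, assuming produce lives in './produce' and the producer instance is exported from a local './kafka' module (both paths are hypothetical):
import { produce } from "./produce";
import { producer } from "./kafka";

// Replace the real KafkaJS producer with Jest mocks so no broker is needed
jest.mock("./kafka", () => ({
  producer: {
    connect: jest.fn().mockResolvedValue(undefined),
    send: jest.fn().mockResolvedValue(undefined),
    disconnect: jest.fn().mockResolvedValue(undefined),
  },
}));

describe("Helper kafka", () => {
  describe("produce", () => {
    it("connects, sends the url and disconnects", async () => {
      await produce("testing", "URL");

      expect(producer.connect).toHaveBeenCalled();
      expect(producer.send).toHaveBeenCalledWith(
        expect.objectContaining({
          topic: "testing",
          messages: [expect.objectContaining({ value: "URL" })],
        })
      );
      expect(producer.disconnect).toHaveBeenCalled();
    });
  });
});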
