Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I want to know the purpose of the hostname in EventProcessorHost and how to set the partition on the consumer side. Right now I am able to get data from the consumer group, but all partitions go to the output stream.
Questions:
1. How do I set the partition in Java code?
2. What is the hostname in EventProcessorHost used for?
3. Is there an example in Java of multiple consumers, each with its own partition?
I would highly appreciate any help.
There is a complete Java example; see the docs.
You don't need to set a partition when you use EventProcessorHost. Instead, each instance leases the partitions it will work on. A single instance therefore ends up holding the leases for all partitions, which is why all of them show up in your output. So if you created the event hub with, say, 4 partitions, you should run 4 instances of EventProcessorHost to get better throughput. See the linked docs as well:
This tutorial uses a single instance of EventProcessorHost. To increase throughput, it is recommended that you run multiple instances of EventProcessorHost, preferably on separate machines. This provides redundancy as well. In those cases, the various instances automatically coordinate with each other in order to load balance the received events.
Leases are granted for a limited time only; after that, another receiver can take over the lease. If you give it a while, you should notice that all instances receive data.
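To illustrate the lease behaviour described above, here is a small simulation. It is a hypothetical, simplified model and not the EventProcessorHost API or its actual balancing algorithm. The function `rebalance` and the "fair share" rule are assumptions made for this sketch. It shows why one host receives every partition, and how surplus leases move to new hosts once they lapse.

```python
import math


def rebalance(owners, hosts):
    """One simplified lease round (hypothetical model, not the SDK's algorithm).

    owners maps partition id -> current lease owner, or None if unowned.
    A host keeps (renews) leases up to its fair share; any surplus lease is
    not renewed, expires, and is taken over by the least-loaded host.
    """
    fair_share = math.ceil(len(owners) / len(hosts))
    counts = {host: 0 for host in hosts}
    result = {}
    # Phase 1: owners renew leases up to their fair share.
    for partition in sorted(owners):
        owner = owners[partition]
        if owner in counts and counts[owner] < fair_share:
            result[partition] = owner
            counts[owner] += 1
        else:
            result[partition] = None  # lease lapses
    # Phase 2: expired leases go to whichever host currently holds the fewest.
    for partition in sorted(result):
        if result[partition] is None:
            host = min(hosts, key=lambda h: (counts[h], h))
            result[partition] = host
            counts[host] += 1
    return result


partitions = ["0", "1", "2", "3"]
# A single host leases every partition, so it receives events from all of them.
alone = rebalance({p: None for p in partitions}, ["host-a"])
# Three more hosts start; host-a lets its surplus leases lapse and they spread out.
spread = rebalance(alone, ["host-a", "host-b", "host-c", "host-d"])
print(spread)  # {'0': 'host-a', '1': 'host-b', '2': 'host-c', '3': 'host-d'}
```

Running more hosts than partitions would leave the extra hosts idle in this model, because there is no lease left for them to pick up.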
About the hostname:
When receiving events from different machines, it might be useful to specify names for EventProcessorHost instances based on the machines (or roles) in which they are deployed.
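The host name identifies which instance owns a lease, so each running instance should get a distinct name. The sketch below uses a hypothetical naming scheme, `<role>-<machine>-<random suffix>`, built with Python's standard library. The helper `make_host_name` and the format are assumptions and not part of the Event Hubs SDK. You would pass the resulting string as the host name when constructing each processor instance.

```python
import socket
import uuid


def make_host_name(role):
    """Hypothetical naming scheme: <role>-<machine name>-<short random suffix>.

    The machine/role part tells you where a lease owner runs; the random
    suffix keeps two instances on the same machine from sharing a name.
    """
    return f"{role}-{socket.gethostname()}-{uuid.uuid4().hex[:8]}"


print(make_host_name("ingest"))  # e.g. ingest-vm01-3f9c2a1b
```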
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I have written a TCP socket program that OBD-II devices connect to. Now I want to host this TCP socket code on an Azure VM, and since this is my first time using Azure, some questions arise:
Will an Azure VM be able to handle 100K devices or more?
If not, which other Azure services should I use for this kind of problem?
Please help me to figure it out.
Thanks
For TCP listeners, you should use Azure VMs. In the past, it was also possible to use Worker Roles (Cloud Services), but they are now deprecated. In terms of workload, we can't tell whether a single VM is enough, but you should have more than one for high availability.
In your shoes, I would build a proof of concept with two virtual machines, then run a load test to check how it behaves, and add more VMs if necessary.
PS: Try to use Azure Virtual Machine Scale Sets, as they will help if and when you need to scale the architecture.
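A load test for a raw TCP listener can start as simply as opening many concurrent connections and counting how many complete. The sketch below uses Python's `asyncio`. To keep it self-contained, it starts a local echo server as a hypothetical stand-in for the real OBD-II listener. The one-line echo protocol and the names `handle`, `one_client` and `load_test` are assumptions. A real test would drop the local server, point `one_client` at the VMs' address, and run from several client machines, since a single client host runs out of ports and CPU long before 100K connections.

```python
import asyncio
import time


async def handle(reader, writer):
    # Stand-in for the real OBD-II listener: read one line, echo it back.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def one_client(host, port, payload):
    # One simulated device: connect, send a line, check the echoed reply.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(payload + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.rstrip(b"\n") == payload


async def load_test(connections, host="127.0.0.1"):
    # Port 0 lets the OS choose a free port for the local stand-in server.
    server = await asyncio.start_server(handle, host, 0)
    port = server.sockets[0].getsockname()[1]
    start = time.perf_counter()
    async with server:
        results = await asyncio.gather(
            *(one_client(host, port, f"device-{i}".encode()) for i in range(connections)),
            return_exceptions=True,  # count failures instead of aborting
        )
    elapsed = time.perf_counter() - start
    succeeded = sum(1 for r in results if r is True)
    return succeeded, elapsed


succeeded, elapsed = asyncio.run(load_test(50))
print(f"{succeeded} connections succeeded in {elapsed:.2f}s")
```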
According to the VM network throughput documentation on GitHub, Azure VMs support up to 500,000 inbound and 500,000 outbound flows. That's just from a networking perspective.
The next question is whether your server application can handle the load induced by that many connections. That would seem to be a function of the number of concurrent requests and the processing resources required to handle them.
So the answer to question 1 above is "definitely maybe". You would need to benchmark to answer it definitively.
If the answer you determine for your load pattern and application is "no", the answer to question 2 might be to spin up multiple VMs, either in a Virtual Machine Scale Set or manually, and then put an Azure Load Balancer in front of them.
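Once a benchmark tells you how many connections one VM sustains, sizing the pool of VMs is simple arithmetic. The sketch below assumes each device holds one long-lived inbound connection counted as one inbound flow. That assumption lets it cap the measured per-VM figure at the 500,000 inbound-flow limit quoted above. It also keeps at least two VMs for availability, following the first answer. The function name and defaults are hypothetical.

```python
import math

# Per-VM inbound flow limit quoted in the answer above.
INBOUND_FLOW_LIMIT = 500_000


def vms_needed(devices, measured_per_vm, min_vms=2):
    """How many VMs to provision for `devices` long-lived connections.

    measured_per_vm is the connection count one VM sustained in your own load
    test; it is capped at the flow limit, assuming one inbound flow per device.
    min_vms=2 keeps more than one VM for high availability.
    """
    per_vm = min(measured_per_vm, INBOUND_FLOW_LIMIT)
    return max(min_vms, math.ceil(devices / per_vm))


print(vms_needed(100_000, 30_000))   # 4
print(vms_needed(100_000, 200_000))  # 2
```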
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I have a RabbitMQ design question. Let's say I have 6 exchanges and 10 queues, split up as follows:
5 exchanges of type 'fanout', with 5 queues bound to them
1 exchange of type 'topic', routing to 5 different queues based on the routing key
I have a microservice application that runs in Kubernetes with 25 replicas, and each process acquires 1 RabbitMQ connection. So 25 RabbitMQ connections act as producers.
I have another application that also runs in Kubernetes, with 1 replica, and it acquires 1 RabbitMQ connection. So 1 RabbitMQ connection acts as the consumer.
Numbers: let's say every exchange gets 100k messages per day.
Tech stack: Node.js + amqplib
Questions:
How many channels should the producer create for publishing messages to the exchanges?
How many channels should the consumer create for consuming messages from the queues?
Is it a good approach to have one application act as the consumer for all the queues?
How can I scale the consumers automatically based on queue size in Kubernetes?
Is it possible to prioritize consumers? Let's say under heavy load I would like the consumers to stop consuming from a couple of queues and focus all resources on the remaining queues.
How many connections should the producer and consumer create?
Thanks in advance :)
Semantically, there will be publishing and consuming components in your system. Each should use its own channel, primarily because error reporting and handling are channel-scoped.
Whether a single application should consume from "all" queues depends entirely on how you structure your services.
The same goes for controlling which consumers consume from which queues. Usually queues and consumers have semantic "types" and serve specific purposes.
Simply adding more consumers and increasing prefetch will only work up to a point; a single queue has a realistic throughput limit.
Scaling application instances based on queue length (specifically, messages in the Ready state) involves monitoring individual queue metrics. That only works with a small number of queues (with, e.g., 100K queues, collecting metrics from all of them becomes really expensive).
A small application that monitors the metrics of an individual queue (or the totals) and updates the number of replicas of a deployment via the Kubernetes API should do.
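The core decision such a monitoring application makes is to turn a Ready-message count into a replica count. The sketch below shows only that step, as a hypothetical pure function with an assumed per-replica backlog target and min/max bounds. Fetching the count is left out; it could come, for example, from the `messages_ready` field of the RabbitMQ management HTTP API's `/api/queues/{vhost}/{name}` endpoint. Applying the result is also left out; it could be done by updating the deployment's scale subresource through the Kubernetes API.

```python
import math


def desired_replicas(messages_ready, per_replica_target, min_replicas=1, max_replicas=10):
    """Replica count so that each consumer has at most per_replica_target
    Ready messages queued for it, clamped to [min_replicas, max_replicas]."""
    if per_replica_target <= 0:
        raise ValueError("per_replica_target must be positive")
    wanted = math.ceil(messages_ready / per_replica_target)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(0, 1000))          # 1 (never below the minimum)
print(desired_replicas(4500, 1000))       # 5
print(desired_replicas(1_000_000, 1000))  # 10 (capped at the maximum)
```

The upper bound matters because, as noted above, adding consumers to a single queue only helps up to that queue's throughput limit.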
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Both apps are hosted on AWS Elastic Beanstalk and share the same code base. Should I:
Divide the connection limit across the apps, e.g. if I have two instances, split it 50/50 between them
OR
Set each instance's connection pool size equal to the database connection limit?
I have an API accessed by two kinds of users in a monolithic architecture, and I expect more than 10k unique users. The users are event organizers and participants. I'm using a connection pool in each Node.js instance.
My question is: how should I distribute the connection pool given my database's connection limit of, say, 500 (connection_limit)? Should I size each Node.js replica's pool according to the database limit, or just set every replica's pool to 500 connections?
It seems like you have several application server instances connecting to a single database.
Then you should install pgBouncer on the database server, because connection pooling on the application server alone would not be effective. You should set the pool size to something very small like 20 - 50, depending on the number of cores on the database and the number of concurrent I/O requests the database storage can handle. It does not really matter how many connections to pgBouncer you allow.
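To make the sizing concrete, the sketch below separates the two numbers involved. The first is how many client connections all app replicas together may open to PgBouncer, which you compare with PgBouncer's `max_client_conn`. The second is how many server connections PgBouncer keeps to PostgreSQL, which corresponds to `default_pool_size` per database/user pair. For the second number it borrows the PostgreSQL wiki's rough starting heuristic, `(core_count * 2) + effective_spindle_count`. That heuristic is a swapped-in rule of thumb, not part of the answer, so treat the result as a starting point to benchmark. The function and its inputs are hypothetical.

```python
def pgbouncer_sizing(replicas, app_pool_per_replica, core_count, effective_spindle_count):
    """Return (client_connections, server_pool_size) as a rough starting point.

    client_connections: what all Node.js replicas together may open to PgBouncer
    (compare with PgBouncer's max_client_conn).
    server_pool_size: connections PgBouncer keeps to PostgreSQL per database/user
    pair (default_pool_size), using the PostgreSQL wiki heuristic
    (core_count * 2) + effective_spindle_count.
    """
    client_connections = replicas * app_pool_per_replica
    server_pool_size = core_count * 2 + effective_spindle_count
    return client_connections, server_pool_size


print(pgbouncer_sizing(4, 50, 8, 4))  # (200, 20)
```

With 8 cores this lands at 20 server connections, well below the 500-connection database limit, while the app replicas can keep generous pools of their own toward PgBouncer.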
Make sure your database transactions are short and use transaction pooling mode, and you will be able to handle a big workload.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
What is the best practice for the Azure Service Bus message TTL (time to live) option? https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-expiration
We use Azure Service Bus to import data from one system to another; the number of records is a couple of million.
Briefly, this option tells ASB how long a message can stay in a queue or topic before it is moved to the dead-letter queue (if configured): https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dead-letter-queues#moving-messages-to-the-dlq
However, I cannot find how the TTL value affects ASB throughput and performance. What is the difference between a TTL of 5 minutes, 1 hour, and 20 hours in terms of queue/topic performance?
Thank you in advance
The time-to-live property sets the expiration window for messages in Service Bus.
Once a message's TTL elapses, it is either moved to the dead-letter queue or dropped from the queue. How you use this property depends on your use case.
For example, if I am sure that my system will not go down and will pick up messages as soon as they are enqueued, I would set the TTL to a very short window, say 1 minute (monitoring the dead-letter queue length then helps verify that the system is working correctly). If my system is not reliable, or it runs only once a day to process the messages, I should use a higher value for this property, so that the messages stay in the queue long enough for the system to process them.
As for performance, higher TTL values will not noticeably degrade queue performance.
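The trade-off in the example above (short TTL for an always-on consumer, long TTL for a once-a-day job) can be checked with a small model. The sketch below is a hypothetical simplification and does not call Service Bus. Each message is picked up at the first consumer run after it is enqueued, and it counts as expired if it waited longer than the TTL. With dead-lettering on expiration enabled, such a message would land in the dead-letter queue. The function `classify` and the schedule are assumptions.

```python
from datetime import datetime, timedelta


def classify(enqueued_at, consumer_runs, ttl):
    """Split messages into (processed, expired) under a simplified model.

    Each message is picked up by the first consumer run at or after its
    enqueue time; if it waited longer than ttl, it expired first (and would be
    dead-lettered if dead-lettering on expiration is enabled).
    Assumes every message has a later consumer run in consumer_runs.
    """
    processed, expired = [], []
    for t in enqueued_at:
        pickup = min(r for r in consumer_runs if r >= t)
        (expired if pickup - t > ttl else processed).append(t)
    return processed, expired


day = datetime(2024, 1, 1)
runs = [day + timedelta(days=1)]  # consumer runs once a day, at midnight
msgs = [day + timedelta(hours=1), day + timedelta(hours=23)]
print(len(classify(msgs, runs, timedelta(minutes=1))[1]))  # 2 expired
print(len(classify(msgs, runs, timedelta(hours=25))[0]))   # 2 processed
```

For a once-a-day consumer, the TTL has to exceed the longest gap between runs (here just under 24 hours), otherwise messages expire before anyone reads them.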
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I need to update multiple processes with several different pieces of data, at varying rates, but as fast as 10 Hz. I don't want the receiving processes to have to actively get this data, but rather have it pushed to them, so that they only have to do anything about the new data when there actually is any (no polling).
I'm only sending probably a few bytes of data to each process. The data being transmitted will not likely need to be stored permanently, at least not before being received and processed by the recipients. Also, no data is updated less frequently than once every few seconds, so receiver crashes are not a concern (once a crashed receiver recovers, it can just wait for the next update).
I've looked at unix domain sockets and UDP and a little bit at pipes and shared memory, but it seems that they don't quite fit what I'm trying to do:
Domain sockets require the sender to send a separate message to each recipient (i.e., no broadcasting/multicasting)
Shared memory has the disadvantage of having the clients check that data has been updated (unless there's a mechanism I'm not familiar with that can notify them)
UDP doesn't guarantee that the messages will arrive (maybe not likely a problem for communication on the same computer?), and I have some concern about the overhead from the network stack (which domain sockets don't have)
The concern about TCP (and other protocols that support inter-device communication) is that there is functionality that's not needed for interprocess communication on a single device, and that that could create unnecessary overhead.
Any suggestions and direction to references and resources are appreciated.
Have you looked at ZeroMQ? It is a lightweight messaging library that supports various push/pull access patterns over several transport mechanisms.
One option is to write the records to flat files or an SQLite database on the same machine,
and to have a separate control file holding a process-shared mutex, a condition variable, and a record count, mapped into the memory of the publisher and the subscribers. This is the notification mechanism.
This way you would have full history of records in the file or the database which makes it easy to replay records, debug and recover subscribers from crashes.
The publisher would:
Map the control file into memory.
Add new records to the file or the database.
Lock the mutex.
Update the record count.
Call notify_all on the condition variable.
Unlock the mutex.
The subscribers would:
Map the control file into memory.
Lock the mutex.
Wait on the condition variable until there are new records (each subscriber maintains its own count of already processed records).
Unlock the mutex.
Process the new records from the file or the database.
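The publisher and subscriber steps above can be sketched in Python, with one important substitution. The sketch uses threads in a single process, and `threading.Condition` stands in for the process-shared mutex and condition variable in the memory-mapped control file. Across real processes you would need a process-shared primitive instead, e.g. a pthread mutex and condition variable initialised with `pthread_mutexattr_setpshared`/`pthread_condattr_setpshared` in the mapped file. The record store is a temporary SQLite file with a hypothetical `records` table, and the names `Control`, `publish`, `subscribe` and `demo` are made up for illustration.

```python
import os
import sqlite3
import tempfile
import threading


class Control:
    """Stand-in for the mapped control file: mutex + condition variable + count.

    threading.Condition only works between threads of one process; a real
    multi-process setup needs a process-shared mutex/condvar instead.
    """

    def __init__(self):
        self.cond = threading.Condition()  # wraps the mutex
        self.count = 0                     # number of committed records


def publish(db_path, control, payload):
    conn = sqlite3.connect(db_path)
    with conn:  # add the new record; the with-block commits it
        conn.execute("INSERT INTO records (payload) VALUES (?)", (payload,))
    conn.close()
    with control.cond:             # lock the mutex
        control.count += 1         # update the record count
        control.cond.notify_all()  # wake every waiting subscriber
    # the mutex is unlocked when the with-block exits


def subscribe(db_path, control, expected, out):
    seen = 0  # this subscriber's own count of processed records
    conn = sqlite3.connect(db_path)
    while seen < expected:
        with control.cond:  # lock, then wait until there are new records
            control.cond.wait_for(lambda: control.count > seen)
            available = control.count
        # mutex released; process the new records from the database
        rows = conn.execute(
            "SELECT payload FROM records ORDER BY id LIMIT ? OFFSET ?",
            (available - seen, seen),
        ).fetchall()
        out.extend(payload for (payload,) in rows)
        seen = available
    conn.close()


def demo(n_subscribers=3, n_records=5):
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
    setup.commit()
    setup.close()
    control = Control()
    outputs = [[] for _ in range(n_subscribers)]
    threads = [
        threading.Thread(target=subscribe, args=(db_path, control, n_records, out))
        for out in outputs
    ]
    for t in threads:
        t.start()
    for i in range(n_records):
        publish(db_path, control, f"record-{i}")
    for t in threads:
        t.join()
    os.remove(db_path)
    return outputs


print(demo())  # every subscriber sees record-0 .. record-4 in order
```

Because records are committed before the count is bumped, a subscriber that wakes up can always read everything up to the count it observed. The full history also stays in the database for replay.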