Azure Event Hubs sequence number sequencing

I know the following is true for sequence numbers within a partition:
Sequence numbers follow a predictable pattern: numbering is contiguous and unique within the scope of a partition. So if message x has sequence number 500, message y will have sequence number 501.
Say the last sequence number when a message was received was 5000, and then no more messages were received. After the 7-day retention policy has expired, on the 10th day we receive a message on the partition. Will the sequence number start from 5001, or will it be different this time?
The reason I am asking is that I am seeing:
"The expected sequence number xxx is less than the received sequence number yyy."
For example:
The supplied sequence number '33508' is invalid. The last sequence number in the system is '583'

Just to update this post with the answer: we found the Event Hubs concerned were indeed recreated, but the checkpoint store was not reset. Thus, the sequence number presented by the client (read from the checkpoint store) was greater than the latest sequence number at the service.
As Serkant confirmed above, the sequence number is always contiguous. It can only be interrupted by recreating the Event Hub. Typically you shouldn't need to delete and recreate an Event Hub in production, but if you do run into that situation, you should also reset the checkpoint store.
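A defensive consumer can guard against this by comparing its stored checkpoint against the partition's latest sequence number before resuming. A minimal sketch, assuming you can look up both values (the lookups themselves are SDK-specific and not shown here):

```java
// Sketch of the checkpoint sanity check described above. The stored
// checkpoint sequence number would come from your checkpoint store;
// the latest service-side sequence number would come from the
// partition's runtime information. Both are assumed inputs.
public class CheckpointGuard {
    /**
     * Returns the sequence number to resume from. If the checkpoint
     * is ahead of the service (hub was recreated but the checkpoint
     * store was not reset), fall back to the start of the stream.
     */
    public static long resumeFrom(long checkpointSeq, long latestServiceSeq) {
        if (checkpointSeq > latestServiceSeq) {
            // Checkpoint predates a recreated hub: reset it.
            return -1L; // conventionally "start of stream"
        }
        return checkpointSeq;
    }
}
```

With the numbers from the error above (supplied 33508, latest in the system 583), this check would detect the stale checkpoint and restart from the beginning.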
HTH,
Amit Bhatia

Related

Efficient Algorithm for updating a pointer to the earliest pending work log

There's a DynamoDB table which holds records. One row of this table is dedicated to just tracking the next pending message: its partition key field has the value nextPending, and it holds a messageId. All other rows of the table contain messages. Each record has a messageId which is unique, gap-free, and increasing. At this point in time there are over 1M records in this table. The records represent pieces of work. A service consumes this queue and processes each message one by one. After completely processing a message, it does two things. First, it sets the status field of the record to a terminal state. Next, it updates the nextPending record with messageId = previous messageId + 1.
Now, we are attempting to make this service multi-threaded. Instead of processing messages one at a time, we'll have multiple threads that work on messages in parallel and these messages can be completed in random order.
I'm looking for an efficient and elegant algorithm for updating the nextPending field appropriately. Imagine nextPending currently has the value 101. Various threads in the service are working on messages between 101 and 110. Say they complete in this order: 109, 105, 104, 108, 103, 102, 114, 101, .... We'd need to update nextPending to 106 after we see that 101 is done (102 through 105 have already completed). Until 101 is done, we cannot update nextPending, because 101 might fail and need to be retried, and as long as it is not done, nextPending should always point to the earliest pending message.
One possible algorithm:
After completely processing a message, each thread does two things. First, it sets the status field of the record to a terminal state. Next, it updates the nextPending record with the earliest messageId that is pending at that point. But this solution requires each thread to read several records from DynamoDB and check the status of each message. Also, several threads will now compete to conditionally update this one row in the table, which is not ideal either.
Another possible algorithm:
The threads share a common rolling window of some sort which tracks all the completed messages. When a message is completed, a dedicated thread checks whether its messageId equals nextPending. If yes, we update nextPending to one past the largest number in the current contiguous run of completed messages. In this method we don't unnecessarily read from DynamoDB, and we also don't have several threads racing against one another to do the same unit of work.
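The rolling-window idea above can be sketched with a tracker that only advances nextPending over a contiguous run of completed ids. This is an illustrative in-memory sketch, not the DynamoDB wiring; the conditional update of the nextPending row is only indicated by a comment:

```java
import java.util.TreeSet;

// In-memory sketch of the rolling-window algorithm: worker threads
// report completed messageIds, and nextPending advances only when
// the run of completions starting at nextPending is contiguous.
public class NextPendingTracker {
    private long nextPending;
    // Completed ids at or above nextPending that are not yet
    // part of a contiguous prefix.
    private final TreeSet<Long> completed = new TreeSet<>();

    public NextPendingTracker(long start) {
        this.nextPending = start;
    }

    /** Called by a worker thread after it finishes a message. */
    public synchronized long complete(long messageId) {
        completed.add(messageId);
        // Advance over the contiguous prefix of completed ids.
        while (completed.remove(nextPending)) {
            nextPending++;
        }
        // Here you would conditionally update the nextPending row in
        // DynamoDB (e.g. with a condition expression on the old value),
        // and only when nextPending actually moved.
        return nextPending;
    }

    public synchronized long nextPending() {
        return nextPending;
    }
}
```

With the completion order from the question (109, 105, 104, 108, 103, 102, 114, 101), nextPending stays at 101 until 101 completes, then jumps straight to 106.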
Any better ideas here?

Aggregate a continuous stream of numbers from a file using Hazelcast Jet

I am trying to sum a continuous stream of numbers from a file using Hazelcast Jet:
pipe
    .drawFrom(Sources.fileWatcher(<dir>))
    .map(s -> Integer.parseInt(s))
    .addTimestamps()
    .window(WindowDefinition.sliding(10000, 1000))
    .aggregate(AggregateOperations.summingDouble(x -> x))
    .drainTo(Sinks.logger());
A few questions:
1. It doesn't give the expected output. My expectation is that as soon as a new number appears in the file, it should just add it to the existing sum.
2. Why do I need the window and addTimestamps methods to do this? I just need the sum of an infinite stream.
3. How can we achieve fault tolerance? I.e., if the server restarts, will it save the aggregated result, and when it comes up will it continue aggregating from the last computed sum?
4. If the server is down and a few numbers arrive in the file, will it, when it comes back up, read from the point where it went down, or will it miss the numbers written while it was down and only read numbers that arrived after it came up?
Answer to Q1 & Q2:
You're looking for rollingAggregate; you don't need timestamps or windows.
pipe
.drawFrom(Sources.fileWatcher(<dir>))
.rollingAggregate(AggregateOperations.summingDouble(Double::parseDouble))
.drainTo(Sinks.logger());
Answer to Q3 & Q4: the fileWatcher source isn't fault-tolerant. The reason is that it reads local files, and when a member dies, the local files won't be available anyway. When the job restarts, it will start reading from the current position and will miss numbers added while the job was down.
Also, since you use global aggregation, data from all files will be routed to a single cluster member and the other members will be idle.
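Outside of Jet, the semantics of rollingAggregate with a summing operation amount to a running fold: one output item per input item, carrying the new total. A plain-Java sketch of just those semantics (not the distributed implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of what rollingAggregate(summingDouble) emits:
// for every input line, the pipeline outputs the running total so far.
// Illustrative only; Jet does this incrementally across a cluster.
public class RollingSum {
    public static List<Double> rollingSum(List<String> lines) {
        List<Double> out = new ArrayList<>();
        double sum = 0;
        for (String line : lines) {
            sum += Double.parseDouble(line);
            out.add(sum); // one output per input, carrying the new sum
        }
        return out;
    }
}
```

So for the inputs 1, 2, 4 the emitted values would be 1.0, 3.0, 7.0, which is the "add each new number to the existing sum" behaviour asked for in Q1.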

Analysis of Testing Report of Caliper

My question is:
Q1. What do we mean by labels 1 and 2, and how do the 4 peers contribute to them?
Q2. What do we mean by label 3 when we compare it with the send rate?
Q3. What is the difference between label 3 and label 5, and why is there such a gap in the memory utilization of the two?
Q1: Label 1 is the number of transactions that were successfully processed and written to the ledger. Label 2 is the number of transactions being submitted every second. Label 2 has nothing to do with the number of peers as such, but the number of peers (and their processing power) does contribute to it: if a peer fails to do its job (endorsement, verification, etc.), the transaction fails, and therefore this number would differ.
Q2: Label 3 represents the number of transactions that have been processed, versus the send rate, which is the per-second rate of transactions submitted to the blockchain. E.g., in your Test1, 49 transactions per second were submitted but only 47 transactions per second were processed, hence the 2.4 seconds max latency (it is more complex than what I said).
Q3: Label 5 represents a peer which is in charge of running and verifying the smart contracts, and probably endorsement as well (depending on your endorsement policy), but label 3 is a world-state database peer (more here: https://vitalflux.com/hyperledger-fabric-difference-world-state-db-transaction-logs/ ), and running smart contracts uses more resources.

Azure Service Bus Event Hubs: is the "offset" still available/durable when some of the event data has expired?

I am writing some code to test Event Hubs, which is a newly released feature of Azure Service Bus.
As there are very few articles online and MSDN also does not have rich documentation about the details of Event Hubs, I hope someone can share their experience with my question.
For Event Hubs, we have the following statements:
we use an "offset" to remember where we are when reading the event data from a partition
the event data on the Event Hub expires (automatically?) after some configurable time span
So my question is: is the offset still available/durable when some of the event data has been deleted as a result of expiration?
For example, we have following data on one of partition:
M1 | M2 | M3 | M4 ( oldest --> latest )
After my processing logic runs, let's say I have processed M1 and M2, so the offset would be the start of M2 (when using exclusive mode).
After some time, and if my service is down during that time, M1 is deleted as a result of expiration, so the partition becomes:
M2 | M3 | M4 | M.... ( oldest -> latest )
In this case, when my server restarts, is the offset I stored before still available to be used to read from M3?
We can also imagine this case at runtime: when my consumer is reading the event data on the Event Hub and some of the oldest event data expires, is the offset still available at runtime?
Thanks for sharing any experience with this question.
Based upon how various parts of the documentation are written, I believe you will start from the beginning of the current stream, as desired, if your starting offset is no longer available. EventProcessorHost should follow similar restrictions. Since the sequence numbers are 64 bits, I would expect them to be able to serve as an offset within a partition, since they monotonically increase without being recycled. The offset should have a similar property. So if Event Hubs are designed in a reasonable fashion (i.e. like similar solutions), then offsets within partitions should hold despite data expiration. But since I have not yet tested this myself, I will be very unhappy if it is not so, and I'd expect an Azure person to be able to give true confirmation.
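The expected fallback behaviour described above can be sketched as a clamp: if the stored offset has expired out of the retention window, resume from the earliest event still retained rather than failing. Both inputs are assumptions here; in practice the earliest retained position would come from the partition's runtime information:

```java
// Sketch of the resume-after-expiration behaviour the answer expects.
// storedOffset comes from your own checkpoint; earliestRetained is the
// oldest position still inside the retention window (assumed inputs).
public class OffsetResume {
    public static long startingOffset(long storedOffset, long earliestRetained) {
        // If expiration has deleted everything before earliestRetained,
        // an older stored offset is no longer addressable; clamp to the
        // beginning of the current stream. Otherwise resume as stored.
        return Math.max(storedOffset, earliestRetained);
    }
}
```

In the M1..M4 example: if the stored position points just past M2 and only M1 has expired, the stored offset is still inside the window and reading resumes at M3 as hoped.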

Strange data access time in Azure Table Storage while using .Take()

this is our situation:
We store user messages in Table Storage. The partition key is the UserId and the RowKey is used as a message id.
When a user opens his message panel we want to just .Take(x) messages; we don't care about the sort order. But what we have noticed is that the time it takes to get the messages varies greatly with the number of messages we take.
We did some small tests:
We ran .Take(X) 50 times for each X and compared the differences:
so we did .Take(1) 50 times, .Take(100) 50 times, etc.
As an extra check, we repeated the whole test 5 times.
Here are the results:
As you can see, there are some HUGE differences. The difference between 1 and 2 is very strange, and the same goes for 199-200.
Does anybody have any clue how this is happening? The Table Storage is on a live server, by the way, not development storage.
Many thanks.
(Axes of the results chart above: X = number of Takes, Y = test number.)
Update
The problem only seems to occur when I'm using a wireless network. When I'm using a cable, the times are normal.
Possibly the data is fetched in batches of a certain size x. When you request x+1 rows, the service has to fetch two batches and then drop some of the rows.
Try running your test with increments of 1 as the Take() parameter to confirm or dismiss this assumption.
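A toy model of that batching hypothesis, with the page size as an assumed parameter: the number of round trips needed for Take(n) is ceil(n / pageSize), so latency should jump whenever n crosses a page boundary:

```java
// Toy model of the batching hypothesis above: if the service returns
// rows in server-side pages of a fixed (unknown) size, then Take(n)
// needs ceil(n / pageSize) round trips, and latency jumps each time
// n crosses a page boundary. The page size here is an assumption.
public class PageModel {
    public static int roundTrips(int take, int pageSize) {
        return (take + pageSize - 1) / pageSize; // integer ceil(take / pageSize)
    }
}
```

If the hypothesis holds, the increments-of-1 test should show a latency step exactly where roundTrips(n, pageSize) increases, which would also reveal the actual page size.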