I'm trying to run a project using Hyperledger Fabric with a setup similar to the Fabcar example.
I'm surprised by the huge amount of time it takes to submit a transaction.
Just to keep it simple and fully reproducible, I measured the time needed to submit the createCar transaction on the actual Fabcar project.
After setting up the network (startFabric.sh javascript) and enrolling the admin and a user, I ran the invoke.js script. The whole script takes about 2.5 seconds!
As far as I can tell, running the contract takes just a few milliseconds, and the same goes for sending the transaction proposal. Most of the time is spent with the eventHandler listening and waiting for the event (in the transaction.js library).
Is there a way of speeding up the process? I was expecting to be able to submit several transactions per second.
Short answer: decrease BatchTimeout in configtx.yaml
Long answer:
If you only run one transaction, as you describe, this is totally normal.
If you look at your configtx.yaml, in the 'Orderer' section, you can see this:
Orderer: &OrdererDefaults

    # Orderer Type: The orderer implementation to start
    # Available types are "solo" and "kafka"
    OrdererType: solo

    Addresses:
        - orderer.example.com:7050

    # Batch Timeout: The amount of time to wait before creating a batch
    BatchTimeout: 2s

    # Batch Size: Controls the number of messages batched into a block
    BatchSize:

        # Max Message Count: The maximum number of messages to permit in a batch
        MaxMessageCount: 10

        # Absolute Max Bytes: The absolute maximum number of bytes allowed for
        # the serialized messages in a batch.
        AbsoluteMaxBytes: 99 MB

        # Preferred Max Bytes: The preferred maximum number of bytes allowed for
        # the serialized messages in a batch. A message larger than the preferred
        # max bytes will result in a batch larger than preferred max bytes.
        PreferredMaxBytes: 512 KB
There are two important settings here:
BatchTimeout
MaxMessageCount
BatchTimeout defines the maximum time before creating a block. This means that when an invoke is made, the orderer receives the transaction and waits up to 2 seconds before creating the block, so each "first" transaction will take more than 2 seconds! But if another invoke arrives, say, 1.5 s after the first one, that second call will take less than 1 s!
MaxMessageCount speaks for itself: if more than 10 invokes accumulate, a block is created even if the 2 seconds have not elapsed. For example, 10 invokes arriving within 0.5 s will result in a block being created in under a second.
These settings are there to balance the load depending on your network. Say you have a low-traffic application with fewer than 10 tps (transactions per second): you can reduce BatchTimeout below 2 s to improve the response time of each invoke. If you have a high tps, you can increase MaxMessageCount to create larger blocks.
The other settings define the maximum size of the messages in a block.
Try to anticipate how your network will behave, simulate the estimated tps with a test case, and tweak the parameters to find the configuration that fits your needs.
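To make the batching behaviour concrete, here is a rough model in plain Python (an illustration, not a Fabric API): the orderer cuts a block when either BatchTimeout elapses or MaxMessageCount transactions have accumulated, whichever comes first.

```python
# Rough model of Fabric's block cutting (illustration only, not Fabric code):
# a block is cut when MaxMessageCount transactions have accumulated, or
# when BatchTimeout elapses, whichever happens first.

def time_until_block_cut(arrival_times, batch_timeout=2.0, max_message_count=10):
    """Given transaction arrival times (seconds) within one batch window,
    return when the block is cut, measured from the first arrival."""
    first = arrival_times[0]
    if len(arrival_times) >= max_message_count:
        # The block is cut as soon as the 10th transaction arrives.
        return arrival_times[max_message_count - 1] - first
    # Otherwise the orderer waits out the full BatchTimeout.
    return batch_timeout

# A single transaction waits the full 2 s before its block is cut:
print(time_until_block_cut([0.0]))  # 2.0

# Ten transactions arriving 0.05 s apart hit MaxMessageCount early (~0.45 s):
print(time_until_block_cut([i * 0.05 for i in range(10)]))
```

This is why a lone invoke costs ~2 s while a steady stream of invokes averages far less per transaction.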
Related
How come the process rate can be greater than the input rate?
From my understanding, the process rate is the rate at which Spark can process arriving data, i.e., its processing capacity. If so, the process rate should on average be lower than or equal to the input rate; if it is lower, we know we need more processing power or need to rethink the trigger time.
I am basing my understanding on this blog post and common sense, but I might be wrong. I was also looking for the formal formula in the source code while writing this question.
This is an example where the process rate is constantly greater than the input rate:
You can see that on average we have 200-300 records being processed per second, whereas we have 80-120 records arriving per second.
Setup background: Spark 3.x reading from Kafka and writing to Delta.
Thank you all.
A process rate greater than the input rate simply means Spark is processing records faster than they arrive, i.e. it could process 300-400 per second even though the event rate is only 100 per second. For example, say the input rate is ~100 records per second and Spark processes those 100 records within half a second; it could then process 100 more in the remaining half second, which averages out to a process rate of ~200.
In the attached screenshot, this can be interpreted as follows:
Spark could process ~3000 records within each batch (~200/s × ~15 s, with a 15 s processing time per batch, based on the ~15,000 ms seen in the latency chart), but it is actually processing only around ~1000 records per batch in that same 15 s.
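This reading can be checked with plain arithmetic (the numbers below are illustrative, mirroring the 100-records-per-second example above; they are not exact Spark metrics):

```python
# Why processRate can exceed inputRate (plain arithmetic, not Spark APIs):
# the process rate is computed from the time a batch actually took,
# while the input rate reflects how fast records arrive.

records_in_batch = 100      # records arriving during a 1 s trigger interval
trigger_interval_s = 1.0
processing_time_s = 0.5     # the batch finished in half a second

input_rate = records_in_batch / trigger_interval_s    # 100.0 records/s
process_rate = records_in_batch / processing_time_s   # 200.0 records/s

# Process rate above input rate means spare capacity, not a contradiction.
print(input_rate, process_rate)  # 100.0 200.0
```

So a process rate above the input rate indicates headroom: the job finishes each batch before the next trigger fires.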
My Question is:
Q1. What do labels 1 and 2 mean, and how do the 4 peers contribute to them?
Q2. What does label 3 mean when we compare it with the 'send rate'?
Q3. What is the difference between label 3 and label 5, and why is there such a large gap in memory utilization between them?
Q1: Label 1 is the number of transactions that were successfully processed and written to the ledger. Label 2 is the number of transactions being submitted every second. Label 2 is not directly determined by the number of peers, but the peers (and their processing power) do contribute to it: if a peer fails to do its job (endorsement, verification, etc.), the transaction fails, and the number changes accordingly.
Q2: Label 3 represents the number of transactions that have been processed, versus the send rate, which is the per-second rate at which transactions are submitted to the blockchain. E.g., in your Test1, 49 transactions per second were submitted but only 47 per second were processed, hence the 2.4 seconds max latency (it is more complex than that, but this is the gist).
Q3: Label 5 represents a peer in charge of running and verifying the smart contracts, and probably endorsement as well (depending on your endorsement policy), while label 3 is a world-state database peer (more here: https://vitalflux.com/hyperledger-fabric-difference-world-state-db-transaction-logs/ ). Running smart contracts uses more resources.
What is the relationship between MaxMessageCount, AbsoluteMaxBytes, and PreferredMaxBytes?
Does a block in Fabric consist of a MaxMessageCount number of transactions, or is it bounded by PreferredMaxBytes?
What should be the value of these to get maximum throughput?
Max Message Count: The maximum number of transactions/messages permitted in a block.
Absolute Max Bytes: The (strict) maximum number of bytes allowed for the serialized transactions/messages in a block.
Preferred Max Bytes: The preferred maximum number of bytes allowed for the serialized transactions/messages in a batch. A transaction/message larger than the preferred max bytes will result in a batch larger than the preferred max bytes.
Whichever criterion is met first is what causes the orderer to cut the block.
If you have a constantly flowing high volume of transactions, pack as many transactions as possible into each block to get maximum throughput. Otherwise, tweak BatchTimeout and MaxMessageCount to optimize your transaction throughput.
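A minimal sketch of how these limits interact (plain Python for illustration; the constants mirror the defaults quoted earlier, and the function is not actual orderer code):

```python
# Sketch of the orderer's block-cutting decision (illustration only, not
# Fabric code). A pending batch is cut when any criterion is met first:
#   1. MaxMessageCount transactions have accumulated,
#   2. adding the next message would exceed PreferredMaxBytes,
#   3. BatchTimeout has elapsed (driven by a timer, modelled here as a flag).
# AbsoluteMaxBytes is a hard cap enforced separately on oversized messages.

MAX_MESSAGE_COUNT = 10
PREFERRED_MAX_BYTES = 512 * 1024  # 512 KB

def should_cut_block(pending_sizes, next_msg_size, timeout_expired):
    if timeout_expired:
        return True
    if len(pending_sizes) + 1 >= MAX_MESSAGE_COUNT:
        return True
    if sum(pending_sizes) + next_msg_size > PREFERRED_MAX_BYTES:
        return True
    return False

# 9 pending messages plus 1 more hits MaxMessageCount:
print(should_cut_block([100] * 9, 100, False))   # True
# 2 small messages and no timeout: keep batching.
print(should_cut_block([100, 100], 100, False))  # False
```

So a block holds up to MaxMessageCount transactions or roughly PreferredMaxBytes of data, whichever limit is reached first, with BatchTimeout as the upper bound on waiting.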
If you want to dig deep on this aspect refer to this research paper: https://arxiv.org/pdf/1805.11390.pdf
I need to increase the input rate per partition for my application, and I have used .set("spark.streaming.kafka.maxRatePerPartition", 100) in the config. The stream duration is 10 s, so I expect to process 5 * 100 * 10 = 5000 messages for this batch. However, the input rate I receive is only about 500. Can you suggest any modifications to increase this rate?
The stream duration is 10s so I expect process 5*100*10=5000 messages
for this batch.
That's not what the setting means. It means "how many elements each partition can have per batch", not per second. I'm going to assume you have 5 partitions, so you're getting 5 * 100 = 500. If you want 5000, set maxRatePerPartition to 1000.
From "Exactly-once Spark Streaming From Apache Kafka" (written by Cody Koeninger, the author of the Direct Stream approach, emphasis mine):
For rate limiting, you can use the Spark configuration variable
spark.streaming.kafka.maxRatePerPartition to set the maximum number of
messages per partition per batch.
Edit:
After #avrs's comment, I looked into the code that defines the max rate. As it turns out, the heuristic is a bit more complex than stated in both the blog post and the docs.
There are two branches. If backpressure is enabled alongside maxRate, the effective rate is the minimum of the current backpressure rate calculated by the RateEstimator and the maxRate set by the user. If backpressure isn't enabled, the maxRate is taken as defined.
Now, after selecting the rate, it always multiplies by the batch duration in seconds, effectively making this a per-second rate:
if (effectiveRateLimitPerPartition.values.sum > 0) {
  val secsPerBatch = context.graph.batchDuration.milliseconds.toDouble / 1000
  Some(effectiveRateLimitPerPartition.map {
    case (tp, limit) => tp -> (secsPerBatch * limit).toLong
  })
} else {
  None
}
The property fetches N messages per partition per second. If I have M partitions and the batch interval is B seconds, then the total number of messages I can see in a batch is N * M * B.
There are a few things you should verify:
Is your input rate actually > 500 messages over the 10 s batch?
Is the Kafka topic properly partitioned?
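Applying that formula to the numbers in the question (the partition count of 5 is an assumption taken from the question's own arithmetic):

```python
# Plain arithmetic applying N * M * B to the question's numbers.
N = 100   # spark.streaming.kafka.maxRatePerPartition (msgs/partition/sec)
M = 5     # assumed number of Kafka partitions
B = 10    # batch interval in seconds

expected_per_batch = N * M * B
print(expected_per_batch)  # 5000

# Seeing only ~500 suggests the topic has fewer partitions than assumed,
# the producer isn't supplying data that fast, or the limit is being
# applied per batch rather than per second in the version in use.
```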
I found that the number of events/sec in my streaming application suddenly increases (see screenshot). I made sure that the sender doesn't send more data at these moments (it always sends 800-900 messages per TCP). I guess it has to do with the other times shown below (processing/delay), but why does this show up in the events-count graph? Can someone explain more precisely what happens there? Thanks!
The number of events is directly related to the producer, which means your producer is increasing its send rate over time. At the moment of the screenshot, your mean received-message throughput was 953 msgs/s, which is higher than the expected ~850 (taking the 800-900 range mentioned in the question as roughly normally distributed).
Check the producer for the particular reasons of this increase.