Is there a good/right way to bulk update a shared matrix in Fluid - fluid-framework

We have an application that uses Fluid Framework where we have our data in a grid format. To enable collaboration on the grid, we store the data in a SharedMatrix. One of the issues we are having now is that the grid data can also be updated via an import. When the data is imported, we update the shared matrix with a call similar to the following:
this.sharedMatrix.setCells(rowStart, 0, colCount, data);
We have a matrix consumer which handles the events that occur. However, when we get a lot of data updates (i.e. over 3,000 value changes in the data variable) we end up getting errors like:
ContainerClose {"category":"error","error":"Runtime detected too many reconnects with no progress syncing local ops","errorType":"genericError","fluidErrorCode":"-","attempts":15,"message":"Runtime detected too many reconnects with no progress syncing local ops","errorInstanceId":"5021c069-c811-42ce-9522-bf98b61ab032","clientType":"interactive","containerId":"85ea5d8a-44ae-4194-9bfe-2043e4a64569","docId":"045e982f-c69f-4b9d-bbcc-418a5c6db082","containerAttachState":"Attached","containerLifecycleState":"closing","containerConnectionState":"Disconnected","dmInitialSeqNumber":97,"dmLastProcessedSeqNumber":146,"dmLastKnownSeqNumber":146,"containerLoadedFromVersionId":"f6293302ea7ef8ad4269c5f936d65a1159f65268","containerLoadedFromVersionDate":"2022-04-22T22:52:13.000Z","dmLastMsqSeqNumber":146,"dmLastMsqSeqTimestamp":1650667974489,"dmLastMsqSeqClientId":null,"connectionStateDuration":3.3999999994412065,"loaderId":"022ecce4-0fe1-4f50-8e9e-33f17bceb5c5","loaderVersion":"0.58.2000"} tick=12031 Error at ContainerRuntime.setConnectionState (http://localhost:4200/src_app_main_main_module_ts.js:32011:22) at ContainerContext.setConnectionState (http://localhost:4200/src_app_main_main_module_ts.js:27682:13) at Container.propagateConnectionState (http://localhost:4200/src_app_main_main_module_ts.js:27284:20) at Object.connectionStateChanged (http://localhost:4200/src_app_main_main_module_ts.js:25993:16) at ConnectionStateHandler.setConnectionState (http://localhost:4200/src_app_main_main_module_ts.js:25674:18) at ConnectionStateHandler.receivedDisconnectEvent (http://localhost:4200/src_app_main_main_module_ts.js:25589:10) at DeltaManager. (http://localhost:4200/src_app_main_main_module_ts.js:27188:35) at DeltaManager.emit (http://localhost:4200/polyfills.js:416:7) at DeltaManager.disconnectHandler (http://localhost:4200/src_app_main_main_module_ts.js:28582:10) at Object.disconnectHandler (http://localhost:4200/src_app_main_main_module_ts.js:28115:41) +0ms
Is there a good way to handle these kinds of bulk data updates?

Related

Tracking a counter value in application insights

I'm trying to use Application Insights to keep track of a counter of the number of active streams in my application. I have two goals to achieve:
Show the current (or at least recent) number of active streams in a dashboard
Activate a kind of warning if the number exceeds a certain limit.
These streams can be quite long lived, and sometimes brief. So the number can sometimes change say 100 times a second, and sometimes remain unchanged for many hours.
I have been trying to track this active streams count as an application insights metric.
I'm incrementing a counter in my application when a new stream opens, and decrementing it when one closes. On each change I use the telemetry client, something like this:
var myMetric = myTelemetryClient.GetMetric("Metricname");
myMetric.TrackValue(myCount);
When I query my metric values with Kusto, I see that because of these clusters of activity within a 10-second period, my metric values get aggregated. For the purposes of my alarm, I can live with that, as I can look at the max value of the aggregate. But I can't present a dashboard of the number of active streams, as I have no way of knowing the number of active streams between my measurement points. I know the min, max, and average values, but I don't know the last value of the aggregate period, and since it can be anywhere between 0 and 1000, it's no help.
Since the solution I have doesn't serve my needs, I thought of a couple of changes:
Adding a scheduled pump to my counter component, which will send the current counter value once every, say, 5 minutes (see the sketch below). But I don't like that I then have to add a thread for each of these counters.
Adding a timer to send the current value once, 5 minutes after the last change. The countdown gets reset each time the counter changes. This has the same problem as above, and does an excessive amount of work resetting the timer when the counter could be changing thousands of times a second.
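To make the first option concrete, here is a rough sketch of what I mean (the class and metric name are placeholders; System.Threading.Timer fires on a thread-pool thread rather than a dedicated thread per counter):
// Sketch of the scheduled pump idea (names are placeholders).
using System;
using System.Threading;
using Microsoft.ApplicationInsights;

public class ActiveStreamCounter : IDisposable
{
    private readonly TelemetryClient _telemetry = new TelemetryClient();
    private readonly Timer _pump;
    private int _count;

    public ActiveStreamCounter()
    {
        // Push the current value every 5 minutes, regardless of how often it changes.
        _pump = new Timer(_ => _telemetry.GetMetric("ActiveStreams").TrackValue(_count),
                          null, TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(5));
    }

    public void Increment() { Interlocked.Increment(ref _count); }
    public void Decrement() { Interlocked.Decrement(ref _count); }

    public void Dispose() { _pump.Dispose(); }
}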
In the end, I don't think my needs are all that exotic, so I wonder if I'm using app insights incorrectly.
Is there some way I can change the metric's behavior to suit my purposes? I appreciate that it's pre-aggregating before sending data in order to reduce ingest costs, but it's preventing me from solving a simple problem.
Is a metric even the right way to do this? Are there alternative approaches within app insights?
You can use TrackMetric instead of the GetMetric ceremony to track individual values without aggregation. From the docs:
Microsoft.ApplicationInsights.TelemetryClient.TrackMetric is not the preferred method for sending metrics. Metrics should always be pre-aggregated across a time period before being sent. Use one of the GetMetric(..) overloads to get a metric object for accessing SDK pre-aggregation capabilities. If you are implementing your own pre-aggregation logic, you can use the TrackMetric() method to send the resulting aggregates.
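As a minimal sketch (the metric name is a placeholder), each TrackMetric call sends a single telemetry item with no SDK-side pre-aggregation:
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

var telemetryClient = new TelemetryClient();
int myCount = 42; // current number of active streams (placeholder)

// Sends one item per call; there is no pre-aggregation on the client.
telemetryClient.TrackMetric(new MetricTelemetry("ActiveStreams", myCount));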
But you can also use events as described next:
If your application requires sending a separate telemetry item at every occasion without aggregation across time, you likely have a use case for event telemetry; see TelemetryClient.TrackEvent (Microsoft.ApplicationInsights.DataContracts.EventTelemetry).
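A comparable sketch using event telemetry instead, attaching the current count as a measurement (event and measurement names are placeholders):
using System.Collections.Generic;
using Microsoft.ApplicationInsights;

var telemetryClient = new TelemetryClient();
int myCount = 42; // current number of active streams (placeholder)

// One event per change; the count travels as a custom measurement you can chart in Kusto.
telemetryClient.TrackEvent(
    "ActiveStreamCountChanged",
    properties: null,
    metrics: new Dictionary<string, double> { { "ActiveStreams", myCount } });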

Getting Multiple Last Price Quotes from Interactive Brokers' API

I have a question regarding the Python API of Interactive Brokers.
Can multiple asset and stock contracts be passed into the reqMktData() function to obtain the last prices? (I can set snapshot = TRUE in reqMktData to get the last price. You can assume that I have subscribed to the appropriate data services.)
To put things in perspective, this is what I am trying to do:
1) Call reqMktData, get last prices for multiple assets.
2) Feed the data into my prediction engine, and do something
3) Go to step 1.
When I contacted Interactive Brokers, they said:
"Only one contract can be passed to reqMktData() at one time, so there is no bulk request feature in requesting real time data."
Obviously one way to get around this is to use a loop, but this is too slow. Another way is multithreading, but this is a lot of work, plus I can't afford the extra expense of a new computer. I am not interested in either one.
Any suggestions?
You can only specify 1 contract in each reqMktData call. There is no choice but to use a loop of some type. The speed shouldn't be an issue as you can make up to 50 requests per second, maybe even more for snapshots.
If speed is an issue, it could be that you want too much data (> 50 requests/s) or that you're using an old version of the IB Python API; check connection.py for lock.acquire, I've deleted all of them. Also, if there has been no trade for more than 10 seconds, IB will wait for a trade before sending a snapshot, so test with active symbols.
However, what you should do is request live streaming data by setting snapshot to false and just keep track of the last price in the stream. You can stream up to 100 tickers with the default minimums. You keep them separate by using unique ticker ids.

Solution for delaying events for N days

We're currently writing an application in Microsoft Azure and we're planning to use Event Hubs to handle processing of real time events.
However, after initial processing we will have to delay further processing of the events for N days. The process will work like this:
Event triggered -> Place event in Event Hub -> Event gets fetched from Event Hub and processed -> Event should be delayed for N days -> Event gets further processed (the last two steps might be a loop)
How can we achieve this delay of further event processing without using polling or similar strategies? One idea is to use Azure Queues and their visibility timeout, but 7 days is the supported maximum according to the documentation, and our business demands are in the 1-3 month maximum range. The number of events in our system should be at most 10k per day.
Any ideas would be appreciated, thanks!
As you already mentioned, Event Hubs supports only a 7-day window of retained data.
Event Hubs is typically used as a real-time telemetry data pipeline where data seek performance is critical. For 99.9% of use cases/scenarios, our users typically require the last couple of hours, if not seconds.
However, if after the real-time processing is over you still need to re-analyze the data after a while, for example to run a Hadoop job on last month's data, our seek pattern and store are not optimized for it. We recommend forwarding the messages to other data archival stores which are specialized for big-data queries.
As data archival is something most of our customers naturally ask for, we are releasing a new feature which automatically archives the data in Avro format into Azure Storage.

Azure Service Bus Event Hubs: is the "offset" still available/durable when some of the event data has expired?

I'm writing some code to test Event Hubs, which is a newly released feature on Azure Service Bus.
There are very few articles online, and MSDN does not have rich documentation about the details of Event Hubs either, so I hope someone can share their experience with my question.
For Event Hubs, we have the following statements:
We use an "offset" to remember where we are when reading the event data from some partition.
The event data on the Event Hub expires (automatically?) after some configurable time span.
So my question is: can the offset still be available/durable when some of the event data is deleted as a result of expiration?
For example, we have following data on one of partition:
M1 | M2 | M3 | M4 ( oldest --> latest )
After my processing logic runs, let's say that I have processed M1 and M2, so the offset would be the start of M2 (when using exclusive mode).
After some time, with my service down during that time, M1 is deleted as a result of expiration, so the partition becomes:
M2 | M3 | M4 | M.... ( oldest -> latest )
In this case, when my server restarts, is the offset I stored before still available to be used to read from M3?
We can also imagine this case at runtime: my consumer is reading the event data on the Event Hub while some of the oldest event data expires. Is the offset still available at runtime?
Thanks for any sharing on this question.
Based upon how various parts of the documentation are written, I believe you will start from the beginning of the current stream, as desired, if your starting offset is no longer available. EventProcessorHost should follow similar restrictions. Since the sequence numbers are 64 bits, I would expect one of those to be able to serve as an offset within a partition, since they monotonically increase without being recycled. The offset should have a similar property. So if Event Hubs are designed in a reasonable fashion (i.e. like similar solutions), then the offsets within partitions should hold despite data expiration. But since I have not yet tested this myself, I will be very unhappy if it is not so, and I'd expect an Azure person to be able to give true confirmation.

C# 2 instances of same app reading from same SQL table, each row processed once

I'm writing a Windows Service in C# on .NET 4.0 to fulfill the following functionality:
At a set time every night the app connects to SQL Server, opens a User table, and for each record retrieves the user's IP address, does a WCF call to the user's PC to determine if it's available for transacting, and inserts a record into a State table (with y/n and the error if there is one).
Once all users have been processed, the app then reads each record in the State table where IsPcAvailable = true, retrieves a list of reports for that user from another table, and for each report fetches the report from the enterprise doc repository, calls the user's PC via WCF, pushes the report onto their hard drive, and then updates the State table with its success.
The above scenario is easy enough to code if single-threaded and running on one app server; but due to redundancy and performance there will be at least two app servers doing exactly the same thing at the same time.
So how do I make sure that each user is processed only once, first in the User table and then in the State table (same problem), since fetching the reports and pushing them out to PCs all across the country is a lengthy process? And optimally the app should be multithreaded, so, for example, 10 threads running on 2 servers processing all the users.
I would prefer a C# solution as I'm not a database guru :) The closest I've found to my problem is:
SQL Server Process Queue Race Condition - it uses SQL code
and multithreading problems with Entity Framework; I'm probably going to have to go one layer down and use ADO.NET?
I would recommend using the techniques at http://rusanu.com/2010/03/26/using-tables-as-queues/. That's an excellent read for you at this time.
Here is some SQL for a FIFO queue:
create procedure usp_dequeueFifo
as
    set nocount on;
    with cte as (
        select top(1) Payload
        from FifoQueue with (rowlock, readpast)
        order by Id)
    delete from cte
        output deleted.Payload;
go
And one for a heap (order does not matter):
create procedure usp_dequeueHeap
as
    set nocount on;
    delete top(1) from HeapQueue with (rowlock, readpast)
        output deleted.Payload;
go
This reads so beautifully it's almost poetry.
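Since the question prefers staying in C#, here is a minimal, hedged ADO.NET sketch of calling the heap dequeue procedure from the service (connection string handling omitted; the Payload type is assumed):
// Each server (or worker thread) simply calls the dequeue proc in a loop;
// the READPAST hint makes competing callers skip rows another caller has locked.
using System.Data;
using System.Data.SqlClient;

public static class WorkQueueClient
{
    public static object DequeueOne(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("usp_dequeueHeap", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            conn.Open();
            // ExecuteScalar returns the deleted Payload, or null when the queue is empty.
            return cmd.ExecuteScalar();
        }
    }
}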
You could simply have each application server poll a common table (work_queue). You can use a common table expression to read/update the row so the servers don't step on each other.
;WITH t AS
(
    SELECT TOP 1 *
    FROM work_queue WITH (ROWLOCK, READPAST)
    WHERE NextRun <= GETDATE()
      AND IsPcAvailable = 1
)
UPDATE t
SET
    IsProcessing = 1,
    MachineProcessing = 'TheServer'
OUTPUT INSERTED.*
Now you could have a producer thread in your application checking for unprocessed records periodically. Once that thread finishes its work, it pushes the items into a ConcurrentQueue, and consumer threads can process the work as it becomes available. You can set the number of consumer threads yourself to the optimum level. Once a consumer thread is done with an item, it simply sets IsProcessing = 0 to show that the PC was updated.
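A rough sketch of that producer/consumer shape, using a BlockingCollection (which wraps a ConcurrentQueue by default); FetchUnprocessedRows and ProcessItem are hypothetical helpers around the SQL above:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class WorkItem { public int Id; /* ...columns claimed from work_queue... */ }

public class WorkQueueProcessor
{
    private readonly BlockingCollection<WorkItem> _queue = new BlockingCollection<WorkItem>();
    private volatile bool _stopping;

    public void Run(int consumerCount)
    {
        // Producer: periodically claims rows with the UPDATE ... OUTPUT query and queues them.
        Task.Factory.StartNew(() =>
        {
            while (!_stopping)
            {
                foreach (var item in FetchUnprocessedRows())  // hypothetical helper
                    _queue.Add(item);
                Thread.Sleep(TimeSpan.FromSeconds(30));       // polling interval
            }
            _queue.CompleteAdding();
        });

        // Consumers: drain the queue; each marks its row done (IsProcessing = 0) when finished.
        var consumers = new List<Task>();
        for (int i = 0; i < consumerCount; i++)
            consumers.Add(Task.Factory.StartNew(() =>
            {
                foreach (var item in _queue.GetConsumingEnumerable())
                    ProcessItem(item);                        // hypothetical helper: WCF call, push reports, update State
            }));

        Task.WaitAll(consumers.ToArray()); // blocks until Stop() is called and the queue drains
    }

    public void Stop() { _stopping = true; }

    private IEnumerable<WorkItem> FetchUnprocessedRows() { yield break; } // placeholder
    private void ProcessItem(WorkItem item) { }                          // placeholder
}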
