Change Feed Processor Lib does not honour ChangeFeedProcessorOptions FeedPollDelay / CheckPointFrequency - azure

I am following this sample code (https://github.com/Azure/azure-documentdb-changefeedprocessor-dotnet#example) to register an observer that processes the change feed of a Cosmos DB collection.
I am creating new documents in the collection using a utility (say, 400 documents within a for loop).
I am using a FeedPollDelay of 30 seconds, but it doesn't seem to be honoured by the CFP library: the ProcessChangesAsync method gets invoked repeatedly, even before the feed poll delay interval expires.
In the first batch around 60 docs are retrieved, in the second around 20, and in the third around 100.
DocumentCollectionInfo feedCollectionInfo = new DocumentCollectionInfo()
{
    DatabaseName = databaseName,
    CollectionName = monitoredCollectionName,
    Uri = new Uri(uri),
    MasterKey = masterKey
};
DocumentCollectionInfo leaseCollectionInfo = new DocumentCollectionInfo()
{
    DatabaseName = databaseName,
    CollectionName = leaseCollectionName,
    Uri = new Uri(uri),
    MasterKey = masterKey
};
ChangeFeedProcessorOptions feedProcessorOptions = new ChangeFeedProcessorOptions()
{
    FeedPollDelay = TimeSpan.FromSeconds(30)
    //LeasePrefix = Guid.NewGuid().ToString(),
    //MaxItemCount = 100
};
ChangeFeedProcessorBuilder builder = new ChangeFeedProcessorBuilder();
processor = await builder
    .WithHostName(hostName)
    .WithFeedCollection(feedCollectionInfo)
    .WithLeaseCollection(leaseCollectionInfo)
    .WithProcessorOptions(feedProcessorOptions)
    .WithObserver<LiveWorkItemChangeFeedObserver>()
    .BuildAsync();
await processor.StartAsync();
Receiving 60 docs in the first batch is fine. But I am expecting the second batch to be invoked with the remaining 340 docs, in a single batch, after the feed poll delay (30 seconds) expires.
Instead, ProcessChangesAsync gets triggered frequently and the option is not honoured.

FeedPollDelay is applied when the Change Feed Processor reads the Change Feed and finds no new changes, not in-between each batch (see the sketch after the example flow).
Example flow:
CFP polls for changes, finds X.
ProcessChangesAsync is called with X.
After ProcessChangesAsync finishes, CFP immediately polls for changes, finds Y.
ProcessChangesAsync is called with Y.
After ProcessChangesAsync finishes, CFP immediately polls for changes, finds nothing, waits FeedPollDelay.
CFP polls for changes, finds Z.
ProcessChangesAsync is called with Z.
After ProcessChangesAsync finishes, CFP immediately polls for changes, finds nothing, waits FeedPollDelay.
Etc….
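In other words, the delay only applies to empty polls. Here's a minimal Python-style sketch of the idea (an illustration only; the actual CFP library internals differ):

import time

def partition_poll_loop(read_change_feed, process_changes, feed_poll_delay):
    # Illustration of the flow above, not the real library code.
    while True:
        changes = read_change_feed()      # returns up to MaxItemCount documents
        if changes:
            process_changes(changes)      # your observer's ProcessChangesAsync
            # no delay here: poll again immediately
        else:
            time.sleep(feed_poll_delay)   # FeedPollDelay applies only when the poll came back empty

So with 400 documents arriving while earlier batches are still being processed, each poll returns whatever has accumulated since the last one, which is why you see batches of 60, 20, 100, and so on rather than one batch of 340 after 30 seconds.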

Related

Why do transact and wait behave differently when wrapped within a function?

Within my project I intend to send large volumes of transactions, so for simplicity I am building a wrapper function that executes the following together: contractName.functions.functionName(params).transact() and w3.eth.wait_for_transaction_receipt(tx_hash). However, when I wrap them in a function transact_and_wait, the transactions do not get executed.
Implementation of transact_and_wait:
def transact_and_wait(contract_function, transaction_params={"gas": 100000}):
    # Send the transaction
    if transaction_params != {"gas": 100000}:
        transaction_params["gas"] = 100000
    transaction_hash = contract_function.transact(transaction_params)
    # Wait for the transaction to be mined
    transaction_receipt = w3.eth.wait_for_transaction_receipt(transaction_hash)
    return transaction_receipt
It is called via: transact_and_wait(contractName.functions.functionName(account.address))
For example, this should set a role (defined via index 1) for the user.
However, when I call print(contractName.functions.stateVariable(account.address).call()), it returns 0.
If I do the same process as above, but not within a function:
tx_hash = contractName.functions.functionName(account.address).transact({"gas": 100000})
transaction_receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
then I can call the same getter, print(contractName.functions.stateVariable(account.address).call()), and it returns 1.

Separating AioRTC datachannel into multiple threads

I have a two-way datachannel setup that takes a heartbeat from a browser client and keeps the session alive as long as the heartbeat continues. The heartbeat is the 'main' communication for WebRTC, but I have other bits of info (such as coordinates) I need to send constantly.
To do this, when a WebRTC offer is given, it takes that HTTP request and:
Creates a new event loop 'rtcloop'.
Sets that as the main event loop.
Then runs 'rtcloop' until complete, calling my webRtcStart function and passing through the session info.
Then runs a new thread with the target being 'rtcloop', run forever, and starts it.
Inside the new thread I set the loop with 'get_event_loop' and later define '@webRtcPeer.on("datachannel")', so when we get a datachannel message, we run code around that. Depending on the situation, I attempt to do the following:
ptzcoords = 'Supported' #PTZ Coords will be part of WebRTC Communication, send every 0.5 seconds.
ptzloop = asyncio.new_event_loop()
ptzloop.run_until_complete(updatePTZReadOut(webRtcPeer, cameraName, loop))
ptzUpdateThread = Thread(target=ptzloop.run_forever)
ptzUpdateThread.start()
The constant error I get no matter how I structure things is "coroutine 'updatePTZReadOut' was never awaited"
With updatePTZReadOut being:
async def updatePTZReadOut(rtcPeer, cameraName, eventLoop):
    # Get Camera Info
    # THE CURRENT ISSUE I am having is with the event loops, because this gets
    # called to run in another thread, but it still needs to be awaitable.
    # Current Warning Is: /usr/lib/python3.10/threading.py:953: RuntimeWarning: coroutine 'updatePTZReadOut' was never awaited
    # Ref Article: https://xinhuang.github.io/posts/2017-07-31-common-mistakes-using-python3-asyncio.html
    # https://lucumr.pocoo.org/2016/10/30/i-dont-understand-asyncio/
    # Get current loop
    # try:
    loop = asyncio.set_event_loop(eventLoop)
    # loop.run_until_complete()
    # except RuntimeError:
    #     loop = asyncio.new_event_loop()
    #     asyncio.set_event_loop(loop)
    # Getting Current COORDS from camera
    myCursor.execute("Select * from localcameras where name = '{0}' ".format(cameraName))
    camtuple = myCursor.fetchall()
    camdata = camtuple[0]
    # Create channel object
    channel_local = rtcPeer.createDataChannel("chat")
    while True:
        ptzcoords = readPTZCoords(camdata[1], camdata[3], cryptocode.decrypt(str(camdata[4]), passwordRandomKey))
        print("Updating Coords to {0}".format(ptzcoords))
        # Publish Here
        await channel_local.send("TTTT")
        asyncio.sleep(0.5)
Any help here?
updatePTZReadOut is an async function, so calling it only creates a coroutine object; the coroutine must be awaited (or scheduled on a running event loop) wherever you call it. Two smaller issues compound the problem: asyncio.set_event_loop() returns None, so loop = asyncio.set_event_loop(eventLoop) leaves loop as None, and asyncio.sleep(0.5) at the bottom of the loop must be awaited (await asyncio.sleep(0.5)) or it does nothing.
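For example, here is a minimal sketch of one way to fix it (assuming webRtcPeer and cameraName are in scope, as in your code): create the event loop inside the worker thread and run the coroutine to completion there, so it actually gets awaited:

import asyncio
from threading import Thread

def run_ptz_updates():
    # Own an event loop inside this thread and actually run (await) the coroutine on it.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(updatePTZReadOut(webRtcPeer, cameraName, loop))

ptzUpdateThread = Thread(target=run_ptz_updates, daemon=True)
ptzUpdateThread.start()

Alternatively, if rtcloop is already running forever in another thread, submit the coroutine to it instead of creating another loop: asyncio.run_coroutine_threadsafe(updatePTZReadOut(webRtcPeer, cameraName, rtcloop), rtcloop).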

Chainlink node CRON job. How does it get paid?

These are the docs:
https://docs.chain.link/docs/jobs/types/cron/
type = "cron"
schemaVersion = 1
schedule = "CRON_TZ=UTC * */20 * * * *"
externalJobID = "0EEC7E1D-D0D2-476C-A1A8-72DFB6633F01"
observationSource = """
fetch [type="http" method=GET url="https://chain.link/ETH-USD"]
parse [type="jsonparse" path="data,price"]
multiply [type="multiply" times=100]
fetch -> parse -> multiply
"""
But what I am wondering is how the job connects to the Oracle contract. How does it connect with the user contract to get paid? Where and how do we send the data once the job completes at the specified increment?
Does the job start when it is posted on the node side, or does the clock start once a user contract calls it?
Any help would be much appreciated. I am trying to run through the types of jobs to familiarize myself with the capabilities of a Chainlink node.
A cron job executes based on a cron-defined schedule. This means it's triggered by a condition that the Chainlink node evaluates itself, not externally via a smart contract. Because of this, the node isn't paid in LINK tokens for processing the request the way it is for API calls: it's initiating a request itself, as opposed to receiving a request (and payment) from on-chain. A node can't initiate a job on its own and then expect to receive payment from a consuming contract. If you require such functionality, you can try putting some logic in the called function to withdraw some LINK, but be careful about who can call that function.
If you want to send data on-chain to a smart contract once the job is completed, you need to manually define an ethtx task at the end of the cron job.
Here's an extended version of your job that sends the result back to a function called someFunction on the consuming contract (the to address in the ethtx task below). The data can be in any format, as long as the ABI of the function matches what's being encoded in the job, i.e. if the function expects a bytes param you need to ensure you're encoding a bytes param; if it expects a uint param you need to encode a uint param. In this example, a bytes32 parameter is used.
type = "cron"
schemaVersion = 1
name = "GET > bytes32 (cron)"
schedule = "CRON_TZ=UTC #every 1m"
observationSource = """
fetch [type="http" method=GET url="https://min-api.cryptocompare.com/data/price?fsym=ETH&tsyms=USD"]
parse [type="jsonparse" path="USD"]
multiply [type="multiply" times=100]
encode_response [type="ethabiencode"
abi="(uint256 data)"
data="{\\"data\\": $(multiply) }"]
encode_tx [type="ethabiencode"
abi="someFunction(bytes32 data)"
data="{ \\"data\\": $(encode_response) }"]
submit_tx [type="ethtx"
to="0x6495C9684Cc5702522A87adFd29517857FC99f45"
data="$(encode_tx)"]
fetch -> parse -> multiply -> encode_response -> encode_tx -> submit_tx
"""
And here's the consuming contract for the cron job above:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

contract Cron {
    bytes32 public currentPrice;

    function someFunction(bytes32 _price) public {
        currentPrice = _price;
    }
}
To answer your other question: the job is active as soon as it's created on the node, and will start evaluating the triggering conditions to activate a run, based on the defined schedule.

RxJS - check array of observables with concurrency in interval

I'm working on a scheduler with RxJS that checks an array of jobs every second. When a job is finished, it is removed from the array. I would like to run this with the .mergeAll(concurrency) parameter so that, for example, only two jobs run at the same time.
Currently I have a workaround, which can be seen here.
What I am trying is something like
Observable
    .interval(1000)
    .timeInterval()
    .merge(...jobProcesses.map(job => Observable.fromPromise(startJob(job.id))))
    .mergeAll(config.concurrency || 10)
    .subscribe();
which obviously doesn't work. Any help would be appreciated.
From the comments, it seems you are simply trying to limit concurrency, and this interval stuff is just a detour. You should be able to get what you need with:
const Rx = require('rxjs/Rx')

let startTime = 0
const time = () => {
    if (!startTime)
        startTime = new Date().getTime()
    return Math.round((new Date().getTime() - startTime) / 1000)
}

const jobs = new Rx.Subject() // You may additionally rate-limit this with bufferTime(x).concatAll()
const startJob = j => Rx.Observable.of(undefined).delay(j * 1000).map(() => time())
const concurrency = 2

time()
jobs
    .bufferCount(concurrency)
    .concatMap(buf => Rx.Observable.from(buf).flatMap(startJob))
    .subscribe(x => console.log(x))

Rx.Observable.from([3, 1, 3]).subscribe(jobs)
// The last job is only processed after the first two are completed, so you see:
// 1
// 3
// 6
Note that this technically isn't squeezing out the maximum amount of concurrency possible, since it breaks the jobs up into constant-size batches: if your jobs have significantly uneven processing times, the longest job in a batch delays pulling work from the next batch, as the sketch below illustrates.
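For contrast, here is the slot-based behaviour the batching approach gives up, sketched in Python with asyncio.Semaphore rather than RxJS (a hypothetical job runner, just to illustrate the scheduling difference): a finished job frees its slot immediately, so the next job starts without waiting for the rest of its batch.

import asyncio

async def run_job(job_id, duration, sem):
    async with sem:                       # take a slot; at most `concurrency` run at once
        await asyncio.sleep(duration)     # simulated work
        print(f"job {job_id} done")       # slot freed on exit; next job starts immediately

async def main():
    concurrency = 2
    sem = asyncio.Semaphore(concurrency)
    durations = [3, 1, 3]                 # same jobs as the RxJS example above
    await asyncio.gather(*(run_job(i, d, sem) for i, d in enumerate(durations)))

asyncio.run(main())

With slots, job 1 finishes at t=1 and job 2 starts right away (finishing at t=4); with constant batches, job 2 would wait until t=3 for the whole first batch to drain.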

Akka: How to ensure that message has been received?

I have an actor Dispenser. It
dispenses some objects on request
listens for newly arriving ones
Code follows
class Dispenser extends Actor {
  override def receive: Receive = {
    case Get =>
      context.sender ! getObj()
    case x: SomeType =>
      addObj(x)
  }
}
In real processing it doesn't matter whether 1 ms or even a few seconds pass between a new object being sent and the dispenser starting to dispense it, so there's no code tracking this.
But now I'm writing a test for the dispenser, and I want to be sure that it first receives the new object and only then the Get request.
Here's the test code I came up with:
val dispenser = system.actorOf(Props.create(classOf[Dispenser]))
dispenser ! obj
Thread.sleep(100)
val task = dispenser ? Get()
val result = Await.result(task, timeout)
check(result)
It satisfies one important requirement: it doesn't change the original code. But it is
at least 100 ms slower than necessary, even on very high-performance boxes
unstable, failing sometimes, because 100 ms (or any other constant) doesn't provide any guarantees.
So the question is how to write a test that satisfies the requirement and doesn't have the cons above (nor any other obvious ones).
You can take out the Thread.sleep(..) and your test will be fine. Akka guarantees the ordering you need.
With the code
dispenser ! obj
val task = dispenser ? Get()
dispenser will process obj before Get deterministically because
The same thread enqueues obj and then Get, so they end up in the correct order in the actor's mailbox.
Actors process messages sequentially, one at a time, so the two messages will be processed in the order they were queued in the mailbox.
(..if there's nothing else going on that's not in your sample code - routers, async processing in getObj or addObj, stashing, ..)
The Akka FSM module is really handy for testing the underlying state and behavior of an actor, and it does not require changing the implementation specifically for tests.
Using TestFSMRef, one can get the actor's current state and data:
val testActor = TestFSMRef(<actors constructor or Props>)
testActor.stateName shouldBe <state name>
testActor.stateData shouldBe <state data>
http://doc.akka.io/docs/akka/2.4.1/scala/fsm.html
