tableSvc.retrieveEntity retrieves previous version of Azure Table Storage data - node.js

I have a waterfall of dialogs in Bot Framework SDK3.
Each dialog does something, until the flow reaches a dialog that calls tableSvc.retrieveEntity, which correctly identifies the entity to be retrieved (according to the given PartitionKey & RowKey) from Azure Table...
...but the entity that is retrieved (I check it with console.log(result)) is outdated: one step (a few seconds, which pass during the user's conversation with the bot) behind the actual data stored in Azure Tables, i.e. the real data which needs to be retrieved in this dialog...
The conversation is not closed yet (it will be later); it is important to store and retrieve up-to-date data at this stage...
How do I get up-to-date data in this dialog?

Well, for those of you who had a similar problem...
I guess it has to do with the event loop of Node.js...
I'm not sure whether it is a bullet-proof solution or a temporary 'hack',
but I put it like this and it works (when I try setTimeout with 0 ms it does not work for me; when I set it to 500 ms it works, so I guess 1000 ms could be a safe temporary hack... before I find a better solution)...
If someone knows a better, more robust solution, please update this thread.
setTimeout(() => {
  tableSvc.retrieveEntity('table', pkey, rkey, function (error, result, response) {
    if (!error) {
      var res1 = result.Data._;
      console.log(res1); // Now it prints the actual data stored in 'table' - which I really need, and not its previous (outdated) version
    } else {
      console.log('Some error happened...');
    }
  });
}, 1000);
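If the stale read happens because the preceding write hasn't completed when the next dialog step runs, another option worth trying (purely a sketch, assuming the earlier step also writes through tableSvc, e.g. with insertOrReplaceEntity; 'entity' below is a placeholder for whatever that step writes) is to issue the read from inside the write's callback instead of delaying it with setTimeout:

tableSvc.insertOrReplaceEntity('table', entity, function (insertError) {
  if (insertError) {
    console.log('Write failed...', insertError);
    return;
  }
  // The write has been acknowledged, so this read should see the fresh value.
  tableSvc.retrieveEntity('table', pkey, rkey, function (error, result) {
    if (!error) {
      console.log(result.Data._);
    } else {
      console.log('Some error happened...');
    }
  });
});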

Related

Firebase doc changes

Thanks for your help. I am new to Firebase, and I am designing an application with Node.js. What I want is that every time a change is detected in a document, a function is invoked that creates or updates the file system according to the new data structure in the Firebase document. Everything works fine, but the problem I have is that if the document is updated with 2 or more attributes, the makeBotFileSystem function is invoked that same number of times, which brings me problems: it can cause performance issues or file-overwriting issues, since what I do is generate or update multiple files.
I would like to wait until all the information in the document has finished updating, instead of reacting attribute by attribute. Is there any way to do that? This is my code:
let botRef = firebasebotservice.db.collection('bot');
botRef.onSnapshot(querySnapshot => {
  querySnapshot.docChanges().forEach(change => {
    if (change.type === 'modified') {
      console.log('bot-changes ' + change.doc.id);
      const botData = change.doc.data();
      botData.botId = change.doc.id;
      // HERE I CREATE OR UPDATE THE FILESYSTEM STRUCTURE, ACCORDING TO THE DATA CHANGES
      fsbotservice.makeBotFileSystem(botData);
    }
  });
});
The onSnapshot function will notify you any time a document changes. If property changes are committed one by one instead of updating the document all at once, then you will receive multiple snapshots.
One way to partially solve the multiple-snapshot issue is to change the code that updates the document so that it commits all property changes in a single operation; that way you only receive one snapshot.
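For illustration, a minimal sketch of that idea with the Admin SDK (the document id and field names below are placeholders, since the code that writes the bot document isn't shown; assume this runs inside an async function):

const botDoc = firebasebotservice.db.collection('bot').doc(botId);

// Instead of several sequential writes, each of which triggers its own snapshot:
//   await botDoc.update({ name: newName });
//   await botDoc.update({ intents: newIntents });
// ...commit every changed property in one call, so only one snapshot is emitted:
await botDoc.update({
  name: newName,
  intents: newIntents,
  updatedAt: Date.now()
});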
Nonetheless, you should design the function triggered by the snapshot so that it can handle multiple document changes without breaking. Document updates will happen whether they come from single or multiple property changes, so your code should be able to handle both. IMHO the problem is the filesystem update rather than how many snapshots are received; one way to absorb bursts of snapshots is sketched below.
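A minimal sketch of that idea: debounce the filesystem work per document, so that a burst of snapshots results in a single makeBotFileSystem call (the 500 ms window is an arbitrary placeholder):

const pending = new Map(); // botId -> pending timeout handle

botRef.onSnapshot(querySnapshot => {
  querySnapshot.docChanges().forEach(change => {
    if (change.type !== 'modified') return;
    const botId = change.doc.id;
    // Restart the timer on every snapshot; only the last one in a burst fires.
    clearTimeout(pending.get(botId));
    pending.set(botId, setTimeout(() => {
      pending.delete(botId);
      const botData = change.doc.data();
      botData.botId = botId;
      fsbotservice.makeBotFileSystem(botData);
    }, 500));
  });
});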
You should use the docChanges() method like this:
db.collection("cities").onSnapshot(querySnapshot => {
let changes = querySnapshot.docChanges();
for (let change of changes) {
var data = change.doc.data();
console.log(data);
}
});

Variable lifetime in Dialogflow's fulfillment editor?

I'm helping create a chatbot that pulls real estate data from a Google Sheet and displays that information to a user. The user inputs an address, then the pulled information is assigned to a houseData variable that has been declared outside of the function making the HTTP request. The houseData object has keys like bedrooms, bathrooms, monthlyPayment, address, etc., which is then referenced in the agent.add Payload for displaying the values for those keys to the user in the chat widget. This is all mapped to one intent.
Later on I try to access monthlyPayment of the houseData object in a function mapped to a different intent, to see how a user's reported income compares to the payment for the house. The houseData variable is undefined, even though in Firebase I can see (via a console.log during the first intent) that the information is received from the Google Sheet.
Do variable values reset in the fulfillment editor after the intent which specific functions are mapped to is finished? How can I store persistent information in the editor that can be accessed throughout the conversation flow? Is this a situation where I should set a new context?
Thank you.
edit: I added a quick mockup of the code to help visualize the problem:
let houseData;

function sendHouseData(agent) {
  let address = userInput;
  return getHouseData(address).then(() => agent.add(new Payload(
    `Your house has ${houseData.bedrooms} bedrooms and ${houseData.bathrooms} bathrooms.`
  ))).catch(err => console.error(err));
}

function getHouseData(address) {
  return new Promise((resolve) => {
    https.get(apiURL, (res) => {
      res.on('end', () => {
        houseData = parsedJSONData; // (mock: assume the response body has been collected and parsed)
        resolve(houseData);
      });
    });
  });
}

// ------- LATER ------ //
let userData = {
  income: '',
  qualified: true
};

function qualify(agent) {
  // user self-reports their monthly income through some quick reply options
  if (houseData.monthlyPayment / userData.income < 0.39) { // houseData is undefined at this point
    userData.qualified = false;
  }
}

intentMap.set(enterAddress, sendHouseData);
intentMap.set(qualify, qualify);
Global variables might be maintained across your Dialogflow requests, but it is not guaranteed. From the Cloud Functions Tips & Tricks:
There is no guarantee that the state of a Cloud Function will be preserved for future invocations. However, Cloud Functions often recycles the execution environment of a previous invocation. If you declare a variable in global scope, its value can be reused in subsequent invocations without having to be recomputed.
Reference: https://cloud.google.com/functions/docs/bestpractices/tips#use_global_variables_to_reuse_objects_in_future_invocations
Typically (to cache data for performance reasons) you would store all your data in Firestore/RTDB; if houseData is undefined, pull it out of the database, otherwise use houseData directly as a cached copy. In this case, though, you probably want to keep a record of how old it is and refresh it accordingly.
However, if houseData is going to be different for each user/session then you may be forced to pull it from the database on every request. (Note that if you are running functions and database on Firebase it is going to be fast anyway.)
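For illustration, a minimal sketch of that cache-then-fallback pattern, assuming Firestore and a per-session document keyed by the Dialogflow session id (collection and field names are placeholders):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

let houseData; // may survive between invocations, but that is not guaranteed

async function getHouseDataCached(sessionId) {
  if (houseData) return houseData; // warm instance: reuse the global
  const snap = await db.collection('sessions').doc(sessionId).get();
  if (snap.exists) {
    houseData = snap.data().houseData; // cold instance: reload from Firestore
    return houseData;
  }
  return null; // nothing stored for this session yet
}

async function saveHouseData(sessionId, data) {
  houseData = data;
  await db.collection('sessions').doc(sessionId).set({ houseData: data }, { merge: true });
}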

Mongo Change Streams running multiple times (kind of): Node app running multiple instances

My Node app uses Mongo change streams, and the app runs 3+ instances in production (more eventually, so this will become more of an issue as it grows). So, when a change comes in, the change stream functionality runs as many times as there are processes.
How to set things up so that the change stream only runs once?
Here's what I've got:
const options = { fullDocument: "updateLookup" };

const sitesFilter = [
  {
    $match: {
      $and: [
        { "updateDescription.updatedFields.sites": { $exists: true } },
        { operationType: "update" }
      ]
    }
  }
];

const sitesStream = Client.watch(sitesFilter, options);

// Start listening to the site stream
sitesStream.on("change", async change => {
  console.log("in site change stream", change);
  console.log(
    "in site change stream, update desc",
    change.updateDescription
  );

  // Do work...

  console.log("site change stream done.");
  return;
});
This can easily be done with only MongoDB query operators. You can add a modulo query on the ID field, where the divisor is the number of your app instances (N). The remainder is then an element of {0, 1, 2, ..., N-1}. If your app instances are numbered in ascending order from zero to N-1, you can write the filter like this:
const filter = [
  {
    "$match": {
      "$and": [
        // Other filters
        { "_id": { "$mod": [<number of instances>, <this instance's id>] } }
      ]
    }
  }
];
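For illustration, a sketch of how that filter could be wired to an instance index, assuming each replica is started with INSTANCE_ID (0..N-1) and INSTANCE_COUNT in its environment (both names are placeholders), and assuming the documents use numeric _id values, since $mod only works on numbers. In a change stream pipeline the changed document's key is available under documentKey._id:

const instanceId = parseInt(process.env.INSTANCE_ID, 10);
const instanceCount = parseInt(process.env.INSTANCE_COUNT, 10);

const filter = [
  {
    $match: {
      $and: [
        { "updateDescription.updatedFields.sites": { $exists: true } },
        { operationType: "update" },
        // Only handle documents whose numeric key falls on this instance.
        { "documentKey._id": { $mod: [instanceCount, instanceId] } }
      ]
    }
  }
];

const sitesStream = Client.watch(filter, options);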
Doing this with strong guarantees is difficult but not impossible. I wrote about the details of one solution here: https://www.alechenninger.com/2020/05/building-kafka-like-message-queue-with.html
The examples are in Java but the important part is the algorithm.
It comes down to a few techniques:
Each process attempts to obtain a lock
Each lock (or each change) has an associated fencing token
Processing each change must be idempotent
While processing the change, the token is used to ensure ordered, effectively-once updates.
More details in the blog post.
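For a rough idea of what the lock/fencing-token part can look like in MongoDB (this is only a sketch, not the blog post's implementation; the locks collection and field names are made up), using the Node driver's findOneAndUpdate:

const LEASE_MS = 30000; // how long a holder may keep the lock before it can be stolen

async function tryAcquireLock(db, lockName, ownerId) {
  const now = Date.now();
  try {
    const res = await db.collection('locks').findOneAndUpdate(
      // Acquire only if the lock is free or its lease has expired.
      { _id: lockName, $or: [{ owner: null }, { expiresAt: { $lt: now } }] },
      {
        $set: { owner: ownerId, expiresAt: now + LEASE_MS },
        $inc: { fencingToken: 1 } // monotonically increasing fencing token
      },
      { upsert: true, returnDocument: 'after' }
    );
    return res; // on older drivers the updated document is under res.value
  } catch (err) {
    if (err.code === 11000) return null; // someone else currently holds the lock
    throw err;
  }
}

The fencing token returned with the lock would then be attached to every write performed while processing the change, so that writes from a stale lock holder can be rejected downstream.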
It sounds like you need a way to partition updates between instances. Have you looked into Apache Kafka? Basically what you would do is have a single application that writes the change data to a partitioned Kafka Topic and have your node application be a Kafka consumer. This would ensure only one application instance ever receives an update.
Depending on your partitioning strategy, you could even ensure that updates for the same record always go to the same node app (if your application needs to maintain its own state). Otherwise, you can spread out the updates in a round robin fashion.
The biggest benefit to using Kafka is that you can add and remove instances without having to adjust configurations. For example, you could start one instance and it would handle all updates. Then, as soon as you start another instance, they each start handling half of the load. You can continue this pattern for as many instances as there are partitions (and you can configure the topic to have thousands of partitions if you want); that is the power of the Kafka consumer group. Scaling down works in reverse.
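For illustration, a minimal sketch of the consumer side using the kafkajs package (an assumption; any Kafka client works, and the topic/group names are placeholders):

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "sites-worker", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "site-changes" });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: "site-changes", fromBeginning: false });
  await consumer.run({
    // Each partition is owned by exactly one member of the consumer group,
    // so a given change is processed by a single app instance.
    eachMessage: async ({ topic, partition, message }) => {
      const change = JSON.parse(message.value.toString());
      // Do work...
    }
  });
}

run().catch(console.error);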
While the Kafka option sounded interesting, it was a lot of infrastructure work on a platform I'm not familiar with, so I decided to go with something a little closer to home for me: sending an MQTT message to a little standalone app, and letting the MQTT server monitor messages for uniqueness.
siteStream.on("change", async change => {
console.log("in site change stream);
const mqttClient = mqtt.connect("mqtt://localhost:1883");
const id = JSON.stringify(change._id._data);
// You'll want to push more than just the change stream id obviously...
mqttClient.on("connect", function() {
mqttClient.publish("myTopic", id);
mqttClient.end();
});
});
I'm still working out the final version of the MQTT server, but the method to evaluate uniqueness of messages will probably store an array of change stream IDs in application memory, as there is no need to persist them, and evaluate whether to proceed any further based on whether that change stream ID has been seen before.
var mqtt = require("mqtt");
var client = mqtt.connect("mqtt://localhost:1883");
var seen = [];

client.on("connect", function() {
  client.subscribe("myTopic");
});

client.on("message", function(topic, message) {
  var context = message.toString().replace(/"/g, "");
  if (seen.indexOf(context) < 0) {
    seen.push(context);
    // Do stuff
  }
});
This doesn't include security, etc., but you get the idea.
Another option is to have a field in the DB called status, updated with findOneAndUpdate based on the event received from the change stream. So let's say you get 2 events at the same time from the change stream: the first event updates the status to "start", and the other throws an error because the status is already "start". So the second event will not run any business logic; see the sketch below.
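A minimal sketch of that guard (field names are placeholders; updateOne is used here rather than findOneAndUpdate simply because only the match count is needed):

sitesStream.on("change", async change => {
  const res = await db.collection("sites").updateOne(
    // Claim the change only if no other instance has started processing it.
    { _id: change.documentKey._id, changeStatus: { $ne: "processing" } },
    { $set: { changeStatus: "processing" } }
  );
  if (res.modifiedCount === 0) return; // another instance claimed this change first
  // Do work, then reset changeStatus when finished...
});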
I'm not claiming these are rock-solid, production-grade solutions, but I believe something like this could work.
Solution 1
applying Read-Modify-Write:
Add a version field to the document; all newly created docs have version=0
Receive ChangeStream event
Read the document that needs to be updated
Perform the update on the model
Increment version
Update the document where both id and version match, otherwise discard the change
Yes, it creates 2 * n_application_replicas useless queries per change, so there is another option; a minimal sketch of Solution 1 follows below.
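A minimal sketch of Solution 1's version check (assuming a "sites" collection whose documents carry a numeric version field; computeUpdate is a hypothetical, application-specific function):

async function applyChange(db, change) {
  const doc = await db.collection('sites').findOne({ _id: change.documentKey._id });
  if (!doc) return;

  const updated = computeUpdate(doc); // perform the update on the model (app-specific)

  const res = await db.collection('sites').updateOne(
    { _id: doc._id, version: doc.version }, // only matches if nobody raced us
    { $set: { ...updated, version: doc.version + 1 } }
  );
  if (res.modifiedCount === 0) {
    // Another replica already applied this change; discard ours.
  }
}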
Solution 2
Create a collection of ResumeTokens in Mongo which stores a collection -> token mapping
In the changeStream handler code, after a successful write, update the ResumeToken in the collection
Create a feature toggle that will disable reading ChangeStream in your application
Configure only a single instance of your application to be a "reader"
In case of "reader" failure you might either enable reading on another node, or redeploy the "reader" node.
As a result, there can be any number of non-reader replicas and there won't be any useless queries.

wit.ai runActions how to handle context in follow-up message

I'm using node-wit to develop a chatbot application.
This is working fine mostly, but I've run into a problem with the use of the context.
I'm using the runActions API:
this.witClient.runActions(customer._key, messageText, witContext)
  .then(newContext => { /* ... */ })
  .catch(reject);
I have defined a number of actions, which set the context.
This is working fine, as long as the exchange takes place over one message.
For example, if I were to call an action called addProduct:
addProduct({ sessionId, context, text, entities }) {
  return new Promise((resolve, reject) => {
    context.product = 'myNewProduct';
    resolve(context);
  });
},
It will then show a message using the 'product' context key.
However, when I try to use it over 2 messages, it seems to have lost the context (for example, when asking a multiple-choice question and then handling that response).
If I understand how it works correctly, node-wit doesn't keep the context between messages (I had assumed it would, because I'm passing a session key).
A solution I see is to store the resulting context (newContext in this case) in a session/user-specific way, and then restore it and pass it again when the user sends their next message.
Meaning, something like this :
witContext = getContextFromSession();
this.witClient.runActions(customer._key, messageText, witContext)
  .then(newContext => { setContextInSession(newContext); })
  .catch(reject);
Would this be the correct way of handling it?
Of course you have to store your context state; how you store it is up to you. But take into account what is most efficient if you're going to have a lot of users, and the resources you have available.
As you can see in the official Node.js example, there's a method named findOrCreateSession in https://github.com/wit-ai/node-wit/blob/master/examples/messenger.js; they get the session before the wit actions are called.
In my particular case, I store it in the database: I get the session before the action is called so I can send the context, then in the actions I query the session again to modify the resulting context and store it again. Try the implementation that best fits your needs.
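For illustration, a minimal sketch of the getContextFromSession / setContextInSession idea with an in-memory Map keyed by the customer key (fine for a single process; a database is needed if the bot restarts or runs multiple instances):

const sessions = new Map();

function getContextFromSession(sessionId) {
  return sessions.get(sessionId) || {};
}

function setContextInSession(sessionId, context) {
  sessions.set(sessionId, context);
}

// Usage with runActions:
const witContext = getContextFromSession(customer._key);
this.witClient.runActions(customer._key, messageText, witContext)
  .then(newContext => setContextInSession(customer._key, newContext))
  .catch(reject);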

range.address throws context related errors

We've been developing with the Excel JavaScript API for quite a few months now. We have been coming across context-related issues which got resolved for unknown reasons. We weren't able to replicate these issues and wondered how they got resolved. Recently, similar issues have started popping up again.
Error we consistently get:
property 'address' is not available. Before reading the property's value, call the load method on the containing object and call "context.sync()" on the associated request context.
We thought that, since we have multiple functions defined to modularise the code in the project, maybe the context differs somewhere among these functions and this has gone unnoticed. So we came up with a single-context solution, implemented via the JavaScript module pattern.
var ContextManager = (function () {
  var xlContext; // single context for the entire project/application

  function loadContext() {
    xlContext = new Excel.RequestContext();
  }

  function sync(object) {
    return (object === undefined) ? xlContext.sync() : xlContext.sync(object);
  }

  function getWorksheetByName(name) {
    return xlContext.workbook.worksheets.getItem(name.toString());
  }

  // public
  return {
    loadContext: loadContext,
    sync: sync,
    getWorksheetByName: getWorksheetByName
  };
})();
NOTE: the above code is shortened. There are other methods added to ensure that the single context gets used throughout the application.
While implementing the single context, this time round, we have been able to replicate the issue.
Office.initialize = function (reason) {
  $(document).ready(function () {
    ContextManager.loadContext();

    function loadRangeAddress(rng, index) {
      rng.load("address");
      ContextManager.sync().then(function () {
        console.log("Address: " + rng.address);
      }).catch(function (e) {
        console.log("Failed address for index: " + index);
      });
    }

    for (var i = 1; i <= 1000; i++) {
      var sheet = ContextManager.getWorksheetByName("Sheet1");
      loadRangeAddress(sheet.getRange("A" + i), i); // I expect to see A1 to A1000 addresses in the console. Order doesn't matter.
    }
  });
};
In the above case, only "A1" gets printed as a range address to the console. I can't see any of the other addresses (A2 to A1000) being printed; only the catch block executes. Can anyone explain why this happens?
Although I've written a for loop above, that isn't my real use case. In the real use case, situations occur where one range object in function a needs to load a range address, while in the meantime another function b also wants to load a range address. Both function a and function b work asynchronously on separate tasks, such as one creating a table object (the table needs an address) and the other pasting data to the sheet (there's a debug statement to see where the data was pasted).
This is something our team hasn't been able to figure out or find a solution for.
There is a lot packed into this code, but the issue you have is that you're calling sync a whole bunch of times without awaiting the previous sync.
There are several problems with this:
If you were using different contexts, you would actually see that there is a limit of ~50 simultaneous requests, after which you'll get errors.
In your case, you're running into a different (and almost opposite) problem. Given the async nature of the APIs, and the fact that you're not awaiting on the sync-s, your first sync request (which you'd think is for just A1) will actually contain all the load requests from the execution of the entire for loop. Now, once this first sync is dispatched, the action queue will be cleared. Which means that your second, third, etc. sync will see that there is no pending work, and will no-op, executing before the first sync ever came back with the values!
[This might be considered a bug, and I'll discuss with the team about fixing it. But it's still a very dangerous thing to not await the completion of a sync before moving on to the next batch of instructions that use the same context.]
The fix is to await the sync. This is far and away the simplest to do in TypeScript 2.1 with its async/await feature; otherwise you need to write the async version of the for loop, which you can look up, but it's rather unintuitive (it requires creating an uber-promise that keeps chaining a bunch of .then-s).
So, your modified TypeScript-ified code would be
ContextManager.loadContext();

async function loadRangeAddress(rng, index) {
  rng.load("address");
  await ContextManager.sync().then(function () {
    console.log("Address: " + rng.address);
  }).catch(function (e) {
    OfficeHelpers.Utilities.log(e);
  });
}

// (the loop below must itself run inside an async function, so that "await" is valid here)
for (var i = 1; i <= 1000; i++) {
  var sheet = ContextManager.getWorksheetByName("Sheet1");
  await loadRangeAddress(sheet.getRange("A" + i), i); // I expect to see A1 to A1000 addresses in the console. Order doesn't matter.
}
Note the async in front of the loadRangeAddress function, and the two await-s in front of ContextManager.sync() and loadRangeAddress.
Note that this code will also run quite slowly, as you're making an async round trip for each cell, which means you're not using batching, which is at the very core of the object model for the new APIs.
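For comparison, a minimal sketch of the batched alternative using the same ContextManager: queue all the load() calls first, then make a single sync() round trip for the whole set of ranges:

ContextManager.loadContext();
var sheet = ContextManager.getWorksheetByName("Sheet1");
var ranges = [];
for (var i = 1; i <= 1000; i++) {
  var rng = sheet.getRange("A" + i);
  rng.load("address"); // queues the load; no round trip yet
  ranges.push(rng);
}
ContextManager.sync().then(function () {
  // One round trip brought back all 1000 addresses.
  ranges.forEach(function (rng) {
    console.log("Address: " + rng.address);
  });
}).catch(function (e) {
  console.log(e);
});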
For completeness sake, I should also note that creating a "raw" RequestContext instead of using Excel.run has some disadvantages. Excel.run does a number of useful things, the most important of which is automatic object tracking and un-tracking (not relevant here, since you're only reading back data; but would be relevant if you were loading and then wanting to write back into the object).
Finally, if I may recommend (full disclosure: I am the author of the book), you will probably find a good bit of useful info about Office.js in the e-book "Building Office Add-ins using Office.js", available at https://leanpub.com/buildingofficeaddins. In particular, it has a very detailed (10-page) section on the internal workings of the object model ("Section 5.5: Implementation details, for those who want to know how it really works"). It also offers advice on using TypeScript, has a general Promise/async-await primer, describes what .run does, and has a bunch more info about the OM. Also, though not available yet, it will soon offer information on how to resume using the same context (using a newer technique than what was originally described in How can a range be used across different Word.run contexts?). The book is a lean-published "evergreen" book, so once I write the topic in the coming weeks, an update will be available to all existing readers.
Hope this helps!
