I'm working on a proxy that caches files and I'm trying to add some logic that prevents multiple clients from downloading the same files before the proxy has a chance to cache them.
Basically, the logic I'm trying to implement is the following:
Client 1 requests a file. The proxy checks if the file is cached. If it's not, it requests it from the server, caches it, then sends it to the client.
Client 2 requests the same file after client 1 requested it, but before the proxy has a chance to cache it. So the proxy will tell client 2 to wait a few seconds because there is already a download in progress.
A better approach would probably be to give client 2 a "try again later" message, but let's just say that's currently not an option.
I'm using Nodejs with the anyproxy library. According to the documentation, delayed responses are possible by using promises.
However, I don't really see a way to achieve what I want using Promises. From what I can tell, I could do something like this:
module.exports = {
*beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
return new Promise((resolve, reject) => {
setTimeout(() => { // delay
resolve({ response: responseDetail.response });
}, 10000);
});
}
}
};
But that would mean simply waiting for a maximum amount of time and hoping the download finishes by then.
And I don't want that.
I would prefer to be able to do something like this (but with Promises, somehow):
module.exports = {
*beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
var i = 0;
for(i = 0 ; i < 10 ; i++) {
JustSleep(1000);
if(!thereIsADownloadInProgressFor(requestDetail.url))
return { response: responseDetail.response };
}
}
}
};
Is there any way I can achieve this with Promises in Nodejs?
Thanks!
You can use a Map to cache your file downloads.
The mapping in Map would be url -> Promise { file }
// Map { url => Promise { file } }
const cache = new Map()
const thereIsADownloadInProgressFor = url => cache.has(url)
const getCachedFilePromise = url => cache.get(url)
const downloadFile = async url => {/* download file code here */}
const setAndReturnCachedFilePromise = url => {
const filePromise = downloadFile(url)
cache.set(url, filePromise)
return filePromise
}
module.exports = {
beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
return getCachedFilePromise(requestDetail.url).then(file => ({ response: file }))
} else {
return setAndReturnCachedFilePromise(requestDetail.url).then(file => ({ response: file }))
}
}
};
You don't need to send a try again response, simply serve the same data to both requests. All you need to do is store the requests somewhere in the caching system and trigger all of them when the fetching is done.
Here's a cache implementation that does only a single fetch for multiple requests. No delays and no try-laters:
export class class Cache {
constructor() {
this.resultCache = {}; // this object is the cache storage
}
async get(key, cachedFunction) {
let cached = this.resultCache[key];
if (cached === undefined) { // No cache so fetch data
this.resultCache[key] = {
pending: [] // This is the magic, store further
// requests in this pending array.
// This way pending requests are directly
// linked to this cache data
}
try {
let result = await cachedFunction(); // Wait for result
// Once we get result we need to resolve all pending
// promises. Loop through the pending array and
// resolve them. See code below for how we store pending
// requests.. it will make sense:
this.resultCache[key].pending
.forEach(waiter => waiter.resolve(result));
// Store the result of the cache so later we don't
// have to fetch it again:
this.resultCache[key] = {
data: result
}
// Return result to original promise:
return result;
// Note: yes, this means pending promises will get triggered
// before the original promise is resolved but normally
// this does not matter. You will need to modify the
// logic if you want promises to resolve in original order
}
catch (err) { // Error when fetching result
// We still need to trigger all pending promises to tell
// them about the error. Only we reject them instead of
// resolving them:
if (this.resultCache[key]) {
this.resultCache[key].pending
.forEach((waiter: any) => waiter.reject(err));
}
throw err;
}
}
else if (cached.data === undefined && cached.pending !== undefined) {
// Here's the condition where there was a previous request for
// the same data. Instead of fetching the data again we store
// this request in the existing pending array.
let wait = new Promise((resolve, reject) => {
// This is the "waiter" object above. It is basically
// It is basically the resolve and reject functions
// of this promise:
cached.pending.push({
resolve: resolve,
reject: reject
});
});
return await wait; // await response form original request.
// The code above will cause this to return.
}
else {
// Return cached data as normal
return cached.data;
}
}
}
The code may look a bit complicated but it is actually quite simple. First we need a way to store the cached data. Normally I'd just use a regular object for this:
{ key : result }
Where the cached data is stored in the result. But we also need to store additional metadata such as pending requests for the same result. So we need to modify our cache storage:
{ key : {
data: result,
pending: [ array of requests ]
}
}
All this is invisible and transparent to code using this Cache class.
Usage:
const cache = new Cache();
// Illustrated with w3c fetch API but you may use anything:
cache.get( URL , () => fetch(URL) )
Note that wrapping the fetch in an anonymous function is important because we want the Cache.get() function to conditionally call the fetch to avoid multiple fetch being called. It also gives the Cache class flexibility to handle any kind of asynchronous operation.
Here's another example for caching a setTimeout. It's not very useful but it illustrates the flexibility of the API:
cache.get( 'example' , () => {
return new Promise((resolve, reject) => {
setTimeout(resolve, 1000);
});
});
Note that the Cache class above does not have any invalidations or expiry logic for the sake of clarity but it's fairly easy to add them. For example if you want the cache to expire after some time you can just store the timestamp along with the other cache data:
{ key : {
data: result,
timestamp: timestamp,
pending: [ array of requests ]
}
}
Then in the "no-cache" logic simply detect the expiry time:
if (cached === undefined || (cached.timestamp + timeout) < now) ...
Related
Problem:
front-end page make x parallel requests (let's call it first group),
the next group (x request) will be after 5 seconds, the first request (of the first group) set the cache from DB.
the other x-1 requests got empty array insted of wait to first request to done his job.
the second group and the all next requests got proper data from cache.
What is the best practics to lock other threads until the first done (or fail) in stateless mechanism?
EDIT:
The cache module allow use trigger of set chache but it's not work since it stateless mechanism.
const GetDataFromDB= async (req, res, next) => {
var cachedTableName = undefined;
// "lockFlag" uses to prevent parallel request to get into critical section (because its take time to set cache from db)
// to prevent that we uses "lockFlag" that is short-initiation to cache.
//
if ( !myCache.has( "lockFlag" ) && !myCache.has( "dbtable" ) ){
// here arrive first req from first group only
// the other x-1 of first group went to the nest condition
// here i would build mechanism to wait 'till first req come back from DB (init cache)
myCache.set( "lockFlag", "1" )
const connection1 = await odbc.connect(connectionConfig);
const cachedTableName = await connection1.query(`select * from ${tableName}`);
if(cachedTableName.length){
const success = myCache.set([
{key: "dbtable", val: cachedTableName, ttl: 180},
])
if(success)
{
cachedTableName = myCache.get( "dbtable" );
}
}
myCache.take("lockFlag");
connection1.close();
return res.status(200).json(cachedTableName ); // uses for first response.
}
// here comes x-1 of first group went to the nest condition and got nothing, bacause the cache not set yet
//
if ( myCache.has( "dbtable" ) ){
cachedTableName = myCache.get( "dbtable" );
}
return res.status(200).json(cachedTableName );
}
You can try the approach given here, with minor modifications to apply it for your case.
For brevity, I removed comments and shortened variables names.
Code, then explanation:
const EventEmitter = require('events');
const bus = new EventEmitter();
const getDataFromDB = async (req, res, next) => {
var table = undefined;
if (myCache.has("lockFlag")) {
await new Promise(resolve => bus.once("unlocked", resolve));
}
if (myCache.has("dbtable")) {
table = myCache.get("dbtable");
}
else {
myCache.set("lockFlag", "1");
const connection = await odbc.connect(connectionConfig);
table = await connection.query(`select * from ${tableName}`);
connection.close();
if (table.length) {
const success = myCache.set([
{ key: "dbtable", val: table, ttl: 180 },
]);
}
myCache.take("lockFlag");
bus.emit("unlocked");
}
return res.status(200).json(table);
}
This is how it should work:
At first, lockFlag is not present.
Then, some code calls getDataFromDB. That code evaluates the first if block to false, so it continues: it sets lockFlag to true ("1"), then goes on to retrieve the table data from db. In the meantime:
Some other code calls getDataFromDB. That code, however, evaluates the first if block to true, so it awaits on the promise, until an unlocked event will be emitted.
Back to the first calling code: It finishes its logic, caches the table data, sets lockFlag back to false, emits an unlocked event, and returns.
The other code can now continue its execution: it evaluates the second if to true, so it takes the table from the cache, and returns.
As workaround i add "finally" scope to remove lock-key from cache after first initiation, and this:
while(myCache.has( "lockFlag" )){
await wait(1500);
}
And the "wait" function:
function wait(milleseconds) {
return new Promise(resolve => setTimeout(resolve, milleseconds))
}
(source)
This is working, but still could be time (<1500 ms) that there is cache and the thread not aware.
I'ld happy for batter solution.
I am trying to:
Poll a public API every 5 seconds
Store the resulting JSON in a variable
Store the next query to this same API in a second variable
Compare the first variable to the second
Print the second variable if it is different from the first
Else: Print the phrase: 'The objects are the same' if they haven't changed
Unfortunately, the comparison part appears to fail. I am realizing that this implementation is probably lacking the appropriate variable scoping but I can't put my finger on it. Any advice would be highly appreciated.
data: {
chatters: {
viewers: {
},
},
},
};
//prints out pretty JSON
function prettyJSON(obj) {
console.log(JSON.stringify(obj, null, 2));
}
// Gets Users from Twitch API endpoint via axios request
const getUsers = async () => {
try {
return await axios.get("http://tmi.twitch.tv/group/user/sixteenbitninja/chatters");
} catch (error) {
console.error(error);
}
};
//Intended to display
const displayViewers = async (previousResponse) => {
const usersInChannel = await getUsers();
if (usersInChannel.data.chatters.viewers === previousResponse){
console.log("The objects are the same");
} else {
if (usersInChannel.data.chatters) {
prettyJSON(usersInChannel.data.chatters.viewers);
const previousResponse = usersInChannel.data.chatters.viewers;
console.log(previousResponse);
intervalFunction(previousResponse);
}
}
};
// polls display function every 5 seconds
const interval = setInterval(function () {
// Calls Display Function
displayViewers()
}, 5000);```
The issue is that you are using equality operator === on objects. two objects are equal if they have the same reference. While you want to know if they are identical. Check this:
console.log({} === {})
For your usecase you might want to store stringified version of the previousResponse and compare it with stringified version of the new object (usersInChannel.data.chatters.viewers) like:
console.log(JSON.stringify({}) === JSON.stringify({}))
Note: There can be issues with this approach too, if the order of property changes in the response. In which case, you'd have to check individual properties within the response objects.
May be you can use npm packages like following
https://www.npmjs.com/package/#radarlabs/api-diff
i have an array of variable number of urls and i must merge the data get with axion
the problem is then every axios call is relative to the data of the previus
if i have a fixed number of ulrs i can nest axion calls and live with that
i think to use something like this
var urls = ["xx", "xx", "xx"];
mergeData(urls);
function mergeData(myarray, myid = 0, mydata = "none") {
var myurl = "";
if (Array.isArray(mydata)) {
myurl = myarray[myid];
// do my stuff with data and modify the url
} else {
myurl = myarray[myid];
}
axios.get(myurl)
.then(response => {
// do my stuff and get the data i need and put on an array
if (myarray.length < myid) {
mergeData(myarray, myid + 1, data);
} else {
// show result on ui
}
})
.catch(error => {
console.log(error);
});
}
but i dont like it
there is another solution?
(be kind, i'm still learning ^^)
just to be clear
i need to optain
http request to "first url", parse the the json, save some data(some needed for the output)
another http request to "second url" with one or more parameter from previous data, parse the the json, save some data(some needed for the output)
... and so on, for 5 to 10 times
If your goal is to make subsequent HTTP calls based on information you get from previous calls, I'd utilize async/await and for...of to accomplish this instead of relying on a recursive solution.
async function mergeData(urls) {
const data = [];
for (const url of urls) {
const result = await axios.get(url).then(res => res.data);
console.log(`[${result.id}] ${result.title}`);
// here, do whatever you want to do with
// `result` to make your next call...
// for now, I am just going to append each
// item to `data` and return it at the end
data.push(result);
}
return data;
}
const items = [
"https://jsonplaceholder.typicode.com/posts/1",
"https://jsonplaceholder.typicode.com/posts/2",
"https://jsonplaceholder.typicode.com/posts/3"
];
console.log("fetching...")
mergeData(items)
.then(function(result) {
console.log("done!")
console.log("final result", result);
})
.catch(function(error) {
console.error(error);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/axios/0.19.2/axios.min.js"></script>
Using async/await allows you to utilize for...of which will wait for each call to resolve or reject before moving onto the next one.
To learn more about async/await and for...of, have a look here:
developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function
developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of
Hope this helps.
I have a list of promises and currently I am using promiseAll to resolve them
Here is my code for now:
const pageFutures = myQuery.pages.map(async (pageNumber: number) => {
const urlObject: any = await this._service.getResultURL(searchRecord.details.id, authorization, pageNumber);
if (!urlObject.url) {
// throw error
}
const data = await rp.get({
gzip: true,
headers: {
"Accept-Encoding": "gzip,deflate",
},
json: true,
uri: `${urlObject.url}`,
})
const objects = data.objects.filter((object: any) => object.type === "observed-data" && object.created);
return new Promise((resolve, reject) => {
this._resultsDatastore.bulkInsert(
databaseName,
objects
).then(succ => {
resolve(succ)
}, err => {
reject(err)
})
})
})
const all: any = await Promise.all(pageFutures).catch(e => {
console.log(e)
})
So as you see here I use promise all and it works:
const all: any = await Promise.all(pageFutures).catch(e => {
console.log(e)
})
However I noticed it affects the database performance wise so I decided to resolve every 3 of them at a time.
for that I was thinking of different ways like cwait, async pool or wrting my own iterator
but I get confused on how to do that?
For example when I use cwait:
let promiseQueue = new TaskQueue(Promise,3);
const all=new Promise.map(pageFutures, promiseQueue.wrap(()=>{}));
I do not know what to pass inside the wrap so I pass ()=>{} for now plus I get
Property 'map' does not exist on type 'PromiseConstructor'.
So whatever way I can get it working(my own iterator or any library) I am ok with as far as I have a good understanding of it.
I appreciate if anyone can shed light on that and help me to get out of this confusion?
First some remarks:
Indeed, in your current setup, the database may have to process several bulk inserts concurrently. But that concurrency is not caused by using Promise.all. Even if you had left out Promise.all from your code, it would still have that behaviour. That is because the promises were already created, and so the database requests will be executed any way.
Not related to your issue, but don't use the promise constructor antipattern: there is no need to create a promise with new Promise when you already have a promise in your hands: bulkInsert() returns a promise, so return that one.
As your concern is about the database load, I would limit the work initiated by the pageFutures promises to the non-database aspects: they don't have to wait for eachother's resolution, so that code can stay like it was.
Let those promises resolve with what you currently store in objects: the data you want to have inserted. Then concatenate all those arrays together to one big array, and feed that to one database bulkInsert() call.
Here is how that could look:
const pageFutures = myQuery.pages.map(async (pageNumber: number) => {
const urlObject: any = await this._service.getResultURL(searchRecord.details.id,
authorization, pageNumber);
if (!urlObject.url) { // throw error }
const data = await rp.get({
gzip: true,
headers: { "Accept-Encoding": "gzip,deflate" },
json: true,
uri: `${urlObject.url}`,
});
// Return here, don't access the database yet...
return data.objects.filter((object: any) => object.type === "observed-data"
&& object.created);
});
const all: any = await Promise.all(pageFutures).catch(e => {
console.log(e);
return []; // in case of error, still return an array
}).flat(); // flatten it, so all data chunks are concatenated in one long array
// Don't create a new Promise with `new`, only to wrap an other promise.
// It is an antipattern. Use the promise returned by `bulkInsert`
return this._resultsDatastore.bulkInsert(databaseName, objects);
This uses .flat() which is rather new. In case you have no support for it, look at the alternatives provided on mdn.
First, you asked a question about a failing solution attempt. That is called X/Y problem.
So in fact, as I understand your question, you want to delay some DB request.
You don't want to delay the resolving of a Promise created by a DB request... Like No! Don't try that! The promise wil resolve when the DB will return a result. It's a bad idea to interfere with that process.
I banged my head a while with the library you tried... But I could not do anything to solve your issue with it. So I came with the idea of just looping the data and setting some timeouts.
I made a runnable demo here: Delaying DB request in small batch
Here is the code. Notice that I simulated some data and a DB request. You will have to adapt it. You also will have to adjust the timeout delay. A full second certainly is too long.
// That part is to simulate some data you would like to save.
// Let's make it a random amount for fun.
let howMuch = Math.ceil(Math.random()*20)
// A fake data array...
let someData = []
for(let i=0; i<howMuch; i++){
someData.push("Data #"+i)
}
console.log("Some feak data")
console.log(someData)
console.log("")
// So we have some data that look real. (lol)
// We want to save it by small group
// And that is to simulate your DB request.
let saveToDB = (data, dataIterator) => {
console.log("Requesting DB...")
return new Promise(function(resolve, reject) {
resolve("Request #"+dataIterator+" complete.");
})
}
// Ok, we have everything. Let's proceed!
let batchSize = 3 // The amount of request to do at once.
let delay = 1000 // The delay between each batch.
// Loop through all the data you have.
for(let i=0;i<someData.length;i++){
if(i%batchSize == 0){
console.log("Splitting in batch...")
// Process a batch on one timeout.
let timeout = setTimeout(() => {
// An empty line to clarify the console.
console.log("")
// Grouping the request by the "batchSize" or less if we're almost done.
for(let j=0;j<batchSize;j++){
// If there still is data to process.
if(i+j < someData.length){
// Your real database request goes here.
saveToDB(someData[i+j], i+j).then(result=>{
console.log(result)
// Do something with the result.
// ...
})
} // END if there is still data.
} // END sending requests for that batch.
},delay*i) // Timeout delay.
} // END splitting in batch.
} // END for each data.
My removeObjects function has me stummped.The function is suppose to syncronoulsy get a list of objects in an S3 bucket then asyncronously removes the objects. Repeat if the list was truncated, until the there are no more objects to remove. (AWS doesn't provide the total count of objects in the bucket and listObjects pages the results.)
What am I doing wrong / why doesn't my function work? The solution should exploit single thread and async nature of JS. For the bounty I am hoping for an answer specific to the module. The git repo is public if you want to see the entire module.
export function removeObjects(params: IS3NukeRequest): Promise<S3.Types.DeleteObjectsOutput> {
const requests: Array<Promise<S3.Types.DeleteObjectsOutput>> = [];
let isMore;
do {
listObjectsSync(params)
.then((objectList: S3.Types.ListObjectsV2Output) => {
isMore = objectList.ContinuationToken = objectList.IsTruncated ? objectList.NextContinuationToken : null;
requests.push(params.Client.deleteObjects(listObjectsV2Output2deleteObjectsRequest(objectList)).promise());
})
.catch((err: Error) => { Promise.reject(err); });
} while (isMore);
return Promise.all(requests);
}
export async function listObjectsSync(params: IS3NukeRequest): Promise<S3.Types.ListObjectsV2Output> {
try {
return await params.Client.listObjectsV2(s3nukeRequest2listObjectsRequest(params)).promise();
} catch (err) {
return Promise.reject(err);
}
}
Thanks.
The thing is that listObjectsSync function returns a Promise, so you need to treat it as an async function and can't just use a loop with it. What you need to do is to create a chain of promises while your isMore is true, I've done it using a recursive approach (I'm not pro in TS, so please check the code before using it). I also haven't tried the code live, but logically it should work :)
const requests: Array<Promise<S3.Types.DeleteObjectsOutput>> = [];
function recursive(recursiveParams) {
return listObjectsSync(recursiveParams).then((objectList: S3.Types.ListObjectsV2Output) => {
let isMore = objectList.ContinuationToken = objectList.IsTruncated ? objectList.NextContinuationToken : null;
requests.push(params.Client.deleteObjects(listObjectsV2Output2deleteObjectsRequest(objectList)).promise());
if (isMore) {
//do we need to change params here?
return recursive(recursiveParams)
}
//this is not necessary, just to indicate that we get out of the loop
return true;
});
}
return recursive(params).then(() => {
//we will have all requests here
return Promise.all(requests);
});