Multi-Threaded SubQueries using Entity Framework throws errors

I have been trying to improve the performance of my code with regard to database queries. The problem I am currently running into is that I can't seem to find a way to get a new context for each subquery.
Using the simplified code below will inconsistently generate a "The underlying provider failed on Open." error.
using (var context = getNewContextObject())
{
    var result = new SomeResultObject();
    var parentQuery = context.SomeTable.Where(x => x.Name == "asdf");
    Parallel.Invoke(() =>
    {
        result.count1 = parentQuery.Where(x => x.Amount >= 100 && x.Amount < 2000).Count();
    }, () =>
    {
        result.count2 = parentQuery.Where(x => x.Amount < 100).Count();
    }, () =>
    {
        result.count3 = parentQuery.Where(x => x.Amount >= 2000).Count();
    });
}
The only way around this so far seems to be to rebuild the entire query for each subquery with a new context. Is there any way to avoid building each query from the bottom up with a new context? Can I instead just attach each subquery to a new context? I am looking for something like the below.
Parallel.Invoke(() =>
{
    var subQuery = parentQuery.Where(x => x.Amount >= 100 && x.Amount < 2000);
    subQuery.Context = getNewContextObject();
    result.count1 = subQuery.Count();
}, () =>
{
    var subQuery = parentQuery.Where(x => x.Amount < 100);
    subQuery.Context = getNewContextObject();
    result.count2 = subQuery.Count();
}, () =>
{
    var subQuery = parentQuery.Where(x => x.Amount >= 2000);
    subQuery.Context = getNewContextObject();
    result.count3 = subQuery.Count();
});

I'm not sure how exactly this relates to your problem, but none of EF's features are thread-safe, so I expect that running multiple queries on the same context instance in parallel can blow up for any reason. For example, the queries all access the same connection property on the context, but the threads don't know about each other, so one thread can close the connection another is using, or replace it with a different connection instance. (You can try opening the connection manually before you run the parallel threads and closing it once all threads are done, but that doesn't have to be the only problem.)

Related

How do I chain a set of functions together using promises and q in node.js?

I have some dynamic data that needs to have work performed on it. The work must happen sequentially. Using the Q library, I'd like to create an array of functions and execute the code sequentially. I can't seem to quite figure out the syntax to achieve this.
const fruits = ["apple", "cherry", "blueberry"]

function makeFruitPie (fruit) {
  return Q.Promise((resolve, reject) => {
    // Do some stuff here
    resolve(fruit + " pie")
    // Error handling here
    reject(new Error(""))
  })
}

const fruitFuncs = new Array(fruits.length)
for (var i = 0; i < fruits.length; i++) {
  fruitFuncs[i] = makeFruitPie(fruits[i])
}

// Stole this example from the github docs but can't quite get it right.
i = 0
var result = Q(fruits[i++])
fruitFuncs.forEach((f) => {
  result = result(fruits[i++]).then(f)
})
With these lines
for (var i = 0; i < fruits.length; i++) {
  fruitFuncs[i] = makeFruitPie(fruits[i])
}
you already invoke the functions, so their processing begins immediately.
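If you do want to keep the array, one option (a small sketch, not part of the original answer) is to store functions rather than the promises they create:
// Store thunks; nothing runs until each function is invoked later.
const fruitFuncs = fruits.map(fruit => () => makeFruitPie(fruit))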
Assuming you want the execution of the functions in sequence, the following would be more appropriate:
// construct the pipeline
const start = Q.defer();
let result = start.promise; // we need something to set the pipeline off
fruits.forEach((fruit) => {
  result = result.then(() => makeFruitPie(fruit));
});
// start the pipeline
start.resolve();
Sidenote: There is a native Promise implementation supported by almost all environments. Maybe consider switching from the library-backed version.
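For example, a minimal sketch of the same sequential pipeline with native promises (assuming makeFruitPie returns a standard thenable, which Q promises are):
// Each pie starts only after the previous one has resolved.
const bakeAll = fruits.reduce(
  (chain, fruit) => chain.then(() => makeFruitPie(fruit)),
  Promise.resolve()
)
bakeAll.then(() => console.log("all pies done"))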
You can use Promise.all:
Promise.all(fruits.map(fruit => makeFruitPie(fruit)))
  .then(final_res => console.log(final_res))
final_res will be an array of results, in the same order as fruits. Note that Promise.all starts all the promises at once, so the pies are made in parallel rather than strictly in sequence.
You could use for..of and do things sequentially, something like this:
const Q = require("q");

const fruits = ["apple", "cherry", "blueberry"];

function makeFruitPie(fruit) {
  return Q.Promise((resolve, reject) => {
    // Do some stuff here
    resolve(`${fruit} pie`);
    // Error handling here
    reject(new Error(""));
  });
}

// await is only valid inside an async function
(async () => {
  for (const fruit of fruits) {
    const result = await makeFruitPie(fruit);
    console.log(result);
  }
})();
By the way, it's also worth considering native Promise instead of using q.
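For reference, a native-Promise version of makeFruitPie might look like this (a sketch, not part of the original answer):
// No library needed; resolve with the finished pie.
function makeFruitPie(fruit) {
  return new Promise((resolve) => resolve(`${fruit} pie`));
}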

How to execute a batch of transactions independently using pg-promise?

We're having an issue in our main data synchronization back-end function. Our client's mobile device pushes changes daily; however, last week they warned us that some changes weren't updated in the main web app.
After some investigation in the logs, we found that there is indeed a single transaction that fails and rolls back. However, it appears that all the transactions before this one also roll back.
The code works this way. The data to synchronize is an array of "changesets", and each changeset can update multiple tables at once. It's important that a changeset be updated completely or not at all, so each is wrapped in a transaction. Then each transaction is executed one after the other. If a transaction fails, the others shouldn't be affected.
I suspect that all the transactions are actually combined somehow, possibly through the main db.task. Instead of just looping to execute the transactions, we're using a db.task to execute them in batch and avoid update conflicts on the same tables.
Any advice how we could execute these transactions in batch and avoid this rollback issue?
Thanks, here's a snippet of the synchronization code:
// Begin task that will execute transactions one after the other
db.task(task => {
  const transactions = [];
  // Create a transaction for each changeset (propriete/fosse/inspection)
  Object.values(data).forEach((change, index) => {
    const logchange = { tx: index };
    const c = { ...change }; // Use a clone of the original change object
    transactions.push(
      task.tx(t => {
        const queries = [];
        // Propriete
        if (Object.keys(c.propriete.params).length) {
          const params = proprietes.parse(c.propriete.params);
          const propriete = Object.assign({ idpropriete: c.propriete.id }, params);
          logchange.propriete = { idpropriete: propriete.idpropriete };
          queries.push(t.one(`SELECT ${Object.keys(params).join()} FROM propriete WHERE idpropriete = $1`, propriete.idpropriete).then(previous => {
            logchange.propriete.previous = previous;
            return t.result('UPDATE propriete SET ' + qutil.setequal(params) + ' WHERE idpropriete = ${idpropriete}', propriete).then(result => {
              logchange.propriete.new = params;
            });
          }));
        }
        else delete c.propriete;
        // Fosse
        if (Object.keys(c.fosse.params).length) {
          const params = fosses.parse(c.fosse.params);
          const fosse = Object.assign({ idfosse: c.fosse.id }, params);
          logchange.fosse = { idfosse: fosse.idfosse };
          queries.push(t.one(`SELECT ${Object.keys(params).join()} FROM fosse WHERE idfosse = $1`, fosse.idfosse).then(previous => {
            logchange.fosse.previous = previous;
            return t.result('UPDATE fosse SET ' + qutil.setequal(params) + ' WHERE idfosse = ${idfosse}', fosse).then(result => {
              logchange.fosse.new = params;
            });
          }));
        }
        else delete c.fosse;
        // Inspection (rendezvous)
        if (Object.keys(c.inspection.params).length) {
          const params = rendezvous.parse(c.inspection.params);
          const inspection = Object.assign({ idvisite: c.inspection.id }, params);
          logchange.rendezvous = { idvisite: inspection.idvisite };
          queries.push(t.one(`SELECT ${Object.keys(params).join()} FROM rendezvous WHERE idvisite = $1`, inspection.idvisite).then(previous => {
            logchange.rendezvous.previous = previous;
            return t.result('UPDATE rendezvous SET ' + qutil.setequal(params) + ' WHERE idvisite = ${idvisite}', inspection).then(result => {
              logchange.rendezvous.new = params;
            });
          }));
        }
        else delete c.inspection;
        // Cheminees
        c.cheminees = Object.values(c.cheminees).filter(cheminee => Object.keys(cheminee.params).length);
        if (c.cheminees.length) {
          logchange.cheminees = [];
          c.cheminees.forEach(cheminee => {
            const params = cheminees.parse(cheminee.params);
            const ch = Object.assign({ idcheminee: cheminee.id }, params);
            const logcheminee = { idcheminee: ch.idcheminee };
            queries.push(t.one(`SELECT ${Object.keys(params).join()} FROM cheminee WHERE idcheminee = $1`, ch.idcheminee).then(previous => {
              logcheminee.previous = previous;
              return t.result('UPDATE cheminee SET ' + qutil.setequal(params) + ' WHERE idcheminee = ${idcheminee}', ch).then(result => {
                logcheminee.new = params;
                logchange.cheminees.push(logcheminee);
              });
            }));
          });
        }
        else delete c.cheminees;
        // Lock from further changes on the mobile device
        // Note: this change will be sent back to the mobile in part 2 of the synchronization
        queries.push(t.result('UPDATE rendezvous SET timesync = now() WHERE idvisite = $1', [c.idvisite]));
        console.log(`transaction#${++transactionCount}`);
        return t.batch(queries).then(result => { // Transaction complete
          logdata.transactions.push(logchange);
        });
      })
      .catch(function (err) { // Transaction failed for this changeset, rollback
        logdata.errors.push({ error: err, change: change }); // Provide error message and original change object to mobile device
        console.error(JSON.stringify(logdata.errors));
      })
    );
  });
  console.log(`Total transactions: ${transactions.length}`);
  return task.batch(transactions).then(result => { // All transactions complete
    // Log everything that was uploaded from the mobile device
    log.log(res, JSON.stringify(logdata));
  });
});
I apologize, but it's almost impossible to give a good final answer when the question is wrong on too many levels...
It's important that a change set be updated completely or not at all, so each is wrapped in a transaction.
If the change set requires data integrity, the whole thing must be one transaction, and not a set of transactions.
Then each transaction is executed one after the other. If a transaction fails, the others shouldn't be affected.
Again, data integrity is what a single transaction guarantees, you need to make it into one transaction, not multiple.
I suspect that all the transactions are actually combined somehow, possibly through the main db.task.
They are combined, and not through task, but through method tx.
Any advice how we could execute these transactions in batch and avoid this rollback issue?
By joining them into a single transaction.
You would use a single tx call at the top, and that's it, no tasks needed there. And in case the code underneath makes use of its own transactions, you can update it to allow conditional transactions.
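As a rough sketch (applyChangeset here is a hypothetical helper that would hold the per-changeset queries from the snippet above), the top level could look like this:
// One transaction for the whole synchronization: either every
// changeset is applied, or none of them are.
db.tx(t => {
  const queries = Object.values(data).map(change => applyChangeset(t, change));
  return t.batch(queries);
})
  .then(() => { /* all changesets applied */ })
  .catch(err => { /* nothing was applied */ });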
Also, when building complex transactions, an app benefits a lot from using the repository patterns shown in pg-promise-demo. You can have methods inside repositories that support conditional transactions.
And you should redo your code to avoid horrible things it does, like manual query formatting. For example, never use things like SELECT ${Object.keys(params).join()}, that's a recipe for disaster. Use the proper query formatting that pg-promise gives you, like SQL Names in this case.
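For example, a minimal sketch of the first SELECT from the question rewritten with SQL Names (the :name filter escapes each column name in the array properly):
t.one('SELECT ${columns:name} FROM propriete WHERE idpropriete = ${id}',
  { columns: Object.keys(params), id: propriete.idpropriete });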

node js non blocking for loop

Please check whether my understanding of the following for loop is correct.
for (let i = 0; i < 1000; i++) {
  sample_function(i, function (result) {});
}
The moment the for loop is invoked, 1000 calls to sample_function will be queued in the event loop. If a user sends an HTTP request about 5 seconds later, it is queued after those 1000 events.
Usually this would not be a problem because the loop is asynchronous.
But let's say that sample_function is a CPU-intensive function. The 1000 calls would then complete consecutively, each taking about 1 second.
As a result, the for loop will block for about 1000 seconds.
Would there be a way to solve such a problem? For example, would it be possible to let the thread take a "break" every 10 loops and allow other newly queued events to run in between? If so, how would I do it?
Try this:
for (let i = 0; i < 1000; i++) {
  setTimeout(sample_function, 0, i, function (result) {});
}
or
function sample_function(elem, index) { /* ... */ }

// Array(1000) creates a sparse array that forEach would skip over,
// so build an array of indices instead.
const arr = Array.from({ length: 1000 }, (_, i) => i);
arr.forEach(sample_function);
There is a technique called partitioning, which you can read about in the Node.js docs. But as the documentation states:
If you need to do something more complex, partitioning is not a good option. This is because partitioning uses only the Event Loop, and you won't benefit from multiple cores almost certainly available on your machine.
So you can also use another technique called offloading, e.g. using worker threads or child processes. These have their own downsides, such as having to serialize and deserialize any objects that you wish to share between the event loop (current thread) and a worker thread or a child process.
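As a rough sketch of offloading (assuming sample_function is CPU-bound and exported from its own module), worker threads could be used like this:
// main.js: offload the CPU-heavy calls to a worker thread so the
// event loop stays free to serve other requests.
const { Worker } = require('worker_threads');

function runInWorker(i) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./sample_worker.js', { workerData: i });
    worker.on('message', resolve); // resolves with whatever the worker posts back
    worker.on('error', reject);
  });
}

// sample_worker.js:
// const { parentPort, workerData } = require('worker_threads');
// const sample_function = require('./sample_function');
// parentPort.postMessage(sample_function(workerData));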
The following is an example of partitioning that I came up with, in the context of an Express application.
const express = require('express');
const crypto = require('crypto');
const randomstring = require('randomstring');

const app = express();
const port = 80;

app.get('/', async (req, res) => {
  res.send('ok');
});

app.get('/block', async (req, res) => {
  let result = [];
  for (let i = 0; i < 10; ++i) {
    result.push(await block());
  }
  res.send({ result });
});

app.listen(port, () => {
  console.log(`Listening on port ${port}`);
  console.log(`http://localhost:${port}`);
});

/* takes around 5 seconds to run (varies depending on your processor) */
const block = () => {
  // promisifying just to get the result back to the caller in an async way,
  // this is not part of the partitioning technique
  return new Promise((resolve, reject) => {
    /**
     * https://nodejs.org/en/docs/guides/dont-block-the-event-loop/#partitioning
     * using the partitioning technique (setImmediate/setTimeout) to prevent a
     * long-running operation from blocking the event loop completely;
     * there will be a breathing period between each time block is called
     */
    setImmediate(() => {
      let hash = crypto.createHash("sha256");
      const numberOfHashUpdates = 10e5;
      for (let iter = 0; iter < numberOfHashUpdates; iter++) {
        hash.update(randomstring.generate());
      }
      resolve(hash.digest('hex')); // resolve with the digest rather than the Hash object
    });
  });
};
There are two endpoints, / and /block. If you hit /block and then hit /, the / endpoint will take around 5 seconds to respond (during the breathing space, the thing you called a "break").
If setImmediate were not used, the / endpoint would only respond after approximately 10 * 5 seconds (10 being the number of times the block function is called in the for loop).
You can also do partitioning using a recursive approach like this:
/**
 * @param items array we need to process
 * @param chunk a number indicating how many items to process on each iteration
 *        of the event loop before the breathing space
 */
function processItems(items, chunk) {
  let i = 0;
  const process = () => {
    let currentChunk = chunk;
    while (currentChunk > 0 && i < items?.length) {
      --currentChunk;
      syncBlock();
      ++i;
    }
    if (i < items?.length) {
      // the key is to schedule the next recursive call (by passing the function
      // to setImmediate) instead of invoking the process function directly
      setImmediate(process);
    }
  };
  process();
}
And if you need to get back the processed data, you can promisify it like this:
function processItems(items, chunk) {
  let i = 0;
  let result = [];
  const process = (done) => {
    let currentChunk = chunk;
    while (currentChunk > 0 && i < items?.length) {
      --currentChunk;
      const returnedValue = syncBlock();
      result.push(returnedValue);
      ++i;
    }
    if (i < items?.length) {
      setImmediate(() => process(done));
    } else {
      done && done(result);
    }
  };
  const promisified = () => new Promise((resolve) => process(resolve));
  return promisified();
}
And you can test it by adding this route handler to the other route handlers provided above:
app.get('/block2', async (req, res) => {
  let result = [];
  let arr = [];
  for (let i = 0; i < 10; ++i) {
    arr.push(i);
  }
  result = await processItems(arr, 1);
  res.send({ result });
});

Using ElementArrayFinder.filter() with async/await

I've been using the following function to filter element arrays for the past few years with Webdriver's Control Flow enabled:
filterElementsByText (elemList, comparator, locator) {
  return elemList.filter((elem) => {
    let searchTarget = locator ? elem.element(locator) : elem
    return searchTarget.getText().then((text) => text === comparator)
  })
}
I am now trying to migrate my repo to using async/await which requires turning off the Control Flow.
This transition has been mostly successful, but I'm having trouble with the function above. Intermittently, I am seeing this error:
Failed: java.net.ConnectException: Connection refused: connect
I am able to reproduce this issue with a test case I've written against https://angularjs.org, although it happens with much higher frequency against my own app.
let todoList = element.all(by.repeater('todo in todoList.todos'))
let todoText = element(by.model('todoList.todoText'))
let todoSubmit = element(by.css('[value="add"]'))

let addItem = async (itemLabel = 'write first protractor test') => {
  await todoText.sendKeys(itemLabel)
  return todoSubmit.click()
}

let filterElementsByText = (elemList, comparator, locator) => {
  return elemList.filter((elem) => {
    let searchTarget = locator ? elem.element(locator) : elem
    return searchTarget.getText().then((text) => {
      console.log(`Element text is: ${text}`)
      return text === comparator
    })
  })
}

describe('filter should', () => {
  beforeAll(async () => {
    browser.ignoreSynchronization = true
    await browser.get('https://angularjs.org')
    for (let i = 0; i < 10; i++) {
      await addItem(`item${i}`)
    }
    return addItem()
  })

  it('work', async () => {
    let filteredElements = await filterElementsByText(todoList, 'write first protractor test')
    return expect(filteredElements.length).toEqual(1)
  })
})
This is being run with the following set in Protractor's conf file:
SELENIUM_PROMISE_MANAGER: false
With the simplified test case it seems to occur on 5-10% of executions (although, anecdotally, it does seem to occur more frequently once it has occurred the first time).
My problem is that this feels like a bug in Webdriver, but I'm not sure what conditions would cause that error, so I'm not sure how to proceed.
For anyone reading and wondering, the problem with my own app was two-fold.
First, as described in the comments to the original question, ElementArrayFinder.filter() causes this error because it runs parallel requests for each element in the array.
Secondly (and not apparent in the original question), rather than passing an ElementArrayFinder as described in this test case, I was actually passing in a chained child of each element in the array, such as:
element.all(by.repeater('todo in todoList.todos')).$$('span')
Looking at the Webdriver output as this happens, I noticed that this causes all those locators to be retrieved in parallel, leading to the same error.
I was able to work around both issues by filtering this way:
let filterElementsByText = async (elemList, comparator, locator) => {
  let filteredElements = []
  let elems = await elemList
  for (let i = 0; i < elems.length; i++) {
    let elem = await elems[i]
    let searchTarget = locator ? elem.element(locator) : elem
    let text = await searchTarget.getText()
    if (text === comparator) {
      filteredElements.push(elem)
    }
  }
  return filteredElements
}
This unblocks me, but it still feels like an issue that these functions are just unusable with async/await.

Entity Framework 4.3 and Threading

I am getting the error below when running a query in Entity Framework 4.3 in a thread.
The ObjectContext instance has been disposed and can no longer be used for operations that require a connection.
Below is where my thread starts and it errors at var item = _gamesRepository.Get(gameIncludes, q => q.Id == gameId);. Am I doing something wrong or should I use a different approach?
public void ProcessGame(int gameId)
{
    new Thread(() =>
    {
        Expression<Func<Game, object>>[] gameIncludes = {
            q => q.DivisionGameTeamResults,
            q => q.DivisionGameTeamResults.Select(g => g.DivisionBracketGameParticipant),
            q => q.DivisionGameTeamResults.Select(g => g.DivisionTeamPoolGame),
            q => q.DivisionGameTeamResults.Select(g => g.DivisionTeamPoolGame.DivisionTeamPool),
            e => e.DivisionGameTeamResults.Select(q => q.DivisionBracketGameParticipant.DivisionBracketGame.DivisionBracketGameParticipants.Select(t => t.DivisionBracketGameParticipantTeam.DivisionTeam.Team)),
            e => e.DivisionGameTeamResults.Select(q => q.DivisionBracketGameParticipant.DivisionBracketGame.DivisionLoserBracketGameParticipants.Select(d => d.DivisionBracketGameParticipantPool.DivisionPool)),
            e => e.DivisionGameTeamResults.Select(q => q.DivisionBracketGameParticipant.DivisionBracketGame.DivisionLoserBracketGameParticipants.Select(d => d.DivisionBracketGameParticipantTeamPool.DivisionTeamPool.DivisionTeam)),
            q => q.DivisionGameTeamResults.Select(d => d.DivisionTeamPoolGame.DivisionTeamPool.DivisionPool.Division.Event.Members),
            q => q.DivisionGameTeamResults.Select(d => d.DivisionBracketGameParticipant.DivisionBracketGame.BracketPart.DivisionBracketPart.DivisionBracket.Division.Event.Members)
        };
        var item = _gamesRepository.Get(gameIncludes, q => q.Id == gameId);
        if (item != null)
        {
            if (item.DivisionGameTeamResults.All(d => d.DivisionTeamPoolGame != null))
            {
                // Pool Game
                _divisionBracketsService.ProcessPoolGame(item);
            }
            else if (item.DivisionGameTeamResults.All(d => d.DivisionBracketGameParticipant != null))
            {
                // Bracket Game
                _divisionBracketsService.ProcessBracketGame(item);
            }
            UnitOfWork.Commit();
        }
    }).Start();
}
UPDATE:
I made the required changes to fix that issue.
var gamesRepository = DependencyResolver.Current.GetService<IGamesRepository>();
var divisionBracketsService = DependencyResolver.Current.GetService<IDivisionBracketsService>();
The repository and the unit of work should be owned by your thread, for two reasons:
EF is not thread-safe. Sharing a context among threads can lead you to the false belief that you can perform concurrent operations on the same context instance, but in general you can't. This may change in EF6, where async support will be implemented, but the current version supports only single-threaded processing.
If you share a context between threads, you must ensure that the thread owning the context does not dispose of it before every other thread depending on it has finished its processing. That is your current problem: your calling thread should wait for your processing thread to complete.
