Updating progress in SharePoint Job

I have a long-running SharePoint timer job and I would like to display its progress in Central Administration (so I'm using the SPJobDefinition.UpdateProgress(int percentage) method).
Let's say I have 50,000 elements on a list that I want to update in a foreach loop. If I place something like job.UpdateProgress((int)(itemNo * 100.0 / itemCount)) in the loop, would it send a web request to the SharePoint server each time the method is called (50,000 times), or only when the percentage actually changes (no more than 100 times)?
I don't want any noticeable performance degradation because of this and I suppose that more requests might slow down the job.
Also, what tool is good for inspecting the requests and responses? (Fiddler, or would something else be better for SharePoint?)

(In SP2010) Every time you call job.UpdateProgress, the SPJobDefinition class will send an SPRequest. SPJobDefinition does not internally track its percent complete, so it has no way of knowing whether your new value is an update or not without contacting the server, so it just contacts the server every time. So yes, calling this 50,000 times may slow down your code significantly.
The easiest way to figure out things like this (since the online MSDN documentation can be very sparse at times) is to use a .NET decompiler on Microsoft.SharePoint.dll. Personally, I tend to use ILSpy.
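The usual workaround is to report only when the whole-number percentage changes. A minimal sketch of that throttling idea, written in TypeScript purely for illustration (the real timer job would be C# calling SPJobDefinition.UpdateProgress; ProgressJob and processItems are hypothetical names):

```typescript
// Hypothetical job object with an updateProgress(percentage) call that hits
// the server on every invocation, mirroring SPJobDefinition.UpdateProgress.
// The loop only reports when the whole-number percentage actually changes,
// so at most ~100 calls are made regardless of the item count.
interface ProgressJob {
  updateProgress(percentage: number): void;
}

function processItems(items: unknown[], job: ProgressJob): void {
  let lastReported = -1;
  items.forEach((_item, index) => {
    // ... do the real per-item work here ...
    const percentage = Math.floor(((index + 1) * 100) / items.length);
    if (percentage !== lastReported) {
      job.updateProgress(percentage); // server round-trip happens ~100 times, not 50,000
      lastReported = percentage;
    }
  });
}
```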

Related

How to manage concurrent writes to a large (5 MB) MongoDB document with Node.js

I built an app that manages sports tournaments using MongoDB and Mongoose on Node.js. I'd like to know if I am using the best solution to handle multiple concurrent writes to a large document (5 MB) in rapid succession.
Each "Event" (tournament) is a single document that contains a list of teams. There is a maximum number of teams that can register for each Event. So normally, when a team registers, my Node.js server will load the Event, check that the maximum number of teams has not been reached, add the team to the sub-documents, and save the Event.
The problem is that some tournaments make players frantic to get a spot, and you can have 60 teams complete their registration in the opening seconds, which causes concurrency errors.
For example, if 2 teams click on "save" at the same time, 2 threads (requests) will open on the Node.js server, both threads will load identical copies of the Event, modify them, and save two different versions of the document over one another. Obviously, you will get a version error for one of the two threads. Now imagine 60 teams registering within the same second.
The second problem is that the Event document is quite large. Let's be dramatic and say it's 5 MB in size (rare but possible). If I have to load, modify, and write 5 MB per registration, the registration system is going to grind to a halt (since my MongoDB is on a different server).
So I need to know if I built the right solution and if you guys foresee problems with this.
On my Node server, I built a singleton class (accessible to all requests) to manage access to documents. If a request comes along and asks for document X, the singleton returns a Promise to the request, which will be resolved once this document becomes available to edit. The singleton then turns around, loads the document, and grants access to the first request by resolving its promise. When the request is done editing this document, it tells the singleton that it's done. The singleton then checks if there is a queue of other requests waiting to edit this document (other teams that want to register). If so, it does NOT save the document but rather resolves the next promise, allowing the next request to edit the document.
When the last request has finished editing the document and there are no more requests in the queue, the singleton saves the document and clears it from memory.
So, in short, the singleton allows the system to load the document once, allow modifications from multiple requests, and then save the document at the end of the rush. This is especially useful since the document is rather large (up to 5 MB), and it minimizes the number of reads/writes to the MongoDB server. The other benefit is that if we're accepting 50 teams and we get 55 requests wanting to append their teams, the last 5 requests in the queue will see that the live document has reached its team limit and return a "sorry, we're full" response.
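For concreteness, here is a minimal TypeScript sketch of the per-document queue described above; the names (DocumentGate, withDocument, load, save) are hypothetical, not the poster's actual code:

```typescript
// Each document gets a promise chain: requests queue behind it, edit the single
// in-memory copy in turn, and the document is saved once the queue drains.
type Editor<T> = (doc: T) => Promise<void> | void;

class DocumentGate<T> {
  private doc: T | null = null;
  private pending = 0;
  private chain: Promise<void> = Promise.resolve();

  constructor(
    private load: () => Promise<T>,
    private save: (doc: T) => Promise<void>,
  ) {}

  withDocument(edit: Editor<T>): Promise<void> {
    this.pending++;
    this.chain = this.chain.then(async () => {
      const doc = this.doc ?? (this.doc = await this.load()); // load once per rush
      await edit(doc);                                         // exclusive edit
      this.pending--;
      if (this.pending === 0) {                                // queue drained:
        await this.save(doc);                                  // save once, then
        this.doc = null;                                       // drop from memory
      }
    });
    return this.chain;
  }
}
```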
Is this the best way to manage concurrent writes to a large document?
MongoDB provides a multitude of update operators that you should be using on the specific fields instead of modifying the entire document in your application. For example, to add to an array, use $push: https://docs.mongodb.com/manual/reference/operator/update/push/.
This way you 1) only send the changed data on each write and 2) avoid racing against yourself and clobbering your own changes.
This doesn't help with the time it takes the server to rewrite that 5 MB document each time it's modified; split the document up to fix that (if you find it to be an issue).
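As an illustration, a minimal sketch of the $push approach with the official Node.js MongoDB driver, written in TypeScript; the collection name, the teams field, and the 50-team cap are assumptions. Requiring that teams.49 does not exist makes the update match only while the array holds fewer than 50 entries, so the cap is enforced atomically on the server:

```typescript
import { MongoClient, ObjectId } from "mongodb";

async function registerTeam(uri: string, eventId: ObjectId, team: { name: string }) {
  const client = new MongoClient(uri);
  await client.connect();
  try {
    const events = client.db("tournaments").collection("events");
    const result = await events.updateOne(
      { _id: eventId, "teams.49": { $exists: false } }, // only while fewer than 50 teams
      { $push: { teams: team } }                        // append just the new team
    );
    return result.modifiedCount === 1; // false => event is full (or missing)
  } finally {
    await client.close();
  }
}
```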

Conceptual approach of threads in Delphi

Over 2 years ago, Remy Lebeau gave me invaluable tips on threads in Delphi. His answers were very useful to me and I feel like I made great progress thanks to him. This post can be found here.
Today, I face a "conceptual problem" about threads. This is not really about code; it is about the approach one should choose for a certain problem. I know we are not supposed to ask for personal opinions; I am merely asking whether, from a technical point of view, one of these approaches must be avoided or whether they are both viable.
My application has a list of unique product numbers (named SKUs) in a database. Querying an API with these SKUs, I get back a JSON file containing details about these products. This JSON file is processed, and the results are displayed on screen and saved in the database. So, at one step, a download process is involved, and it is executed in a worker thread.
I see two different possible approaches for this whole procedure:
When the user clicks on the start button, a query is fired, building a list of SKUs based on the user's criteria. A TStringList is then built and, for each element of the list, a thread is launched that downloads the JSON, sends the result back to the main thread, and terminates.
When the user clicks on the start button, a query is fired, building a list of SKUs based on the user's criteria. Instead of sending SKU numbers one after another to the worker thread, the whole list is sent, and the worker thread iterates through the list, sending results back to the main thread (via a Synchronize event) for display and saving. So we only have one worker thread working through the whole list before terminating.
I have coded both approaches and they both work... each with its own downsides that I have experienced.
I am not a professional developer; this is a hobby. Before working my way further down one path or the other for "polishing", I would like to know whether, from a technical point of view and according to your knowledge and experience, one of the approaches I described should be avoided, and why.
Thanks for your time
Mathias
Another thing to consider in this case is latency to your API that is producing the JSON. For example, if it takes 30 msec to go back and forth to the server, and 0.01 msec to create the JSON on the server, then querying a single JSON record per request, even if each request is in a different thread, does not make much sense. In that case, it would make sense to do fewer requests to the server, returning more data on each request, and partition the results up among different threads.
The other thing is that threads are not a solution to every problem. I would question why you need to break each SKU into its own thread. How long does each individual thread run, and how much processing is each thread doing? In general, creating lots of threads, each of which works for a fraction of a millisecond, does not make sense. You want the threads to be alive for as long as possible, processing as much data as they can for the job. You don't want the computer to spend as much time creating/destroying threads as doing useful work.
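The question is about Delphi, but the batching idea is language-agnostic; here is a rough TypeScript sketch, assuming a hypothetical endpoint that returns JSON for many SKUs per request and a small fixed pool of workers instead of one thread per SKU (the URL, batch size, and pool size are made up):

```typescript
// One HTTP round trip that returns JSON details for a whole batch of SKUs.
async function fetchSkuBatch(skus: string[]): Promise<unknown[]> {
  const response = await fetch(`https://api.example.com/products?skus=${skus.join(",")}`);
  return response.json();
}

async function processAll(skus: string[], batchSize = 50, poolSize = 4): Promise<void> {
  // Slice the SKU list into batches up front.
  const batches: string[][] = [];
  for (let i = 0; i < skus.length; i += batchSize) {
    batches.push(skus.slice(i, i + batchSize));
  }
  // A small pool of long-lived "workers" that keep pulling batches until none remain.
  let next = 0;
  const worker = async () => {
    while (next < batches.length) {
      const batch = batches[next++];
      const products = await fetchSkuBatch(batch);
      // ... display and save the results here ...
      console.log(`processed ${products.length} products`);
    }
  };
  await Promise.all(Array.from({ length: poolSize }, worker));
}
```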

Reporting progress on a million-call process

I have a console/desktop application that crawls a lot of data (think a million calls) from various web services. At any given time I have about 10 threads performing these calls and aggregating the data into a MySQL database. All seeds are also stored in a database.
What would be the best way to report its progress? By progress I mean:
How many calls already executed
How many failed
What's the average call duration
How much is left
I thought about logging all of them somehow and tailing the log to get the data. Another idea was to expose some kind of output on an always-open TCP endpoint where some form of UI could read the data and display an aggregation. Both ways look too rough and too complicated.
Any other ideas?
The "best way" depends on your requirements. If you use a logging framework like NLog, you can plug in a variety of logging targets like files, databases, the console or TCP endpoints.
You can also use a viewer like Harvester as a logging target.
When logging multi-threaded applications, I sometimes have an additional thread that writes a summary of progress to the logger every so often (e.g. every 15 seconds).
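The periodic-summary idea looks roughly like this (sketched in TypeScript purely for illustration; the real application would use C# and NLog, and the counters, totals, and interval are made up):

```typescript
// Worker code bumps shared counters; a timer logs one aggregate line every
// 15 seconds instead of logging every single call.
const stats = { done: 0, failed: 0, totalMs: 0, remaining: 1_000_000 };

function recordCall(durationMs: number, ok: boolean): void {
  stats.done++;
  stats.remaining--;
  stats.totalMs += durationMs;
  if (!ok) stats.failed++;
}

const summaryTimer = setInterval(() => {
  const avg = stats.done ? (stats.totalMs / stats.done).toFixed(1) : "n/a";
  console.log(
    `progress: ${stats.done} done, ${stats.failed} failed, avg ${avg} ms, ${stats.remaining} left`
  );
}, 15_000);

// call clearInterval(summaryTimer) when the crawl finishes
```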
Since it is a console application, just use Console.WriteLine and have the application spit the important stuff out to the console.
I did something similar in an application that I created to export PDFs from a SQL Server database back into PDF format.
You can do it many different ways. If you are counting records and their size, you can run a tally of sorts and have it show the total every so many records.
I also wrote out to a text file, so that I could keep track of all the PDFs and what case numbers they went to and things like that. That information is in the answer I gave to the question linked above.
You could also write the statistics out to a text file every so often.
The logger that Eric J. mentions is probably going to be a little easier to implement, and would be a nice tool for your toolbox.
These options are just as valid, depending on your specific needs.

Recurrent workflow stops after a number of iterations

A workflow is started upon instantiation of the entity Hazaa. It waits for a while and then creates a new instance of Hazaa. After that, it's marked as succeeded and put to sleep.
I'd expect it to fire perpetually, creating a bunch of Hazaas. However, I only get 15 new ones before the procreation ceases. Together with the original one that I create manually to set off the chain, there are 16 instances in total. I've tested with longer delays (up to several hours) but the behavior is consistent.
That's for CRM Online. On-premise, the behavior is similar but limited to 8 instances in total.
According to the harvest of links I've found, there's a setting in CRM to control the number of iterations. The problem is that my solution will mainly be deployed for online customers, so unless I own the cloud, that's a show-stopper.
I understand it's CRM protecting itself against runaway recurrence. What can I do about it?
The best solution I can think of at the moment is to set up a super workflow that fires the sub-workflow 16 times. Then I'd need a super-super workflow, and so on. Nothing to brag about, in my view.
A CorrelationToken contains a counter and a one-hour "self-destruct" timer.
When the first workflow runs, a new CorrelationToken is created. The counter is set to 1 and the timer is set to one hour.
When the second workflow is started from the first workflow (even indirectly, such as in your case), this same CorrelationToken is used if its self-destruct timer has not already expired. If it has, a new CorrelationToken is created. If it hasn't, it increments the counter and resets the timer. Lather, rinse, repeat.
The second (and subsequent) workflows will only execute if the counter is 8 or less (on-premise) or 16 or less (CRM Online).
What this really means is that in practice, if your child workflows are executing sooner than one hour apart, the CorrelationToken never gets a chance to expire, which means eventually the counter increments past the limit. It does not mean that you can execute up to 8 (or 16) of these workflows every hour.
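As a toy model only (not CRM's actual implementation), the behavior described above amounts to something like the following, with the limit and expiry as stated in this answer:

```typescript
// A correlation token carries a depth counter and a one-hour expiry; a child
// workflow may run only while the counter stays at or below the platform limit.
interface CorrelationToken { depth: number; expiresAt: number; }

const ONE_HOUR_MS = 60 * 60 * 1000;
const MAX_DEPTH = 16; // 8 for on-premise, per the answer above

function nextToken(parent: CorrelationToken | null, now = Date.now()): CorrelationToken {
  if (!parent || now >= parent.expiresAt) {
    return { depth: 1, expiresAt: now + ONE_HOUR_MS }; // token expired: start over
  }
  return { depth: parent.depth + 1, expiresAt: now + ONE_HOUR_MS }; // inherit and bump
}

function mayExecute(token: CorrelationToken): boolean {
  return token.depth <= MAX_DEPTH;
}
```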
It sounds like you already figured most of this out, but I wanted to give other readers background. So, to answer your question: if your design includes looping workflows that are executed sooner than one hour apart, you will need to consider an alternate design. It will definitely involve an external process or service.
If I'm understanding you correctly, it sounds like you're creating an infinite loop, which is why CRM kills workflows like these, since otherwise they'd never end. On what condition would you stop making more Hazaa records? You could add a number field, increment that field on each new Hazaa, and stop the workflow when it reaches a certain number.

Returning LOTS of items from a MongoDB via Node.js

I'm returning A LOT of documents (500k+) from a MongoDB collection in Node.js. It's not for display on a website, but rather for some number crunching. If I grab ALL of those documents, the system freezes. Is there a better way to grab them all?
I'm thinking pagination might work?
Edit: This is already outside the main Node.js server event loop, so "the system freezes" does not mean "incoming requests are not being processed".
After learning more about your situation, I have some ideas:
Do as much as you can in a Map/Reduce function in Mongo; if you throw less data at Node, that might be the solution.
Perhaps this much data is eating all the memory on your system. Your "freeze" could be V8 stopping the system to do a garbage collection (see this SO question). You could use the V8 flag --trace-gc to log GCs and test this hypothesis (thanks to another SO answer about V8 and garbage collection).
Pagination, as you suggested, may help. Perhaps even split your data further into worker queues (create one worker task with references to records 1-10, another with references to records 11-20, etc.), depending on your calculation.
Perhaps pre-process your data, i.e. somehow return much smaller data for each record. Or don't use an ORM for this particular calculation, if you're using one now. Making sure each record contains only the data you need means less data to transfer and less memory your app needs.
I would put your big fetch+process task on a worker queue, background process, or forking mechanism (there are a lot of different options here).
That way you do your calculations outside of your main event loop and keep it free to process other requests. While you should be doing your Mongo lookup in a callback, the calculations themselves may take up time, thus "freezing" Node; you're not giving it a break to process other requests.
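As one concrete option, here is a minimal sketch (not the poster's code) of streaming the collection with a cursor from the official Node.js MongoDB driver instead of materializing 500k+ documents at once; the database, collection, and field names are made up:

```typescript
// A projection keeps each document small, and work is done per batch as the
// cursor streams, so memory stays bounded.
import { MongoClient } from "mongodb";

async function crunch(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  try {
    const cursor = client
      .db("analytics")
      .collection("records")
      .find({}, { projection: { value: 1 } }) // fetch only the fields you need
      .batchSize(1000);                       // driver pulls 1000 docs per round trip

    let sum = 0;
    for await (const doc of cursor) {
      sum += doc.value ?? 0;                  // per-document number crunching here
    }
    console.log("total:", sum);
  } finally {
    await client.close();
  }
}
```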
Since you don't need them all at the same time (that's what I've deduced from your asking about pagination), perhaps it's better to split those 500k documents into smaller chunks to be processed on the next tick?
You could also use something like Kue to queue the chunks and process them later (so not everything at the same time).
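A rough sketch of that chunking idea, with hypothetical names and chunk size, yielding back to the event loop between slices via setImmediate:

```typescript
// Process a large array in fixed-size slices and yield between slices so
// other requests are not starved while the number crunching runs.
function processInChunks<T>(
  items: T[],
  processChunk: (chunk: T[]) => void,
  chunkSize = 1000,
): Promise<void> {
  return new Promise((resolve) => {
    let offset = 0;
    const step = () => {
      processChunk(items.slice(offset, offset + chunkSize));
      offset += chunkSize;
      if (offset < items.length) {
        setImmediate(step); // give the event loop a chance to handle other work
      } else {
        resolve();
      }
    };
    step();
  });
}
```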
