I have an application where I am reading data from a CSV file at an interval of 500 ms.
The CSV file is changed every 300 ms by another desktop application.
So which one is better to use in this case: fs.watch or setInterval?
In this situation I'd go with fs.watch; it helps me build a more robust architecture.
Let's assume we are using timers (setTimeout/setInterval): we have to hardcode the delay, and if in the meantime the front application scales up and starts updating the CSV faster or slower, you will need to modify your code. With fs.watch you just don't care how often change events occur; your application will not need any changes.
The biggest issue I see at the moment with fs.watch is that if the front application updates the CSV so fast that a new event is dispatched before you have finished your import, you will have a hard time dealing with race conditions. But until that becomes a problem, fs.watch is a good call, in my opinion.
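To illustrate, here is a minimal sketch of the fs.watch approach with a simple guard against exactly that race: a change event that arrives while an import is still running is remembered and replayed once the current import finishes. The file path and the importCsv() function are stand-ins for your own reading logic.

const fs = require('fs');

const CSV_PATH = './data.csv';   // placeholder path
let importing = false;
let pendingChange = false;

fs.watch(CSV_PATH, (eventType) => {
  if (eventType !== 'change') return;
  if (importing) {               // an import is in flight: remember we need another pass
    pendingChange = true;
    return;
  }
  runImport();
});

function runImport() {
  importing = true;
  importCsv(CSV_PATH)            // your existing CSV-reading logic (assumed to return a promise)
    .catch(console.error)
    .finally(() => {
      importing = false;
      if (pendingChange) {       // the file changed again while we were importing
        pendingChange = false;
        runImport();
      }
    });
}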
I am trying to implement a Django web application (on Python 3.8.5) which allows a user to create “activities” where they define an activity duration and then set the activity status to “In progress”.
The POST action to the View writes the new status, the duration and the start time (the end time, based on the start time and duration, could of course also be added here).
The back-end should then keep track of the duration and automatically change the status to “Finished”.
User actions can also change the status to “Finished” before the calculated end time (i.e. the timer no longer needs to be tracked).
I am fairly new to Python, so I need some advice on the smartest way to implement such a concept.
It needs to be efficient and scalable – I’m currently using a Heroku Free account so have limited system resources, but efficiency would also be important for future production implementations of course.
I have looked at the Python threading Timer, and this seems to work on a basic level, but I’ve not been able to determine what kind of constraints this places on the system – e.g. whether the spawned Timer thread might prevent the main thread from finishing and releasing resources (i.e. Heroku Dyno threads), etc.
I have read that persistence might be a problem (if the server goes down), and I haven’t found a way to cancel the timer from another process (the .cancel() method seems to rely on having the original object to cancel, and I’m not sure if this is achievable from another process).
I was also wondering about a more “background” approach, i.e. a single process which is constantly checking the database looking for activity records which have reached their end time and swapping the status.
But what would be the best way of implementing such a server?
Is it practical to read the database every second to find records with an end time of “now”? I need the status to change in real-time when the end time is reached.
Is something like Celery a good option, or is it overkill for a single process like this?
As I said I’m fairly new to these technologies, so I may be missing other obvious solutions – please feel free to enlighten me!
Thanks in advance.
To achieve this you need some kind of task-scheduling functionality. For a quick, simpler implementation, using the Timer object from the threading module is a good solution.
A more complete solution is to use Celery. If you are new to it, digging into it will be a worthwhile investment: Celery works as a queue manager, distributing your work easily across several threads or processes.
You mentioned that you want it to be efficient and scalable, so I guess you will eventually want to implement similar functionality that requires multiprocessing and scheduling; for that reason my recommendation is to use Celery.
You can integrate it into your Django application easily by following the documentation: Integrate Django with Celery.
I'm currently monitoring files in Node.js using fs.watch. The problem I have is, for example, let's say I copy a 1 GB file into a folder I'm watching. The moment the file starts copying I get a notification about it, and if I start reading it immediately I end up with bad data since the file has not finished copying. For example, a zip file has its table of contents at the end, but I'd end up reading the file before its table of contents has been written.
Off the top of my head, I could set up some task to call fs.stat on the file every N seconds and only try to read it when the stats stop changing. That would work, but it seems less than ideal: I'd like my app to be as responsive as possible, and calling stat on a bunch of files every second seems heavy, while calling stat only every 5 or 10 seconds seems unresponsive.
Is there some more robust way to get notified when a file has finished being modified?
So I did a project last year which required doing "file watching". There is a better library out there than fs.watch. Check out npm chokidar.
https://www.npmjs.com/package/chokidar
Under the hood it uses fs.watch, but it wraps a number of improvements around it.
There is an option called awaitWriteFinish. It essentially does some polling on the file to determine whether or not the file has finished being written. I used it and it really works great.
Setting this option lets you work against the file while always ensuring it has been completely written, and you don't need to go off and implement your own method of determining whether the file is complete. That should save a bunch of time.
Aside from that, I don't believe you can really get away from polling when it comes to determining whether a file is finished being written. Chokidar is still polling; it's just that you don't need to write that logic yourself, and you can configure the polling interval if CPU utilization turns out to be too high.
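For reference, here is roughly what that setup looks like; the watched path and the two threshold values below are just illustrative.

const chokidar = require('chokidar');

const watcher = chokidar.watch('/path/to/watched/folder', {
  awaitWriteFinish: {
    stabilityThreshold: 2000, // file size must stay stable this long (ms) before events fire
    pollInterval: 100         // how often (ms) to re-check the file while it is still growing
  }
});

watcher.on('add', (filePath) => {
  // By the time this fires, chokidar considers the file fully written,
  // so it should be safe to read it (e.g. a zip's table of contents is in place).
  console.log('finished writing:', filePath);
});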
Edit: I would also like to add: just give it a shot and see how it works. I get that you want it as responsive as possible, but having something working is better than having something not working at all. It might be that even with a polling solution this isn't an issue for you at all. If it turns out to be a performance problem, address it at that point and seek a "better" solution.
I am monitoring multiple log files across multiple directories. I need to trigger an SSIS package when a file fires an onchange event. Easy enough, but the complication is that I don't want to trigger the SSIS package every time there is a change to the file; I want to wait and capture at least 5 minutes' worth of changes to a specific file.
Having used FileSystemWatcher before, I know it raises each onchange event on a new thread. My thought is to pass these events into a TPL block and have it wait for a specified time interval before triggering an SSIS package; basically, trigger the related SSIS package every 5 minutes if there have been file change events.
If anyone could point me in the right direction as a starting point I would greatly appreciate it!
I can't tell from your question whether TPL Dataflow is a requirement or just an idea.
I'd probably keep it simple and avoid TPL dataflow. Just poll using a System.Threading.Timer and have it check System.IO.File.GetLastWriteTime on the file.
If you want to get fancy you could use Rx, convert the FileSystemWatcher events into an Observable, and use the Buffer(TimeSpan) method.
TPL Dataflow doesn't have any intrinsic support for time windows; you'd have to roll your own, probably using one of the two aforementioned approaches to build it out. My experience with TPL Dataflow is that it's too big and cumbersome for small tasks and too rudimentary for big tasks, so I'd avoid taking that approach.
I'm working on an application that processes text files (in tab-separated form, possibly large, reaching one or two million lines) containing details of items, and since the processing time can be long I want to update a progress bar so the user knows the application didn't just hang, or better, to give an idea of the remaining time.
I've already done some research and I know how to update a simple progress bar, but the examples tend to be simplistic, e.g. calling something like progressBar.setProgress(counter++, 100) from a Timer, and in other examples the logic is simple and written in the same class. I'm also new to the language, having done mostly Java and some JavaScript in the past, among others.
I wrote the logic for processing the file (validating the input and creating the output files). But if I call the processing logic from the main class, the bar is only updated at the end of processing (flying from 0 to 100 in an instant) no matter whether I update variables and try to dispatch events or the like; the bar won't reflect the processing progress.
Would processing the input in chunks be a valid approach? I'm not sure whether the processing delay of one data chunk would affect the processing of the next chunk and so on, because the timer tick is set to 1 millisecond and processing a chunk takes longer than that. I also wonder whether the order of the input could be affected or the result corrupted in some way. I've read that multithreading is not supported in the language, so should that be a concern?
I already coded the logic described before and it seems to work:
// called by a mouse click event
function processInput():void {
    timer = new Timer(1);
    timer.addEventListener(TimerEvent.TIMER, processChunk);
    timer.start();
}

function processChunk(event:TimerEvent):void {
    // code to calculate the start and end index for the data chunk;
    // every time processChunk is executed these indexes are updated
    var dataChunk:Array = wholeInputArray.slice(index0, index1); // slice (not splice) leaves wholeInputArray intact, so its length stays valid as the progress total
    processorObj.processChunk(dataChunk);
    progressBar.setProgress(index0, wholeInputArray.length);
    progressBar.label = index0 + " processed items";
    if (index1 >= wholeInputArray.length) { // no more data to process
        timer.stop();
        progressBar.setProgress(wholeInputArray.length, wholeInputArray.length);
        progressBar.label = "Processing done";
        // do post-processing here: show results, etc.
    }
}
The declaration for the progress bar is as follows:
<mx:ProgressBar id="progressBar" x="23" y="357" width="411" direction="right"
labelPlacement="center" mode="manual" indeterminate="false" />
I tested it with an input of 50,000 lines and it seems to work, generating the same result as the other approach that processes the input all at once. But is that a valid approach, or is there a better one?
Thanks in advance.
Your solution is good; I use it most of the time.
But multithreading is now supported in AS3 (for desktop and web only, for the moment).
Have a look at the Worker documentation and the Worker example.
Hope that helps :)
May I ask whether this Timer, as is, is the Timer you are actually using? Because if yes, you are in for a lot of trouble with your application in the long run regarding loading and getting the Timer to stop, close, etc. The event listener handling is incomplete and would give you problems for sure!
I would recommend getting this right first before going further. I know from experience: in some of my own AIR applications I need to have several hundred of them running one after another in modules, as well as in some of my web apps (not quite so intensive, but still a few).
I'm sure a smoother execution will be the reward! Regards, aktell
Use Workers. Splitting data into chunks and then processing them is a valid but quite cumbersome approach, whereas with Workers you can simply spawn a background worker, do all the parsing there, and return a result, all without blocking the GUI. The Worker approach should also take less time to do the parsing, because there is no need to stop the parser and wait for the next frame.
Workers would be an ideal solution, but quite complicated to set up. If you're not up to it right now, here's a PseudoThread solution I use in similar situations which you can probably get up and running in 5 minutes:
Pseudo Threads
It uses EnterFrame events to balance between doing work and letting the UI do its thing, and you can manually update the progress bar within your 'thread' code. I think it would be easily adapted to your needs since your data is easily sliced.
Without using Workers (which it seems you are not yet familiar with), AS3 behaves single-threaded. Your timers will not overlap: if one of your chunks takes longer than the timer interval to complete, the next timer event will simply be processed when it can be. It will not queue up further events while your (blocking) processing code is running.
The previous answers show the "correct" solution to this, but this might get you where you need to be faster.
I'm returning A LOT (500k+) of documents from a MongoDB collection in Node.js. It's not for display on a website, but rather for some number crunching on the data. If I grab ALL of those documents, the system freezes. Is there a better way to grab them all?
I'm thinking pagination might work?
Edit: This is already outside the main node.js server event loop, so "the system freezes" does not mean "incoming requests are not being processed"
After learning more about your situation, I have some ideas:
Do as much as you can in a Map/Reduce function in Mongo; perhaps if you throw less data at Node, that might be the solution.
Perhaps this much data is eating all the memory on your system. Your "freeze" could be V8 stopping everything to do a garbage collection (see this SO question). You could use the V8 flag --trace-gc to log GCs and test this hypothesis (thanks to another SO answer about V8 and garbage collection).
Pagination, like you suggested, may help. Perhaps even splitting your data up further into worker queues (create one worker task with references to records 1-10, another with references to records 11-20, etc.), depending on your calculation.
Perhaps pre-process your data, i.e. somehow return much smaller data for each record, or don't use an ORM for this particular calculation if you're using one now. Making sure each record contains only the data you need means less data to transfer and less memory your app needs.
I would put your big fetch-and-process task on a worker queue, background process, or forking mechanism (there are a lot of different options here).
That way you do your calculations outside of your main event loop and keep it free to process other requests. While you should be doing your Mongo lookup in a callback, the calculations themselves may take up time, thus "freezing" Node, because you're not giving it a break to process other requests.
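As a very rough sketch of the forking idea (the file names, the message shape, and the fetchAndCrunch() function are placeholders, not a prescribed layout):

// main.js: keep the event loop free by pushing the heavy job into a child process
const { fork } = require('child_process');

function runCrunchJob(query) {
  return new Promise((resolve, reject) => {
    const worker = fork('./crunch-worker.js');
    worker.once('message', (result) => {
      resolve(result);          // only the small, crunched result crosses back
      worker.kill();
    });
    worker.once('error', reject);
    worker.send({ query });     // tell the child what to fetch
  });
}

// crunch-worker.js: does the Mongo lookup and the number crunching
// process.on('message', async ({ query }) => {
//   const summary = await fetchAndCrunch(query);  // your 500k-document job lives here
//   process.send(summary);
// });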
Since you don't need them all at the same time (that's what I've deduced from your asking about pagination), perhaps it's better to split those 500k documents into smaller chunks and process each chunk on the next tick?
You could also use something like Kue to queue the chunks and process them later (so that not everything happens at the same time).
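Something along these lines, as a sketch. Note that I'm using setImmediate rather than process.nextTick so pending I/O callbacks get a turn between batches (nextTick callbacks run before I/O, so a long recursive nextTick chain can starve it); the batch size and the crunch() callback are illustrative placeholders.

// Process a large array in batches, yielding to the event loop between batches.
function processInChunks(records, crunch, done, batchSize = 1000) {
  let i = 0;
  function nextBatch() {
    const end = Math.min(i + batchSize, records.length);
    for (; i < end; i++) {
      crunch(records[i]);        // per-record number crunching
    }
    if (i < records.length) {
      setImmediate(nextBatch);   // yield so other requests/I/O can be handled
    } else if (done) {
      done();
    }
  }
  nextBatch();
}

// usage: processInChunks(allDocs, doMath, () => console.log('done'));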