I have a VBA script that extracts information from huge text files and does a lot of data manipulation and calculations on the extracted info. I have about 1,000 files and each one takes an hour to finish.
I would like to run the script on as many computers as possible (among others, EC2 instances) to reduce the time needed to finish the job. But how do I coordinate the work?
I have tried two ways. First, I set up a Dropbox folder as a network drive containing a single txt file with the number of the last job started; VBA reads it, starts the next job and updates the number. But there is apparently too much lag between an update to the file on one computer and it propagating to the rest for this to be practical. Second, I looked for a simple "private" counter service online that increments on each visit, so each machine would load the page, read the number, and the page would update the number for the next visit from another computer. But I have found no such service.
Any suggestions on how to coordinate such tasks between different computers in VBA?
First of all, if you can, use a proper programming language, for example C#, which makes parallel processing easy.
If you must use VBA, then first optimize your code. Can you show us the code?
Edit:
If you must stick with VBA, you could do the following. First, you need some sort of file server that stores all the text files in one folder.
Then, in the macro, loop over every .txt file in that folder:
try to open the file in exclusive mode; if it can be opened, run your code and, once it is finished, move the file elsewhere; if it cannot be opened, skip it and move on to the next .txt file.
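The idea above isn't specific to VBA; here is a minimal sketch of the same claim-a-file-exclusively pattern in Python, assuming a shared folder with made-up subfolder names ("claimed", "done") and a hypothetical process_file routine standing in for the actual parsing code:

```python
import os
import shutil

SHARE = r"\\fileserver\jobs"              # hypothetical shared folder with the .txt files
CLAIMED = os.path.join(SHARE, "claimed")  # work-in-progress folder
DONE = os.path.join(SHARE, "done")        # finished jobs end up here

def process_file(path):
    pass  # placeholder for the actual extraction and calculations

def claim_and_process():
    for name in os.listdir(SHARE):
        if not name.lower().endswith(".txt"):
            continue
        src = os.path.join(SHARE, name)
        claimed = os.path.join(CLAIMED, name)
        try:
            # The rename acts as the "exclusive open": only one worker can move
            # a given file, so whichever machine succeeds owns that job.
            os.rename(src, claimed)
        except OSError:
            continue  # another machine claimed it first, try the next file
        process_file(claimed)
        shutil.move(claimed, os.path.join(DONE, name))

if __name__ == "__main__":
    claim_and_process()
```

Claiming whole files this way sidesteps the synchronisation-lag problem with a shared counter, because the file server itself arbitrates which machine wins each job.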
I want to reduce the size of the hidden AppData sub-folders by making adjustments in the applications with large data folders, as I am falling short of space on drive C. To do this I searched the Microsoft community and Google. Some VBA code routines do not work because a "Permission denied" message comes up. I have already done disk cleaning of the drive using the built-in Windows 10 feature. I just want a list of sub-folders and files along with their sizes so I can take remedial steps; the manual process is very time-consuming. Is there a VBA routine or function which takes due consideration of permissions on hidden folders?
Can anyone help in this regard?
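Not VBA, but to illustrate the "walk the folders, skip what you can't read" approach, here is a small Python sketch. It assumes the usual %LOCALAPPDATA% location and simply ignores anything that raises a permission error:

```python
import os

def folder_sizes(root):
    """Return {subfolder: total bytes}, skipping entries that cannot be read."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        # onerror swallows "permission denied" on folders instead of aborting the walk
        for dirpath, _dirnames, filenames in os.walk(entry.path, onerror=lambda e: None):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # locked or permission-denied file: ignore and keep going
        sizes[entry.path] = total
    return sizes

appdata = os.path.expandvars(r"%LOCALAPPDATA%")  # hidden AppData\Local folder on Windows
for path, size in sorted(folder_sizes(appdata).items(), key=lambda kv: -kv[1]):
    print(f"{size / 1024**2:10.1f} MB  {path}")
```

The same pattern translates to VBA with FileSystemObject and On Error Resume Next around the folders that deny access.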
I've asked this question in the Azure Logic Apps (LA) forum since I've used LA to implement this process but I may also get some valuable input here.
High-level description: for one specific client, we need to download, daily, dozens of files from an SFTP location to our servers in order to process their data. This workflow was built in the past using tools from a different technology than Azure, but what we aimed to have was a general process that could be used for different source systems, different files, etc. With that in mind, at the beginning of each execution our process retrieves from a database the variables to be applied to that run, such as:
Business date
Remote location path - sftp location
Local location path - internal server location
File extension - .csv, .zip, etc
Number of iterations
Wait time between iterations
Dated files - whether files have business date on their name or not
Once all this is defined at the beginning of the process (there's some extra logic to it, not as straightforward as just getting variables, but let's assume this for example purposes), the following logic is applied (the image below may help understand the LA flow, and a rough sketch of the loop follows it):
Search for file in SFTP location
If the file is there, get its size, wait X amount of time and check the size again.
If the file isn't there, try again until the maximum number of iterations is reached or the file is found.
If the file sizes match, proceed to download the file.
If the file sizes don't match, try again until the sizes match or the maximum number of iterations is reached.
(LA Flow diagram)
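Not Logic Apps itself, but the retry loop above is easy to sketch in code. Here is a minimal Python version using the paramiko library, with placeholder host, credentials and paths, implementing the same wait-for-a-stable-file-size idea:

```python
import time
import paramiko  # third-party SSH/SFTP library

def download_when_stable(host, user, password, remote_path, local_path,
                         max_iterations, wait_seconds):
    """Poll the SFTP location until the file exists and its size stops changing, then download it."""
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=user, password=password)
    sftp = ssh.open_sftp()
    try:
        for _ in range(max_iterations):
            try:
                size_before = sftp.stat(remote_path).st_size
            except OSError:
                time.sleep(wait_seconds)   # file not there yet: wait and try again
                continue
            time.sleep(wait_seconds)
            size_after = sftp.stat(remote_path).st_size
            if size_before == size_after:  # size is stable, the upload has finished
                sftp.get(remote_path, local_path)
                return True
            # size still changing: loop again until max_iterations is reached
        return False
    finally:
        sftp.close()
        ssh.close()
```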
In our LA this process is implemented and working fine. We have two parameters in the LA, the filename and the source system, and based on these all the variables are retrieved at the beginning of the process. Those two parameters can be changed from LA to LA, and by scripting we can automatically deploy multiple LAs (one for each file we need to download). The process uses a schedule trigger since we want to run it at a specific time each day; we don't want to trigger on a file being placed in the remote location, since several files we aren't interested in may be placed there.
One limitation I can see compared to our current process is that the multiple LAs aren't grouped under one pipeline where we could group multiple executions and check the state of them all without needing to check each LA. I'm aware that we can monitor LAs with OMS and, potentially, call multiple LAs from a Data Factory pipeline, but I'm not exactly sure how that would work in this case.
Anyway, here is where my QUESTION comes in: what would be the best feature in Azure to implement this type of process? LA works, since I already have it built; I'm going to look at replicating the same process in Data Factory, but I'm afraid it may be a bit more complicated to set up this kind of logic there. What else could potentially be used? I'm really open to all kinds of suggestions; I just want to make sure I consider all valid options, which is hard considering how many different features Azure offers and how hard it is to keep track of them all.
Appreciate any input, cheers
I developed a VB.NET program that uses Excel files to generate some reports.
Because the program takes a long time to generate a report, I usually do other things while it is running. The problem is that sometimes I need to open other Excel files, and the Excel files being used by the program are then shown to me. I want the files being processed to stay hidden even when I open other Excel files. Is this possible? Thanks
The FileSystem.Lock Method controls access by other processes to all or part of a file opened by using the Open function.
The My feature gives you better productivity and performance in file I/O operations than Lock and Unlock. For more information, see FileSystem.
I have a class that does some parsing of two large CSV files (~90K rows, 11 columns in the first and around ~20K rows, 5 columns in the second). According to the specification I'm working with, the CSV files can be changed externally (removing/adding rows; the columns remain constant, as do the paths). Such updates can happen at any time (though it is highly unlikely that updates will arrive less than a couple of minutes apart), and an update of either of the two files has to terminate the current processing of all that data (CSV, XML from an HTTP GET request, UDP telegrams), followed by re-parsing the content of each of the two files (or just one if only one has changed).
I keep the CSV data (quite reduced, since I apply multiple filters to remove unwanted entries) in memory to speed up working with it and also to avoid unnecessary I/O operations (opening, reading, closing the file).
Right now I'm looking into QFileSystemWatcher, which seems to be exactly what I need. However, I'm unable to find any information on how it actually works internally.
Since all I need is to monitor two files for changes, the number of files shouldn't be an issue. Do I need to run it in a separate thread (since the watcher is part of the same class where the CSV parsing happens), or is it safe to say that it can run without too much fuss (that is, it works asynchronously like QNetworkAccessManager)? My dev environment for now is a 64-bit Ubuntu VM (VirtualBox) on a relatively powerful host (an HP Z240 workstation), but the target system is an embedded one. While the whole parsing of the CSV files takes 2-3 seconds at most, I don't know how much of a performance impact there will be once the application is deployed, so additional overhead is a concern of mine.
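For what it's worth, QFileSystemWatcher does not need its own thread: it delivers its fileChanged/directoryChanged signals through the event loop of the thread that owns it, much like QNetworkAccessManager. A tiny sketch of the pattern, written against Qt's Python bindings (PySide6) since the C++ API has the same shape, with hypothetical file paths:

```python
# Minimal sketch of the QFileSystemWatcher pattern (PySide6); the C++ API looks the same.
import sys
from PySide6.QtCore import QCoreApplication, QFileSystemWatcher

app = QCoreApplication(sys.argv)

# Hypothetical paths to the two CSV files being monitored.
watcher = QFileSystemWatcher(["/data/first.csv", "/data/second.csv"])

def on_file_changed(path):
    # Cancel the current processing and re-parse only the file that changed.
    print(f"{path} changed, re-parsing")

watcher.fileChanged.connect(on_file_changed)
sys.exit(app.exec())
```

One caveat: programs that replace a file by writing a temporary copy and renaming it over the original can make the watcher stop tracking the path, so re-adding the path inside the slot is a common workaround.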
Hey! I want to know the best solution for my problem.
I have a signature generator http://www.anitard.org/siggen/siggen_stripes/ where people can upload their own images for the signature. The problem is that my storage will fill up pretty fast if I don't have a script that deletes the images once people are done with the signature.
What is the best solution for this?
My initial feeling on this would be to not save the uploaded files at all, but to just delete them as soon as the image is generated. However, some browsers might request the image again when the user tries to save the image -- I know this is true with Firefox's DownloadThemAll extension, for instance. So you'll probably have to store the files for a short amount of time, like #JustLoren suggests.
A quick Google search for "php delete temp files" turns up at least one script explaining exactly how to delete files after a certain amount of time. This would not have to be run as an external script or cron job; it could merely be tacked on to the upload script, for instance.
One flaw in the given script is that someone could rapidly upload many files in a row, exceeding your disk quota. You might want to expand on the linked script by also deleting any files beyond the newest 50, or however many. To do that, just count the matched files, sort them by creation time, and delete any with an index greater than 50.
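The site in question is PHP, but the cleanup logic itself is simple; here is a rough Python sketch of the combined rule described above (delete anything older than an hour and never keep more than the newest 50 files), with a made-up upload folder path:

```python
import os
import time

UPLOAD_DIR = "/var/www/siggen/uploads"  # hypothetical upload folder
MAX_AGE_SECONDS = 3600                  # anything older than an hour can go
KEEP_NEWEST = 50                        # and never keep more than 50 files

def clean_uploads():
    files = [os.path.join(UPLOAD_DIR, name) for name in os.listdir(UPLOAD_DIR)]
    files = [path for path in files if os.path.isfile(path)]
    files.sort(key=os.path.getmtime, reverse=True)  # newest first
    now = time.time()
    for index, path in enumerate(files):
        if index >= KEEP_NEWEST or now - os.path.getmtime(path) > MAX_AGE_SECONDS:
            os.remove(path)

# Call this at the end of the upload script so the cleanup runs without a cron job.
clean_uploads()
```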
Personally, I would have a script that runs every hour (or day, depending on volume), checks each file's creation date and deletes it if it is more than an hour old. Realistically, users should save their images to their hard drives within 2 minutes of creating them, but you can't count on that. An hour seems like a nice compromise.