I'd like my bash script to perform an action every time a new file is downloaded to ~/Downloads (generate a hash of the downloaded file and send it to an API). So far I've been trying to make use of inotify-tools, but it only fires for newly created files, and that won't do.
The script should work like this:
I download a file via the browser (the normal way)
The script notices the new file and runs automatically
Thanks in advance for help :D
You can use /etc/crontab to check the ~/Downloads folder at startup and every n minutes. The script that runs every nth minute can do either of the following:
Keep a count of the files. If the count decreases, the script updates its cache. If the count increases, it takes the most recently created (or modified) file and sends that file's hash to the API via curl.
Keep the names of the files. If a file no longer exists, the script updates its cache of file names. If a new file appears, it hashes the file and sends the hash to the API via curl.
You can keep the cache of files under /tmp.
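A minimal sketch of the second approach, run from cron (assuming sha256sum and curl are available; the API endpoint is a placeholder):

    #!/usr/bin/env bash
    # Run from /etc/crontab, e.g.:  */5 * * * *  user  /usr/local/bin/watch-downloads.sh
    WATCH_DIR="$HOME/Downloads"
    CACHE=/tmp/downloads.cache
    API_URL="https://example.com/api/hashes"   # placeholder endpoint

    touch "$CACHE"
    for f in "$WATCH_DIR"/*; do
        [ -f "$f" ] || continue
        # Hash and report any file we have not seen before.
        if ! grep -Fxq "$f" "$CACHE"; then
            hash=$(sha256sum "$f" | awk '{print $1}')
            curl -s --data "hash=$hash" "$API_URL"
        fi
    done
    # Refresh the cache so deleted files drop out of it.
    ls -d "$WATCH_DIR"/* > "$CACHE" 2>/dev/null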
If you can provide an example scenario, I can flesh this sketch out further.
I'm talking to a server that creates a new zip file daily, e.g. data-1234.zip. Every day the previous zip is removed and a new one is created with an incremented number, e.g. data-1235.zip. The script will be run sporadically throughout the week, but it's on a lab system where the user can't manually update the name to match what's on the server.
The server only has one zip file in that directory, so it's just a matter of getting the naming right. There is, however, a data.ini file in the folder as well, so simply grabbing the first file by name wouldn't necessarily work. I've seen posts similar to this question using regex, but the number is currently at 10,609 and I'd rather not use expansion over potentially thousands of candidate names, depending on access to modify the script in the coming years. I've been searching for something similar to "data-*.zip" but haven't had any luck.
The question was solved by switching commands and running:

    lftp https://download.companyname.com/product/data/ -e "mget data-*.zip; bye"

since lftp allows wildcards in the filename, unlike curl.
I am currently building a little API which returns JSON according to an input. The API needs to run some local programs on the server and also needs to place some temporary files. It all works if I query the API one request at a time, with just one "user".
The problem is that I only have one temp folder to store the temporary files, so whenever there are multiple API queries, the tmp folder gets messed up: the data in there mixes up.
What would be a good way to have an API using temp files - and still keep it working if every run needs its own temp data?
The current process is:
server/api/getGeometry?lat=52.5167776&lon=13.4092091&bboxSize=2000&output=glb
server queries zips according to lat/lon
unpacks
converts
does voodoo
generates json
sends back the json
cleans up the tmp folder
next run same story...
So it's the same tmp folder every time, which I guess is not the way to go.
Thanks a lot for ideas!
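One way to keep runs from stepping on each other is to give every request its own scratch directory with mktemp -d and remove it when the run finishes. A minimal sketch of the processing step (paths and steps are placeholders):

    #!/usr/bin/env bash
    # Sketch: per-request scratch space so concurrent API calls cannot collide.
    workdir=$(mktemp -d)               # unique directory, e.g. /tmp/tmp.Xf8a2Q
    trap 'rm -rf "$workdir"' EXIT      # cleanup runs even if a step fails

    cd "$workdir" || exit 1
    # ... unzip, convert, do the voodoo, and generate the json here ...

Because every run gets a fresh directory, concurrent queries can no longer mix up each other's data, and the trap handles the per-run cleanup.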
When I run the command in the terminal back to back, it doesn't sync the second time. Which is great! It shouldn't. But, if I run my build process and run aws s3 sync programmatically, back to back, it syncs all the files both times, as if my build process is changing something differently the second time.
Can't figure out what might be happening. Any ideas?
My build process is basically pug source/ --out static-site/ and stylus -c styles/ --out static-site/styles/
According to this: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
S3 sync compares the size of the file and the last modified timestamp to see if a file needs to be synced.
In your case, I'd suspect the build system is producing a newer timestamp even though the file size hasn't changed?
AWS CLI sync:
A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.
--size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.
You want the --size-only option, which looks only at the file size, not the last modified date. This is perfect for an asset build system that changes the last modified date frequently but not the actual contents of the files (I ran into this with webpack builds, where things like fonts kept syncing even though the file contents were identical). If your build method doesn't incorporate a hash of the contents into the filename, it is possible to run into problems (the build emits a file of the same size but with different contents, and the change is missed), so watch out for that.
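For example (the bucket name is a placeholder):

    # Compare by size only, ignoring the fresh timestamps the build produces
    aws s3 sync static-site/ s3://your-bucket/ --size-only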
I did manually test adding a new file that wasn't on the remote bucket and it is indeed added to the remote bucket with --size-only.
This thread is a bit dated, but I'll contribute nonetheless for folks arriving here via Google.
I agree with the accepted answer. To add context: AWS S3 sync differs from standard Linux tools (rsync, for example) in a number of ways. On Linux, an md5 hash can be computed to determine whether a file has changed. S3 sync does not do this, so it can only decide based on size and/or timestamp. What's worse, AWS does not preserve timestamps when transferring in either direction, so the timestamp is ignored when syncing to local and only used when syncing to S3.
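For contrast, this is the kind of content-based check Linux tooling can do (rsync shown here; the host and paths are placeholders, and s3 sync has no equivalent):

    # -c makes rsync compare checksums of file contents
    # instead of the default size + modification-time check
    rsync -avc static-site/ backup-host:/srv/static-site/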
I have a process that periodically gets files from a server and copies them with SFTP to a local directory. It should not overwrite a file that already exists. I know that with something like Winston I can automatically rotate a log file when it fills up, but in this case I need similar functionality to rotate files if they already exist.
An example:
The routine copies a remote file called testfile.txt to a local directory. The next time it runs, the same remote file is found and copied. But now I want the first testfile.txt renamed to testfile.txt.0 so it's not overwritten. And so on: after a while I'd have a directory of files named testfile.txt.N, plus the most recent testfile.txt.
What you can do is append the date and time to the file name, which gives every file a unique name and also helps you archive it.
For example, your test.txt can become either 20170202_181921_test.txt or test_20170202_181921.txt.
You can use a JavaScript Date Object to get date and time.
P.S. Show your code for downloading the files so that I can add more to it.
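If you do want the exact testfile.txt.N rotation described in the question rather than timestamped names, here is a shell sketch of the idea (the same loop translates directly to Node's fs.renameSync):

    #!/usr/bin/env bash
    # Sketch: rotate an existing file to file.N before writing a new copy.
    rotate() {
        local file=$1 n=0
        [ -e "$file" ] || return 0      # nothing to rotate
        while [ -e "$file.$n" ]; do     # find the first unused suffix
            n=$((n + 1))
        done
        mv -- "$file" "$file.$n"
    }

    rotate testfile.txt                 # testfile.txt -> testfile.txt.0, .1, ...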
I have one file delivered by FTP daily. The file doesn't have the same name every day; it carries the date and time of its creation. For example, today the file is named 20130814_XX_YY_20130814152345, created at 15:23:45, and tomorrow the file might be named 20130815_XX_YY_20130815152421. The _XX_YY_ part is always the same, but the time changes every day.
I want to create host JCL that gets this file with its variable name and renames it to a host file. How can I do this?
Thank you
Regards
Chuchito
STEP1: You can use LS in FTP to write the directory listing to disk, so you end up with a file containing the file name. Then GET that file.
STEP2: Process the contents of that file to generate the FTP control cards (at least for the GET). The generated GET will be of the form GET 20130814_XX_YY_20130814152345 'HLQ.MAINFRAM.DATASET', where the server-side name comes from the file retrieved in STEP1 and the local (mainframe) file can be hard-coded, or supplied to the generation step if flexibility is required.
STEP3: Run FTP again with the Control Card(s) generated.
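As a rough sketch of STEP1 (all names and credentials below are placeholders, not from the question; z/OS batch FTP takes its subcommands from the INPUT DD, and the (disk option of ls writes the listing to a local data set such as userid.FTP.LISTING):

    //FTPLIST EXEC PGM=FTP,PARM='(EXIT'
    //OUTPUT  DD SYSOUT=*
    //INPUT   DD *
    ftp.example.com
    ftpuser
    ftppass
    ls (disk
    quit
    /*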
Isn't there anything in the Spec?
Sometimes we create complexities where an "out of the box" solution simplifies life considerably.
After the post was updated, I now understand the problem a bit better.
If the name is required to be so specific, then the other suggested solution (if I understand it) is to have a fixed file name on the server that contains a list of file names to be uploaded.
In fact, the server could create a fixed file name that is really the JCL to run on the mainframe! This file would include the //SYSIN DD * and GET commands. The mainframe fetches this file and submits it as-is to the job reader, and it then runs on the mainframe. The last step of this job (created by the server, but run on the mainframe) is to FTP an empty file back to the server; this way the server "knows" that the mainframe has picked up the files.
Alternatively, why does the non-z/OS system need to put time information in the file name? If the mainframe processes the file daily, then the date should be sufficient.
With this change the mainframe can reliably predict the file name for the day, generate the appropriate GET command and run.
With a job scheduler it would be easy to run the upload to the mainframe twice a day. This might address any concerns that are expressed in the desire to include a time in the file's name.
Run a Rexx step via a Background TSO step:
You can then run a LISTCAT to get all the files. You could either write the LISTCAT output to a file and read it in, or trap the output via the Address command or the OutTrap function.
Then use the standard TSO Rename command.
Alternatively, you could run the Rexx program under background ISPF and use the ISPF equivalents to get the file name.
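A Rexx sketch of the OutTrap route (the HLQ.INBOUND level and the new data set name are placeholders, and the LISTCAT parsing is deliberately crude):

    /* REXX - sketch: trap a LISTCAT, find the daily file, rename it */
    call Outtrap 'line.'
    "LISTCAT LEVEL('HLQ.INBOUND')"
    call Outtrap 'off'
    do i = 1 to line.0
      if word(line.i, 1) = 'NONVSAM' then do
        dsn = word(line.i, 3)                /* data set name column */
        "RENAME '"dsn"' 'HLQ.DAILY.INPUT'"   /* standard TSO RENAME  */
      end
    end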
(1) The real solution to this should be through a scheduling tool for Mainframe jobs. These tools provide capabilities to take care of formatting like the one you described.
(2) Alternatives: REXX and COBOL
(3) If you'd rather not use REXX, here's a quick look at how you could create the JCL dynamically using COBOL:
A COBOL program that would read a "template" JCL.
Using INSPECT ... REPLACING, you could substitute the placeholders with a string populated with the date of your choice (you could supply this as a simple SYSIN parm, too, if you want the COBOL code to be flexible about date selection)
Now that your formatted JCL is ready, you could write it to the output stream:
    //OUTFILE DD SYSOUT=(A,INTRDR)

or

    //OUTFILE DD SYSOUT=(,INTRDR)

(INTRDR is the writer name and belongs in the second position; the first position is the SYSOUT class.) Anything written to INTRDR (the internal reader) goes straight to JES, which submits your job!
Hope this helps.