I am currently making a little API which returns JSON according to an input. This API needs to run some local programs on the server and also needs to place some temporary files. It all works if I query the API one request at a time, with just one "user".
The problem is that I only have one temp folder to store the temporary files, so whenever there are multiple API queries at the same time, the tmp folder gets messed up - the data in there mixes up.
What would be a good way to have an API using temp files - and still keep it working if every run needs its own temp data?
The current process is:
server/api/getGeometry?lat=52.5167776&lon=13.4092091&bboxSize=2000&output=glb
server queries zips according to lat/lon
unpacks
converts
does voodoo
generates json
sends back the json
cleans up the tmp folder
next run same story...
So it's the same tmp folder every time, which I guess is not the way to go.
Thanks a lot for ideas!
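For what it's worth, a common pattern for this is to give every request its own scratch directory and remove it when the request finishes. Below is a minimal Python sketch of that idea; the handler name and the steps inside it are placeholders, not the actual server code:

import tempfile

def handle_get_geometry(lat, lon, bbox_size, output):
    # Each request gets its own isolated directory under the system temp
    # folder; it is deleted automatically when the with-block ends.
    with tempfile.TemporaryDirectory(prefix="getGeometry_") as workdir:
        # ... download the zips for lat/lon into workdir,
        #     unpack, convert, do the voodoo, generate the JSON ...
        return {"lat": lat, "lon": lon, "bboxSize": bbox_size, "output": output}

Concurrent requests never see each other's files because every run works inside its own directory, and cleanup happens even if a step fails.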
I am trying to copy files out of an S3 bucket using Azure Data Factory. First I want a list of the directories.
Using the CLI I would use aws s3 ls.
From there I can work through the list in a ForEach and push each entry into a variable.
In ADF, I have tried to use Get Metadata, and although this works in theory, in practice there are 76 files in each directory, so the loop runs over 1.5 million items. This just isn't worth it - it takes far too long, especially as listing just the directories only takes about 20 seconds for 20,000 directories.
Is there a method to get just this list? When creating the dataset at the bucket level we get a "no permissions" error, but when we use a specific location it works.
Many thanks
I have found another way of completing this task.
So to begin with, I am using Get Metadata with the Child Items option. It produces an array.
I push this into a string variable. With this variable you can then create a stored procedure to pick it apart, using OPENJSON to get just the values. These can then be pulled apart further to get the directory names.
I then merge these into a table.
Using a Lookup activity I can then run another stored procedure to return the value I require from the table. This whole process runs in a couple of minutes.
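Outside ADF, the extraction step is easier to see in plain Python; the JSON below mirrors the shape of Get Metadata's childItems output, and the values are only illustrative:

import json

# Example of the string that ends up in the ADF variable: childItems is an
# array of {"name": ..., "type": ...} objects.
child_items = '[{"name": "dir-a", "type": "Folder"}, {"name": "data.csv", "type": "File"}]'

# The OPENJSON call in the stored procedure does the equivalent of this:
items = json.loads(child_items)
directories = [item["name"] for item in items if item["type"] == "Folder"]
print(directories)  # ['dir-a']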
If anyone wants a further explanation, please ask and I will try to create a walkthrough to assist.
I'd like my bash script to perform an action every time a new file is downloaded to /Downloads (generate a hash of the downloaded file and send it to an API). So far I've been trying to make use of inotify-tools, but it only works for newly created files and that won't do.
Script should work like this:
I download a file via browser (normal way)
The script notices the new file and is executed automatically
Thanks in advance for help :D
You can use /etc/crontab to check the ~/Downloads folder at startup and every n minutes. The script that runs every nth minute can do either of the following:
Keep a count of the files. If the count decreases, the script updates its cache. If the count increases, it gets the most recently created (or modified) file and sends that file's hash to the API via curl.
Keep the names of the files. If a file no longer exists, the script updates the cache of file names. If a new file appears, it hashes it and sends the hash to the API via curl.
You can keep the cache of files under /tmp.
If you can provide an example scenario I can write a simple script; in the meantime, a rough sketch of the second approach is below.
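A minimal version of that name-cache idea, written in Python rather than bash purely for illustration (the API endpoint and cache path are placeholders):

import hashlib
import json
import os
import urllib.request

WATCH_DIR = os.path.expanduser("~/Downloads")
CACHE_FILE = "/tmp/downloads_cache.json"    # cache of file names kept under /tmp
API_URL = "https://example.com/api/hashes"  # placeholder endpoint

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Load the cached file names from the previous run (empty set on first run).
try:
    with open(CACHE_FILE) as f:
        cached = set(json.load(f))
except FileNotFoundError:
    cached = set()

current = set(os.listdir(WATCH_DIR))

# New files: hash them and send the hash to the API.
for name in sorted(current - cached):
    path = os.path.join(WATCH_DIR, name)
    if os.path.isfile(path):
        payload = json.dumps({"file": name, "sha256": sha256(path)}).encode()
        req = urllib.request.Request(API_URL, data=payload,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

# Rewrite the cache; this also drops names of files that were deleted.
with open(CACHE_FILE, "w") as f:
    json.dump(sorted(current), f)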
I have a list of file paths that I want to check in GCS. It looks like this:
for path in all_paths:
    try:
        gcs.blob(GCS_BUCKET_NAME, path).exists()
    except google.api_core.exceptions.NotFound:
        missing_paths.append(path)
This works fine, but it takes a lot of time as requests are sent one by one, for each path. Is there a way to send batches of requests to the Google Cloud Storage API? Or any other way to speed up this check?
With Cloud Storage you can only filter by path prefix (path/to/file.xxx). You will then receive all the files matching this prefix, including those in sub-paths (path/to/sub/path/file.xxx). The rest of the processing you have to perform yourself.
And yes, if you have a lot of files, it will take a lot of time.
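A rough sketch of that approach with the official google-cloud-storage client, assuming all the paths share a common prefix (the prefix here is illustrative):

from google.cloud import storage

client = storage.Client()

# One paginated listing per prefix instead of one exists() round trip per path.
existing = {
    blob.name
    for blob in client.list_blobs(GCS_BUCKET_NAME, prefix="path/to/")
}

missing_paths = [path for path in all_paths if path not in existing]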
I thought this would be simple, but I have been caught by the simplest of puzzles which I can't find the answer to anywhere.
I have some code which reads images and then OpenCV looks for differences.
I read files with the following command
vs = cv2.VideoCapture("/home/andrew/images/image_%6d.jpg")
and this works perfectly with images called image_000000.jpg, image_000001.jpg and so on.
However, I don't want to rename my images, so I would like to read files called
MDAlarm_20180921-031140.jpg, which contain the date and then the time.
What is the printf format for this? Whatever I try does not work (i.e. no files are found). Or do the files need to start from 0, so that I need to append an index
starting at 000000?
Lastly, once I have this working, how can I tell which file is being processed?
Many Thanks
Andrew
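For what it's worth, cv2.VideoCapture's image-sequence mode expects a single printf-style numeric pattern with consecutive numbering (like the image_%6d.jpg one above), so timestamp-based names generally don't fit it. A common workaround is to glob the files and read them one by one with cv2.imread, which also answers the "which file is being processed" question. A rough sketch:

import glob
import cv2

# Read the timestamp-named images individually instead of via VideoCapture.
# sorted() keeps them in chronological order thanks to the name format.
for path in sorted(glob.glob("/home/andrew/images/MDAlarm_*.jpg")):
    frame = cv2.imread(path)
    if frame is None:
        continue  # skip unreadable files
    print("processing", path)  # always know which file this frame came from
    # ... run the difference detection on `frame` here ...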
When I run the command in the terminal back to back, it doesn't sync the second time. Which is great! It shouldn't. But if I run my build process and then run aws s3 sync programmatically, back to back, it syncs all the files both times, as if my build process were changing something each time.
Can't figure out what might be happening. Any ideas?
My build process is basically pug source/ --out static-site/ and stylus -c styles/ --out static-site/styles/
According to this - http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
S3 sync compares the size of the file and the last modified timestamp to see if a file needs to be synced.
In your case, I'd suspect the build system is resulting in a newer timestamp even though the file size hasn't changed?
From the AWS CLI s3 sync documentation:
A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.
--size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.
You want the --size-only option, which looks only at the file size, not the last modified date. This is perfect for an asset build system that changes the last modified date frequently but not the actual contents of the files (I ran into this with webpack builds where things like fonts kept syncing even though the file contents were identical). If you don't use a build method that incorporates the hash of the contents into the filename, it is possible to run into problems (if the build emits a same-sized file with different contents), so watch out for that.
I did manually test adding a new file that wasn't on the remote bucket and it is indeed added to the remote bucket with --size-only.
This article is a bit dated, but I'll contribute nonetheless for folks arriving here via Google.
I agree with the accepted answer. To add additional context: AWS S3 sync behaves differently from a standard Linux tool like rsync in a number of ways. In Linux, an MD5 hash can be computed to determine whether a file has changed. S3 sync does not do this, so it can only decide based on size and/or timestamp. What's worse, AWS does not preserve timestamps when transferring in either direction, so the timestamp is ignored when syncing to local and only used when syncing to S3.
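As an illustration of the md5 point, here is a rough sketch of comparing content yourself by checking a local MD5 against the S3 ETag; this only holds for simple (non-multipart) uploads, and the file, bucket, and key names are placeholders:

import hashlib
import boto3

s3 = boto3.client("s3")

def local_md5(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def content_unchanged(path, bucket, key):
    # For non-multipart uploads the ETag is the hex MD5 of the object body.
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    return etag == local_md5(path)

print(content_unchanged("static-site/index.html", "my-bucket", "index.html"))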