amazon cloudfront: find all assets which are cached

I need to write a task that removes (invalidates) all cached assets from our CloudFront distribution when we deploy.
Currently, I do not track dynamically created assets (specifically our CSS files, which are generated via Sass) in my version control repository, so I cannot say "diff this commit with that commit and give me all the CSS files that have changed, because those are the files we need to invalidate".
Given this, I am thinking I just need to invalidate all assets of a given type, say CSS, whenever I deploy. However, I do not see how I can retrieve the assets CloudFront has cached.
Does anybody know how to ask CloudFront for all active assets it has cached from my bucket? Or, even better, all assets of a given type (CSS, JS, PNG) that it has cached?
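As far as I know, CloudFront does not offer an API that lists what is currently sitting in its edge caches. A common workaround is to enumerate the matching objects in the origin S3 bucket and invalidate those paths, or simply invalidate a wildcard path such as /css/* (the * must be the last character of an invalidation path). Here is a rough sketch using boto3, where the bucket name and distribution ID are placeholders rather than your actual values:

import time
import boto3

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

# Collect every .css key in the origin bucket ("our-bucket" is a placeholder)
paths = []
for page in s3.get_paginator("list_objects_v2").paginate(Bucket="our-bucket"):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".css"):
            paths.append("/" + obj["Key"])

# Ask CloudFront to invalidate those paths ("DISTRIBUTION_ID" is a placeholder)
if paths:
    cloudfront.create_invalidation(
        DistributionId="DISTRIBUTION_ID",
        InvalidationBatch={
            "Paths": {"Quantity": len(paths), "Items": paths},
            "CallerReference": str(time.time()),
        },
    )

Invalidating a single wildcard path like /css/* instead of individual keys also keeps you well within CloudFront's free monthly invalidation-path allowance.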

Related

Accessing custom metadata through Spark's Auto Loader

Suppose I have multiple ZIP files being uploaded onto some directory within an AWS S3 bucket. Each file also has various custom metadata assigned to it during upload. I want to set up an ETL pipeline where I periodically extract any newly uploaded files for processing. Spark's Auto Loader seems like a good tool for this job, though I also need the files' custom metadata. Is this possible through Spark's Auto Loader? I've been searching and searching but can't find anything related to reading custom metadata.
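Auto Loader does not appear to expose user-defined S3 object metadata directly; one workaround is to take each ingested file's path and look the metadata up yourself with an S3 HeadObject call. A rough PySpark sketch, assuming a Databricks notebook (where spark is predefined and the cloudFiles source and _metadata column are available) and a hypothetical landing path:

import boto3
from pyspark.sql.functions import col, udf
from pyspark.sql.types import MapType, StringType

# Incrementally pick up newly uploaded ZIP files ("my-bucket/incoming" is a placeholder)
stream = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    .load("s3://my-bucket/incoming/"))

def fetch_custom_metadata(path):
    # User-defined x-amz-meta-* values set at upload come back under "Metadata"
    bucket, key = path.replace("s3://", "", 1).split("/", 1)
    return boto3.client("s3").head_object(Bucket=bucket, Key=key).get("Metadata", {})

metadata_udf = udf(fetch_custom_metadata, MapType(StringType(), StringType()))

# _metadata.file_path carries the full source path of each ingested file
enriched = stream.withColumn("custom_metadata", metadata_udf(col("_metadata.file_path")))

Creating a boto3 client inside the UDF for every row is slow; in practice you would reuse a client per partition, but the idea is the same.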

How do I get all files (paths) of a specific commit from gitlab api?

I want to load the file contents from the gitlab api using a specific tag.
This can be achieved by using the blob for each file (https://docs.gitlab.com/ee/api/repository_files.html).
The problem I am facing is that I do not see a way to find out which file paths exist for a given commit (a tag in my case).
I am looking for something similar to github's tree object.
How do I get all the files and their respective paths with a given commit hash from the gitlab api to load their contents?
Thanks :)
You can get a list of repository files and directories in a project by calling this API:
GET /projects/:id/repository/tree
See the GitLab Repositories API documentation for more information, such as the optional parameters.
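For example, a small Python sketch that pages through the tree for a given tag and collects all blob paths (the project ID, token, and tag name below are placeholders):

import requests

url = "https://gitlab.com/api/v4/projects/123/repository/tree"
headers = {"PRIVATE-TOKEN": "YOUR_TOKEN"}
params = {"ref": "v1.0.0", "recursive": True, "per_page": 100, "page": 1}

paths = []
while True:
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    batch = resp.json()
    if not batch:
        break
    # type is "blob" for files and "tree" for directories
    paths.extend(item["path"] for item in batch if item["type"] == "blob")
    params["page"] += 1

print(paths)

Each collected path can then be fed to the repository files endpoint you linked to fetch its contents at that same ref.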

How to maintain files in Heroku that are not in the git repo after a deploy

This is the scenario: my Node.js app stores all user data (nothing sensitive) in a file on the server called database.json.
I currently have two environments:
my local machine, with a database.json,
and my production environment on Heroku, with its own database.json
Every time I push a new version of my app from my machine to production, I do not want to replace the production database.json with the one on my machine. For that reason, I added the file to .gitignore, so it is not on GitHub; it is created during app execution.
But every time I push a new version to Heroku, instead of keeping the old database.json file, Heroku erases it. It looks like Heroku completely overwrites the whole app folder with the one downloaded from Git instead of "merging" it with what is already there.
How can I set up a file on Heroku that I can use to save user data that is not affected by git pushes and deploys?
How can I set up a file on Heroku that I can use to save user data that is not affected by git pushes and deploys?
You can't.
Furthermore, this isn't just about deploys: your file will be lost at other times too.
Your file isn't being overwritten by Git; Heroku's filesystem is ephemeral. Any changes made to it will be lost any time the dyno restarts. This happens frequently (at least once per day) whether you deploy or not.
You will need to change how you're persisting data. I suggest using a real client-server database, but you could also save your file in third-party object storage using something like Amazon S3 or Azure Blob Storage.
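As a sketch of the object-storage option: the app in the question is Node.js, but the equivalent calls exist in the AWS SDK for JavaScript; the bucket and key names below are placeholders.

import json
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "our-app-data", "database.json"

def load_database():
    # First run: the object does not exist yet
    try:
        body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
        return json.loads(body)
    except s3.exceptions.NoSuchKey:
        return {}

def save_database(data):
    # Persist the whole structure; survives dyno restarts and deploys
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=json.dumps(data).encode("utf-8"))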

Deploy a Node.js app that contains static assets to AWS Elastic Beanstalk

I'm having some trouble visualizing how I should handle static assets with my EB Node.js app. It only deploys what's committed in the git repo when you do eb deploy (correct?), but I don't want to commit all our static files. Currently we upload them to S3 and the app references those (the.s3.url.com/ourbucket/built.js), but now that we are setting up dev, staging, and prod envs we can't just reference built.js, since there can be up to 3 versions of it.
Also, there's a window between when the files are uploaded and when the deploy finishes rolling out, during which the static assets can't work with both versions running on the servers (i.e. built.js works with app version 0.0.2, but server one is still running version 0.0.1 while server two is running version 0.0.2).
How do I keep track of these mismatches, or is there a way to just deploy static assets to the EB instance directly?
I recommend a deploy script that uploads the relevant assets to S3 and then performs the Elastic Beanstalk deploy. In that script, upload the assets to a folder named after the environment, so you'll have:
the.s3.url.com/ourbucket/production/
the.s3.url.com/ourbucket/staging/
the.s3.url.com/ourbucket/dev/
Then you have the issue of old assets during deploy. In general, you should probably be CDNing these assets (I recommend CloudFront because it's so easy to integrate once you're already on AWS), and you should be worrying about cache invalidation during deploy anyway.
One strategy for dealing with that is to assign an ID to each deploy (either the first 7 characters of the git SHA-1 or a timestamp), put all assets in a new folder with that name, and reference that folder on your HTML pages. So let's say you go with a timestamp and you deploy at 20150204-042501 (that's 4:25:01 UTC on February 4, 2015); you'd upload your assets to the.s3.url.com/ourbucket/production/20150204-042501/. Your HTML would say
<script src="//the.s3.url.com/ourbucket/production/20150204-042501/built.js" />
That solves both the "during deploy" problem and cache invalidation.
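Putting that together, a minimal deploy script might look like the sketch below (Python with boto3 plus the EB CLI; the bucket name, environment, and build directory are placeholders):

import subprocess
import time
from pathlib import Path

import boto3

env = "production"
deploy_id = time.strftime("%Y%m%d-%H%M%S")   # e.g. 20150204-042501
prefix = f"{env}/{deploy_id}"

# Upload every built asset under a per-deploy prefix
s3 = boto3.client("s3")
for path in Path("build").rglob("*"):
    if path.is_file():
        key = f"{prefix}/{path.relative_to('build').as_posix()}"
        s3.upload_file(str(path), "ourbucket", key)

# Expose the prefix to the app (e.g. via an EB environment variable or a
# generated config file), then kick off the Elastic Beanstalk deploy
subprocess.run(["eb", "deploy"], check=True)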

Should I see all the newly added files in my svn repository on my server?

I created a new project in svn with svnadmin create /myrepo on my server. With my client I did a checkout, added new files, and later committed, so if I make a checkout from another computer I get the recently added files, which is perfect. But in my folder /myrepo there are still no files; none of the new files added from my client are visible there. I know Subversion implements many algorithms for version control, but my question is: should I be able to see all the newly added files in /myrepo on my server, without making a checkout with a client or something like that?
I want to know where my files are saved on my server.
Thanks
No. The files are stored in the repository you created, but in a specialized database. If you go to myrepo and look in the db folder, you'll see that there are revision files stored there. Those files contain the structure and data of the repository at specific instances in time. The Subversion book has some information on the structure. You can also look at the documentation in the actual Subversion repository about the structure used to store the data.
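You can, however, list what the repository contains without making a checkout by pointing the svn client at a file:// URL for the repository. A small sketch (using the /myrepo path from the question):

import subprocess

# List every versioned path in the latest revision, reading straight from
# the repository database rather than from a working copy
result = subprocess.run(
    ["svn", "ls", "--recursive", "file:///myrepo"],
    check=True, capture_output=True, text=True,
)
print(result.stdout)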
