How to download files created in the last 24 hours using gsutil in the GCP console? - linux

I have a directory in a GCP storage bucket, and there are 2 subdirectories in that bucket.
Is there a way to download the files created in the last 24 hours in those subdirectories using the gsutil command from the console?

gsutil does not support filtering by date.
An option is to create a list of files to download via another tool or script, one object name per line.
Use stdin to specify a list of files or objects to copy. You can use gsutil in a pipeline to upload or download objects as generated by a program. For example:
cat filelist | gsutil -m cp -I gs://my-bucket
or:
cat filelist | gsutil -m cp -I ./download_dir
where the output of cat filelist is a one-per-line list of files, cloud URLs, and wildcards of files and cloud URLs.
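Putting the two together, a minimal sketch (the bucket and subdirectory names are examples; it assumes GNU date, as in Cloud Shell, and no spaces in object names). gsutil ls -l prints size, creation time and URL for each object, and the ISO 8601 timestamps can be compared as plain strings:
cutoff=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
mkdir -p ./download_dir
gsutil ls -l gs://my-bucket/subdir1/** gs://my-bucket/subdir2/** \
  | awk -v cutoff="$cutoff" '$3 ~ /^gs:\/\// && $2 > cutoff {print $3}' \
  | gsutil -m cp -I ./download_dir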

I was able to achieve part of it using the GCP console and Cloud Shell.
Steps:
Go to the storage directory in the GCP console in your browser.
Click on the filter and you'll get options to filter based on created before, created after, etc.
Provide the date and apply the filter.
Click on the Download button.
Copy the command, open Cloud Shell and run it. The required files will be downloaded there.
Run the zip command in the shell to archive the downloaded files (see the sketch after these steps).
Select the Download from shell option and provide the file path to download.
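A rough sketch of the last two steps (the archive and directory names are examples; Cloud Shell also ships a cloudshell download helper as an alternative to the menu option):
zip -r last24h.zip ./downloaded_files    # archive whatever the copied command downloaded
cloudshell download last24h.zip          # or use the Download option in the Cloud Shell menu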

Related

Error while using zip with -R# options and file-list; "Invalid command arguments (nothing to select from)"

I am trying to deploy different groups of files (of many different types) to different environments using Bitbucket Pipelines and AWS CodeDeploy. In Pipelines, I use the zip command (from apt-get) to package up all the files for the specific environment and upload them to CodeDeploy (using the CodeDeploy pipe, which seems to expect a zip file).
A recent change has made it unwieldy to use a single-line command to feed all the necessary files to zip, so I would instead like to use a file list. The problem is that I have several sub-folders where I need to recursively grab all files, while in others I need to grab specific files. I also need to preserve the folder structure.
I don't want to have to add every single path by hand if possible, as there are a lot of files in these sub-directories, and I want to reduce the number of cases where we forget to add new files to these lists. I also don't want to require the developer to run a script locally, but I am okay with creating a script for use in Pipelines.
I tried to use the -R# option with zip, but it gives the error zip error: Invalid command arguments (nothing to select from).
Example folder structure:
file1.txt
ziplist.txt
folder1/file2.js
folder1/file3.txt
folder1/folder2/file4.png
folder1/folder2/file5.jpg
folder1/folder3/file6.tsx
folder1/folder3/file7.mp3
The contents of ziplist.txt:
file1.txt
folder1/file2.js
folder1/folder2/file5.jpg
folder1/folder3/*
Using the command cat ziplist.txt | zip -R# application.zip, I'd expect to have a zip with the following files inside:
file1.txt
folder1/file2.js
folder1/folder2/file5.jpg
folder1/folder3/file6.tsx
folder1/folder3/file7.mp3
Appreciate any help.
I was able to make it work by removing the /* from the folder path in the zip list, and then using xargs to convert the list into command-line parameters, like so:
cat ziplist.txt | xargs zip -r application.zip
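With that change, the ziplist.txt from the example above becomes the following; zip -r then recurses into folder1/folder3 by itself:
file1.txt
folder1/file2.js
folder1/folder2/file5.jpg
folder1/folder3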

List all RSYNCed folders in GCP GAE Linux

I set up some folders in GAE to be synced using this command:
gsutil rsync -r gs://sample1bucket1 ./sample1;
But I have forgotten all the places where I have done this. How can I list them?
As per my understanding of your question, all your GAE folders are in the Cloud Storage bucket "sample1bucket1", and you are trying to sync them into the directory "sample1". If yes, then while writing the rsync commands you have to mention the source and destination, so you should know where you are syncing all your files to, as per the public documentation.
However, you can list the folders in the current directory using the ls command to check for your destination folder, and later cd into those folders (cd sample1, in your case) to see if the content has been copied from your bucket into the folder.
You can also list the number of running rsync processes using:
ps -ef | grep rsync | wc -l
I am leaving some information regarding the commands, in case you need it:
You can list all objects in a bucket using:
gsutil ls -r gs://bucket
You can list a remote directory with detailed information using:
rsync --list-only username@servername:/directoryname
You can list the folder contents using:
rsync --list-only username@servername:/directoryname/
You can also use rsync -i (--itemize-changes) to parse out exactly what you need.
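As a practical way to rediscover where you ran the sync, assuming the commands were run interactively in a shell that keeps history, you can also search your shell history for them:
history | grep "gsutil rsync"
grep -h "gsutil rsync" ~/.bash_history    # covers earlier bash sessions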

Linux zip selected folder and create a download link for zipped file

My current directory contains web, api, logs, and some-backup-directory. I want to zip only the web and api directories into a single archive and create a direct download link for it, so I can download it over http:// from anywhere, because downloading over an FTP connection takes more time and also doesn't allow me to do other tasks on the server at the same time. I am using these commands to zip the files on the server:
zip -r mybackup-web.zip /home/projects/web
zip -r mybackup-api.zip /home/projects/api
But this creates two zip files; I need everything in one.
I am using Windows 7 locally and Debian 8 on the server. I am using PuTTY to connect to the server and execute commands.
Using zip
What you are doing actually works according to zip's man page:
zip -r <target> <dir1> # Add files from dir1 to archive
zip -r <target> <dir2> # Add files from dir2 to archive
If you execute both commands from the same working directory with the same target name, the second command updates the existing zip file rather than creating a new one.
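Applied to the paths from the question, that would look like:
zip -r mybackup.zip /home/projects/web
zip -r mybackup.zip /home/projects/api    # adds to the existing mybackup.zip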
Using tar
You could also use tar:
tar -zcvf <target>.tar.gz <dir1> <dir2> ...
Flags:
c: Create a new archive containing the specified items
v: Produce verbose output (OPTIONAL)
f: Write the archive to the specified file
z: Compress using gzip
In your case (with -z the result is a gzip-compressed tar archive, so a .tar.gz extension is more accurate than .zip):
tar -zcvf mybackup.tar.gz /home/projects/web /home/projects/api
You can later extract it using:
tar -zxvf mybackup.tar.gz
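For the "direct download link over http://" part of the question, one possible sketch, assuming Python 3 is installed on the Debian 8 server, the archive sits in /home/projects, and the chosen port is reachable from outside:
cd /home/projects
python3 -m http.server 8080    # serves the current directory over HTTP
The archive is then downloadable at http://your-server-ip:8080/mybackup.tar.gz; stop the server with Ctrl+C when done.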

wget to download new wildcard files and overwrite old ones

I'm currently using wget to download specific files from a remote server. The files are updated every week, but always have the same file names, e.g. a newly uploaded file1.jpg will replace the local file1.jpg.
This is how I am grabbing them, nothing fancy:
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/file1.jpg
This downloads file1.jpg from the remote server if it is newer than the local version then overwrites the local one with the new one.
Trouble is, I'm doing this for over 100 files every week and have set up cron jobs to fire the 100 different download scripts at specific times.
Is there a way I can use a wildcard for the file name and have just one script that fires every 5 minutes for example?
Something like....
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/*.jpg
Will that work? Will it check the local folder for all current file names, see what is new and then download and overwrite only the new ones? Also, is there any danger of it downloading partially uploaded files on the remote server?
I know that some kind of file sync script between servers would be a better option but they all look pretty complicated to set up.
Many thanks!
You can specify the files to be downloaded one by one in a text file, and then pass that file name using option -i or --input-file.
e.g. contents of list.txt:
http://xx.xxx.xxx.xxx/remote/files/file1.jpg
http://xx.xxx.xxx.xxx/remote/files/file2.jpg
http://xx.xxx.xxx.xxx/remote/files/file3.jpg
....
then
wget .... --input-file list.txt
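Combined with the -N and -P options from the question, this reduces the weekly downloads to a single command that can run from one cron job; the paths and schedule below are examples:
*/5 * * * * wget -N -P /path/to/local/folder/ -i /path/to/local/list.txt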
Alternatively, if all your *.jpg files are linked from a particular HTML page, you can use recursive downloading, i.e. let wget follow the links on your page to all linked resources. You might need to limit the "recursion level" and file types to avoid downloading too much. See wget --help for more info.
wget .... --recursive --level=1 --accept=jpg --no-parent http://.../your-index-page.html

Google Cloud Storage - GSUtil - Copy files, skip existing, do not overwrite

I want to sync a local directory to a bucket in Google Cloud Storage. I want to copy the local files that do not exist remotely, skipping files that already exist both remotely and locally. Is it possible to do this with gsutil? I can't seem to find a "sync" option or a "do not overwrite" option for gsutil. Is it possible to script this?
I am on Linux (Ubuntu 12.04).
gsutil supports the noclobber flag (-n) on the cp command. This flag will skip files that already exist at the destination.
You need to add -n to the command, as described in the official Google Cloud Platform documentation:
-n: No-clobber. When specified, existing files or objects at the destination will not be overwritten. Any items that are skipped by this option will be reported as being skipped. This option will perform an additional GET request to check if an item exists before attempting to upload the data. This will save retransmitting data, but the additional HTTP requests may make small object transfers slower and more expensive.
Example (using multithreading):
gsutil -m cp -n -a public-read -R large_folder gs://bucket_name
Using rsync, you can copy missing/modified files/objects:
gsutil -m rsync -r <local_folderpath> gs://<bucket_id>/<cloud_folderpath>
Besides, if you use the -d option, you will also delete files/objects in your bucket that are no longer present locally.
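For example (use with care: -d deletes destination objects that have no local counterpart):
gsutil -m rsync -d -r <local_folderpath> gs://<bucket_id>/<cloud_folderpath>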
Another option could be to use Object Versioning, so you will replace the files/objects in your bucket with your local data, but you can always go back to the previous version.
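If you go down the Object Versioning route, versioning has to be enabled on the bucket first, e.g.:
gsutil versioning set on gs://<bucket_id>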
