lftp mirror with prefix - ubuntu-14.04

I did a partial download of a directory full of logs from a remote FTP server. For example, mget *201610* pulls down all the logs with 201610 as part of the name.
What I would like to do is continue the download, since I only have about 100 of the 400 files for that month. Since I already downloaded a lot of the files, mirror -c would be the best option, as it only downloads the files I don't have.
Question:
How do I incorporate mirror -c for only the subset of files that have 201610 in them? I don't want to start downloading all the files for other months, just the missing ones from 201610.
Thanks

Use the -I option of mirror; it lets you specify a wildcard pattern for the files to include.
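For example, a minimal sketch run from inside an lftp session (the remote and local directory paths are placeholders, not from the question):
mirror -c -I '*201610*' /remote/log/dir /local/log/dir
Here -c continues/resumes the transfer, skipping files you already have, and -I '*201610*' restricts the mirror to files matching that wildcard, so other months are left alone.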

Related

wget to download new wildcard files and overwrite old ones

I'm currently using wget to download specific files from a remote server. The files are updated every week, but always have the same file names, e.g. a newly uploaded file1.jpg will replace the local file1.jpg.
This is how I am grabbing them, nothing fancy:
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/file1.jpg
This downloads file1.jpg from the remote server if it is newer than the local version and overwrites the local one with the new copy.
Trouble is, I'm doing this for over 100 files every week and have set up cron jobs to fire the 100 different download scripts at specific times.
Is there a way I can use a wildcard for the file name and have just one script that fires every 5 minutes for example?
Something like....
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/*.jpg
Will that work? Will it check the local folder for all current file names, see what is new and then download and overwrite only the new ones? Also, is there any danger of it downloading partially uploaded files on the remote server?
I know that some kind of file sync script between servers would be a better option but they all look pretty complicated to set up.
Many thanks!
You can specify the files to be downloaded one by one in a text file, and then pass that file name using the -i or --input-file option.
e.g. contents of list.txt:
http://xx.xxx.xxx.xxx/remote/files/file1.jpg
http://xx.xxx.xxx.xxx/remote/files/file2.jpg
http://xx.xxx.xxx.xxx/remote/files/file3.jpg
....
then
wget .... --input-file list.txt
Alternatively, if all your *.jpg files are linked from a particular HTML page, you can use recursive downloading, i.e. let wget follow the links on your page to all linked resources. You might need to limit the recursion level and file types to avoid downloading too much. See wget --help for more info.
wget .... --recursive --level=1 --accept=jpg --no-parent http://.../your-index-page.html
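Combining the input-file approach with the timestamping option from the question gives a single cron-friendly command; a sketch, with the paths and list.txt taken as placeholders:
wget -N -P /path/to/local/folder/ --input-file=list.txt
Because -N only fetches a file when the remote copy is newer than the local one, running this every few minutes transfers just the files that have actually changed.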

wget: How to rename already downloaded files to the names given by wget --content-disposition?

I have downloaded 1200 JPEG files using wget, but the names of the files are based on the links they were downloaded from.
For example,
http://www.*.*.*/index.php?id=0MwfTcqbP9dl1_icR3_gVezE8tlpUJt-wumA5hHjpjk will download a file named index.php?id=0MwfTcqbP9dl1_icR3_gVezE8tlpUJt-wumA5hHjpjk.jpg, but its name on the server is different. Now I want all the files to be named as they are on the server.
One way is to delete all the files and re-download them with the wget option --content-disposition, but the total size of the download is 8 GB and downloading it all again is not a good option.
How can I rename all the downloaded files to their names on the server?
Edit: The name of a JPEG file downloaded from these links using wget --content-disposition or a browser would be something like 2014:08:09_18:07:51_IMG_5543.jpg (not created by wget; it's the original name on the server, the uploader's file name). I want all the files to have their original names without downloading them again.
If the webserver supports HEAD requests, you can use a command like wget --server-response --spider $URL; otherwise you can request the byte range 0-1 to fetch only a single byte. Once you have the response headers, you can write a script to do the renaming. – Wu Yongzheng
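A minimal sketch of that approach, assuming the original URLs are saved in a file urls.txt and each local file is named after the last path component of its URL, as in the example (both are assumptions; adjust the local name if wget added an extension):

while read -r url; do
  local_name="${url##*/}"    # how wget named the file: everything after the last slash
  # HEAD request only; wget prints the headers to stderr, hence 2>&1
  header=$(wget --server-response --spider "$url" 2>&1 | grep -i 'Content-Disposition')
  server_name=$(printf '%s\n' "$header" | sed -n 's/.*filename="\{0,1\}\([^";]*\).*/\1/p')
  if [ -n "$server_name" ] && [ -f "$local_name" ]; then
    mv -v -- "$local_name" "$server_name"
  fi
done < urls.txt

This only renames files for which the server actually sends a Content-Disposition header; everything else is left untouched.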

What is the command to get the .listing file from SFTP server using cURL command

In wget I am trying to get the list of files and their properties from an FTP server using the command below,
wget --no-remove-listing ftp://myftpserver/ftpdirectory/
This will generate two files: .listing (this is what I am looking for in cURL) and index.html, which is the HTML version of the listing file.
My expectation:
How do I achieve this scenario in cURL?
What is the command to get the .listing and index.html files from an FTP/SFTP server using cURL?
This is what I found on http://curl.haxx.se/docs/faq.html#How_do_I_get_an_FTP_directory_li
If you end the FTP URL you request with a slash, libcurl will provide you with a directory listing of that given directory. You can also set CURLOPT_CUSTOMREQUEST to alter what exact listing command libcurl would use to list the files.
The follow-up question that tend to follow the previous one, is how a program is supposed to parse the directory listing. How does it know what's a file and what's a dir and what's a symlink etc. The harsh reality is that FTP provides no such fine and easy-to-parse output. The output format FTP servers respond to LIST commands are entirely at the server's own liking and the NLST output doesn't reveal any types and in many cases don't even include all the directory entries. Also, both LIST and NLST tend to hide unix-style hidden files (those that start with a dot) by default so you need to do "LIST -a" or similar to see them.
Thanks & Regards,
Alok
I have checked it on Windows under Cygwin and it works for me.
Do not forget to use / at the end of the path.
$ curl -s -l -u test1@test.com:Test_PASSWD --ftp-ssl 'ftps://ftp.test.com/Ent/'
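For the .listing-style output specifically, a sketch using the paths from the question (host, credentials, and directory are placeholders, not tested values):

curl -s -u user:password 'ftp://myftpserver/ftpdirectory/' -o listing.txt
curl -s -l -u user:password 'ftp://myftpserver/ftpdirectory/'
curl -s -u user:password 'sftp://myftpserver/ftpdirectory/' -o listing.txt

The first command saves the full LIST output (the equivalent of .listing), the second uses lowercase -l (--list-only) to get names only via NLST, and the third does the same listing over SFTP, provided your curl build includes SFTP support. The trailing slash is what tells curl to list the directory instead of fetching a file.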

Keep files updated from remote server

I have a server at hostname.com/files. Whenever a file has been uploaded I want to download it.
I was thinking of creating a script that constantly checked the files directory. It would check the timestamp of the files on the server and download them based on that.
Is it possible to check the files timestamp using a bash script? Are there better ways of doing this?
I could just download all the files on the server every hour. Would it therefore be better to use a cron job?
If you have a regular interval at which you'd like to update your files, yes, a cron job is probably your best bet. Just write a script that does the checking and run that at an hourly interval.
As @Barmar commented above, rsync could be another option. Put something like this in the crontab and you should be set:
# min hour day month day-of-week user command
17 * * * * user rsync -av http://hostname.com/ >> rsync.log
would grab files from the server in that location and append the details to rsync.log on the 17th minute of every hour. Right now, though, I can't seem to get rsync to get files from a webserver.
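If you do have SSH access to the machine serving the files, rsync over SSH is the usual way around that; a sketch (the SSH account and paths are assumptions, not from the question):
rsync -av user@hostname.com:/path/to/files/ /local/files/
which can go in the same crontab slot as above and only transfers files that have changed since the last run.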
Another option using wget is:
wget -Nrb -np -o wget.log http://hostname.com/
where -N re-downloads only files newer than the timestamp on the local version, -b sends the process to the background, -r recurses into directories, and -o specifies a log file. -np keeps it from ascending into the parent directory; without it, the recursion could end up spidering the entire server's content. This works against an arbitrary web server.
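To run that on a schedule, a per-user crontab entry (crontab -e) could look like the following sketch; the schedule, paths, and log file are placeholders, and -b is dropped because cron already runs the job unattended:
0 * * * * wget -N -r -np -P /path/to/local/folder -o /path/to/wget.log http://hostname.com/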
More details, as usual, will be in the man pages of rsync or wget.

Can you use tar to apply a patch to an existing web application?

Patches are frequently released for my CMS system. I want to be able to extract the tar file containing the patched files for the latest version directly over the full version on my development system. When I extract a tar file it puts it into a folder with the name of the tar file. That leaves me to manually copy each file over to the main directory. Is there a way to force the tar to extract the files into the current directory and overwrite any files that have the same filenames? Any directories that already exist should not be overwritten, but merged...
Is this possible? If so, what is the command?
Check out the --strip-components (or --strippath) argument to tar; it might be what you're looking for.
EDIT: you might want to throw --keep-newer-files into the mix, so any locally modified files aren't overwritten. And I would suggest testing new releases on a development server, then using rsync or Subversion to carry over the changes.
I tried getting --strip-components to work and, while I didn't try that hard, I didn't get it working. It kept flattening the directory structure. In searching, I came across the following command that seems to do exactly what I want:
pax -r -f patch.tar -s'/patch///'
It's not tar, but hey, it works... Replace the word "patch" with whatever your tar file name is.
The option '--strip-components' allows you to trim parts of the embedded filenames. With that it is possible to do what you want.
For more help check out http://www.gnu.org/software/tar/manual/html_section/transform.html
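As a concrete sketch, assuming the archive wraps everything in a single top-level directory (say patch/) and that your application lives in /path/to/app (both assumptions):
tar -xvf patch.tar --strip-components=1 -C /path/to/app
--strip-components=1 drops that top-level directory from every entry, so the files land directly in the application directory, overwriting files with the same names and merging into existing subdirectories.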
I have just done:
tar -xzf patch.tar.gz
And it overwrites all the files that the patch contains.
I.e., if the patch was created for the contents of the app folder, I would extract it there. Results would be like this:
tar.gz contains: oldfolder/someoldfile.txt, oldfolder/newfolder/newfile.txt
Before, app looks like:
app/oldfolder/someoldfile.txt
Afterwards, app looks like:
app/oldfolder/someoldfile.txt
app/oldfolder/newfolder/newfile.txt
and "someoldfile.txt" is actually updated to what was in the tar.gz.
Maybe this doesn't work with regular tar, only tar.gz, but I doubt it. I think it should work for everything, as long as the user has write permissions.
