Downloading Specific Filenames with FTP - linux

I have about 1,000,000 files on an FTP server and need to download only some of them.
The files are named ML0000000-ML1000000, and I want only the files from ML00002222 to ML00899999.
Can anyone help me change the mget command for ftp to do this?
######login to FTP server:#####
ftp -inv 172.0.0.1
user Codegirl $$$$
#######cd to ftp server#########
cd /root/desktop
######cd to local PC#############
lcd /root/myfile
*mget ML* ??? (how can i change it to specific file name?)*

If it was me I'd use a loop and wget.
cd /root/myfile
for i in $(seq -f "%08g" 2222 899999)
do
wget --user=un --password=pw "ftp://172.0.0.1/root/desktop/ML${i}"
done
This does require wget to reconnect for each file. It is going to take time anyway, so start the script and go grab a cup of tea. I use loops like this all the time and they work well.
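If reconnecting for every file turns out to be too slow, the whole range can also be fetched in a single ftp session by generating the get commands. This is only a sketch, assuming bash: gen_ftp_batch is a made-up helper name, and the directories and the 8-digit zero padding are copied from the question.

```shell
#!/bin/bash
# Print an ftp command script that fetches ML<first>..ML<last>
# (zero-padded to 8 digits) in one session.
# gen_ftp_batch is a hypothetical helper, not part of any ftp client.
# Pass first/last without leading zeros, otherwise bash reads them as octal.
gen_ftp_batch() {
  local user=$1 pass=$2 first=$3 last=$4 i
  printf 'user %s %s\n' "$user" "$pass"   # login line, as in the question
  echo 'cd /root/desktop'                 # remote directory from the question
  echo 'lcd /root/myfile'                 # local directory from the question
  for ((i = first; i <= last; i++)); do
    printf 'get ML%08d\n' "$i"
  done
  echo 'bye'
}

# Usage (needs a reachable server, so it is left commented out):
# gen_ftp_batch Codegirl 'secret' 2222 899999 | ftp -inv 172.0.0.1
```

The generated commands can be written to a file first and inspected before they are piped into ftp.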

Related

wget to download new wildcard files and overwrite old ones

I'm currently using wget to download specific files from a remote server. The files are updated every week, but always have the same file names, e.g. a newly uploaded file1.jpg will replace the local file1.jpg.
This is how I am grabbing them, nothing fancy:
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/file1.jpg
This downloads file1.jpg from the remote server if it is newer than the local version then overwrites the local one with the new one.
Trouble is, I'm doing this for over 100 files every week and have set up cron jobs to fire the 100 different download scripts at specific times.
Is there a way I can use a wildcard for the file name and have just one script that fires every 5 minutes for example?
Something like....
wget -N -P /path/to/local/folder/ http://xx.xxx.xxx.xxx/remote/files/*.jpg
Will that work? Will it check the local folder for all current file names, see what is new and then download and overwrite only the new ones? Also, is there any danger of it downloading partially uploaded files on the remote server?
I know that some kind of file sync script between servers would be a better option but they all look pretty complicated to set up.
Many thanks!
You can specify the files to be downloaded one by one in a text file, and then pass that file name using option -i or --input-file.
e.g. contents of list.txt:
http://xx.xxx.xxx.xxx/remote/files/file1.jpg
http://xx.xxx.xxx.xxx/remote/files/file2.jpg
http://xx.xxx.xxx.xxx/remote/files/file3.jpg
....
then
wget .... --input-file list.txt
Alternatively, if all your *.jpg files are linked from a particular HTML page, you can use recursive downloading, i.e. let wget follow the links on your page to all linked resources. You might need to limit the "recursion level" and the file types in order to avoid downloading too much. See wget --help for more info.
wget .... --recursive --level=1 --accept=jpg --no-parent http://.../your-index-page.html
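The list file for --input-file does not have to be maintained by hand. A small sketch, assuming the file names are known in advance: make_url_list is a hypothetical helper, and the base URL and paths are the placeholders from the question.

```shell
#!/bin/bash
# Print one URL per file name given on the command line.
# make_url_list is a hypothetical helper; pass base_url without a trailing slash.
make_url_list() {
  local base_url=$1 name
  shift
  for name in "$@"; do
    printf '%s/%s\n' "$base_url" "$name"
  done
}

# Usage (commented out because it needs network access):
# make_url_list http://xx.xxx.xxx.xxx/remote/files file1.jpg file2.jpg > list.txt
# wget -N -P /path/to/local/folder/ --input-file list.txt
```

With something like this, a single cron entry could replace the 100 separate download scripts.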

Debian: Cron bash script every 5 minutes and lftp

We have to run a script every 5 minutes to download data from an FTP server. We have written the FTP script, but now we want to download the data automatically every 5 minutes.
We can use: "0 * * * * /home/kbroeren/import.ch"
where import is the FTP script that downloads the data files.
The point is that new data files become available on the FTP server every 5 minutes, sometimes with an offset of about a minute. It would be nice to download each file within a couple of seconds of it becoming available. Maybe a function could check the FTP folder for the file and download it if it is already there; if not, the script would retry in about 10 seconds.
Another point to fix is the running time of the FTP script. There are 12k files in the folder. We only need the newest ones each time we run the script. Scanning the folder currently takes about 3 minutes, which is far too long. The file names of all the data files contain the date and time. Is it possible to build the file name dynamically so that the right file is downloaded every 5 minutes?
Lots of questions, I hope someone can help me out with this!
Thank you
Kevin Broeren
Our FTP script:
#!/bin/bash
HOST='ftp.mysite.com'
USER='****'
PASS='****'
SOURCEFOLDER='/'
TARGETFOLDER='/home/kbroeren/datafiles'
lftp -c "
open $HOST
user $USER $PASS
lcd $SOURCEFOLDER
mirror --newer-than=now-1day --use-cache $SOURCEFOLDER $TARGETFOLDER
bye
"
find /home/kbroeren/datafiles/* -mtime +7 -exec rm {} \;
Perhaps you might want to give curlftpfs a try. With this FUSE filesystem you can mount an FTP share into your local filesystem. If you do so, you won't have to download the files from the FTP server, and you can iterate over them as if they were local. You can try it with these steps:
# Install curlftpfs
apt-get install curlftpfs
# Make sure FUSE kernel module is loaded
modprobe fuse
# Mount the FTP Directory to your datafiles directory
curlftpfs USER:PASS@ftp.mysite.com /home/kbroeren/datafiles -o allow_other,disable_eprt
You can now process these files as you wish, and you will always have the most recent files in this directory. Be aware, though, that this is not a copy of the files: you are working directly on the FTP server. For example, removing a file from /home/kbroeren/datafiles will remove it from the FTP server.
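Once the share is mounted, picking the newest file is an ordinary local operation. A minimal sketch, assuming GNU find and file names without newlines; newest_file is a hypothetical helper name. Listing a directory with thousands of entries over FTP may still be slow.

```shell
#!/bin/bash
# Print the path of the most recently modified regular file directly inside a directory.
# Relies on GNU find's -printf; newest_file is a hypothetical helper name.
newest_file() {
  local dir=$1
  # Emit "<mtime in seconds> <path>" per file, sort by mtime, keep the last path.
  find "$dir" -maxdepth 1 -type f -printf '%T@ %p\n' \
    | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Usage:
# newest_file /home/kbroeren/datafiles
```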
If this works for you, you might want to add this entry to /etc/fstab to make sure the directory is mounted each time the machine starts:
curlftpfs#USER:PASS@ftp.mysite.com /home/kbroeren/datafiles fuse auto,user,uid=USERID,allow_other,_netdev 0 0
Make sure to change USERID to match the UID of the user who needs access to these files.
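On the question of building the file name dynamically: if the names encode the time of each 5-minute slot, the name can be computed instead of scanning the folder. This is only a sketch; the pattern data_YYYYmmdd_HHMM.dat is invented for illustration, slot_name is a hypothetical helper, and UTC and GNU date are assumed.

```shell
#!/bin/bash
# Print the file name for the 5-minute slot that contains the given epoch time.
# The pattern data_YYYYmmdd_HHMM.dat is hypothetical; adapt it to the real naming scheme.
# Uses GNU date (-d @SECONDS) and formats the time in UTC.
slot_name() {
  local now=$1
  local slot=$(( now - now % 300 ))   # round down to a multiple of 300 seconds
  date -u -d "@$slot" '+data_%Y%m%d_%H%M.dat'
}

# Usage:
# slot_name "$(date +%s)"
```

A crontab schedule of */5 * * * * would start the job every five minutes (the 0 * * * * entry in the question fires once an hour), and the script could retry after a short sleep if the computed file is not there yet.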

Keep files updated from remote server

I have a server at hostname.com/files. Whenever a file has been uploaded I want to download it.
I was thinking of creating a script that constantly checked the files directory. It would check the timestamp of the files on the server and download them based on that.
Is it possible to check the files' timestamps using a bash script? Are there better ways of doing this?
I could just download all the files in the server every 1 hour. Would it therefore be better to use a cron job?
If you have a regular interval at which you'd like to update your files, yes, a cron job is probably your best bet. Just write a script that does the checking and run that at an hourly interval.
As @Barmar commented above, rsync could be another option. Put something like this in the crontab and you should be set:
# min hour day month day-of-week user command
17 * * * * user rsync -av http://hostname.com/ >> rsync.log
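This would grab files from the server in that location and append the details to rsync.log on the 17th minute of every hour. Right now, though, I can't seem to get rsync to fetch files from a web server: rsync talks to an SSH login or an rsync daemon rather than plain HTTP, so the line above is only an outline.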
Another option using wget is:
wget -Nrb -np -o wget.log http://hostname.com/
where -N re-downloads only files that are newer than the local copy, -b sends the process to the background, -r recurses into directories and -o specifies a log file. This works with an arbitrary web server. -np makes sure wget doesn't go up into a parent directory, which would effectively spider the entire server's content.
More details, as usual, will be in the man pages of rsync or wget.
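As for checking timestamps in a bash script: yes, that is possible. The following sketch shows the kind of comparison involved, assuming the remote modification time is already known as seconds since the epoch (how to obtain it depends on the server and is not shown). needs_download is a hypothetical helper, and stat -c is GNU coreutils.

```shell
#!/bin/bash
# Return 0 (true) if local_file is missing or older than remote_epoch.
# remote_epoch: remote modification time in seconds since the epoch (assumed to be known).
# needs_download is a hypothetical helper; stat -c %Y is GNU coreutils.
needs_download() {
  local local_file=$1 remote_epoch=$2
  [ ! -e "$local_file" ] && return 0
  local local_epoch
  local_epoch=$(stat -c %Y "$local_file")   # local modification time
  [ "$remote_epoch" -gt "$local_epoch" ]
}

# Usage:
# if needs_download files/report.csv 1700000000; then echo "fetch it"; fi
```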

FTP specific files

Can we FTP specific files from a directory? The files that need to be transferred will be listed in a config file.
Can we use a for loop once logged in to ftp (in a script) for this purpose?
Will a normal ftp client work when transferring files from Unix to a Windows FTP server?
Thanks,
Ravi
You can use straight shell. This assumes your login directory is /home/ravi
Try this one time only:
echo "machine serverB login ravi password ravipasswd" > /home/ravi/.netrc
chmod 600 /home/ravi/.netrc
Test that .netrc works: ftp serverB should log you straight in.
Shell script that reads config.file, which is just a list of files to transfer:
while read -r fname
do
# the closing EOF must start in column 1 and be alone on its line
ftp serverB <<EOF
get $fname
bye
EOF
done < config.file
This gets the files from serverB. Change get $fname to put $fname to send files from serverA to serverB.
That certainly is possible. You can transfer the files listed in some file by writing a script around an ftp client (built in or by calling a CLI client). The protocol is system independent, so it is possible to transfer files between systems running different operating systems. There is only one catch: remember that MS Windows uses a case-insensitive file system, while other systems differ in that.
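Because of that catch, it can be worth checking the file list for names that differ only by letter case before sending them to a Windows server, where they would collide. A minimal sketch; find_case_collisions is a hypothetical helper and config.file is the list from the answer above.

```shell
#!/bin/bash
# Print names from a list file that differ from another entry only by letter case.
# One representative is printed per colliding group.
# find_case_collisions is a hypothetical helper name.
find_case_collisions() {
  # sort -f puts case variants next to each other; uniq -di reports duplicates ignoring case
  sort -f "$1" | uniq -di
}

# Usage:
# find_case_collisions config.file
```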

How to copy all files via FTP in rsync

I have an online account with a host that gives me an FTP account with a username and password.
I have another account with a company that gives me FTP and rsync.
Now I want to transfer all my files from the old FTP account to the new one with rsync.
Is it possible to do this with rsync alone? I don't want to copy everything to my computer first and then upload it again.
Let's call the machine with only FTP src.
Let's call the machine with FTP and SSH dst.
ssh dst
cd destination-directory
wget --mirror --ftp-user=username --ftp-password=password \
--no-host-directories ftp://src/pathname/
Note that running wget with --ftp-password on the command line will give away the password to anyone else on the system. (As well as transferring it over the wire in the clear, but you knew that.)
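One way to keep the password off the command line is a ~/.netrc file, which wget consults for FTP logins. A minimal sketch, with write_netrc as a hypothetical helper and host, user and password as placeholders; the password still crosses the wire in the clear.

```shell
#!/bin/bash
# Append a login entry to a netrc-style file that only the owner can read.
# write_netrc is a hypothetical helper; host, user and password are placeholders.
write_netrc() {
  local file=$1 host=$2 user=$3 pass=$4
  (
    umask 077   # a newly created file is not readable by group or others
    printf 'machine %s login %s password %s\n' "$host" "$user" "$pass" >> "$file"
  )
  chmod 600 "$file"   # also tighten permissions if the file already existed
}

# Usage (commented out because it touches ~/.netrc and the network):
# write_netrc ~/.netrc src username password
# wget --mirror --no-host-directories ftp://src/pathname/
```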
If you don't have access to wget, then they might have ncftp or lftp or ftp installed. I just happen to know wget the best. :)
Edit: To use ftp, you'll need to do something more like:
ftp src
user username
pass password
bin
cd /pathname
ls
At this point, note all the directories on the remote system. Create each one with !mkdir. Then change into the directory both locally and remotely:
lcd <dirname>
cd <dirname>
ls
Repeat for all the directories. Use mget * to get all the files.
If this looks awful, it is because it is. FTP wasn't designed for this, and if your new host doesn't have better tools (be sure to look for ncftp and lftp and maybe something like ftpmirror), then either compile better tools yourself or get good at writing scripts around the bad tools. :)
Or if you could get a shell on src, that'd help immensely too. FTP is just not intended for transferring thousands of files.
Anyway, this avoids bouncing through your local system, which ought to help throughput significantly.
There's always the trusty FUSE filesystems, CurlFtpFS and SSHFS. Mount each server with the appropriate filesystem and copy across using standard utilities. Probably not the fastest way to do it, but quite possibly the least labor-intensive.
I was looking for a simple solution to sync a remote folder to a local folder via FTP while only replacing new files. I ended up with a little wget script based on sarnold's answer that I thought might be helpful to others, too, so here it is:
#!/bin/bash
HOST="1.2.3.4"
USER="username"
PASS="password"
LDIR="/path/to/local/dir" # can be empty
RDIR="/path/to/remote/dir" # can be empty
# Only start wget if the cd was successful.
# --continue: resume files that have already been partially transmitted
# --mirror: --recursive --level=inf --timestamping --no-remove-listing
# --no-host-directories: don't create 'ftp://src/' folder structure for synced files
cd $LDIR && \
wget \
--continue \
--mirror \
--no-host-directories \
--ftp-user="$USER" \
--ftp-password="$PASS" \
"ftp://$HOST/$RDIR"

Resources