How to copy the latest file from an SFTP server to a local directory using a shell script?

I have multiple files on an SFTP server, from which I need to copy only the latest one. I have written sample code, but in it I am passing the filename. What logic do I need to add so that it identifies the latest file on the SFTP server and copies it to my local directory?
Files on the SFTP server:
my_data_20220428.csv
my_data_20220504.csv
my_data_20220501.csv
my_data_20220429.csv
The code I am running:
datadir="/script/data"
cd ${datadir}
rm -f ${datadir}/my_data*.csv
rm -f ${logfile}
lftp<<END_SCRIPT
open sftp://${sftphost}
user ${sftpuser} ${sftppassword}
cd ${sftpfolder}
lcd $datadir
mget my_data_20220504.csv
bye
END_SCRIPT
What changes do I need to make so that it automatically picks the latest file from the server without hardcoding the filename?

You can try this script, mostly copied from your sample, so it is assumed that the variables have already been set.
#!/usr/bin/env bash
datadir="/script/data"
rm -f "$datadir"/my_data*.csv
rm -f "$logfile"
new=$(echo "ls -halt $sftpfolder" | lftp -u "${sftpuser}","${sftppassword}" sftp://"${sftphost}" | sed -n '/my_data/s/.* \(.*\)/\1/p' | head -1)
lftp -u "${sftpuser}","${sftppassword}" sftp://"${sftphost}" << --EOF--
cd "$sftpfolder"
lcd "$datadir"
get "$new"
bye
--EOF--

You could try:
latest=$(lftp "sftp://$sftpuser:$sftppassword@$sftphost" \
    -e "cd $sftpfolder; glob rels -1t *.csv; bye" |
    head -1)
lftp "sftp://$sftpuser:$sftppassword@$sftphost" \
    -e "cd $sftpfolder; mget $latest; bye"

Related

Create directory, download file, and execute command from a list of URLs

I am working on a Red Hat Linux server. My end goal is to run CRB-BLAST on multiple fasta files and have the results from those in separate directories.
My approach is to download the fasta files using wget and then run CRB-BLAST. I have multiple files and would like to download each of them to its own directory (the directory name should perhaps come from the URLs in the list file), then run CRB-BLAST.
Example URLs:
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_37_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_123_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_195_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_31_chr.v0.1.liftover.CDS.fasta.gz
Ideally, the file name determines the directory name, for example, TC_3370/.
I think there might be a solution with cat URL.txt | mkdir | cd | wget | crb-blast
Currently I just run the commands manually, one at a time:
mkdir TC_3370
cd TC_3370/
wget http://assemblies/Genomes/final_assemblies/10x_meta_assemblies_v1.0/TC_3370_chr.v1.0.maker.CDS.fasta.gz
crb-blast -q TC_3370_chr.v1.0.maker.CDS.fasta.gz -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
Try this Shellcheck-clean program:
#! /bin/bash -p
while read -r url; do
    file=${url##*/}
    dir=${file%%_chr.*}
    mkdir -v -- "$dir"
    (
        cd "./$dir" || exit 1
        wget -- "$url"
        crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
    )
done <URL.txt
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${url##*/} etc.
The subshell, ( ... ), is used to ensure that the cd doesn't affect the main program.
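As a quick illustration of those parameter expansions (using one of the example URLs from the question; the comments show what bash would produce):
url='http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz'
# ${url##*/} strips everything up to and including the last '/'
file=${url##*/}       # TC_3370_chr.v0.1.liftover.CDS.fasta.gz
# ${file%%_chr.*} strips everything from the first '_chr.' to the end
dir=${file%%_chr.*}   # TC_3370
echo "$file" "$dir"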
Another implementation
#!/bin/sh
# Read lines as url for as long as there are lines to read
while read -r url
do
    # Get the file name by stripping out everything up to the last / in the url
    file_name=${url##*/}
    # Get the destination dir name by stripping everything from the first _chr
    dest_dir=${file_name%%_chr*}
    # Compose the wget output path
    fasta_path="$dest_dir/$file_name"
    if
        # Successfully created the destination directory AND
        mkdir -p -- "$dest_dir" &&
        # Successfully downloaded the file
        wget --output-document="$fasta_path" --quiet -- "$url"
    then
        # Process the fasta file into fna
        fna_path="$dest_dir/TCV2_annot_cds.fna"
        crb-blast -q "$fasta_path" -t "$fna_path" -e 1e-20 -h 4 -o rbbh_TC
    else
        # Cleanup: remove the destination directory if either mkdir or wget failed
        rm -fr -- "$dest_dir"
    fi
# Read from the URL.txt file for the whole while loop
done < URL.txt
Downloading files from a list is the task for the -i file option. If you have a file named, say, urls.txt with one URL per line, you can simply do
wget -i urls.txt
Note that this will put all the files in the current working directory, so if you wish to have them in separate directories, you would need to move them after wget finishes.
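A minimal sketch of that post-download step, assuming the downloaded names follow the TC_*_chr... pattern from the question (this part is not in the original answer):
wget -i urls.txt
# Move each downloaded file into a directory named after its TC_* prefix
for f in TC_*_chr.*.fasta.gz; do
    d=${f%%_chr*}    # e.g. TC_3370
    mkdir -p -- "$d"
    mv -- "$f" "$d/"
done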

How do I rerun something until it succeeds, in the background, with a pipe, on the Linux command line?

Story: I'm following this guide to set up my geth node:
https://github.com/enthusiastics/bsc-archive-snapshot/blob/master/build_archive_node.sh
However, for this code chunk
# Query S3 for all archives and download them in parallel to a new zfs dataset.
while IFS= read -r FILE_NAME; do
    ZFS_NAME=$(echo "$FILE_NAME" | cut -d'.' -f1)
    ARCHIVE_NAMES+=("$ZFS_NAME")
    zfs create -o "mountpoint=/$ZFS_NAME" "tank/$ZFS_NAME"
    bash -c "cd /$ZFS_NAME && aws s3 cp --request-payer=requester '$S3_BUCKET_PATH/bsc/$FILE_NAME' - | /zstd/zstd --long=30 -d | tar -xf -" &
done <<<"$(aws s3 ls --request-payer=requester "$S3_BUCKET_PATH/bsc/" | cut -d' ' -f4)"
To be exact, for the following line:
bash -c "cd /$ZFS_NAME && aws s3 cp --request-payer=requester '$S3_BUCKET_PATH/bsc/$FILE_NAME' - | /zstd/zstd --long=30 -d | tar -xf -" &
some instances result in a broken pipe, but an automatic restart could help. So I want to rewrite the code so that it automatically retries until the above line succeeds. I researched a bit and found the until ... do ... done method. Example:
until passwd ; do echo "Try again" ; done;
However, I could not succeed in incorporating that idea into the GitHub code. I tried to rewrite it as:
bash -c "cd /$ZFS_NAME && until aws s3 cp --request-payer=requester '$S3_BUCKET_PATH/bsc/$FILE_NAME' - | /zstd/zstd --long=30 -d | tar -xf - ; do echo "Try again" ; done;" &
But it does not work... I tested it on a minimal example:
until passwd | echo "randompipe" ; do echo "Try again" ; done;
but it didn't rerun passwd until it succeeded.
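In a pipeline, the exit status that until tests is that of the last command (echo above, which always succeeds), so the loop never retries passwd. One way to make the condition reflect the whole pipeline is to enable pipefail inside the bash -c string; a rough sketch along those lines, keeping the variable names from the GitHub script and untested against the real bucket:
bash -c "set -o pipefail
cd /$ZFS_NAME || exit 1
until aws s3 cp --request-payer=requester '$S3_BUCKET_PATH/bsc/$FILE_NAME' - |
      /zstd/zstd --long=30 -d | tar -xf -
do
    echo 'Try again'    # retried until every stage of the pipeline succeeds
done" &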

Script will not download files when called by cron or udev rules

So I'm trying to make a script that will download my podcasts upon detecting my smart watch connecting, and transfer them to it. I've configured the udev rule to detect when the watch is connected, and it executes /bin/watch_transfer.sh, whose code is as follows:
#!/usr/bin/env sh
echo "Watch connected at $(date)" >>/tmp/scripts.log
# Download new podcasts
cd /home/pi/Scripts/
./bashpodder.shell >>/tmp/pscripts.log
echo "Upodder should've run by now">>/tmp/scripts.log
# Transfer podcasts
for file in /home/pi/Downloads/podcasts/*
do
    /usr/bin/mtp-sendfile $file /Podcasts
    echo "Processing $file" >>/tmp/scripts.log
done
echo "Sent all files" >>/tmp/scripts.log
I know that the script runs when the watch is connected because /tmp/scripts.log is created and updated, and bashpodder.shell creates the podcast.m3u file, so the bashpodder script is running, but it doesn't download any files to ~/Downloads/podcasts. Bashpodder is a simple podcast downloader (I was using upodder but switched because it didn't seem to work) and mtp-tools is a way to transfer files through MTP. The bashpodder.shell script is below:
# By Linc 10/1/2004
# Find the latest script at http://lincgeek.org/bashpodder
# Revision 1.21 12/04/2008 - Many Contributers!
# If you use this and have made improvements or have comments
# drop me an email at linc dot fessenden at gmail dot com
# and post your changes to the forum at http://lincgeek.org/lincware
# I'd appreciate it!
# Make script crontab friendly:
cd $(dirname $0)
# datadir is the directory you want podcasts saved to:
datadir=/home/pi/Downloads/podcasts
# create datadir if necessary:
mkdir -p $datadir
# Delete any temp file:
rm -f temp.log
# Read the bp.conf file and wget any url not already in the podcast.log file:
while read podcast
do
    file=$(xsltproc parse_enclosure.xsl $podcast 2> /dev/null || wget -q $podcast -O - | tr '\r' '\n' | tr \' \" | sed -n 's/.*url="\([^"]*\)".*/\1/p')
    for url in $file
    do
        echo $url >> temp.log
        if ! grep "$url" podcast.log > /dev/null
        then
            wget -t 10 -U BashPodder -c -q -O $datadir/$(echo "$url" | awk -F'/' {'print $NF'} | awk -F'=' {'print $NF'} | awk -F'?' {'print $1'}) "$url"
        fi
    done
done < bp.conf
# Move dynamically created log file to permanent log file:
cat podcast.log >> temp.log
sort temp.log | uniq > podcast.log
rm temp.log
# Create an m3u playlist:
ls $datadir | grep -v m3u > $datadir/podcast.m3u
I think it might be something to do with permissions, as when I run ./watch_transfer.sh from the terminal it runs perfectly. Thanks in advance for your help.
edit:
After connecting my watch:
Output of $ cat /tmp/scripts.log:
Watch connected at Thu Jul 16 22:25:47 BST 2020
Upodder should've run by now
Processing /home/pi/Downloads/podcasts/podcast.m3u
Sent all files
$ cat /tmp/pscripts.log doesn't output anything, but /tmp/pscripts.log does exist.
Output of $ cat ~/Scripts/temp.log:
http://rasterweb.net/raster/audio/rwaudio20060108.mp3
http://rasterweb.net/raster/audio/rwaudio20051020.mp3
http://rasterweb.net/raster/audio/rwaudio20051017.mp3
http://rasterweb.net/raster/audio/rwaudio20050807.mp3
http://rasterweb.net/raster/audio/rwaudio20050719.mp3
http://rasterweb.net/raster/audio/rwaudio20050615.mp3
http://rasterweb.net/raster/audio/rwaudio20050525.mp3
http://rasterweb.net/raster/audio/rwaudio20050323.mp3
This seems to suggest that bashpodder is running through the urls but not actually downloading them?
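One way to narrow this down (purely a debugging sketch under assumptions, not a confirmed fix): udev and cron run scripts with a minimal environment, and in udev's case possibly before the network is fully usable, while bashpodder's wget calls use -q, so any download failure is silent. Logging the environment and the script's trace from within watch_transfer.sh makes it easier to compare a udev-triggered run with a terminal run:
#!/usr/bin/env sh
# Debug variant of watch_transfer.sh: capture the environment and a full trace
echo "Watch connected at $(date)" >>/tmp/scripts.log
env >>/tmp/scripts.log          # compare with the output of 'env' in a terminal
cd /home/pi/Scripts/ || exit 1
# Trace every command and keep stderr; also consider temporarily removing -q
# from the wget line in bashpodder.shell so download errors become visible
sh -x ./bashpodder.shell >>/tmp/pscripts.log 2>&1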

lftp: delete multiple files with Bash

I am trying to create a script that deletes all the old files in my backup directory except the three most recent ones, using lftp.
I tried to do this with ls -1tr, which returns all the files in ascending date order, and then head -$NB_BACKUP_TO_RM ($NB_BACKUP_TO_RM is the number of files I want to delete from my list); these two commands return the correct files.
After this I want to remove all of them, so I do xargs rm --, but Bash says the files don't exist... I think this command is not running in the remote directory but in the local one, and I don't know what I can do to delete these files (from my returned list).
Here is the full code:
MAX_BACKUP=3
NB_BACKUP=$(lftp -e "ls -1tr $REMOTE_DIR/full_backup_ftp* | wc -l ; quit" -u $USER,$PASSWORD $HOST)
if (( $NB_BACKUP > $MAX_BACKUP ))
then
    NB_BACKUP_TO_RM=$(($NB_BACKUP-$MAX_BACKUP))
    REMOVE=$(lftp -e "ls -1tr $REMOTE_DIR/full_backup_ftp* | head -$NB_BACKUP_TO_RM | xargs rm -- ; quit" -u $USER,$PASSWORD $HOST)
    echo $REMOVE
fi
Do you have an idea what the problem is? How can I delete the files in my list (after ls -1tr $REMOTE_DIR/full_backup_ftp* and head -$NB_BACKUP_TO_RM)?
Thanks for your help.
Starting an SFTP connection can be time consuming. Below is a slightly modified solution that avoids multiple lftp sessions; it will perform much better than the alternative solution, especially if a large number of files have to be purged.
Basically, it leverages lftp's ability to mix lftp commands with external commands: it builds a command file containing a series of rm commands (using head, xargs, ...) and executes those commands inside the same lftp session.
Also note that lftp's ls does not allow wildcards; use cls instead.
Make sure you test this carefully, because of the potential removal of important files:
lftp -u $USER,$PASSWORD $HOST <<__CMD__
cls -1tr $REMOTE_DIR/full_backup_ftp* | head -$NB_BACKUP_TO_RM | xargs -I{} echo rm {} > rm_list.txt
source rm_list.txt
__CMD__
Or, as a one-liner, using lftp's ability to execute a dynamically generated command (source -e), which eliminates the temporary file:
lftp -u $USER,$PASSWORD $HOST <<__CMD__
source -e 'cls -1tr $REMOTE_DIR/full_backup_ftp* | head -$NB_BACKUP_TO_RM | xargs -I{} echo rm {}'
__CMD__
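Given the warning above, a cautious way to proceed (an illustrative sketch, not part of the original answer) is to generate rm_list.txt first, inspect it locally, and only then execute it in a second session:
# Step 1: only generate the list of rm commands, do not delete anything yet
lftp -u $USER,$PASSWORD $HOST <<__CMD__
cls -1tr $REMOTE_DIR/full_backup_ftp* | head -$NB_BACKUP_TO_RM | xargs -I{} echo rm {} > rm_list.txt
__CMD__
# Step 2: review the planned deletions, then run them
cat rm_list.txt
lftp -e "source rm_list.txt; quit" -u $USER,$PASSWORD $HOST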
It looks like xargs is an unknown command for lftp, per man lftp, and xargs rm deletes local files, not remote files.
So use xargs as below; it works for me:
lftp -e "ls -1tr $REMOTE_DIR/full_backup_ftp*; quit" -u $USER,$PASSWORD $HOST | head -$NB_BACKUP_TO_RM | xargs -I {} lftp -e 'rm '{}'; quit' -u $USER,$PASSWORD $HOST

Commands work on terminal but not in shell script

The following commands work in my terminal but not in my shell script. I later found out that my terminal was /bin/tcsh. Can somebody tell me what changes I need to make for /bin/sh? Here are the commands I need to change:
cp source_dir/*/dir1/*.xml destination_dir/
Error in sh-> cp: cannot stat `source_dir/*/dir1/*.xml': No such file or directory
sed -i "s+${initial_name}+${final_name}+" $file_name
This one does not complain, but it does not work either.
I am adding an example for testing. The code is meant to rename the xml files and also change the contents of the xml files. For example:
The file name crr.ya.na.aa.xml should be changed to aa.xml
The same name inside crr.ya.na.aa.xml should also be changed from crr.ya.na.aa to aa
Here is the code:
#!/bin/sh
# Create dir structure for testing
rm -rf audience
mkdir audience
mkdir audience/dir1 audience/dir2 audience/dir3
mkdir audience/dir1/ipxact audience/dir2/ipxact audience/dir3/ipxact
touch audience/dir1/ipxact/crr.ya.na.aa.xml
echo "<spirit:name>crr.ya.na.aa</spirit:name>" > audience/dir1/ipxact/crr.ya.na.aa.xml
touch audience/dir2/ipxact/crr.ya.na.bb.xml
echo "<spirit:name>crr.ya.na.bb</spirit:name>" > audience/dir2/ipxact/crr.ya.na.bb.xml
touch audience/dir3/ipxact/crr.ya.na.cc.xml
echo "<spirit:name>crr.ya.na.cc</spirit:name>" > audience/dir3/ipxact/crr.ya.na.cc.xml
# Create a dir for ipxact_drop files if it does not exist
mkdir -p ipxact_drop
rm -rf ipxact_drop/*
cp audience/*/ipxact/*.xml ipxact_drop/
ls ipxact_drop/ > ipxact_drop_files.log
cat ipxact_drop_files.log | \
awk '{ split($0,a,"."); print a[length(a)-1] "." a[length(a)] }' ipxact_drop_files.log > file_names.log
cat ipxact_drop_files.log | \
awk '{ split($0,a,"."); print "mv ipxact_drop/" $0 " ipxact_drop/" a[length(a)-1] "." a[length(a)] }' ipxact_drop_files.log > command.log
chmod +x command.log
./command.log
while read line
do
    echo ipxact_drop/$line
    initial_name=`grep -m 1 crr ipxact_drop/$line | sed -e 's/<spirit:name>//' | sed -e 's/<\/spirit:name>//' `
    final_name="${line%.*}"
    echo $initial_name
    echo $final_name
    sed -i "s+${initial_name}+${final_name}+" ipxact_drop/$line
done < file_names.log
echo " ***** SCRIPT RUN FINISHED *****"
Only the sed command at the end is not working
I was reading some other posts and understood that xml files can have problems with scripts. Here is what has worked for me so far:
To remove cp error: replace #!/bin/sh -f with #!/bin/sh
To remove sed error for the test input: replace sed -i ...... with sed -i.back ....
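If the local sed is a BSD-style sed, -i requires an explicit (possibly empty) suffix argument, which would explain why -i.back helps. A sed-implementation-independent alternative (an illustrative sketch using the same variables as the script above) is to write to a temporary file and move it back:
# Portable stand-in for 'sed -i': edit into a temp file, then replace the original
sed "s+${initial_name}+${final_name}+" ipxact_drop/$line > ipxact_drop/$line.tmp &&
    mv ipxact_drop/$line.tmp ipxact_drop/$line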
