Get clean list of file sizes and names using SFTP in unix - linux

I want to fetch a list of files from a server using SFTP and then download them one by one, only if their size is less than 1 GB.
I am running the following command:
$ sftp -oIdentityFile=/home/user/.ssh/id_rsa -oPort=22 user@hostname >list.txt <<EOF
cd upload/Example
ls -l iurygify*.zip
EOF
This results in:
$ cat list.txt
sftp> cd upload/Example
sftp> ls -l iurygify*.zip
-rwxrwx--- 0 300096661 300026669 0 Mar 11 16:38 iurygify1.zip
-rwxrwx--- 0 300096661 300026669 0 Mar 11 16:38 iurygify2.zip
I could then use awk to extract the size and filename, which I can save into logs for reference, and then download only those files that meet the 1 GB criterion.
Is there a simpler approach to getting this file list and size? I want to avoid the junk entries of the prompt and commands in list.txt, and I do not want to do this via the expect command.
We are using SSH key authentication.

You could place your sftp commands in a batch file and filter the output - no need for expect.
echo 'ls -l' > t
sftp -b t -oIdentityFile=/home/user/.ssh/id_rsa -oPort=22 user@hostname | grep -v 'sftp>' >list.txt
Or take it a step further and filter out the "right size" in the same step:
sftp -b t -oIdentityFile=/home/user/.ssh/id_rsa -oPort=22 user@hostname | awk '$1!~/sftp>/&&$5<1000000000' >list.txt
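The filter step above can be sketched as a small reusable function. This is only a sketch: it assumes the usual nine-column `ls -l` layout (size in field 5, name in field 9) and file names without spaces. The name `small_files` and the sample listing lines are made up for illustration.

```shell
#!/bin/sh
# small_files LIMIT: read an `ls -l` style listing on stdin and print
# "size name" for regular files smaller than LIMIT bytes.
# Assumption: size is field 5 and the name is field 9 (no spaces in names).
small_files() {
    awk -v limit="$1" '
        # Keep only regular-file lines; this also drops echoed "sftp>" lines.
        $1 ~ /^-/ && NF >= 9 && $5 + 0 < limit + 0 { print $5, $9 }
    '
}

# Demo on invented listing lines (not real server output).
printf '%s\n' \
    'sftp> ls -l' \
    '-rwxrwx--- 0 300096661 300026669 2048 Mar 11 16:38 small.zip' \
    '-rwxrwx--- 0 300096661 300026669 2000000000 Mar 11 16:38 big.zip' |
    small_files 1000000000
```

With a real server, the sftp batch command shown above would be piped into `small_files 1000000000 >list.txt` in place of the demo input.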

Maybe use lftp instead of sftp?
$ lftp sftp://xxx > list.txt <<EOF
> open
> ls -l
> EOF
$ cat list.txt
drwxr-xr-x 10 ludo users 4096 May 24 2019 .
drwxr-xr-x 8 root root 4096 Dec 20 2018 ..
-rw------- 1 ludo users 36653 Mar 31 19:28 .bash_history
-rw-r--r-- 1 ludo users 220 Mar 21 2014 .bash_logout
-rw-r--r-- 1 ludo users 362 Aug 16 2018 .bash_profile
...

Related

Identify the latest file from a file list

I have a pretty tricky task (at least for me).
I have sftp access to a server from which I need to get ONLY the latest file in a directory. Since the sftp interface is very limited, I decided to first list the files in the directory to a text file.
This is the code
sftp -b - hostname >list.txt <<EOF
ls -l *.xls
EOF
My concern now is from list.txt, how do I identify the latest file?
Sample content of list.txt
cat list.txt
-rw-r--r-- 0 16777221 16777216 52141 Mar 29 08:06 samplefile1.xls
-rw-r--r-- 0 16777221 16777216 2926332 Mar 28 09:48 samplefile2.xls
-rw-r--r-- 0 16777221 16777216 40669 Mar 26 04:38 samplefile3.xls
-rw-r--r-- 0 16777221 16777216 8640 Mar 19 08:02 samplefile4.xls
-rw-r--r-- 0 16777221 16777216 146331 Mar 25 07:27 samplefile5.xls
-rw-r--r-- 0 16777221 16777216 18988 Mar 19 03:53 samplefile6.xls
-rw-r--r-- 0 16777221 16777216 36640 Apr 2 12:52 samplefile7.xls
Use ls -lt
sftp -b - hostname >list.txt <<EOF
ls -lt
EOF
Now the first line in your file will be the latest file.
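To pull just the name out of such a listing, something like the following might do. It is a sketch that assumes the listing is sorted newest first (as `ls -lt` produces), that names contain no spaces, and that any echoed `sftp>` prompt lines should be skipped. `newest_name` is a hypothetical helper name.

```shell
#!/bin/sh
# newest_name: read a newest-first `ls -lt` listing on stdin and print the
# name (field 9) of the first regular-file entry, then stop reading.
newest_name() {
    awk '$1 ~ /^-/ && NF >= 9 { print $9; exit }'
}

# Demo on two lines in the same format as the sample list.txt.
printf '%s\n' \
    'sftp> ls -lt *.xls' \
    '-rw-r--r-- 0 16777221 16777216 36640 Apr 2 12:52 samplefile7.xls' \
    '-rw-r--r-- 0 16777221 16777216 52141 Mar 29 08:06 samplefile1.xls' |
    newest_name
```

Against a saved listing this would be called as `newest_name < list.txt`.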
You can manage it as follows:
Maintain a history file on your server, e.g. history.txt.
Before transferring, create a list of the files that are available at the moment.
The first time, generate history.txt manually and add all the files that you have already transferred, for example samplefile6.xls and samplefile7.xls.
sftp -b - hostname >list.txt <<EOF
ls -l *.xls
EOF
Now add a while loop to your above existing script
while read -r line
do
file=$(echo "$line" | awk '{print $9}')
if grep -qxF -- "$file" history.txt; then
echo "File already existed in history file -- No need to transfer"
else
sftp server_host <<EOF
cd /your/dropLocation
put $file
quit
EOF
echo "$file" >> history.txt
#add the transferred file to history file
fi
done < list.txt
With this approach, even if there is more than one new file, you can transfer them all very easily.
Hope this will help you.
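One detail worth isolating is the history lookup: a plain `grep "$file"` treats the name as a regular expression and also matches substrings. A minimal sketch of a stricter check follows. The helper name `already_sent` and the temporary history file are illustrative only.

```shell
#!/bin/sh
# already_sent NAME HISTORY_FILE: succeed only if NAME appears as a whole
# line in HISTORY_FILE. -F disables regex matching, -x requires the full
# line to match, -q suppresses output.
already_sent() {
    [ -f "$2" ] && grep -qxF -- "$1" "$2"
}

# Demo with a throwaway history file and invented file names.
hist=$(mktemp)
printf '%s\n' samplefile6.xls samplefile7.xls > "$hist"
for f in samplefile7.xls samplefile8.xls file6.xls; do
    if already_sent "$f" "$hist"; then
        echo "skip $f"
    else
        echo "transfer $f"          # the sftp transfer would go here
        echo "$f" >> "$hist"
    fi
done
rm -f "$hist"
```

Note that `file6.xls` is transferred even though it is a substring of `samplefile6.xls`, which is what the whole-line match is for.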

how to get uuid of filesystem given a path?

I am handed the path of a directory (sometimes the path of a file).
Which utility or shell script will reliably give me the UUID of the filesystem on which this directory (or file) is stored?
(By UUID of a filesystem I mean the "UUID=..." entry as shown by e.g. blkid.)
(This is happening on Red Hat Linux.)
Give this line a try:
sudo blkid -o value $(\df --output=source "$file"|tail -1)|head -1
In the line above, $file is the variable holding the file or directory path. You may want to check that the file or directory exists before running it.
This line needs root permission (sudo).
\df just bypasses any alias you may have for df; an alias that adds the -T option, for example, conflicts with --output.
Some tests:
kent$ file="/home/kent/.vimrc"
kent$ sudo blkid -o value $(\df --output=source "$file"|tail -1)|head -1
9da1040a-4a24-4a00-9c62-bad8cc3c028d
kent$ file="/etc"
kent$ sudo blkid -o value $(\df --output=source "$file"|tail -1)|head -1
2860a386-af71-4a28-86d7-00ccf5d12b4d
Find the device backing the filesystem that contains the path:
DEVICE=$(df --output=source /path/to/some_file_or_directory | tail -1)
and get the UUID of the device:
sudo blkid "$DEVICE"
More simply, you can list the UUID symlinks and the devices they point to:
pchero@mywork:~$ ls -l /dev/disk/by-uuid/
total 0
lrwxrwxrwx 1 root root 10 Jan 23 09:03 0267689b-b929-4f30-b8a4-08c742f0746f -> ../../sda2
lrwxrwxrwx 1 root root 10 Jan 23 09:03 2d682ea1-dab0-49ba-a77a-9335ccd47e58 -> ../../sda3
lrwxrwxrwx 1 root root 10 Jan 23 09:03 64e733e9-2e6a-4d3e-aabe-d0d26fbfc991 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jan 23 09:03 a99fb356-4e01-4a1c-af41-001b0fd8a844 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jan 23 09:03 f2f7618e-76c5-4e9a-9657-e002d9a66ccf -> ../../sda4
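The symlink listing can also be searched from a script. The sketch below assumes GNU `readlink -f` is available and that the device name is already known (for example from `df --output=source`, as in the earlier answer). The function name `uuid_for_device` and its optional second argument are illustrative.

```shell
#!/bin/sh
# uuid_for_device DEVICE [DIR]: print the name of the symlink in DIR
# (default /dev/disk/by-uuid) that resolves to DEVICE.
# Returns non-zero if no symlink points at the device.
uuid_for_device() {
    target=$(readlink -f -- "$1") || return 1
    for link in "${2:-/dev/disk/by-uuid}"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f -- "$link")" = "$target" ]; then
            basename -- "$link"
            return 0
        fi
    done
    return 1
}

# Possible use, with $file holding the path of interest:
#   uuid_for_device "$(df --output=source "$file" | tail -1)"
```

It only reads symlinks, so it should not need sudo on a typical system. Filesystems without an entry under /dev/disk/by-uuid (network or virtual mounts, for instance) will simply produce no result.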

How do I find the latest date folder in a directory and then construct the command in a shell script?

I have a directory that contains folders named by date (YYYYMMDD), as shown below:
david@machineX:/database/batch/snapshot$ ls -lt
drwxr-xr-x 2 app kyte 86016 Oct 25 05:19 20141023
drwxr-xr-x 2 app kyte 73728 Oct 18 00:21 20141016
drwxr-xr-x 2 app kyte 73728 Oct 9 22:23 20141009
drwxr-xr-x 2 app kyte 81920 Oct 4 03:11 20141002
Now I need to extract the latest date folder from the /database/batch/snapshot directory and then construct the command in my shell script like this:
./file_checker --directory /database/batch/snapshot/20141023/ --regex ".*.data" > shardfile_20141023.log
Below is my shell script -
#!/bin/bash
./file_checker --directory /database/batch/snapshot/20141023/ --regex ".*.data" > shardfile_20141023.log
# now I need to grep shardfile_20141023.log after above command is executed
How do I find the latest date folder and construct above command in a shell script?
One approach is to keep only the entries whose names are exactly 8 digits:
ls -t1 | grep -P -e '^\d{8}$' | head -1
Or
ls -t1 | grep -E -e '^[0-9]{8}$' | head -1
You could try the following in your script:
pushd /database/batch/snapshot
LATESTDATE=`ls -d * | sort -n | tail -1`
popd
./file_checker --directory /database/batch/snapshot/${LATESTDATE}/ --regex ".*.data" > shardfile_${LATESTDATE}.log
See BashFAQ#099 aka "How can I get the newest (or oldest) file from a directory?".
That being said, if you don't care for actual modification time and just want to find the most recent directory based on name you can use an array and globbing (note: the sort order with globbing is subject to LC_COLLATE):
$ find
.
./20141002
./20141009
./20141016
./20141023
$ foo=( * )
$ echo "${foo[${#foo[@]}-1]}"
20141023
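The globbing idea can be narrowed to eight-digit directory names and wrapped in a function. This is a sketch: it relies on glob results being sorted, so the last match is the greatest name (digits sort in ascending order in common locales). `latest_date_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# latest_date_dir PARENT: print the name of the last (greatest) subdirectory
# of PARENT whose name is exactly eight digits (YYYYMMDD).
# Returns non-zero if there is no such directory.
latest_date_dir() {
    latest=
    # The trailing slash restricts the glob to directories.
    for d in "$1"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/; do
        [ -d "$d" ] || continue     # pattern stays literal when nothing matches
        latest=$d
    done
    [ -n "$latest" ] || return 1
    latest=${latest%/}
    printf '%s\n' "${latest##*/}"
}

# Possible use with the paths from the question:
#   LATESTDATE=$(latest_date_dir /database/batch/snapshot) &&
#   ./file_checker --directory "/database/batch/snapshot/$LATESTDATE/" \
#       --regex ".*.data" > "shardfile_$LATESTDATE.log"
```

Like the array version, this picks the newest directory by name, not by modification time.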

Using sed within "while read" expression

I am pretty stuck with this script.
#!/bin/bash
STARTDIR=$1
MNTDIR=/tmp/test/mnt
find $STARTDIR -type l |
while read file;
do
echo Found symlink file: $file
DIR=`sed 's|/\w*$||'`
MKDIR=${MNTDIR}${DIR}
mkdir -p $MKDIR
cp -L $file $MKDIR
done
I pass a directory as the $1 parameter; this directory has three symbolic links. The while loop echoes only the first match; after using sed I lose all the other matches.
See the output below:
[artyom@LBOX tmp]$ ls -lh /tmp/imp/
total 16K
lrwxrwxrwx 1 artyom adm 19 Aug 8 10:33 ok1 -> /tmp/imp/sym3/file1
lrwxrwxrwx 1 artyom adm 19 Aug 8 09:19 ok2 -> /tmp/imp/sym2/file2
lrwxrwxrwx 1 artyom adm 19 Aug 8 10:32 ok3 -> /tmp/imp/sym3/file3
[artyom@LBOX tmp]$ ./copy.sh /tmp/imp/
Found symlink file: /tmp/imp/ok1
[artyom@LBOX tmp]$
Can somebody help with that issue?
Thanks
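You forgot to feed something to sed. Without explicit input it reads standard input, which inside this loop is the pipe from find, so it swallows all the remaining file names and the loop ends after the first iteration. I wouldn't use this approach anyway; just use something like: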
DIR=`dirname "$file"`
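Put together, the loop might look like the sketch below. It keeps the logic of the question's script but quotes the variables and wraps everything in a function so the directories are parameters. The name `copy_symlink_targets` is made up, and file names containing newlines are not handled.

```shell
#!/bin/sh
# copy_symlink_targets STARTDIR MNTDIR: for each symlink under STARTDIR,
# recreate its parent directory under MNTDIR and copy the file it points to.
copy_symlink_targets() {
    find "$1" -type l | while IFS= read -r file; do
        echo "Found symlink file: $file"
        dir=$(dirname "$file")      # takes an argument, so the loop's stdin is untouched
        mkdir -p -- "$2$dir"
        cp -L -- "$file" "$2$dir"   # -L copies the target, not the link itself
    done
}

# Possible use, matching the question's variables:
#   copy_symlink_targets "$1" /tmp/test/mnt
```

Because `dirname` does not read standard input, every line produced by `find` reaches `read`.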

How to check syslog in Bash on Linux?

In C we log this way:
syslog( LOG_INFO, "proxying %s", url );
In Linux how can we check the log?
How about less /var/log/syslog?
On Fedora 19, it looks like the answer is /var/log/messages. Check /etc/rsyslog.conf in case it has been changed, though.
By default, messages are logged to the system log at /var/log/syslog, so they can be read with:
tail -f /var/log/syslog
If the file doesn't exist, check /etc/syslog.conf, the configuration file for syslogd.
Note that the configuration file could be a different one, so check whether the running process uses a different file:
# ps wuax | grep syslog
root /sbin/syslogd -f /etc/syslog-knoppix.conf
Note: In some distributions (such as Knoppix) all logged messages may be sent to a different terminal (e.g. /dev/tty12); to view tty12, try pressing Control+Alt+F12.
You can also use the lsof tool to find out which log file the syslogd process is using, e.g.
sudo lsof -p $(pgrep syslog) | grep log$
To send a test message to syslogd from the shell, you may try:
echo test | logger
For troubleshooting use a trace tool (strace on Linux, dtruss on Unix), e.g.:
sudo strace -fp $(cat /var/run/syslogd.pid)
A very useful utility is journalctl.
For example, to show syslog messages on the console: journalctl -t <syslog-ident>, where <syslog-ident> is the identity you gave to the openlog function to initialize syslog.
tail -f /var/log/syslog | grep process_name
where process_name is the name of the process we are interested in
If you like Vim, it has built-in syntax highlighting for the syslog file, e.g. it will highlight error messages in red.
vi +'syntax on' /var/log/syslog
On some Linux systems (e.g. Debian and Ubuntu) syslog is rotated daily and you have multiple log files, where the two newest files are uncompressed while older ones are compressed:
$ ls -l /var/log/syslog*
-rw-r----- 1 root adm 888238 Aug 25 12:02 /var/log/syslog
-rw-r----- 1 root adm 1438588 Aug 25 00:05 /var/log/syslog.1
-rw-r----- 1 root adm 95161 Aug 24 00:07 /var/log/syslog.2.gz
-rw-r----- 1 root adm 103829 Aug 23 00:08 /var/log/syslog.3.gz
-rw-r----- 1 root adm 82679 Aug 22 00:06 /var/log/syslog.4.gz
-rw-r----- 1 root adm 270313 Aug 21 00:10 /var/log/syslog.5.gz
-rw-r----- 1 root adm 110724 Aug 20 00:09 /var/log/syslog.6.gz
-rw-r----- 1 root adm 178880 Aug 19 00:08 /var/log/syslog.7.gz
To search all the syslog files you can use the following command:
$ sudo zcat -f `ls -tr /var/log/syslog*` | grep -i error | less
where zcat first decompresses and prints all syslog files (oldest first), grep performs the search and less pages the results.
To do the same but with the lines prefixed with the name of the syslog file you can use zgrep:
$ sudo zgrep -i error `ls -tr /var/log/syslog*` | less
$ zgrep -V | grep zgrep
zgrep (gzip) 1.6
In both cases sudo is required if syslog files are not readable by ordinary users.
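Where zgrep is not available, roughly the same file-prefixed search can be approximated with gzip and grep. The following is only a sketch: `search_logs` is a hypothetical name, and it decides whether a file is compressed purely from the .gz suffix.

```shell
#!/bin/sh
# search_logs PATTERN FILE...: case-insensitive search across plain and
# gzip-compressed files, prefixing each matching line with its file name.
search_logs() {
    pattern=$1
    shift
    for f in "$@"; do
        case $f in
            *.gz) gzip -dc -- "$f" ;;   # decompress to stdout
            *)    cat -- "$f" ;;
        esac | grep -i -- "$pattern" | awk -v name="$f" '{ print name ":" $0 }'
    done
}

# Possible use (sudo may be needed, as noted above):
#   search_logs error /var/log/syslog* | less
```

Files are searched in the order given, so passing the glob directly yields name order, not the oldest-first order of the `ls -tr` variant.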
