Linux CP using AWK output

I have been trying to learn more about Linux and have spent this morning focusing on the awk command. The command I have been trying to get to work is below.
ls -lRt lpftp.* | awk '{print $7, $9}' | mkdir -p $(awk '{print $1}') | ls -lRt lpftp.* | cp $(awk '{print $9, $7}')
Essentially I am trying to move each file in a directory into a subdirectory based on that file's last-modified day. The command first prints only the files I want, then uses mkdir to create a folder based on the day of the month each file was last modified. After that I want to move each file into its associated directory; however, as the command is now, it moves every file into the 01 folder and prints out the following text
cp: 0653-436 12 is a directory.
Specify -r or -R to copy.
once for every directory.
Does anyone know how I can fix this issue, or if there is a better way to go about it?

ls -lRt lpftp.* | awk '{print $7, $9}' | while read day file ; do mkdir -p "$day"; cp "$file" "$day"; done
The commands between do and done will be executed for each line of output, with the first field awk prints going into the day variable and the second into file. I used quotes here somewhat unnecessarily; there will not be spaces in the variables, given the way they are set.

The safest way to do something like this -- and the fastest to execute -- is to use awk on the data to output a shell script. In awk, print the mkdir and cp commands you expect to execute. Pipe the results into head(1) until you're satisfied. Maybe look at the whole thing in less(1). Then execute as follows:
ls -lRt lpftp.* | awk -f script.awk | sh -ex
That will echo the commands to standard error, and stop on the first error. If you're super absolutely sure it's right, drop the x option.
The advantage of this approach over a loop or a bunch of subprocesses in awk (with the system function) is:
you can see what's going to happen, and what's happening
speed of execution
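As a concrete illustration (a sketch only, assuming the same column layout as the question's ls -lRt output, i.e. $7 is the day of month and $9 the file name), script.awk could simply emit one mkdir and one cp per data line:
# script.awk: print shell commands instead of executing them
# require enough fields so that "total" lines, blank lines and -R directory headers are skipped
NF >= 9 {
    printf "mkdir -p \"%s\"\n", $7
    printf "cp \"%s\" \"%s/\"\n", $9, $7
}
Preview the generated commands with ls -lRt lpftp.* | awk -f script.awk | head, and only pipe into sh -ex once they look right.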

Related

Move a file list based upon grep pattern in command line [duplicate]

I want to pass each output from a command as multiple argument to a second command, e.g.:
grep "pattern" input
returns:
file1
file2
file3
and I want to copy these outputs, e.g:
cp file1 file1.bac
cp file2 file2.bac
cp file3 file3.bac
How can I do that in one go? Something like:
grep "pattern" input | cp $1 $1.bac
You can use xargs:
grep 'pattern' input | xargs -I% cp "%" "%.bac"
You can use $() to interpolate the output of a command. So, you could use kill -9 $(grep -hP '^\d+$' $(ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }')) if you wanted to.
In addition to Chris Jester-Young's good answer, I would say that xargs is also a good solution for these situations:
grep ... `ls -lad ... | awk '{ print $9 }'` | xargs kill -9
will do it. All together:
grep -hP '^\d+$' `ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }'` | xargs kill -9
For completeness, I'll also mention command substitution and explain why this is not recommended:
cp $(grep -l "pattern" input) directory/
(The backtick syntax cp `grep -l "pattern" input` directory/ is roughly equivalent, but it is obsolete and unwieldy; don't use that.)
This will fail if the output from grep produces a file name which contains whitespace or a shell metacharacter.
Of course, it's fine to use this if you know exactly which file names the grep can produce, and have verified that none of them are problematic. But for a production script, don't use this.
For the OP's scenario, where you need to refer to each match individually and add an extension to it, the xargs or while read alternatives are superior anyway.
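For reference, a minimal while read sketch of that idea (still assuming file names without embedded newlines):
grep -l "pattern" input |
while IFS= read -r f; do
    cp "$f" "$f.bac"    # one copy per matching file, extension appended
done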
In the worst case (meaning problematic or unspecified file names), pass the matches to a subshell via xargs:
grep -l "pattern" input |
xargs -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
... where obviously the script inside the for loop could be arbitrarily complex.
In the ideal case, the command you want to run is simple (or versatile) enough that you can simply pass it an arbitrarily long list of file names. For example, GNU cp has a -t option that facilitates this use of xargs: it lets you put the destination directory first on the command line, so you can append as many file names as you like at the end of the command:
grep -l "pattern" input | xargs cp -t destdir
which will expand into
cp -t destdir file1 file2 file3 file4 ...
for as many matches as xargs can fit onto the command line of cp, repeated as many times as it takes to pass all the files to cp. (Unfortunately, this doesn't match the OP's scenario; if you need to rename every file while copying, you need to pass in just two arguments per cp invocation: the source file name and the destination file name to copy it to.)
In other words, if you use the command substitution syntax and grep produces a really long list of matches, you risk bumping into ARG_MAX and "Argument list too long" errors; xargs specifically avoids this by passing only as many arguments as safely fit onto each cp command line, and running cp multiple times if necessary.
The above will still work incorrectly if you have file names which contain newlines. Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
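If newlines in file names are a real possibility, a sketch assuming GNU tools is to make grep terminate each name with a NUL byte (-Z together with -l) and let xargs -0 split on that:
grep -lZ "pattern" input |
xargs -0 -r sh -c 'for f; do cp "$f" "$f.bac"; done' _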
#!/bin/bash
for f in files; do
    if grep -q PATTERN "$f"; then
        echo cp -v "$f" "${f}.bac"
    fi
done
Here files stands for a glob such as *.txt or *.text (i.e. files ending in .txt or .text), or whatever pattern you need; likewise, replace PATTERN with yours. Remove echo once you're satisfied with the output. For a recursive solution, take a look at the bash shell option globstar (see the sketch below).
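A minimal recursive sketch, assuming bash 4+ and .txt files (adjust the glob and PATTERN as needed):
#!/bin/bash
shopt -s globstar                      # let ** match files in subdirectories too
for f in **/*.txt; do
    if grep -q PATTERN "$f"; then
        echo cp -v "$f" "${f}.bac"     # drop echo to actually copy
    fi
done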

Loop to filter out lines from apache log files

I have several apache access files that I would like to clean up a bit before I analyze them. I am trying to use grep in the following way:
grep -v term_to_grep apache_access_log
I have several terms that I want to grep, so I am piping every grep action as follow:
grep -v term_to_grep_1 apache_access_log | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > apache_access_log_cleaned
Until here my rudimentary script works as expected! But I have many apache access logs, and I don't want to do that for every file. I have started to write a bash script but so far I couldn't make it work. This is my try:
for logs in ./access_logs/*;
do
    cat $logs | grep -v term_to_grep | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > $logs_clean
done;
Could anyone point me out what I am doing wrong?
If you have a variable and you append _clean to its name, that's a new variable, and not the value of the old one with _clean appended. To fix that, use curly braces:
$ var=file.log
$ echo "<$var>"
<file.log>
$ echo "<$var_clean>"
<>
$ echo "<${var}_clean>"
<file.log_clean>
Without it, your pipeline tries to redirect to the empty string, which results in an error. Note that "$file"_clean would also work.
As for your pipeline, you could combine that into a single grep command:
grep -Ev 'term_to_grep|term_to_grep_2|term_to_grep_3|term_to_grep_n' "$logs" > "${logs}_clean"
No cat needed, only a single invocation of grep.
Or you could stick all your terms into a file:
$ cat excludes
term_to_grep_1
term_to_grep_2
term_to_grep_3
term_to_grep_n
and then use the -f option:
grep -vf excludes "$logs" > "${logs}_clean"
If your terms are strings and not regular expressions, you might be able to speed this up by using -F ("fixed strings"):
grep -vFf excludes "$logs" > "${logs}_clean"
I think GNU grep checks that for you on its own, though.
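A minimal sketch of the full corrected loop, assuming bash and the excludes file above:
#!/bin/bash
for logs in ./access_logs/*; do
    grep -vFf excludes "$logs" > "${logs}_clean"   # braces keep the variable name intact
done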
You are looping over several files, but in your loop you constantly overwrite your result file, so it will only contain the last result from the last file.
You don't need a loop, use this instead:
egrep -v 'term_to_grep|term_to_grep_2|term_to_grep_3' ./access_logs/* > "$logs_clean"
Note, it is always helpful to start a Bash script with set -eEuCo pipefail. This catches most common errors; for instance, the -u option would have stopped the script with an error when the unset $logs_clean variable was referenced.

Using an integer variable to process Linux cut command fields

The command below does not succeed.
for i in {1..5} ; do cat /etc/fstab | egrep "(ext3|ext4|xfs)" | awk '{print $2}' | cut -d"/" -f1-$i ; done
It seems that $i is ignored completely. It always returns the result of
cut -d"/" -f1-
Any idea why it fails?
Thanks in advance!
The command itself is part of a script that should help me automatically re-arrange fstab lines to match the right mount order (e.g. /test/subfolder must come after /test is mounted, not before).
I tried it and it didn't work in zsh. BUT I tried it in bash and it does work, so if you are using zsh just run the command with bash and it should work ;)
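If switching shells is inconvenient, a sketch that avoids brace expansion entirely (plain POSIX sh does not expand {1..5}; whether that was the actual culprit here is only an assumption) uses seq instead:
for i in $(seq 1 5); do
    egrep "(ext3|ext4|xfs)" /etc/fstab | awk '{print $2}' | cut -d"/" -f1-"$i"
done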

how to compare output of two ls in linux

So here is the task which I can't solve. I have a directory with .h files and a directory with .i files, which have the same names as the .h files. I want, just by typing a command, to get all .h files which are not found as .i files. It's not a hard problem, I can do it in some programming language, but I'm just curious what it would look like on the command line :). To be more specific, here is the algo:
get file names without extensions from ls *.h
get file names without extensions from ls *.i
compare them
print all names from 1 that do not appear in 2
Good luck!
diff \
<(ls dir.with.h | sed 's/\.h$//') \
<(ls dir.with.i | sed 's/\.i$//') \
| grep '^<' \
| cut -c3-
diff <(ls dir.with.h | sed 's/\.h$//') <(ls dir.with.i | sed 's/\.i$//') executes ls on the two directories, cuts off the extensions, and compares the two lists. Then grep '^<' finds the lines that are only in the first listing, and cut -c3- cuts off the "< " prefix that diff inserted.
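The same comparison can also be sketched with comm(1), assuming plain, newline-free file names; comm wants sorted input, hence the explicit sort after stripping the extensions:
comm -23 <(ls dir.with.h | sed 's/\.h$//' | sort) \
         <(ls dir.with.i | sed 's/\.i$//' | sort)
The -23 suppresses the lines unique to the second listing and the common lines, leaving only the .h names with no matching .i file.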
ls ./dir_h/*.h | sed -r -n 's:.*dir_h/([^.]*).h$:dir_i/\1.i:p' | xargs ls 2>&1 | \
grep "No such file or directory" | awk '{print $4}' | sed -n -r 's:dir_i/([^:]*).*:dir_h/\1:p'
ls -1 dir1/*.hh dir2/*.ii | awk -F"/" '{print $NF}' | awk -F"." '{a[$1]++;b[$0]++}END{for(i in a)if(a[i]==1 && b[i".hh"]) print i}'
explanation:
ls -1 dir1/*.hh dir2/*.ii
above will list all the files *.hh and *.ii files in both the directories.
awk -F"/" '{print $NF}'
above will just print the file name excluding the complete path of the file.
awk -F"." '{a[$1]++;b[$0]++}END{for(i in a)if(a[i]==1 && b[i".hh"]) print i}'
above will create two associative arrays: a, keyed by the file name without its extension, and b, keyed by the full file name.
If both the .hh and .ii files exist, the value in the array a will be 2; if there is only one file, the value will be 1. So we need the array items whose value is 1 and which correspond to a header file (.hh).
This is checked against the associative array b, which is done in the END block.
Assuming bash is your shell:
for file in $( ls dir_with_h/*.h ); do
    name=${file%\.h};             # trim trailing ".h" file extension
    name=${name#dir_with_h/};     # trim leading folder name
    if [ ! -e dir_with_i/${name}.i ]; then
        echo ${name};
    fi
done
Undoubtedly this can be ported to virtually all other shells. I find this less cryptic than some other approaches (although this is surely my problem), but it is a little wordy. As such, a shell script might help recall it.

Give the mount point of a path

The following, very non-robust shell code will give the mount point of $path:
(for i in $(df|cut -c 63-99); do case $path in $i*) echo $i;; esac; done) | tail -n 1
Is there a better way to do this in shell?
Postscript
This script is really awful, but has the redeeming quality that it Works On My Systems. Note that several mount points may be prefixes of $path.
Examples
On a Linux system:
cas@txtproof:~$ path=/sys/block/hda1
cas@txtproof:~$ for i in $(df -a|cut -c 57-99); do case $path in $i*) echo $i;; esac; done| tail -1
/sys
On a Mac OSX system
cas local$ path=/dev/fd/0
cas local$ for i in $(df -a|cut -c 63-99); do case $path in $i*) echo $i;; esac; done| tail -1
/dev
Note the need to vary cut's parameters, because of the way df's output differs; using awk solves this, but even awk is non-portable, given the range of result formatting various implementations of df return.
Answer
It looks like munging tabular output is the only way within the shell, but
df -P "$path" | tail -1 | awk '{ print $NF}'
based on ghostdog74's answer, is a big improvement on what I had. Note two new issues: firstly, df $path insists that $path names an existing file, whereas the script I had above doesn't care; secondly, there are no worries about dereferencing symlinks. This doesn't work if you have mount points with spaces in them, which occurs if one has removable media with spaces in their volume names.
It's not difficult to write Python code to do the job properly.
df takes the path as a parameter, so something like this should be fairly robust:
df "$path" | tail -1 | awk '{ print $6 }'
In theory stat will tell you the device the file is on, and there should be some way of mapping the device to a mount point.
For example, on linux, this should work:
stat -c '%m' "$path"
Always been a fan of using formatting options of a program, as it can be more robust than manipulating output (eg if the mount point has spaces). GNU df allows the following:
df --output=target "$path" | tail -1
Unfortunately there is no option I can see to prevent the printing of a header, so the tail is still required.
I don't know what your desired output is, therefore this is a guess:
#!/bin/bash
path=/home
df | awk -v path="$path" 'NR>1 && $NF~path{
print $NF
}'
Using cut with -c is not really reliable, since the width of df's output can change, say when a 5% becomes 10%, and you would miss some characters. Since the mount point is always at the end, you can use fields and field delimiters instead. In the above, $NF is the last column, which is the mount point.
I would take the source code to df and find out what it does besides calling stat as Douglas Leeder suggests.
Line-by-line parsing of the df output will cause problems as those lines often look like
/dev/mapper/VOLGROUP00-logical--volume
1234567 1000000 200000 90% /path/to/mountpoint
With the added complexity of parsing those kinds of lines as well, calling stat and finding the mount point is probably less complex.
If you want to use only df and awk to find the filesystem device/remote share or a mount point and they include spaces you can cheat by defining the field separator of awk to be a regular expression that matches the format of the numeric sizes used to display total size, used space, available space and capacity percentage. By defining those columns as the field separator you are then left with $1 representing the filesystem device/remote share and $NF representing the mount path.
Take this for example:
[root@testsystem ~] df -P
Filesystem 1024-blocks Used Available Capacity Mounted on
192.168.0.200:/NFS WITH SPACES 11695881728 11186577920 509303808 96% /mnt/MOUNT WITH SPACES
If you attempt to parse this with the quick and dirty awk '{print $1}' or awk '{print $NF}' you'll only get a portion of the filesystem/remote share path and mount path and that's no good. Now make awk use the four numeric data columns as the field separator.
[root@testsystem ~] df -P "/mnt/MOUNT WITH SPACES/path/to/file/filename.txt" | \
awk 'BEGIN {FS="[ ]*[0-9]+%?[ ]+"}; NR==2 {print $1}'
192.168.0.200:/NFS WITH SPACES
[root@testsystem ~] df -P "/mnt/MOUNT WITH SPACES/path/to/file/filename.txt" | \
awk 'BEGIN {FS="[ ]*[0-9]+%?[ ]+"}; NR==2 {print $NF}'
/mnt/MOUNT WITH SPACES
Enjoy :-)
Edit: These commands are based on RHEL/CentOS/Fedora but should work on just about any distribution.
Just had the same problem. If some mount point (or the mounted device) is sufficient, as in my case, you can do:
DEVNO=$(stat -c '%d' /srv/sftp/testconsumer)
MP=$(findmnt -n -f -o TARGET /dev/block/$((DEVNO/2**8)):$((DEVNO&2**8-1)))
(or split the hex DEVNO %D with /dev/block/$((0x${DEVNO:0:${#DEVNO}-2})):$((0x${DEVNO:2:2})))
Alternatively, the following loop comes to mind; I am out of ideas why I cannot find a proper basic command for this.
TARGETPATH="/srv/sftp/testconsumer"
TARGETPATHTMP=$(readlink -m "$TARGETPATH")
[[ ! -d "$TARGETPATHTMP" ]] && TARGETPATHTMP=$(dirname "$TARGETPATH")
TARGETMOUNT=$(findmnt -d backward -f -n -o TARGET --target "$TARGETPATHTMP")
while [[ -z "$TARGETMOUNT" ]]
do
TARGETPATHTMP=$(dirname "$TARGETPATHTMP")
echo "$TARGETPATHTMP"
TARGETMOUNT=$(findmnt -d backward -f -n -o TARGET --target "$TARGETPATHTMP")
done
This should always work, but it is much more than I would expect for such a simple task.
(Edited to use readlink so that non-existing files are allowed; -m or -e could be used depending on whether missing path components are acceptable or every component must exist.)
mount | grep "^$path" | awk '{print $3}'
I missed this when I looked over prior questions: Python: Get Mount Point on Windows or Linux, which says that os.path.ismount(path) tells if path is a mount point.
My preference is for a shell solution, but this looks pretty simple.
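If a pure shell counterpart of os.path.ismount is wanted, a minimal sketch (assuming util-linux's mountpoint(1) is available) walks up the path until it hits a mount point:
p=$(readlink -f "$path")      # resolve symlinks and relative components first
until mountpoint -q "$p"; do  # stop at the first directory that is a mount point
    p=$(dirname "$p")
done
echo "$p"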
I use this:
df -h $path | cut -f 1 -d " " | tail -1
Linux has this, which will avoid problem with spaces:
lsblk -no MOUNTPOINT ${device}
Not sure about BSD land.
f () { echo $6; }; f $(df -P "$path" | tail -n 1)
