Use unix command inside awk commandline to process fields - linux

Is there a way to use a unix command inside an awk one-liner to do something and output the result on STDOUT?
For example:
ls -lrt|awk '$8 !~ /:/ {system(date -d \"$6" "$7" "$8\" +"%Y%m%d")"|\t"$0}'

You are parsing the output of ls, which can cause several problems.
If the goal is to list your filenames ordered by last modification time, with yyyymmdd in front of each name, you can start with
# not correct for some filenames
stat --format "%.10y %n" * | tr -d '-' | sort
This fails for filenames containing -, because tr removes every hyphen on the line. One way to solve that is to remove only the hyphens inside the date:
# Still not ok
stat --format "%.10y %n" * | sed -r 's/^(....)-(..)-(..)/\1\2\3/' | sort
This will fail for filenames with newlines.
touch -d "2019-09-01 12:00:00" "two
lines.txt"
shows some of the problems you can also see with ls.
How you should solve this depends on your exact requirements.
Example
find . -maxdepth 1 ! -name "[.]*" -printf '%TY%Tm%Td ' -print0 |
sed 's#[.]/##g'| tr "\n\0" "/\n" | sort
Explanation:
maxdepth 1 Only look in current directory
! -name "[.]*" Ignore hidden files
-printf '%TY%Tm%Td ' YYYYMMDD and space
-print0 Don't use \n but NUL at the end of each result
sed 's#[.]/##g' Remove the path ./
tr "\n\0" "/\n" Replace newlines in a filename with / and NULs with newlines
After the sort you might want to restore the newlines inside filenames with tr '/' '\n'.

If you want to have the output of the command, you can make use of getline:
kent$ awk 'BEGIN{"date"|getline;print}'
Fri 15 Nov 2019 10:51:10 AM CET
You can also assign the output to an awk variable:
kent$ awk 'BEGIN{"date"|getline v;print v}'
Fri 15 Nov 2019 10:50:20 AM CET
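The same pattern works per input line, which is closer to what the question asks: build the command string from a field, read its output with getline, and close the command afterwards. The following is only a minimal sketch, assuming GNU date (for -d) and input whose first field is a date that date -d understands; the function name to_yyyymmdd is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prefix each input line with its first field
# (a date such as 2019-11-15) reformatted as YYYYMMDD by GNU date.
to_yyyymmdd() {
    awk '{
        # Build the command. \047 is a single quote, so this sketch
        # assumes the field itself contains no single quote.
        cmd = "date -d \047" $1 "\047 +%Y%m%d"
        if ((cmd | getline ymd) > 0)
            print ymd "\t" $0
        close(cmd)   # without close(), every distinct command keeps a pipe open
    }'
}

# Usage: two sample lines, date first, file name second.
printf '%s\n' '2019-11-15 input.txt' '2019-09-01 two.txt' | to_yyyymmdd
```

Starting one date process per line is slow on large inputs, so for file listings the find -printf approach is usually the better choice.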

You are trying to format the date output from ls.
The find command offers extensive control over date and time output through its -printf action.
For example:
$ ls -l
-rw-rw-r--. 1 cfrm cfrm 41 Nov 15 09:12 input.txt
-rw-rw-r--. 1 cfrm cfrm 67 Nov 15 09:13 script.awk
$ find . -printf "fileName=%f \t formatedDate-UTC=[%a] \t formatedDate-custom=[%AY-%Am-%Ad]\n"
fileName=. formatedDate-UTC=[Fri Nov 15 09:43:32.0222415982 2019] formatedDate-custom=[2019-11-15]
fileName=input.txt formatedDate-UTC=[Fri Nov 15 09:12:33.0117279463 2019] formatedDate-custom=[2019-11-15]
fileName=script.awk formatedDate-UTC=[Fri Nov 15 09:13:38.0743189896 2019] formatedDate-custom=[2019-11-15]
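For sorting by timestamp, sort has to be told where the key starts. -t [ sets the field separator to the timestamp marker ([ in the following example), and -k 2 is needed as well to start the key at the timestamp. With -t [ alone, as in the following run, sort still compares whole lines, so the output below is ordered by name: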
$ find . -printf "%f timestamp=[%AY%Am%Ad:%AT]\n" |sort -t [
22114 timestamp=[20190511:10:32:22.6453184660]
5530 timestamp=[20190506:01:03:01.2225343480]
5764 timestamp=[20190506:01:03:34.7107944450]
.font-unix timestamp=[20191115:13:27:01.8699219890]
hsperfdata_artemis timestamp=[20191115:13:27:01.8699219890]
hsperfdata_cfrm timestamp=[20191115:13:27:01.8709219730]
hsperfdata_elasticsearch timestamp=[20191115:13:27:01.8699219890]
.ICE-unix timestamp=[20191115:13:27:01.8699219890]
input.txt timestamp=[20191115:09:12:33.1172794630]
junk timestamp=[20191115:09:43:32.2224159820]
script.awk timestamp=[20191115:09:13:38.7431898960]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-chronyd.service-AoZvZM timestamp=[20190516:05:09:51.1884573210]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-vgauthd.service-f2m9rt timestamp=[20190516:05:09:51.1884573210]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-vmtoolsd.service-0CJ32C timestamp=[20190516:05:09:51.1884573210]
.Test-unix timestamp=[20191115:13:27:01.8699219890]
. timestamp=[20191115:13:26:56.8770048750]
.X11-unix timestamp=[20191115:13:27:01.8699219890]
.XIM-unix timestamp=[20191115:13:27:01.8699219890]
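To order such lines by the timestamp itself, the sort key has to start at the second [-separated field. A minimal sketch, using three lines taken from the listing above; the function name sort_by_timestamp is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sort "name timestamp=[...]" lines by the text after "[".
sort_by_timestamp() {
    # -t '[' sets the field separator; -k 2 starts the sort key at field 2.
    # LC_ALL=C forces plain byte-order comparison, which suits the digits.
    LC_ALL=C sort -t '[' -k 2
}

# Usage: the oldest entry (5530) should come first, junk last.
printf '%s\n' \
    'junk timestamp=[20191115:09:43:32.2224159820]' \
    '5530 timestamp=[20190506:01:03:01.2225343480]' \
    'input.txt timestamp=[20191115:09:12:33.1172794630]' |
    sort_by_timestamp
```

This works because YYYYMMDD:HH:MM:SS timestamps of equal width sort chronologically when compared as text.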

Related

UNIX how to parse list of files in a directory more than once

I am working on a list of files in a directory. I have already used awk to snip out specific fields. Next I want to further cut-down the files.
My commands are
ls /archive/gtx_rec_full | grep '2019-05-1' | awk '{print $5,$6,$7,$8}' | more
which returns a list like
9636502 2019-05-10 00:40 /archive/gtx_rec_full/GTX_20190608_1967_40431_236965.dat.gz
15915297 2019-05-10 01:39 /archive/gtx_rec_full/GTX_20190608_1967_40432_382768.dat.gz
10672671 2019-05-10 01:39 /archive/gtx_rec_full/GTX_20190608_1967_40433_261926.dat.gz
17362746 2019-05-10 02:41 /archive/gtx_rec_full/GTX_20190608_1967_40434_418702.dat.gz
13355381 2019-05-10 03:40 /archive/gtx_rec_full/GTX_20190608_1967_40435_323201.dat.gz
I want to keep the file sizes and timestamps, and then snip out the unique file IDs, like 40431, 40432, 40433, etc. So, my new result set would look like:
9636502 05/10/2019 00:40 /archive/gtx_rec_full/40431
15915297 05/10/2019 01:39 /archive/gtx_rec_full/40432
10672671 05/10/2019 01:39 /archive/gtx_rec_full/40433
17362746 05/10/2019 02:41 /archive/gtx_rec_full/40434
13355381 05/10/2019 03:40 /archive/gtx_rec_full/40435
It is not clear to me how to do this. Can anyone offer some suggestions?
Thank you!
I am working on Red Hat Enterprise Linux Server, 7.5
As I understand from your comment, you wanted
| sed -r 's/GTX.*_(.*)_[^_]*/\1/;s/\// /g'
(or, when you get confused by the slashes)
| sed -r 's#GTX.*_(.*)_[^_]*#\1#;s#/# #g'
If the solution you found is different, please post it and accept your own answer.
Then everybody can see that the question is "finished".
Try this:
#!/bin/bash
# find prints "size MM/DD/YYYY HH:MM path", NUL-terminated, using the modification time (%T)
while IFS= read -r -d '' line; do
    id=$(awk -F_ '{ print $4 }' <<< "${line##*/}")   # 4th _-separated part of the file name
    printf '%s/%s\n' "${line%/*}" "$id"               # everything up to the directory, then the ID
done < <(find /archive/gtx_rec_full -type f -name 'GTX_*' -printf '%s %Tm/%Td/%TY %TH:%TM %p\0')

Extracting filename & modification time from ls output

I have a set of files in a directory. I want to extract just the filename without the absolute path & the modification timestamp from the ls output.
/apps/dir/file1.txt
/apps/dir/file2.txt
Now, from the ls output, I extract the fields for filename and timestamp:
ls -ltr /apps/dir | awk '{print $6 $7 $8 $9}'
Sep 25 2013 /apps/dir/file1.txt
Dec 20 2013 /apps/dir/file2.txt
Dec 20 2013 /apps/dir/file3.txt
whereas I want it to be like
Sep 25 2013 file1
Dec 20 2013 file2
Dec 20 2013 file3
One solution is to cd to that directory and run the command from there, but is there a solution without cd? I also tried substr(), but since the filenames are not of equal length, passing a constant value to substr() didn't work.
With GNU find, you can do the following to get the filenames without path:
find /apps/dir -type f -printf "%f\n"
and, as Kojiro mentioned in the comments, you can use %t or %T(format) to get the modification time.
Or do as BroSlow suggested (with %T for the modification time; %A would print the access time instead):
find /apps/dir -printf "%Tb %Td %TY %f\n"
Do not try to do the following. It will break on filenames with spaces, and on other systems where ls -l prints fewer or more columns:
ls -ltr /apps/dir | awk '{n=split($9,f,/\//);print $6,$7,$8,f[n]}'
Don't parse the output of the ls command; use stat instead:
stat -c%y filename
This will print the last modification time in human-readable format.
Or, if you are using GNU date, you can use date with a format parameter and the reference flag:
date '+%b %d %Y' -r filename
You can use basename to get just the filename portion of the path:
basename /path/to/filename
Or as Kojiro suggested with parameter expansion:
To get just the filename:
filename="${filename##*/}"
And then to strip off the extension, if any:
filename="${filename%.*}"
Putting it all together:
#!/usr/bin/env bash
for filename in *; do
    timestamp=$(stat -c%y "$filename")
    #Uncomment below for a neater timestamp
    #timestamp="${timestamp%.*}"
    filename="${filename##*/}"
    filename="${filename%.*}"
    echo "$timestamp $filename"
done
#!/bin/bash
# Read NUL-delimited paths so that names with spaces or newlines are not split
find /apps/dir -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' i; do
    file=$(basename "$i")
    timestamp=$(stat -c%y "$i")
    printf "%-50s %s\n" "$timestamp" "$file"
done
If you want to reproduce the ls time format:
dir=/apps/dir
now=$(date +%s)
sixmo=$(date -d '6 months ago' +%s)
find "$dir" -maxdepth 1 -print0 |
while IFS= read -d '' -r filename; do
    mtime=$(stat -c %Y "$filename")
    # ls shows the time of day for recent files and the year for older ones
    if (( sixmo <= mtime && mtime <= now )); then
        fmt="%b %d %H:%M"
    else
        fmt="%b %d %Y"
    fi
    printf "%12s %s\n" "$(date -d "@$mtime" "+$fmt")" "$(basename "$filename")"
done |
sort -k 4
This assumes the GNU set of tools.

Linux Shell - String manipulation then calculating age of file in minutes

I am writing a script that calculates the age of the oldest file in a directory. The first commands run are:
OLDFILE=`ls -lt $DIR | grep "^-" | tail -1 `
echo $OLDFILE
The output contains a lot more than just the filename. eg
-rwxrwxr-- 1 abc abc 334 May 10 2011 ABCD_xyz20110510113817046.abc.bak
Q1/. How do I obtain the output after the last space of the above line? This would give me the filename. I realise some sort of string manipulation is required but am new to shell scripting.
Q2/. How do I obtain the age of this file in minutes?
To obtain just the oldest file's name,
ls -lt | awk '/^-/{file=$NF}END{print file}'
However, this is not robust if you have files with spaces in their names, etc. Generally, you should try to avoid parsing the output from ls.
With stat you can obtain a file's modification time in machine-readable format (stat -c %Y), expressed as seconds since Jan 1, 1970; with date +%s you can obtain the current time in the same format. Subtract and divide by 60. (More Awk skills would come in handy for the arithmetic.)
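The subtract-and-divide step can be sketched as a small shell function. This assumes GNU stat (-c %Y prints the modification time in epoch seconds) and GNU touch for the demonstration file; the function name age_minutes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the age of a file in whole minutes,
# measured from its last modification time.
age_minutes() {
    now=$(date +%s)                        # current time, epoch seconds
    mtime=$(stat -c %Y -- "$1") || return 1   # file mtime, epoch seconds
    echo $(( (now - mtime) / 60 ))         # integer division drops the seconds
}

# Usage: create a scratch file dated 90 minutes ago and measure it.
tmp=$(mktemp)
touch -d '90 minutes ago' "$tmp"
age_minutes "$tmp"
rm -f "$tmp"
```

Shell arithmetic is enough here because only whole minutes are wanted; for fractional minutes, hand the two numbers to awk or bc instead.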
Finally, for an alternate solution, look at the options for find; in particular, its printf format strings give you a file's modification time directly. The following prints the modification time (in seconds since the epoch) and the inode number of the oldest file:
find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1
Using the inode number avoids the issues of funny file names; once you have a single inode, converting that to a file name is a snap:
find . -maxdepth 1 -inum "$number"
Tying the two together, you might want something like this:
# set -- replaces the positional parameters ("$@") with the output from the command
set -- $(find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1)
# now $1 is the timestamp and $2 is the inode
oldest_filename=$(find . -maxdepth 1 -inum "$2")
age_in_minutes=$(date +%s | awk -v d="$1" '{ print ($1 - d) / 60 }')
An awk solution that tells you how old the file is in minutes. Your ls output does not contain the time of day, so 00:00:00 is assumed. It relies on mktime() and systime(), which are not part of POSIX awk but are provided by gawk. Also, as tripleee pointed out, parsing ls output is inherently risky.
[[bash_prompt$]]$ echo $l; echo "##############";echo $l | awk -f test.sh ; echo "#############"; cat test.sh
-rwxrwxr-- 1 abc abc 334 May 20 2013 ABCD_xyz20110510113817046.abc.bak
##############
File ABCD_xyz20110510113817046.abc.bak is 2074.67 min old
#############
BEGIN{
m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
for(o=1;o<=m;o++){
date[d[o]]=sprintf("%02d",o)
}
}
{
month=date[$6];
old_time=mktime($8" "month" "$7" "00" "00" "00);
curr_time=systime();
print "File " $NF " is " (curr_time-old_time)/60 " min old";
}
For Q2 a bash one-liner could be:
let mtime_minutes=\(`date +%s`-`stat -c%Y "$file_to_inspect"`\)/60
You probably want ls -t. Including the '-l' option explicitly asks for ls to give you all that other info that you actually want to get rid of.
However what you probably REALLY want is find, first:
NEWEST=$(find . -maxdepth 1 -type f | cut -c3- |xargs ls -t| head -1)
This will get you the plain filename of the newest item, with no other processing needed.
Directories will be correctly excluded. (thanks to tripleee for pointing out that's what you were aiming for.)
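For the second question, you can use stat, date and bc. The file's modification time has to be subtracted from the current time before dividing by 60:
TIME_MINUTES=$(echo "($(date +%s) - $(stat --format %Y "$NEWEST")) / 60" | bc -l)
Remove the '-l' option from bc if you only want a whole number of minutes.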

using Find/Grep to search files between specific time of day

I can use the following command to search log files over the past 10 days:
find . -type f -mtime -10 |xargs grep -i -n 'exception' 2> /dev/null
But I want to further limit the search to lines that were logged between 6am and 6pm. I'm wondering how I can modify the grep command to filter these if the lines look like this:
2012-09-04 03:50:41,658 [MainLogger: ] EXCEPTION AppLog - some exception 1
2012-09-04 10:01:32,902 [MainLogger: ] EXCEPTION AppLog - some exception 2
2012-09-04 15:39:51,901 [MainLogger: ] EXCEPTION AppLog - some exception 3
2012-09-04 18:12:51,901 [MainLogger: ] EXCEPTION AppLog - some exception 4
In the above case only lines 2 and 3 should be returned, since they are between 6am and 6pm.
Any help would be appreciated.
One easy but ugly way to do it is to chain a lot of greps, like this:
find . -type f -mtime -10 |xargs grep -i -n 'exception' | grep -v " 00" | grep -v " 01" | ... | grep -v " 18" | grep -v " 19" ... 2> /dev/null
Or, more concisely:
find . -type f -mtime -10 |xargs grep -i -n 'exception' | grep -v -e " \(0[012345]\|18\|19\|2[0123]\)" 2> /dev/null
You can hack it like this:
grep ' 0[6789]:\| 1[01234567]:\| 18:00:00,000'
But if you will need some more time handling, I recommend switching to a more powerful language (e.g. Perl and DateTime).
In any language with a proper datetime library, converting all dates to a canonical representation makes the problem trivial. The default canonicalization is to convert to seconds since midnight, Jan 1, 1970. Then just see if the canonical number of the input line is bigger than the start time and smaller than the end time.
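For a pure time-of-day window, the same idea works with seconds since midnight instead of seconds since 1970, and awk is enough. A minimal sketch, assuming the timestamp has the form YYYY-MM-DD HH:MM:SS,mmm as in the question and may be preceded by the file:line prefix that grep -n adds; the function name between_6_and_18 is made up for illustration, and the milliseconds are ignored.

```shell
#!/bin/sh
# Hypothetical helper: keep log lines whose time of day lies
# between 06:00:00 and 18:00:00 (inclusive, milliseconds ignored).
between_6_and_18() {
    awk '
    match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
        # "YYYY-MM-DD " is 11 characters, so HH:MM:SS starts at RSTART + 11.
        split(substr($0, RSTART + 11, 8), t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (secs >= 6 * 3600 && secs <= 18 * 3600)
            print
    }'
}

# Usage with the sample lines from the question; only lines 2 and 3 remain.
printf '%s\n' \
    '2012-09-04 03:50:41,658 [MainLogger: ] EXCEPTION AppLog - some exception 1' \
    '2012-09-04 10:01:32,902 [MainLogger: ] EXCEPTION AppLog - some exception 2' \
    '2012-09-04 15:39:51,901 [MainLogger: ] EXCEPTION AppLog - some exception 3' \
    '2012-09-04 18:12:51,901 [MainLogger: ] EXCEPTION AppLog - some exception 4' |
    between_6_and_18
```

It can be appended to the original pipeline after the grep for 'exception'. Comparing numbers rather than matching hour digits avoids false hits on digits elsewhere in the message.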

AWK output formatting and syntax

I've been trying to sort out output using AWK, and have been pretty successful going through some of the material on Stack Overflow until I hit the last part of the command below.
-bash-3.2$ find /home/username/www/devdir -mindepth 2 -maxdepth 2 -type d -printf "%TY %Tm %Tb %Td,%TH:%TM,%p,\n" | grep "^$r" | grep Aug | sort -r | awk -F '/' '{print $1,$6","$7}' | awk -F " " '$1, { for (i=3; i<=NF; i++) printf("%s ", $i); printf("\n"); }' | head -10
awk: $1, { for (i=3; i<=NF; i++) printf("%s ", $i); printf("\n"); }
awk: ^ syntax error
The output looks like the below:
2010 08 Aug 28,11:51, Directory Tom,005,
2010 08 Aug 28,11:50, Directory Smith,004,
2010 08 Aug 28,11:46, Directory Jon,003,
I want it to look like:
2010 Aug 28,11:51, Directory Tom,005,
2010 Aug 28,11:50, Directory Smith,004,
2010 Aug 28,11:46, Directory Jon,003,
I would like to cut the "08" out of it without losing the sorting done earlier. This will change to 09 next month and 10 the following. I believe I can use sed to solve this, but I am not an expert with it. Can someone shed some light on what I should do to overcome this obstacle?
I've referenced this question to get an idea of what I needed to do: Sorting output with awk, and formatting it
What do you want to accomplish exactly with this part?
awk -F " " '$1, { }'
I mean the $1, ...
Regarding the update:
sed 's/^\([0-9]\+\) 08 \(.\+\)$/\1 \2/'
should cut this out.
Or more generic:
sed 's/^\([0-9]\+\) [0-9][0-9] \(.\+\)$/\1 \2/'
You can combine your greps into one and all your awks, seds and head all into one awk script. Since you're grepping only one month and one year, you don't need to include the %Tm in the find and you're effectively only sorting by the day of month and time. I'm assuming $r is the year.
Approximately:
find /home/username/www/devdir -mindepth 2 -maxdepth 2 -type d -printf "%TY %Tb %Td,%TH:%TM,%p,\n" | grep "^$r.*Aug" | sort -r | awk -F '/' 'NR<=10{print $1,$6","$7}'
By the way, here's how you would do what you set out to do:
printf("%s ", $1); for (i=3; i<=NF; i++) printf("%s ", $i); printf("\n")
You would print the first field explicitly rather than include it by reference. However, using my suggestions above you shouldn't need to do this.

Resources