AWK output formatting and syntax - linux

I've been trying to sort out output using AWK, and have been pretty successful going through some of the material on Stack Overflow until I hit the last part of the command below.
-bash-3.2$ find /home/username/www/devdir -mindepth 2 -maxdepth 2 -type d -printf "%TY %Tm %Tb %Td,%TH:%TM,%p,\n" | grep "^$r" | grep Aug | sort -r | awk -F '/' '{print $1,$6","$7}' | awk -F " " '$1, { for (i=3; i<=NF; i++) printf("%s ", $i); printf("\n"); }' | head -10
awk: $1, { for (i=3; i<=NF; i++) printf("%s ", $i); printf("\n"); }
awk: ^ syntax error
The output looks like the below:
2010 08 Aug 28,11:51, Directory Tom,005,
2010 08 Aug 28,11:50, Directory Smith,004,
2010 08 Aug 28,11:46, Directory Jon,003,
I want it to look like:
2010 Aug 28,11:51, Directory Tom,005,
2010 Aug 28,11:50, Directory Smith,004,
2010 Aug 28,11:46, Directory Jon,003,
I would like to cut the "08" out of it without losing the sorting done earlier. This will change to 09 next month and 10 the following. I believe I can use sed to solve this; however, I am not an expert with it. Can someone shed some light on what I should do to overcome this obstacle?
I've referenced this question to get an idea of what I needed to do: Sorting output with awk, and formatting it

What do you want to accomplish exactly with this part?
awk -F " " '$1, { }'
I mean the $1, ...
Regarding the update:
sed 's/^\([0-9]\+\) 08 \(.\+\)$/\1 \2/'
should cut this out.
Or more generic:
sed 's/^\([0-9]\+\) [0-9][0-9] \(.\+\)$/\1 \2/'

You can combine your greps into one and all your awks, seds and head all into one awk script. Since you're grepping only one month and one year, you don't need to include the %Tm in the find and you're effectively only sorting by the day of month and time. I'm assuming $r is the year.
Approximately:
find /home/username/www/devdir -mindepth 2 -maxdepth 2 -type d -printf "%TY %Tb %Td,%TH:%TM,%p,\n" | grep "^$r.*Aug" | sort -r | awk -F '/' 'NR<=10{print $1,$6","$7}'
By the way, here's how you would do what you set out to do:
printf("%s ", $1); for (i=3; i<=NF; i++) printf("%s ", $i); printf("\n")
You would print the first field explicitly rather than include it by reference. However, using my suggestions above you shouldn't need to do this.
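For reference, here is a minimal sketch of the "print field 1, then fields 3 onward" idea as a reusable filter. The function name drop_second_field is made up for illustration, and the sample line is taken from the question's output. Note that it rejoins fields with single spaces, so any runs of whitespace in the input are collapsed.

```shell
# Hypothetical helper: print field 1, then fields 3..NF, separated by single spaces.
drop_second_field() {
    awk '{
        printf "%s", $1                             # first field, no trailing space
        for (i = 3; i <= NF; i++) printf " %s", $i  # skip field 2, keep the rest
        printf "\n"
    }'
}

# Sample line from the question
printf '%s\n' '2010 08 Aug 28,11:51, Directory Tom,005,' | drop_second_field
```

Because the month number is dropped only after sort has run, the earlier ordering is not affected.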


Use unix command inside awk commandline to process fields

Is there a way to use a unix command inside an awk one-liner to do something and output the result on STDOUT?
For example:
ls -lrt|awk '$8 !~ /:/ {system(date -d \"$6" "$7" "$8\" +"%Y%m%d")"|\t"$0}'
You are parsing ls, which can cause several problems.
When you are trying to get your filenames ordered by last modification with yyyymmdd in front of them, you can look at
# not correct for some filenames
stat --format "%.10y %n" * | tr -d '-' | sort
The solution fails for filenames with -. One way to solve that is using
# Still not ok
stat --format "%.10y %n" * | sed -r 's/^(....)-(..)-(..)/\1\2\3/' | sort
This will fail for filenames with newlines.
touch -d "2019-09-01 12:00:00" "two
lines.txt"
shows some of the problems you can also see with ls.
How you should solve this depends on your exact requirements.
Example
find . -maxdepth 1 ! -name "[.]*" -printf '%TY%Tm%Td ' -print0 |
sed 's#[.]/##g'| tr "\n\0" "/\n" | sort
Explanation:
-maxdepth 1 Only look in the current directory
! -name "[.]*" Ignore hidden files
-printf '%TY%Tm%Td ' YYYYMMDD and a space
-print0 Terminate each result with NUL instead of \n
sed 's#[.]/##g' Remove the path ./
tr "\n\0" "/\n" Replace newlines in a filename with / and NULs with newlines
After the sort you might want to tr '/' '\n' to restore the newlines inside filenames.
If you want to have the output of the command, you can make use of getline:
kent$ awk 'BEGIN{"date"|getline;print}'
Fri 15 Nov 2019 10:51:10 AM CET
You can also assign the output to an awk variable:
kent$ awk 'BEGIN{"date"|getline v;print v}'
Fri 15 Nov 2019 10:50:20 AM CET
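The getline form can also be applied per record, which is closer to what the question attempts with system(). The following is only a sketch: the function name prefix_yyyymmdd is hypothetical, it assumes a date command that accepts -d (as GNU coreutils does), and it assumes the first field is an ISO date containing no shell metacharacters, since the field is pasted into a command line.

```shell
# Hypothetical helper: prefix each line with its first field (an ISO date) as YYYYMMDD.
prefix_yyyymmdd() {
    awk '{
        cmd = "date -d \"" $1 "\" +%Y%m%d"          # build the command for this record
        if ((cmd | getline stamp) > 0) print stamp "|" $0
        close(cmd)                                  # close the pipe before the next record
    }'
}

printf '%s\n' '2019-11-15 input.txt' | prefix_yyyymmdd
```

Unlike system(), which writes the command's output directly and returns only the exit status, getline captures the output in an awk variable so it can be combined with the record.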
You are trying to format the date output from ls.
The find command has extensive control over date and time output through its -printf action.
For example:
$ ls -l
-rw-rw-r--. 1 cfrm cfrm 41 Nov 15 09:12 input.txt
-rw-rw-r--. 1 cfrm cfrm 67 Nov 15 09:13 script.awk
$ find . -printf "fileName=%f \t formatedDate-UTC=[%a] \t formatedDate-custom=[%AY-%Am-%Ad]\n"
fileName=. formatedDate-UTC=[Fri Nov 15 09:43:32.0222415982 2019] formatedDate-custom=[2019-11-15]
fileName=input.txt formatedDate-UTC=[Fri Nov 15 09:12:33.0117279463 2019] formatedDate-custom=[2019-11-15]
fileName=script.awk formatedDate-UTC=[Fri Nov 15 09:13:38.0743189896 2019] formatedDate-custom=[2019-11-15]
For sorting by timestamp, we can mark the sorting to start on the timestamp marker ([ in the following example)
$ find . -printf "%f timestamp=[%AY%Am%Ad:%AT]\n" |sort -t [
22114 timestamp=[20190511:10:32:22.6453184660]
5530 timestamp=[20190506:01:03:01.2225343480]
5764 timestamp=[20190506:01:03:34.7107944450]
.font-unix timestamp=[20191115:13:27:01.8699219890]
hsperfdata_artemis timestamp=[20191115:13:27:01.8699219890]
hsperfdata_cfrm timestamp=[20191115:13:27:01.8709219730]
hsperfdata_elasticsearch timestamp=[20191115:13:27:01.8699219890]
.ICE-unix timestamp=[20191115:13:27:01.8699219890]
input.txt timestamp=[20191115:09:12:33.1172794630]
junk timestamp=[20191115:09:43:32.2224159820]
script.awk timestamp=[20191115:09:13:38.7431898960]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-chronyd.service-AoZvZM timestamp=[20190516:05:09:51.1884573210]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-vgauthd.service-f2m9rt timestamp=[20190516:05:09:51.1884573210]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-vmtoolsd.service-0CJ32C timestamp=[20190516:05:09:51.1884573210]
.Test-unix timestamp=[20191115:13:27:01.8699219890]
. timestamp=[20191115:13:26:56.8770048750]
.X11-unix timestamp=[20191115:13:27:01.8699219890]
.XIM-unix timestamp=[20191115:13:27:01.8699219890]
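One caveat: sort's -t option only sets the field separator. For sort to actually compare on the text after the [ marker, a key (-k) is needed as well. A small sketch, where the function name sort_by_bracket_stamp and the sample lines are made up for illustration:

```shell
# Hypothetical helper: sort lines on everything after the first "[".
sort_by_bracket_stamp() {
    # -t '[' makes "[" the field separator; -k 2 sorts from field 2 to end of line.
    # LC_ALL=C gives plain byte-order comparison, which suits fixed-width digit stamps.
    LC_ALL=C sort -t '[' -k 2
}

printf '%s\n' \
    'b.txt timestamp=[20191115:09:12:33]' \
    'a.txt timestamp=[20190506:01:03:01]' \
    'c.txt timestamp=[20190516:05:09:51]' | sort_by_bracket_stamp
```

This works because the YYYYMMDD:HH:MM:SS layout sorts chronologically when compared as text.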

UNIX how to parse list of files in a directory more than once

I am working on a list of files in a directory. I have already used awk to snip out specific fields. Next I want to further cut-down the files.
My commands are
ls /archive/gtx_rec_full | grep '2019-05-1' | awk '{print $5,$6,$7,$8}' | more
which returns a list like
9636502 2019-05-10 00:40 /archive/gtx_rec_full/GTX_20190608_1967_40431_236965.dat.gz
15915297 2019-05-10 01:39 /archive/gtx_rec_full/GTX_20190608_1967_40432_382768.dat.gz
10672671 2019-05-10 01:39 /archive/gtx_rec_full/GTX_20190608_1967_40433_261926.dat.gz
17362746 2019-05-10 02:41 /archive/gtx_rec_full/GTX_20190608_1967_40434_418702.dat.gz
13355381 2019-05-10 03:40 /archive/gtx_rec_full/GTX_20190608_1967_40435_323201.dat.gz
I want to keep the file sizes and timestamps, and then snip out the unique file IDs, like 40431, 40432, 40433, etc. So, my new result set would look like:
9636502 05/10/2019 00:40 /archive/gtx_rec_full/40431
15915297 05/10/2019 01:39 /archive/gtx_rec_full/40432
10672671 05/10/2019 01:39 /archive/gtx_rec_full/40433
17362746 05/10/2019 02:41 /archive/gtx_rec_full/40434
13355381 05/10/2019 03:40 /archive/gtx_rec_full/40435
It is not clear to me how to do this. Can anyone offer some suggestions?
Thank you!
I am working on Red Hat Enterprise Linux Server, 7.5
As I understand from your comment, you wanted
| sed -r 's/GTX.*_(.*)_[^_]*/\1/;s/\// /g'
(or, when you get confused by the slashes)
| sed -r 's#GTX.*_(.*)_[^_]*#\1#;s#/# #g'
When the solution you found is different, please post that one and accept your own answer.
Then everybody sees that the question is "finished".
try this:
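#!/bin/bash
# Print size, mm/dd/yyyy date, hh:mm time and directory/ID for each GTX file.
# %T* is the modification time (what ls -l shows); records are NUL-terminated.
while IFS= read -r -d '' line; do
    meta=${line%% /*}       # size, date and time: everything before the path
    path=${line#"$meta "}   # the full path
    id=$(awk -F_ '{ print $4 }' <<< "${path##*/}")   # 4th _-separated part of the file name
    echo "$meta ${path%/*}/$id"
done < <(find /archive/gtx_rec_full -type f -name 'GTX_*' -printf '%s %Tm/%Td/%TY %TH:%TM %p\0')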
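If the four-column listing from the question is already available, the reshaping can also be done in a single awk pass. This is a sketch under the assumption that every line has exactly the layout shown in the question (size, YYYY-MM-DD date, time, path, with the ID as the fourth underscore-separated part of the file name); reformat_gtx_listing is a made-up name.

```shell
# Hypothetical helper: turn "size YYYY-MM-DD hh:mm /dir/GTX_a_b_ID_c.dat.gz"
# into "size MM/DD/YYYY hh:mm /dir/ID".
reformat_gtx_listing() {
    awk '{
        split($2, d, "-")              # d[1]=year, d[2]=month, d[3]=day
        n = split($4, p, "/")          # p[n] is the file name
        split(p[n], f, "_")            # f[4] is the file ID
        dir = $4
        sub("/[^/]*$", "", dir)        # strip the file name, keep the directory
        printf "%s %s/%s/%s %s %s/%s\n", $1, d[2], d[3], d[1], $3, dir, f[4]
    }'
}

printf '%s\n' '9636502 2019-05-10 00:40 /archive/gtx_rec_full/GTX_20190608_1967_40431_236965.dat.gz' |
    reformat_gtx_listing
```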

Linux Shell - String manipulation then calculating age of file in minutes

I am writing a script that calculates the age of the oldest file in a directory. The first commands run are:
OLDFILE=`ls -lt $DIR | grep "^-" | tail -1 `
echo $OLDFILE
The output contains a lot more than just the filename. eg
-rwxrwxr-- 1 abc abc 334 May 10 2011 ABCD_xyz20110510113817046.abc.bak
Q1/. How do I obtain the output after the last space of the above line? This would give me the filename. I realise some sort of string manipulation is required but am new to shell scripting.
Q2/. How do I obtain the age of this file in minutes?
To obtain just the oldest file's name,
ls -lt | awk '/^-/{file=$NF}END{print file}'
However, this is not robust if you have files with spaces in their names, etc. Generally, you should try to avoid parsing the output from ls.
With stat you can obtain a file's modification time in machine-readable format, expressed as seconds since Jan 1, 1970; with date +%s you can obtain the current time in the same format. Subtract and divide by 60. (More Awk skills would come in handy for the arithmetic.)
Finally, for an alternate solution, look at the options for find; in particular, its printf format strings allow you to extract a file's timestamp. The following will directly get you the modification time (in seconds since the epoch) and inode number of the oldest file:
find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1
Using the inode number avoids the issues of funny file names; once you have a single inode, converting that to a file name is a snap:
find . -maxdepth 1 -inum "$number"
Tying the two together, you might want something like this:
# set -- Replace $@ with output from command
set -- $(find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1)
# now $1 is the timestamp and $2 is the inode
oldest_filename=$(find . -maxdepth 1 -inum "$2")
age_in_minutes=$(date +%s | awk -v d="$1" '{ print ($1 - d) / 60 }')
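A variant of the same idea keeps the file name instead of the inode by using NUL-terminated records. This is a sketch, not a drop-in replacement: oldest_file_age is a hypothetical name, and it assumes GNU find, sort and head (for -printf, -z) and file names without embedded newlines, since the final step hands the record to awk as a line.

```shell
# Hypothetical helper: print "<age in whole minutes> <path>" for the oldest
# regular file directly inside directory $1.
oldest_file_age() {
    find "$1" -maxdepth 1 -type f -printf '%T@ %p\0' |
        sort -z -n | head -z -n 1 | tr -d '\0' |
        awk -v now="$(date +%s)" '{
            mtime = $1                 # seconds since the epoch, possibly fractional
            sub(/^[^ ]+ /, "")         # drop the timestamp; spaces in the path survive
            printf "%d %s\n", (now - mtime) / 60, $0
        }'
}

# Example call (output depends on the directory contents):
# oldest_file_age "$DIR"
```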
An awk solution that gives the file's age in minutes (your ls output does not contain the time of day, so 00:00:00 is assumed). As tripleee pointed out, ls output is inherently risky to parse.
[[bash_prompt$]]$ echo $l; echo "##############";echo $l | awk -f test.sh ; echo "#############"; cat test.sh
-rwxrwxr-- 1 abc abc 334 May 20 2013 ABCD_xyz20110510113817046.abc.bak
##############
File ABCD_xyz20110510113817046.abc.bak is 2074.67 min old
#############
BEGIN{
m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
for(o=1;o<=m;o++){
date[d[o]]=sprintf("%02d",o)
}
}
{
month=date[$6];
old_time=mktime($8" "month" "$7" "00" "00" "00);
curr_time=systime();
print "File " $NF " is " (curr_time-old_time)/60 " min old";
}
For Q2 a bash one-liner could be:
let mtime_minutes=\(`date +%s`-`stat -c%Y "$file_to_inspect"`\)/60
You probably want ls -t. Including the '-l' option explicitly asks for ls to give you all that other info that you actually want to get rid of.
However what you probably REALLY want is find, first:
NEWEST=$(find . -maxdepth 1 -type f | cut -c3- |xargs ls -t| head -1)
this will get you the plain filename of the newest item, no other processing needed.
Directories will be correctly excluded. (thanks to tripleee for pointing out that's what you were aiming for.)
For the second question, you can use stat and bc:
TIME_MINUTES=$(echo "($(date +%s) - $(stat --format %Y "$NEWEST")) / 60" | bc -l)
remove the '-l' option from bc if you only want a whole number of minutes.

Separating Awk input in Unix

I am trying to write an Awk program that takes two dates separated by / so 3/22/2013 for example and breaks them into the three separate numbers so that I could work with the 3 the 22 and the 2013 separately.
I would like the program to be called like
awk -f program_file 2/23/2013 4/15/2013
so far I have:
BEGIN {
d1 = ARGV[1]
d2 = ARGV[2]
}
This will accept both dates, but I am not sure how to break them up. Additionally, the above program must be called with nawk; plain awk says it cannot open 2/23/2013.
Thanks in advance.
You cannot do it that way, because awk treats the two date strings as input filenames. That's why you got that error message.
If the two dates are stored in shell variables, you could do:
awk -vd1="$d1" -vd2="$d2" 'BEGIN{split(d1,one,"/");split(d2,two,"/");...}{...}'
The ... part is your logic. In the line above, the split parts are stored in the arrays one and two. For example, to print just the elements of one:
kent$ d1=2/23/2013
kent$ d2=4/15/2013
kent$ awk -vd1="$d1" -vd2="$d2" 'BEGIN{split(d1,one,"/");split(d2,two,"/"); for(x in one)print one[x]}'
2
23
2013
Or, as others suggested, you could use awk's FS, but you have to do it this way:
kent$ echo $d1|awk -F/ '{print $1,$2,$3}'
2 23 2013
If you pass the two vars in one shot, the -F/ won't work unless the two dates are on different lines.
hope it helps
How about this?
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $1}'
2
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $2}'
23
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $3}'
2013
You could decide to use / as a field separator, and pass -F / to GNU awk (or to nawk)
If you're on a machine with nawk and awk, there's a chance you're on Solaris and using /bin/awk or /usr/bin/awk, both of which are old, broken awk which must never be used. Use /usr/xpg4/bin/awk on Solaris instead.
Anyway, to your question:
$ cat program_file
BEGIN {
d1 = ARGV[1]
d2 = ARGV[2]
split(d1,array,/\//)
print array[1]
print array[2]
print array[3]
exit
}
$ awk -f program_file 2/23/2013 4/15/2013
2
23
2013
There may be better approaches though. Post some more info about what you're trying to do if you'd like help.
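To round off the -v approach, here is a small sketch that splits both dates and prints the parts in a fixed order. Indexing the array by number avoids relying on the unspecified iteration order of for (x in array). The function name split_dates is made up for illustration.

```shell
# Hypothetical helper: split two m/d/yyyy dates and print "month day year" for each.
split_dates() {
    awk -v d1="$1" -v d2="$2" 'BEGIN {
        split(d1, one, "/")            # one[1]=month, one[2]=day, one[3]=year
        split(d2, two, "/")
        printf "%d %d %d\n", one[1], one[2], one[3]
        printf "%d %d %d\n", two[1], two[2], two[3]
    }'
}

split_dates 2/23/2013 4/15/2013
```

Since the program consists only of a BEGIN block, awk never tries to read input, so the dates are not mistaken for file names.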

How to print third column to last column?

I'm trying to remove the first two columns (which I'm not interested in) from a DbgView log file. I can't seem to find an example that prints from column 3 onwards until the end of the line. Note that each line has a variable number of columns.
...or a simpler solution: cut -f 3- INPUTFILE just add the correct delimiter (-d) and you got the same effect.
awk '{for(i=3;i<=NF;++i)print $i}'
awk '{ print substr($0, index($0,$3)) }'
solution found here:
http://www.linuxquestions.org/questions/linux-newbie-8/awk-print-field-to-end-and-character-count-179078/
Jonathan Feinberg's answer prints each field on a separate line. You could use printf to rebuild the record for output on the same line, but you can also just move the fields a jump to the left.
awk '{for (i=1; i<=NF-2; i++) $i = $(i+2); NF-=2; print}' logfile
awk '{$1=$2=$3=""}1' file
NB: this method will leave "blanks" in 1,2,3 fields but not a problem if you just want to look at output.
If you want to print the columns from the 3rd onward on the same line, you can use:
awk '{for(i=3; i<=NF; ++i) printf "%s ", $i; print ""}'
For example:
Mar 09:39 20180301_123131.jpg
Mar 13:28 20180301_124304.jpg
Mar 13:35 20180301_124358.jpg
Feb 09:45 Cisco_WebEx_Add-On.dmg
Feb 12:49 Docker.dmg
Feb 09:04 Grammarly.dmg
Feb 09:20 Payslip 10459 %2828-02-2018%29.pdf
It will print:
20180301_123131.jpg
20180301_124304.jpg
20180301_124358.jpg
Cisco_WebEx_Add-On.dmg
Docker.dmg
Grammarly.dmg
Payslip 10459 %2828-02-2018%29.pdf
As we can see, the payslip even with space, shows in the correct line.
What about following line:
awk '{$1=$2=$3=""; print}' file
Based on @ghostdog74's suggestion. Mine should behave better when you filter lines, e.g.:
awk '/^exim4-config/ {$1=""; print }' file
awk -v m="\x0a" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
This chops what is before the given field nr., N, and prints all the rest of the line, including field nr. N and maintaining the original spacing (it does not reformat). It doesn't matter if the string of the field also appears somewhere else in the line, which is the problem with daisaa's answer.
Define a function:
fromField () {
awk -v m="\x0a" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}
And use it like this:
$ echo " bat bi iru lau bost " | fromField 3
iru lau bost
$ echo " bat bi iru lau bost " | fromField 2
bi iru lau bost
Output maintains everything, including trailing spaces
Works well for files where '\n' is the record separator, so you don't have that newline char inside the lines. If you want to use it with other record separators then use:
awk -v m="\x01" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
for example. Works well with almost all files as long as they don't use hexadecimal char nr. 1 inside the lines.
awk '{a=match($0, $3); print substr($0,a)}'
First you find the position of the start of the third column.
With substr you will print the whole line ($0) starting at the position(in this case a) to the end of the line.
The following awk command prints fields 6 through the last field of each line, followed by a newline character:
awk '{for( i=6; i<=NF; i++ ){printf( "%s ", $i )}; printf( "\n"); }'
Find below an example that lists the content of the /usr/bin directory and then holds the last 3 lines and then prints the last 4 columns of each line using awk:
$ ls -ltr /usr/bin/ | tail -3
-rwxr-xr-x 1 root root 14736 Jan 14 2014 bcomps
-rwxr-xr-x 1 root root 10480 Jan 14 2014 acyclic
-rwxr-xr-x 1 root root 35868448 May 22 2014 skype
$ ls -ltr /usr/bin/ | tail -3 | awk '{for( i=6; i<=NF; i++ ){printf( "%s ", $i )}; printf( "\n"); }'
Jan 14 2014 bcomps
Jan 14 2014 acyclic
May 22 2014 skype
Perl solution:
perl -lane 'splice @F,0,2; print join " ",@F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the perl code
splice @F,0,2 cleanly removes columns 0 and 1 from the @F array
join " ",@F joins the elements of the @F array, using a space in-between each element
If your input file is comma-delimited, rather than space-delimited, use -F, -lane
Python solution:
python -c "import sys;[sys.stdout.write(' '.join(line.split()[2:]) + '\n') for line in sys.stdin]" < file
Well, you can easily accomplish the same effect using a regular expression. Assuming the separator is a space, it would look like:
awk '{ sub(/[^ ]+ +[^ ]+ +/, ""); print }'
awk '{print ""}{for(i=3;i<=NF;++i)printf $i" "}'
A bit late here, but none of the above seemed to work. Try this, using printf, inserts spaces between each. I chose to not have newline at the end.
awk '{for(i=3;i<=NF;++i) printf("%s ", $i) }'
awk '{for (i=4; i<=NF; i++)printf("%s ", $i); printf("\n");}'
prints each record from the 4th field to the last field, in the same order they were in the original file
In Bash you can use the following syntax with positional parameters:
while read -a cols; do echo ${cols[@]:2}; done < file.txt
Learn more: Handling positional parameters at Bash Hackers Wiki
If it's only about ignoring the first two fields and you don't want a space when masking those fields (like some of the answers above do):
awk '{gsub($1" "$2" ",""); print;}' file
awk '{$1=$2=""}1' FILENAME | sed 's/^\s\+//'
First two columns are cleared, sed removes leading spaces.
In AWK columns are called fields, hence NF is the key
all rows:
awk -F '<column separator>' '{print $(NF-2)}' <filename>
first row only:
awk -F '<column separator>' 'NR<=1{print $(NF-2)}' <filename>
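Pulling the loop-based answers together, the following sketch prints from a given field to the end of the line without a trailing space. The name print_from is hypothetical. It passes each field through as data rather than as a printf format, so names containing % are safe, but it rejoins fields with single spaces, so original runs of whitespace are not preserved.

```shell
# Hypothetical helper: print fields n..NF of each line, joined by single spaces.
print_from() {
    awk -v n="$1" '{
        out = ""
        for (i = n; i <= NF; i++)
            out = out (i > n ? " " : "") $i   # separator only between fields
        print out
    }'
}

printf '%s\n' 'Feb 09:20 Payslip 10459 %2828-02-2018%29.pdf' | print_from 3
```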
