Change date format in first column using awk/sed - linux

I have a shell script that is run automatically each morning and appends that day's results to a text file. The file should have today's date in the first column, followed by the results separated by commas. I use the command date +%x to get the date in the required format (dd/mm/yy). However, on one computer date +%x returns mm/dd/yyyy (any idea why this is the case?). I then sort the data in the file in date order.
Here is a snippet of such a text file:
29/11/12,9654.80,194.32,2.01,7.19,-7.89,7.65,7.57,3.98,9625.27,160.10,1.66,4.90,-4.79,6.83,4.84,3.54
03/12/12,5184.22,104.63,2.02,6.88,-6.49,7.87,6.67,4.10,5169.52,93.81,1.81,5.29,-5.45,7.87,5.37,4.10
04/12/12,5183.65,103.18,1.99,6.49,-6.80,8.40,6.66,4.38,5166.04,95.44,1.85,6.04,-6.49,8.40,6.28,4.38
11/07/2012,5183.65,102.15,1.97,6.78,-6.36,8.92,6.56,4.67,5169.48,96.67,1.87,5.56,-6.10,8.92,5.85,4.67
07/11/2012,5179.39,115.57,2.23,7.64,-6.61,8.83,7.09,4.62,5150.17,103.52,2.01,7.01,-6.08,8.16,6.51,4.26
11/26/2012,5182.66,103.30,1.99,7.07,-5.76,7.38,6.37,3.83,5162.81,95.47,1.85,6.34,-5.40,6.65,5.84,3.44
11/30/2012,5180.82,95.19,1.84,6.51,-5.40,7.91,5.92,4.12,5163.98,91.82,1.78,5.58,-5.07,7.05,5.31,3.65
Is it possible to change the date format for the latter four lines to the correct date format using awk or sed? I only wish to change the date format for those in the form mm/dd/yyyy to dd/mm/yy.

It looks like you're using two different flavors (versions) of date. To check which versions you've got, I think GNU date accepts the --version flag whereas other versions, like BSD/OSX will not accept this flag.
Since you may be using completely different systems, it's probably safest to avoid date completely and use perl to print the current date:
perl -MPOSIX -e 'print POSIX::strftime("%d/%m/%y", localtime) . "\n"'
If you are sure you have GNU awk on both machines, you could use it like this:
awk 'BEGIN { print strftime("%d/%m/%y") }'
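As for why the two machines disagree: %x expands to the current locale's date representation, so its output follows LC_TIME/LANG as much as the date implementation. A minimal sketch, assuming a POSIX date is available on both machines, that spells the format out instead of relying on %x (today_ddmmyy is a made-up name):

```shell
# today_ddmmyy: hypothetical helper that prints today's date as dd/mm/yy.
# An explicit format string does not depend on the locale the way %x does.
today_ddmmyy() {
    date +%d/%m/%y
}
```

In the script this would replace the date +%x call, e.g. today=$(today_ddmmyy).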
To fix the file you've got, here's my take using GNU awk:
awk '{ print gensub(/^(..\/)(..\/)..(..,)/, "\\2\\1\\3", "g"); next }1' file
Or using sed:
sed 's/^\(..\/\)\(..\/\)..\(..,\)/\2\1\3/' file
Results:
29/11/12,9654.80,194.32,2.01,7.19,-7.89,7.65,7.57,3.98,9625.27,160.10,1.66,4.90,-4.79,6.83,4.84,3.54
03/12/12,5184.22,104.63,2.02,6.88,-6.49,7.87,6.67,4.10,5169.52,93.81,1.81,5.29,-5.45,7.87,5.37,4.10
04/12/12,5183.65,103.18,1.99,6.49,-6.80,8.40,6.66,4.38,5166.04,95.44,1.85,6.04,-6.49,8.40,6.28,4.38
07/11/12,5183.65,102.15,1.97,6.78,-6.36,8.92,6.56,4.67,5169.48,96.67,1.87,5.56,-6.10,8.92,5.85,4.67
11/07/12,5179.39,115.57,2.23,7.64,-6.61,8.83,7.09,4.62,5150.17,103.52,2.01,7.01,-6.08,8.16,6.51,4.26
26/11/12,5182.66,103.30,1.99,7.07,-5.76,7.38,6.37,3.83,5162.81,95.47,1.85,6.34,-5.40,6.65,5.84,3.44
30/11/12,5180.82,95.19,1.84,6.51,-5.40,7.91,5.92,4.12,5163.98,91.82,1.78,5.58,-5.07,7.05,5.31,3.65

This should work: sed -re 's/^([0-9][0-9])\/([0-9][0-9])\/[0-9][0-9]([0-9][0-9])(.*)$/\2\/\1\/\3\4/'
It can be made smaller but I made it so it would be more obvious what it does (4 groups, just switching month/day and removing first two chars of the year).
Tip: If you don't want to cat the file, you could make the changes in place with sed -i. But be careful: if you put in a faulty expression you might end up breaking your source file.
NOTE: This assumes that IF the year is specified with 4 digits, the month/day is reversed.
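A minimal sketch of that assumption in portable awk, for systems without sed -r or gawk's gensub(): only lines whose first field has a four-digit year are rewritten, and everything else passes through unchanged. The name fix_dates is made up for illustration.

```shell
# fix_dates: hypothetical helper; reads CSV on stdin, writes fixed CSV to stdout.
fix_dates() {
    awk -F, -v OFS=, '
    # Only touch dates of the form mm/dd/yyyy (four-digit year).
    $1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]$/ {
        split($1, d, "/")                       # d[1]=mm, d[2]=dd, d[3]=yyyy
        $1 = d[2] "/" d[1] "/" substr(d[3], 3)  # rebuild as dd/mm/yy
    }
    { print }'
}
```

Usage would be something like fix_dates < file > file.fixed, checking the result before replacing the original.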

The command below will do it.
Note: No matter how many lines are present in the file, this will change only the last 4 lines.
tail -r your_file | awk -F, -v OFS=, 'NR<5{split($1,a,"/");$1=a[2]"/"a[1]"/"substr(a[3],3)}1' | tail -r
I also figured out a way to do it in a single awk statement, without pipes, that does not need the tail command:
awk -F, -v OFS=, 'BEGIN{cmd="wc -l your_file";while (cmd|getline tmp);split(tmp,x)}x[1]-NR<4{split($1,a,"/");$1=a[2]"/"a[1]"/"substr(a[3],3)}1' your_file

Another solution:
awk -F/ 'NR<4;NR>3{a=$1;$1=$2;$2=a; print $1"/"$2"/" substr($3,3,2) substr($3,5)}' file

Using awk:
$ awk -F/ 'NR>3{x=$1;$1=$2;$2=x}1' OFS="/" file
By using the / as the delimiter, all you need to do is swap the 1st and 2nd fields which is done here using a temporary variable.

Related

Using Sed or Awk to divide a file into two based on whether a line contains a numeric value

I have used sed and awk for little while now, but I am having a challenge with the below problem. I am asking for an experienced sed/awk guru to help.
I have a file where some lines have numbers and some lines do not, like:
afjjdjfj.uihuihi
trfg.rtyhd
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
rtygfd.ijhniuh
etc.
I would like to have exactly two files out of this one, where every line is represented in one of the two files (none are deleted).
One containing all lines with any numbers 0-9 on them so given above file result would be:
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
and another file containing the rest of the lines that do not have any numbers 0-9 on them, so given the above, file it would be:
afjjdjfj.uihuihi
trfg.rtyhd
rtygfd.ijhniuh
I've tried different strategies in both sed and awk and nothing is giving me exactly what I need.
What would be the best sed or awk one liner to solve this problem?
Thank you for your time,
Tom
Easily with Awk:
awk '/[0-9]/{print > "file1"; next} {print > "file2"}' inputfile
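A sketch of the same single-pass idea with the output names passed in as awk variables, so nothing is hard-coded in the program. The helper name split_by_digits and its argument order are assumptions made for illustration.

```shell
# split_by_digits INPUT WITH_DIGITS WITHOUT_DIGITS  (hypothetical helper)
split_by_digits() {
    : > "$2"; : > "$3"   # create/truncate both outputs so they exist even if empty
    awk -v yes="$2" -v no="$3" '
        /[0-9]/ { print > yes; next }   # line contains at least one digit
                { print > no }          # every other line
    ' "$1"
}
```

Every input line is written to exactly one of the two files, because next skips the second rule for lines that matched the first.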
With single GNU sed command:
sed -ne '/[0-9]/w with_digits.txt' -e '//!w no_digits.txt' input
Results:
> cat no_digits.txt
afjjdjfj.uihuihi
trfg.rtyhd
rtygfd.ijhniuh
> cat with_digits.txt
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
From the sed manual: w filename writes the pattern space to filename.
If you don't mind running twice over the input, you can use just grep:
grep '[0-9]' input > with_digits
grep -v '[0-9]' input > without_digits
perl -MFile::Slurp -lpe '/\d/ ? append_file("digits.txt",$_) : append_file("no_digits.txt",$_)' input.txt

Cut number from string

I want to cut several numbers from a .txt file to add them up later. Here is an excerpt from the .txt file:
anonuser pts/25 127.0.0.1 Mon Nov 16 17:24 - crash (10+23:07)
I want to get the "10" before the "+" and I only want the number, nothing else. This number should be written to another .txt file. I used this code, but it only works if the number has one digit:
awk ' /^'anonuser' / {split($NF,k,"[(+0:)][0-9][0-9]");print k[1]} ' log2.txt > log3.txt
With GNU grep:
grep -Po '\(\K[^+]*' file > new_file
Output to new_file:
10
See: PCRE Regex Spotlight: \K
What if you use the match() function in GNU awk? (The three-argument form that fills an array is a gawk extension.)
$ awk '/^anonuser/ && match($NF,/^\(([0-9]*)/,a) {print a[1]}' file
10
How does this work?
/^anonuser/ && match() {print a[1]} if the line starts with anonuser and the pattern is found, print it.
match($NF,/^\(([0-9]*)/,a) in the last field ((10+23:07)), look for the string ( + digits and capture these in the array a[].
Note also that this approach allows you to store the values you capture, so that you can then sum them as you indicate in the question.
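A sketch of that summing step in portable awk, assuming the last field always looks like (N+hh:mm) when a number is present; lines without a "+" part are skipped. The helper name sum_days is made up for illustration.

```shell
# sum_days USER FILE: hypothetical helper; adds up the number before "+"
# in the last field, e.g. the 10 in "(10+23:07)".
sum_days() {
    awk -v user="$1" '
        $1 == user && $NF ~ /^\([0-9]+\+/ {
            n = $NF
            sub(/^\(/, "", n)    # drop the opening parenthesis
            sub(/\+.*/, "", n)   # drop everything from the plus sign on
            total += n
        }
        END { print total + 0 }  # print 0 when nothing matched
    ' "$2"
}
```

For example, sum_days anonuser log2.txt > log3.txt would write the total to log3.txt.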
The following uses the same approach as the OP, and has a couple of advantages, e.g. it does not require anything special, and it is quite robust (with respect to assumptions about the input) and maintainable:
awk '/^anonuser/ {split($NF,k,"+"); gsub(/[^0-9]/,"",k[1]); print k[1]}'
For anything more complex use awk, but for a simple task sed is easy enough:
sed -rn '/^anonuser/s/.*\(([0-9]+)\+.*/\1/p'
This finds the number between a ( and a + sign and prints only that.
I am not sure about the format in the file.
Can you use simple cut commands?
cut -d"(" -f2 log2.txt| cut -d"+" -f1 > log3.txt

View logs from yesterday/today

I'm looking for a way to view multiple log files for the last two days in a single pass.
At first, I tried with GREP:
#!/bin/bash
yesterday=$(date --date="yesterday" +"%Y-%m-%d")
today=$(date +"%Y-%m-%d")
grep "$yesterday\|$today" *.log | less
This is nice but it doesn't output lines in between matches (lines that don't have the date in them, like error stack traces - which is what I'm really interested in)...
So I found this:
#!/bin/bash
yesterday=$(date --date="yesterday" +"%Y-%m-%d")
sed -ne "/$yesterday/,\$p" *.log | less
For each file, it outputs everything from the first match to the end of the file. That's just perfect... except for one thing... When reading it, I don't know which file's content I'm looking at. I would like to see the file name at the start of each line, just like with a grep.
How can I prefix the file name to each line in my sed command?
Would there be a nicer / better way to do this?
Thanks ;-)
Not a sed solution but as you asked for a nicer / better way to do this... If you have GNU awk somewhere,
awk -v day="$yesterday" 'BEGINFILE {run=0} $0 ~ day {run=1} run == 1 {print FILENAME, $0}' *.log
should do it.
Explanation:
GNU awk processes all files in sequence. The GNU awk variable day is initialized to the shell expression "$yesterday". GNU awk executes the BEGINFILE rule before processing each new file; this rule clears the run variable. Whenever a line ($0) matches the GNU awk variable day ("$yesterday"), the run variable is set. While the run variable is set, the name of the current file (FILENAME) is printed, followed by the current line ($0).
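If GNU awk is not available, the same logic can be sketched in POSIX awk by resetting the flag on the first line of each file (FNR == 1) instead of using BEGINFILE. The helper name logs_since and the name: prefix format (chosen to resemble grep output) are assumptions.

```shell
# logs_since PATTERN FILE...: hypothetical helper; for each file, prints every
# line from the first one matching PATTERN to the end, prefixed with "name:".
logs_since() {
    pattern=$1
    shift
    awk -v day="$pattern" '
        FNR == 1 { run = 0 }               # first line of each file: reset the flag
        $0 ~ day { run = 1 }               # first match switches printing on
        run      { print FILENAME ":" $0 }
    ' "$@"
}
```

It would be called as logs_since "$yesterday" *.log | less.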

renaming files using loop in unix

I have a situation here.
I have a lot of files like the ones below on Linux:
SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaac
SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaag
I want to remove the $line and put a counter running from 0001 to 6000 in its place for my 6000 such files.
I also want to remove the trailing 3 characters from each file name once this is done.
After the fix, the file names should look like this:
SIPTV_FIPTV_ID0000001_T20141003195717_C0000001000_FWD148_IPV_001.DAT
SIPTV_FIPTV_ID0000002_T20141003195717_C0000001000_FWD148_IPV_001.DAT
Please help.
With some assumptions, I think this should do it:
1. list of the files is in a file named input.txt, one file per line
2. the code is running in the directory the files are in
3. bash is available
awk '{i++;printf "mv \x27"$0"\x27 ";printf "\x27"substr($0,1,16);printf "%05d", i;print substr($0,22,47)"\x27"}' input.txt | bash
From the command prompt, give the following command
% printf '%s\n' *.DAT??? | awk '{
old=$0;
sub("\\$line",sprintf("%4.4d",++n));
sub("...$","");
print "mv", "\047" old "\047", "\047" $0 "\047"}'
%
and check the output. If it looks OK, run it again and pipe it to the shell:
% printf '%s\n' *.DAT??? | awk '{
old=$0;
sub("\\$line",sprintf("%4.4d",++n));
sub("...$","");
print "mv", "\047" old "\047", "\047" $0 "\047"}' | sh
%
A commentary: printf '%s\n' *.DAT??? gives awk the list of all the filenames that you want to modify, one per line; you may want something more elaborate if the example names you gave aren't representative of the whole range. Regarding the awk script itself, I used sprintf to generate a string with the correct number of zeroes for the replacement of $line. The idiom "\\$..." with two backslashes to quote the dollar sign is required by gawk and does no harm in mawk. "\047" is a single quote; quoting the names keeps the shell from expanding $line in the old name. As a last remark, in similar cases I prefer to make at least a dry run before passing the commands to the shell.
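The dry-run idea can also be sketched without awk, using only shell parameter expansion. This assumes the files are in the current directory and that a five-digit counter is wanted, which is what makes the result match the ID0000001 names shown in the question; rename_plan is a made-up name.

```shell
# rename_plan: hypothetical helper; prints one mv command per matching file
# in the current directory without renaming anything.
rename_plan() {
    n=0
    for old in *'$line'*.DAT???; do
        [ -e "$old" ] || continue        # the glob matched nothing
        n=$((n + 1))
        prefix=${old%%'$line'*}          # text before the literal $line
        rest=${old#*'$line'}             # text after it
        rest=${rest%???}                 # drop the trailing 3 characters
        printf "mv -- '%s' '%s%05d%s'\n" "$old" "$prefix" "$n" "$rest"
    done
}
```

Run rename_plan on its own to inspect the commands, then rename_plan | sh once the output looks right.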

How to do something like grep -B to select only one line?

Everything is in the title. Basically, let's say I have this pattern:
some text lalala
another line
much funny wow grep
I grep funny and I want my output to be "lalala"
Thank you
One possible answer is to use either ed or ex to do this (it is trivial in them):
ed - yourfile <<< 'g/funny/.-2p'
(Or replace ed with ex. You might have red, the restricted editor, too; it can't modify files.) This looks for the pattern /funny/ globally, and whenever it is found, prints the line 2 before the matching line (that's the .-2p part). Or, if you want the most recent line containing 'lalala' before the line matching 'funny':
ed - yourfile <<< 'g/funny/?lalala?p'
The only problem is if you're trying to process standard input rather than a file; then you have to save the standard input to a file and process that file, which spoils the concurrency.
You can't do negative offsets in sed (though GNU sed allows you to do positive offsets, so you could use sed -n '/lalala/,+2p' file to get the 'lalala' to 'funny' lines (which isn't quite what you want) based on finding 'lalala', but you cannot find the 'lalala' lines based on finding 'funny'). Standard sed does not allow offsets at all.
If you need to print just the IP address found on a line 8 lines before the pattern-matching line, you need a slightly more involved ed script, but it is still doable:
ed - yourfile <<< 'g/funny/.-8s/.* //p'
This uses the same basic mechanism to find the right line, then runs a substitute command to remove everything up to the last space on the line and print the modified version. Since there isn't a w command, it doesn't actually modify the file.
Since grep -B always prints the full number of lines before the match, you'll have to pipe the output into something like grep or Awk to pick out the part you want.
grep -B 2 "funny" file|awk 'NR==1{print $NF; exit}'
You could also just use Awk.
awk -v s="funny" '/[[:space:]]lalala$/{n=NR+2; o=$NF}NR==n && $0~s{print o}' file
For the specific example of an IP address 8 lines before the match as mentioned in your comment:
awk -v s="funny" '
/[[:space:]][0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/ {
n=NR+8
ip=$NF
}
NR==n && $0~s {
print ip
}' file
These Awk solutions first find the output field you might want, then print the output only if the word you want exists in the nth following line.
Here's an attempt at a slightly generalized Awk solution. It maintains a circular queue of the last q lines and prints the line at the head of the queue when it sees a match.
#!/bin/sh
: ${q=8}
e=$1
shift
awk -v q="$q" -v e="$e" '{ m[(NR%q)+1] = $0 }
$0 ~ e { print m[((NR+1)%q)+1] }' "${@--}"
Adapting to a different default (I set it to 8) or proper option handling (currently, you'd run it like q=3 ./qgrep regex file) as well as remembering (and hence printing) the entire line should be easy enough.
(I also didn't bother to make it work correctly if you see a match in the first q-1 lines. It will just print an empty line then.)
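A variant sketch that takes the offset as an argument and stays silent when a match occurs too early in the input to have a line that far back. It keeps a ring buffer of n+1 lines; the name line_before and its argument order are assumptions. With no file arguments it reads standard input.

```shell
# line_before N REGEX [FILE...]: hypothetical helper; for each line matching
# REGEX, prints the line that came N lines earlier (nothing if there is none).
line_before() {
    n=$1 re=$2
    shift 2
    awk -v n="$n" -v re="$re" '
        # On a match, the line N back sits in slot (NR - n) mod (n + 1).
        $0 ~ re && NR > n { print buf[(NR - n) % (n + 1)] }
        # Store the current line, overwriting the oldest slot.
        { buf[NR % (n + 1)] = $0 }
    ' "$@"
}
```

For the example in the question, line_before 2 funny file | awk '{print $NF}' would print lalala.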
