Using awk to add one month to a date [duplicate] - linux

This question already has answers here:
Increment date with AWK for few days and months
(3 answers)
Closed 4 years ago.
I have a file 1.txt like below:
"15227962157615645"$"2018-12-04 06:55:43"
"15227525816721347"$"2018-12-03 18:48:11"
I can get the date using:
awk -F\" '{print $4}' 1.txt
Additionally, I need to add one month to the date. For the above input my desired output would be:
2019-01-04 06:55:43
2019-01-03 18:48:11
I tried to use
awk -F\" '{print date -d "$4 +1 month"+%Y-%m-%d}' 1.txt
but it does not work.

Awk has limited support for date calculation, so here is a bash-only solution relying on the date command:
IFS='$';
while read n t; do
printf '%s$"%s"\n' "$n" "$(date -d "${t//\"/} +1 month" '+%F %T')"
done <file
The input field separator is set to $ to get the timestamp into the $t variable.
The double quotes around the date field are removed using bash parameter expansion ${t//\"/}.
This allows passing the +1 month keyword to date.
Then printf prints the result back in the original format of the input file.
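If you only need the shifted timestamps themselves (as in the desired output above), a minimal sketch of the same approach, assuming GNU date, would be:
# n holds the id field; it is read only to split the line and is not used
while IFS='$' read -r n t; do
    date -d "${t//\"/} +1 month" '+%F %T'
done < file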

Related

Find the next nearest value (bash)

Let's say I have some holiday data (holiday_master.csv) in columns, something like
...
20200320 Vernal Equinox Day
20200429 Showa Day
20200503 Constitution Day
20200505 Green Day
20200720 Children's Day
20200811 Sea Day
...
Given this set of data, I want to find the next closest holiday from the given date.
For example, if the input is 20200420, 20200429 Showa Day is expected.
If the input is 20200620, 20200720 Children's Day is expected.
I have a feeling that awk has the necessary functionality to do this, but any solution that works in a bash script is welcome.
Would you please try the following bash script:
#!/bin/bash
input="20200428" # or assign to whatever
< "holiday_master.csv" sort -nk1,1 | # sort the csv file by date and pass to the while loop
while read -r date desc; do
    if (( date >= input )); then # if the date is greater than or equal to the input
        echo "$date" "$desc"     # then print the line
        break                    # and exit the loop
    fi
done
Assuming no two holidays will ever fall on the same date...
DATE=<some desired input date>
awk "{print (\$1 - $DATE"' "\t" $0)}' calendar.txt | sed '/^-/d' | sort | head -n 1 | awk '{$1=""; print $0}'
Explanation
awk "{print (\$1 - $DATE"' "\t" $0)}' calendar.txt: Prepend a column to the input.txt file describing the difference between the desired input date and the date column
sed '/^-/d': Remove all lines beginning with -. Dates with negative differences have already passed.
sort: Sort the remaining entries from least to greatest (based upon the difference column)
head -n 1: Select only the first row (The lowest difference)
awk '{$1=""; print $0}': Print all but the first column
Prettier script version
#!/bin/bash
# Usage: script <Date> <Calendar file>
DATE=${1:--1}
CAL=${2:-calendar.txt}
# Arg check and execute
if [ ! -f "$CAL" ]
then
echo "File not found: $CAL"
echo "Usage: script <Date> <Calendar file>"
elif [ "$DATE" -le 0 ]
then
echo "Invalid date: $DATE"
echo "Usage: script <Date> <Calendar file>"
elif [ $(echo "$DATE" | grep -Ewo -- '-?[0-9]+' | wc -l) -eq 0 ]
then
echo "Invalid date: $DATE"
echo "Usage: script <Date> <Calendar file>"
else
awk '{print ($1 - '"$DATE"' "\t" $0)}' "$CAL" | sed '/^-/d' | sort -n | head -n 1 | awk '{$1=""; print $0}'
fi
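For example, a hypothetical invocation would be ./nearest_holiday.sh 20200420 holiday_master.csv (the script name is illustrative); it prints the closest upcoming holiday line, minus the prepended difference column.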
As you use the YYYYMMDD format, we can compare dates just like numbers (note: the year occupies the most significant digits, then the month, then the day). So you can use AWK in the following way. Let:
20200320 Vernal Equinox Day
20200429 Showa Day
20200503 Constitution Day
20200505 Green Day
20200720 Children's Day
20200811 Sea Day
be a file named holidays.txt, then:
awk 'BEGIN{inputdate=20200420}{if($1>inputdate){print $2;exit}}' holidays.txt
output:
Showa
Explanation: in BEGIN I set inputdate to 20200420; then, when a line with a greater number in the 1st column is found, I print the content of the 2nd column and exit (otherwise later dates would be printed too). Note that AWK automatically parses numbers when asked to do a comparison (> in this case), so you do not have to care about conversion yourself - you could even do inputdate="20200420" and it would work too.
This solution assumes that all dates in the file are already sorted.
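If you prefer to pass the date in from the shell rather than hard-coding it in BEGIN, an equivalent sketch using awk's -v option would be:
awk -v inputdate="20200420" '$1 > inputdate {print $2; exit}' holidays.txt
As in the original, this prints only the first word of the description; use print (without $2) to get the whole line.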
Using awk and assuming the source data is comma separated:
awk -F, -v dayte="20200420" '
BEGIN {
    "date -d "dayte" +%s" | getline dat1
}
{
    "date -d "$1" +%s" | getline dat2;
    dat3=dat2-dat1;
    if (dat3 > 0)
    {
        hols[dat3]=$2
    }
}
END {
    asorti(hols,hols1,"#ind_num_asc");
    print hols[hols1[1]]
}
' holiday_master.csv
One-liner:
awk -F, -v dayte="20200420" 'BEGIN { "date -d "dayte" +%s" | getline dat1 } { "date -d "$1" +%s" | getline dat2;dat3=dat2-dat1;if (dat3 > 0 ) { hols[dat3]=$2 } } END { asorti(hols,hols1,"#ind_num_asc");print hols[hols1[1]] }' holiday_master.csv
Set the field separator to , and set a variable dayte to the date we wish to check. In the BEGIN block, we pass the dayte variable through to the date command via an awk pipe/getline and read the epoch result into the variable dat1. We do the same with the first column of the master file ($1) and read this into dat2. We take the difference between the epoch dates and store the result in dat3. Only if the result is positive (in the future) do we use dat3 as an index in the hols array, with the holiday description as the value. In the END block, we sort the indexes of hols into a new hols1 array based on ascending numeric indexes. We then take the first index of the new hols1 array to obtain the holiday that is closest to the dayte variable.
Assuming the holiday list file is sorted by date as you have given, the below would work
$ awk -v dt="20200420" ' (dt-$1)<0 { print;exit } ' holiday.txt
20200429 Showa Day
$ awk -v dt="20200620" ' (dt-$1)<0 { print;exit } ' holiday.txt
20200720 Children's Day
$
If the holiday file is not sorted, then you can use below
$ shuf holiday.txt | awk -v dt="20200420" ' dt-$1<0 { a[(dt-$1)*-1]=$0 } END { asort(a); print a[1] } '
20200429 Showa Day
$ shuf holiday.txt | awk -v dt="20200620" ' dt-$1<0 { a[(dt-$1)*-1]=$0 } END { asort(a); print a[1] } '
20200720 Children's Day

How to get Date Month values in linux date command as integers to work on

I want to get the date and month as integers.
For example,
if the current date, as per the output of the command date +%m-%d-%y, is this:
09-11-17
Then I am storing
cur_day=`date +%d`
cur_month=`date +%m`
Then $cur_day will give me 11 and $cur_month will give me 09.
I want to do some operations on the month value 09. For example, I want to print all the numbers up to 09,
like this: 01,02,03,04,05,06,07,08,09
In the same way I want to display all the numbers up to cur_day,
like 01,02,03,04,05,06,07,08,09,10,11
Please tell me how I can do it.
Thanks in Advance.
For months:
$ printf ',%02d' $(seq 1 $(date +%m)) | sed 's/,/like this /; s/$/\n/'
like this 01,02,03,04,05,06,07,08,09
For days:
$ printf ',%02d' $(seq 1 $(date +%d)) | sed 's/,/like /; s/$/\n/'
like 01,02,03,04,05,06,07,08,09,10,11
printf will print according to a format. In this case, the format ,%02d formats the numbers with commas and leading zeros.
The sed command puts the string you want at the beginning of the line and adds a newline at the end.
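For comparison, a shorter sketch assuming GNU coreutils seq (its -f format and -s separator options) produces the same number lists without the leading text:
seq -f '%02g' -s, 1 "$(date +%m)"   # 01,02,03,04,05,06,07,08,09
seq -f '%02g' -s, 1 "$(date +%d)"   # 01,02,... up to the current day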

How to read a .csv file with shell command? [duplicate]

This question already has answers here:
Bash: Parse CSV with quotes, commas and newlines
(10 answers)
Closed 2 years ago.
I have a .csv file which I need to extract values from. It is formatted like this:
First line of the file (no data)
1;Jack;Daniels;Madrid;484016;
2;Alice;Morgan;London;564127;
etc...
I would need a shell command that reads all lines of a specific column within a .csv, compares each with a string and returns a value whenever it finds a matching line. In Java I would define it as something like:
boolean findMatchInCSV(String valueToFind, int colNumber, String colSeparator)
The separator between columns may indeed change; that is why I would like something quite generic if possible :)
But I need it as a shell command - is that possible?
Thanks
I would need a shell command that read all lines
cat 1.csv # read the file
of a specific column within a .csv
cat 1.csv | cut -f5 -d';' # keep only the field #5 (use ';' as separator)
compare each with a string
# keep only the row where the value of the field is exactly 'foo'
cat 1.csv | cut -f5 -d';' | grep '^foo$'
return a value whenever it finds a matching line.
This last request is unclear.
The code above displays the searched string (foo) once for each row where it is the value of column #5 (counting from 1). The columns are separated by ;.
Unfortunately, it doesn't handle quoted strings. If the value in any field contains the separator (;), the CSV format allows enclosing the field value in double quotes (") to prevent the separator character from being interpreted as a separator (forcing its literal value).
I assume you're looking for something like
FILE=data.csv
VALUE="$1"
COLNUM=$2      # note: bash arrays are zero-based, so the first column is index 0
IFS="$3"
while read -r -a myArray
do
    if [ "${myArray[$COLNUM]}" == "$VALUE" ]; then
        exit 0
    fi
done < <(tail -n +2 "$FILE")   # skip the header line
exit 1
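Called as, say, ./find_in_csv.sh "London" 3 ";" (the script name is hypothetical; 3 is the zero-based index of the city column), it exits with status 0 when a match is found and 1 otherwise.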
grep "my_string" file |awk -F ";" '{print $5}'
or
awk -F ";" '/my_string/ {print $5}' file
For 2nd column:
awk -F ";" '$2 ~ /my_string/ {print $5}' file
For exact matching:
awk -F ";" '$2 == "my_string" {print $5}' file

Remove duplicates, but keeping only the last occurrence in linux file [duplicate]

This question already has answers here:
Eliminate partially duplicate lines by column and keep the last one
(4 answers)
Closed 6 years ago.
INPUT FILE :
5,,OR1,1000,Nawras,OR,20160105T05:30:17+0400,20181231T23:59:59+0400,,user,,aaa8016058f008ddceae6329f0c5d551,50293277591,,,30001,C
5,,OR1,1000,Nawras,OR,20160105T05:30:17+0400,20181231T23:59:59+0400,20160217T01:45:18+0400,,user,aaa8016058f008ddceae6329f0c5d551,50293277591,,,30001,H
5,,OR2,2000,Nawras,OR,20160216T06:30:18+0400,20191231T23:59:59+0400,,user,,f660818af5625b3be61fe12489689601,50328589469,,,30002,C
5,,OR2,2000,Nawras,OR,20160216T06:30:18+0400,20191231T23:59:59+0400,20160216T06:30:18+0400,,user,f660818af5625b3be61fe12489689601,50328589469,,,30002,H
5,,OR1,1000,Nawras,OR,20150328T03:00:13+0400,20171230T23:59:59+0400,,user,,22bf18b024e1d4f42ac79943062cf576,50212935879,,,10001,C
5,,OR1,1000,Nawras,OR,20150328T03:00:13+0400,20171230T23:59:59+0400,20150328T03:00:13+0400,,user,22bf18b024e1d4f42ac79943062cf576,50212935879,,,10001,H
0,,OR5,5000,Nawras,OR,20160421T02:45:16+0400,20191231T23:59:59+0400,,user,,c7c501ac92d85a04bb26c575929e9317,50329769192,,,11001,C
0,,OR5,5000,Nawras,OR,20160421T02:45:16+0400,20191231T23:59:59+0400,20160421T02:45:16+0400,,user,c7c501ac92d85a04bb26c575929e9317,50329769192,,,11001,H
0,,OR1,1000,Nawras,OR,20160330T02:00:14+0400,20181231T23:59:59+0400,,user,,d4ea749306717ec5201d264fc8044201,50285524333,,,11001,C
DESIRED OUTPUT :
5,,OR1,1000,UY,OR,20160105T05:30:17+0400,20181231T23:59:59+0400,20160217T01:45:18+0400,,user,aaa8016058f008ddceae6329f0c5d551,50293277591,,,30001,H
5,,OR2,2000,UY,OR,20160216T06:30:18+0400,20191231T23:59:59+0400,20160216T06:30:18+0400,,user,f660818af5625b3be61fe12489689601,50328589469,,,30002,H
5,,OR1,1000,UY,OR,20150328T03:00:13+0400,20171230T23:59:59+0400,20150328T03:00:13+0400,,user,22bf18b024e1d4f42ac79943062cf576,50212935879,,,10001,H
0,,OR5,5000,UY,OR,20160421T02:45:16+0400,20191231T23:59:59+0400,20160421T02:45:16+0400,,user,c7c501ac92d85a04bb26c575929e9317,50329769192,,,11001,H
0,,OR1,1000,UY,OR,20160330T02:00:14+0400,20181231T23:59:59+0400,,user,,d4ea749306717ec5201d264fc8044201,50285524333,,,11001,C*
CODE USED :
for i in `cat file | awk -F, '{print $13}' | sort | uniq`
do
grep $i file | tail -1 >> TESTINGGGGGGG_SV
done
This took a lot of time, as the file has 300 million records, of which 65 million are unique in the 13th column.
So I need an approach that, for each value in the 13th column, outputs only its last occurrence in the file.
awk to the rescue!
awk -F, 'p!=$13 && p0 {print p0} {p=$13; p0=$0} END{print p0}' file
It expects input sorted (grouped) on the 13th column.
Please post the timing if you can successfully run the script.
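If the file is not already grouped on the 13th column, a possible pre-sort (GNU sort; the -s flag keeps the original order within equal keys, so the last occurrence per value is preserved) would be:
sort -s -t, -k13,13 file | awk -F, 'p!=$13 && p0 {print p0} {p=$13; p0=$0} END{print p0}'
Bear in mind that sorting a 300-million-line file has a significant cost of its own.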
If sorting is not possible, another option is
tac file | awk -F, '!a[$13]++' | tac
reverse the file, take the first entry for each $13 value, and reverse the results back.
Here's a solution that should work:
awk -F, '{rows[$13]=$0} END {for (i in rows) print rows[i]}' file
Explanation:
rows is an associative array indexed by field 13 ($13); the element of the array indexed by $13 gets overwritten every time there's a duplicate of field 13, and its value is the whole line ($0).
But this is inefficient in terms of memory because of the space needed to save the array.
An improvement to the above solution that's still not using sorting is to just save the line numbers in the associative array:
awk -F, '{rows[$13]=NR} END {for(i in rows) print rows[i]}' file | while read lN; do sed "${lN}q;d" file; done
Explanation:
rows as before but the values are the line numbers and not the whole lines
awk -F, '{rows[$13]=NR}END {for(i in rows) print rows[i]}' file outputs a list of row numbers containing the sought lines
sed "${lN}q;d" fetches line number lN from file

Filter Linux logs based on Unix Timestamp

I have a log on a linux server. The entries are in the format:
[timestamp (seconds since jan 1 1970)] log data entry
I need a bash script that will take the name of the log file and output only yesterday's entries (from 12:00 to 23:59:59 of the previous day) to a new file.
I've seen various scripts that filter logs based on dates, but all of them so far deal with date stamps in more human-readable formats, or are not dynamic - they rely on hard-coded dates. I want a script that is going to run in a daily cron job, so it has to be aware of what the current date is each time it runs.
Thanks.
Update: This is what I have so far. It just never seems to do the evaluation of the date. It prints 00 for the date so everything gets through.
head -5 logfile.log | awk '{
if($1 >= (date -d "today 00:00:00" +"%s"))
print $1 (date -d "today 00:00:00" +"%s");
}'
I'm confused though, even if the date evaluates properly, $1 is going to have numbers inside square brackets, and my date will be just numbers. Will it do the comparison properly if the strings are formatted differently like that? I haven't figured out how to shove the date number returned by date into a string with brackets yet.
Well, maybe using the dates as Dale said, but with a little trick to strip the "[" and "]" before comparing the dates. Something like this:
YESTERDAY=$(date -d "yesterday 00:00:00" +"%s")
TODAY=$(date -d "today 00:00:00" +"%s")
# Combine the processing in awk
awk -v MIN=${YESTERDAY} -v MAX=${TODAY} -F'[][]' '{ if ( $2 >= MIN && $2 <= MAX) print $0}' logfile.log
Combining tips and tricks from Glenn, Dale, and Davison:
awk -v today=$(date -d "today 00:00:00" +"%s") -v yesterday=$(date -d "yesterday 00:00:00" +"%s") -F'[\\[\\] ]' '{ if($2 >= yesterday && $2 < today) print }' logfile.log
Uses the shell's $() command substitution to feed variables to awk's -v argument parser
-F'[\\[\\] ]' sets the field separator to be either [, ], or a space
input data:
[1300000000 log1 data1 entry1]
[1444370000 log2 data2 entry2]
[1444374000 log3 data3 entry3]
[1444460399 log4 data4 entry4]
[1500000000 log5 data5 entry5]
output:
[1444370000 log2 data2 entry2]
You might try something like this:
YESTERDAY=$(date -d "yesterday 00:00:00" +"%s")
TODAY=$(date -d "today 00:00:00" +"%s")
cat your_log.log | \
awk -v MIN=${YESTERDAY} -v MAX=${TODAY} \
'{if($1 >= MIN && $1 < MAX) print}'
:)
Dale
