awk: format date string from YYYYMMDD to YYYY-MM-DD - linux

I have a CSV file which I parse using awk because I don't need all columns.
The problem is that one column holds a date in the format YYYYMMDD, but I need it as YYYY-MM-DD and I don't know how to achieve that.
I already tried with split($27, a) but it doesn't split it - so a[0] returns the whole string.

Use your awk output as input to date -d, e.g.
$ date -d 20140918 +'%Y-%m-%d'
2014-09-18

You could use substr (awk string positions start at 1, not 0):
printf "%s-%s-%s\n", substr($27,1,4), substr($27,5,2), substr($27,7,2)
Assuming that the 27th field was 20140318, this would produce 2014-03-18.
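For context, here is a minimal sketch of that substr approach inside a complete awk call. It assumes a comma-separated file; fmt_date is a hypothetical helper name, and the sample line keeps the date in field 3 rather than field 27 to stay short.

```shell
# fmt_date COL: rewrite field COL of comma-separated stdin from YYYYMMDD to YYYY-MM-DD.
# (Hypothetical helper; the column number is passed in so it also works for field 27.)
fmt_date() {
  awk -F, -v col="$1" 'BEGIN { OFS = "," } {
    # substr positions are 1-based: chars 1-4 = year, 5-6 = month, 7-8 = day
    $col = substr($col, 1, 4) "-" substr($col, 5, 2) "-" substr($col, 7, 2)
    print
  }'
}

printf 'a,b,20140318,c\n' | fmt_date 3
```

With a real file this would be called as fmt_date 27 < file.csv, adding whatever column selection you already do in the same awk program.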

Related

How to edit the lines in text file in Linux - format the date to YYYY-MM-DD and then grep the line by time period

Can anyone help me convert the dates in this text file from YYYYMMDD to YYYY-MM-DD using a bash script or the Linux command line? I am not sure how to start editing 23 million lines!
I have a text file in YYYYMMDD format:
3515034013|50008|20140601|20240730
and I want to turn it into a YYYY-MM-DD formatted text file (only the 3rd and 4th fields need to change, for 23 million lines):
3515034013|50008|2014-06-01|2024-07-30
I want to convert the file from YYYYMMDD to YYYY-MM-DD and then select specific lines from it by time period, which is the end goal.
The 3rd field is the start date and the 4th field is the end date. For example, I need:
(01). Lines whose end date (4th field) is before today, i.e. 2022-08-06 (all the old lines)
(02). Lines whose end date (4th field) falls within 2 years from now, i.e. between 2022-08-06 and 2024-08-06
Please note: there are more than 23 million lines to edit and analyze based on the date.
How should I approach this? Which method is the most time-efficient: awk, sed, or Bash line-by-line editing?
$ awk '
BEGIN { FS=OFS="|" }
{
for ( i=3; i<=4; i++ ) {
$i = substr($i,1,4) "-" substr($i,5,2) "-" substr($i,7)
}
print
}
' file
3515034013|50008|2014-06-01|2024-07-30
Here is a way to do it with sed. It has the same restrictions as steffen's answer: | is the field separator, and all dates have the same format, i.e. leading zeros in the month and day parts.
sed -E 's/^(.*[|])([0-9]{4})([0-9]{2})([0-9]{2})[|]([0-9]{4})([0-9]{2})([0-9]{2})$/\1\2-\3-\4|\5-\6-\7/g'
Here is what the regular expression does:
^(.*[|]) captures the first part of the string from linestart (^) to a | into \1, this captures the first two columns, because the remaining part of the re matches the remaining part of the line up until lineend!
([0-9]{4})([0-9]{2})([0-9]{2})[|] captures the first date field parts into \2 to \4, notice the [|]
([0-9]{4})([0-9]{2})([0-9]{2})$ does the same for the second date column anchored at lineend ($) and captures the parts into \5 to \7, notice the $
the replacement part \1\2-\3-\4|\5-\6-\7 inserts - at the different places
the capturing into \n happens because of the use of (...) parens in the regular expression.
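As a quick check of the expression explained above, it can be run against the sample line from the question. This is only a sketch: reformat_line is a hypothetical wrapper name, and -E assumes a sed that supports extended regular expressions (GNU or BSD).

```shell
# Wrap the sed expression so it can be fed any input on stdin.
reformat_line() {
  sed -E 's/^(.*[|])([0-9]{4})([0-9]{2})([0-9]{2})[|]([0-9]{4})([0-9]{2})([0-9]{2})$/\1\2-\3-\4|\5-\6-\7/'
}

# Sample line taken from the question
printf '3515034013|50008|20140601|20240730\n' | reformat_line
```

Lines whose last two fields are not both eight digits do not match and are passed through unchanged.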
Here's one way to change the format with awk:
awk '{$3=substr($3,1,4) "-" substr($3,5,2) "-" substr($3,7,2); $4=substr($4,1,4) "-" substr($4,5,2) "-" substr($4,7,2); print}' FS='|' OFS='|'
It should work given that
| is only used for field separation
all dates have the same format
You can pipe the transformed lines to a new file or change it in place. Of course you can do the same with sed or ed. I'd go for awk because you'd be able to extract your specific lines just in the same run to an extra file.
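To illustrate the point about extracting lines in the same run, here is a minimal sketch that reformats fields 3 and 4 and keeps only lines whose end date falls in a given window. filter_dates and its two arguments are hypothetical names. It relies on the fact that YYYY-MM-DD strings sort in date order, so a plain string comparison is enough.

```shell
# filter_dates LO HI: reformat fields 3 and 4, then print lines with LO <= end date < HI.
# LO and HI are YYYY-MM-DD strings (hypothetical interface, not from the question).
filter_dates() {
  awk -v lo="$1" -v hi="$2" 'BEGIN { FS = OFS = "|" }
  {
    for (i = 3; i <= 4; i++)
      $i = substr($i, 1, 4) "-" substr($i, 5, 2) "-" substr($i, 7, 2)
    # ISO dates compare correctly as strings
    if ($4 >= lo && $4 < hi) print
  }'
}

printf '1|a|20140601|20240730\n2|b|20140601|20200101\n' |
  filter_dates 2022-08-06 2024-08-06
```

For the "before today" case, something like filter_dates 0000-00-00 "$(date +%F)" < file would select the old lines, again in a single pass over the 23 million lines.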
This might work for you (GNU sed):
sed -E 's/^([^|]*\|[^|]*\|....)(..)(..\|....)(..)/\1-\2-\3-\4-/' file
Pattern match and insert - where desired.
Or if the file is only 4 columns:
sed -E 's/(..)(..\|....)(..)(..)$/-\1-\2-\3-\4/' file

Change date format from dd/mm/yyyy to yyyy-mm-dd in a file using shell scripting

I have a source file with 18 columns in which columns 10 , 11 and 15 are in the format dd/mm/yyyy and all these needs to be converted to yyyy-mm-dd and written to target file along with other columns.
I am aware of date formatting functions on Variables but do not know how to apply the same on few columns in a file.
I don’t have a machine available to test, but consider using awk with a little function since you are doing the same thing 3 times. It will look something like this:
awk '
function dodate(s,   a) { # "a" is a local array; note that "in" is a reserved word in awk
split(s, a, "/") # split the existing date into elements of array "a"
return a[3] "-" a[2] "-" a[1]
}
{ $10=dodate($10); $11=dodate($11); $15=dodate($15); print }' yourFile
Reference for awk functions, and split.
If the fields on each line are separated by commas, tell awk that with:
awk -F, ...
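Putting the function and the separator option together, a minimal sketch could look like the following. It assumes comma-separated input; convert_dates is a hypothetical name, and the sample uses columns 2 and 4 instead of 10, 11 and 15 to keep the line short. OFS is set as well, because awk rebuilds the record with OFS once a field is assigned.

```shell
# convert_dates: rewrite dd/mm/yyyy in columns 2 and 4 of comma-separated stdin.
convert_dates() {
  awk -F, '
  # s is the date string, a is a local scratch array
  function dodate(s,   a) { split(s, a, "/"); return a[3] "-" a[2] "-" a[1] }
  BEGIN { OFS = "," }   # keep commas in the output
  { $2 = dodate($2); $4 = dodate($4); print }'
}

printf 'x,29/09/2017,y,02/10/2017\n' | convert_dates
```

For the 18-column file from the question, the assignments would be to $10, $11 and $15 instead.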
Maybe you could use command awk to solve it.
You have 3 columns containing dates (columns 10, 11 and 15); here I assume a sample string whose field separator is | and whose 4th column contains the date:
aa|bb|cc|29/09/2017|dd|ee|ff
Use the string-manipulation function gensub (GNU awk only) to rearrange the date into YYYYMMDD, then pass it to the date command through getline to get the expected format.
The command is
echo 'aa|bb|cc|29/09/2017|dd|ee|ff' | awk -F\| 'BEGIN{OFS="|"}{$4=gensub(/([0-9]{1,2})\/([0-9]{1,2})\/([0-9]{4})/,"\\3\\2\\1","g",$4); cmd="date --date=\""$4"\" +\"%F\""; cmd | getline a; close(cmd); $4=a; print $0}'
output is
aa|bb|cc|2017-09-29|dd|ee|ff
Hope to help you.
If you have the dateutils package installed, you can use dateutils.dconv
cat file | dateutils.dconv -S -i "%d/%m/%Y"
-i specify input date format
-S sed mode, process only the matched string and copy the rest
Input File
aa|bb|cc|29/09/2017|dd|ee|ff|02/10/2017|gg
Output
aa|bb|cc|2017-09-29|dd|ee|ff|2017-10-02|gg
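I'd use the date command. GNU date -d reads slash-separated dates as mm/dd/yyyy, so dd/mm/yyyy input has to be rearranged before it is passed in:
while IFS=/ read -r d m y
do
date -d "$y-$m-$d" "+%Y-%m-%d"
done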

How to get Date Month values in linux date command as integers to work on

I want to get the day and month as integers.
For example,
if the current date, as printed by the command "date +%m-%d-%y", is this
09-11-17
Then I am storing
cur_day=`date +%d`
cur_month=`date +%m`
$cur_day will give me 11 and $cur_month will give me 09.
I want to do some operations on the month value 09. For example, I want to print all the numbers up to 09,
like this 01,02,03,04,05,06,07,08,09
Same way I want to display all the numbers up to cur_day
like 01,02,03,04,05,06,07,08,09,10,11
Please tell me how I can do it.
Thanks in advance.
For months:
$ printf ',%02d' $(seq 1 $(date +%m)) | sed 's/,/like this /; s/$/\n/'
like this 01,02,03,04,05,06,07,08,09
For days:
$ printf ',%02d' $(seq 1 $(date +%d)) | sed 's/,/like /; s/$/\n/'
like 01,02,03,04,05,06,07,08,09,10,11
printf will print according to a format. In this case, the format ,%02d formats the numbers with commas and leading zeros.
The sed command puts the string you want at the beginning of the line and adds a newline at the end.
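If the goal is to reuse the counting logic, a small function may be easier to work with than the one-liners. The sketch below is an assumption-laden variant: list_upto is a hypothetical name, and it strips one leading zero first, because shell arithmetic such as $((09 + 1)) fails (a leading 0 means octal, and 9 is not an octal digit).

```shell
# list_upto N: print 01,02,...,N with two-digit zero padding.
list_upto() {
  n=${1#0}   # "09" -> "9", "11" stays "11" (POSIX parameter expansion)
  # printf repeats the format for every number; paste joins the lines with commas
  printf '%02d\n' $(seq 1 "$n") | paste -sd, -
}

list_upto "$(date +%m)"   # months up to the current month
list_upto "$(date +%d)"   # days up to the current day
```

After the zero is stripped, $n can also be used directly in arithmetic, e.g. $((n + 1)).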

How to get logs of last hour in linux using awk command

I have a log file named source.log with timestamps in this format:
Fri, 09 Dec 2016 05:03:29 GMT 127.0.0.1
and I am using this script to get the entries from the last hour:
awk -vDate=`date -d'now-1 hour' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print Date, $0}' source.log > target.log
But this script outputs the same content as the source file.
Something is wrong in the time format matching, so it does not return only the last hour's records.
I know I'm late to help the OP, but maybe this answer can help anyone else in this situation.
First, it's necessary to compare the whole date and not only the time part, because a time-only comparison breaks for entries near midnight.
Note that awk can only compare strings and numbers. Some awk implementations (e.g. gawk) have a mktime() function that converts a string in one specific format ("YYYY MM DD HH MM SS") into a UNIX timestamp for datetime comparisons, but it does not accept arbitrary datetime formats, so we can't use it directly here.
The best way would be to change (if possible) the datetime format of the log entries to 'YYYYMMDDhhmmss' or ISO format. That way, comparing two datetimes is as simple as comparing strings or numbers.
But let's assume that we can't change log entries date format, so we'll need to convert ourselves inside awk:
awk -vDate="`date -d'now-1 hour' +'%Y%m%d%H%M%S'`" '
BEGIN{
for(i=0; i<12; i++)
MON[substr("JanFebMarAprMayJunJulAugSepOctNovDec", i*3+1, 3)] = sprintf("%02d", i+1);
}
toDate() > Date
function toDate(){
time = $5; gsub(/:/, "", time);
return $4 MON[$3] $2 time;
}' source.log
Explanation
-vDate=... sets the Date awk variable with the initial datetime (one hour ago).
The BEGIN section creates an array indexed by the month abbreviation (it's specific to English).
The toDate() function converts the line's fields into a string with the same format as the Date variable (YYYYMMDDhhmmss).
Finally when the condition toDate() > Date is true, awk prints the current line (log entry).
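To make the conversion easier to try out, here is a minimal sketch of the same idea with the cutoff passed in as an argument instead of computed inline. since is a hypothetical function name, and the month table is built with split rather than substr, which is a stylistic swap and not the answer's exact code.

```shell
# since CUTOFF: print log lines from stdin whose timestamp is later than CUTOFF.
# CUTOFF is a YYYYmmddHHMMSS string; log lines look like
#   Fri, 09 Dec 2016 05:03:29 GMT 127.0.0.1
since() {
  awk -v Date="$1" '
  BEGIN {
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
    for (i = 1; i <= n; i++) MON[m[i]] = sprintf("%02d", i)
  }
  {
    t = $5; gsub(/:/, "", t)      # 05:03:29 -> 050329
    key = $4 MON[$3] $2 t         # year, month number, day, time
    if (key > (Date "")) print    # force a string comparison
  }'
}

printf 'Fri, 09 Dec 2016 05:03:29 GMT 127.0.0.1\nFri, 09 Dec 2016 03:00:00 GMT 127.0.0.1\n' |
  since 20161209040000
```

With GNU date, the last hour would be since "$(date -u -d 'now-1 hour' +%Y%m%d%H%M%S)" < source.log. The -u matters here on the assumption that the log timestamps really are GMT, as the sample line suggests.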

Convert specific date format to Epoch in linux

I need to convert this date format to epoch : 03/Apr/2016 14:22:59
the command
date -d "03/Apr/2016 14:22:59" +"%s"
will return :
date: invalid date ‘03/Apr/2016 14:22:59’
Can anyone help me format it so that date -d recognizes it?
Thanks in advance.
Perl to the rescue:
perl -MTime::Piece -e 'print Time::Piece
->strptime("03/Apr/2016 14:22:59", "%d/%b/%Y %H:%M:%S")
->epoch'
Info page for date input formats can be shown with following command:
info date "Date input formats"
Unfortunately your date format is not supported by date. You can, however, convert your string into a format that is supported, for example like this:
date -d "$(echo '03/Apr/2016 14:22:59' | tr -s "/" "-")" +"%s"
Here is a short summary of the kinds of input strings the date command accepts:
Show date from epoch
date -d "@1513936964"
Words like today, tomorrow, month names (January), AM/PM, time zone names are supported
date -d "tomorrow 10:00:30PM"
date -d "03 April 2016 14:22:59"
Calendar formats
date -d "04/03/2016 14:22:59"
date -d "03-Apr-2016 14:22:59"
date -d "2016-04-03 14:22:59.00"
Timezones
date -d "PDT now"
Which day was Christmas last year?
date -d "$(date -d "2017-12-24") -1 year" +"%A"
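The tr-based conversion from the start of this answer can be wrapped into a small helper. This is a sketch under a few assumptions: to_epoch is a hypothetical name, GNU date is required for -d, and TZ=UTC is set only so the result does not depend on the machine's time zone (drop it to interpret the timestamp as local time).

```shell
# to_epoch "dd/Mon/yyyy HH:MM:SS": print the epoch seconds, reading the time as UTC.
to_epoch() {
  # 03/Apr/2016 -> 03-Apr-2016, a calendar format that GNU date accepts
  TZ=UTC date -d "$(printf '%s' "$1" | tr '/' '-')" +%s
}

to_epoch "03/Apr/2016 14:22:59"
```

The reverse direction uses the @ prefix shown in the summary, e.g. date -u -d "@$(to_epoch '03/Apr/2016 14:22:59')".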
Using Python:
python3 -c 'from time import mktime, strptime; print(int(mktime(strptime("03/Apr/2016 14:22:59", "%d/%b/%Y %H:%M:%S"))))'

Resources