Change date format from dd/mm/yyyy to yyyy-mm-dd in a file using shell scripting - linux

I have a source file with 18 columns, in which columns 10, 11 and 15 are in the format dd/mm/yyyy. All of these need to be converted to yyyy-mm-dd and written to a target file along with the other columns.
I am aware of date-formatting functions on variables but do not know how to apply them to a few columns in a file.

I don’t have a machine available to test, but consider using awk with a little function since you are doing the same thing 3 times. It will look something like this:
awk '
function dodate(d) {
    split(d, a, "/")    # split the existing date into elements of array "a"
    return a[3] "-" a[2] "-" a[1]
}
{ $10 = dodate($10); $11 = dodate($11); $15 = dodate($15); print }' yourFile
See the awk documentation for user-defined functions and split.
If the fields on each line are separated by commas, tell awk that with:
awk -F, ...
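For instance, with a hypothetical comma-separated 18-column file (the -F, and OFS settings here are assumptions, not part of the original answer):

echo 'a,b,c,d,e,f,g,h,i,01/02/2003,04/05/2006,l,m,n,07/08/2009,p,q,r' |
awk -F, -v OFS=, '
function dodate(d) { split(d, a, "/"); return a[3] "-" a[2] "-" a[1] }
{ $10 = dodate($10); $11 = dodate($11); $15 = dodate($15); print }'

should print a,b,c,d,e,f,g,h,i,2003-02-01,2006-05-04,l,m,n,2009-08-07,p,q,r.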

Maybe you could use the awk command to solve it.
You have 3 columns containing dates (columns 10, 11, 15). Here I assume a sample string whose field separator is |, where the column containing the date is the 4th:
aa|bb|cc|29/09/2017|dd|ee|ff
Use string-manipulation functions to extract the date, then pipe it through date via getline to reformat it to the expected syntax.
The command is:
echo 'aa|bb|cc|29/09/2017|dd|ee|ff' | awk -F\| 'BEGIN{OFS="|"}{$4=gensub(/([0-9]{1,2})\/([0-9]{1,2})\/([0-9]{4})/,"\\3\\2\\1","g",$4); "date --date=\""$4"\" +\"%F\"" | getline a; $4=a; print $0}'
The output is:
aa|bb|cc|2017-09-29|dd|ee|ff
Hope this helps.
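As an aside, if the day and month are always zero-padded, the external date call (and the getline) can be skipped entirely by inserting the dashes in the gensub replacement itself; a simpler variant of the same idea (gawk only, since gensub is a gawk extension):

echo 'aa|bb|cc|29/09/2017|dd|ee|ff' | awk -F\| 'BEGIN{OFS="|"}{$4=gensub(/([0-9]{2})\/([0-9]{2})\/([0-9]{4})/,"\\3-\\2-\\1","g",$4); print}'
aa|bb|cc|2017-09-29|dd|ee|ff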

If you have the dateutils package installed, you can use dateutils.dconv
cat file | dateutils.dconv -S -i "%d/%m/%Y"
-i specifies the input date format
-S enables sed mode: only the matched date strings are processed; the rest of each line is copied through unchanged
Input File
aa|bb|cc|29/09/2017|dd|ee|ff|02/10/2017|gg
Output
aa|bb|cc|2017-09-29|dd|ee|ff|2017-10-02|gg

I'd use the date command:
while read -r fmtDate
do
    date -d "$fmtDate" "+%Y-%m-%d"
done
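Note that this reads one date per line, and GNU date does not parse dd/mm/yyyy directly (it treats slashed dates as mm/dd/yyyy), so the fields have to be reordered first. A sketch for applying this to columns 10, 11 and 15 of a comma-separated file (bash and GNU date assumed; the file name is made up):

while IFS=, read -r -a f
do
    for i in 9 10 14                        # zero-based indices of columns 10, 11, 15
    do
        IFS=/ read -r d m y <<< "${f[$i]}"  # split dd/mm/yyyy
        f[$i]=$(date -d "$y-$m-$d" "+%Y-%m-%d")
    done
    (IFS=,; echo "${f[*]}")                 # re-join the fields with commas
done < yourFile

This spawns one date process per date, so it will be slow on large files compared to the awk answers above.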

Related

Linux Command to get fields from CSV files

In CSV files on a Linux server, I have thousands of rows in the CSV format below:
0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|
I need output from all the files in the format below (the 2nd field, i.e. 20221208195546466, and from the 5th field the value after Above as: and before the first |, i.e. 2 in the example above):
Output:
20221208195546466 , 2
Can anyone help me with a Linux command?
Edit: my attempt
I tried the following, but it only gives the 5th field's value. How do I add field 2 as well?
cat *.csv | cut -d, -f5 | cut -d'|' -f1 | cut -d':' -f2
Edit: sorted result
Now I am using this command (based on Dave Pritlove's answer): awk -F'[,|:]' '{print $2", "$6}' file.csv. However, I have one more query: if I have to sort the output based on $6 (the value 2 in your example), how can I do it? I want the result displayed in sorted order based on the 2nd output field.
For example:
20221208195546366, 20
20221208195546436, 16
20221208195546466, 5
2022120819536466, 2
GNU awk allows multiple field separators to be set, allowing you to delimit each record at ,, |, and : at the same time. Thus, the following will fish out the required fields from file.csv:
awk -F'[,|:]' '{print $2", "$6}' file.csv
Tested on the single record example:
echo "0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|" | awk -F'[,|:]' '{print $2", "$6}'
output:
20221208195546466, 2
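For the sort request added in the question's edit, the awk output can be piped through sort; a sketch, assuming the numeric descending order shown in the question's example:

awk -F'[,|:]' '{print $2", "$6}' file.csv | sort -t, -k2,2nr

Here -t, sets sort's field separator and -k2,2nr sorts on the second field, numerically, in reverse.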
Assumptions:
the starting string of the 5th comma-delimited field can vary from line to line (i.e., not known beforehand)
the item of interest in the 5th comma-delimited field occurs between the first : and the first |
Sample data:
$ cat test.csv
0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|
1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|morestuff:FON:0|yetmorestuff:ION:0|
One awk approach:
awk '
BEGIN { FS=OFS="," }     # define input/output field delimiter as ","
{
    split($5,a,"[:|]")   # split 5th field on dual delimiters ":" and "|", store results in array a[]
    print $2,a[2]        # print desired items to stdout
}
' test.csv
This generates:
20221208195546466,2
20230124123456789,7
You can use awk for this:
awk -F',' '{gsub(/Above as:/,""); gsub(/\|.*/, ""); print($2, $5)}'
You'll probably need to adapt the regexp a bit.
You might change : to , and | to ,, then extract the 2nd and 6th fields using cut in the following way. Let file.txt's content be:
0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|
then
tr ':|' ',,' < file.txt | cut --delimiter=',' --output-delimiter=' , ' --fields=2,6
gives output
20221208195546466 , 2
Explanation: tr translates, i.e. replaces : with , and | with ,. Then I inform cut that the input delimiter is ,, the output delimiter is , encased in spaces (as stipulated by your desired output), and that I want the 2nd and 6th columns (not the 5th, which is now Above as).
(tested using GNU coreutils 8.30)

How to edit lines in a text file in Linux - format the date to YYYY-MM-DD and then grep lines by time period

Can anyone help to format this text file (YYYYMMDD) as a date-formatted (YYYY-MM-DD) text file using a bash script or the Linux command line? I am not sure how to start editing 23 million lines!
I have a YYYYMMDD-format text file:
3515034013|50008|20140601|20240730
and I want to edit it into a YYYY-MM-DD-formatted text file (only the 3rd and 4th fields need to change, across 23 million lines):
3515034013|50008|2014-06-01|2024-07-30
I want to convert the text file from YYYYMMDD format to YYYY-MM-DD format, and then get specific lines from the file based on a time period, which is the end goal.
The end goal is to format the 3rd and 4th fields as YYYY-MM-DD and also to grep lines by date from the formatted file: the 3rd field is the start date and the 4th field is the end date. Let's say, for example, I need:
(01) the end date (4th field) before today, i.e. 2022-08-06 - all the old lines
(02) the end date (4th field) up to 2 years from now, i.e. lines between 2022-08-06 and 2024-08-06
Please note: there are more than 23 million lines to edit and analyze based on the date.
How should I approach this problem? Which method is more time-efficient: awk, sed, or Bash line-by-line editing?
$ awk '
BEGIN { FS=OFS="|" }
{
    for ( i=3; i<=4; i++ ) {
        $i = substr($i,1,4) "-" substr($i,5,2) "-" substr($i,7)
    }
    print
}
' file
3515034013|50008|2014-06-01|2024-07-30
Here is a way to do it with sed. It has the same restrictions as steffen's answer: | as the field separator, and all dates having the same format, i.e. leading zeros in the month and day parts.
sed -E 's/^(.*[|])([0-9]{4})([0-9]{2})([0-9]{2})[|]([0-9]{4})([0-9]{2})([0-9]{2})$/\1\2-\3-\4|\5-\6-\7/g'
Here is what the regular expression does:
^(.*[|]) captures the first part of the string from the line start (^) to a | into \1; this captures the first two columns, because the remaining part of the regex matches the remaining part of the line up to the line end!
([0-9]{4})([0-9]{2})([0-9]{2})[|] captures the first date field's parts into \2 to \4; notice the [|]
([0-9]{4})([0-9]{2})([0-9]{2})$ does the same for the second date column, anchored at the line end ($), and captures the parts into \5 to \7; notice the $
the replacement part \1\2-\3-\4|\5-\6-\7 inserts - at the right places
the capturing into \n happens because of the (...) parens in the regular expression
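A quick check of the whole command on the sample line:

echo '3515034013|50008|20140601|20240730' | sed -E 's/^(.*[|])([0-9]{4})([0-9]{2})([0-9]{2})[|]([0-9]{4})([0-9]{2})([0-9]{2})$/\1\2-\3-\4|\5-\6-\7/g'
3515034013|50008|2014-06-01|2024-07-30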
Here's one way to change the format with awk:
awk '{$3=substr($3,1,4) "-" substr($3,5,2) "-" substr($3,7,2); $4=substr($4,1,4) "-" substr($4,5,2) "-" substr($4,7,2); print}' FS='|' OFS='|'
It should work given that
| is only used for field separation
all dates have the same format
You can pipe the transformed lines to a new file or change the file in place. Of course you can do the same with sed or ed. I'd go for awk because you can extract your specific lines to an extra file in the same run, as sketched below.
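For example, the date filtering can happen in the same pass: once the dates are in YYYY-MM-DD form, plain string comparison is chronological comparison. A sketch (the cutoff dates and output file names are made up for illustration):

awk '
BEGIN { FS=OFS="|"; today="2022-08-06"; horizon="2024-08-06" }
{
    for (i=3; i<=4; i++)
        $i = substr($i,1,4) "-" substr($i,5,2) "-" substr($i,7,2)
    if ($4 < today)         print > "ended_before_today.txt"   # (01) the old lines
    else if ($4 <= horizon) print > "ends_within_2_years.txt"  # (02) within 2 years
}' file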
This might work for you (GNU sed):
sed -E 's/^([^|]*\|[^|]*\|....)(..)(..\|....)(..)/\1-\2-\3-\4-/' file
Pattern match and insert the - characters where desired.
Or if the file is only 4 columns:
sed -E 's/(..)(..\|....)(..)(..)$/-\1-\2-\3-\4/' file

How to get Date Month values in linux date command as integers to work on

I want to convert the date and month to integers.
For example,
if the current date, as per the output of date +%m-%d-%y, is
09-11-17
then I am storing:
cur_day=`date +%d`
cur_month=`date +%m`
$cur_day will give me 11 and $cur_month will give me 09.
I want to do some operations on the month 09, like printing all the numbers up to 09,
like this: 01,02,03,04,05,06,07,08,09
The same way, I want to display all the numbers up to cur_day,
like: 01,02,03,04,05,06,07,08,09,10,11
Please tell me how I can do it.
Thanks in advance.
For months:
$ printf ',%02d' $(seq 1 $(date +%m)) | sed 's/,/like this /; s/$/\n/'
like this 01,02,03,04,05,06,07,08,09
For days:
$ printf ',%02d' $(seq 1 $(date +%d)) | sed 's/,/like /; s/$/\n/'
like 01,02,03,04,05,06,07,08,09,10,11
printf prints according to a format; here, the format ,%02d prints each number zero-padded to two digits with a leading comma.
The sed command replaces the first comma with the string you want at the beginning of the line and adds a newline at the end.
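If GNU seq is available, seq can do the zero-padding and separating by itself, so sed is not needed; a variant (GNU coreutils assumed):

printf 'like this %s\n' "$(seq -f '%02g' -s, 1 "$(date +%m)")"
like this 01,02,03,04,05,06,07,08,09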

Linux Script to find string containing specific formatting & manipulate the data

I need to create a Linux script to search for lines in a file that are formatted like this:
text:text:text:text:number:number
i.e. 6 text/number strings divided by 5 colons.
For example:
2f0d:011a0000:07f8:0002:1:0
I want to treat the colon as a column divider,
e.g.
Column1:Column2:Column3:Column4:Column5:Column6
I then want to rearrange the data like so:
Column1:Column3:Column4:Column2, discarding Column5 & Column6
For example:
2f0d:07f8:0002:011a0000
I then want to replace the colons with underscores, remove leading zeros from each column, and convert to UPPERCASE.
For example:
2F0D_7F8_2_11A0000
End Result
in file1, an entry like this
2f0d:011a0000:07f8:0002:1:0
E4+1
p:BSkyB,C:0000
will be converted to this:
2F0D_7F8_2_11A0000
E4+1
p:BSkyB,C:0000
Please note also: there are hundreds if not thousands of these 3-line entries in file1.
kent$ awk -F: -v OFS="_" 'NF==6{for(i=1;i<=4;i++){sub(/^0*/,"",$i);$i=toupper($i)};print $1,$3,$4,$2;next}7' file
2F0D_7F8_2_11A0000
E4+1
p:BSkyB,C:0000
you may want to know that, in awk:
sub(pat, rep, input) does a replacement;
toupper(string) converts a string to upper case (yes, there is tolower() too);
print $1,$2 prints col1 and col2 separated by OFS;
the trailing 7 is just an always-true pattern with no action, which makes awk print all other lines unchanged.
And a command much more important than the one-liner above:
man gawk
A solution using sed (the \U uppercase conversion in the replacement is a GNU sed extension, added here because the requirement asks for upper case):
sed -r 's/^0*([a-f0-9]+):0*([a-f0-9]+):0*([a-f0-9]+):0*([a-f0-9]+):[a-f0-9]+:[a-f0-9]+$/\U\1_\3_\4_\2/'
With sed:
sed -r 's/^0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:digit:]]+):0*([[:digit:]]+)$/\U\1_\3_\4_\2/' foo
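A quick check against the sample line (GNU sed required for the \U case conversion):

echo '2f0d:011a0000:07f8:0002:1:0' | sed -r 's/^0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:digit:]]+):0*([[:digit:]]+)$/\U\1_\3_\4_\2/'
2F0D_7F8_2_11A0000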

awk: format date string from YYYYMMDD to YYYY-MM-DD

I have a CSV file which I parse using awk because I don't need all columns.
The problem I have is that one column is a date but in the format YYYYMMDD but I need it in YYYY-MM-DD and I don't know how to achieve that.
I already tried split($27, a), but it doesn't split it - a[1] still contains the whole string.
Use your awk output as input to date -d, e.g.
$ date -d 20140918 +'%Y-%m-%d'
2014-09-18
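Combining that with the awk extraction, a sketch (the comma field separator and file name are assumptions; note this spawns one date process per line, which is slow on big files):

awk -F, '{ print $27 }' file.csv | while read -r d; do date -d "$d" +%Y-%m-%d; done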
You could use substr:
printf "%s-%s-%s", substr($27,0,4), substr($27,5,2), substr($27,7,2)
Assuming that the 27th field was 20140318, this would produce 2014-03-18.
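A complete invocation might look like this (assuming a comma-separated file, which the question doesn't state):

awk -F, '{ printf "%s-%s-%s\n", substr($27,1,4), substr($27,5,2), substr($27,7,2) }' file.csv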
