converting 4 digit year to 2 digit in shell script - linux

I have a file like this:
$cat file.txt
1981080512 14 15
2019050612 17 18
2020040912 19 95
Here the 1st column holds dates as YYYYMMDDHH.
I would like to write the dates as YYMMDDHH, so the desired output is:
81080512 14 15
19050612 17 18
20040912 19 95
My script:
while read -r x;do
yy=$(echo $x | awk '{print substr($0,3,2)}')
mm=$(echo $x | awk '{print substr($0,5,2)}')
dd=$(echo $x | awk '{print substr($0,7,2)}')
hh=$(echo $x | awk '{print substr($0,9,2)}')
awk '{printf "%10s%4s%4s\n",'$yy$mm$dd$hh',$2,$3}'
done < file.txt
It is printing
81080512 14 15
81080512 17 18
Any help please. Thank you.

Please don't kill me for this simple answer, but what about this:
cut -c 3- file.txt
This drops the first two digits by printing every line from character 3 to the end (the -c switch tells cut to select by character position rather than by byte or by field).

You can do it with a single substr call in GNU AWK. Let the content of file.txt be
1981080512 14 15
2019050612 17 18
2020040912 19 95
then
awk '{$1=substr($1,3);print}' file.txt
output
81080512 14 15
19050612 17 18
20040912 19 95
Explanation: substr($1,3) returns the 1st column from its 3rd character onward. I assign that back to the 1st column and then print the changed line. Note that assigning to $1 makes awk rebuild the line with single spaces between the fields.
(tested in gawk 4.2.1)
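For reference, the loop in the question most likely misbehaves because the awk call on its last line has no input file, so it reads the rest of file.txt from the loop's standard input while reusing the date pieces taken from the first line. If a shell loop is still wanted, here is a minimal sketch that uses only POSIX parameter expansion. The function name strip_century is made up for illustration, and the sketch assumes the first column always has at least two characters.

```shell
# Hypothetical helper: read "YYYYMMDDHH rest..." lines on stdin and
# write "YYMMDDHH rest..." lines on stdout, without calling awk.
strip_century() {
    while read -r stamp rest; do
        # ${stamp#??} removes the first two characters of the first column.
        printf '%s %s\n' "${stamp#??}" "$rest"
    done
}

strip_century <<'EOF'
1981080512 14 15
2019050612 17 18
2020040912 19 95
EOF
```

For large files the cut or awk one-liners above should be much faster, because a shell loop handles one line at a time.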

Related

how to use shell to split string into correct format?

I have a file of time durations. A few include days, but most are in hh:mm form; the full form is dd+hh:mm.
I tried to turn them into dd:hh:mm form with "tr -s '+:' ':'" and then use split($1,tm,":") to calculate seconds.
The problem is that after this step a value in hh:mm form has hh in tm[1], while a value in dd:hh:mm form has dd in tm[1].
Is there a way to put the hh of an hh:mm value into tm[2] and set tm[1] to 0?
4+11:26
10+06:54
20:27
is the input.
The output I want (in the form tm[1], tm[2], tm[3]) is:
4 11 26
10 06 54
0 20 27
I would first preprocess it with sed (to add missing 0+ in lines that don't have a plus sign) and then tr +: to spaces:
cat a.txt | sed 's/^\([^+]\+\)$/0+\1/g' | tr '+:' ' '
Or as suggested by Lars, shorter sed version:
cat a.txt | sed '/+/! s/^/0+/;' | tr '+:' ' '
awk to the rescue!
You can do both the conversion and the computation in awk. Using your input file, the values are converted to minutes:
$ awk -F: '{if($1~/+/){split($1,f,"+");h=f[1]*24+f[2]}
else h=$1; m=h*60+$2; print $0 " --> " m}' file
4+11:26 --> 6446
10+06:54 --> 14814
20:27 --> 1227
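Since the question asks for seconds, here is a minimal sketch that splits on both + and : at once and decides by field count whether a day part is present. The function name to_seconds is hypothetical, and the sketch assumes every input line is either hh:mm or dd+hh:mm.

```shell
# Hypothetical helper: read durations (hh:mm or dd+hh:mm) on stdin,
# print the equivalent number of seconds, one per line.
to_seconds() {
    awk -F'[+:]' '
        NF == 2 { d = 0;  h = $1; m = $2 }   # hh:mm, so days default to 0
        NF == 3 { d = $1; h = $2; m = $3 }   # dd+hh:mm
        { print ((d * 24 + h) * 60 + m) * 60 }
    '
}

printf '%s\n' '4+11:26' '10+06:54' '20:27' | to_seconds
```

awk converts the fields to decimal numbers, so values with a leading zero such as 06 are handled without any octal surprises.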

add new column of a number

I wrote the following code to extract data from an existing file using awk within a for loop.
for c in {1..300}
do
awk '{if($28==1) print $12,$26,$28}' file1.txt > file2.txt
done
This is ok and I have file2.txt
Now, I want to add a new column containing c generated from the for loop above.
I do this but it does not work.
awk '{if($28==1) print $12,$26,$28, paste c}' file1.txt > file2.txt
This only works when I replace c by a real number such as 1,2,...
Finally, I want to append file1.txt to file2.txt, i.e. every time the loop runs it will add new data to file2.txt.
I have this but it seems not the best:
awk <file2.txt>> file_final.txt
Can you please give me some advice? Thank you!
Phuong
Using the approach recommended by William below, I am able to produce the output that I want. Thank you very much!
rm file2.txt
for c in $(seq 1 300); do
awk '$28==1{print $12,$26,$28,c}' c=$c file1.txt >> file2.txt
done
You need to transfer the shell variable c to awk:
awk -v c="$c" '{if($28==1) print $12,$26,$28,c}' OFS=', ' file1.txt
-v c="$c" creates an awk variable, called c, which has the value of the shell variable $c.
Example
Using fake input data:
$ c=2; echo {1..27} 1 | awk -v c="$c" '{if($28==1) print $12,$26,$28,c}' OFS=', '
12, 26, 1, 2
Example in a loop
Let's start with this sample file:
$ cat file1.txt
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 1
Now, let's run the awk command in a loop and look at the output:
$ for c in {1..3}; do awk -v c="$c" '{if($28==1) print $12,$26,$28,c}' OFS=', ' file1.txt; done >file2.txt
$ cat file2.txt
12, 26, 1, 1
12, 26, 1, 2
12, 26, 1, 3
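The thread uses two ways of passing c to awk: the -v option and a c=... assignment placed after the program. A small sketch of the one difference that tends to matter: a -v assignment is already set inside BEGIN, while an assignment given as an operand only takes effect when awk reaches it in the argument list, which is after BEGIN has run. The function names with_v and with_operand are made up for this illustration.

```shell
# Both helpers print "[<c in BEGIN>]" and then c once per input line.
with_v() {
    awk -v c="$1" 'BEGIN { printf "[%s]", c } { print c }'
}

with_operand() {
    awk 'BEGIN { printf "[%s]", c } { print c }' c="$1"
}

echo x | with_v 7         # BEGIN already sees c
echo x | with_operand 7   # BEGIN sees an empty c, the main rule sees 7
```

For the loop in the question either form works, because c is only used in the main rule.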

extract header if pattern in a column matches

I am trying to extract and print the header of a column if a pattern matches in that particular column.
Here is an example:
[user ~]$ cal |sed 's/July 2014//'
Su Mo Tu We Th Fr Sa
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
Expected output :
If the input date is 31, print the day of the week on which the 31st falls.
Just to be clear, I cannot use the date -d flag because it is not supported by my OS. I probably need awk to solve this.
[user ~]$ date -d 20140731 +%A
Thursday
I hope I have conveyed my question clearly.
Using awk:
cal | awk -v date=31 'NR == 2 { split($0, header) } NR > 2 { for (i = 1; i <= NF; ++i) if ($i == date) { print header[NR == 3 ? i + 7 - NF : i]; exit } }'
Output:
Th
Here is a gnu awk solution:
cal | awk -v date=31 -v FIELDWIDTHS="3 3 3 3 3 3 3 3" 'NR==2 {split($0,a)} {for (i=1;i<=NF;i++) if ($i==date) print a[i]}'
Th
The date to look up is passed in as a variable, so you can change it to whatever you like.
Or it could be written like this:
cal | awk 'NR==2 {split($0,a)} {for (i=1;i<=NF;i++) if ($i==date) print a[i]}' FIELDWIDTHS="3 3 3 3 3 3 3 3" date=31
PS: FIELDWIDTHS is a GNU awk extension, introduced in gawk 2.13.
Parsing the output of cal isn't really that advisable...
Can your OS's date handle -j?
date -j 073100002014 "+%a"
Thu
How is your OS at perl?
perl -MDateTime -E '$dt=DateTime->new(year=>2014,month=>7,day=>31);say $dt->day_name'
Thursday
Or, if it doesn't do perl -E, you could do
perl -MDateTime -e '$dt=DateTime->new(year=>2014,month=>7,day=>31);print $dt->day_name'
Thursday
How is your OS at php?
php -r '$jd=cal_to_jd(CAL_GREGORIAN,7,31,2014);echo(jddayofweek($jd,2));'
Thu
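If none of date -d, date -j, perl or php is available, the weekday can also be computed with plain shell arithmetic. The sketch below uses Zeller's congruence for the Gregorian calendar, which is a different technique from anything in the answers above. The function name day_of_week is hypothetical, and the arguments are assumed to be plain decimal numbers without leading zeros (a value like 08 would be read as invalid octal inside $(( ))).

```shell
# Hypothetical helper: day_of_week YEAR MONTH DAY
# Prints the two-letter weekday as used in the cal header.
day_of_week() {
    y=$1 m=$2 d=$3
    # Zeller's congruence counts January and February as months 13 and 14
    # of the previous year.
    if [ "$m" -lt 3 ]; then
        m=$((m + 12))
        y=$((y - 1))
    fi
    k=$((y % 100))   # year within the century
    j=$((y / 100))   # zero-based century
    # h: 0 = Saturday, 1 = Sunday, ..., 6 = Friday
    h=$(( (d + 13 * (m + 1) / 5 + k + k / 4 + j / 4 + 5 * j) % 7 ))
    set -- Sa Su Mo Tu We Th Fr
    shift "$h"
    echo "$1"
}

day_of_week 2014 7 31
```

Unlike the cal based answers, this does not depend on the layout of another program's output.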

Linux-Change line between two awk scripts

I have two awk scripts to run in Linux. The output of each one is in one line.
How can I separate the two output into two lines?
For example:
awk '{printf $1}' f.txt >> a.txt
awk '{printf $3}' f.txt >> a.txt
The output of the first script is:
35 56 40 28 57
And the second output is:
29 48 73 26
If I run them one after another, the output will become:
35 56 40 28 57 29 48 73 26
Is there any way to get the result to:
35 56 40 28 57
29 48 73 26
Thank you!~
Although I don't understand how you get the spaces between the fields that way, you can add an END block to the first script so that its output ends with a newline:
awk '{printf "%s", $1} END{print ""}' f.txt >> a.txt
You can also do this with a single awk command:
awk -v ORS=" " 'BEGIN{ARGV[ARGC++] = ARGV[1]; i = 1 }
NR!=FNR && FNR==1 { printf "\n"; i=3 }
{ print $i }
END { printf "\n" }' f.txt
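Another way to get the two lines in one pass is to collect both columns while reading and print them in END. This is a minimal sketch, assuming whitespace-separated input and that rows without a third field should simply be skipped for the second line; the function name columns_as_rows is made up.

```shell
# Hypothetical helper: print column 1 of all rows on one line and
# column 3 of all rows on the next line, each separated by spaces.
columns_as_rows() {
    awk '
        $1 != "" { first = first (first == "" ? "" : " ") $1 }
        $3 != "" { third = third (third == "" ? "" : " ") $3 }
        END { print first; print third }
    ' "$@"
}

printf '%s\n' '35 x 29' '56 x 48' '57' | columns_as_rows
```

Because the file is read only once, this also works when the input comes from a pipe, which the ARGV approach cannot do.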

how to subset a file - select a numbers of rows or columns

I would like to have your advice/help on how to subset a big file (millions of rows or lines).
For example,
(1)
I have a big file (millions of rows, tab-delimited). I want a subset of this file with only rows 10000 to 100000.
(2)
I have a big file (millions of columns, tab-delimited). I want a subset of this file with only columns 10000 to 100000.
I know there are tools like head, tail, cut, split, and awk or sed. I can use them to do simple subsetting. But, I do not know how to do this job.
Could you please give any advice? Thanks in advance.
Filtering rows is easy, for example with AWK:
cat largefile | awk 'NR >= 10000 && NR <= 100000 { print }'
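Filtering columns is easier with cut. Tab is already its default delimiter, so no -d option is needed (cut would reject -d '\t', because it sees the two characters \ and t instead of a tab):
cat largefile | cut -f 10000-100000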
As Rahul Dravid mentioned, cat is not a must here, and as Zsolt Botykai added you can improve performance using:
awk 'NR > 100000 { exit } NR >= 10000 && NR <= 100000' largefile
cut -f 10000-100000 largefile
Some different solutions:
For row ranges:
In sed :
sed -n 10000,100000p somefile.txt
For column ranges in awk:
awk -v f=10000 -v t=100000 '{ for (i=f; i<=t;i++) printf("%s%s", $i,(i==t) ? "\n" : OFS) }' details.txt
For the first problem, selecting a set of rows from a large file, piping tail to head is very simple. You want rows 10000 through 100000 of largefile, which is 90001 rows. tail outputs largefile from row 10000 onward, and head then keeps only the first 90001 of those rows.
tail -n +10000 largefile | head -n 90001
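The row and column recipes from this thread can be wrapped into two small helpers. This is only a sketch: the names rows and cols are made up, the input is assumed to be tab-delimited for cols, and the second sed expression is there so that sed stops reading once the last wanted row has been printed.

```shell
# Hypothetical helper: rows FIRST LAST [FILE...]
# Prints lines FIRST through LAST (inclusive), then quits early.
rows() {
    first=$1 last=$2
    shift 2
    sed -n -e "${first},${last}p" -e "${last}q" "$@"
}

# Hypothetical helper: cols FIRST LAST [FILE...]
# Prints tab-delimited fields FIRST through LAST (inclusive).
cols() {
    first=$1 last=$2
    shift 2
    cut -f "${first}-${last}" "$@"
}

printf '%s\n' a b c d e f | rows 3 5
printf 'a\tb\tc\td\n' | cols 2 3
```

With the numbers from the question, the calls would be rows 10000 100000 largefile and cols 10000 100000 largefile.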
I was beaten to the sed solution, so I'll post a Perl equivalent instead.
To print selected lines.
$ seq 100 | perl -ne 'print if $. >= 10 && $. <= 20'
10
11
12
13
14
15
16
17
18
19
20
To print selective columns, use
perl -lane 'print "@F[1..3]"'
-F is used in conjunction with -a, to choose the delimiter on which to split lines.
To test, use seq and paste to generate some columns:
$ seq 50 | paste - - - - -
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45
46 47 48 49 50
Let's print everything except the first and the last column, using the array slice @F[1..3] to pick fields 2 through 4:
$ seq 50 | paste - - - - - | perl -lane 'print join " ", @F[1..3]'
2 3 4
7 8 9
12 13 14
17 18 19
22 23 24
27 28 29
32 33 34
37 38 39
42 43 44
47 48 49
In the join statement above, there is a tab, you get it by doing a ctrl-v tab.
