Fetch latest matching string value - linux

I have a file which contains two values for initial... keyword. I want to grab the latest date for matching initial... string. After getting the date I also need to format the date by replacing / with -
---other data
INFO | abc 1 | 2018/01/04 20:04:35 | initial...
INFO | abc 1 | 2018/02/05 17:01:42 | INFO | new| InitialLauncher | c.t.s.s.setup.launch | initial...
---other data
In the above example, my output should be 2018-02-05. Here, I am fetching the line which contains initial... value and only getting the line with latest date value. Then, I need to strip out the remaining string and fetch only the date value.
I am using the following grep but it is not yet as per the requirement.
grep -q -iF "initial..." /tmp/file.log

Using the knowledge that later dates appear later in the file, it's only necessary to print the date from the last line containing initial....
First step (drop the -q from grep — you don't want it to be quiet):
grep -iF 'initial...' /tmp/file.log |
tail -n 1 |
sed -e 's/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/' -e 's%/%-%g'
The (first) s/// command matches a series of non-pipes followed by a pipe, another series of non-pipes followed by a pipe, a blank, then captures a series of non-blanks, and finally matches a blank and anything; it replaces all that with just the captured string, which is the date field after the second pipe on the input line. The (second) s%%% command replaces slashes with dashes, using % to avoid the confusion that the equivalent s/\//-/g might engender, thereby reformatting the date in ISO 8601-style format.
But we can lose the tail with:
grep -iF 'initial...' /tmp/file.log |
sed -n -e '$ { s/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/; s%/%-%gp; }'
The -n suppresses normal output; the $ matches only the last line; the p after the second s/// operation prints the result.
The case-insensitive fixed-pattern search is more conveniently written in grep than in sed. Although it could be done in a single sed command, you have to work fairly hard, saving matching rows in the hold space, then swapping the hold and pattern space at the end, and doing the substitution and printing:
sed -n \
-e '/[Ii][Nn][Ii][Tt][Ii][Aa][Ll]\.\.\./h' \
-e '$ { x; s/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/; s%/%-%gp; }' /tmp/file.log
Each of these produces the output 2018-02-05 on the sample data. If fed an input with no initial... in it, they output nothing.
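To try the pipeline without touching /tmp/file.log, here is a minimal sketch that recreates the sample lines from the question in a temporary file. The function name latest_initial_date and the temporary file are assumptions made for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: print the date (YYYY-MM-DD) taken from the last
# line of file $1 that contains the literal text "initial...".
latest_initial_date() {
    grep -iF 'initial...' "$1" |
    sed -n -e '$ { s/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/; s%/%-%gp; }'
}

# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---other data
INFO | abc 1 | 2018/01/04 20:04:35 | initial...
INFO | abc 1 | 2018/02/05 17:01:42 | INFO | new| InitialLauncher | c.t.s.s.setup.launch | initial...
---other data
EOF

result=$(latest_initial_date "$sample")
echo "$result"
rm -f "$sample"
```

On the sample data this should print 2018-02-05, the same value the commands above produce.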

Grep for only (-o) the string you want, sort it, and cut for the first word:
grep -o '2[0-9]\{3\}/[0-9][0-9]/[0-9][0-9] [0-2][0-9]:[0-5][0-9]:[0-9][0-9] .* | initial' file.txt | sort | cut -d' ' -f1 | tail -n 1

something like this...
$ awk -F'|' '$NF~/initial\.\.\./ {if(max<$3) max=$3}
END {gsub("/","-",max);
split(max,dt," "); print dt[1]}' file
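The awk approach works because dates written as YYYY/MM/DD sort chronologically when compared as plain strings, so it does not depend on the order of lines in the file. Here is a small sketch of that idea, with the later date deliberately placed first; the function name newest_initial_date and the temporary file are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: newest date among lines whose last field contains
# "initial...", regardless of where those lines sit in file $1.
newest_initial_date() {
    awk -F'|' '
        # Field 3 holds the timestamp; a plain string comparison is enough.
        $NF ~ /initial\.\.\./ && $3 > max { max = $3 }
        END {
            if (max == "") exit          # no matching line: print nothing
            split(max, dt, " ")          # dt[1] = date, dt[2] = time
            gsub("/", "-", dt[1])
            print dt[1]
        }' "$1"
}

# Sample lines from the question, with the later date listed first.
sample=$(mktemp)
cat > "$sample" <<'EOF'
INFO | abc 1 | 2018/02/05 17:01:42 | INFO | new| InitialLauncher | c.t.s.s.setup.launch | initial...
---other data
INFO | abc 1 | 2018/01/04 20:04:35 | initial...
EOF

result=$(newest_initial_date "$sample")
echo "$result"
rm -f "$sample"
```

Even with the lines out of order, this should still report 2018-02-05.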

Related

How can I simplify this script?

Can you help me simplify this script?
This works, but I think there is an easier way to do it that I can't find.
The file:
Car Brand:Mercedes | Country:Germany | Car Model:300 SL | Year:04-1960
Car Brand:Lamborghini | Country:Italy | Car Model:Miura | Year:10-1970
Car Brand:Aston Martin | Country:UK | Car Model:DBS | Year:12-1965
Car Brand:Ford | Country:United States of America | Car Model:GT40 | Year:09-1966
Output:
1:Mercedes:Germany:300 SL:61:xxx
2:Lamborghini:Italy:Miura:51:xxx
3:Aston Martin:UK:DBS:56:xxx
4:Ford:United States of America:GT40:55:xxx
1, 2, 3, 4 are the line numbers; 61, 51, 56, 55 are the ages (current year minus year, ignoring the month); xxx is the insurance company (always the same; this part stopped working).
Script:
line=$(awk '{print NR}' file.txt)
brand=$(sed 's/.*Brand:\(.*\) | Country.*/\1/' file.txt)
country=$(sed 's/.*Country:\(.*\) | Year.*/\1/' file.txt)
sed 's/.*Year:\(.*\) | Car.*/\1/; s/^...//' file.txt > cars.txt
age=$(awk -v age="$(date +%Y)" '{print age - $1}' cars.txt)
model=$(sed 's/.*Model:\(.*\)*/\1/' file.txt)
echo "$(paste <(echo "$line") <(echo "$brand") <(echo "$country") <(echo "$age") <(echo "$model") -d ':')" > cars.txt
# sed -i 's/$/:xxx/' cars.txt
cat cars.txt
Thank you
Assuming there is no dash (-) anywhere except in the last item, you can do:
awk -v year="$(date +%Y)" -F '(-|:| \\| )' '{print NR":"$2":"$4":"$6":"(year-$9)":xxx"}' file.txt
-F takes a regular expression that describes three field separators: -, :, and a pipe surrounded by spaces.
Inside that regular expression, the unescaped | (a single character) separates the three alternatives. The third alternative, a space followed by | followed by another space, matches the separator used in your data file; to distinguish that literal pipe from the alternation operator, we need to escape it with \\.
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS predefined variable).
For more information: https://www.gnu.org/software/gawk/manual/gawk.html#Regexp-Field-Splitting
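To see what that regular-expression separator does, this short sketch prints every field awk produces for the first line of the sample file. The variable names line and fields are assumptions for illustration.

```shell
#!/bin/sh
# First line of the sample file from the question.
line='Car Brand:Mercedes | Country:Germany | Car Model:300 SL | Year:04-1960'

# Print each field as "index=value", one per line, using the same -F value.
fields=$(printf '%s\n' "$line" |
    awk -F '(-|:| \\| )' '{ for (i = 1; i <= NF; i++) print i "=" $i }')
echo "$fields"
```

The listing should show nine fields, with the brand, country, model and year in fields 2, 4, 6 and 9, which are the ones the print statement uses.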
How about this:
sed 's/ *|[^:]*: */:/g' file.txt |
awk -F: -v OFS=: -v year="$(date +%Y)" '{$1=NR; sub("^.*-","",$NF); $NF=year-$NF; print $0, "xxx"}'
Explanation: the sed command replaced all of the "| Fieldlabel:" bits with just ":", giving lines like this:
Car Brand:Mercedes:Germany:300 SL:04-1960
The awk command then splits it into colon-delimited fields, replaces the first one with the line number, removes the month from the last one (the date) and subtracts it from the current year, and finally it's printed with an extra fixed field added at the end.
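For a reproducible check of the sed and awk pipeline, the sketch below passes the reference year in as an argument instead of calling date. The wrapper name car_ages, the fixed year 2021 (chosen because it reproduces the ages in the question's expected output) and the temporary file are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 is the reference year, $2 is the data file.
car_ages() {
    sed 's/ *|[^:]*: */:/g' "$2" |
    awk -F: -v OFS=: -v year="$1" \
        '{ $1 = NR; sub("^.*-", "", $NF); $NF = year - $NF; print $0, "xxx" }'
}

# Two lines of the sample file from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Car Brand:Mercedes | Country:Germany | Car Model:300 SL | Year:04-1960
Car Brand:Lamborghini | Country:Italy | Car Model:Miura | Year:10-1970
EOF

result=$(car_ages 2021 "$sample")
echo "$result"
rm -f "$sample"
```

With 2021 as the reference year, the two output lines should match the first two lines of the desired output in the question.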
This might work for you (GNU sed):
sed -E 's/^/ | /;s/ \| [^:]*//g;s/(.*:)..(.*)/\1$(($(date +%Y)\2)):xxx/;=' file |
sed 'N;s/\n//;s/.*/echo "&"/e'
Prepend a pipe delimiter in readiness for the line number to be prepended later.
Globally remove text between the pipe delimiter and the next occurrence of the : character.
Replace the last field (date) with a bash expression that calculates the years difference from the current year and also append a dummy field xxx.
Prepend the current line number to the output.
Pass the contents of the result to a second sed invocation that combines the line number with the contents of that line and evaluates the bash expression by means of the prepended echo command.

String split and extract the last field in bash

I have a text file FILENAME. I want to split the string at - of the first column field and extract the last element from each line. Here "$(echo $line | cut -d, -f1 | cut -d- -f4)"; alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read line; do \
DNA="$(echo $line | cut -d, -f1 | cut -d- -f4)";
echo $DNA
done < ${FILENAME}
Result I want
1195060301
1195060302
1195060311
Would you please try the following:
while IFS=, read -r f1 _; do # set field separator to ",", assigns f1 to the 1st field and _ to the rest
dna=${f1##*-} # removes everything before the rightmost "-" from "$f1"
echo "$dna"
done < "$FILENAME"
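The two parameter expansions doing the work here can be tried on a single line of input. This is a minimal sketch; the variable names line and first are assumptions for illustration, and ${line%%,*} stands in for what IFS=, read does in the loop above.

```shell
#!/bin/sh
line='TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram'

f1=${line%%,*}     # drop the longest suffix starting at a comma: first field
dna=${f1##*-}      # drop the longest prefix ending in a dash: last piece
first=${f1%%-*}    # for contrast: drop the longest suffix starting at a dash

echo "$f1"
echo "$dna"
echo "$first"
```

Here ## removes the longest matching prefix and %% the longest matching suffix, so dna ends up as 1195060311 and first as TWEH.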
Well, I had to make do with two lines of code. Maybe someone has a better approach.
while read line; do \
DNA="$(echo $line| cut -d, -f1| rev)"
DNA="$(echo $DNA| cut -d- -f1 | rev)"
echo $DNA
done < ${FILENAME}
I do not know the constraints on your input file, but if what you are looking for is a 10-digit number, and there is only ever one 10-digit number per line, this should do nicely:
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: show me every run of 10 or more digits in this file.
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
A sed approach:
sed -nE 's/.*-([[:digit:]]+)\,.*/\1/p' input_file
sed options:
-n: Do not print every line by default; print only where the p flag is given.
-E: Use extended regular expressions, so grouping and + need no backslashes.
sed extended regex:
's/.*-([[:digit:]]+)\,.*/\1/p': Match anything followed by a dash, capture one or more digits in group 1, then match a comma and anything after it; replace the whole line with the captured group and print it.
Using awk:
awk -F[,] '{ split($1,arr,"-");print arr[length(arr)] }' FILENAME
Using , as a separator, take the first delimited "piece" of data and further split it into an arr using - as the delimiter and awk's split function. We then print the last index of arr.

Command 'cut' doesn't show last column CSV

I've created a CSV from shell. Then I need to filter the information by column. I used this command:
$ cut -d ';' -f 12,22 big_file.csv
The input looks like:
ACT;XXXXXX;MCD;881XXXX;881017XXXXXX;ABCD;BMORRR;GEN;88XXXXXXXXXX;00000;01;2;000008608008602;AAAAAAAAAAA;0051;;;;;;093505;
ACT;XXXXXX;MCD;881XXXX;881017XXXXXX;ABCD;BMORRR;GEN;88XXXXXXXXXX;00000;01;3;000008608008602;AAAAAAAAAAA;0051;;;;;;085000;anl#mail.com
The output is:
ID CLIENT;email
00000xxxxxxxxx
00000000xxxxxx;anl#mail.com
As you can see, the last column does not appear (note, that the semicolon is missing in the first line). I want this:
ID CLIENT;email
00000xxxxxxxxx;
00000000xxxxxx;anl#mail.com
I have another CSV file with information and it works. I've reviewed the csv and the columns exist.
There doesn't seem to be a way to make cut do this. The next step up in expressivity is awk, which does it easily:
$ cat testfile
one;two;three;four
1;2;3
first;second
only
$ awk -F';' '{ OFS=FS; print $1, $3 }' < testfile
one;three
1;3
first;
only;
$
You don't get the semicolon in the output of your second line, because your second line contains just 21 fields (the first contains 23 fields).
You can check that using:
(cat bigfile.csv | tr -d -c ";\n" ; echo "1234567890123456789012") | cat -n | grep -v -E ";{22}"
This will output all lines from bigfile.csv with fewer than 22 semicolons, along with the corresponding line numbers.
To fix that, you can add a bunch of empty fields at the end of each line and pipe the result to cut like this:
sed -e 's|^\(.*\)|\1;;;;;;;;;;;;;;;;;;;;;;;;|g' bigfile.csv | cut -d ';' -f 12,22
The result is:
XXXXXXXXYYY;XXXNNN
XXXXYYYYXXXXX;
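The difference between the two tools can be reproduced on a tiny input. This is a sketch on made-up data (two records, the second one field short), not on the original CSV; the variable names are assumptions for illustration.

```shell
#!/bin/sh
# Two records: the first has three fields, the second only two.
data=$(printf 'a;b;c\na;b\n')

# cut prints only the fields that exist, so no delimiter follows "b".
with_cut=$(printf '%s\n' "$data" | cut -d ';' -f 2,3)

# awk treats the missing field as empty and still prints the separator.
with_awk=$(printf '%s\n' "$data" | awk -F ';' -v OFS=';' '{ print $2, $3 }')

printf 'cut:\n%s\n' "$with_cut"
printf 'awk:\n%s\n' "$with_awk"
```

For the short record, cut prints b while awk prints b; which is the behaviour the question asks for.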

Using sed to fetch date

I have a file which contains two values for abc... keyword. I want to grab the latest date for matching abc... string. After getting the date I also need to format the date by replacing / with -
---other data
2018/01/15 01:56:14.944+0000 INFO newagent.bridge BridgeTLSAssetector::setupACBContext() - abc...
2018/02/14 01:56:14.944+0000 INFO newagent.bridge BridgeTLSAssetector::setupACBContext() - abc...
---other data
In the above example, my output should be 2018-02-14. Here, I am fetching the line which contains abc... value and only getting the line with latest date value. Then, I need to strip out the remaining string and fetch only the date value.
I am using the following sed but it is not working
grep -iF "abc..." file.txt | tail -n 1 | sed -e 's/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/' -e 's%/%-%g'
With awk:
$ awk '/abc\.\.\./{d=$1} END{gsub("/", "-", d); print d}' file.txt
2018-02-14
Something with sed:
tac file.txt | grep -Fi 'abc...' | sed 's/ .*//;s~/~-~g;q'
This does what you want:
grep -iF "abc..." file.txt | tail -n 1 | awk '{print $1}' | sed 's#/#-#g'
Outputs this:
2018-02-14
Since you asked for sed -
$: sed -nE ' / abc[.]{3}/x; $ { x; s! .*!!; s!/([0-9])/!/0\1/!g; s!/([0-9])$!/0\1!g; s!/!-!g; p; }' in
2018-02-14
arguments
-n says don't print by default
-E says use extended regexes
the script
/ abc[.]{3}/x; says on each line with abc... swap the line for the buffer
$ { x; s! .*!!; s!/([0-9])/!/0\1/!g; s!/([0-9])$!/0\1!g; s!/!-!g; p; } says on the LAST line($) do the set of commands inside the {}.
x swaps the buffer to get the last saved record back.
s! .*!!; deletes everything from the first space (after the date)
s!/([0-9])/!/0\1/!g; adds a zero to the month if needed
s!/([0-9])$!/0\1!g; adds a zero to the day if needed
s!/!-!g; converts the /'s to dashes
p prints the resulting record.
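The hold-space technique and the zero-padding steps can be exercised on a small input. This sketch is a variant, not the answer's exact command: it copies each matching line with h instead of swapping with x, and it changes the second sample date to 2018/2/4 (an assumption made purely so that the padding substitutions have something to do). The temporary file is also illustrative.

```shell
#!/bin/sh
# Sample based on the question, with one date deliberately left unpadded.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---other data
2018/01/15 01:56:14.944+0000 INFO newagent.bridge BridgeTLSAssetector::setupACBContext() - abc...
2018/2/4 01:56:14.944+0000 INFO newagent.bridge BridgeTLSAssetector::setupACBContext() - abc...
---other data
EOF

# h copies every matching line into the hold space; on the last line,
# x brings the most recent copy back so it can be trimmed and reformatted.
result=$(sed -nE \
    -e '/ abc[.]{3}/h' \
    -e '$ { x; s! .*!!; s!/([0-9])/!/0\1/!; s!/([0-9])$!/0\1!; s!/!-!g; p; }' \
    "$sample")
echo "$result"
rm -f "$sample"
```

On this input the last matching line carries 2018/2/4, so the result should be 2018-02-04.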
When you use sed to match part of the date, you can have it match the year, month, day and abc... in one command.
sed -rn 's#([0-9]{4})/([0-9]{2})/([0-9]{2}).*abc[.]{3}.*#\1-\2-\3#p' file.txt | tail -1
A simpler approach:
cat filename.txt | grep 'abc' | awk -F' ' '{print $1}'
Since the pattern abc is fixed in the given logs, this is an easy way to print the date field of each matching line.

return all lines that match String1 in a file after the last matching String2 in the same file

I figured out how to get the line number of the last matching word in the file :
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
It gave me the value of 1787. So, I passed it manually to the sed command to search for the lines that contains the sentence "blades are down" after that line number and it returned all the lines successfully
sed -n '1787,$s/blades are down/&/p' myfile.txt
Is there a way to pass the line number from the first command to the second one through a variable or a file, so I can put them in a script to be executed automatically?
Thank you.
You can do this by just connecting your two commands with xargs. 'xargs -I %' allows you to take the stdin from a previous command and place it wherever you want in the next command. The '%' is where your '1787' will be written:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1 | xargs -I % sed -n %',$s/blades are down/&/p' myfile.txt
You can use:
command substitution to capture the result of the first command in a variable.
simple string concatenation to use the variable in your sed command
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
sed -n ${startLine}',$s/blades are down/&/p' myfile.txt
You don't strictly need the intermediate variable - you could simply use:
sed -n $(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)',$s/blades are down/&/p' myfile.txt
but it may make sense to do error checking on the result of the command substitution first.
Note that I've streamlined the first command by using grep's -n option, which puts the line number separated with : before each match.
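One way the suggested error checking might look is sketched below. The function name lines_after_last, its argument order, and the sample file are all assumptions for illustration; it also uses sed piped into grep in place of the s///p idiom, which selects the same lines.

```shell
#!/bin/sh
# Hypothetical helper: print lines of file $3 that match $2, starting at
# the last line that matches $1. Fails if $1 never matches.
lines_after_last() {
    marker=$1 pattern=$2 file=$3
    start=$(grep -n -e "$marker" "$file" | tail -n 1 | cut -d ':' -f 1)
    if [ -z "$start" ]; then
        echo "no line matching '$marker' in $file" >&2
        return 1
    fi
    sed -n "${start},\$p" "$file" | grep -e "$pattern"
}

# Made-up sample: only the final line follows the last " b " line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
blades are down
x b y
blades are up
blades are down again
EOF

result=$(lines_after_last ' b ' 'blades are down' "$sample")
echo "$result"

# A marker that never occurs makes the helper return a non-zero status.
status=0
lines_after_last 'no-such-marker' 'blades' "$sample" 2>/dev/null || status=$?
rm -f "$sample"
```

Checking that start is non-empty matters because an empty value would otherwise leave sed with the malformed address ,$ and a confusing error message.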
First we can get the part of the file after the last match of string2; then you can use grep to match all lines containing string1:
tac your_file | awk '{ if (match($0, "string2")) { exit ; } else {print;} }' | \
grep "string1"
Note that the output order is reversed. If you care about the order, add another tac at the end of the pipeline.
This might work for you (GNU sed):
sed -n '/\n/ba;/ b /h;//!H;$!d;x;//!d;s/$/\n/;:a;/\`.*blades are down.*$/MP;D' file
This reads through the file storing all lines following the last match of the first string (" b ") in the hold space.
At the end of file, it swaps to the hold space, checks that it does indeed have at least one match, then prints out those lines that match the second string ("blades are down").
N.B. it makes the end case (/\n/) possible by adding a new line to the end of the hold space, which will eventually be thrown away. This also caters for the last line edge condition.
