Parsing file for a number on UNIX

Looking for a better way of extracting a number from the last line in a file.
Sample content:
# Provider added by blah
#
security.provider.3=org.bouncycastle145.jce.provider.BouncyCastleProvider
#
# Provider added by blah
#
security.provider.4=org.bouncycastle145.jce.provider.BouncyCastleProvider
#
# Provider added by blah
#
security.provider.79=org.bouncycastle145.jce.provider.BouncyCastleProvider
I would like to parse the last line in the file and return the number after:
security.provider.
This is what I'm using, but it seems to find only the first digit after:
security.provider.:
tail -1 filename | cut -c19
I know I can use:
tail -1 filename | cut -c19,20
but I wouldn't know if the number is a single digit or double, etc.

You can use sed as:
tail -1 file | sed -r 's/security\.provider\.([0-9]+).*/\1/'

You can do that using just a sed one liner:
sed -ne '$s/security\.provider\.\([0-9]\+\).*/\1/p' <file>

Assuming your final line is always exactly "security.provider.xxx=" and you want xxx:
$ sed -n \$p file | cut -d= -f 1 | cut -d. -f 3
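
If GNU extensions such as -r or \+ are not available, the same idea can be written with POSIX basic regular expressions only. This is a minimal sketch, assuming the last line always has the form security.provider.N=...; the function name last_provider_number is made up for this illustration.

```shell
# last_provider_number FILE  (hypothetical helper name)
# Prints the digits that follow "security.provider." on the last line of FILE.
# Prints nothing if the last line does not have that form.
last_provider_number() {
    # $ selects the last line; [0-9][0-9]* is the portable spelling of [0-9]\+
    sed -n '$s/^security\.provider\.\([0-9][0-9]*\)=.*/\1/p' "$1"
}
```

Called as last_provider_number filename, it works for numbers with any count of digits, which is the case that cut -c19 cannot handle.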

Related

Searching specific lines of files using GREP

I have a directory with many text files. I want to search for a given string in specific lines of the files (for example, searching for 'abc' in only the 2nd and 3rd line of each file). Then, when I find a match, I want to print line 1 of the matching file.
My approach: I run grep with the -n option, store the output in a different file, and then search that file for the line number. Then I try to get the file name and print out its first line.
With this approach I am not able to get the name of the right file, and even when I do, the approach is very lengthy.
Is there a better and faster solution to this?
Eg.
1.txt
file 1
one
two
2.txt
file 2
two
three
I want to search for "two" in line 2 of each file using grep and then print the first line of each file with a match. In this example the matching file would be 2.txt and the output should be "file 2".
I know it is easier using sed/awk but is there any way to do this using grep?

Use sed instead (GNU sed):
parse.sed
1h # Save the first line to hold space
2,3 { # On lines 2 and 3
/my pattern/ { # Match `my pattern`
x # If there is a match bring back the first line
p # and print it
:a; n; ba # Loop to the end of the file
}
}
Run it like this:
sed -snf parse.sed file1 file2 ...
Or as a one-liner:
sed -sn '1h; 2,3 { /my pattern/ { x; p; :a; n; ba; } }' file1 file2 ...
You might want to emit the filename as well, e.g. with your example data:
parse2.sed
1h # Save the first line to hold space
2,3 { # On lines 2 and 3
/two/ { # Match `two`
F # Output the filename of the file currently being processed
x # If there is a match bring back the first line
p # and print it
:a; n; ba # Loop to the end of the file
}
}
Run it like this:
sed -snf parse2.sed file1 file2 | paste -d: - -
Output:
file1:file 1
file2:file 2

$ awk 'FNR==2{if(/one/) print line; nextfile} FNR==1{line=$0}' 1.txt 2.txt
file 1
$ awk 'FNR==2{if(/two/) print line; nextfile} FNR==1{line=$0}' 1.txt 2.txt
file 2
FNR holds the line number within the file currently being read
Use FNR>=2 && FNR<=3 if you need a range of lines
FNR==1{line=$0} saves the contents of the first line for later use
nextfile is supported by most implementations; if yours lacks it, the solution still works without it, only more slowly
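
The range variant mentioned above can be sketched as follows. This is only an illustration under some assumptions: the function name first_line_if_match is invented here, the pattern is passed with awk -v (so backslashes in it go through awk's escape processing), and a per-file flag is used instead of nextfile so that it also runs on awk implementations that lack it.

```shell
# first_line_if_match PATTERN FILE...  (hypothetical helper name)
# For each FILE whose line 2 or line 3 matches PATTERN, print line 1 of that file.
first_line_if_match() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        FNR == 1 { first = $0; printed = 0 }            # remember line 1, reset the per-file flag
        FNR >= 2 && FNR <= 3 && !printed && $0 ~ pat {  # look only at lines 2 and 3
            print first
            printed = 1                                 # at most one output line per file
        }
    ' "$@"
}
```

With the example files from the question, first_line_if_match two 1.txt 2.txt prints both first lines, because "two" is on line 3 of 1.txt and on line 2 of 2.txt.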

With grep and bash:
# Grep for a pattern and print filename and line number
grep -Hn one file[12] |
# Loop over matches where f=filename, n=match-line-number and s=matched-line
while IFS=: read -r f n s; do
    # If match was on line 2 or line 3
    # print the first line of the file
    (( n == 2 || n == 3 )) && head -n1 "$f"
done
Output:
file 1

Only using grep, cut and | (pipe):
grep -rnw pattern dir | grep ":line_num:" | cut -d':' -f 1
Explanation
grep -rnw pattern dir
It returns the name of the file(s) where the pattern was found, along with the line number.
Its output will be something like this
path/to/file/file1(.txt):8:some pattern 1
path/to/file/file2(.txt):4:some pattern 2
path/to/file/file3(.txt):2:some pattern 3
Now I'm using another grep to get the file with the right line number (e.g. the file that contains the pattern in line 2)
grep -rnw pattern dir | grep ":2:"
Its output will be
path/to/file/file3(.txt):2:some pattern 3
Now I'm using cut to get the filename
grep -rnw pattern dir | grep ":2:" | cut -d':' -f 1
It will output the file name like this
path/to/file/file3(.txt)
P.S. - If you want to remove the "path/to/file/" from the filename, you can use rev, then cut, and then rev again. You can try this yourself or see the code below.
grep -rnw pattern dir | grep ":2:" | cut -d':' -f 1 | rev | cut -d'/' -f 1 | rev
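
One caveat with the ":2:" filter is that it also matches lines whose text happens to contain ":2:". A possible refinement, sketched here under the assumption that file names contain no colons, is to anchor the second grep to the line-number field. The function name files_matching_on_line is made up for this example.

```shell
# files_matching_on_line PATTERN LINE DIR  (hypothetical helper name)
# Prints the names of files under DIR in which PATTERN occurs on line LINE.
files_matching_on_line() {
    pattern=$1 line=$2 dir=$3
    grep -rnw -- "$pattern" "$dir" |
    grep "^[^:]*:$line:" |   # field 2 (the line number) must equal LINE
    cut -d: -f1              # keep only the file name
}
```

For example, files_matching_on_line pattern 2 dir lists only the files that have the pattern on their second line.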

Fetch latest matching string value

I have a file which contains two values for initial... keyword. I want to grab the latest date for matching initial... string. After getting the date I also need to format the date by replacing / with -
---other data
INFO | abc 1 | 2018/01/04 20:04:35 | initial...
INFO | abc 1 | 2018/02/05 17:01:42 | INFO | new| InitialLauncher | c.t.s.s.setup.launch | initial...
---other data
In the above example, my output should be 2018-02-05. Here, I am fetching the line which contains initial... value and only getting the line with latest date value. Then, I need to strip out the remaining string and fetch only the date value.
I am using the following grep, but it does not yet do what I need.
grep -q -iF "initial..." /tmp/file.log

Using the knowledge that later dates appear later in the file, it's only necessary to print the date from the last line containing initial....
First step (drop the -q from grep — you don't want it to be quiet):
grep -iF 'initial...' /tmp/file.log |
tail -n 1 |
sed -e 's/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/' -e 's%/%-%g'
The (first) s/// command matches a series of non-pipes followed by a pipe, another series of non-pipes followed by a pipe, a blank, then captures a series of non-blanks, and finally matches a blank and anything; it replaces all that with just the captured string, which is the date field after the second pipe on the input line. The (second) s%%% command replaces slashes with dashes, using % to avoid the confusion that the equivalent s/\//-/g might engender, thereby reformatting the date in ISO 8601-style format.
But we can lose the tail with:
grep -iF 'initial...' /tmp/file.log |
sed -n -e '$ { s/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/; s%/%-%gp; }'
The -n suppresses normal output; the $ matches only the last line; the p after the second s/// operation prints the result.
The case-insensitive fixed-pattern search is more conveniently written in grep than in sed. Although it could be done in a single sed command, you have to work fairly hard, saving matching rows in the hold space, then swapping the hold and pattern space at the end, and doing the substitution and printing:
sed -n \
-e '/[Ii][Nn][Ii][Tt][Ii][Aa][Ll]\.\.\./h' \
-e '$ { x; s/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/; s%/%-%gp; }' /tmp/file.log
Each of these produces the output 2018-02-05 on the sample data. If fed an input with no initial... in it, they output nothing.

Grep for only (-o) the string you want, sort it, cut for the first word, and keep the last line with tail:
grep -o '2[0-9]\{3\}/[0-9][0-9]/[0-9][0-9] [0-2][0-9]:[0-5][0-9]:[0-9][0-9] .* | initial' file.txt | sort | cut -d' ' -f1 | tail -1

something like this...
$ awk -F'|' '$NF~/initial\.\.\./ {if(max<$3) max=$3}
END {gsub("/","-",max);
split(max,dt," "); print dt[1]}' file

Reverse file using tac and sed

I have a usecase where I need to search and replace the last occurrence of a string in a file and write the changes back to the file. The case below is a simplified version of that usecase:
I'm attempting to reverse the file, make some changes reverse it back again and write to the file. I've tried the following snippet for this:
tac test | sed s/a/b/ | sed -i '1!G;h;$!d' test
test is a text file with contents:
a
1
2
3
4
5
I was expecting this command to make no changes to the order of the file, but it has actually reversed the contents to:
5
4
3
2
1
b
How can I make the substitution as well as retain the order of the file?

You can tac your file, apply the substitution to the first occurrence of the desired pattern, tac again, and redirect the result to a temporary file before you rename it to the original name (the 0,/a/ address is a GNU sed extension):
tac file | sed '0,/a/{s//b/}' | tac > tmp && mv tmp file

Another way is to use grep to get the number of the last line that contains the text you want to change, then use sed to change that line:
$ linno=$( grep -n 'abc' <file> | tail -1 | cut -d: -f1 )
$ sed -i "${linno}s/abc/def/" <file>
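
The two commands above can be combined into a small function that also handles the case where nothing matches. This is a sketch, not a definitive implementation: the name replace_last_match is invented here, OLD is used as a basic regular expression by both grep and sed, neither OLD nor NEW may contain a slash, and a temporary file is used instead of sed -i so that it does not depend on GNU sed.

```shell
# replace_last_match FILE OLD NEW  (hypothetical helper name)
# Replaces OLD with NEW on the last line of FILE that contains OLD.
# Returns 1 and leaves FILE untouched if no line matches.
replace_last_match() {
    file=$1 old=$2 new=$3
    # Line number of the last matching line (empty if there is none)
    linno=$(grep -n -- "$old" "$file" | tail -n 1 | cut -d: -f1)
    [ -n "$linno" ] || return 1
    # Substitute on that line only, then move the result over the original
    sed "${linno}s/$old/$new/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Note that on the selected line it is the first occurrence of OLD that gets replaced, exactly as in the sed command above.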

The second sed in your pipeline ignores the piped input, because -i makes it edit the file named test. Drop -i and the file name so that it reads from the pipe:
tac test | sed s/a/b/ | sed '1!G;h;$!d'
Or you can use only a sed command:
For example, if you want to replace ABC with DEF:
You do not need 'g' at the end of your sed:
sed -e 's/\(.*\)ABC/\1DEF/'
Because \(.*\) is greedy, the match runs up to the last ABC on the line, so only that occurrence is replaced.
You should also add a $ if you want to ensure that ABC is replaced only when it is at the end of the line:
sed -e 's/\(.*\)ABC$/\1DEF/'
EDIT
Or simply replace the second sed in your command with another tac:
tac test | sed s/a/b/ | tac

Here is a way to do this in a single command using awk.
First input file:
cat file
a
1
2
3
4
a
5
Now this awk command:
awk '{a[i++]=$0} END{p=i; while(i--) if (sub(/a/, "b", a[i])) break;
for(i=0; i<p; i++) print a[i]}' file
a
1
2
3
4
b
5
To save output back into original file use:
awk '{a[i++]=$0} END{p=i; while(i--) if (sub(/a/, "b", a[i])) break;
for(i=0; i<p; i++) print a[i]}' file > $$.tmp && mv $$.tmp file

Another in awk. First a test file:
$ cat file
a
1
a
2
a
and solution:
$ awk '
$0=="a" && NR>1 { # when we meet "a"
print b; b="" # output and clear buffer b
}
{
b=b (b==""?"":ORS) $0 # gather the buffer
}
END { # in the end
sub(/^a/,"b",b) # replace the leading "a" in buffer b with "b"
print b # output buffer
}' file
a
1
a
2
b
Writing back happens by redirecting the output to a temp file that then replaces the original file (awk ... file > tmp && mv tmp file) or, if you are using GNU awk v. 4.1.0+, you can use in-place editing (awk -i inplace ...).

Append all files to one single file in unix and rename the output file with part of first and last filenames

For example, I have below log files from the 16th-20th of Feb 2015. Now I want to create a single file named, mainentrywatcherReport_2015-02-16_2015-02-20.log. So in other words, I want to extract the date format from the first and last file of week (Mon-Fri) and create one output file every Saturday. I will be using cron to trigger the script every Saturday.
$ ls -l
mainentrywatcher_2015-02-16.log
mainentrywatcher_2015-02-17.log
mainentrywatcher_2015-02-18.log
mainentrywatcher_2015-02-19.log
mainentrywatcher_2015-02-20.log
$ cat *.log >> mainentrywatcherReport_2015-02-16_2015-02-20.log
$ mv *.log archive/
Can anybody help on how to rename the output file to above format?

Perhaps try this:
parta=`ls mainentrywatcher_*.log | head -n1 | cut -d'_' -f2 | cut -d'.' -f1`
partb=`ls mainentrywatcher_*.log | tail -n1 | cut -d'_' -f2 | cut -d'.' -f1`
filename=mainentrywatcherReport_${parta}_${partb}.log
cat mainentrywatcher_*.log >> ${filename}
"ls mainentrywatcher_*.log" lists the daily logs shown in the question, one per line and sorted by name
"head -n1" takes the first line of that output and "tail -n1" takes the last
"cut -d'_' -f2" takes everything (that remains) after the first underscore
"cut -d'.' -f1" takes everything (that remains) before the first period
both commands are surrounded by ` marks (above tilde ~) to capture the output of the command to a variable
filename assembles the two dates, stripped of the unnecessary parts, with the other formatting desired for the final file name
the cat command demonstrates one possible way to use the resulting filename
Happy coding! Leave a comment if you have any questions.

You can try this if you want to introduce simple looping...
FROM=$(ls -lrt mainentrywatcher_* | awk '{print $9}' | head -1 | cut -d"_" -f2 | cut -d"." -f1)
TO=$(ls -lrt mainentrywatcher_* | awk '{print $9}' | tail -1 | cut -d"_" -f2 | cut -d"." -f1)
FINAL_LOG=mainentrywatcherReport_${FROM}_${TO}.log
for i in $(ls -lrt mainentrywatcher_* | awk '{print $9}')
do
    cat "$i" >> "$FINAL_LOG"
done
echo "All Logs Stored in $FINAL_LOG"

Another approach uses bash parameter expansion/substring extraction in a simple loop. Given your daily files with the following test contents:
mainentrywatcher_2015-02-16.log -> a
mainentrywatcher_2015-02-17.log -> b
mainentrywatcher_2015-02-18.log -> c
mainentrywatcher_2015-02-19.log -> d
mainentrywatcher_2015-02-20.log -> e
the script would be:
#!/bin/bash
declare -i cnt=0 # simple counter to determine begin
for i in mainentrywatcher_2015-02-*; do # loop through each matching file
tmp=${i//*_/} # isolate date
tmp=${tmp//.*/}
[ $cnt -eq 0 ] && begin=$tmp || end=$tmp # assign first to begin, last to end
((cnt++)) # increment counter
done
cmbfname="${i//_*/}_${begin}_${end}.log" # form the combined logfile name
cat ${i//_*/}* > $cmbfname # cat all into combined name
## print out begin/end/cmbfname & contents to verify
printf "\nbegin: %s\nend : %s\nfname: %s\n\n" $begin $end $cmbfname
printf "contents: %s\n\n" $cmbfname
cat $cmbfname
exit 0
use/output:
alchemy:~/scr/tmp/stack/tmp> bash weekly.sh
begin: 2015-02-16
end : 2015-02-20
fname: mainentrywatcher_2015-02-16_2015-02-20.log
contents: mainentrywatcher_2015-02-16_2015-02-20.log
a
b
c
d
e
You can, of course, modify the for loop to accept a positional parameter containing the partial filename and pass the partial file name from the command line.

Something like this:
#!/bin/sh
LOGS="`echo mainentrywatcher_2[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].log`"
HEAD=
TAIL=
for logs in $LOGS
do
TAIL=`echo $logs | sed -e 's/^.*mainentrywatcher_//' -e 's/\.log$//'`
test -z "$HEAD" && HEAD=$TAIL
done
cat $LOGS >mainentrywatcherReport_${HEAD}_${TAIL}.log
mv $LOGS archive/
That is:
get a list of the existing logs (which happen to be sorted) in a variable $LOGS
walk through the list, getting just the date according to the example
save the first date as $HEAD
save the last date as $TAIL
after the loop, cat all of those files into the new output file
move the used-up log-files into the archive directory.
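
The steps above can also be sketched with the positional parameters and POSIX parameter expansion, without calling sed in a loop. This is a hedged variant rather than the script above: the function name combine_logs is invented, it adds a guard for the case where no logs exist, and it creates archive/ with mkdir -p, which the question assumes is already there.

```shell
# combine_logs  (hypothetical helper name)
# Concatenates mainentrywatcher_YYYY-MM-DD.log files in the current directory
# into mainentrywatcherReport_<first>_<last>.log and moves them to archive/.
combine_logs() {
    # The glob expands in sorted order: $1 is the oldest log, the last
    # positional parameter is the newest.
    set -- mainentrywatcher_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].log
    [ -e "$1" ] || { echo "no logs found" >&2; return 1; }

    first=${1#mainentrywatcher_}; first=${first%.log}
    for last in "$@"; do :; done          # leaves the final file name in $last
    last=${last#mainentrywatcher_}; last=${last%.log}

    report=mainentrywatcherReport_${first}_${last}.log
    cat "$@" > "$report" &&
    mkdir -p archive &&                   # assumption: create archive/ if missing
    mv "$@" archive/ &&
    echo "$report"                        # print the name of the combined file
}
```

Because the report name starts with mainentrywatcherReport_, it does not match the glob, so a second run will not pick up an earlier report as input.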

return all lines that match String1 in a file after the last matching String2 in the same file

I figured out how to get the line number of the last matching word in the file :
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
It gave me the value of 1787. So, I passed it manually to the sed command to search for the lines that contains the sentence "blades are down" after that line number and it returned all the lines successfully
sed -n '1787,$s/blades are down/&/p' myfile.txt
Is there a way that I can pass the line number from the first command to the second one through a variable or a file, so I can put them in a script to be executed automatically?
Thank you.

You can do this by just connecting your two commands with xargs. 'xargs -I %' allows you to take the stdin from a previous command and place it wherever you want in the next command. The '%' is where your '1787' will be written:
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1 | xargs -I % sed -n %',$s/blades are down/&/p' myfile.txt

You can use:
command substitution to capture the result of the first command in a variable.
simple string concatenation to use the variable in your sed command
startLine=$(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)
sed -n ${startLine}',$s/blades are down/&/p' myfile.txt
You don't strictly need the intermediate variable - you could simply use:
sed -n $(grep -n ' b ' textfile.txt | tail -1 | cut -d ':' -f1)',$s/blades are down/&/p' myfile.txt
but it may make sense to do error checking on the result of the command substitution first.
Note that I've streamlined the first command by using grep's -n option, which puts the line number separated with : before each match.
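
The error checking mentioned above could look something like this. It is a minimal sketch under a few assumptions: both strings are searched in the same file (as in the question title), the function name lines_after_last is made up, and the final filtering is done with grep instead of the s/.../&/p trick.

```shell
# lines_after_last MARKER PATTERN FILE  (hypothetical helper name)
# Prints the lines containing PATTERN from the last line containing MARKER
# (inclusive) to the end of FILE. Returns 1 if MARKER never occurs.
lines_after_last() {
    marker=$1 pattern=$2 file=$3
    startLine=$(grep -n -- "$marker" "$file" | tail -n 1 | cut -d: -f1)
    if [ -z "$startLine" ]; then
        echo "marker not found in $file" >&2
        return 1
    fi
    # Print from startLine to the end, then keep only the matching lines
    sed -n "${startLine},\$p" "$file" | grep -- "$pattern"
}
```

For the question's data this would be called as lines_after_last ' b ' 'blades are down' myfile.txt. Without the check, an empty startLine would turn the sed address into ",$", which sed rejects as a syntax error.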

First we can get the "half" of the file after the last match of string2, then use grep to match all the lines containing string1:
tac your_file | awk '{ if (match($0, "string2")) { exit ; } else {print;} }' | \
grep "string1"
Note that the output is in reverse order. If you care about the order, just add another tac at the end with a pipe |.

This might work for you (GNU sed):
sed -n '/\n/ba;/ b /h;//!H;$!d;x;//!d;s/$/\n/;:a;/\`.*blades are down.*$/MP;D' file
This reads through the file storing all lines following the last match of the first string (" b ") in the hold space.
At the end of file, it swaps to the hold space, checks that it does indeed have at least one match, then prints out those lines that match the second string ("blades are down").
N.B. it makes the end case (/\n/) possible by adding a new line to the end of the hold space, which will eventually be thrown away. This also caters for the last line edge condition.
