cut a particular number from the url in linux - linux

I have a file containing the header below, generated by some process:
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
I want to cut just the number 8 from page=8 in the above content. How to go about it? Appreciate any help.

Try this -
$ cat f
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
$ awk -F'[&=<>]' '{for(i=1;i<=NF;i++) if($i ~ /^page$/) {print $(i+1)}}' f
2
8
If the file keeps getting appended to, you can print only the last value with this awk:
$ awk -F'[&=<>]' '{for(i=1;i<=NF;i++) if($i ~ /^page$/) {kk=$(i+1)}} END{print kk}' f
8
Limitation: with page=2 and page=8 both present, the above command prints only the last page value.
If you always want the page value from the 2nd line of the file (I added an extra line to the existing input, assuming the file keeps growing and you always need the value on its 2nd line), use the following:
$ cat f
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
<https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
$ awk -v k=1 -F'[&=<>]' '{for(i=1;i<=NF;i++) if(($i ~ /^page$/) && (k==2) ) {print $(i+1)} k++}' f
8

Following is an implementation using grep:
grep -Po "&page=[0-9]*" <file_name> | grep -Po "[0-9]*"
Example:
echo 'Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8000>; rel="last"' | grep -Po "&page=[0-9]*" | grep -Po "[0-9]*"

The command above prints every page value (2 and 8000). To print only the second one, as expected:
echo 'Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=12345>; rel="last"' | grep -Po "&page=[0-9]*" | grep -Po "[0-9]*" | awk 'NR == 2'

In awk: reverse the text, remove the first [0-9]+=egap, output, and rev again (note that this deletes the last page=N from the line rather than printing it):
$ rev foo | awk 'sub(/[0-9]+=egap/,"")||1' |rev
Output:
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&>; rel="last"

try:
awk '{gsub(/.*page=/,"page=");sub(/>.*/,"");print}' Input_file
This replaces everything up to and including the last page= with page= (.* is greedy, so it runs to the last occurrence of the string page=), then replaces >.* (from the first remaining > to the end of the line) with nothing, and prints the line, which is now page=8, i.e. the last page value. Of course, I am assuming that your Input_file matches the example shown.
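The same two-step trimming (drop everything up to the last page=, then drop everything from the first >) can be sketched with plain shell parameter expansion, without awk. This is only an illustration under the same assumption that the input looks like the example; last_page_value is a made-up helper name, and it prints just the number rather than page=8.

```shell
# Hypothetical helper: print the value of the last "page=" in a line.
last_page_value() {
    rest=${1##*page=}            # remove the longest prefix ending in "page="
    printf '%s\n' "${rest%%>*}"  # remove everything from the first ">" onward
}

header='Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"'
last_page_value "$header"
```

The ## form is the "greedy" counterpart of #, which is what makes it skip past per_page= and page=2.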

awk -F'[= >]' '{print $12}' file
8
awk -F= '{split($8,a,">");print a[1]}' file
8
awk -F= '$8=="8>; rel"{print substr($8,1,1)}' file
8

The fact that a greedy regex is needed here (only the last occurrence of &page= should be matched) enables a simple sed solution:
sed -E 's/^.*&page=([0-9]+).*$/\1/' file
^.*&page= matches everything up to the last occurrence of &page= on the line.
([0-9]+) matches one or more digits and, thanks to the enclosing (...), stores the match in the 1st (and only) capture group, which the replacement string then references as \1.
.*$ matches any remaining character on the line.
By virtue of the regex having matched the entire line, \1 therefore results in just the captured number as the output.
The above works with both GNU and BSD/macOS sed and takes advantage of modern extended regular expressions (-E), but in case you need a POSIX-compliant solution (which must use basic regular expressions and is therefore more cumbersome):
sed 's/^.*&page=\([0-9]\{1,\}\).*$/\1/' file
With GNU grep (on Linux, as requested), a single-pass grep -Po solution is also possible; like the sed solution, it relies on greedily matching up to the last &page=:
grep -Po "^.*&page=\K[0-9]+" file
-P activates support for PCREs (Perl-compatible Regular Expressions).
-o only outputs the matching part of the line.
\K drops everything matched so far, so that what [0-9]+ matches - one or more digits - is the only output.
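Both commands above depend on the rel="last" link being the last one on the line. As a rough sketch, assuming every link has the form <URL>; rel="..." as in the question, the match can instead be tied to rel="last" itself so the order of the links does not matter. last_page is a hypothetical helper name, not something from the answers above.

```shell
# Hypothetical helper: print the page number of the link marked rel="last".
# [^>]* cannot cross a ">", so the captured page= must belong to the same
# <...> URL that is followed by rel="last". Prints nothing if there is none.
last_page() {
    printf '%s\n' "$1" |
        sed -nE 's/.*[?&]page=([0-9]+)[^>]*>; rel="last".*/\1/p'
}

header='Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"'
last_page "$header"
```

The [?&] in front of page= keeps per_page= from being matched by accident.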

Related

Using sed to fetch date

I have a file which contains two values for abc... keyword. I want to grab the latest date for matching abc... string. After getting the date I also need to format the date by replacing / with -
---other data
2018/01/15 01:56:14.944+0000 INFO newagent.bridge BridgeTLSAssetector::setupACBContext() - abc...
2018/02/14 01:56:14.944+0000 INFO newagent.bridge BridgeTLSAssetector::setupACBContext() - abc...
---other data
In the above example, my output should be 2018-02-14. Here, I am fetching the line which contains abc... value and only getting the line with latest date value. Then, I need to strip out the remaining string and fetch only the date value.
I am using the following sed but it is not working
grep -iF "abc..." file.txt | tail -n 1 | sed -e 's/^[^|]*|[^|]*| *\([^ ]*\) .*/\1/' -e 's%/%-%g'
With awk:
$ awk '/abc\.\.\./{d=$1} END{gsub("/", "-", d); print d}' file.txt
2018-02-14
Something with sed:
tac file.txt | grep -Fi 'abc...' | sed 's/ .*//;s~/~-~g;q'
This does what you want:
grep -iF "abc..." file.txt | tail -n 1 | awk '{print $1}' | sed 's#/#-#g'
Outputs this:
2018-02-14
Since you asked for sed -
$: sed -nE ' / abc[.]{3}/x; $ { x; s! .*!!; s!/([0-9])/!/0\1/!g; s!/([0-9])$!/0\1!g; s!/!-!g; p; }' in
2018-02-14
arguments
-n says don't print by default
-E says use extended regexes
the script
/ abc[.]{3}/x; says on each line with abc... swap the line for the buffer
$ { x; s! .*!!; s!/([0-9])/!/0\1/!g; s!/([0-9])$!/0\1!g; s!/!-!g; p; } says on the LAST line($) do the set of commands inside the {}.
x swaps the buffer to get the last saved record back.
s! .*!!; deletes everything from the first space (after the date)
s!/([0-9])/!/0\1/!g; adds a zero to the month if needed
s!/([0-9])$!/0\1!g; adds a zero to the day if needed
s!/!-!g; converts the /'s to dashes
p prints the resulting record.
When you use sed to match part of the date, you can have it match year, month, day and abc... in one command.
sed -rn 's#([0-9]{4})/([0-9]{2})/([0-9]{2}).*abc[.]{3}.*#\1-\2-\3#p' file.txt | tail -1
A simpler pipeline, since the pattern abc is fixed in the given logs:
grep 'abc' filename.txt | awk '{print $1}' | tail -n 1 | tr '/' '-'
grep keeps the matching lines, awk prints the date field, tail keeps the last (latest) one, and tr converts / to -.
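The pipelines in this thread take the last matching line, which is the latest date only if the log is in chronological order. As a hedged sketch for the case where it might not be, awk can compare the dates as strings (YYYY/MM/DD sorts correctly as text) and keep the largest. latest_date is a made-up helper name, and the literal abc... is taken from the question.

```shell
# Hypothetical helper: print the latest date (as YYYY-MM-DD) among lines
# that contain the literal text "abc...", regardless of line order.
latest_date() {
    awk 'BEGIN { d = "" }
         index($0, "abc...") && $1 > d { d = $1 }   # string comparison on the date field
         END { gsub("/", "-", d); print d }' "$1"
}

# Usage: latest_date file.txt
```

index() is used instead of a regex so the dots in abc... are matched literally.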

Extract field after colon for lines where field before colon matches pattern

I have a file file1 which looks as below:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract value of tool2v1 and tool2v2
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk but it is not giving correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1
grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'
If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option is to retain just the match instead of the whole matching line; \K is the same as "the line must match the things to the left, but don't include them in the match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk which was almost correct (and as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
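If you would rather look up one exact key than match a pattern, something along these lines might do. It is only a sketch: value_of is a made-up helper name, and it prints everything after the first colon, so a value that itself contains a colon is not cut short.

```shell
# Hypothetical helper: print the value stored under an exact key.
# $1 = key (for example tool2v1), $2 = file in key:value format
value_of() {
    awk -F: -v key="$1" '$1 == key { print substr($0, length(key) + 2) }' "$2"
}

# Usage: value_of tool2v1 file1
```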
You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from beginning of string
.* - all characters
up to and including the :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
Whole command:
grep '\(tool2v1\|tool2v2\)' file1|sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
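The filter and the substitution can probably be folded into a single sed call: -n suppresses the default output and the p flag prints only lines where the substitution happened. A minimal sketch, with tool2_values as a made-up function name:

```shell
# Hypothetical helper: print the part after the colon for tool2v1/tool2v2 lines.
tool2_values() {
    sed -n 's/^tool2v[12]://p' "$1"
}

# Usage: tool2_values file1
```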
The question has already been answered, but you can also use pure bash to achieve the desired result:
#!/usr/bin/env bash
# Print the value after the colon for every line starting with tool2v
while IFS= read -r line; do
    if [[ "$line" =~ ^tool2v ]]; then
        echo "${line#*:}"
    fi
done < ./file1
The while loop reads every line of file1; =~ does a regex match to check whether $line starts with tool2v, and ${line#*:} then strips everything up to and including the first :.

AWK--Print From End of Line till string is found

Using awk or sed, how would one print from the end of a line until (the first instance of) a string was found. For instance, if flow were the string then flow.com would be parsed from www.stackoverflow.com and similarly for www.flow.stackoverflow.com
sed is an excellent tool for simple substitutions on a single line:
sed 's/.*\(flow\)/\1/' file
try this line if it works for you:
awk -F'flow' 'NF>1{print FS$NF}' file
alternative one-liner:
awk 'sub(/.*flow/,"flow")' file
test (I added some numbers to the end of each line so that we can tell where the output came from):
kent$ cat f
www.stackoverflow.com1
and similarly for 2
www.flow.stackoverflow.com3
kent$ awk -F'flow' 'NF>1{print FS$NF}' f
flow.com1
flow.com3
kent$ awk 'sub(/.*flow/,"flow")' f
flow.com1
flow.com3
Note that if the string contains characters with special meaning in regexes, like *, |, [ ..., you may need to escape them.
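One way to sidestep the escaping issue is to treat the search string literally with shell parameter expansion. A rough sketch (from_last is a made-up helper name): quoting "$1" inside the pattern makes the shell match it character for character, and ## removes the longest prefix, so the last occurrence wins.

```shell
# Hypothetical helper: print a line from the last occurrence of a literal
# string to the end; print nothing if the string does not occur.
# $1 = literal string, $2 = line
from_last() {
    case $2 in
        *"$1"*) printf '%s%s\n' "$1" "${2##*"$1"}" ;;
    esac
}

from_last flow www.flow.stackoverflow.com
```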
GNU grep can do it:
grep -oP 'flow(?!.*flow).*' <<END
www.stackoverflow.com
nothing here
www.flow.stackoverflow.com
END
flow.com
flow.com
That regular expression finds "flow" where, looking ahead, "flow" is not found, and then the rest of the line.
This would also work: simpler regex but more effort:
rev filename | grep -oP '^.*?wolf' | rev

Loops with grep script

I'm asking this as a new question because people didn't seem to understand my original question.
I can figure out how to find if a word starts with a capital and is followed by 9 letters with the code:
echo "word" | grep -Eo '^[A-Z][[:alpha:]]{8}'
So that's part 1 of what I'm supposed to do. My actual script is supposed to loop through each word in a text file that is given as the first and only argument, then check if any of those words start with a capital and are 9 letters long.
I've tried:
cat textfile | grep -Eo '^[A-Z][[:alpha:]]{8}'
and
while read p
do echo $p | grep -Eo '^[A-Z][[:alpha:]]{8}'
done < $1
to no avail.
Although:
cat randomtext.txt
outputs:
The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha
so it's correctly outputting all the words in the file randomtext.txt
then why wouldn't
cat randomtext.txt | grep -Eo '^[A-Z][[:alpha:]]{8}'
work?
The problem is in the anchor. Your pattern starts with ^ which matches the beginning of a line, but the word you want to get returned is in the middle of a line. You can replace it with \b to match at a word boundary.
The words are all one after the other, but your grep expression refers to a whole row.
You ought to split the file into words:
sed -e 's/\s*\b\s*/\n/g' < file.txt | grep ...
Or maybe better, since you're only interested in alphanumeric sequences,
sed -e 's/\W\W*/\n/g' < file.txt | grep -E '^[A-Z][[:alpha:]]{8}$'
The $ (end of line) being made necessary because otherwise 'Supercalifragilisticexpialidocious' would match.
(I had changed {8} to {9} because you specified "and is followed by 9 letters", but then I saw you also state "and are 9 letters long")
By the way, if you use {8} and -o, you might be led into thinking a match is there where it isn't. "-o" means "only print the part matching my pattern".
So if you fed "Supercalifragilistic" to "^[A-Z][[:alpha:]]{8}", it would accept it as a match and print "Supercali". This is not what I think you asked.
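Putting the two points together (split into words first, then anchor both ends), a script taking the file as its argument could look roughly like this. It is a sketch, not the only way: tr stands in for the sed splitting step, nine_letter_caps is a made-up name, and [[:upper:]] is used in place of [A-Z] to avoid locale surprises.

```shell
# Hypothetical helper: print words that start with a capital letter and
# are exactly 9 letters long. $1 = text file
nine_letter_caps() {
    # Turn every run of non-letters into a single newline: one word per line
    tr -cs '[:alpha:]' '\n' < "$1" |
        grep -E '^[[:upper:]][[:alpha:]]{8}$'
}

# Usage: nine_letter_caps randomtext.txt
```

Because of the trailing $, a longer word such as Supercalifragilistic is rejected instead of being reported as Supercali.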
If you cat the file, each whole line is fed to grep at once. You should split the line into words before feeding them to grep.
You could try:
cat randomtext | awk '{ for(i=1; i <= NF; i++) {print $i } }' | grep -Eo '^[A-Z][a-z]{8}'
You should do this :
$ cat file.txt
The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha
$ printf '%s\n' $(<file.txt) | grep -Eo '^[A-Z][[:alpha:]]{8}$'
Abcdefgha
If you want to work on the same source line, you need to remove the ^ character (means the beginning of the line) :
grep -Eo '\b[A-Z][[:alpha:]]{8}\b' file.txt
(added \b like choroba explains)

How to make GREP select only numeric values?

I use the df command in a bash script:
df . -B MB | tail -1 | awk {'print $4'} | grep .[0-9]*
This script returns:
99%
But I need only numbers (to make the next comparison).
If I use the grep regex without the dot:
df . -B MB | tail -1 | awk {'print $4'} | grep [0-9]*
I receive nothing.
How to fix?
If you try:
echo "99%" |grep -o '[0-9]*'
It returns:
99
Here are the details on how the -o (or --only-matching) flag works, from the grep manual page:
Print only the matched (non-empty) parts of matching lines, with each such part on a separate output line. Output lines use the same delimiters as input, and delimiters are null bytes if -z (--null-data) is also used (see Other Options).
grep will print any lines matching the pattern you provide. If you only want to print the part of the line that matches the pattern, you can pass the -o option:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Like this:
echo 'Here is a line mentioning 99% somewhere' | grep -oE '[0-9]+'
How about:
df . -B MB | tail -1 | awk {'print $4'} | cut -d'%' -f1
No need to use grep here. Try this:
df . -B MB | tail -1 | awk {'print substr($5, 1, length($5)-1)'}
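function getPercentUsed() {
    // exec() collects the command's output lines into $val;
    // system() would only hand back the exit status there.
    exec("df -h /dev/sda6 --output=pcent | grep -o '[0-9]*'", $val);
    return $val[0];
}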
Don't use more commands than necessary: leave out tail, grep and cut. You can do this with just (a simple) awk.
PS: specifying a block size and then printing only the percentage is a bit silly ;-) so leave out the "-B MB" as well.
df . |awk -F'[multiple field separators]' '$NF=="Last field must be exactly --> mounted partition" {print $(NF-number from last field)}'
in your case, use:
df . |awk -F'[ %]' '$NF=="/" {print $(NF-2)}'
output: 81
If you want to show the percent symbol, you can leave out the -F'[ %]'; the field to print then moves one position closer to the end:
df . |awk '$NF=="/" {print $(NF-1)}'
output: 81%
You can use Perl style regular expressions as well. A digit is just \d then.
grep -Po "\\d+" filename
-P Interpret PATTERNS as Perl-compatible regular expressions (PCREs).
-o Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
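Since the goal in the question is a numeric comparison, here is a hedged sketch of how the pieces could fit together in a script. The helper names are made up, the 90 threshold is just an example, and it assumes the POSIX output format of df -P, where the capacity is the 5th column of the 2nd line.

```shell
# Hypothetical helper: strip a trailing "%" so the value can be compared as an integer.
percent_to_number() {
    printf '%s\n' "${1%\%}"
}

# Hypothetical helper: capacity column of `df -P` for a path, for example "81%".
used_percent() {
    df -P "$1" | awk 'NR == 2 { print $5 }'
}

# Usage (90 is an arbitrary example threshold):
#   usage=$(percent_to_number "$(used_percent .)")
#   [ "$usage" -gt 90 ] && echo "disk nearly full: ${usage}%"
```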
