Grep a string in Linux - linux

I have a log file that records string like this: (1512424528.554:0:1880:1880). I need to grep out only the lines where the last value is over 1000 and print them. Thoughts?

Grep is not the most pragmatic tool for this task. As #WilliamPursell adviced use awk or other tools for this problem.
Solutions:
Using grep: You can solve it but not perfectly by looking for at least 4 digits starting with a digit 1 to 9 ([1-9][0-9]{3,}) between the last colon (.*: matching everything until the last colon) and the end of the line ($) using perl-regexp (-P flag). This solution is not perfect since it gives you result for greater than or equal to 1000.
grep -P '.*:[1-9][0-9]{3,}$' your_file
Using awk: You can simply compare the last field ($NF) to 1000 after setting the field delimiter with -F.
awk -F':' '$NF > 1000 { print $0 }' your_file
Using perl: You can lookup for the last digits between the last colon and the end of the line and capture those digits (\d+) into $1 that is compared to 1000.
perl -ne 'print if (/.*:(\d+)$/ and $1>1000)' your_file
UPDATE
Based on the comments of #vaettchen I added solutions to take into account the parenthesis if they are part of the logged strings.
grep:
grep -P '.*:[1-9][0-9]{3,}\)$' your_file
awk:
awk -F':' 'substr($NF, 0, length($NF)-1)*1 > 1000' your_file
perl:
perl -ne 'print if (/.*:(\d+)\)$/ and $1>1000)' your_file

Related

How to strip stdout before logging into file? [duplicate]

Without using sed or awk, only cut, how do I get the last field when the number of fields are unknown or change with every line?
You could try something like this:
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
Explanation
rev reverses "maps.google.com" to be moc.elgoog.spam
cut uses dot (ie '.') as the delimiter, and chooses the first field, which is moc
lastly, we reverse it again to get com
Use a parameter expansion. This is much more efficient than any kind of external command, cut (or grep) included.
data=foo,bar,baz,qux
last=${data##*,}
See BashFAQ #100 for an introduction to native string manipulation in bash.
It is not possible using just cut. Here is a way using grep:
grep -o '[^,]*$'
Replace the comma for other delimiters.
Explanation:
-o (--only-matching) only outputs the part of the input that matches the pattern (the default is to print the entire line if it contains a match).
[^,] is a character class that matches any character other than a comma.
* matches the preceding pattern zero or more time, so [^,]* matches zero or more non‑comma characters.
$ matches the end of the string.
Putting this together, the pattern matches zero or more non-comma characters at the end of the string.
When there are multiple possible matches, grep prefers the one that starts earliest. So the entire last field will be matched.
Full example:
If we have a file called data.csv containing
one,two,three
foo,bar
then grep -o '[^,]*$' < data.csv will output
three
bar
Without awk ?...
But it's so simple with awk:
echo 'maps.google.com' | awk -F. '{print $NF}'
AWK is a way more powerful tool to have in your pocket.
-F if for field separator
NF is the number of fields (also stands for the index of the last)
There are multiple ways. You may use this too.
echo "Your string here"| tr ' ' '\n' | tail -n1
> here
Obviously, the blank space input for tr command should be replaced with the delimiter you need.
This is the only solution possible for using nothing but cut:
echo "s.t.r.i.n.g." | cut -d'.' -f2-
[repeat_following_part_forever_or_until_out_of_memory:] | cut -d'.' -f2-
Using this solution, the number of fields can indeed be unknown and vary from time to time. However as line length must not exceed LINE_MAX characters or fields, including the new-line character, then an arbitrary number of fields can never be part as a real condition of this solution.
Yes, a very silly solution but the only one that meets the criterias I think.
If your input string doesn't contain forward slashes then you can use basename and a subshell:
$ basename "$(echo 'maps.google.com' | tr '.' '/')"
This doesn't use sed or awk but it also doesn't use cut either, so I'm not quite sure if it qualifies as an answer to the question as its worded.
This doesn't work well if processing input strings that can contain forward slashes. A workaround for that situation would be to replace forward slash with some other character that you know isn't part of a valid input string. For example, the pipe (|) character is also not allowed in filenames, so this would work:
$ basename "$(echo 'maps.google.com/some/url/things' | tr '/' '|' | tr '.' '/')" | tr '|' '/'
the following implements A friend's suggestion
#!/bin/bash
rcut(){
nu="$( echo $1 | cut -d"$DELIM" -f 2- )"
if [ "$nu" != "$1" ]
then
rcut "$nu"
else
echo "$nu"
fi
}
$ export DELIM=.
$ rcut a.b.c.d
d
An alternative using perl would be:
perl -pe 's/(.*) (.*)$/$2/' file
where you may change \t for whichever the delimiter of file is
It is better to use awk while working with tabular data. You don't have to master on command. If it can be achieved by awk, why not use that? I suggest you do not waste your precious time, and use a handful of commands to get the job done.
Example:
# $NF refers to the last column in awk
ll | awk '{print $NF}'
If you have a file named filelist.txt that is a list paths such as the following:
c:/dir1/dir2/file1.h
c:/dir1/dir2/dir3/file2.h
then you can do this:
rev filelist.txt | cut -d"/" -f1 | rev
Adding an approach to this old question just for the fun of it:
$ cat input.file # file containing input that needs to be processed
a;b;c;d;e
1;2;3;4;5
no delimiter here
124;adsf;15454
foo;bar;is;null;info
$ cat tmp.sh # showing off the script to do the job
#!/bin/bash
delim=';'
while read -r line; do
while [[ "$line" =~ "$delim" ]]; do
line=$(cut -d"$delim" -f 2- <<<"$line")
done
echo "$line"
done < input.file
$ ./tmp.sh # output of above script/processed input file
e
5
no delimiter here
15454
info
Besides bash, only cut is used.
Well, and echo, I guess.
choose -1
choose supports negative indexing (the syntax is similar to Python's slices).
I realized if we just ensure a trailing delimiter exists, it works. So in my case I have comma and whitespace delimiters. I add a space at the end;
$ ans="a, b"
$ ans+=" "; echo ${ans} | tr ',' ' ' | tr -s ' ' | cut -d' ' -f2
b

How To Substitute Piped Output of Awk Command With Variable

I'm trying to take a column and pipe it through an echo command. If possible, I would like to keep it in one line or do this as efficiently as possible. While researching, I found that I have to use single quotes to expand the variable and to escape the double quotes.
Here's what I was trying:
awk -F ',' '{print $2}' file1.txt | while read line; do echo "<href=\"'${i}'\">'${i}'</a>"; done
But, I keep getting the number of lines than the single line's output. If you know how to caputure each line in field 4, that would be so helpful.
File1.txt:
Hello,http://example1.com
Hello,http://example2.com
Hello,http://example3.com
Desired output:
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>
$ awk -F, '{printf "<href=\"%s\">%s</a>\n", $2, $2}' file
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>
Or slightly briefer but less robustly:
$ sed 's/.*,\(.*\)/<href="\1">\1<\/a>/' file
<href="http://example1.com">http://example1.com</a>
<href="http://example2.com">http://example2.com</a>
<href="http://example3.com">http://example3.com</a>

How to find the last field using 'cut'

Without using sed or awk, only cut, how do I get the last field when the number of fields are unknown or change with every line?
You could try something like this:
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
Explanation
rev reverses "maps.google.com" to be moc.elgoog.spam
cut uses dot (ie '.') as the delimiter, and chooses the first field, which is moc
lastly, we reverse it again to get com
Use a parameter expansion. This is much more efficient than any kind of external command, cut (or grep) included.
data=foo,bar,baz,qux
last=${data##*,}
See BashFAQ #100 for an introduction to native string manipulation in bash.
It is not possible using just cut. Here is a way using grep:
grep -o '[^,]*$'
Replace the comma for other delimiters.
Explanation:
-o (--only-matching) only outputs the part of the input that matches the pattern (the default is to print the entire line if it contains a match).
[^,] is a character class that matches any character other than a comma.
* matches the preceding pattern zero or more time, so [^,]* matches zero or more non‑comma characters.
$ matches the end of the string.
Putting this together, the pattern matches zero or more non-comma characters at the end of the string.
When there are multiple possible matches, grep prefers the one that starts earliest. So the entire last field will be matched.
Full example:
If we have a file called data.csv containing
one,two,three
foo,bar
then grep -o '[^,]*$' < data.csv will output
three
bar
Without awk ?...
But it's so simple with awk:
echo 'maps.google.com' | awk -F. '{print $NF}'
AWK is a way more powerful tool to have in your pocket.
-F if for field separator
NF is the number of fields (also stands for the index of the last)
There are multiple ways. You may use this too.
echo "Your string here"| tr ' ' '\n' | tail -n1
> here
Obviously, the blank space input for tr command should be replaced with the delimiter you need.
This is the only solution possible for using nothing but cut:
echo "s.t.r.i.n.g." | cut -d'.' -f2-
[repeat_following_part_forever_or_until_out_of_memory:] | cut -d'.' -f2-
Using this solution, the number of fields can indeed be unknown and vary from time to time. However as line length must not exceed LINE_MAX characters or fields, including the new-line character, then an arbitrary number of fields can never be part as a real condition of this solution.
Yes, a very silly solution but the only one that meets the criterias I think.
If your input string doesn't contain forward slashes then you can use basename and a subshell:
$ basename "$(echo 'maps.google.com' | tr '.' '/')"
This doesn't use sed or awk but it also doesn't use cut either, so I'm not quite sure if it qualifies as an answer to the question as its worded.
This doesn't work well if processing input strings that can contain forward slashes. A workaround for that situation would be to replace forward slash with some other character that you know isn't part of a valid input string. For example, the pipe (|) character is also not allowed in filenames, so this would work:
$ basename "$(echo 'maps.google.com/some/url/things' | tr '/' '|' | tr '.' '/')" | tr '|' '/'
the following implements A friend's suggestion
#!/bin/bash
rcut(){
nu="$( echo $1 | cut -d"$DELIM" -f 2- )"
if [ "$nu" != "$1" ]
then
rcut "$nu"
else
echo "$nu"
fi
}
$ export DELIM=.
$ rcut a.b.c.d
d
An alternative using perl would be:
perl -pe 's/(.*) (.*)$/$2/' file
where you may change \t for whichever the delimiter of file is
It is better to use awk while working with tabular data. You don't have to master on command. If it can be achieved by awk, why not use that? I suggest you do not waste your precious time, and use a handful of commands to get the job done.
Example:
# $NF refers to the last column in awk
ll | awk '{print $NF}'
If you have a file named filelist.txt that is a list paths such as the following:
c:/dir1/dir2/file1.h
c:/dir1/dir2/dir3/file2.h
then you can do this:
rev filelist.txt | cut -d"/" -f1 | rev
Adding an approach to this old question just for the fun of it:
$ cat input.file # file containing input that needs to be processed
a;b;c;d;e
1;2;3;4;5
no delimiter here
124;adsf;15454
foo;bar;is;null;info
$ cat tmp.sh # showing off the script to do the job
#!/bin/bash
delim=';'
while read -r line; do
while [[ "$line" =~ "$delim" ]]; do
line=$(cut -d"$delim" -f 2- <<<"$line")
done
echo "$line"
done < input.file
$ ./tmp.sh # output of above script/processed input file
e
5
no delimiter here
15454
info
Besides bash, only cut is used.
Well, and echo, I guess.
choose -1
choose supports negative indexing (the syntax is similar to Python's slices).
I realized if we just ensure a trailing delimiter exists, it works. So in my case I have comma and whitespace delimiters. I add a space at the end;
$ ans="a, b"
$ ans+=" "; echo ${ans} | tr ',' ' ' | tr -s ' ' | cut -d' ' -f2
b

How to check if the sed command replaced some string? [duplicate]

This question already has answers here:
How to check if sed has changed a file
(11 answers)
Closed 7 years ago.
This command replaces the old string with the new one if the one exists.
sed "s/$OLD/$NEW/g" "$source_filename" > $dest_filename
How can I check if the replacement happened ? (or how many times happened ?)
sed is not the right tool if you need to count the substitution, awk will fit better your needs :
awk -v OLD=foo -v NEW=bar '
($0 ~ OLD) {gsub(OLD, NEW); count++}1
END{print count " substitutions occured."}
' "$source_filename"
This latest solution counts only the number of lines substituted. The next snippet counts all substitutions with perl. This one has the advantage to be clearer than awk and we keep the syntax of sed substitution :
OLD=foo NEW=bar perl -pe '
$count += s/$ENV{OLD}/$ENV{NEW}/g;
END{print "$count substitutions occured.\n"}
' "$source_filename"
Edit
Thanks to william who had found the $count += s///g trick to count the number of substitutions (even or not on the same line)
This awk should count the total number of substitutions instead of the number of lines where substitutions took place:
awk 'END{print t, "substitutions"} {t+=gsub(old,new)}1' old="foo" new="bar" file
If it is free for you to choose other tool, like awk, (as #sputnick suggested), go with other tools. Awk could count how many times the pattern matched.
sed itself cannot count replacement, particularly if you use /g flag. however if you want to stick to sed and know the replacement times there is possibilities:
One way is
grep -o 'pattern'|wc -l file && sed 's/pattern/rep/g' oldfile > newfile
you could also do it with tee
cat file|tee >(grep -o 'pattern'|wc -l)|(sed 's/pattern/replace/g' >newfile)
see this small example:
kent$ cat file
abababababa
aaaaaa
xaxaxa
kent$ cat file|tee >(grep -o 'a'|wc -l)|(sed 's/a/-/g' >newfile)
15
kent$ cat newfile
-b-b-b-b-b-
------
x-x-x-
this worked for me.
awk -v s="OLD" -v c="NEW" '{count+=gsub(s,c); }1
END{print count "numbers"}
' opfilename

How to reverse order of fields using AWK?

I have a file with the following layout:
123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010
How can I convert it into the following by using AWK?
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2009-12-01
Didn't read the question properly the first time. You need a field separator that can be either a dash or a comma. Once you have that you can use the dash as an output field separator (as it's the most common) and fake the comma using concatenation:
awk -F',|-' 'OFS="-" {print $1 "," $4,$3,$2}' file
Pure awk
awk -F"," '{ n=split($2,b,"-");$2=b[3]"-"b[2]"-"b[1];$i=$1","$2 } 1' file
sed
sed -r 's/(^.[^,]*,)([0-9]{2})-([0-9]{2})-([0-9]{4})/\1\4-\3-\2/' file
sed 's/\(^.[^,]*,\)\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9]\+\)/\1\4-\3-\2/' file
Bash
#!/bin/bash
while IFS="," read -r a b
do
IFS="-"
set -- $b
echo "$a,$3-$2-$1"
done <"file"
Unfortunately, I think standard awk only allows one field separator character so you'll have to pre-process the data. You can do this with tr but if you really want an awk-only solution, use:
pax> echo '123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010' | awk -F, '{print $1"-"$2}' | awk -F- '{print $1","$4"-"$3"-"$2}'
This outputs:
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
as desired.
The first awk changes the , characters to - so that you have four fields separated with the same character (this is the bit I'd usually use tr ',' '-' for).
The second awk prints them out in the order you specified, correcting the field separators at the same time.
If you're using an awk implementation that allows multiple FS characters, you can use something like:
gawk -F ',|-' '{print $1","$4"-"$3"-"$2}'
If it doesn't need to be awk, you could use Perl too:
$ perl -nle 'print "$1,$4-$3-$2" while (/(\d{3}),(\d{2})-(\d{2})-(\d{4})\s*/g)' < file.txt

Resources