How to search for string including digits by grep command - linux

I have strings in a file in the format below:
fixedstring_1
fixedstring_23
fixedstring_456
...
fixedstring_[1 to n digits]
I tried grep -E "fixedstring_[.....n times]" filepath in a terminal, but it failed.
I want commands that print the count (-c) and list the matching lines.

If I understand correctly, given the following file...
fixedstring_1
bar
fixedstring_456
foo
fixedstring_45622
fixedstring_
fixedstring
You want to match (and get the count of) only these lines:
fixedstring_1
fixedstring_456
fixedstring_45622
This should work:
grep -Ec 'fixedstring_[[:digit:]]+' filename
The [[:digit:]]+ part matches 1 or more digits. More on grep regexes here: http://www.gnu.org/savannah-checkouts/gnu/grep/manual/grep.html#Regular-Expressions
EDIT:
If you want to match strings with only a certain number of digits, you'll have to get a little more clever:
grep -E 'fixedstring_[[:digit:]]{MIN,MAX}([^[:digit:]]|$)' filename
Replace the MIN with the minimum number of digits you want to match, and MAX with the max.
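As a quick check of both patterns, here is a minimal sketch that runs them against the sample lines from this answer. The temporary file and the values MIN=2 and MAX=3 are assumptions chosen for illustration only.

```shell
# Build a throwaway sample file (the mktemp path is illustrative, not from the post).
sample=$(mktemp)
printf '%s\n' fixedstring_1 bar fixedstring_456 foo fixedstring_45622 \
    fixedstring_ fixedstring > "$sample"

# Count the lines that contain fixedstring_ followed by at least one digit.
count=$(grep -Ec 'fixedstring_[[:digit:]]+' "$sample")
echo "$count"

# List only the lines whose digit run is 2 to 3 digits long (MIN=2, MAX=3).
bounded=$(grep -E 'fixedstring_[[:digit:]]{2,3}([^[:digit:]]|$)' "$sample")
echo "$bounded"

rm -f "$sample"
```

With this sample the count is 3, and the bounded pattern lists only fixedstring_456, because fixedstring_1 has too few digits and fixedstring_45622 has too many.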

Related

Show rows of a file which have a regular expression more than 'n' number of times

I have a file, abc.txt, in the format below:
a:,b:,c:,d:,e:,f:,g:
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
a:0;b:,c:2,d:1,e:,f:,g:
Now in unix, I want to get only those rows where the regular expression :[0-9] (a colon followed by a digit) occurs more than 2 times.
In other words, show the rows where at least 3 attributes have numerical values.
The output should be only the 2nd and 3rd rows:
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
With basic grep:
grep '\(:[[:digit:]].*\)\{3,\}' file
:[[:digit:]].* matches a colon followed by a digit and zero or more arbitrary characters. This expression is put into a subpattern: \(...\). The expression \{3,\} means that the previous expression has to occur 3 or more times.
With extended POSIX regular expressions this can be written a little more simply, without the need to escape ( and {:
grep -E '(:[[:digit:]].*){3,}' file
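To see what the extended pattern selects, here is a small sketch that runs it on the four rows from the question. The temporary file is an assumption for illustration. Note that the fourth row also carries three numeric values (a:0, c:2, d:1), so it matches as well, even though the question lists only the 2nd and 3rd rows.

```shell
# Recreate the question's sample in a throwaway file (path is illustrative).
sample=$(mktemp)
printf '%s\n' \
    'a:,b:,c:,d:,e:,f:,g:' \
    'a:0;b:,c:3,d:,e:,f:,g:1' \
    'a:9,b:8,c:6,d:5,e:2,f:,g:' \
    'a:0;b:,c:2,d:1,e:,f:,g:' > "$sample"

# Print rows in which ":digit" occurs at least three times.
matched=$(grep -E '(:[[:digit:]].*){3,}' "$sample")
echo "$matched"

# Count the same rows with -c.
matched_count=$(grep -Ec '(:[[:digit:]].*){3,}' "$sample")
echo "$matched_count"

rm -f "$sample"
```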
$ awk -F':[0-9]' 'NF>3' file
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
a:0;b:,c:2,d:1,e:,f:,g:
perl -nE '/:[0-9](?{$count++})(?!)/; print if $count > 2; $count=0' input
perl -ne 'print if /(.*?\:\d.*?){2,}/' yourfile
This matches rows that contain character:number two or more times.
https://regex101.com/r/tRWtbY/1

Get Text after word at specific position

I have a file like this:
TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word
I want output like this
TT;12-11-18;text;abc;def;word
TA;12-11-18;teet abc;def;word
I want to keep a line only if word occurs at position 5, counting from the date 12-11-18 as position 1. I do not want lines where it appears later, at the 6th or 7th position.
I tried this command:
cat file.txt|grep "word" -n1
This prints every line in which the pattern word matches. How should I solve my problem?
Try this (GNU awk):
awk -F"[; ]" '/12-11-18/ && $6=="word"' file
Or a sed version:
sed -n '/12-11-18;\([^; ]*[; ]\)\{3\}word/p' file
Or grep with basically the same regex (different escaping):
grep -E "12-11-18;([^; ]*[; ]){3}word" file
[^; ] means any character that's not ; or a space.
* means zero or more repetitions of the preceding character or group.
-- [^; ]* therefore means a string of any length that contains neither ; nor a space; the ^ in [^; ] negates the set.
[; ] means exactly one occurrence of either ; or a space.
() groups the pieces above together.
{3} matches three repetitions of the preceding character or group.
As a whole, ([^; ]*[; ]){3} means three fields separated by ; or a space, delimiters included.
As @kvantour points out, these commands can give wrong results if several spaces may occur in a row.
To treat multiple spaces as one separator, use:
awk -F"(;| +)" '/12-11-18/ && $6=="word"'
and
grep -E "12-11-18;([^; ]*(;| +)){3}word"
or GNU sed (posix/bsd/osx sed does not support |):
sed -rn '/12-11-18;([^; ]*(;| +)){3}word/p'
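The sketch below runs the first grep and awk commands from this answer against the question's four sample lines, to confirm that both keep only the lines where word is the fifth field counted from the date. The temporary file is an assumption for illustration.

```shell
# Recreate the question's sample in a throwaway file (path is illustrative).
sample=$(mktemp)
printf '%s\n' \
    'TT;12-11-18;text;abc;def;word' \
    'AA;12-11-18;tee;abc;def;gih;word' \
    'TA;12-11-18;teet abc;def;word' \
    'TT;12-11-18;tdd;abc;def;gih;jkl;word' > "$sample"

# Regex approach: exactly three delimited fields between the date and "word".
with_grep=$(grep -E '12-11-18;([^; ]*[; ]){3}word' "$sample")
echo "$with_grep"

# Field approach: split on ; or space and test the sixth field.
with_awk=$(awk -F'[; ]' '/12-11-18/ && $6 == "word"' "$sample")
echo "$with_awk"

rm -f "$sample"
```

Both commands print the first and third sample lines, which is the output the question asks for.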

Print only previous line after using GREP command [duplicate]

I want to search for a particular string in a certain file. Once found I only want to print the previous line of the greped line but not the line obtained using grep command.
cmd : grep -B 1 line5
Ex:
lin1 with some text
lin2
lin3
lin4
lin5 with some text
Output will be
lin4
lin5 with some text
But is there any solution where I can obtain only lin4 but not lin5.
Thanks for the help in advance.
Display N lines after match
-A is the option which prints the specified N lines after the match as shown below.
Syntax:
grep -A <N> "string" FILENAME
Display N lines before match
-B is the option, which prints the specified N lines before the match.
Syntax:
grep -B <N> "string" FILENAME
Display N lines around match
-C is the option that prints the specified N lines on both sides of the match (before and after). Use it when you want to see the match together with its surrounding context.
$ grep -C 2 "Example" FILENAME
Just do this:
grep -B 1 line5 FILENAME | head -n 1
to get the first line of the output. This assumes the pattern matches only once in the file.
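Here is a minimal sketch of that pipeline on the question's sample, using the pattern lin5 because that is how the sample lines are spelled. The awk variant is an alternative that does not come from the answers above: it remembers the previous line and prints it whenever the current line matches, so it should also cope with several matches. The temporary file is an assumption.

```shell
# Recreate the question's sample in a throwaway file (path is illustrative).
sample=$(mktemp)
printf '%s\n' 'lin1 with some text' lin2 lin3 lin4 'lin5 with some text' > "$sample"

# Single match: -B 1 prints the previous line and the match; keep the first.
before=$(grep -B 1 'lin5' "$sample" | head -n 1)
echo "$before"

# Alternative: store each line in prev and print prev when the next line matches.
before_awk=$(awk '/lin5/ && NR > 1 { print prev } { prev = $0 }' "$sample")
echo "$before_awk"

rm -f "$sample"
```

Both variants print lin4 for this sample.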

How do I count the number of occurrences of a string in an entire file?

Is there an inbuilt command to do this or has anyone had any luck with a script that does it?
I am looking to count the number of times a certain string (not word) appears in a file. This can include multiple occurrences per line so the count should count every occurrence not just count 1 for lines that have the string 2 or more times.
For example, with this sample file:
blah(*)wasp( *)jkdjs(*)kdfks(l*)ffks(dl
flksj(*)gjkd(*
)jfhk(*)fj (*) ks)(*gfjk(*)
If I am looking to count the occurrences of the string (*) I would expect the count to be 6, i.e. 2 from the first line, 1 from the second line and 3 from the third line. Note how the one across lines 2-3 does not count because there is a LF character separating them.
Update: great responses so far! Can I ask that the script handle the conversion of (*) to \(*\), etc? That way I could just pass any desired string as an input parameter without worrying about what conversion needs to be done to it so it appears in the correct format.
You can use basic tools such as grep and wc:
grep -o '(\*)' input.txt | wc -l
Using perl's "Eskimo kiss" operator with the -n switch to print a total at the end. Use \Q...\E to ignore any meta characters.
perl -lnwe '$a+=()=/\Q(*)/g; }{ print $a;' file.txt
Script:
use strict;
use warnings;
my $count;
my $text = shift;
while (<>) {
$count += () = /\Q$text/g;
}
print "$count\n";
Usage:
perl script.pl "(*)" file.txt
This loops over the lines of the file, and on each line finds all occurrences of the string "(*)". Each time that string is found, $c is incremented. When there are no more lines to loop over, the value of $c is printed.
perl -ne'$c++ while /\(\*\)/g;END{print"$c\n"}' filename.txt
Update: Regarding your comment asking that this be converted into a solution that accepts a regex as an argument, you might do it like this:
perl -ne'BEGIN{$re=shift;}$c++ while /\Q$re/g;END{print"$c\n"}' 'regex' filename.txt
That ought to do the trick. If I felt inclined to skim through perlrun again I might see a more elegant solution, but this should work.
You could also eliminate the explicit inner while loop in favor of an implicit one by providing list context to the regexp:
perl -ne'BEGIN{$re=shift}$c+=()=/\Q$re/g;END{print"$c\n"}' 'regex' filename.txt
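You can use the basic grep command.
Example: to count the lines in a file that contain the word "hello":
grep -c "hello" filename
To count the lines that match a pattern (-P enables Perl-compatible regular expressions in GNU grep):
grep -c -P "Your Pattern" filename
Pattern examples: hell.w, \d+, etc.
Note that -c counts matching lines, not individual occurrences, so a line with several matches is counted once.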
I used the command below to count a particular string in a file (like grep -c, it counts matching lines):
grep search_String fileName|wc -l
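The difference between counting lines and counting occurrences matters for this question, so here is a short sketch that compares the two on the question's sample. The temporary file is an assumption for illustration.

```shell
# Recreate the question's three sample lines (path is illustrative).
sample=$(mktemp)
printf '%s\n' \
    'blah(*)wasp( *)jkdjs(*)kdfks(l*)ffks(dl' \
    'flksj(*)gjkd(*' \
    ')jfhk(*)fj (*) ks)(*gfjk(*)' > "$sample"

# -c counts the lines that contain at least one match.
line_count=$(grep -c '(\*)' "$sample")
echo "lines: $line_count"

# -o prints every match on its own line, so wc -l counts occurrences.
occurrence_count=$(grep -o '(\*)' "$sample" | wc -l)
echo "occurrences: $occurrence_count"

rm -f "$sample"
```

For this sample the line count is 3, while the occurrence count is 6, which is the figure the question expects.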
text="(\*)"
grep -o "$text" file | wc -l
You can make it into a script which accepts arguments like this:
script count:
#!/bin/bash
text="$1"
file="$2"
grep -o "$text" "$file" | wc -l
Usage:
./count "(\*)" file_path
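The question's update asks to pass the search string without escaping it by hand. One possible approach, not taken from the answers above, is grep's -F option, which treats the pattern as a fixed string. The function name count_occurrences and the temporary file are hypothetical, introduced only for this sketch.

```shell
# count_occurrences is a hypothetical helper, not part of the original script.
count_occurrences() {
    # -F: fixed string, no regex escaping needed
    # -o: print each match on its own line
    # --: stop option parsing in case the text starts with a dash
    grep -oF -- "$1" "$2" | wc -l
}

# Recreate the question's three sample lines (path is illustrative).
sample=$(mktemp)
printf '%s\n' \
    'blah(*)wasp( *)jkdjs(*)kdfks(l*)ffks(dl' \
    'flksj(*)gjkd(*' \
    ')jfhk(*)fj (*) ks)(*gfjk(*)' > "$sample"

# The search text is passed exactly as typed, with no backslashes.
total=$(count_occurrences '(*)' "$sample")
echo "$total"

rm -f "$sample"
```

On the sample this yields 6, the same result as the escaped pattern.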

elif conditional statement not working

I have this file as:
The number is %d0The number is %d1The number is %d2The number is %d3The number is %d4The number is %d5The number is %d6The...
The number is %d67The number is %d68The number is %d69The number is %d70The number is %d71The number is %d72The....
The number is %d117The number is %d118The number is %d119The number is %d120The number is %d121The number is %d122
I want to pad it like:
The number is %d0 The number is %d1 The number is %d2 The number is %d3 The number is %d4 The number is %d5 The number is %d6
The number is %d63 The number is %d64 The number is %d65 The number is %d66 The number is %d67 The number is %d68 The number is %d69
d118The number is %d119The number is %d120The number is %d121The number is %d122The number is %d123The number is %d124The
Please tell me how to do it through shell script
I am working on Linux
Edit:
This single command pipeline should do what you want:
sed 's/\(d[0-9]\+\)/\1   /g;s/\(d[0-9 ]\{3\}\) */\1/g' test2.txt >test3.txt
#                     ^^^ three spaces here
Explanation:
For each sequence of digits following a "d", add three spaces after it. (I'll use "X" to represent spaces.)
d1 becomes d1XXX
d10 becomes d10XXX
d100 becomes d100XXX
Now (the part after the semicolon), capture every "d" plus the next three characters, which must be digits or spaces, and output them without any spaces beyond.
d1XXX becomes d1XX
d10XXX becomes d10X
d100XXX becomes d100
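The two substitutions can be tried on a short string before running them on the whole file. This sketch assumes GNU sed (for \+) and uses a made-up input line with one, two and three digit numbers. The first substitution appends three spaces, as described above.

```shell
# A made-up input line covering one, two and three digit numbers.
input='The number is %d0The number is %d10The number is %d100The'

# Step 1 appends three spaces after each d<digits>.
# Step 2 keeps "d" plus three characters and drops any further spaces.
padded=$(printf '%s\n' "$input" |
    sed 's/\(d[0-9]\+\)/\1   /g;s/\(d[0-9 ]\{3\}\) */\1/g')
echo "$padded"
```

Every %d token ends up occupying the same width: %d0 is followed by two spaces, %d10 by one, and %d100 by none.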
If you want to wrap the lines as you seem to show in your sample data, then do this instead:
sed 's/\(d[0-9]\+\)/\1   /g;s/\(d[0-9 ]\{3\}\) */\1/g' test2.txt | fold -w 133 >test3.txt
You may need to adjust the argument of the fold command to make it come out right.
There's no need for if, grep, loops, etc.
Original answer:
First of all, you really need to say which shell you're using, but since you have elif and fi, I'm assuming it's Bourne-derived.
Based on that assumption, your script makes no sense.
The parentheses for the if and elif are unnecessary. In this context, they create a subshell which serves no purpose.
The sed commands in the if and elif say "if the pattern is found, copy hold space (it's empty, by the way) to pattern space and output it, and output all other lines."
The first sed command will always be true so the elif will never be executed. sed always returns true unless there's an error.
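The exit-status point can be checked directly. This is a small sketch with a made-up input line that contains no match: sed still reports success, whereas grep -q reports failure, which is why grep suits an if condition.

```shell
# sed exits 0 whether or not the pattern matched anything.
sed_status=0
printf 'no match here\n' | sed -n '/d[0-9]/p' || sed_status=$?

# grep -q exits 0 only when at least one line matches, and 1 otherwise.
grep_status=0
printf 'no match here\n' | grep -q 'd[0-9]' || grep_status=$?

echo "sed: $sed_status, grep: $grep_status"
```

Here sed_status stays 0 while grep_status becomes 1.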
This may be what you intended:
if grep -Eqs 'd[0-9]([^0-9]|$)' test2.txt; then
sed 's/\(d[0-9]\)\([^0-9]\|$\)/\1 \2/g' test2.txt >test3.txt
elif grep -Eqs 'd[0-9][0-9]([^0-9]|$)' test2.txt; then
sed 's/\(d[0-9][0-9]\)\([^0-9]\|$\)/\1 \2/g' test2.txt >test3.txt
else
cat test2.txt >test3.txt
fi
But I wonder if all that could be replaced by something like this one-liner:
sed 's/\(d[0-9][0-9]\?\)\([^0-9]\|$\)/\1 \2/g' test2.txt >test3.txt
Since I don't know what test2.txt looks like, part of this is only guessing.

Resources