ubuntu terminal: grep for numbers compare [duplicate] - linux

This question already has answers here:
Is it possible to use egrep to match numbers within a range?
(2 answers)
Closed 6 years ago.
I have a text file containing a table with the columns |ID | NAME | CREDIT| and its content.
Is it possible to get all lines where CREDIT < 1337 (for example) using grep and ONLY grep, no awk or anything else?
I have no idea how, thanks.

You can do it with pure grep, but it's ugly. Here you are:
grep -e " [0-9]$" -e " [0-9][0-9]$" -e " [0-9][0-9][0-9]$" -e " 1[0-2][0-9][0-9]$" -e " 13[0-2][0-9]$" -e " 133[0-6]$"
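The same range matching can be written more compactly with extended regular expressions. This is only a sketch: the function name credit_under_1337 is made up for illustration, and it assumes the credit value is the last thing on the line, is preceded by a single space, and has no leading zeros.

```shell
# Hypothetical helper: print stdin lines whose trailing number is 0..1336.
# Assumes the number ends the line and follows a space.
credit_under_1337() {
  # 1-3 digits | 1000-1299 | 1300-1329 | 1330-1336
  grep -E ' ([0-9]{1,3}|1[0-2][0-9]{2}|13[0-2][0-9]|133[0-6])$'
}

# Usage: credit_under_1337 < inputFileName
```

Every change of the threshold means rewriting the alternation by hand, which is the main reason this approach scales poorly.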

This is a job very much unsuited to grep. As an artisan, you should select your tools carefully, no-one wants to try cutting down a giant Karri tree with a screwdriver :-)
It is almost certainly a job for awk. You haven't specified your content lines so let's assume for now they're of the form:
|iii|nnnnnnn|ccccc|
where the i, n and c sequences are the relevant column data.
To get those lines where the credit value is less than 1337, it's a simple matter to do:
awk -F'|' '$4 < 1337 {print}' inputFileName
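If the threshold needs to vary, the comparison can be wrapped in a small function. This is a sketch under the same assumed |iii|nnnnnnn|ccccc| layout; the name credit_below and the check that skips rows whose credit field is not a plain integer (such as a header row) are additions for illustration, not part of the answer above.

```shell
# Hypothetical helper: print rows whose 4th '|'-separated field is below $1.
# Rows whose credit field is not a plain non-negative integer are skipped.
credit_below() {
  awk -F'|' -v limit="$1" \
    '$4 ~ /^ *[0-9]+ *$/ && ($4 + 0) < (limit + 0)'
}

# Usage: credit_below 1337 < inputFileName
```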


Make grep to exact match strings with and without dash "-"

The problem looks simple and common, so I've looked through many answers, but it seems that none of them provides an appropriate general solution.
I need to grep a large tab-separated, 6-column file (a *.bed file, in fact) to split it by the content of the first column, using a list of string variables (items). I just need the rows starting with a given string.
I was successfully using
grep -w "$name" inputfile
($name is read from the list of strings)
for that purpose, until I hit strings of the following format (example): YAL038W but also YAL038W-A, YAL038W-B, ...
grep with the -w option treats YAL038W as matching YAL038W-A and YAL038W-B too, since "-" is a word separator. It would work with "_" but not with "-".
I've found solutions based on awk which are working fine, for example:
awk -F $'\t' -vsearch=$name '$1==search' inputfile
but awk is terribly slow, more than 8 times slower than grep; see the time measurements below.
For a 2.5 GB input file and > 5000 items to look for, the script has already been running for > 24 hours!
Example of inputfile:
YAL038W-A 0 48 HWI-1KL176:101:CC27NACXX:3:2208:17646:92047 0 +
YAL038W-A 0 48 HWI-1KL176:101:CC27NACXX:3:2211:17326:31268 0 +
YAL038W 1 50 HWI-1KL176:101:CC27NACXX:8:1205:16311:19319 3 +
YAL038W 1 27 HWI-1KL176:101:CC27NACXX:8:2103:4951:94527 42 +
time grep -w "YAL038W" inputfile > testfile.txt
real 0m3.569s
time awk -F $'\t' -vsearch="YAL038W" '$1==search' inputfile > testfile.txt
real 0m29.521s
I am looking for a FAST solution using grep or something else, and I need to pass the variable to this command in a loop.
An alternative is to modify the input file by replacing "-" with "_", but I consider that a last resort...
Thanks in advance
I've found solutions based on awk which are working fine, for example:
awk -F $'\t' -vsearch=$name '$1==search' inputfile
but awk is terribly slow…
I am looking for FAST solution using grep …
If the above awk command worked for you, then this will do:
grep "^$name"$'\t' inputfile
Just search at the beginning of each line for the name followed by a TAB.
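As a sketch of how this could be wrapped for use in a loop: the function name rows_for_name is hypothetical, and it assumes the names contain no regular expression metacharacters (a "-" is fine). The TAB is produced with printf so the function does not depend on Bash's $'\t' syntax.

```shell
# Hypothetical helper: print rows of a tab-separated file whose first
# column is exactly $1. Assumes $1 contains no regex metacharacters.
rows_for_name() {
  rfn_tab=$(printf '\t')          # a literal TAB character
  grep -- "^$1$rfn_tab" "$2"      # name anchored at line start, then TAB
}

# Usage inside a loop over names:
#   rows_for_name "$name" inputfile > "$name.bed"
```

Anchoring on the TAB is what keeps YAL038W from matching YAL038W-A, because the character after the name must be the column separator.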

Get text only within parenthesis from a file in linux terminal [duplicate]

This question already has an answer here:
How can I extract the content between two brackets?
(1 answer)
Closed 4 years ago.
I have a large log file I need to sort, I want to extract the text between parentheses. The format is something like this:
<#44541545451865156> (example#6144) has left the server!
How would I go about extracting "example#6144"?
This sed should work here:
sed -E -n 's/.*\((.*)\).*$/\1/p' file_name
There are many ways to skin this cat.
Assuming you always have only one lexeme in parentheses, you can use bash parameter expansion:
while IFS= read -r t; do t=${t#*(}; echo "${t%)*}"; done <logfile
The first substitution: ${t#*(} cuts off everything up to and including the left parenthesis, leaving you with example#6144) has left the server!; the second one: ${t%)*} cuts off the right parenthesis and everything after that.
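The two expansions can also be packaged as a function. This is a sketch only: the name extract_parens is made up, it returns the content of the first parenthesised group, and it prints nothing for lines without a matching pair of parentheses (that guard is an added assumption about the desired behaviour).

```shell
# Hypothetical helper: print the text inside the first (...) of $1.
extract_parens() {
  case $1 in
    *"("*")"*)
      ep_rest=${1#*"("}          # drop everything up to the first "("
      ep_out=${ep_rest%%")"*}    # drop the first ")" and what follows
      printf '%s\n' "$ep_out"
      ;;
  esac
}

# Usage:
#   while IFS= read -r line; do extract_parens "$line"; done < logfile
```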
Alternatively, you can also use awk:
awk -F'[)(]' '{print $2}' logfile
-F'[)(]' tells awk to use either parenthesis as the field delimiter, so it splits the input string into three tokens: <#44541545451865156>, example#6144, and has left the server!; then {print $2} instructs it to print the second token.
cut would also do:
cut -d'(' -f 2 logfile | cut -d')' -f 1
Try this:
sed -e 's/^.*(\([^()]*\)).*$/\1/' <logfile
The /^.*(\([^()]*\)).*$/ is a regular expression or regex. Regexes are hard to read until you get used to them, but are most useful for extracting text by pattern, as you are doing here.

Substring in linux based on first occurrence [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 5 years ago.
I have raw, unformatted strings like the ones below in a file.
"],"id":"1785695Jkc","vector":"profile","
"],"id":"jashj24231","vector":"profile","
"],"id":"3201298301","vector":"profile","
"],"id":"1123798749","vector":"profile","
I wanted to extract only the id values like below
1785695Jkc
I tried the below command
grep -o -P '(?<="],"id":").*(?=",")' myfile.txt >new.txt
but that matches up to the last occurrence of "," as shown below
1785695Jkc","vector":"profile
whereas I need to stop at the first occurrence only.
To extract only the id values like the ones above, which seem to be alphanumeric strings of length 10, use:
$ awk 'match($0,/[[:alnum:]]{10}/){print substr($0,RSTART,RLENGTH)}' file
1785695Jkc
jashj24231
3201298301
1123798749
If this definition of the values is not correct, please be more specific about the requirement.
Btw, changing your grep a bit also works:
$ grep -o -P '(?<="],"id":")[^"]*' myfile.txt
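The negated character class [^"]* is what stops the match at the first closing quote, and the same idea works in tools without PCRE support. A sketch with plain sed follows; the function name extract_ids is hypothetical, and it assumes each line holds a single "id":"..." pair.

```shell
# Hypothetical helper: print the value of "id":"..." for each stdin line.
# [^"]* cannot cross a double quote, so the capture ends at the first one.
extract_ids() {
  sed -n 's/.*"id":"\([^"]*\)".*/\1/p'
}

# Usage: extract_ids < myfile.txt > new.txt
```

Because of -n and the p flag, lines without an id pair produce no output instead of being echoed unchanged.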
sed 's/"],"id":"\(.*\)","vector.*/\1/' myfile.txt
that assumes that all lines will start with "],"id":" as your input shows.
Oh, and this was written for GNU sed using basic regular expressions, btw. If your sed runs with extended regular expressions (for example sed -E), drop the backslashes before the parentheses.
You can extract just the column you want using cut:
cut -f 2 -d , <filename> | cut -f 2 -d : | tr -d '"'
The first cut will take the id-value pair ("id": "jashj24231") and the second one extracts from that just the value ("jashj24231"). Finally tr removes the enclosing quotes.

I have a requirement of searching a pattern from a file and displaying the pattern only on the screen, not the whole line. How can I do it in linux? [duplicate]

This question already has answers here:
Can grep show only words that match search pattern?
(15 answers)
Closed 5 years ago.
I have a requirement of searching a pattern like x=<followed by any values> from a file and displaying the pattern i.e x=<followed by any values>, only in the screen, not the whole line. How can I do it in Linux?
I have 3 answers, from simple (but with caveats) to complex (but foolproof):
1) If your pattern never appears more than once per line, you could do this (assuming your shell is
PATTERN="x="
sed "s/.*\($PATTERN\).*/\1/g" your_file | grep "$PATTERN"
2) If your pattern can appear more than once per line, it's a bit harder. One easy but hacky way to do this is to use a special character that will not appear on any line that has your pattern, e.g. "#":
PATTERN="x="
SPECIAL="#"
grep "$PATTERN" your_file | sed "s/$PATTERN/$SPECIAL/g" \
| sed "s/[^$SPECIAL]//g" | sed "s/$SPECIAL/$PATTERN/g"
(This won't separate the output pattern per line, eg. you'll see x=x=x= if a source line had 3 times "x=", this is easy to fix by adding a space in the last sed)
3) Something that always works no matter what:
PATTERN="x="
awk "NF>1{for(i=1;i<NF;i++) printf FS; print \"\"}" \
FS="$PATTERN" your_file
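When the goal is to show x= together with its value, grep's -o option (only print the matched part) may be enough. This is a sketch with assumptions: the function name show_matches is made up, the value is taken to end at the next whitespace character, and -o is not in POSIX, although GNU and BSD grep both support it.

```shell
# Hypothetical helper: print only the parts of each line that match the
# regex in $1, one match per output line. Reads files given after $1,
# or stdin if none are given.
show_matches() {
  sm_pat=$1
  shift
  grep -o -- "$sm_pat" "$@"
}

# Usage: show_matches 'x=[^[:space:]]*' your_file
```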

Count the number of occurrences in a string. Linux

Okay so what I am trying to figure out is how do I count the number of periods in a string and then cut everything up to that point but minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods-2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way that I was thinking of doing it was sending the string to a text file and then just greping for the number of times like this:
grep -c "." infile
The reason I don't want to do that is because I want to avoid creating another text file for I do not have permission to do so. It would also be simpler for the code I am trying to build right now.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just want to remove the penultimate dot and everything after it, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
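If the count of dots is still wanted, it can be obtained with parameter expansion as well, without a temporary file. The following is a sketch; the function names count_dots and strip_last_two are hypothetical, and only the POSIX ${var#pattern} and ${var%pattern} forms are used, so it should also run in plain sh.

```shell
# Hypothetical helper: count the "." characters in $1.
count_dots() {
  cd_s=$1
  cd_n=0
  # Each pass removes the text up to and including the first remaining dot.
  while [ "${cd_s#*.}" != "$cd_s" ]; do
    cd_s=${cd_s#*.}
    cd_n=$((cd_n + 1))
  done
  echo "$cd_n"
}

# Hypothetical helper: drop the last two dot-separated items of $1.
strip_last_two() {
  printf '%s\n' "${1%.*.*}"
}

# Usage:
#   count_dots "aaa.bbb.ccc.ddd.google.com"
#   strip_last_two "aaa.bbb.ccc.ddd.google.com"
```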
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by a helpful comment from @steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5
