Unix awk pattern help, aka search not working

Newbie here, asking for help.
I need to list only the lines that are in a certain format:
<server>: (<#> dbs)<disk> <total>G <used>G <free>G <perc>% <mount>
wakanda: (5 dbs)/dev/sda1 12G 24G 12G 50% /
All lines that do not match this format should not be displayed.
So far I've tried to brute-force it with:
awk '$0~p' p=".*:*dbs*G*G*G*" ./testFile
But this also lists lines like:
server: (207 dbs)
I don't understand why it would straight up ignore the Gs. I've been googling for a considerable time now and I'm completely lost. Please help!

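Could you please try the following? It was written and tested with the shown samples only, in GNU awk.
awk '/.*dbs.*([0-9]+G ){3}/' Input_file
Explanation: awk works on a condition (regexp), then action model. Here I am checking the current line against the regex .*dbs.*([0-9]+G ){3}. If it matches, no action is given, so awk's default action of printing the line happens.
As for why your attempt ignored the Gs: in a regex, * is not a shell-style wildcard. It means "zero or more of the preceding character", so G* also matches zero Gs, and .*:*dbs*G*G*G* effectively only requires "db" somewhere in the line.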
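A stricter variant, as a sketch: it anchors the whole line to the format from the question, so lines with missing or extra fields are rejected. It makes three assumptions:
- fields are separated by single spaces, as in the sample;
- sizes are plain numbers (optionally with a decimal point) followed by G;
- the function name filter_disk_lines is only for illustration.

```shell
# Hypothetical helper: print only lines of the form
#   <server>: (<#> dbs)<disk> <total>G <used>G <free>G <perc>% <mount>
# Assumes single spaces between fields, as in the sample input.
# The three size fields are spelled out instead of using {3}, so awks
# without interval-expression support behave the same way.
filter_disk_lines() {
  awk '/^[^:]+: \([0-9]+ dbs\)[^ ]+ [0-9.]+G [0-9.]+G [0-9.]+G [0-9]+% [^ ]+$/' "$@"
}

# Usage: filter_disk_lines ./testFile
```

Anchoring with ^ and $ is what makes this stricter than searching for a substring: a line like server: (207 dbs) cannot satisfy the part after the closing parenthesis.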

Related

Using SED to replace capture group with regex pattern

I need some help with a sed command that I thought would help solve an issue I have. Basically, I have long text files that look something like this:
>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]
ACTCACGCCCACATAAT
The ACT text blocks continue on, and then there are more blocks of text that follow the same pattern, except that the text after the > differs slightly in its numbers. I want to replace only this header part (the part after the >) with everything up to and including the very last "_". The sed command I thought seemed logical is the following:
sed -i 's/>.*/TRINITY.*_/'
However, sed is literally changing each header to TRINITY.*_ rather than capturing the block I thought it would. Any help is appreciated!
Also, just to make things clear: I expected my sed command to convert the top header block into this:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
This might help:
sed '/^>/s/[^_]*$//' file
Output:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
See: The Stack Overflow Regular Expressions FAQ
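The title asks about capture groups, so for comparison here is a sketch of the same edit written with one. It uses plain BRE, so no -E is needed. The greedy .* inside \(...\) runs as far right as it can and then backs off to the last _, which is exactly the part to keep. The function name trim_header is only for illustration.

```shell
# Hypothetical helper: on header lines (starting with >), keep everything
# up to and including the last "_" and drop the rest of the line.
# \(>.*_\) is greedy, so the captured text ends at the LAST underscore.
trim_header() {
  sed '/^>/s/^\(>.*_\).*/\1/' "$@"
}

# Usage: trim_header file > trimmed_file
```

The two versions differ on a header that contains no underscore at all. There, [^_]*$ matches the whole line and empties it, whereas the capture-group version does not match and leaves the header unchanged.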

Delete Repeated Characters without back-referencing with SED

Let's say we have a file that contains:
HHEELLOO
HHYYPPOOTTHHEESSIISS
and we want to delete repeated characters. As far as I know, we can do this with:
s/\([A-Z]\)\1/\1/g
This is a homework problem and the professor said he wants us to try the exercises without back-referencing or extended regular expressions. Is that possible on this one? I would appreciate it if anyone could point me in the right direction, thanks!
The only reasonable way to do this is to use the right tool for the job, in this case tr:
$ tr -s 'A-Z' < file
HELO
HYPOTHESIS
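If you were going to use sed for that specific problem, though, then it'd just be the following. It simply keeps every other character, which works only because every character in this input is doubled: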
$ sed 's/\(.\)./\1/g' file
HELO
HYPOTHESIS
If that's not what you're looking for then edit your question to show more truly representative sample input and expected output.
Here's one way:
s/AA/A/g
s/BB/B/g
...
s/ZZ/Z/g
As a one-liner:
sed 's/AA/A/g; s/BB/B/g; ...'
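The elided substitutions do not have to be typed by hand. Below is a sketch that generates all 26 with a shell loop and passes them to sed as a single script, still with no capture groups or back-references. It assumes, as in the question, that the input is uppercase A-Z. The name dedup_letters is hypothetical.

```shell
# Hypothetical helper: collapse doubled uppercase letters using only plain
# substitutions (s/AA/A/g ... s/ZZ/Z/g), i.e. no \(...\) and no \1.
dedup_letters() {
  # Build the sed script, one substitution per line.
  script=$(for c in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z; do
    printf 's/%s%s/%s/g\n' "$c" "$c" "$c"
  done)
  sed "$script" "$@"
}

# Usage: dedup_letters file
```

A run of three identical letters, such as AAA, becomes AA rather than A, because a single s///g pass does not rescan its own output.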

How to write a script that takes two arguments, a lower bound and an upper bound, and displays the files whose sizes are in that range?

This script has to use the awk command. I am a newbie trying to figure this out, and any help would be greatly appreciated.
This is my attempt so far, but it doesn't seem to be working:
read $0
read $1
ls -l|awk '{/$0/, /$1/} {print $5 "\t" $9}'
In awk, rules consist of a pattern and an action that is executed if the pattern matches the record. Actions are enclosed in braces; patterns are not.
The range pattern (/start/, /end/) also does not work anything like you seem to think it does. It selects records starting from one that matches the first regex, up to the next one that matches the second.
Matching with a regular expression is not going to help much anyway, since you are interested in numerical values in a given range. If your range were 10 to 20, for example, 15 would not match the lower bound but would still be in range.
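A sketch of the numeric approach. It makes these assumptions:
- the two bounds are passed as positional arguments instead of via read, and are handed to awk with -v;
- the function name files_in_size_range and the optional third argument (a directory, default .) are only for illustration.

It keeps the question's ls -l | awk shape, which is fragile: $9 holds only the first word of a file name that contains spaces.

```shell
# Hypothetical helper: list regular files whose size (field 5 of ls -l)
# lies in the inclusive range [low, high].
# Usage: files_in_size_range LOW HIGH [DIR]
files_in_size_range() {
  ls -l -- "${3:-.}" | awk -v lo="$1" -v hi="$2" '
    # Only regular files: their mode string starts with "-".
    # This also skips the "total N" line that ls -l prints first.
    $1 ~ /^-/ && $5 + 0 >= lo + 0 && $5 + 0 <= hi + 0 { print $5 "\t" $9 }'
}
```

Adding 0 forces numeric comparison, so 15 falls inside 10..20 even though it does not "match" either bound as text.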

Using sed to print range when pattern is inside the range?

I have a log file full of queries, and I only want to see the queries that have an error. The log entries look something like:
path to file executing query
QUERY
SIZE: ...
ROWS: ...
MSG: ...
DURATION: ...
I want to print all of this, but only when MSG: contains something of interest (an error message). All I've got right now is sed -n '/^path to file/,/^DURATION/' and I have no idea where to go from here.
Note: Queries are often multiline, so using grep's -B sadly doesn't work all the time (this is what I've been doing thus far, just being generous with the -B value).
Ideally I'd like to use only sed, but if I absolutely must use something else, like awk, I guess that's fine.
Thanks!
You haven't said what an error message looks like, so I'll assume it contains the word "ERROR":
sed -n '/^MSG.*ERROR/{H;g;N;p;};/^DURATION/{s/.*//;h;d;};H' < logname
(I wish there were a tidier way to purge the hold space. Anyone?...)
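On the hold-space question, one tidier pattern is to overwrite the hold space with h on the first line of each entry, instead of clearing it at the end. This sketch assumes every entry starts with a "path to file" line and that MSG is followed by exactly one DURATION line. The name error_blocks is hypothetical.

Besides being shorter, this avoids a side effect in the command above, as I read it. After a matching entry, the DURATION line is consumed by N, so the clearing rule never fires. The next error entry would then be printed together with the previous one.

```shell
# Hypothetical helper: print whole log entries whose MSG line mentions ERROR.
#   /^path to file/{h;d;}  start of an entry: overwrite the hold space
#   H                      any other line: append it to the hold space
#   /^MSG.*ERROR/{g;N;p;}  error found: fetch the entry, append the
#                          DURATION line that follows, and print it
error_blocks() {
  sed -n '/^path to file/{h;d;};H;/^MSG.*ERROR/{g;N;p;}' "$@"
}

# Usage: error_blocks logname
```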
I could suggest a solution with grep. That will work if the structure in the log file is always the same as above (i.e. MSG is in the 5th line, and one line follows):
grep -i -B 4 -A 1 '^MSG:.*error' logfile
That means: if the word "error" occurs in a MSG line, output the block from 4 lines before the MSG line to one line after it.
Of course you have to adjust the regexp to recognize an error.
This will not work if the structure of those blocks differs.
Perhaps you can use the cgrep.sed script, as described in the book Unix Power Tools.

How do I grep for entire, possibly wrapped, lines of code?

When searching code for strings, I constantly run into the problem that I get meaningless, context-less results. For example, if a function call is split across 3 lines, and I search for the name of a parameter, I get the parameter on a line by itself and not the name of the function.
For example, in a file containing:
...
someFunctionCall ("test",
MY_CONSTANT,
(some *really) - long / expression);
Grepping for MY_CONSTANT would return a line that looks like this:
MY_CONSTANT,
Likewise, in a comment block:
/////////////////////////////////////////
// FIXMESOON, do..while is the wrong choice here, because
// it makes the wrong thing happen
/////////////////////////////////////////
Grepping for FIXMESOON gives the very frustrating answer:
// FIXMESOON, do..while is the wrong choice here, because
When there are thousands of hits, single-line results are a little meaningless. What I would like is for grep to be aware of the start and stop points of source code statements. Something as simple as having it treat ";" as the line separator would be a good start.
Bonus points if you can make it return the entire comment block if the hit is inside a comment.
I know you can't do this with grep alone. I am also aware of the option to have grep return a certain number of lines of context. Any suggestions on how to accomplish this under Linux? FYI, my preferred languages are C and Perl.
I'm sure I could write something, but I know that somebody must have already done this.
Thanks!
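You can use pcregrep with the -M option (multiline matching; pcregrep is grep with Perl-compatible regular expressions). In the pattern below, [^;] (unlike .) also matches newlines, so a match can run across lines up to the next ";". Something like:
pcregrep -M '[^;]*thingtosearchfor[^;]*;' file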
Here's an example using awk.
$ cat file
blah1
blah2
function1 ("test",
MY_CONSTANT,
(some *really) - long / expression);
function2( one , two )
blah3
blah4
$ awk -v RS=')' '/function1/{gsub(".*function1","function1"); print $0 RT}' file
function1 ("test",
MY_CONSTANT,
(some *really)
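The concept behind it: RS is the record separator. By setting it to ")", every record in your file is separated by ")" instead of a newline. This makes it easy to find your "function1", since you can then "grep" for it.
Note two things:
- The record ends at the first ")", which is why the output stops after (some *really).
- RT (the text that matched RS) is a GNU awk extension.

If you don't use awk, the same concept can be applied by splitting on ")".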
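Since the question suggests treating ";" as the separator, the same idea can be applied with RS=";". This avoids stopping at the first ")" and needs no gawk-only RT. In this sketch, grep_statements is a hypothetical name and the search term is treated as an awk regex. Text containing no ";" at all (such as the // comment block) simply ends up inside one large record.

```shell
# Hypothetical helper: print every ";"-terminated statement that matches
# a pattern, even when the statement is wrapped over several lines.
# Usage: grep_statements PATTERN FILE...
grep_statements() {
  pat=$1
  shift
  awk -v RS=';' -v pat="$pat" '
    $0 ~ pat {
      sub(/^[ \t\n]+/, "")   # drop whitespace left over from the previous line
      print $0 ";"
    }' "$@"
}
```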
You can write a command line using grep with the options that print the filename and the line number (-H and -n). Feed these results into awk to parse those columns, and then use a little script of your own to display the N lines surrounding each match. :)
If this isn't an academic endeavour, you could just use cscope (for C code only, though). If you are willing to drop the requirement to search in comments, ctags should be enough (and it also supports Perl).
I had a situation in which I had an XML file full of the names of zip files in an XML-style format, that is, with angle brackets around the names of the files, say <stuff>example.zip</stuff>.
I used awk to change all the angle brackets into newlines, then used grep. :)
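A sketch of that last approach, assuming the XML is laid out like the example. The name split_on_angle_brackets is hypothetical. Every < or > becomes a newline, so each tag name and each piece of text ends up on its own line, where plain grep works. This is a quick hack, not an XML parser.

```shell
# Hypothetical helper: turn every < and > into a newline so that tag names
# and text content land on separate lines.
split_on_angle_brackets() {
  awk '{ gsub(/[<>]/, "\n"); print }' "$@"
}

# Usage: split_on_angle_brackets list.xml | grep '\.zip$'
```

tr '<>' '\n\n' would do the same character-for-character replacement without awk.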
