Using sed to print range when pattern is inside the range? - linux

I have a log file full of queries, and I only want to see the queries that have an error. The log entries look something like:
path to file executing query
QUERY
SIZE: ...
ROWS: ...
MSG: ...
DURATION: ...
I want to print all of this stuff, but only when MSG: contains something of interest (an error message). All I've got right now is sed -n '/^path to file/,/^DURATION/p' and I have no idea where to go from here.
Note: Queries are often multiline, so using grep's -B sadly doesn't work all the time (this is what I've been doing thus far, just being generous with the -B value)
Somehow I'd like to use only sed, but if I absolutely must use something else like awk I guess that's fine.
Thanks!

You haven't said what an error message looks like, so I'll assume it contains the word "ERROR":
sed -n '/^MSG.*ERROR/{H;g;N;p;};/^DURATION/{s/.*//;h;d;};H' < logname
(I wish there were a tidier way to purge the hold space. Anyone?...)
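If awk turns out to be acceptable, here is a rough sketch of the same idea that buffers each entry and prints it only when its MSG line matches (the /^path to file/ start pattern is just the placeholder from the question):
awk '
  /^path to file/ { buf = "" }                        # a new entry starts: reset the buffer
  { buf = buf $0 ORS }                                # collect every line of the current entry
  /^MSG.*ERROR/ { err = 1 }                           # remember that this entry contains an error
  /^DURATION/ { if (err) printf "%s", buf; err = 0 }  # entry is complete: print it if flagged
' logname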

I could suggest a solution with grep. That will work if the structure in the log file is always the same as above (i.e. MSG is in the 5th line, and one line follows):
egrep -i '^MSG:.*error' -A 1 -B 4 logfile
That means: if the word error occurs in an MSG line, output the block from 4 lines before the MSG line to one line after it.
Of course you have to adjust the regexp to recognize an error.
This will not work if the structure of those blocks differs.

Perhaps you can use the cgrep.sed script, as described in the Unix Power Tools book

Related

Set line maximum of log file

Currently I write a simple logger to log messages from my bash script. The logger works fine; I simply write the date plus the message to the log file. Since the log file will keep growing, I would like to limit the logger to, for example, 1000 lines. After reaching 1000 lines, it shouldn't delete or totally clear the log file; it should truncate the first line and replace it with the new log line, so the file keeps 1000 lines and doesn't grow further. The latest line should always be at the top of the file. Is there any built-in method? Or how could I solve this?
Why would you want to replace the first line with the new message, thereby causing a jump in the order of messages in your log file, instead of just deleting the first line and appending the new message? E.g., simplistically:
log() {
    tail -n 999 logfile > tmp &&
    { cat tmp && printf '%s\n' "$*"; } > logfile
}
log "new message"
You don't even need a tmp file if your log file stays reasonably small: just save the output of the tail in a variable and printf that too.
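A rough sketch of that variation (assuming the whole log comfortably fits in a shell variable):
log() {
    buf=$(tail -n 999 logfile)    # keep at most the last 999 existing lines
    { [ -n "$buf" ] && printf '%s\n' "$buf"; printf '%s\n' "$*"; } > logfile
}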
Note that, unlike a sed -i solution, the above will not change the inode, hard links, permissions or anything else for logfile. It's the same file you started with, just with updated content; it isn't replaced with a new file.
Your chosen example may not be the best. As the comments have already pointed out, logrotate is the best tool to keep log file sizes at bay; furthermore, a line is not the best unit to measure size. Those commenters are both right.
However, I take your question at face value and answer it.
You can achieve what you want with shell builtins, but it is much faster and simpler to use an external tool like sed. (awk is another option, but it lacks the -i switch, which simplifies your life in this case.)
So, suppose your file exists already and is named script.log then
maxlines=1000
log_msg='Whatever the log message is'
sed -i -e1i"\\$log_msg" -e$((maxlines))',$d' script.log
does what you want.
-i means modify the given file in place.
-e1i"\\$log_msg" means insert $log_msg before the first (1) line.
-e$((maxlines))',$d' means delete each line from line number $((maxlines)) to the last one ($).

Find and Replace Incrementally Across Multiple Files - Bash

I apologize in advance if this belongs on Super User; I always have a hard time discerning whether these bash scripting questions are better placed here or there. Currently I know how to find and replace strings in multiple files, and (from searching for a solution to this issue) how to find and replace strings within a single file incrementally, but how to combine them eludes me.
Here's the explanation:
I have a few hundred files, each in sets of two: a data file (.data) and a message file (.data.ms).
These files are linked via a key value unique to each set of two that looks like: ab.cdefghi
Here's what I want to do:
Step through each .data file and do the following:
Find:
MessageKey ab.cdefghi
Replace:
MessageKey xx.aaa0001
MessageKey xx.aaa0002
...
MessageKey xx.aaa0010
etc.
Incrementing by 1 every time I get to a new file.
Clarifications:
For reference, there is only one instance of "MessageKey" in every file.
The paired files have the same name; only their extensions differ. So I could simply step through all the .data files and then all the .data.ms files, apply whatever incremental solution to both, and they'd match up fine; I don't need anything fancy to edit the two files in tandem.
For all intents and purposes, whatever currently appears on the line after each MessageKey is garbage; I am throwing it all out and replacing it with xx.aaa####
String length does matter, so I need xx.aaa0009, xx.aaa0010, not xx.aaa0009, xx.aaa00010
I'm using cygwin.
I would approach this by creating a mapping from old key to new and dumping that into a temp file.
grep MessageKey *.data \
| sort -u \
| awk '{ printf("%s:xx.aaa%04d\n", $1, ++i); }' \
> /tmp/key_mapping
From there I would confirm that the mapping file looks right before applying it to the files with sed.
cat /tmp/key_mapping \
| while read old new; do
    sed -i -e "s:MessageKey $old:MessageKey $new:" *
done
This will probably work for you, but it's neither elegant nor efficient. This is how I would do it if I were only going to run it once. If I were going to run this regularly and efficiency mattered, I would probably write a quick python script.
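Alternatively, for a one-shot run under the question's constraints (exactly one MessageKey per file, pairs named NAME.data and NAME.data.ms in a single directory), a simpler per-pair loop might look something like this:
i=0
for f in *.data; do
    i=$((i + 1))
    new=$(printf 'xx.aaa%04d' "$i")
    # only one MessageKey per file, so overwrite whatever follows it in both files of the pair
    sed -i -e "s/MessageKey .*/MessageKey $new/" "$f" "$f.ms"
done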
@Carl.Anderson got me started on the right track, and I ended up implementing his solution with some syntax tweaks.
First of all, this solution only works if all of your files are located in the same directory. I'm sure anyone with even slightly more experience with UNIX than me could modify this to work recursively, but here goes:
First I ran:
-hr "MessageKey" . | sort -u | awk '{ printf("%s:xx.aaa%04d\n", $2, ++i); }' > MessageKey
This command was used to create a find and replace map file called "MessageKey."
The contents of which looked like:
In.Rtilyd1:aa.xxx0087
In.Rzueei1:aa.xxx0088
In.Sfricf1:aa.xxx0089
In.Slooac1:aa.xxx0090
etc...
Then I ran:
cat MessageKey | while IFS=: read old new; do sed -i -e "s/MessageKey $old/MessageKey $new/" *Data ; done
I had to use IFS=: (alternatively, I could have found and replaced every : in the map file with a space, but the former seemed easier).
Anyway, in the end this worked! Thanks Carl for pointing me in the right direction.

Trying to Delete Certain Lines in a Range Using sed

In a large file, I need to edit text and remove comments inside a particular range. For this simplified example, let's assume the range begins with _start_ and finishes at _end_.
I'm able to edit the text with no problem using a command like:
sed -i -r "/_start_/,/_end_/ s/SearchText/ReplaceText/" FileName
Please note the following (and let me know, of course, if any of my statements are inaccurate or misguided):
I used -i so that it would edit "FileName" in place, rather than write to a different file.
I used -r so that it would recognize extended regular expressions (which are not shown in the simplified example above, but which seem to be working correctly).
I used double-quotes so that it would correctly handle variables (also not shown in the simplified example above, but also working as expected).
That command above is doing exactly what I expect it to do. So I moved on to the second step of my process: a very similar command to remove comment lines within this range:
sed -i -r "/_start_/,/_end_/ /^#/ d" FileName
This, however, is having no effect: The lines that begin with # are not removed. Indeed, when I execute this command alone, absolutely nothing in my file is changed or deleted -- neither inside the range nor elsewhere.
In my searches on this site and elsewhere, I've found a lot of instructions on deleting lines using sed (instructions that I think I'm following correctly) -- but nothing about a failure such as I'm experiencing.
Can anyone advise me what I'm doing wrong here?
I'm very new to the UNIX/Linux environment, so I'm definitely open to alternate suggestions as to how to handle the issue. But just to satisfy my frustration, I'd love to know what's wrong with my sed command above.
The best source of information is often the man page. You can reach it with the command man sed.
d takes an address range, according to the man page. An address can be a line number, a /regexp/, or a few other things. An address range is either a single address or two addresses separated by a comma.
You were trying to use an address range followed by another address, which is not accepted.
As 1_CR pointed out, you can work around by using a block instead:
sed -i -r "/_start_/,/_end_/ {/^#/ d}" FileName
A block accepts an address range, and every command accepts an address range again, so you can combine the regexps.
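As a quick illustration of the block form (the file contents here are made up):
$ cat FileName
before the range
_start_
# a comment inside the range
some text to keep
_end_
# a comment outside the range is kept
$ sed -r "/_start_/,/_end_/ {/^#/ d}" FileName
before the range
_start_
some text to keep
_end_
# a comment outside the range is kept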
You need to change
sed -i -r "/_start_/,/_end_/ /^#/ d" FileName
to
sed -i -r "/_start_/,/_end_/{/^#/d}" FileName
In terms of doing exactly what your question asks, you can also express the range with line numbers instead of regular expressions, which you might find easier if looking up the line numbers is convenient for you:
sed -i '<start-of-range>,<end-of-range>{/^#/d}' FileName

egrep not writing to a file

I am using the following command in order to extract domain names & the full domain extension from a file. Ex: www.abc.yahoo.com, www.efg.yahoo.com.us.
egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' source.txt | sort | uniq | sed -e 's/www.//' > dest.txt
The command writes to the file correctly when I specify a small maximum match count such as -m 100 after source.txt. The problem appears if I don't specify it, or if I specify a huge number. However, I have written to files with grep (not egrep) before, with huge numbers similar to what I'm trying now, and that was successful. I also checked the last-modified date and time of the destination file while the command was executing, and it doesn't seem to be modified at all. What could be the problem?
As I mentioned in your earlier question, it's probably not an issue with egrep, but that your file is too big and sort won't output anything (to uniq) until egrep is done. I suggested that you split the file into manageable chunks using the split command. Something like this:
split -l 10000000 source.txt split_source.
This will split the source.txt file into 10-million-line chunks called split_source.aa, split_source.ab, split_source.ac, etc. You can then run the entire command on each of those files (perhaps changing the redirection at the end to append: >> dest.txt).
The problem here is that you can get duplicates across multiple files, so at the end you may need to run
sort dest.txt | uniq > dest_uniq.txt
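Putting it together, a rough sketch (the egrep pattern is simply copied from the question):
split -l 10000000 source.txt split_source.
for f in split_source.*; do
    egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' "$f" | sort | uniq | sed -e 's/www.//' >> dest.txt
done
sort dest.txt | uniq > dest_uniq.txt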
Your question is missing information.
That aside, a few thoughts. First, to debug and isolate your problem:
Run egrep <params> | less so you can see what egrep is doing, and eliminate any problem coming from sort, uniq, or sed (my bet's on sort).
How big is your input? Any chance sort is dying from too much input?
Gonna need to see the full command to make further comments.
Second, to improve your script:
You may want to sort | uniq AFTER the sed; otherwise you could end up with duplicates in your result set, and an unsorted result set at that (see the sketch below). Maybe that's what you want, though.
Consider wrapping your regular expressions with "^...$", if it's appropriate to establish beginning of line (^) and end of line ($) anchors. Otherwise you'll be matching portions in the middle of a line.
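For the first point, a sketch of the reordered pipeline (pattern copied from the question; sort -u replaces sort | uniq):
egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' source.txt | sed -e 's/www.//' | sort -u > dest.txt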

How do I grep for entire, possibly wrapped, lines of code?

When searching code for strings, I constantly run into the problem that I get meaningless, context-less results. For example, if a function call is split across 3 lines, and I search for the name of a parameter, I get the parameter on a line by itself and not the name of the function.
For example, in a file containing
...
someFunctionCall ("test",
MY_CONSTANT,
(some *really) - long / expression);
grepping for MY_CONSTANT would return a line that looked like this:
MY_CONSTANT,
Likewise, in a comment block:
/////////////////////////////////////////
// FIXMESOON, do..while is the wrong choice here, because
// it makes the wrong thing happen
/////////////////////////////////////////
Grepping for FIXMESOON gives the very frustrating answer:
// FIXMESOON, do..while is the wrong choice here, because
When there are thousands of hits, single line results are a little meaningless. What I would like to do is have grep be aware of the start and stop points of source code lines, something as simple as having it consider ";" as the line separator would be a good start.
Bonus points if you can make it return the entire comment block if the hit is inside a comment.
I know you can't do this with grep alone. I am also aware of the option to have grep return a certain number of lines of context. Any suggestions on how to accomplish this under Linux? FYI my preferred languages are C and Perl.
I'm sure I could write something, but I know that somebody must have already done this.
Thanks!
You can use pcregrep with the -M option (multiline matching; pcregrep is grep with Perl-compatible regular expressions). Something like:
pcregrep -M ";*\R*.*thingtosearchfor*\R*.*;.*"
Here's an example using awk.
$ cat file
blah1
blah2
function1 ("test",
MY_CONSTANT,
(some *really) - long / expression);
function2( one , two )
blah3
blah4
$ awk -vRS=")" '/function1/{gsub(".*function1","function1");print $0RT}' file
function1 ("test",
MY_CONSTANT,
(some *really)
The concept behind it: RS is the record separator. By setting it to ")", every record in your file is separated by ")" instead of by a newline. This makes it easy to find your "function1", since you can then "grep" for it. If you don't use awk, the same concept can be applied by "splitting" on ")".
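For instance, a rough sketch of the splitting idea with tr instead of awk (it flattens the original line breaks first):
tr '\n' ' ' < file | tr ')' '\n' | grep function1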
You could write a command line using grep with the options that give you the line number and the filename, then pipe those results into awk to parse the columns, and then use a little script of your own to display the N lines surrounding each hit. :)
If this isn't an academic endeavour, you could just use cscope (though it's for C code only). If you are willing to drop the requirement to search in comments, ctags should be enough (and it also supports Perl).
I had a situation in which I had an XML file full of the names of zip files, that is, with angle-bracket tags wrapping the names of the files, say <stuff>example.zip</stuff>
I used awk to change all the angle brackets into newlines and then used grep :)
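A rough sketch of that approach (the tag, file name and grep pattern are all made up):
awk '{ gsub(/[<>]/, "\n"); print }' list.xml | grep '\.zip$'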
