General solution for bypassing file headers in shell commands - linux

I make extensive use of pipelines of Linux shell commands, for example:
grep BLAH file1 | sed 's/old/new/' | sort -k 1,1 > file3
My files often have a header line, and I often have to preserve it throughout the pipeline. So, for example, I would want to grep, sed and sort from line 2 onward, while keeping the first line unchanged.
I am looking for a general solution that, given some command(s), preserves the header. I usually write the header to a file before the pipe and then cat it back after the pipe ends. I have started using zsh, so I was wondering if that might help me find a more streamlined solution.
Perhaps something like splitting the stream in two, processing the header and the body separately, and merging them again (the original post illustrated this with a diagram in which the arrows are pipes), but I am not sure how to get that to work in zsh, or whether it is even possible. One problem is that I need to follow the first pipe split with a command on both branches.
Any creative solutions?

Vaughn and devnull have already directed you towards the solution. Both their answers contain typos, though, and I have some remarks to add, so I would advise using this instead:
{ head -n 1 file1; tail -n +2 file1 | grep BLAH | sed 's/old/new/' | sort -k 1,1; } >file3
What it does is take the first line of file1 in one command (your header), run your grep/sed/whatever magic in a second command on the rest of the file (without the header, hence tail -n +2), and redirect the combined output to file3.
Notes:
If your shell supports { }, it is preferred over the ( ) construct in this case, as it does not spawn a subshell (though sometimes a subshell is desirable).
head -2 is deprecated; you should use the -n parameter, as in head -n 2.
You can skip the tail -n +2 file1 part if you know for certain that whatever you are grepping for cannot occur in your header, but it is certainly cleaner this way.
This should work in most recent shells, btw (bash, ksh, zsh).
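In zsh (or bash/ksh), you can go one step further and wrap the pattern in a small function, so that any pipeline stage passes the header through untouched. This is only a sketch, and the helper name body is hypothetical:
# Hypothetical helper: print the first input line unchanged, then run
# the given command on the remaining lines only.
body() {
    IFS= read -r header
    printf '%s\n' "$header"
    "$@"
}
# Usage: every stage keeps line 1 in place.
body grep BLAH < file1 | body sed 's/old/new/' | body sort -k 1,1 > file3
Each invocation consumes the header from its own stdin before handing the rest of the stream to the wrapped command, so the header survives the whole pipeline.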

Related

How to input a command's result as a string argument in sed

I want to execute a command as follows on my bash terminal:
sed -i '6i `sed '1!d' input.in`' out
with which I can insert at line 6 of the file out (in place, via the -i option) the result of the command sed '%1!d' input.in. I haven't found anything useful, and have tried `com`, $(com) and com | sed -i '6i ' out, where com stands for sed '%1!d' input.in. I don't mind changing the syntax of the whole command, but I want it to be a single line for terminal use, using sed.
Thanks for listening, awaiting your answer.
For EdMorton:
Example Input:
input.in:
into a lake.
out:
Mary was runing around a pond and fell
into a lake.
Mary fell into a what?
Desired Output:
Mary was runing around a pond and fell
into a lake.
Mary fell into a what?
into a lake.
Try using r on standard input instead of i.
sed '%1!d' input.in |
sed -i '6r /dev/stdin' out
If your platform doesn't support /dev/stdin or /dev/fd/0, see if your sed supports - to mean standard input ... or, in the worst case, resort to a temporary file.
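In the worst case, the fallback might look like this minimal sketch (GNU sed assumed for -i, and sed '1!d' used as the presumably intended first command; see the next note):
# Minimal sketch of the temp-file fallback.
tmp=$(mktemp)
sed '1!d' input.in > "$tmp"    # extract the text to insert (here: line 1)
sed -i "6r $tmp" out           # splice it in after line 6 of out
rm -f "$tmp"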
As commenters have already pointed out, %1!d does not appear to be a valid command in most sed dialects, but that is basically unimportant here. (If you mean to print just the first line, you probably mean sed '1!d', although sed q does that more efficiently, since it quits after printing the first line instead of reading the whole file.)
sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk.
Given this modified input file
$ cat input.in
a Windows folder C:\Windows\Temp
Here is what the sed solution you posted in your comments does:
$ sed '1!d' input.in > temp.of.in && sed "6i `cat temp.of.in`" out
Mary was runing around a pond and fell
into a lake.
Mary fell into a what?
a Windows folder C:WindowsTemp
and here is what an awk solution does more efficiently and accurately and without a temp file:
$ awk 'NR==1{x=$0;nextfile} FNR==6{print x} 1' input.in out
Mary was runing around a pond and fell
into a lake.
Mary fell into a what?
a Windows folder C:\Windows\Temp
Notice the awk solution preserved the path-separator backslashes while the sed one stripped them. Also note that you should really add && rm temp.of.in to the end of your sed command line to clean up the temp file and you should be using $(..) to execute your command, not obsolete backticks.
The awk solution uses GNU awk for nextfile; with other awks you'd replace ;nextfile} with } NR==FNR{next} or similar, but since you are using GNU sed I assume you have GNU awk too.
Note that if you DID have a burning desire to use sed and accept it won't exactly reproduce the input, there are simpler, more efficient ways to do what your current script does, e.g.:
sed "6i $(head -1 input.in)" out
or even your original idea, just rewritten to remove the obsolete backticks and negative logic of 1!d:
sed "6i $(sed -n '1p' input.in)" out
But seriously - just use awk. For anything other than simple substitutions on individual lines it's much more robust, efficient, clear, portable, extensible, etc. etc. than sed.
EDIT To address the questions in your comments:
Can you explain the arguments on awk.
There are no arguments, just a script that says: if this is the first line read from the first file, save it in variable x, then move on to the next file. If this is line 6 of the 2nd file, print the contents of variable x. For every line of the 2nd file, print it (the 1 is idiomatic but a bit tricky at first glance: it's a true condition, so it invokes the default action of printing the current input line, equivalent to just writing {print}).
how can I replace the out file with the output (without using '>') as the -i option does in sed, and avoid printing it to stdout? Just like GNU sed has -i, GNU awk has -i inplace. Be careful though because, just like with sed, it applies to every input file, so if you don't print the contents of the first file then when the script is done the first file will be empty. There are various ways to deal with that, including simply printing the lines from file 1 or turning in-place editing on/off in BEGINFILE/ENDFILE blocks (see https://www.gnu.org/software/gawk/manual/gawk.html#Extension-Sample-Inplace), but IMHO awk 'script' file1 file2 > temp && mv temp file2 is the simplest and clearest, as well as being portable to all awks/seds/whatever.
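For instance, applied to the one-liner above, that temp-and-move idiom would be:
awk 'NR==1{x=$0;nextfile} FNR==6{print x} 1' input.in out > temp && mv temp out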
Also, is there a multiline solution, like "take lines 1 to 4" of input.in and drop them at line 6 of out? No problem:
awk '
NR==FNR { if (NR<=4) { x = x $0 ORS; next } nextfile }
FNR==6  { printf "%s", x }
{ print }
' input.in out
(Note the next: without it, lines 1 to 4 of input.in would fall through to { print } and be printed too.)
I changed the 1 from the previous script to { print } for clarity.

Alternative to ls in shell-script compatible with nohup

I have a shell script which lists all the file names in a directory and stores them in a new file.
The problem is that when I execute this script with the nohup command, it lists the first name four times instead of listing the correct names.
When I discussed the problem with other programmers, they suggested the problem may be the ls command.
Part of my code is the following:
for i in $( ls -1 ./Datasets/); do
awk '{print $1}' ./genes.txt | head -$num_lineas | tail -1 >> ./aux
let num_lineas=$num_lineas-1
done
Do you know an alternative to ls that works well with nohup?
Thanks.
Don't use ls to feed the loop, use:
for i in ./Datasets/*; do
or if subdirectories are of interest
for i in ./Datasets/*/*; do
Lastly, and more correctly, use find if you need the entire tree below Datasets:
find ./Datasets -type f | while IFS= read -r file; do
(do stuff with $file)
done
Others frown, but there is nothing wrong with also using find as:
for file in $(find ./Datasets -type f); do
(do stuff with $file)
done
Just choose the syntax that most closely meets your needs.
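For the record, the reason others frown is that $(find ...) word-splits its output, so filenames containing whitespace break. If that matters to you, a null-delimited variant (bash syntax) is a safe sketch:
find ./Datasets -type f -print0 | while IFS= read -r -d '' file; do
    printf '%s\n' "$file"    # (do stuff with "$file")
done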
First of all, don't parse ls! A simple glob will suffice. Secondly, your awk | head | tail chain can be simplified to a single awk that prints the first column of only the line you're interested in. Thirdly, you can redirect the output of the whole loop to a file once, rather than appending with >> on every iteration.
Incorporating all of those changes into your script:
for i in Datasets/*; do
awk -v n="$(( num_lineas-- ))" 'NR==n{print $1}' genes.txt
done > aux
Every time the loop goes round, the value of $num_lineas will decrease by 1.
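In fact, if the loop only exists to emit column 1 of genes.txt for lines num_lineas down to 1 (one iteration per entry in Datasets), you can drop the loop and the repeated re-reading of genes.txt entirely. A sketch, assuming num_lineas starts at the number of entries in Datasets:
# Print column 1 of the first num_lineas lines, in reverse order.
awk -v n="$num_lineas" 'NR<=n { a[NR]=$1 } END { for (i=n; i>=1; i--) print a[i] }' genes.txt > aux
This reads genes.txt once instead of once per dataset entry.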
In terms of your problem with nohup, I would recommend looking into using something like screen, which is known to be a better solution for maintaining a session between logins.

using Linux shell script to edit and rename a file

I am trying to execute a command using Vi or ex to edit a file by deleting the first five lines, replacing x with y, removing extra spaces at the end of each line (while retaining the carriage returns), and removing the last eight lines of the file, then renaming the file into a shell script and running the new script from the current script.
This will be something that is scheduled in cron. I have been looking for a simple way to do it using the command line or a Vim script or something.
Any ideas? The format of the input file does not change, just the amount of lines, so I can't specify the line numbers for the last eight lines.
You actually have about half a dozen questions here. Here's an answer for the first five, which are probably the ones you'll have the most difficulty solving:
sed -n -e ':label' -e '1,5d' -e 's/x/y/g' -e 's/[ \t]*$//' -e '1,13!{P;N;D};N;b label' file.txt > script.sh
(1,5d drops the first five lines and the two s commands do the substitutions as each line is read; the P;N;D block then maintains a nine-line sliding window in the pattern space, so the last eight lines never get printed.)
Vi is an interactive editor. You probably don't want to use it for something that'll be run by cron. Also, I agree with the comments saying this is probably a bad idea. Be that as it may:
printf 'one\ntwo\nthree\nfour\nfive\necho x \n1\n2\n3\n4\n5\n6\n7\n8\n' \
| sed '1,5d;s/ *$//;s/x/y/' \
| tail -r | sed 1,8d | tail -r \
| sh
Our first sed script does most of the work. We reverse the lines with tail -r, delete the first 8 lines, then reverse again; that trims off the last 8 lines.
Note that tail -r is a BSD extension; on Linux systems (or any with GNU coreutils), tail does not support -r, but you will have tac, which reverses lines, instead.
Also, the final | sh simply runs the output. If you REALLY want to save this as a script, you can do that by redirecting the output to a file ... but I'll leave at least that to your imagination. Can't do all your scripting for you, can we?! :-)
To edit a file from a script, you could use ed (even if it is hard to learn and remember).
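A rough ed sketch of the edits described above (delete the first five lines, replace x with y, strip trailing whitespace, drop the last eight lines, and write the result out as a script) might look like this; it assumes the file actually contains an x, since a global s with no match is an error in ed:
ed -s file.txt <<'EOF'
1,5d
,s/x/y/g
,s/[[:space:]]*$//
$-7,$d
w script.sh
q
EOF
The $-7,$d address deletes the last eight lines regardless of the file's length, which sidesteps the "I can't specify the line numbers" problem.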
You could also use some scripting language (Python, Perl, AWK, Ruby) to achieve your goal.

grep based on blacklist -- without procedural code?

It's a well-known task, simple to describe:
Given a text file foo.txt, and a blacklist file of exclusion strings, one per line, produce foo_filtered.txt that has only the lines of foo.txt that do not contain any exclusion string.
A common application is filtering compiler warnings from a build log, but to ignore warnings on files that are not yours. The file foo.txt is the warnings file (itself filtered from the build log), and a blacklist file excluded_filenames.txt with file names, one per line.
I know how it's done in procedural languages like Perl or AWK, and I've even done it with combinations of Linux commands such as cut, comm, and sort.
But I feel that I should be really close with xargs, and just can't see the last step.
I know that if excluded_filenames.txt has only one file name in it, then
grep -v `cat excluded_filenames.txt` foo.txt
will do it.
And I know that I can get the filenames one per line with
xargs -L1 -a excluded_filenames.txt
So how do I combine those two into a single solution, without explicit loops in a procedural language?
Looking for the simple and elegant solution.
You should use the -f option to read the exclusion strings from the file:
grep -vf excluded_filenames.txt foo.txt
You could also use -F, which treats the patterns as fixed strings rather than regular expressions (this is what fgrep does) and is more directly the answer to what you asked:
grep -vF "`cat excluded_filenames.txt`" foo.txt
from man grep
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
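For what it's worth, the two options combine, so a one-liner producing the filtered file from the question could look like:
grep -vFf excluded_filenames.txt foo.txt > foo_filtered.txt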

Highlight text similar to grep, but don't filter out text [duplicate]

This question already has answers here:
Colorized grep -- viewing the entire file with highlighted matches
(24 answers)
Closed 7 years ago.
When using grep, it highlights any text in a line that matches your regular expression.
What if I want this behaviour, but have grep print out all lines as well? I came up empty after a quick look through the grep man page.
Use ack. Check out its --passthru option. It has the added benefit of allowing full Perl regular expressions.
$ ack --passthru 'pattern1' file_name
$ command_here | ack --passthru 'pattern1'
You can also do it using grep like this:
$ grep --color -E '^|pattern1|pattern2' file_name
$ command_here | grep --color -E '^|pattern1|pattern2'
This will match all lines and highlight the patterns. The ^ matches the start of every line, but won't get printed or highlighted since it matches the empty string.
(Note that most setups will use --color by default. You may not need that flag.)
You can make sure that all lines match, while there is nothing to highlight on the irrelevant matches:
egrep --color 'apple|' test.txt
Notes:
egrep may also be spelled grep -E
--color is usually default in most distributions
some variants of grep will "optimize" the empty match, so you might want to use "apple|$" instead (see: https://stackoverflow.com/a/13979036/939457)
EDIT:
This works with OS X Mountain Lion's grep:
grep --color -E 'pattern1|pattern2|$'
This is better than '^|pattern1|pattern2' because the ^ part of the alternation matches at the beginning of the line whereas the $ matches at the end of the line. Some regular expression engines won't highlight pattern1 or pattern2 because ^ already matched and the engine is eager.
Something similar happens with 'pattern1|pattern2|', because the regex engine notices that the empty alternation at the end of the pattern matches the empty string at the beginning of the subject (see http://www.regular-expressions.info/engine.html for background on how eager regex engines match).
FIRST EDIT:
I ended up using perl:
perl -pe 's:pattern:\033[31;1m$&\033[30;0m:g'
This assumes you have an ANSI-compatible terminal.
ORIGINAL ANSWER:
If you're stuck with a strange grep, this might work:
grep -E --color=always -A500 -B500 'pattern1|pattern2' | grep -v '^--'
Adjust the numbers to get all the lines you want.
The second grep just removes extraneous -- lines inserted by the BSD-style grep on Mac OS X Mountain Lion, even when the context of consecutive matches overlap.
I thought GNU grep omitted the -- lines when context overlaps, but it's been a while, so maybe I remember wrong.
You can use my highlight script from https://github.com/kepkin/dev-shell-essentials
It's better than grep because you can highlight each match with its own color.
$ command_here | highlight green "input" | highlight red "output"
Since you want matches highlighted, this is probably for human consumption (as opposed to piping to another program for instance), so a nice solution would be to use:
less -p <your-pattern> <your-file>
And if you don't care about case sensitivity:
less -i -p <your-pattern> <your-file>
This also has the advantage of paging, which is nice when you have to go through long output.
You can do it using only grep by:
reading the file line by line
matching a pattern in each line and highlighting pattern by grep
if there is no match, echo the line as is
which gives you the following:
while IFS= read -r line; do (printf '%s\n' "$line" | grep PATTERN) || printf '%s\n' "$line"; done < inputfile
(IFS= read -r and printf '%s\n' keep whitespace and backslashes in each line intact, which plain read and echo would mangle.)
If you want to print "all" lines, there is a simple working solution:
grep "test" -A 9999999 -B 9999999
A => After
B => Before
If you are doing this because you want more context in your search, you can do this:
cat BIG_FILE.txt | less
Doing a search in less should highlight your search terms.
Or pipe the output to your favorite editor. One example:
cat BIG_FILE.txt | vim -
Then search/highlight/replace.
If you are looking for a pattern in a directory recursively, you can first save the listing to a file:
ls -1R ./ > list-of-files.txt
And then grep that, or pipe the listing straight into the grep search:
ls -1R | grep --color -E '[A-Z]|'
This will list all files, but colour the ones containing uppercase letters. If you remove the last | you will only see the matches.
I use this to find images named badly with upper case, for example; a normal recursive grep prints each match without its directory context (ls -R prints each path only once, above its directory's files), so this way I can see where each file lives.
Maybe this is an XY problem, and what you are really trying to do is to highlight occurrences of words as they appear in your shell. If so, you may be able to use your terminal emulator for this. For instance, in Konsole, open Find (Ctrl+Shift+F) and type your word. The word will then be highlighted wherever it occurs in new or existing output until you cancel the function.
