Using Sed to extract the headers in multiple files - linux

I used head -3 to extract the headers from some files where I needed to show the header data. I did this:
head -3 file1 file2 file3
and head -3 * works as well.
I thought sed 3q file1 file2 file3 would work, but it only gives the first file's output and not the others. I then tried sed -n '1,2p' file1 file2 file3; again, only the first file produced any output. I also tried with a wildcard, sed -n '1,2p' filename*: same result, only the first file's output.
Everything I read suggests that running sed with multiple filenames should work.
Thanks in advance

Assuming GNU sed, since the question is tagged linux. From the GNU sed manual:
-s, --separate
By default, sed will consider the files specified on the command line as a single continuous long stream. This GNU sed extension allows the user to consider them as separate files: range addresses (such as ‘/abc/,/def/’) are not allowed to span several files, line numbers are relative to the start of each file, $ refers to the last line of each file, and files invoked from the R command are rewound at the start of each file.
Example:
$ cat file1
foo
bar
$ cat file2
123
456
$ sed -n '1p' file1 file2
foo
$ sed -n '3p' file1 file2
123
$ sed -sn '1p' file1 file2
foo
123
When using -i, the -s option is implied
$ sed -i '1chello' file1 file2
$ cat file1
hello
bar
$ cat file2
hello
456
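
So, for the original question, the head -3 behaviour of showing the first three lines of every file becomes:
$ sed -sn '1,3p' file1 file2 file3
and the wildcard form sed -sn '1,3p' filename* works the same way.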

grep between two files

I want to find the matching lines from file2 when compared to file1.
file2 contains multiple columns, and column one contains information that could match file1.
I tried the commands below and they didn't give any matching results (the contents of file1 are definitely in file2). I have used these commands previously to compare different files and they worked.
grep -f file1 file2
grep -Fwf file1 file2
When I grep for whatever is not matching, I do get results:
grep -vf file1 file2
file1 contains a list of genes (754 of them), one per line:
ATM
ATP5B
ATR
ATRIP
ATRX
I have a feeling the problem is with my file1. When I typed several items manually into my file1 just to test, and grepped against file2, I got the matching lines from file2.
When I copied the contents of file1 (originally in Excel) into Notepad, making a .txt file, I didn't get any matching results.
I can't see any problem with my file1. Any suggestions?
You said,
I copied the contents of file1 (originally in Excel) into Notepad, making a .txt file
It's likely that the txt file contains carriage-return/linefeed pairs which are screwing up the grep. As I suggested in a comment, try this:
tr -d '\015' < file1 > file1a
grep -Fwf file1a file2
The tr invocation deletes all the carriage returns, giving you a proper Unix/Linux text file with only newlines (\n) as line terminators.
You said:
I can't see any problem with my file1.
Here's how to see the extra-carriage-return problem:
cat -v file1
Those little ^M markers at the end of each line are cat -v's way of showing you the carriage return control codes.
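For example, if the gene list above was saved with Windows line endings, you would see something like:
$ cat -v file1
ATM^M
ATP5B^M
ATR^M
ATRIP^M
ATRX^M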
Addendum:
Carriage Return (CR) is decimal 13, hex 0x0d, octal 015, \r in C.
Line Feed (LF) is decimal 10, hex 0x0a, octal 012, \n in C.
Because it's an old-school utility, tr accepts octal (base 8) notation for control characters.
(I think in some versions tr -d '\r' would work, but I'm not sure, and anyway I'm not sure what version you have. tr -d '\015' should be universal.)
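You can also inspect the bytes directly with od -c, before and after the fix (ATM here is just one sample line from the list):
$ printf 'ATM\r\n' | od -c
0000000   A   T   M  \r  \n
0000005
$ printf 'ATM\r\n' | tr -d '\015' | od -c
0000000   A   T   M  \n
0000004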
A simple shell script that runs grep for every line in file1.txt:
#!/bin/bash
# Read file1.txt line by line and check each line against file2.txt
while read -r content; do
    if grep -q "$content" file2.txt; then
        echo "$content was found in file2" >> results.txt
    fi
done < file1.txt
Let's suppose this is file2:
$ cat file2
a b ATM
c d e
f ATR g
Using grep and process substitution
We can get lines from file1 that match any of the columns in file2 via:
$ grep -wFf <(sed 's/[[:space:]]/\n/g' file2) file1
ATM
ATR
This works because it converts file2 to a form that grep understands:
$ sed 's/[[:space:]]/\n/g' file2
a
b
ATM
c
d
e
f
ATR
g
Using awk
$ awk 'FNR==NR{for (i=1;i<=NF;i++) seen[$i]; next} $0 in seen' file2 file1
ATM
ATR
Here, awk records every column value that it sees in file2 and then prints only those lines in file1 that match one of those columns.
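The same program, spread out with comments:
awk '
    FNR==NR {                       # while reading file2 (the first file)...
        for (i = 1; i <= NF; i++)
            seen[$i]                # ...remember every column value
        next
    }
    $0 in seen                      # file1: print lines seen as a column above
' file2 file1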
You can also try the comm command. It compares two sorted files line by line, reporting lines unique to each file and lines common to both (whereas diff focuses on how the files differ).
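Note that comm needs its inputs sorted, and since file2 in this question has extra columns you would first flatten it to one word per line as shown above. For plain one-column files, a minimal sketch:
$ comm -12 <(sort file1) <(sort file2)    # lines common to both files
$ comm -23 <(sort file1) <(sort file2)    # lines only in file1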

shell script to compare two files and write the difference to third file

I want to compare two files and redirect the difference between the two files to a third one.
file1:
/opt/a/a.sql
/opt/b/b.sql
/opt/c/c.sql
If a line such as /opt/c/c.sql has a # in front of it in either file, it should be skipped as a comment.
file2:
/opt/c/c.sql
/opt/a/a.sql
I want to get the difference between the two files. In this case, /opt/b/b.sql should be stored in a different file. Can anyone help me achieve this?
file1
$ cat file1   # both file1 and file2 may contain spaces, which are ignored
/opt/a/a.sql
/opt/b/b.sql
/opt/c/c.sql
/opt/h/m.sql
file2
$ cat file2
/opt/c/c.sql
/opt/a/a.sql
Do:
awk 'NR==FNR { line[$1]; next }     # remember column 1 of each file2 line
     !($1 in line) && $0 != ""      # print non-empty file1 lines not seen in file2
' file2 file1 > file3
file3
$ cat file3
/opt/b/b.sql
/opt/h/m.sql
Notes:
The order of the files passed to awk is important here: pass the file to check against (file2 here) first, followed by the master file (file1).
NR==FNR is true only while the first file is being read, so every first field of file2 becomes a key of the array line; lines of file1 are then printed only if their first field is not among those keys and the line is not empty.
You can use some tools like cat, sed, sort and uniq.
The main observation is this: if a line is in both files then it is not unique in cat file1 file2.
Furthermore, in cat file1 file2 | sort, all duplicates end up adjacent. Using uniq -u we keep only the unique lines and get this pipe:
cat file1 file2 | sort | uniq -u
Using sed to remove leading whitespace, empty lines, and comment lines, we get this final pipe:
cat file1 file2 | sed -r 's/^[ \t]+//; /^#/ d; /^$/ d;' | sort | uniq -u > file3
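With the sample files above, the pipe leaves exactly the lines that appear in only one of the files:
$ cat file1 file2 | sed -r 's/^[ \t]+//; /^#/ d; /^$/ d;' | sort | uniq -u
/opt/b/b.sql
/opt/h/m.sql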

merge contents of two files into one file in bash

I have two files with the following contents:
File1
Line1file1
Line2file1
line3file1
line4file1
File2
Line1file2
Line2file2
line3file2
line4file2
I want these files' contents merged into file3 as
File3
Line1file1
Line1file2
Line2file1
Line2file2
line3file1
line3file2
line4file1
line4file2
How do I merge the files in bash, taking lines alternately from one file and the other?
Thanks
You can always use the paste command. With a newline as the delimiter, it writes one line from File1 followed by one line from File2 for each pair of lines:
paste -d"\n" File1 File2 > File3
$ cat file1
Line1file1
Line2file1
line3file1
line4file1
$ cat file2
Line1file2
Line2file2
line3file2
line4file2
$ paste -d '\n' file1 file2 > file3
$ cat file3
Line1file1
Line1file2
Line2file1
Line2file2
line3file1
line3file2
line4file1
line4file2
paste is the way to go for this, but this alternative can be a useful approach if you ever need to add extra conditions, don't want blank lines when one file has more lines than the other, or have anything else that makes it a more complicated problem:
$ awk -v OFS='\t' '{print FNR, NR, $0}' file1 file2 | sort -n | cut -f3-
Line1file1
Line1file2
Line2file1
Line2file2
line3file1
line3file2
line4file1
line4file2
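For instance, if file2 gains a fifth line (line5file2 below is a hypothetical extra line), paste would emit a blank line for the missing file1 entry, while this pipeline just carries on:
$ echo 'line5file2' >> file2
$ awk -v OFS='\t' '{print FNR, NR, $0}' file1 file2 | sort -n | cut -f3-
Line1file1
Line1file2
Line2file1
Line2file2
line3file1
line3file2
line4file1
line4file2
line5file2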
In Linux:
grep -En '.?' File1 File2 | sed -r 's/^[^:]+:([^:]+):(.*)$/\1 \2/g' \
| sort -n | cut -d' ' -f2- > File3
If you're on OS X, use -E instead of -r for the sed command. The idea is this:
Use grep to number the lines of each file.
Use sed to drop the file name and put the line number into a space-separated column.
Use sort -n to sort by the line number; ties are decided by the rest of the line, which preserves the file order for this input (with GNU sort, add -s for guaranteed stability).
Drop the line number with cut and redirect to the file.
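With the sample files, the intermediate stream before sort and cut looks like this:
$ grep -En '.?' File1 File2 | sed -r 's/^[^:]+:([^:]+):(.*)$/\1 \2/g'
1 Line1file1
2 Line2file1
3 line3file1
4 line4file1
1 Line1file2
2 Line2file2
3 line3file2
4 line4file2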
Edit: Using paste is much simpler, but it will produce blank lines if one of your files is longer than the other; this method just continues with the lines from the longer file.
while read -r line1 && read -r -u 3 line2
do
    printf '%s\n' "$line1" >> File3
    printf '%s\n' "$line2" >> File3
done < File1 3< File2
You can use file descriptors to read from the two files in lockstep and print one line from each to the output file. Note that the loop stops as soon as the shorter file runs out of lines.

Separating a joined file to original files in Linux

I know that to append or join multiple files in Linux, we can use the command cat file1 >> file2.
But I couldn't find any command to separate file1 from file2 after joining them. In other words, I want both original file1 and file2 back again. I tried the split command, but it just chops a file into multiple pieces of the same size.
Is there a way to do it?
There is no such command, since no information about what was file1 or file2 is retained; the new combined file is just a data stream.
In order to "split" them back up, you need rules about how to do so (such as how many bytes long file1 and file2 were).
When you perform the concatenation, the system doesn't keep track of how the resulting file was created. So it has no way of remembering where the original split was located in that file.
Can you explain what you are trying to do?
No problem, as long as you still have file1:
$ echo foobar >file1
$ echo blah >file2
$ cat file1 >> file2
$ truncate -s $(( $(stat -c '%s' file2) - $(stat -c '%s' file1) )) file2
$ cat file2
blah
Also, instead of stat -c '%s' filename you can use wc -c filename | cut -f 1 -d ' ', which is longer but more portable.
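The same byte arithmetic lets you pull a copy of file1's content back out of the combined file, since cat appended it at the end (a sketch; file1.copy is just an illustrative name):
$ tail -c "$(stat -c '%s' file1)" file2 > file1.copy
tail -c N prints the last N bytes of file2, which are exactly the bytes the append added.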

how can I move all lines beginning with 'foobar' to the end of a file?

Say I have a script with a number of lines beginning with foobar.
I would like to move all of those lines to the end of the document while keeping their order, e.g. go from:
# There's a Polar Bear
# In our Frigidaire--
foobar['brangelina'] <- 2
# He likes it 'cause it's cold in there.
# With his seat in the meat
foobar['billybob'] <- 1
# And his face in the fish
to
# There's a Polar Bear
# In our Frigidaire--
# He likes it 'cause it's cold in there.
# With his seat in the meat
# And his face in the fish
foobar['brangelina'] <- 2
foobar['billybob'] <- 1
This is as far as I have gotten:
grep foobar file.txt > newfile.txt
sed -i 's/foobar//g' foo.txt
cat newfile.txt > foo.txt
This might work:
sed '/^foobar/{H;$!d;s/.*//};$G;s/\n\n*/\n/;s/^\n//' input_file
For every line starting with foobar, H appends it to the hold space and d suppresses it from the output; on the last line, G appends the collected lines, and the two final substitutions squeeze out the extra newlines that H and G introduce (including the corner case when foobar is on the last line).
This will do:
grep -v ^foobar file.txt > tmp1.txt
grep ^foobar file.txt > tmp2.txt
cat tmp1.txt tmp2.txt > newfile.txt
rm tmp1.txt tmp2.txt
The -v option returns all the lines which do not match the given pattern. The ^ marks the beginning of a line, so ^foobar matches lines beginning with foobar.
grep -v ^foobar file.txt > file1.txt
grep ^foobar file.txt > file2.txt
cat file2.txt >> file1.txt
grep -v ^foobar file.txt >newfile.txt
grep ^foobar file.txt >>newfile.txt
No need for a temporary file this way.
You can also do:
vim file.txt -c 'g/^foobar/m$' -c 'wq'
The -c switch means an Ex command follows; the g command operates on all lines containing the given pattern, and the action here is m$, which means “move to end of file” (it preserves order). wq means “save and exit vim”.
If this is too slow, you can also prevent vim from reading your vimrc:
vim -u NONE file.txt -c 'g/^foobar/m$' -c 'wq'
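If you would rather not launch vim at all, the same global-move idiom works in plain ed (a sketch, assuming a POSIX ed; -s suppresses the byte counts ed normally prints):
printf 'g/^foobar/m$\nw\nq\n' | ed -s file.txt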
