Linux: merge multiple files, but skip lines starting with '#'

I have a list of 10 files that I want to merge into one file.
file1.txt
file2.txt
...
file10.txt
I normally do this with cat
cat file*.txt > merged_file.txt
However, I don't want the lines starting with '#' to be included in the merged_file.txt. How do I do this?

Something like this:
cat file*.txt | egrep -v '^#.*$' > merged_file.txt
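As a small simplification, grep can read the files directly, so the cat isn't needed; the -h flag suppresses the filename prefixes grep adds when given multiple files, and the trailing .*$ in the pattern is redundant:
grep -hv '^#' file*.txt > merged_file.txt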

Related

cat command merging two *.txt with missing columns mac osx

I'd like to use the cat command to join several *.txt files under Mac OS X.
My first file, file1.txt, looks like:
a;b;c;d
1;2;3;4
second file2.txt:
a;b
5;6
7;8
what I want:
a;b;c;d
1;2;3;4
5;6;;
7;8;;
My question: can I skip the header from the second file in the output file? And how does cat deal with the missing columns? Does it write NaNs?
Maybe this command could do it?
head -1 file1.txt > all.txt;
tail -n +2 -q file*.txt >> all.txt
I don't think the cat command alone will deal with removing the headers or marking any missing columns, since all it does is concatenate files. But if you know the highest possible number of columns, you can do something like this:
cat file1.txt <( tail -n+2 file2.txt ) | gawk -F';' -v OFS=';' '{NF=4}1'
Where NF=4 is the highest number of columns (in your example, 4).
The command above concatenates file1.txt with a header-less version of file2.txt, using process substitution (the <( ) operator) to feed a command's output as if it were a file. You can use <( ) as many times as you want, once for each file you want to concatenate. The final command, gawk, was adapted from another answer and pads out the column delimiters for you.
(note: use brew install gawk if gawk isn't found; Mac OS X's awk won't work)
If not having the first header doesn't bother you and you don't want to use cat, you could do:
gawk -F';' -v OFS=';' '{NF=4}1' file*.txt | egrep -v '^a;b'
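If you want to keep the first file's header but drop the headers of every following file, a gawk-only variant could look like this (a sketch, again assuming at most 4 semicolon-separated columns):
gawk -F';' -v OFS=';' 'FNR==1 && NR!=1 {next} {NF=4; print}' file1.txt file2.txt > all.txt
Here FNR==1 && NR!=1 skips the first line of every file except the first, and NF=4 pads short records with empty fields.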

how to copy lines 10 to 15 of a file into another file, in unix?

I want to copy lines 10 to 15 of a file into another file in Unix.
I have two files, file1.txt and file2.txt.
I want to copy lines 10 to 15 from file1.txt to file2.txt.
Open a terminal with a shell, then run:
sed -n '10,15p' file1.txt > file2.txt
Simple & easy.
If you want to append to the end instead of wiping file2.txt, use >> for redirection.
sed -n '10,15p' file1.txt >> file2.txt
AWK is also a powerful command line text manipulator:
awk 'NR>=10 && NR<=15' file1.txt > file2.txt
To complement the previous answers, you can use one of the following three solutions.
sed
Print only the lines in the range and redirect it to the output file
sed -n '10,15p' file1.txt > file2.txt
head/tail combination
Use head and tail to cut the file and to get only the range you need before redirecting the output to a file
head -n 15 file1.txt | tail -n 6 > file2.txt
awk
Print only the lines in the range and redirect it to the output file
awk 'NR>=10 && NR<=15' file1.txt > file2.txt
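For very large files, you can also tell sed to quit once the last line of the range has been printed, so it doesn't scan the rest of the file; same output, just less work on big inputs:
sed -n '10,15p;15q' file1.txt > file2.txt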

Linux diff get only line number in the output

I want to use the Linux diff command to get the following output:
2,4c2,4
I only want to know the line numbers where the files are different. I don't want the actual lines on the console.
Eg:
If I will execute the following command:
diff file1.txt file2.txt
I would like the following output:
2,4c2,4
I don't want the output:
2,4c2,4
< I need to run the laundry.
< I need to wash the dog.
< I need to get the car detailed.
---
> I need to do the laundry.
> I need to wash the car.
> I need to get the dog detailed.
I went through the manual of the diff command but wasn't able to find any option that would let me achieve this.
Pipe it to grep and only show lines beginning with numbers.
diff file1.txt file2.txt | grep '^[1-9]'
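If you prefer an explicit match on the shape of diff's hunk headers (two line ranges separated by a, c, or d), an extended regex does it; a sketch:
diff file1.txt file2.txt | grep -E '^[0-9]+(,[0-9]+)?[acd][0-9]+(,[0-9]+)?$'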
Pass the -f flag.
-sh-4.1$ cat file1.txt
I need to run the laundry.
I need to wash the dog.
difdferen line
I need to get the car detailed.
-sh-4.1$ cat file2.txt
I need to run the laundry.
I need to wash the dog.
I need to get the car detailed.
-sh-4.1$ diff -f file1.txt file2.txt
d3
Edited as per @Barmar's comment: to make it work on changed lines as well, you can filter out the content lines by asking for the inverse of lines that start with "<", ">", or "-".
First : plain diff :
-sh-4.1$ diff file*
3d2
< difdferen line
4a4
> different line in file2
-sh-4.1$
With grep to filter out the lines that start with <, >, or -:
-sh-4.1$ diff file* | egrep -v "^<|^> |^-"
3,4d2
5a4
3d2
4a4
Simplified version suggested by @Barmar:
-sh-4.1$ diff file1.txt file2.txt | egrep -v "^[-<>]"
3,4d2
5a4

paste linux script

I have a small question and would appreciate your help with it.
I need to merge different text files together using the paste command, as in:
paste -d, ~/Desktop/*.txt > ~/Desktop/Out/merge.txt
However, the files got merged out of order (the text files are numbered 1, 2, 3, etc.).
I am using *.txt since a different number of files exists for different scenarios.
Would you mind helping me with it, please?
If you use modern bash you can write:
paste -d, ~/Desktop/{1..10}.txt > ~/Desktop/Out/merge.txt
If not, you must use something like:
paste -d, $(seq 1 10 | sed "s#.*#$HOME/Desktop/&.txt#") > ~/Desktop/Out/merge.txt
If you don't know which files you have in the directory,
you can list and sort them:
cd ~/Desktop/
paste -d, $(ls -1d *.txt| sort -n) > ~/Desktop/Out/merge.txt
Example:
$ touch {1..20}.txt
$ echo $(ls -1 | sort -n)
1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt 10.txt 11.txt 12.txt 13.txt 14.txt 15.txt 16.txt 17.txt 18.txt 19.txt 20.txt
Example2:
$ echo hello > 1.txt
$ echo dear > 5.txt
$ echo friend > 11.txt
$ paste -d, $(ls -1d *.txt| sort -n)
hello,dear,friend
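If your files are named with a common prefix instead of bare numbers (file1.txt ... file10.txt), GNU sort's version sort keeps the numeric order as well (assuming GNU coreutils is installed):
paste -d, $(ls -1d *.txt | sort -V) > ~/Desktop/Out/merge.txt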
Here's a rather long way of doing the same but in one line.
paste -d, $(ls ~/Desktop/*.txt | awk -F/ '{print $NF"/"$0}' | sort -n | cut -d/ -f2-) > ~/Desktop/merge.txt
I like one liners :-)
paste -d, $(ls ~/Desktop/*.txt) > ~/Desktop/Out/merge.txt
The * is being replaced by an alphabetically sorted list of filenames of your directory.
3.5.8 Filename Expansion
Bash scans each word for the characters ‘*’, ‘?’, and ‘[’. If one of these characters appears, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of file names matching the pattern.
So the filenaming does not have to be consecutive ;)

Comparing two files in linux terminal

There are two files called "a.txt" and "b.txt", and both have a list of words. Now I want to check which words are extra in "a.txt" and are not in "b.txt".
I need an efficient algorithm, as I need to compare two dictionaries.
If you have vim installed, try this:
vimdiff file1 file2
or
vim -d file1 file2
You will find it fantastic.
Sort them and use comm:
comm -23 <(sort a.txt) <(sort b.txt)
comm compares (sorted) input files and by default outputs three columns: lines that are unique to a, lines that are unique to b, and lines that are present in both. By specifying -1, -2 and/or -3 you can suppress the corresponding output. Therefore comm -23 a b lists only the entries that are unique to a. I use the <(...) syntax to sort the files on the fly, if they are already sorted you don't need this.
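For example, to list the words the two dictionaries have in common instead, suppress columns 1 and 2:
comm -12 <(sort a.txt) <(sort b.txt)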
If you prefer the diff output style from git diff, you can use it with the --no-index flag to compare files not in a git repository:
git diff --no-index a.txt b.txt
Using a couple of files with around 200k file name strings in each, I benchmarked (with the built-in time command) this approach vs some of the other answers here:
git diff --no-index a.txt b.txt
# ~1.2s
comm -23 <(sort a.txt) <(sort b.txt)
# ~0.2s
diff a.txt b.txt
# ~2.6s
sdiff a.txt b.txt
# ~2.7s
vimdiff a.txt b.txt
# ~3.2s
comm seems to be the fastest by far, while git diff --no-index appears to be the fastest approach for diff-style output.
Update 2018-03-25: You can actually omit the --no-index flag unless you are inside a git repository and want to compare untracked files within that repository. From the man pages:
This form is to compare the given two paths on the filesystem. You can omit the --no-index option when running the command in a working tree controlled by Git and at least one of the paths points outside the working tree, or when running the command outside a working tree controlled by Git.
Try sdiff (man sdiff)
sdiff -s file1 file2
You can use the diff tool in Linux to compare two files, and its --changed-group-format and --unchanged-group-format options to filter the required data.
The following three format specifiers select which lines are emitted for each group:
'%<' gets lines from FILE1
'%>' gets lines from FILE2
'' (empty string) removes lines from both files.
E.g.: diff --changed-group-format='%<' --unchanged-group-format='' file1.txt file2.txt
[root@vmoracle11 tmp]# cat file1.txt
test one
test two
test three
test four
test eight
[root@vmoracle11 tmp]# cat file2.txt
test one
test three
test nine
[root@vmoracle11 tmp]# diff --changed-group-format='%<' --unchanged-group-format='' file1.txt file2.txt
test two
test four
test eight
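To get the lines that exist only in file2.txt instead, swap in the '%>' specifier for the changed groups; with the files above this prints just "test nine":
diff --changed-group-format='%>' --unchanged-group-format='' file1.txt file2.txt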
You can also use colordiff, which displays the output of diff with colors.
About vimdiff: it also allows you to compare files over SSH, for example:
vimdiff /var/log/secure scp://192.168.1.25/var/log/secure
Extracted from: http://www.sysadmit.com/2016/05/linux-diferencias-entre-dos-archivos.html
Also, do not forget about mcdiff, the internal diff viewer of GNU Midnight Commander.
For example:
mcdiff file1 file2
Enjoy!
Use comm -13 (requires sorted files):
$ cat file1
one
two
three
$ cat file2
one
two
three
four
$ comm -13 <(sort file1) <(sort file2)
four
You can also use:
sdiff file1 file2
To display differences side by side within your terminal!
diff a.txt b.txt | grep '^<'
You can then pipe to cut for a clean output:
diff a.txt b.txt | grep '^<' | cut -c 3-
Here is my solution for this:
mkdir ~/temp
mkdir ~/results
cp /usr/share/dict/american-english ~/temp/american-english-dictionary
cp /usr/share/dict/british-english ~/temp/british-english-dictionary
cat ~/temp/american-english-dictionary | wc -l > ~/results/count-american-english-dictionary
cat ~/temp/british-english-dictionary | wc -l > ~/results/count-british-english-dictionary
grep -Fxf ~/temp/american-english-dictionary ~/temp/british-english-dictionary > ~/results/common-english
grep -Fxvf ~/results/common-english ~/temp/american-english-dictionary > ~/results/unique-american-english
grep -Fxvf ~/results/common-english ~/temp/british-english-dictionary > ~/results/unique-british-english
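Incidentally, the same grep flags answer the original question in a single step, using b.txt's lines as fixed, whole-line patterns (the output file name here is just an example):
grep -Fxvf b.txt a.txt > unique-to-a.txt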
Using awk for it. Test files:
$ cat a.txt
one
two
three
four
four
$ cat b.txt
three
two
one
The awk:
$ awk '
NR==FNR { # process b.txt or the first file
seen[$0] # hash words to hash seen
next # next word in b.txt
} # process a.txt or all files after the first
!($0 in seen)' b.txt a.txt # if word is not hashed to seen, output it
Duplicates are output:
four
four
To avoid duplicates, add each newly encountered word in a.txt to the seen hash:
$ awk '
NR==FNR {
seen[$0]
next
}
!($0 in seen) { # if word is not hashed to seen
seen[$0] # hash unseen a.txt words to seen to avoid duplicates
print # and output it
}' b.txt a.txt
Output:
four
If the word lists are comma-separated, like:
$ cat a.txt
four,four,three,three,two,one
five,six
$ cat b.txt
one,two,three
you have to do a couple of extra laps (for loops):
awk -F, ' # comma-separated input
NR==FNR {
for(i=1;i<=NF;i++) # loop all comma-separated fields
seen[$i]
next
}
{
for(i=1;i<=NF;i++)
if(!($i in seen)) {
seen[$i] # this time we buffer output (below):
buffer=buffer (buffer==""?"":",") $i
}
if(buffer!="") { # output non-empty buffers after each record in a.txt
print buffer
buffer=""
}
}' b.txt a.txt
Output this time:
four
five,six
