How can I find which lines in a certain file are not started by lines from another file using bash?

I have two text files, A and B:
A:
a start
b stop
c start
e start
B:
b
c
How can I find which lines in A do not start with a line from B, using a shell (bash) command? In this case, I want to get this answer:
a start
e start
Can I implement this using a single line of command?

This should do:
sed '/^$/d;s/^/^/' B | grep -vf - A
The sed command takes all non-empty lines from file B (that's the /^$/d command), prepends a caret ^ to each line (so each becomes an anchored regexp for grep), and writes the result to stdout. Then grep reads those patterns from stdin (the -f option means "take patterns from a file", and - stands for stdin here) and performs an inverted match (the -v option) on file A. Done.
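For example, reproducing the question's two files in a scratch directory (the directory is only so no real files get touched):

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"

# Recreate the question's input files
printf '%s\n' 'a start' 'b stop' 'c start' 'e start' > A
printf '%s\n' 'b' 'c' > B

# Anchor each line of B with ^, then invert-match against A
sed '/^$/d;s/^/^/' B | grep -vf - A
# a start
# e start
```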

I think this should do it:
sed 's/^/\^/g' B > C.tmp
grep -vEf C.tmp A
rm C.tmp

You can try using a combination of xargs, cat, and grep
Save the first letters of each line into FIRSTLETTERLIST. You can do this with some cat and sed work.
The idea is to take the blacklist and then match it against the interesting file.
cat file1.txt | xargs grep ^[^[$FIRSTLETTERLIST]]
This is untested, so I won't guarantee it will work, but it should point you in the right direction.

Linux - Delete all lines from a given line number

I am trying to delete a file's contents from a supplied line number using sed. The problem is that sed isn't accepting the variable I supply to it
line_num=$(grep -n "debited" file.csv | sed -n '2 s/:.*//p') && sed -i.bak "$line_num,$d" file.csv
The idea is to delete all lines from a file after and including the second occurrence of the pattern.
I'm not stubborn with sed. Awk & perl could do too.
It seems you want to delete the rest of the file after the second occurrence of a pattern (debited), including that line.
Then you can truncate it, using tell for the length of what's been read up to that line:
perl -e'while (<>) {
if ( ($cnt += /debited/) == 2 ) { truncate $ARGV, $len; exit }
$len = tell;
}' file
Here the $ARGV variable has the "current" file (when reading from <>). Feel free to introduce a variable with the pattern instead of the literal (debited), based on your context.
This can be made to look far nicer in a little script but it seems that a command-line program ("one-liner") is needed in the question.
I always suggest ed for editing files over trying to use sed to do it; a program intended from the beginning to work with a file instead of a stream of lines just works better for most tasks.
The idea is to delete all lines from a file after & including the second occurrence of the pattern
Example:
$ cat demo.txt
a
b
c
debited 12
d
e
debited 14
f
g
h
$ printf "%s\n" '/debited/;//,$d' w | ed -s demo.txt
$ cat demo.txt
a
b
c
debited 12
d
e
The ed command /pattern/;//,$d first sets the current line cursor to the first one that matches the basic regular expression pattern, then moves it to the next match of the pattern and deletes everything from there to the end of the file. Then w writes the changed file back to disk.
You're doing lots of unnecessary steps; this will do what you want:
$ awk '/debited/{c++} c==2{exit}1' file
This deletes the second occurrence of the pattern and everything after it.
To replace the original file (and create a backup):
$ awk ... file > t && mv -b --suffix=.bak t file
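For instance, on a file like the demo.txt shown in the ed answer above (recreated here as a sketch), the awk one-liner keeps everything before the second match:

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c 'debited 12' d e 'debited 14' f g h > demo.txt

# Count matches; quit at the second match before printing that line
awk '/debited/{c++} c==2{exit}1' demo.txt
# a
# b
# c
# debited 12
# d
# e
```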

how to print last few lines when a pattern is matched using sed?

I want to print the last few lines when a pattern is matched in a file, using sed.
If the file has the following entries:
This is the first line.
This is the second line.
This is the third line.
This is the forth line.
This is the Last line.
So, search for the pattern "Last" and print the last few lines.
Find 'Last' using sed and pipe it to the tail command, which prints the last n lines of a file. -n specifies the number of lines to read from the end of the file; here I am reading the last 2 lines:
sed '/Last/ p' yourfile.txt|tail -n 2
For more info on tail use man tail.
Also, the | symbol here is known as a pipe (an unnamed pipe), which helps with inter-process communication. So, in simple words, sed feeds data to the tail command through the pipe.
I assume you mean "find the pattern and also print the previous few lines". grep is your friend: to print the previous 3 lines:
$ grep -B 3 "Last" file
This is the second line.
This is the third line.
This is the forth line.
This is the Last line.
-B n for "before". There's also -A n ("after"), and -C n ("context", both before and after).
This might work for you (GNU sed):
sed ':a;$!{N;s/\n/&/2;Ta};/Last/P;D' file
This will print the line containing Last and the two previous lines.
N.B. This will only print the lines before the match once. Also, more lines can be shown by changing the 2 to however many lines you want.
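To see the sliding window in action, here is the command run on the question's sample file (recreated here; GNU sed assumed, as the answer notes):

```shell
cd "$(mktemp -d)"
printf '%s\n' 'This is the first line.' 'This is the second line.' \
    'This is the third line.' 'This is the forth line.' \
    'This is the Last line.' > file

# Keep a rolling 3-line window in pattern space;
# print the window's first line whenever the window contains "Last"
sed ':a;$!{N;s/\n/&/2;Ta};/Last/P;D' file
# This is the third line.
# This is the forth line.
# This is the Last line.
```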

print the duplicate lines using the sed command?

I am trying to print the duplicate lines in a file using the sed command.
In a file I have the following contents:
hi
hello
hi
how
hello
How can I print the duplicate lines in this file using the sed command?
Example: the output should be:
hi
hello
Not sure why it has to be in sed when you can use the uniq binary. Anywho, the file needs to be sorted so we have to do that first.
Using uniq and my preferred way:
$ sort file | uniq -d
hello
hi
Using GNU sed:
$ sort file | sed '$!N; s/^\(.*\)\n\1$/\1/; t; D'
hello
hi
We read the next line from input with the N command which appends the next line to pattern space separated by "\n" character.
$! prevents it from doing so on the last line.
The substitution replaces two repeating strings with one.
The t command takes the script to the end where the current pattern space gets printed automatically.
If the substitution was not successful, D executes, deleting the non-repeated string.
The cycle continues and this way only the duplicate lines get printed once.
You can use process substitution if you please by doing <(sort file) to remove pipes.
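Both variants give the same result on the question's sample file; here is the uniq route as a quick check (scratch file for illustration):

```shell
cd "$(mktemp -d)"
printf '%s\n' hi hello hi how hello > file

# Sort so duplicates are adjacent, then print one copy of each duplicated line
sort file | uniq -d
# hello
# hi
```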
Try something like:
sort file.txt | uniq -d
Sort the file and then print duplicate lines. If you wish to ignore the case then use -i option in uniq command.

Highlight text similar to grep, but don't filter out text [duplicate]

When using grep, it will highlight any text in a line with a match to your regular expression.
What if I want this behaviour, but have grep print out all lines as well? I came up empty after a quick look through the grep man page.
Use ack. Check out its --passthru option. It has the added benefit of allowing full Perl regular expressions.
$ ack --passthru 'pattern1' file_name
$ command_here | ack --passthru 'pattern1'
You can also do it using grep like this:
$ grep --color -E '^|pattern1|pattern2' file_name
$ command_here | grep --color -E '^|pattern1|pattern2'
This will match all lines and highlight the patterns. The ^ matches every start of line, but won't get printed/highlighted since it matches zero characters.
(Note that most of the setups will use --color by default. You may not need that flag).
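A quick way to convince yourself that no lines are filtered out (the pattern apple and the sample text here are made up):

```shell
cd "$(mktemp -d)"
printf '%s\n' 'one apple' 'two pears' 'three apples' > file

# ^ matches every line, so the line count is unchanged;
# only the real "apple" matches get coloured
grep --color=always -E '^|apple' file | wc -l
# 3
```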
You can make sure that all lines match, but that there is nothing to highlight on irrelevant matches:
egrep --color 'apple|' test.txt
Notes:
egrep may be spelled also grep -E
--color is usually default in most distributions
some variants of grep will "optimize" the empty match, so you might want to use "apple|$" instead (see: https://stackoverflow.com/a/13979036/939457)
EDIT:
This works with OS X Mountain Lion's grep:
grep --color -E 'pattern1|pattern2|$'
This is better than '^|pattern1|pattern2' because the ^ part of the alternation matches at the beginning of the line whereas the $ matches at the end of the line. Some regular expression engines won't highlight pattern1 or pattern2 because ^ already matched and the engine is eager.
Something similar happens for 'pattern1|pattern2|' because the regex engine notices the empty alternation at the end of the pattern string matches the beginning of the subject string.
FIRST EDIT:
I ended up using perl:
perl -pe 's:pattern:\033[31;1m$&\033[30;0m:g'
This assumes you have an ANSI-compatible terminal.
ORIGINAL ANSWER:
If you're stuck with a strange grep, this might work:
grep -E --color=always -A500 -B500 'pattern1|pattern2' | grep -v '^--'
Adjust the numbers to get all the lines you want.
The second grep just removes extraneous -- lines inserted by the BSD-style grep on Mac OS X Mountain Lion, even when the context of consecutive matches overlap.
I thought GNU grep omitted the -- lines when context overlaps, but it's been a while, so maybe I remember wrong.
You can use my highlight script from https://github.com/kepkin/dev-shell-essentials
It's better than grep because you can highlight each match with its own color.
$ command_here | highlight green "input" | highlight red "output"
Since you want matches highlighted, this is probably for human consumption (as opposed to piping to another program for instance), so a nice solution would be to use:
less -p <your-pattern> <your-file>
And if you don't care about case sensitivity:
less -i -p <your-pattern> <your-file>
This also has the advantage of pagination, which is nice when having to go through a long output.
You can do it using only grep by:
reading the file line by line
matching a pattern in each line and highlighting pattern by grep
if there is no match, echo the line as is
which gives you the following:
while read -r line ; do (echo "$line" | grep PATTERN) || echo "$line" ; done < inputfile
If you want to print "all" lines, there is a simple working solution:
grep "test" -A 9999999 -B 9999999
A => After
B => Before
If you are doing this because you want more context in your search, you can do this:
cat BIG_FILE.txt | less
Doing a search in less should highlight your search terms.
Or pipe the output to your favorite editor. One example:
cat BIG_FILE.txt | vim -
Then search/highlight/replace.
If you are looking for a pattern in a directory recursively, you can first save the listing to a file:
ls -1R ./ > list-of-files.txt
And then grep that file, or pipe the listing straight into grep:
ls -1R | grep --color -E '[A-Z]|'
This will list all files, but colour the ones with uppercase letters. If you remove the trailing | you will only see the matches.
I use this to find images named badly with upper case, for example. Normal grep does not show the path for each file (just once per directory), so this way I can see the context.
Maybe this is an XY problem, and what you are really trying to do is to highlight occurrences of words as they appear in your shell. If so, you may be able to use your terminal emulator for this. For instance, in Konsole, start Find (ctrl+shift+F) and type your word. The word will then be highlighted whenever it occurs in new or existing output until you cancel the function.

Diff-ing files with Linux command

What Linux command allows me to check whether all the lines in file A exist in file B? (It's almost like a diff, but not quite.) Also, file A has unique lines, as does file B.
The comm command compares two sorted files, line by line, and is part of GNU coreutils.
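For example, with both files sorted, comm -23 suppresses lines unique to the second file and lines common to both, leaving only lines unique to the first. Empty output then means every line of A exists in B (file names and contents here are illustrative):

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c > A
printf '%s\n' a b c d > B

# comm needs sorted input; -23 keeps only lines unique to A
if [ -z "$(comm -23 A B)" ]; then
    echo "every line of A is in B"
fi
# every line of A is in B
```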
Are you looking for a better diff tool?
https://stackoverflow.com/questions/12625/best-diff-tool
So, what if A has
a
a
b
and B has
a
b
What would you want the output to be (yes or no)?
Use the diff command.
if cat A A B | sort | uniq -c | egrep -e '^[[:space:]]*2[[:space:]]' > /dev/null; then
echo "A has lines that are not in B."
fi
If you do not redirect the output, you will get a list of all the lines that are in A but not in B (except each line will have a 2 in front of it). This relies on the lines in A being unique, and the lines in B being unique.
If they aren't, and you don't care about counting duplicates, it's relatively simple to transform each file into a list of unique lines using sort and uniq.
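A minimal sketch of the counting trick: A's lines are fed in twice, so a line in both A and B shows up with count 3, while a line only in A shows count 2 (file names and contents are illustrative; awk replaces the egrep filter here just to strip the counts):

```shell
cd "$(mktemp -d)"
printf '%s\n' a b c > A
printf '%s\n' a b > B

# Lines with count exactly 2 occur in A but not in B
cat A A B | sort | uniq -c | awk '$1 == 2 { print $2 }'
# c
```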