The diff command for files with empty content - linux

If I want to get the difference between the 2 directories, I use the command below:
diff -aruN dir1/ dir2/ > dir.patch
so the dir.patch file should comprise all differences I want, right?
But if dir2/ contains a file with empty content, and that file does not exist in dir1/, for example,
dir1/
dir2/empty_content_file.txt ------ with empty content.
Then the diff command will not generate any patch entry for empty_content_file.txt, even though it is a needed file.
Is there any trick or alternative way to do this?
Thank you in advance.

That's because you're using the -N option, which tells diff to treat an absent file as empty, so a missing file and an empty file compare as identical. man diff says:
-N, --new-file
treat absent file as empty

Without -N, the "diff -aru" command reports a file that does not exist in the first directory with an "Only in xxx" message.
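The behaviour described above can be reproduced with a small sketch. The scratch directory and its layout are assumptions made up to mirror the question; only the diff options come from the thread.

```shell
# Hypothetical scratch layout mirroring the question:
# dir1 is empty, dir2 holds one zero-byte file.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
: > "$work/dir2/empty_content_file.txt"

# With -N the absent file is treated as empty, so the two sides compare equal.
# diff exits non-zero when it finds differences, hence the "|| true".
with_N=$(cd "$work" && LC_ALL=C diff -aruN dir1/ dir2/ || true)

# Without -N the file is reported as existing on one side only.
without_N=$(cd "$work" && LC_ALL=C diff -aru dir1/ dir2/ || true)

printf 'with -N:    [%s]\n' "$with_N"
printf 'without -N: [%s]\n' "$without_N"
```

The "Only in" line is a notice rather than a patch hunk, so it tells you which empty files need to be created by hand but will not be applied by patch.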


Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position on a subset of the samples. I have tried using bcftools to no avail. (If anyone can identify what went wrong with that I would also really appreciate it. I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz), but it returns the following error.)
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" is not recognized' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option like this -O something. Based on the error message you are getting it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that, bcftools will create the output file.
I don't have much experience with bcftools, but in general, if you want to use awk on a gzipped file, you can pipe the decompressed stream into it so that the file is only decompressed as it is read. You can also pipe the result straight through gzip so that the output is compressed too, e.g.
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Also, zcat is equivalent to gzip -cd: -c writes to standard output and -d decompresses.
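As a minimal sketch of that streaming pipeline, the example below builds a tiny stand-in for the large file and filters it without ever writing an uncompressed copy to disk. The sample records and the output name subset.vcf.gz are invented for illustration. The region is the one from the question, and the awk filter assumes tab-separated VCF columns with the chromosome in column 1 and the position in column 2.

```shell
work=$(mktemp -d)

# Tiny stand-in for a large VCF: header lines start with '#'.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\nchr9\t55252805\trs1\nchr1\t100\trs2\n' \
  | gzip -c > "$work/largeFile.vcf.gz"

# Decompress as a stream, keep header lines plus chr9 records inside the
# region of interest, and recompress the result on the fly.
gzip -cd "$work/largeFile.vcf.gz" \
  | awk -F'\t' '/^#/ || ($1 == "chr9" && $2 >= 55252802 && $2 <= 55252810)' \
  | gzip -c > "$work/subset.vcf.gz"

# Inspect the result, again without creating an uncompressed file.
gzip -cd "$work/subset.vcf.gz"
```

This still reads the whole compressed input once, so it saves disk space rather than time.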
As a side note, if you only need to work on part of a large file, you may also find the excellent tool less useful. It can view a large file while loading only the parts it needs. The -S option is particularly useful for wide formats with many columns because it stops line wrapping, and -N shows line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.

How to open a "-" dashed filename using terminal?

I tried gedit, nano, vi, leafpad and other text editors, and it won't open. I tried cat and other file-viewing commands, and I assure you it's a file, not a directory!
This causes a lot of confusion because a bare - argument conventionally refers to STDIN/STDOUT, i.e. /dev/stdin or /dev/stdout. So if you want to open a file with this name, you have to specify its location explicitly, such as ./-. For example, if you want to see what is in that file, use cat ./-
Both cat < - and cat ./- will print the file's contents.
you can use redirection
cat < -file_name
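Both approaches can be tried safely in a scratch directory. This is only a sketch: the directory and the file contents are invented for illustration.

```shell
work=$(mktemp -d)

# Create a file whose name is literally "-".
printf 'secret contents\n' > "$work/-"

# Explicit path: cat receives "<dir>/-", not a bare "-", so it opens the file.
from_path=$(cat "$work/-")

# Redirection: the shell opens the file named "-" and cat simply reads stdin.
from_redirect=$(cd "$work" && cat < -)

printf '%s\n' "$from_path" "$from_redirect"
```

Note that `cat -- -` does not help here: `--` only ends option parsing, and cat still reads the remaining `-` operand as standard input.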
It looks like the rev command doesn't treat - as a special character.
From the man page
The rev utility copies the specified files to standard output, reversing the order of characters in every line.
so
rev - | rev
should show what's in the file in the correct order.
I tried the pico and vi commands. Running pico readme let me open the file in an editor and read the contents.
cat ./- is the syntax that reveals the correct password for bandit; "rev -" reveals something else.

Recursive grep with include giving incorrect results for current folder

I have created a test directory structure:
t1.html
t2.php
b/t1.html
b/t2.php
c/t1.html
c/t2.php
All files contain the string "HELLO".
The following commands are run from the root folder above:
> grep -r "HELLO" *
b/t1.html:HELLO
b/t2.php:HELLO
c/t1.html:HELLO
c/t2.php:HELLO
t1.html:HELLO
t2.php:HELLO
> grep -r --include=*.html "HELLO" *
b/t1.html:HELLO
c/t1.html:HELLO
t2.php:HELLO
Why is it including the correct .html files from the sub-directories, but the .php file from the current directory?
If I pop up a level to the directory above my whole structure, then it gives following result:
grep -r --include=*.html "HELLO" *
a/t1.html:HELLO
a/c/t1.html:HELLO
a/b/t1.html:HELLO
This is what I expected when running it from within my structure.
I assume I can achieve the goal using find+grep together, but I thought this was valid usage of grep.
Thanks for any help.
Andy
Use a dot instead of the asterisk:
grep -r HELLO .
Asterisk gets evaluated by the shell and replaced with the list of all the files in the current directory (whose names don't start with a dot). All of them are then grepped recursively.
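A minimal sketch of the dot-based form combined with the --include filter from the question, run against a throwaway copy of the test layout (the scratch directory is an assumption for illustration). The glob is quoted so that the shell hands it to grep untouched.

```shell
work=$(mktemp -d)
mkdir "$work/b" "$work/c"

# Recreate the question's layout: an .html and a .php file at each level.
for d in . b c; do
  echo HELLO > "$work/$d/t1.html"
  echo HELLO > "$work/$d/t2.php"
done

# Start from "." so grep walks the tree itself and applies --include to
# every file it finds, instead of receiving a shell-expanded file list.
matches=$(cd "$work" && grep -r --include='*.html' HELLO . | sort)
printf '%s\n' "$matches"
```

With this form only the three .html files should be reported, including the one in the current directory.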

How to find (grep) text for files in a perforce changelist?

How to grep/find for a particular text in all files within a pending changelist?
My use case:
I have a debug_flag in my code and want to make sure I do not check in any code that uses it, since that would cause a compiler error for everyone else. (Not for me, since I have the debug_flag declared locally.)
p4 describe changelist# gives you the list of the files in a changelist, but it includes some extra information, and the paths are relative to the depot. Example:
p4 describe 12334
output:
Change 12334 by me on 2014/01/04 00:57:08 pending
Some test changelist
Affected files ...
... //depot/path/to/my/files/file1#15 edit
... //depot/path/to/my/files/file2#12 edit
With a few search/replace operations or a simple Perl script, you can turn this text output into a list of files with their actual paths and then run grep on them:
xargs grep "debug_flag" < file_list.txt
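The search/replace step might look like the sketch below, which uses sed rather than Perl. It works on the sample output shown above, pasted into a variable; in practice the text would come from the real p4 describe command. The regular expression is an assumption based only on that sample, where each file line is "... //depot/path#rev action".

```shell
# Sample text copied from the p4 describe output above.
describe_output='Change 12334 by me on 2014/01/04 00:57:08 pending
Some test changelist
Affected files ...
... //depot/path/to/my/files/file1#15 edit
... //depot/path/to/my/files/file2#12 edit'

# Keep only the "... //depot/...#rev action" lines and strip both the
# leading "... " and the trailing "#rev action".
depot_paths=$(printf '%s\n' "$describe_output" \
  | sed -n 's|^\.\.\. \(//.*\)#[0-9][0-9]* [a-z/]*$|\1|p')

printf '%s\n' "$depot_paths"
```

The result is still a list of depot paths. Mapping each one to a path in your workspace (for example with p4 where) is left out here and would be needed before passing the list to xargs grep.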

Manually merge two files using diff

I'd like to merge two files by doing the following:
Output the diff of the two files into a temp file and
Manually select the lines I want to copy/save.
The problem here is that diff -u only gives me a few lines of context, while I want to output the entire file in a unified format.
Is there any way diff can do this?
One option that might fit the bill for you,
sdiff : side-by-side diff of files.
sdiff -o merged.file left.file right.file
Once started, it prompts you for which lines to keep from which file. Type ? and press Enter for brief help. See man sdiff for the details.
(In my distro, these come packaged in the "diffutils" package [fedora,centos])
If you need to automate the process, you might want to try the util merge, which will mark conflicts in the files. However, that might put you back at square one.
"I want to output the entire file in a unified format. Is there any way diff can do this?"
Yes.
diff -U 9999999 file1.txt file2.txt > diff.txt
This should work, provided your files are less than 10 million lines long.
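To see the effect of the large context value, here is a rough sketch comparing it with plain -u on two generated 50-line files. The file contents are invented for illustration.

```shell
work=$(mktemp -d)

# Two 50-line files that differ only on line 25.
seq 1 50 > "$work/file1.txt"
seq 1 50 | sed '25s/.*/changed/' > "$work/file2.txt"

# diff exits non-zero when the files differ, hence the "|| true".
default_diff=$(diff -u "$work/file1.txt" "$work/file2.txt" || true)
full_diff=$(diff -U 9999999 "$work/file1.txt" "$work/file2.txt" || true)

# The default output stops a few lines either side of the change, while
# the large-context output carries every unchanged line as context.
printf '%s\n' "$default_diff" | wc -l
printf '%s\n' "$full_diff" | wc -l
```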
You can merge/combine the two files with diff using:
diff --line-format %L file1 file2
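A small sketch of what this produces, using two invented two-line files: %L prints each line's contents, so GNU diff emits every common, old, and new line with no markers around them.

```shell
work=$(mktemp -d)
printf 'common\nold line\n' > "$work/file1"
printf 'common\nnew line\n' > "$work/file2"

# Every input line is formatted with %L (the line itself), so the output
# is a plain union of both files. "|| true" absorbs diff's non-zero exit
# status when the files differ.
combined=$(diff --line-format %L "$work/file1" "$work/file2" || true)
printf '%s\n' "$combined"
```

Because nothing marks which file a line came from, this suits cases where you simply want every line from both files kept in order.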
The easy answer is to use the -D flag to merge the files and surround the differences with C style #ifdef statements.
From the documentation:
-D NAME --ifdef=NAME
Output merged file to show `#ifdef NAME' diffs.
You can use it as follows:
$ diff -D NEWSTUFF file1 file2 > merged_file
I usually then just open the merged file in an editor and resolve the merge conflicts by hand.
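As a sketch of that workflow, the example below runs it on two invented files. NEWSTUFF is the macro name from the command above, and the merged file ends up holding both versions of each differing region wrapped in preprocessor conditionals.

```shell
work=$(mktemp -d)
printf 'common\nold line\n' > "$work/file1"
printf 'common\nnew line\n' > "$work/file2"

# diff exits non-zero when the files differ, hence the "|| true".
diff -D NEWSTUFF "$work/file1" "$work/file2" > "$work/merged_file" || true

# Lines shared by both files appear once; differing lines appear inside
# conditional blocks keyed on NEWSTUFF.
cat "$work/merged_file"
```

Resolving the merge then means deleting the conditional lines and whichever side of each block you do not want.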
You also can use options to output an ed script, etc.
If you are an emacs user, you can do this directly in emacs using the "emerge" tool:
https://www.gnu.org/software/emacs/manual/html_node/emacs/Emerge.html
Issuing M-x emerge-files will open an interactive prompt with a view of files A, B, and the merged file to allow choosing text that differs between files A & B, inserting part of A into B, and more.
