Set difference of folders A and B, i.e. A - B = C

How can I get the set of files that are in folder A but not in folder B, and output them to folder C? In other words, C would contain all the files that exist in A but do not exist in B. Is there a chained command for this in Linux?

You might use the diff(1) command, perhaps as diff -Naur. To compare only which files exist, diff -rq A B prints an "Only in A" line for each file that is missing from B. BTW, patch(1) is handy too.
But you really want a version control system (a.k.a. revision control). Learn more about git; I strongly recommend using it, even for small personal projects.
Later, you might consider publishing some (perhaps most) of your code as free software, e.g. on GitHub.

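May not be perfect (folder C must already exist). Try this:
(ls -1 A; ls -1 B; ls -1 B) 2>/dev/null | sort | uniq -u | xargs -I REPLACE cp A/REPLACE C
B is listed twice on purpose: uniq -u keeps only names that occur exactly once, so every name present in B is dropped and only the names unique to A reach cp. Listing each folder once would also pass names that exist only in B, which cp cannot find in A.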
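The same set difference can also be sketched as a plain loop, which sidesteps parsing ls output. This is only a minimal sketch under some assumptions: copy_difference is a hypothetical helper name, the demo files are made up, and it only considers regular, non-hidden files at the top level of A.

```shell
# Copy every regular file that exists in the first folder but not in the
# second into the third. copy_difference is a hypothetical helper name.
copy_difference() {
    src=$1
    other=$2
    dest=$3
    mkdir -p "$dest"
    for f in "$src"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
        name=${f##*/}                # strip the leading directory part
        [ -e "$other/$name" ] || cp -p "$f" "$dest/"
    done
}

# Demo on a throwaway tree (all file names are made up).
work=$(mktemp -d)
mkdir "$work/A" "$work/B"
touch "$work/A/one.txt" "$work/A/two.txt" "$work/A/with space.txt"
touch "$work/B/two.txt" "$work/B/three.txt"
copy_difference "$work/A" "$work/B" "$work/C"
ls "$work/C"
```

Because the loop tests each name directly, file names containing spaces need no special handling.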

Would the rsync command be useful? If you run it with --dry-run, it would report how it would make one directory the same as the other, without actually changing anything.

How to find the source of a copied file?

I have a file that I copied sometime back, but I forgot the source of it. Is there a way to find the source of the copied file? I don't remember which terminal I have used to try and check with Esc+P
Command used: cp -rf $source/file $destination/file
Thanks in advance!

You could try history | grep your_filename.

A Linux system has many files (and if you think of /proc/, it could change at every moment). And some other process can write or create (or append or truncate) files (e.g. some crontab(1) job...)
Assume you do know some parent directory containing the source file. Suppose it is /home/foo.
Then, you might use find(1) and some hashing command like md5sum(1) to compute and collect the hash of every file.
Use the property that two files A and B with identical contents (the same sequence of bytes) have the same md5sum. The converse is not guaranteed, since different contents can collide on the same hash, but an accidental collision is extremely unlikely in practice.
So run first
find /home/foo -type f -exec md5sum '{}' \; > /tmp/foo-md5
then do seekingmd5=$(md5sum < A | cut -d' ' -f1) to keep only the hash (md5sum also prints the file name after it)
then grep "$seekingmd5" /tmp/foo-md5 will find the lines for files having the same md5 as your original A
Depending on your filesystem and hardware, this could take hours.
You could speed things up slightly by writing a C program using nftw(3) with md5init etc...
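The steps above can be sketched as a small shell function. This is a hedged sketch rather than a definitive implementation: find_same_content is a hypothetical name, md5sum is assumed to be the GNU coreutils version, and the size filter is an addition that is not part of the answer above. It relies on the fact that only files with the same byte size can have identical contents, so far fewer files need hashing.

```shell
# Print "hash  path" lines for files under a root whose content matches a
# target file. find_same_content is a hypothetical helper name.
find_same_content() {
    target=$1
    root=$2
    want=$(md5sum < "$target" | cut -d' ' -f1)   # hash only, no file name
    size=$(( $(wc -c < "$target") ))             # byte count of the target
    # Hash only the files whose size matches, then keep the matching hashes.
    find "$root" -type f -size "${size}c" -exec md5sum {} + | grep "^$want "
}

# Demo on a throwaway tree (all names are made up).
work=$(mktemp -d)
mkdir -p "$work/src/deep" "$work/dest"
printf 'some payload\n' > "$work/src/deep/original.txt"
printf 'other data!!\n' > "$work/src/unrelated.txt"   # same size, other content
cp "$work/src/deep/original.txt" "$work/dest/copy.txt"
matches=$(find_same_content "$work/dest/copy.txt" "$work/src")
printf '%s\n' "$matches"
```

On a large tree the size filter usually matters more than the choice of hashing tool, since most files are skipped without being read.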

Linux command select specific directory

I have only two folders under a given directory. Is there any method to choose the second directory based on the order and not on the folder name?
Example: (I want to enter under doc2)
#ls
doc1 doc2

If you really want to use ls,
cd "$(ls -d */ | sed -n '2p')"
This enters the second directory listed by ls, regardless of how many directories it lists.
Parsing ls output is not a good idea generally, although it will work in most cases and will cause no harm if you are just using it in your interactive shell for fast navigation. You should not use this for serious programming.

You can use the tail command to get the last line (with exactly two directories, that is the second one):
ls | tail -n 1
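A variant that avoids parsing ls altogether is to let the shell glob the directories and pick one by position. The sketch below assumes a POSIX shell; nth_dir is a hypothetical helper name, and the order is whatever the glob produces (sorted by name), which matches the default ls order in the example.

```shell
# Print the name of the N-th subdirectory of the current directory.
# nth_dir is a hypothetical helper name.
nth_dir() {
    n=$1
    set -- */                                   # one positional parameter per directory
    [ "$#" -ge "$n" ] && [ -d "$1" ] || return 1
    shift "$((n - 1))"
    printf '%s\n' "${1%/}"                      # drop the trailing slash
}

# Demo on a throwaway tree.
work=$(mktemp -d)
mkdir "$work/doc1" "$work/doc2"
second=$(cd "$work" && nth_dir 2)
echo "$second"
```

In an interactive shell this would be used as cd "$(nth_dir 2)".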

bash: get path from current directory given sub-directory name

Trying to write a script to clean up environment files after a resource is deleted. The problem is all the script is given as input is the name of the resource (this cannot be changed) with zero identifying information beyond that. How can I find the path of the directory the resource is sitting in?
The directory is set up a bit like the following, although much more extensive. All of these are directories, not files. There can be as many as 40+ directories to search, but the desired one is generally not more than 2-3 directories deep.
foo
    aaa
        aaa_green
        aaa_blue
    bbb
    ccc
        ccc_green
bar
    ddd
    eee
        eee_green
        eee_blue
    fff
        fff_green
        fff_blue
        fff_pink
I might be handed input like aaa_green or just ddd.
As an example, given eee_blue as input, I need to know eee_blue's path from the working directory so I can cd there and delete the directory. IE, I would expect to return bar/eee/eee_blue/ or bar/eee/, either is acceptable.
The "best" option I can see currently is to cd into the lowest level of each directory via multiple greps, get each's contents and look for a match, and when it does (eventually) match save that cd'ing as the path. This frankly sounds awful and inefficient.
The only other alternative method I could think of was a straight recursive grep, but I tested it and at 8 minutes it still hadn't finished running.
This script needs to run on both mac and linux, although in a desperate pinch I could go linux only.

The standard Unix tool for doing this sort of task is the find command. The GNU version of find has more extensive options than the POSIX specification (by quite a margin). The version on macOS Sierra (and Mac OS X) is similar to the GNU version. I found an online manual for OS X 10.9 at Apple find, but there's probably a better location somewhere.
It looks like you might want to run:
find . -name 'eee_blue'
which will print the names of matching files or directories, or perhaps:
find . -name 'eee_blue' -exec rm -fr {} +
which will run the rm -fr command on each name. You can run a custom script you create in place of rm -fr if you prefer; if the logic is complex, it's what I do.
Be extremely cautious before using rm -fr automatically!
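To only obtain the path (so the script can inspect it before deleting anything), the find call can be wrapped as below. This is a sketch under assumptions: find_resource_dir is a hypothetical name, the default depth limit of 4 is taken from the "2-3 directories deep" remark in the question, and -maxdepth is an extension that both GNU and BSD find support rather than part of POSIX.

```shell
# Print the path(s), relative to the current directory, of directories with
# the given name. find_resource_dir is a hypothetical helper name.
find_resource_dir() {
    name=$1
    depth=${2:-4}          # assumed default depth limit
    # -prune stops find from descending into a directory once it has matched.
    find . -maxdepth "$depth" -type d -name "$name" -prune -print
}

# Demo on a throwaway tree shaped like the one in the question.
work=$(mktemp -d)
mkdir -p "$work/foo/aaa/aaa_green" "$work/foo/bbb" \
         "$work/bar/ddd" "$work/bar/eee/eee_green" "$work/bar/eee/eee_blue"
path=$(cd "$work" && find_resource_dir eee_blue)
echo "$path"
```

Limiting the search to directories and to a shallow depth is what keeps this fast compared with a recursive grep over file contents.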

How to list recently deleted files from a directory?

I'm not even sure if this is easily possible, but I would like to list the files that were recently deleted from a directory, recursively if possible.
I'm looking for a solution that does not require the creation of a temporary file containing a snapshot of the original directory structure against which to compare, because write access might not always be available. Edit: If it's possible to achieve the same result by storing the snapshot in a shell variable instead of a file, that would solve my problem.
Something like:
find /some/directory -type f -mmin -10 -deletedFilesOnly
Edit: OS: I'm using Ubuntu 14.04 LTS, but the command(s) would most likely be running in a variety of Linux boxes or Docker containers, most or all of which should be using ext4, and to which I would most likely not have access to make modifications.

You can use the debugfs utility:
debugfs is an interactive file system debugger for ext2, ext3 and ext4
file systems (it ships with e2fsprogs and is unrelated to the kernel's RAM-based debugfs)
First, run debugfs /dev/hda13 in your terminal (replacing /dev/hda13 with your own disk/partition).
(NOTE: You can find the name of your disk by running df / in the terminal).
Once in debug mode, you can use the command lsdel to list inodes corresponding with deleted files.
When files are removed in Linux they are only unlinked, but their
inodes (the on-disk records that point to the file's data) are
not removed
To get the paths of these deleted files, you can use debugfs -R "ncheck 320236" /dev/hda13, replacing the number with your particular inode and the device with your own.
Inode Pathname
320236 /path/to/file
From here you can also inspect the contents of deleted files with cat. (NOTE: You can also recover from here if necessary).

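So a few things:
1. Results depend on the file system type: ext3 and ext4 usually clear an inode's block pointers when a file is deleted, so lsdel may list nothing there; it tends to work best on ext2
2. df /
3. Pass the device reported in #2 to debugfs, in my case:
   sudo debugfs /dev/mapper/q4os--desktop--vg-root
4. lsdel
5. q (to exit out of debugfs)
6. sudo debugfs -R 'ncheck 528754' /dev/sda2 2>/dev/null (replace the number with one from step #4, and /dev/sda2 with your own device)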

Thanks for your comments & answers guys. debugfs seems like an interesting solution to the initial requirements, but it is a bit overkill for the simple & light solution I was looking for; if I'm understanding correctly, the kernel must be built with debugfs support and the target directory must be in a debugfs mount. Unfortunately, that won't really work for my use-case; I must be able to provide a solution for existing, "basic" kernels and directories.
As this seems virtually impossible to accomplish, I've been able to negotiate and relax the requirements down to listing the number of files that were recently deleted from a directory, recursively if possible.
This is the solution I ended up implementing:
1. A simple find command piped into wc to count the original number of files in the target directory (recursively). The result can then easily be stored in a shell or script variable, without requiring write access to the file system.
   DEL_SCAN_ORIG_AMOUNT=$(find /some/directory -type f | wc -l)
2. We can then run the same command again later to get the updated number of files.
   DEL_SCAN_NEW_AMOUNT=$(find /some/directory -type f | wc -l)
3. Then we can store the difference between the two in another variable and update the original amount.
   DEL_SCAN_DEL_AMOUNT=$(($DEL_SCAN_ORIG_AMOUNT - $DEL_SCAN_NEW_AMOUNT));
   DEL_SCAN_ORIG_AMOUNT=$DEL_SCAN_NEW_AMOUNT
4. We can then print a simple message if the number of files went down.
   if [ $DEL_SCAN_DEL_AMOUNT -gt 0 ]; then echo "$DEL_SCAN_DEL_AMOUNT deleted files"; fi;
5. Return to step 2.
Unfortunately, this solution won't report anything if the same number of files have been created and deleted during an interval, but that's not a huge issue for my use case.
To circumvent this, I'd have to store the actual list of files instead of the count, but I haven't been able to make that work using shell variables. If anyone could figure that out, it'd help me immensely as it would meet the initial requirements!
I'd also like to know if anyone has comments on either of the two approaches.
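One way the list-based variant might look, keeping the snapshot in a shell variable and writing nothing to disk, is sketched below. It is a sketch under assumptions: list_deleted is a hypothetical helper name, the demo paths are made up, and file names containing newlines are not handled because the snapshot is stored one path per line.

```shell
# Print every path from a saved listing that no longer exists.
# list_deleted is a hypothetical helper name.
list_deleted() {
    printf '%s\n' "$1" | while IFS= read -r p; do
        [ -n "$p" ] || continue              # an empty snapshot yields one empty line
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Demo on a throwaway tree.
work=$(mktemp -d)
mkdir "$work/sub"
touch "$work/a.txt" "$work/sub/b.txt" "$work/sub/c.txt"

snapshot=$(find "$work" -type f)      # remember the file list in a variable
rm "$work/sub/b.txt"                  # ... later, something gets deleted ...
deleted=$(list_deleted "$snapshot")   # which remembered files are gone?
snapshot=$(find "$work" -type f)      # refresh the snapshot for the next round
printf 'deleted: %s\n' "$deleted"
```

Checking each remembered path for existence avoids comparing two listings, so files created in the meantime cannot mask a deletion the way they do when only counts are compared.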

Try this, which lists files that have been deleted but are still held open by a running process:
lsof -nP | grep -i deleted

history >> history.txt
Look for all rm statements.

How to directly overwrite with 'unexpand' (spaces-to-tabs conversion)?

I'm trying to use something along the lines of
unexpand -t 4 *.php
but am unsure how to write this command to do what I want.
Weirdly,
unexpand -t 4 file.php > file.php
gives me an empty file. (i.e. overwriting file.php with nothing)
I can specify multiple files okay, but don't know how to then overwrite each file.
I could use my IDE, but there are ~67000 instances to be replaced across 200 files, and this will take a while.
I expect that the answers to my question(s) will be standard unix fare, but I'm still learning...

You can very seldom use output redirection to replace the input: the shell truncates the output file before the command even starts, so unexpand ends up reading an already empty file.php. In-place replacement works only with commands that support it internally (since they then do the basic steps themselves). From the shell level, it's far better to work in two steps, like so:
1. Do the operation on foo, creating foo.tmp
2. Move (rename) foo.tmp to foo, overwriting the original
This will be fast. It will require a bit more disk space, but if you do both steps before continuing to the next file, you will only need as much extra space as the largest single file, which should not be a problem.
Sketch script:
for a in *.php
do
    unexpand -t 4 "$a" > "$a-notab"
    mv "$a-notab" "$a"
done
You could do better (error-checking, and so on), but that is the basic outline.
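A version of the outline above with some error checking might look like the following. Treat it as a sketch: tabify is a hypothetical helper name, and writing the result back with cat rather than mv is a design choice of this sketch (not part of the answer above), made so that the original file keeps its permissions, since mktemp creates a private temporary file.

```shell
# Convert 4-space indentation to tabs in each given file, in place.
# tabify is a hypothetical helper name.
tabify() {
    status=0
    for f in "$@"; do
        tmp=$(mktemp "$f.XXXXXX") || return 1
        if unexpand -t 4 "$f" > "$tmp"; then
            # Overwrite the content only after unexpand succeeded; cat keeps
            # the original file's permissions, which mv of the temp file would not.
            cat "$tmp" > "$f"
        else
            status=1                     # leave the original untouched
        fi
        rm -f "$tmp"
    done
    return "$status"
}

# Demo on throwaway files.
work=$(mktemp -d)
printf '        return;\n' > "$work/a.php"   # eight leading spaces
printf 'no indent\n' > "$work/b.php"
tabify "$work"/*.php
od -c "$work/a.php"
```

For the original question this would be called as tabify *.php from the directory holding the files.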

Here's the command I used:
for p in $(find . -iname "*.js")
do
    unexpand -t 4 "$p" > "$p-tab"
    mv "$p-tab" "$p"
done
This version changes all files within the directory hierarchy rooted at the current working directory.
In my case, I only wanted to make this change to .js files; you can omit the -iname clause from find if you wish, or use different args to cast your net differently.
My version wraps each path in quotes, but because the loop iterates over the unquoted output of find, paths with 'interesting' names (containing whitespace, for example) are still split into separate words.
To get it all on one line, add a semicolon after lines 1, 3, and 4.
This is potentially dangerous, so make a backup or use git before running the command. If you're using git, you can verify that only whitespace was changed with git diff -w.
