What is the fastest way to find all the files with the same inode? - linux

The only way I know is:
find /home -xdev -samefile file1
But it's really slow. I would like to find a tool like locate.
The real problem comes when you have a lot of files; I suppose the operation is O(n).

There is no mapping from inode to name. The only way is to walk the entire filesystem, which, as you pointed out, is O(number of files). (Actually, I think it's Θ(number of files).)

I know this is an old question, but many versions of find have an -inum option to match a known inode number easily. You can do this with the following command:
find . -inum 1234
This will still run through all files if allowed to do so, but once you get a match you can always stop it manually; I'm not sure whether find has an option to stop after a single match (perhaps with an -exec statement?).
This is much easier than dumping output to a file, sorting it, and so on, so it should be used when available.
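For what it's worth, GNU find also has a -quit action that makes it exit as soon as the preceding expression has matched, so a scan of a large tree can stop at the first hit (the inode number here is just a placeholder):
find . -inum 1234 -print -quit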

Here's a way:
Use find -printf "%i:\t%p\n" or similar to create a listing of all files prefixed by inode, and output it to a temporary file
Extract the first field - the inode with ':' appended - and sort to bring duplicates together and then restrict to duplicates, using cut -f 1 | sort | uniq -d, and output that to a second temporary file
Use fgrep -f to load the second file as a list of fixed strings to search for, and search the first temporary file with it.
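Put together, the whole thing might look like this (a sketch only; the temporary file names are placeholders):
find /home -xdev -printf "%i:\t%p\n" > /tmp/inode_index
cut -f 1 /tmp/inode_index | sort | uniq -d > /tmp/dup_inodes
fgrep -f /tmp/dup_inodes /tmp/inode_index
The last command prints every indexed path whose inode occurs more than once.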
(When I wrote this, I interpreted the question as finding all files which had duplicate inodes. Of course, one could use the output of the first half of this as a kind of index, from inode to path, much like how locate works.)
On my own machine, I use these kinds of files a lot, and keep them sorted. I also have a text indexer application which can then apply binary search to quickly find all lines that have a common prefix. Such a tool ends up being quite useful for jobs like this.

What I'd typically do is: ls -i <file> to get the inode of that file, and then find /dir -type f -inum <inode value> -mount. (You want the -mount to avoid searching on different file systems, which is probably part of your performance issues.)
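For example, if ls -i file1 reports inode 1234 (a made-up number), the follow-up search would be:
find /home -type f -inum 1234 -mount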
Other than that, I think that's about it.

Related

Bash find command not operating in depth-first-search

So I read that the find command in Bash should operate with DFS, but I don't see it happening.
My path tree:
- tests_ex22
  - first
    - middle
      - story2.txt
    - story1.txt
  - last
    - story3.txt
I run the following command:
find $1 -name "*.$2" -exec grep -wi $3 {} \;
And to my surprise, elements in "middle" are printed before elements in "first".
When find arrives in a new directory, I want it to look at the entries in the current dir before moving into a subdirectory, but I still do want it to move in a DFS way.
Why is this happening? How can I solve it? (Of course, I don't have to use find.)
middle is an element of first. It's not processing middle before elements in first; it's processing middle as part of the processing of first's elements.
It sounds like you want find to sort entries and process all non-directory entries before directory entries. There is no such mode, I'm afraid. In general find processes directory entries in the order it finds them, which is fairly arbitrary. If it were to process them in a particular order—say, alphabetical, or files before subdirectories—it would be required to sort entries. find avoids that overhead. It does not sort entries, not even as an option.
This is in contrast to ls, which does indeed sort its output. ls is designed to be more of a human-friendly display tool whereas find is for scripting.
Sort by depth
If you're mainly printing file names you could induce find to print each entry's depth along with its path and then manually sort by depth. Something like this:
find "$1" -name "*.$2" -printf '%d\t%p\n' | sort -V | cut -f 2-
You'll have to adapt this to your use case. It's tricky to fit the grep in here.
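If you do want the grep in the pipeline, one sketch (assuming GNU xargs and file names that contain no newlines) is:
find "$1" -name "*.$2" -printf '%d\t%p\n' | sort -V | cut -f 2- | xargs -r -d '\n' grep -wi "$3"
Note that grep prefixes each match with its file name here, because it receives several files per invocation, unlike the -exec ... \; form.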
Manual loop
Or you could write a recursive search by hand. Here are some starting points:
breadth-first option in the Linux find utility?
How do I recursively list all directories at a location, breadth-first?
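As a rough illustration, a hand-rolled recursion that handles the plain files in each directory before descending into its subdirectories could look like this (a sketch only; it reuses the question's positional parameters and ignores hidden entries):
search() {
    local entry
    for entry in "$1"/*; do
        # plain files at this level first
        if [ -f "$entry" ]; then
            case $entry in
                *."$2") grep -wi "$3" "$entry" ;;
            esac
        fi
    done
    for entry in "$1"/*; do
        # then recurse into each subdirectory
        [ -d "$entry" ] && search "$entry" "$2" "$3"
    done
}
search "$1" "$2" "$3"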
Your example shows find operating in a depth-first manner. If you want breadth-first traversal, there's a find-compatible tool called bfs.
https://github.com/tavianator/bfs
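Since bfs aims to be a drop-in replacement with find-compatible syntax, the original command should in principle carry over unchanged (a sketch):
bfs "$1" -name "*.$2" -exec grep -wi "$3" {} \;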

Find files that are too long for Synology encrypted shares

When trying to encrypt the homes share after the DSM6.1 update, I got a message that there are files with filenames longer than 143 characters. This is the maximum length for a filename in an encrypted Synology share.
Because there is a lot of stuff in the homes share (mostly my own) it was not practical to search for the files by hand. Nevertheless these files had to be deleted or renamed to allow the encryption of the share.
I needed an automated way to find all files in all subdirectories with a filename longer than 143 characters. Searching for the files via the network share using a Windows tool would probably have taken way too long.
I have figured out the solution by myself (with some internet research though, because I'm still a n00b) and want to share it with you, so that someone with the same problem might benefit from this.
So here it goes:
The find command in combination with grep does the trick.
find /volume1/homes/ -maxdepth 15 | grep -P '\/[^\/]{143,}[^\/]'
For my case I assumed that I probably don't have more than 15 nested directories. The maximum depth and the starting directory can be adjusted to your needs.
The -P option selects Perl-compatible regular expressions; it requires a grep built with PCRE support, not a Perl installation.
The RegEx matches all elements that have a / somewhere, followed by 143 or more characters other than /, and then one more character that is not a /. This way we only get files and no directories. To include directories as well, you can leave out the last condition.
The RegEx explained for people who might not be too familiar with this:
\/ looks for a forward slash. A new file/directory name begins here.
[^\/] means: Every character except /
{143,} means: 143 or more occurrences of the preceding token
[^\/] same as above. This excludes all results that don't belong to a file.
find . -type f -iname "*" | awk -F'/' 'length($NF)>143{print $0}'
This prints all files whose name is longer than 143 characters. Note that only the file name, not the full path, counts towards the length here. If you want the whole path to count:
find . -type f -iname "*" | awk 'length($0)>143{print $0}'

Delete some lines from text using Linux command

I know how to match text using regex patterns but not how to manipulate them.
I have used grep to match and extract lines from a text file, but I want to remove those lines from the text. How can I achieve this without having to write a python or bash shell script?
I have searched on Google and was recommended to use sed, but I am new to it and don't know how it works.
Can anyone point me in the right direction or help me achieve this goal?
The -v option to grep inverts the search, reporting only the lines that don't match the pattern.
Since you know how to use grep to find the lines to be deleted, using grep -v and the same pattern will give you all the lines to be kept. You can write that to a temporary file and then copy or move the temporary file over the original.
grep -v pattern original.file > tmp.file
mv tmp.file original.file
You can also use sed, as shown in shellfish's answer.
There are multiple possible refinements for the grep solution, but for most people, most of the time, what is shown is more or less adequate (though it would be a good idea to use a per-process intermediate file name, preferably a random one such as the mktemp command gives you). You could add code to remove the intermediate file on an interrupt, suppress interrupts while moving it back, use copy-and-remove instead of move if the original file has multiple hard links or is a symlink, and so on. The sed command more or less works around these issues for you, but it too is not cognizant of multiple hard links or symlinks.
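A sketch of those refinements with mktemp and a cleanup trap (the pattern and file name are placeholders):
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
grep -v pattern original.file > "$tmp" &&
mv "$tmp" original.file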
Create the pattern which matches the lines using grep. Then create a sed script as follows:
sed -i '/pattern/d' file
Explanation:
The -i option means edit the input file in place, so the lines matching the pattern are removed from it.
pattern is the pattern you created for grep, e.g. ^a*b\+.
d this sed command stands for delete, it will delete lines matching the pattern.
file this is the input file, it can consist of a relative or absolute path.
For more information see man sed.

Find and Replace Incrementally Across Multiple Files - Bash

I apologize in advance if this belongs on Super User; I always have a hard time discerning whether these Bash scripting questions are better placed here or there. I currently know how to find and replace strings in multiple files, and how to find and replace strings incrementally within a single file (from searching for a solution to this issue), but how to combine them eludes me.
Here's the explanation:
I have a few hundred files, each in sets of two: a data file (.data) and a message file (.data.ms).
These files are linked via a key value unique to each set of two that looks like: ab.cdefghi
Here's what I want to do:
Step through each .data file and do the following:
Find:
MessageKey ab.cdefghi
Replace:
MessageKey xx.aaa0001
MessageKey xx.aaa0002
...
MessageKey xx.aaa0010
etc.
Incrementing by 1 every time I get to a new file.
Clarifications:
For reference, there is only one instance of "MessageKey" in every file.
The paired files have the same name, only their extensions differ, so I could simply step through all .data files and then all .data.ms files and use whatever incremental solution on both and they'd match fine, don't need anything too fancy to edit two files in tandem or anything.
For all intents and purposes, whatever currently appears on the line after each MessageKey is garbage; I am completely throwing it out and replacing it with xx.aaa####.
String length does matter, so I need xx.aaa0009, xx.aaa0010, not xx.aaa0009, xx.aaa00010.
I'm using cygwin.
I would approach this by creating a mapping from old key to new and dumping that into a temp file.
grep MessageKey *.data \
| sort -u \
| awk '{ printf("%s:xx.aaa%04d\n", $1, ++i); }' \
> /tmp/key_mapping
From there I would confirm that the file looks right before I applied the mapping using sed to the files.
cat /tmp/key_mapping \
| while read old new; do
    sed -i -e "s:MessageKey $old:MessageKey $new:" *
  done
This will probably work for you, but it's neither elegant nor efficient. This is how I would do it if I were only going to run it once. If I were going to run this regularly and efficiency mattered, I would probably write a quick Python script.
@Carl.Anderson got me started on the right track, and after a little tweaking I ended up implementing his solution, but with some syntax tweaks.
First of all, this solution only works if all of your files are located in the same directory. I'm sure anyone with even slightly more experience with UNIX than me could modify this to work recursively, but here goes:
First I ran:
-hr "MessageKey" . | sort -u | awk '{ printf("%s:xx.aaa%04d\n", $2, ++i); }' > MessageKey
This command was used to create a find and replace map file called "MessageKey."
The contents of which looked like:
In.Rtilyd1:aa.xxx0087
In.Rzueei1:aa.xxx0088
In.Sfricf1:aa.xxx0089
In.Slooac1:aa.xxx0090
etc...
Then I ran:
cat MessageKey | while IFS=: read old new; do sed -i -e "s/MessageKey $old/MessageKey $new/" *Data ; done
I had to use IFS=: (alternatively, I could have found and replaced every : in the map file with a space, but the former seemed easier).
Anyway, in the end this worked! Thanks Carl for pointing me in the right direction.
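For reference, a more compact single-pass sketch of the same idea: it assumes each .data file contains exactly one MessageKey line (as the question states), that the paired message file is named <name>.data.ms, and that GNU sed's -i is available (it is on Cygwin):
i=0
for f in *.data; do
    i=$((i + 1))
    new=$(printf 'xx.aaa%04d' "$i")
    # stamp the same fresh key into the .data file and its .data.ms partner
    sed -i -e "s/MessageKey .*/MessageKey $new/" "$f" "$f.ms"
done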

How to change a single letter in filenames all over the file system?

I have hundreds of files with special characters ('æ', 'ø' and 'å') in their filenames.
I cannot copy these to my external mntfs disk without renaming them.
The files are in dozens of different folders, and there are thousands of other files without these letters in them as well.
I'd like to replace the special characters with their placeholders ('ae', 'oe' and 'aa'), while keeping the rest of the filename intact.
I'm on Ubuntu. I'm thinking of using grep, sed and tr, but I don't know exactly how.
You can use the rename command from the util-linux package.
For example,
find / -type f -exec rename 'â' 'a' {} \;
convmv is used to convert filenames between encodings. I'm sure it can solve your problem, even if it might not be exactly what you asked for.
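If neither tool fits, a plain-bash loop can do the specific substitutions asked for above (a sketch; it assumes a UTF-8 locale so the shell sees the characters correctly, and /path/to/files is a placeholder):
find /path/to/files -type f -print0 |
while IFS= read -r -d '' f; do
    base=${f##*/}; dir=${f%/*}
    # substitute the placeholders requested in the question
    new=${base//æ/ae}; new=${new//ø/oe}; new=${new//å/aa}
    # rename only when something changed; -n refuses to overwrite existing files
    [ "$new" != "$base" ] && mv -n -- "$f" "$dir/$new"
done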
