Find files with names that are too long for Synology encrypted shares - linux

When trying to encrypt the homes share after the DSM 6.1 update, I got a message that there are files with filenames longer than 143 characters, which is the maximum length for a filename in an encrypted Synology share.
Because there is a lot of stuff in the homes share (mostly my own), it was not practical to search for the files by hand. Nevertheless, these files had to be deleted or renamed before the share could be encrypted.
I needed an automated way to find all files in all subdirectories with a filename longer than 143 characters. Searching for the files via the network share using a Windows tool would probably have taken way too long.
I figured out the solution myself (with some internet research though, because I'm still a n00b) and want to share it with you, so that someone with the same problem might benefit from it.

So here it goes:
The find command in combination with grep does the trick.
find /volume1/homes/ -maxdepth 15 | grep -P '\/[^\/]{143,}[^\/]'
For my case I assumed that I probably don't have more than 15 nested directories. The maximum depth and the starting directory can be adjusted to your needs.
For the -P argument your grep has to be built with support for Perl-compatible regular expressions (PCRE); this is usually the case with GNU grep, but it is not guaranteed on every system (an -E alternative that avoids -P is shown after the explanation below).
The RegEx matches every line that has a / somewhere followed by 144 or more characters other than /, i.e. any single path component (file or directory name) that is longer than the 143-character limit. Since find prints directories without a trailing slash, long directory names are caught as well.
The RegEx explained for people who might not be too familiar with this:
\/ looks for a forward slash. A new file/directory name begins here.
[^\/] means: Every character except /
{143,} means: 143 or more occurrences of the preceding token
[^\/] same as above. Together with {143,} this requires at least 144 characters in total, so names of exactly 143 characters (which are still allowed) are not reported.
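If your grep happens to lack -P support, the same match should be possible with an extended regular expression; this is a sketch under that assumption, using only the standard -E flag and a {144} bound:
find /volume1/homes/ -maxdepth 15 | grep -E '/[^/]{144}'
Requiring 144 consecutive non-slash characters after a / is equivalent to the {143,} plus one extra character in the -P version.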

find . -type f -iname "*" | awk -F'/' 'length($NF)>143{print $0}'
This will print all files whose name is longer than 143 characters. Note that only the file name, not the full path, is counted towards the length. If you want to count the whole path:
find . -type f -iname "*" | awk 'length($0)>143{print $0}'

Related

Bash find command not operating in depth-first-search

So I read that the find command in Bash should operate with DFS, but I don't see it happening.
My path tree:
- tests_ex22
  - first
    - middle
      - story2.txt
    - story1.txt
  - last
    - story3.txt
I run the following command:
find $1 -name "*.$2" -exec grep -wi $3 {} \;
And to my surprise, elements in "middle" are printed before elements in "first".
When find arrives in a new directory, I want it to look in the current dir before moving to a new dir. But, I do want it to move in a DFS way.
Why is this happening? How can I solve it? (ofc, I don't have to use find).
middle is an element of first. It's not processing middle before elements in first; it's processing middle as part of the processing of first's elements.
It sounds like you want find to sort entries and process all non-directory entries before directory entries. There is no such mode, I'm afraid. In general find processes directory entries in the order it finds them, which is fairly arbitrary. If it were to process them in a particular order—say, alphabetical, or files before subdirectories—it would be required to sort entries. find avoids that overhead. It does not sort entries, not even as an option.
This is in contrast to ls, which does indeed sort its output. ls is designed to be more of a human-friendly display tool whereas find is for scripting.
Sort by depth
If you're mainly printing file names you could induce find to print each entry's depth along with its path and then manually sort by depth. Something like this:
find "$1" -name "*.$2" -printf '%d\t%p\n' | sort -V | cut -f 2-
You'll have to adapt this to your use case. It's tricky to fit the grep in here.
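One way to work the grep back in, sketched here on the assumption that your xargs is GNU xargs (for -d) and that the file names contain no newlines, is to run it after the sort:
find "$1" -name "*.$2" -printf '%d\t%p\n' | sort -V | cut -f 2- | xargs -d '\n' grep -wi "$3"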
Manual loop
Or you could write a recursive search by hand. Here are some starting points:
breadth-first option in the Linux find utility?
How do I recursively list all directories at a location, breadth-first?
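Building on those, here is a minimal hand-rolled sketch of such a search in bash (for illustration only; the function name search_dir is made up). It visits the files in a directory before descending into its subdirectories:
#!/bin/bash
# Usage: ./search.sh <start-dir> <extension> <word>
search_dir() {
    local dir=$1 ext=$2 word=$3 entry
    # first pass: grep the matching files in this directory
    for entry in "$dir"/*."$ext"; do
        [ -f "$entry" ] && grep -wi "$word" "$entry"
    done
    # second pass: recurse into the subdirectories
    for entry in "$dir"/*; do
        [ -d "$entry" ] && search_dir "$entry" "$ext" "$word"
    done
}
search_dir "$1" "$2" "$3"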
Your example shows find operating in a depth-first manner. If you want breadth-first order instead, there is a tool that is compatible with find but searches breadth-first, called bfs.
https://github.com/tavianator/bfs

Unix command to replace all instances of a string in every file in a folder [closed]

I have a folder "model". In it, I need to replace all instances of the term "Test_Dbv3" to "TestDbv3". There are multiple files with names like test_host.hbm.xml, test_host2.hbm.xml, testHost.java, testHost2.java and so on. Is there any way I can possibly do this using a Unix command or a script in any language?
I'm working on RHEL5.
sed in in-place mode along with find should probably work:
find . -type f -exec sed -e 's/Test_Dbv3/TestDbv3/g' -i.bak '{}' +
The aptly named find command finds files. Here, we're finding files in the current working directory (.) that are files (-type f). Using these files, we're going to -exec a command: sed. + indicates the end of the command and that we'd like to replace {} with as many files as the operating system will allow.
sed will go file-by-file, line-by-line, executing commands we specify. The command we're giving it is s/Test_Dbv3/TestDbv3/g, which translates to “substitute matches of the regular expression Test_Dbv3 with the text TestDbv3, allowing multiple substitutions per line”. The -i.bak means to replace the original file with the result, saving the unmodified version with the filename suffixed with .bak.
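Before running it, it may be worth checking which files actually contain the string; a quick sketch:
grep -rl 'Test_Dbv3' .
Once you are happy with the result, the .bak backups created by -i.bak can be removed with find . -type f -name '*.bak' -delete (assuming no other .bak files are lying around).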
s/_//g is the expression you need, assuming you want every _ gone; otherwise I have to guess at how specific your pattern should be:
For example s/^(Test|test)_/$1/g to replace test_ with test and
Test_ with Test if they are at the beginning of a line.
Or s/^(test)_/$1/gi will additionally work for all TEST_, tEsT_, etc.
If you need completely case-insensitive matching, that is only available with perl -pi -e 's/.../.../gi' or with GNU sed, but not with plain POSIX sed (which doesn't support back-references written as $1 either; sed uses \1).
If there are also names starting like Test2_ or 1EXPERIMENT_ and so on, you might use s/^([A-Za-z0-9]{3,10})_/$1/g to match any combination of letters and digits from 3 to 10 characters long, not just the Test or test you mentioned.
For even more specific patterns, search for a "regex cheatsheet", and don't be surprised when individual tools like sed or grep don't support every feature, should you decide to use them.
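Tying this back to the original question, here is a sketch of the perl -pi -e form combined with find (assuming Perl is available, which it normally is on RHEL). It removes the underscore case-insensitively while keeping the case of the surrounding text:
find model -type f -exec perl -pi -e 's/(test)_(dbv3)/$1$2/gi' {} +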
Should you ever need a command to only rename files in a folder,
but not edit their content, you can try
rename 's/search/replace/' folder/* (not matching subdirectories)
or rename search replace folder/* (depending on the version of rename).

Search files with multiple "dot" characters

In Linux, how do I use find and regular expressions (or a similar way, without writing a script) to search for files with multiple "dots" in the name but IGNORE the extension?
For example, searching through the following files should return only the second file. In this example ".ext" is the extension.
testing1234hellothisisafile.ext
testing.1234.hello.this.is.a.file.ext
The solution should work with one or more dots in the file name (ignoring the extension dot). It should also work for any files, i.e. with any file extension.
Thanks in advance
So if I understand correctly, you want to get the filenames with at least two additional dots in the name. This would do:
$ find -regex ".*\.+[^.]*\.+[^.]*\.+.*"
./testing.1234.hello.this.is.a.file.ext
./testing1234.hellothisisafile.ext
$ find -regex ".*\.+[^.]*\.+[^.]*\.+[^.]*\.+.*"
./testing.1234.hello.this.is.a.file.ext
The key dot-detecting part is \.+ (at least one dot), coupled with [^.]*, which matches the separating text (anything but a dot; the \.+ part already covers that, so this is mostly a safety measure against greedy matching). Together they make up the core of the regex: we don't care what is before or after, just that somewhere there are three dots. Three, because the dot in the leading ./ of the current directory counts as well; if you search from somewhere else, remove one \.+[^.]* group:
$ find delme/ -regex ".*\.+[^.]*\.+[^.]*\.+[^.]*\.+.*"
delme/testing.1234.hello.this.is.a.file.ext
$ find delme/ -regex ".*\.+[^.]*\.+[^.]*\.+.*"
delme/testing.1234.hello.this.is.a.file.ext
In this case the result is the same, since the name contains a lot of dots, but the second regex is the correct one.
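If the regex feels heavy, a glob-based sketch may also do: -name matches only the basename, and '*.*.*' requires at least two dots in the name, i.e. at least one dot besides the extension dot:
find delme/ -type f -name '*.*.*'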

how to change a single letter in filenames all over the file system?

I have hundreds of files with special characters ('æ', 'ø' and 'å') in their filenames.
I cannot copy these to my external mntfs disk without renaming them.
The files are in dozens of different folders, and there are thousands of other files without these letters in them as well.
I'd like to replace the special characters with their placeholders ('ae', 'oe' and 'aa'), while keeping the rest of the filename intact.
I'm on Ubuntu. I'm thinking of using grep, sed and tr, but I don't know exactly how.
You can use the rename command from the util-linux package.
For example,
find / -type f -exec rename 'â' 'a' {} \;
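On Ubuntu the rename installed by default is usually the Perl-based one (sometimes packaged as file-rename or prename), which takes a substitution expression instead; under that assumption, a sketch covering the three lowercase characters from the question would be:
find / -type f -exec rename 's/æ/ae/g; s/ø/oe/g; s/å/aa/g' {} +
Uppercase variants (Æ, Ø, Å) would need their own substitutions.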
convmv is used to convert filenames between encodings. I'm sure it can solve your problem, even if it might not be exactly what you asked for.

What is the fastest way to find all the files with the same inode?

The only way I know is:
find /home -xdev -samefile file1
But it's really slow. I would like to find a tool like locate.
The real problem comes when you have a lot of files; I suppose the operation is O(n).
There is no mapping from inode to name. The only way is to walk the entire filesystem, which as you pointed out is O(number of files). (Actually, I think it's Θ(number of files).)
I know this is an old question, but many versions of find have an -inum option to match a known inode number easily. You can do this with the following command:
find . -inum 1234
This will still run through all files if allowed to do so, but once you get a match you can always stop it manually; I'm not sure if find has an option to stop after a single match (perhaps with an -exec statement?).
This is much easier than dumping output to a file, sorting etc. and other methods, so should be used when available.
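For what it's worth, GNU find does have such an option: the -quit action stops the search after the first result, so a sketch would be:
find . -inum 1234 -print -quit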
Here's a way:
Use find -printf "%i:\t%p\n" or similar to create a listing of all files prefixed by their inode, and output it to a temporary file
Extract the first field - the inode with ':' appended - and sort to bring duplicates together and then restrict to duplicates, using cut -f 1 | sort | uniq -d, and output that to a second temporary file
Use fgrep -f to load the second file as a list of strings to search and search the first temporary file.
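Put together, those steps might look roughly like this sketch (the temporary file names are placeholders):
find /home -xdev -printf '%i:\t%p\n' > /tmp/inodes.txt
cut -f 1 /tmp/inodes.txt | sort | uniq -d > /tmp/dups.txt
fgrep -f /tmp/dups.txt /tmp/inodes.txt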
(When I wrote this, I interpreted the question as finding all files which had duplicate inodes. Of course, one could use the output of the first half of this as a kind of index, from inode to path, much like how locate works.)
On my own machine, I use these kinds of files a lot, and keep them sorted. I also have a text indexer application which can then apply binary search to quickly find all lines that have a common prefix. Such a tool ends up being quite useful for jobs like this.
What I'd typically do is: ls -i <file> to get the inode of that file, and then find /dir -type f -inum <inode value> -mount. (You want the -mount to avoid searching on different file systems, which is probably part of your performance issues.)
Other than that, I think that's about it.
