How to change a single letter in filenames all over the file system? - Linux

I have hundreds of files with special characters ('æ', 'ø' and 'å') in their filenames.
I cannot copy these to my external NTFS disk without renaming them.
The files are spread across dozens of different folders, alongside thousands of other files that don't contain these letters.
I'd like to replace the special characters with their ASCII equivalents ('ae', 'oe' and 'aa'), while keeping the rest of each filename intact.
I'm on Ubuntu. I'm thinking of using grep, sed and tr, but I don't know exactly how.

You can use the rename command from the util-linux package.
For example:
find / -type f -exec rename 'æ' 'ae' {} \;
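If a single name can contain several of these characters, a plain bash loop handles all three replacements in one pass. A minimal sketch, assuming the files live under /path/to/files (a placeholder); only the file's own name is rewritten, so folder names are left alone:
find /path/to/files -type f \( -name '*æ*' -o -name '*ø*' -o -name '*å*' \) -print0 |
while IFS= read -r -d '' f; do
    dir=$(dirname "$f")          # directory part stays as-is
    base=$(basename "$f")
    base=${base//æ/ae}           # replace every occurrence in the name
    base=${base//ø/oe}
    base=${base//å/aa}
    mv -n -- "$f" "$dir/$base"   # -n: never overwrite an existing file
done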

convmv is used to convert filenames between encodings. I'm sure it can solve your problem, even if it might not be exactly what you asked for.
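A typical convmv invocation looks like the sketch below (the source encoding and path are assumptions; adjust them to your situation). By default convmv only prints what it would do; --notest makes it actually rename:
convmv -f iso-8859-1 -t utf-8 -r /path/to/files            # dry run
convmv -f iso-8859-1 -t utf-8 -r --notest /path/to/files   # perform the rename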

Related

Find files that are too long for Synology encrypted shares

When trying to encrypt the homes share after the DSM 6.1 update, I got a message that there were files with filenames longer than 143 characters, which is the maximum length for a filename in an encrypted Synology share.
Because there is a lot of stuff in the homes share (mostly my own), it was not practical to search for the files by hand. Nevertheless, these files had to be deleted or renamed to allow the encryption of the share.
I needed an automated way to find all files in all subdirectories with a filename longer than 143 characters. Searching for the files via the network share using a Windows tool would probably have taken way too long.
I have figured out the solution by myself (with some internet research though, because I'm still a n00b) and want to share it with you, so that someone with the same problem might benefit from this.
So here it goes:
The find command in combination with grep does the trick.
find /volume1/homes/ -maxdepth 15 | grep -P '\/[^\/]{143,}[^\/]'
For my case I assumed that I probably don't have more than 15 nested directories. The maximum depth and the starting directory can be adjusted to your needs.
The -P argument enables Perl-compatible regular expressions; it requires grep to be built with PCRE support, which may not be the case on every system.
The RegEx matches all elements that have a / somewhere, followed by 143 or more characters other than /, with no / after them. This way we only get files and no directories. To include directories as well, leave out the last condition.
The RegEx explained for people who might not be too familiar with this:
\/ looks for a forward slash. A new file/directory name begins here.
[^\/] means: Every character except /
{143,} means: 143 or more occurrences of the preceding token
[^\/] same as above. This excludes all results that don't belong to a file.
find . -type f -iname "*" | awk -F'/' 'length($NF)>143 {print $0}'
This will print all the files whose name is longer than 143 characters. Note that only the file name, not the full path, is considered when calculating the length. If you want the whole path to count towards the length:
find . -type f -iname "*" | awk 'length($0)>143 {print $0}'

Search for ill-encoded characters in a file on Linux

I have a lot of huge CSV files, some of them contain ill encoded characters: in vi, I see things like "<8f>" or "<8e>", for example.
First, I wanted to search and replace (:%s) all the characters, but that would be a very long process, because I would have to do it every time I handle a file, and I'm not always sure whether new such characters have appeared.
Is it possible to detect such characters, so that I can extract lines containing ill encoded characters?
Ideally a simple command would exist, taking a file as an argument and creating a file containing only the problematic lines.
I don't know if I'm explaining myself very well...
Thanks in advance!
You could use :g/char/p in vim to print all the matching lines in a given file, or use the command-line utility grep:
grep -lr 'char1\|char2\|char3' .
This will output all the files in a directory containing any of the chars you have listed (the -r makes it recursive and the -l lists only the filenames, rather than all the line matches).
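To get the problematic lines themselves into a separate file, a common trick is to ask grep for every line that is not entirely valid text in a UTF-8 locale. A minimal sketch (the file names are placeholders, and it assumes the CSVs are meant to be UTF-8):
LC_ALL=en_US.UTF-8 grep -naxv '.*' input.csv > bad_lines.txt
# -a: treat the file as text, -x: match whole lines only, -v: invert the
# match, so only lines that '.*' cannot match - i.e. lines containing
# bytes that are not valid in the current encoding - are printed;
# -n prefixes each line with its line number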

Search files with multiple "dot" characters

In Linux, how do I use find with regular expressions (or something similar, without writing a script) to search for files with multiple "dots" in the name but IGNORE the extension?
For example, a search through the following files should return only the second file. In this example ".ext" is the extension.
testing1234hellothisisafile.ext
testing.1234.hello.this.is.a.file.ext
The solution should work with one or more dots in the file name (ignoring the extension dot), and it should work for any file, i.e. with any file extension.
Thanks in advance
So if I understand correctly, you want to get the filenames with at least two additional dots in the name. This would do:
$ find -regex ".*\.+[^.]*\.+[^.]*\.+.*"
./testing.1234.hello.this.is.a.file.ext
./testing1234.hellothisisafile.ext
$ find -regex ".*\.+[^.]*\.+[^.]*\.+[^.]*\.+.*"
./testing.1234.hello.this.is.a.file.ext
The key dot-detecting part is \.+ (at least one dot), coupled with the separator [^.]* (anything but a dot; the preceding \.+ already covers dots, so this is mainly a safety measure against greedy matching). Together they form the core of the regex - we don't care what comes before or after, just that somewhere there are three dots. Three, because the dot in the leading ./ of the output also counts - if you search from somewhere else, remove one \.+[^.]* group:
$ find delme/ -regex ".*\.+[^.]*\.+[^.]*\.+[^.]*\.+.*"
delme/testing.1234.hello.this.is.a.file.ext
$ find delme/ -regex ".*\.+[^.]*\.+[^.]*\.+.*"
delme/testing.1234.hello.this.is.a.file.ext
In this case the result is the same, since the name contains a lot of dots, but the second regex is the correct one.

Search and replace tools in Linux

What are the best search and replace tools in Linux?
I want to find an easy way.
Thanks
You can use rpl.
It will replace strings with new strings in multiple text files. It can work recursively over directories and supports limiting the search to specific file suffixes.
rpl [-iwRspfdtx [-q|-v]] <old_str> <new_str> <target_file(s)>
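A minimal sketch of an invocation, assuming the option letters shown in the synopsis above (they can differ between rpl versions); the strings and path are placeholders:
rpl -R 'old_str' 'new_str' ./docs    # -R: recurse into the given directory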
find for finding files/directories
grep or ack[1] for searching files
sed for search/replace in files
awk and cut for slicing/dicing text
for anything non-trivial I usually reach for perl
[1] http://betterthangrep.com/
find and sed are the classic tools.
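A minimal sketch of that classic combination (the path, suffix and strings are placeholders): replace old_str with new_str in every .txt file under ./docs, editing the files in place.
find ./docs -type f -name '*.txt' -exec sed -i 's/old_str/new_str/g' {} +
# sed -i edits in place; use e.g. -i.bak instead to keep backup copies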

What is the fastest way to find all the files with the same inode?

The only way I know is:
find /home -xdev -samefile file1
But it's really slow. I would like to find a tool like locate.
The real problem comes when you have a lot of files; I suppose the operation is O(n).
There is no mapping from inode to name. The only way is to walk the entire filesystem, which, as you pointed out, is O(number of files). (Actually, I think it's Θ(number of files).)
I know this is an old question, but many versions of find have an -inum option to match a known inode number easily. You can do this with the following command:
find . -inum 1234
This will still run through all the files if allowed to do so, but once you get a match you can always stop it manually; I'm not sure if find has an option to stop after a single match (perhaps with an -exec statement?).
This is much easier than dumping output to a file, sorting etc. and other methods, so should be used when available.
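With GNU find there is in fact such an option: -quit stops the search after the first match. A sketch (the inode number is just an example):
find . -inum 1234 -print -quit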
Here's a way:
Use find -printf "%i:\t%p\n" or similar to create a listing of all files prefixed by inode, and output it to a temporary file.
Extract the first field - the inode with ':' appended - and sort it to bring duplicates together, then restrict it to duplicates, using cut -f 1 | sort | uniq -d, and output that to a second temporary file.
Use fgrep -f to load the second file as a list of strings to search for, and search the first temporary file with it (see the sketch after these steps).
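A sketch of those three steps put together (the temporary file names are just illustrative):
find /home -xdev -type f -printf '%i:\t%p\n' > /tmp/inode-list
cut -f 1 /tmp/inode-list | sort | uniq -d > /tmp/dup-inodes
fgrep -f /tmp/dup-inodes /tmp/inode-list     # prints a line for every file sharing an inode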
(When I wrote this, I interpreted the question as finding all files which had duplicate inodes. Of course, one could use the output of the first half of this as a kind of index, from inode to path, much like how locate works.)
On my own machine, I use these kinds of files a lot, and keep them sorted. I also have a text indexer application which can then apply binary search to quickly find all lines that have a common prefix. Such a tool ends up being quite useful for jobs like this.
What I'd typically do is: ls -i <file> to get the inode of that file, and then find /dir -type f -inum <inode value> -mount. (You want the -mount to avoid searching on different file systems, which is probably part of your performance issues.)
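For reference, those two steps can be collapsed into a single command; a sketch with placeholder paths:
find /dir -mount -type f -inum "$(stat -c %i /dir/some/file)"
# stat -c %i prints the file's inode number (GNU coreutils)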
Other than that, I think that's about it.
