Batch replacing unidentified Characters in Unix that were created by macOS - linux

On a Linux volume that is part of a NAS holding many TB of data, some files were uploaded from macOS, and some of their filenames seem to include characters that cannot be reproduced via the FTP or SMB file protocols. These files appear as e.g. "picture_name001.jpg", where the "" probably stands for a colon or slash.
I can search for "" and found that it applies to 2171 files in locations scattered across the volume, far too many to find and correct by hand.
I thought I could connect to the NAS via SSH and simply loop through each directory, automatically replacing "" with "_", but this doesn't work:
for file in **; do mv -- "$file" "${file///_}"; done
This attempt throws an error on the first matching item:
mv: can't rename '120422_LAXJFK': No such file or directory
So obviously the substitute character displayed as "" is not the way to address the file or directory, because it refers to a name that doesn't actually exist in the volume index.
(A) How do I find out if "120422_LAX:JFK" or "120422_LAX/JFK" is meant here, and (B) how do I escape these invalid characters to eventually be able to automatically rename all those names to for example "120422_LAX_JFK"?
Is there for example a way to get a numerical file ID from the name and then instruct to rename the file by number in case its name contains ""?

I think the problem is that several different character codes can be hiding behind this "". When the system cannot represent a character (for example, because the encoding in use does not support it), it displays a default substitute instead (in your case ""). The real character code is still stored in the name, though. So when you run for file in **; do mv -- "$file" "${file///_}"; done, the "" you typed is not the code that is actually stored in the filename, so nothing matches.
I think this problem can be solved by making the character encodings on both devices (Mac and NAS) compatible, ideally identical.
Hope this helps.
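If changing the encoding is not an option, one way to work around it is to never type the character at all and let a script inspect what is actually stored. The following is only a sketch under assumptions: it guesses that the odd characters are either Unicode private-use code points or bytes that are not valid in the current locale, and the names sanitize and rename_tree are made up for this example. The printed ascii() form shows the stored code point as an escape such as \uf022, which helps with question (A); which character the Mac originally meant by it depends on the mapping the file server used. It defaults to a dry run, so check the output before renaming anything.

```python
import os
import unicodedata


def sanitize(name: str) -> str:
    """Replace private-use characters and undecodable bytes with "_"."""
    cleaned = []
    for ch in name:
        # "Co" = private-use code point (assumed to be the odd characters here).
        # "Cs" = lone surrogate, which is how Python's filesystem decoding
        #        represents a byte it could not decode.
        if unicodedata.category(ch) in ("Co", "Cs"):
            cleaned.append("_")
        else:
            cleaned.append(ch)
    return "".join(cleaned)


def rename_tree(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Walk root bottom-up and rename every entry whose name needs cleaning."""
    changes = []
    # topdown=False renames children before their parent directory,
    # so the paths used for the rename are still valid.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = sanitize(name)
            if new_name == name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, new_name)
            if os.path.exists(dst):
                print(f"skipped, target exists: {ascii(dst)}")
                continue
            # ascii() reveals the stored code points as escapes.
            print(f"{ascii(name)} -> {ascii(new_name)}")
            if not dry_run:
                os.rename(src, dst)
            changes.append((src, dst))
    return changes
```

Calling rename_tree("/path/to/volume") only lists what would change; rename_tree("/path/to/volume", dry_run=False) performs the renames. The path is a placeholder.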

Related

Copying a file, but appending index if file exists

I have several directories with filenames being the same, but their data inside is different.
My program identifies these files (among many others) and I would like to copy all the matches to the same directory.
I am using shutil.copy(src, dst), but I don't want to overwrite files that already exist in that directory (previous matches) if they have the same name. I'd like to append an integer if the name already exists, similar to the "keep both versions" behavior when you copy in Windows 10.
So for example, if I have file.txt in several places, the first time it would copy into dst directory it would be file.txt, the next time it would be file-1.txt (or something similar), and the next time it would be file-2.txt.
Are there any flags for shutil.copy or some other copy mechanism in Python that I could use to accomplish this?
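As far as I know shutil.copy has no flag for this, so the numbering has to be done before the copy. A minimal sketch of the behavior described above, where copy_keep_both is a made-up helper name and the "-1", "-2" suffix style is taken from the example in the question:

```python
import shutil
from pathlib import Path


def copy_keep_both(src, dst_dir) -> Path:
    """Copy src into dst_dir, appending -1, -2, ... if the name is taken."""
    src = Path(src)
    dst_dir = Path(dst_dir)
    candidate = dst_dir / src.name
    counter = 1
    # Try file.txt, then file-1.txt, file-2.txt, ... until a name is free.
    while candidate.exists():
        candidate = dst_dir / f"{src.stem}-{counter}{src.suffix}"
        counter += 1
    shutil.copy(src, candidate)
    return candidate
```

Note that the existence check and the copy are two separate steps, so this is not safe if several processes copy into the same directory at the same time.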

Remove a lot of UUID format named files using rm

I have a lot of files in a directory in a Linux environment.
The problem is that those files are mixed with a lot of UUID-named files that got there who knows how.
Is there a way to issue an "rm" command that deletes those files without the risk of removing the others? (None of the other files have a UUID-format filename.)
I think it has something to do with defining how many characters there are before each "-" symbol, so something along the lines of "rm 8chars-4chars-4-4-12", but I don't know how to say that to rm. I only know "rm somefolder/*", using * to delete its contents, but that's it.
Thanks in advance.
Actually solved it!
It was as easy as using the "?" wildcard, which matches exactly one character.
So, in this particular case:
rm -v ????????-????-????-* # This says "remove (verbosely) 8-4-4-whatever"
That way, it deletes only files whose names follow this format.
More information here: http://www.linfo.org/wildcard.html
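The "?" pattern matches any character, not only hexadecimal digits, and the trailing * accepts anything after the third dash. If that is too loose, a stricter check is possible in a small script. This is only a sketch: remove_uuid_files is a made-up name, and it assumes the files are named exactly like a UUID (8-4-4-4-12 hex digits) with no extension. It defaults to a dry run that only reports what would be deleted.

```python
import re
from pathlib import Path

# 8-4-4-4-12 hexadecimal digits, the usual textual UUID layout.
UUID_NAME = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def remove_uuid_files(directory, dry_run=True):
    """Return the UUID-named files in directory; delete them unless dry_run."""
    matches = sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and UUID_NAME.fullmatch(path.name)
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Run it once with the default dry_run=True and look at the returned list before calling it again with dry_run=False.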

readdir() in linux sometimes not returning correct string utf8

I have a file in a folder with name as
01一千个伤心的理由 张学友
but sometimes readdir() is simply returning all ????????? as the name of the file.
I searched for this on Google and found that readdir() has some UTF-8 issues on some systems (like this one). Did I read that correctly? If this is the problem on Linux, is there any solution?
EDIT
The problem is that there are actually two scripts (mine and another one) that mount the same device on two different paths. I mount it as UTF-8, but the other script does not (it is probably using the default mode). So if my script runs first on reboot or device insert, everything is fine. Otherwise the problem appears.
So the question is why the two mounts are affecting the other one and how can I correct it?
On Linux (or more generally, POSIX), pathnames are just sequences of arbitrary bytes terminated by a '\0' (ASCII NUL) byte, with pathname components separated by '/'. Every other byte value is allowed. How to interpret those bytes is up to the application. So most likely your problem has to do with differing locale settings. E.g. "script 1" creates a pathname that is not valid UTF-8 but happens to consist of correctly printable characters in whatever locale "script 1" is running in.
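To see the "just bytes" point for yourself, you can list a directory without any decoding and then decode the same bytes in two different ways. This is a small illustration, not a fix for the mount problem; raw_names and decode_views are names made up for this sketch, and Latin-1 is picked only as an example of a second interpretation.

```python
import os


def raw_names(directory="."):
    """Return the directory entries as undecoded bytes."""
    # Passing bytes to os.listdir makes it return bytes instead of str.
    return sorted(os.listdir(os.fsencode(directory)))


def decode_views(raw: bytes) -> dict:
    """Show how the same bytes read under two different interpretations."""
    return {
        # Invalid UTF-8 sequences become the U+FFFD replacement character.
        "utf-8": raw.decode("utf-8", errors="replace"),
        # Latin-1 maps every byte to a character, so it never fails.
        "latin-1": raw.decode("latin-1"),
    }
```

For example, decode_views(b"\xe4\xb8\x80") gives the single character 一 under UTF-8 but three unrelated characters under Latin-1, even though the stored bytes are identical.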

How to identify line endings on a large number of files

Given a medium-size tree of files (a few hundred), is there some utility that can scan the whole tree (recursively) and display the name of each file and whether the file currently contains CRLF, LF, or mixed line terminators?
A GUI that can both display the current status and also selectively change specific files is preferred, but not essential.
Also prefer a solution for Windows, but I have access to both Bash for Windows and a Linux box that has access to the same file tree, so I can use something Linux-y if necessary.
Related Question: https://unix.stackexchange.com/questions/118959/how-to-find-files-that-contain-newline-in-filename
You can use Linux's find to look recursively for filenames containing newline characters:
find . -name $'*[\n\r]*'
From there you can proceed to do what you need to do.
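For the line terminators inside the files themselves, which is what the question asks about, a short script can classify each file by counting byte sequences. This is a rough sketch with made-up function names. It reads every file fully into memory and does not try to skip binary files, so it assumes a tree of reasonably small text files.

```python
from pathlib import Path


def classify_line_endings(data: bytes) -> str:
    """Return "CRLF", "LF", "CR", "mixed", or "none" for the given bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF not preceded by CR
    cr = data.count(b"\r") - crlf  # CR not followed by LF
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def scan_tree(root) -> dict:
    """Map every regular file below root to its line-ending classification."""
    return {
        path: classify_line_endings(path.read_bytes())
        for path in sorted(Path(root).rglob("*"))
        if path.is_file()
    }
```

Printing the items of scan_tree(".") gives one line per file, and the same script runs unchanged on Windows and Linux.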

Linux - Restoring a file

I've written a very basic shell script that moves a specified file into the dustbin directory. The script is as follows:
#!/bin/bash
#move items to dustbin directory
mv "$@" ~/dustbin/
echo "File moved to dustbin"
This works fine for me, any file I specify gets moved to the dustbin directory. However, what I would like to do is create a new script that will move the file in the dustbin directory back to its original directory. I know I could easily write a script that would move it back to a location specified by the user, but I would prefer to have one that would move it to its original directory.
Is this possible?
I'm using Mac OS X 10.6.4 and Terminal
You will have to store where the original file came from, then. Maybe in a separate file, a database, or in the file's attributes (metadata).
Create a logfile with 2 columns:
The complete filename in the dustbin
The complete original path and filename
You will need this logfile anyway - what will you do when a user deleted 2 files in different directories, but with the same name? /home/user/.wgetrc and /home/user/old/.wgetrc ?
What will you do when a user deletes a file, makes a new one with the same name, and then deletes that too? You'll need versions or timestamps or something.
You need to store the original location somewhere, either in a database or in an extended attribute of the file. A database is definitely the easiest way to do it, though an extended attribute would be more robust. Looking in ~/.Trash/ I see some, but not all files have extended attributes, so I'm not sure how Apple does it.
You need to somehow encode the source directory in the file. I think the easiest would be to change the filename in the dustbin directory. So that /home/user/music/song.mp3 becomes ~/dustbin/song.mp3|home_user_music
And when you copy it back your script needs to process the file name and construct the path beginning at |.
Another approach would be to let the filesystem be your database.
A file moved from /some/directory/somewhere/filename would be moved to ~/dustbin/some/directory/somewhere/filename, and you'd do find ~/dustbin -name "$file" to find it based on its basename (from user input). Then you'd just trim "~/dustbin" from the output of find and you'd have the destination ready to use. If find returns more than one file, you can list the candidates for the user to choose from. You could use ~/dustbin/$deletiondate if you wanted to make it possible to roll back to earlier versions.
You could do a cron job that would periodically remove old files and the directories (if empty).
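The "let the filesystem be your database" idea could look roughly like this. It is a sketch under assumptions, not a finished tool: the three function names are made up, it assumes POSIX-style paths, and it does not handle the case where a file with the same original path is already in the dustbin (the $deletiondate variant above would address that).

```python
import shutil
from pathlib import Path


def move_to_dustbin(path, dustbin) -> Path:
    """Move path to <dustbin>/<its absolute path> and return the new location."""
    src = Path(path).resolve()
    dustbin = Path(dustbin).resolve()
    # /some/dir/file becomes <dustbin>/some/dir/file
    target = dustbin / src.relative_to(src.anchor)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(target))
    return target


def find_in_dustbin(name, dustbin) -> list:
    """Return every file in the dustbin whose basename equals name."""
    dustbin = Path(dustbin).resolve()
    return sorted(p for p in dustbin.rglob("*") if p.is_file() and p.name == name)


def restore(trashed, dustbin) -> Path:
    """Move a dustbin entry back to the path encoded in its location."""
    trashed = Path(trashed).resolve()
    dustbin = Path(dustbin).resolve()
    # Trimming the dustbin prefix leaves the original absolute path.
    original = Path(dustbin.anchor) / trashed.relative_to(dustbin)
    original.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(trashed), str(original))
    return original
```

If find_in_dustbin returns more than one entry, show the list to the user and pass the chosen one to restore.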
