How to rename a folder that contains smart quotes - linux

I have a folder that was created automatically. The user unintentionally provided smart (curly) quotes as part of the name, and the process that sanitizes the inputs did not catch these. As a result, the folder name contains the smart quotes. For example:
this-is-my-folder’s-name-“Bob”
I'm now trying to rename/remove said folder on the command line, and none of the standard tricks for dealing with files/folders with special characters (enclosing in quotes, escaping the characters, trying to rename it by inode, etc.) are working. All result in:
mv: cannot move this-is-my-folder’s-name-“Bob” to this-is-my-folders-name-BOB: No such file or directory
Can anyone provide some advice as to how I can achieve this?

To get the name in a format you can copy-and-paste into your shell:
printf '%q\n' this*
...will print out the filename in a manner the shell will accept as valid input. This might look something like:
$'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235'
...which you can then use as an argument to mv:
mv $'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235' this-is-my-folders-name-BOB
Incidentally, if your operating system works the same way mine does (when running the test above), this would explain why using single-character globs such as ? for those characters didn't work: They're actually more than one byte long each!
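A quick way to check the "more than one byte" claim is to count the bytes of a single curly apostrophe. This is only a sketch: it assumes the name is stored as UTF-8, where U+2019 is the three bytes written below in octal, and the variable name quote_bytes is made up for the example.

```shell
# U+2019 (right single quotation mark) written as its three UTF-8 bytes in octal.
# wc -c counts bytes; tr strips the padding some wc implementations add.
quote_bytes=$(printf '\342\200\231' | wc -c | tr -d ' ')
echo "bytes in one curly apostrophe: $quote_bytes"
```

If the shell's locale is not UTF-8, a single ? in a glob may match only one of those bytes, which would explain a pattern that looks right but matches nothing.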

You can use the shell globbing token ? to match any single character, so matching each smart quote with ? should do:
mv this-is-my-folder?s-name-?Bob? new_name
Here each smart quote is replaced with ? to match the file name.

There are several possibilities.
If an initial substring of the file name ending before the first quote is unique within the directory, then you can use filename completion to help you type an appropriate command. Type "mv" (without the quotes) and the unique initial substring, then press the TAB key to request filename completion. Bash will complete the filename with the correct characters, correctly escaped.
Use a graphical file browser. Then you can select the file to rename by clicking on it. (Details of how to proceed from there depend on the browser.) If you don't have a graphical terminal and can't get one, then you may be able to do the same with a text-mode browser such as Midnight Commander.
A simple glob built with the ? or * wildcard should be able to match the filename.
Use a more complex glob to select the filename, and perhaps others with the same problem. Maybe something like *[^a-zA-Z0-9-]* would do. Use a pattern substitution to assign a new name. Something like this:
for f in *[^a-zA-Z0-9-]*; do
mv "$f" "${f//[^a-zA-Z0-9-]/}"
done
The substitution replaces every character that is not a decimal digit, an uppercase or lowercase Latin letter, or a hyphen with nothing (i.e. it strips them). Do take care before you use this, though, to make sure you're not going to make more changes than you intend.
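To see what a name would become before renaming anything, the same stripping can be previewed on a string. The sketch below swaps in tr -cd for the bash pattern substitution so that it also runs in a plain sh; strip_name is a hypothetical helper, and the sample name is the one from the question, spelled out as UTF-8 bytes.

```shell
# strip_name is a hypothetical helper, not part of the answer above.
# It deletes every byte that is not an ASCII letter, a digit, or a hyphen.
strip_name() {
    # LC_ALL=C makes tr work byte by byte, so all bytes of a
    # multi-byte character are removed.
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9-'
}

# The folder name from the question, with the curly quotes as octal escapes.
old=$(printf 'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235')
new=$(strip_name "$old")
echo "would rename to: $new"
```

Nothing on disk is touched here; once the preview looks right, the computed name can be passed to mv.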

Related

Linux rename s/ - regex for wildcard single character

I have found a simple solution to my actual requirement, but I would still like to understand how to use the regex equivalent of the single character wildcard ? which we use for filtering ... in say ls
I would like to rename a group of files which differ by one character.
FROM
Impossible-S01E01-x264.mkv
Impossible-S01E02-x264.mkv
Impossible-S01E03-x264.mkv
Impossible-S01E04-x264.mkv
Impossible-S01E05-x264.mkv
TO
Impossible-S01E01.mkv
Impossible-S01E02.mkv
Impossible-S01E03.mkv
Impossible-S01E04.mkv
Impossible-S01E05.mkv
As I said above, my simple solution is:
rename s/-x264// *.mkv
That sorts out my needs - all good and well - but I really want to understand my first approach:
To list the files, I can use:
ls Impossible-S01E0?-x264.mkv
So what I was trying for the rename was:
rename s/Impossible-S01E0?-x264.mkv/Impossible-S01E0?.mkv/ *.mkv
I have read up here:
How do regular expressions differ from wildcards used to filter files
And here:
Why does my regular expression work in X but not in Y?
I see this:
. matches any character (or any character except a newline).
I just can't seem to wrap my head around how to use that - hoping someone will explain for my education.
{ edit: missed a backslash \ }
So, regular expressions aren't globs. If you wanted to keep the middle (e.g. catch the season/ep) and replace everything else, you'd need to use capture groups. e.g. s/^.*(S\d+E\d+).*\.(.*?)$/Foo-$1.$2/
This would extract an SxxExx and the file extension, throw everything else away, and compose a new filename.
In a bit more detail it:
Matches everything from the start until an SxxExx (where xx is actually any number of digits)
Captures the contents of SxxExx
Matches everything until the final literal .
Non-greedily matches everything after the ., which it captures.
For your specific case of removing a suffix, this is likely overkill, though.
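For the attempt in the question, the regex counterpart of the glob ? is a plain dot, and whatever it matched has to be captured so it can be put back in the replacement. The sketch below previews this with sed -E on one sample name rather than renaming files; the variable names are made up for the example.

```shell
# "." in a regex plays the role of "?" in a glob: it matches any one character.
# The parentheses capture the part to keep; \1 puts it back in the replacement.
old='Impossible-S01E03-x264.mkv'
new=$(printf '%s\n' "$old" | sed -E 's/^(Impossible-S01E0.)-x264\.mkv$/\1.mkv/')
echo "$old -> $new"
```

With the Perl-based rename, the same expression would use $1 in place of \1, and the literal dots should still be escaped as \. so they are not treated as wildcards.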

use rename command to batch rename the files

I'm trying to keep only the numbers in the square brackets and the file extensions,
so that the files below:
【004】ssd水电费.txt
【006】佛山市,地方cd2.txt
【022】风sf.pdf
I'd like to be:
004.txt
006.txt
022.pdf
or just like
4.txt
6.txt
22.pdf
I know the rename 's/old-exp/new-exp/' command and a little bit of regex, but I could not find a pattern that matches what I expected.
I tried rename 's/[\u4e00-\u9eff]+//' * to remove the Chinese characters, but it did not work.
You want to use something like the following:
rename 'tr/A-Za-z0-9.//cd; s/^(\d+).*(\.[a-z]+)$/$1$2/' *
(You'll want to use -n first to test that it does what you want.)
That removes all characters from the file name other than A-Za-z0-9. and then pulls out only the extension and the digits at the beginning.
The Unicode match doesn't work because rename uses byte strings, not Unicode strings, since not all Unix paths are guaranteed to be valid Unicode. Therefore, unless you have to, it's easier to simply filter out the byte values that you don't want rather than convert them to Unicode.
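The two steps can be tried on a single name without renaming anything. This sketch uses tr and sed as stand-ins for the Perl tr/// and s/// operators, so it approximates the logic rather than reproducing the rename command itself; the sample name is taken from the question.

```shell
old='【006】佛山市,地方cd2.txt'
# Step 1: drop every byte that is not an ASCII letter, a digit, or a dot
#         (the newline is kept so sed still receives a complete line).
# Step 2: keep only the leading digits and the extension.
new=$(printf '%s\n' "$old" \
    | LC_ALL=C tr -cd 'A-Za-z0-9.\n' \
    | sed -E 's/^([0-9]+).*(\.[a-z]+)$/\1\2/')
echo "$old -> $new"
```

After the first step the name is 006cd2.txt, which is why the second step has to discard the leftover Latin letters between the digits and the extension.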

Find space escape

Writing a small script in bash (MacOS in fact) and I want to use find, with multiple sources. Not normally a problem, but the list of source directories to search is held as a string in a variable. Again, not normally a problem, but some of them contain spaces in their name.
I can construct the full command string and if entered directly at the command prompt (copy and paste in fact) it works as required and expected. But when I try and run it within the script, it flunks out on the spaces in the name and I have been unable to get around this.
I cannot quote the entire source string as that is then just seen as one single item which of course does not exist. I escape each space with a backslash within the string held in the variable and it is simply lost. If I use double backslash, they both remain in place and again it fails. Any method of quoting I have tried is basically ignored, the quotes are seen as normal characters and splitting is done at each space.
I have so far only been able to use eval on the whole command string to get it to work but I felt there ought to be a better solution than this.
Ironically, if I use AppleScript I CAN create a suitable command string and run it perfectly with doShellScript (ok, that's using JXA, but it's the same with actual AppleScript). However, I have so far been unable to find the correct escape mechanism just in a bash script, without resorting to eval.
Anyone suggest a solution to this?
If possible, don't store all paths in one string. An array is safer and more convenient:
paths=("first path" "second path" "and so on")
find "${paths[@]}"
The find command will expand to
find "first path" "second path" "and so on"
If you have to use the string and don't want to use eval, split the string into an array:
string="first\ path second\ path and\ so\ on"
read -a paths <<< "$string"
find "${paths[@]}"
Paths inside the string should use \ to escape spaces; wrapping paths inside "" or '' will not work. eval might be the better option here.
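The difference between one string and a list of separate words can be checked by counting the arguments a command actually receives. The sketch below uses the positional parameters as the list so that it also runs in a plain sh without arrays; quoted "$@" keeps each word intact in the same way the quoted array expansion does. count_args is a hypothetical helper.

```shell
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

string="first path second path"
split_count=$(count_args $string)    # unquoted: word splitting yields 4 arguments

set -- "first path" "second path"    # each path stored as its own word
kept_count=$(count_args "$@")        # quoted "$@": 2 arguments, spaces intact

echo "split: $split_count, kept: $kept_count"
```

Replacing count_args with find in the last call would pass exactly two start directories, spaces included.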

Replace pwd with USER in a file

I know that this is quite an easy thing for any advanced Vim programmer, but I have been trying to find a solution for a couple of hours now.
In my results file, there are certain lines like:
/Users/name/Project/Task1/folder1 : INFO : Random Info message
Here, /Users/name/Project/Task1/folder1 is my pwd, i.e. the present working directory.
I want to replace all the occurrences of my pwd above in the file with 'USER'. How can I do that?
:%s#/Users/name/Project/Task1/folder1#USER#g
or
:%s#<C-r>=getcwd()<CR>#USER#g
If I understand you correctly you can simply use the search and replace functionality and escape the / character like this:
:%s/\/Users\/name\/Project\/Task1\/folder1/USER/
If you need to replace multiple current working directories (and thus want to have the pwd to be dynamic) it is probably easier to use something like sed:
sed "s~$(pwd)~USER~" < file
Note that the ~ is used as a delimiter for the command instead of the /, this way we do not need to escape the / in the path.
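The sed form can be tried on the sample line from the question without touching the results file. In this sketch the directory is held in a variable instead of calling pwd, so the outcome does not depend on where it is run, and a g flag is added so every occurrence on a line is replaced; the variable names are made up for the example.

```shell
# Stand-in for $(pwd), fixed here so the example is reproducible.
dir='/Users/name/Project/Task1/folder1'
line="$dir : INFO : Random Info message"
# ~ is the delimiter, so the slashes in the path need no escaping.
result=$(printf '%s\n' "$line" | sed "s~$dir~USER~g")
echo "$result"
```

The path is interpolated into the pattern as a regular expression, so a directory name containing characters such as . or * is matched loosely, and one containing ~ would break the command.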

Pattern Matching log files

I am getting files like .log and _log in a folder. I am able to pick up the .log files with /*.log$/ but unable to find the files that end in _log.
I need a regex pattern that will match both types of files in a specified folder.
Your question is tagged both 'perl' and 'linux'. I'll assume here that you're talking about Perl style regular expressions, as it looks like that's what you are showing in your example snippet.
The *. sequence is a mistake.
Let's focus on what you want to match. You want to match any filename that ends in a dot followed by the literal characters 'log'. You also want to match any filename that ends in an underscore, followed by the literal characters 'log'. You really shouldn't concern yourself with the "anything at all" that can come before the final dot or underscore. So the regexp would probably be better written as this:
/[._]log$/
Notice we don't even bother with the dot-star. It isn't helpful in this situation.
If you want for your pattern to also match files where the literal characters 'log' may optionally be followed by an integer sequence (not mentioned in your question, but discussed in one of your followup comments), you could write it like this:
/[._]log\d*$/
Here the 'star' is helpful; it allows for zero or more digits sandwiched between the 'g' and the end of the string.
I totally agree (by upvoting) with DavidO's solution, but it usually makes more sense, and increases readability, to use glob() to get a list of files from a particular directory:
my $dir = "/path/here";
my @log_files = grep { /[\._]log\d*$/ } glob("$dir/*");
print join "\n", @log_files;
This will catch
foo.log
foo_log
foo.log1
foo_log22
Use the regexp /.*[._]log$/.
I'm surprised your first case worked -- /*.log$/ isn't a legal regexp (since the * doesn't say what it is supposed to match zero-or-more of). Double-check your current results.
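The pattern can be checked against a few sample names from the shell before it goes into a script. This sketch uses grep -E, which has no \d, so [0-9] stands in for it; the file names are made-up examples, with foo.txt and catalog included as names that should not match.

```shell
# Feed sample names to grep and keep those that end in .log or _log,
# optionally followed by digits.
matches=$(printf '%s\n' foo.log foo_log foo.log1 foo_log22 foo.txt catalog \
    | grep -E '[._]log[0-9]*$')
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$matches"
echo "matched: $count"
```

catalog is rejected because the character class requires a literal dot or underscore immediately before log.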

Resources