I'm trying to keep only the numbers in the square brackets and the file extensions.
So the files below:
【004】ssd水电费.txt
【006】佛山市,地方cd2.txt
【022】风sf.pdf
should become:
004.txt
006.txt
022.pdf
or simply:
4.txt
6.txt
22.pdf
I know the rename 's/old-exp/new-exp/' command and a little regex, but I could not find a pattern that matches what I expect.
I tried rename 's/[\u4e00-\u9eff]+//' * to remove the Chinese characters, but it did not work.
You want to use something like the following:
rename 'tr/A-Za-z0-9.//cd; s/^(\d+).*(\.[a-z]+)$/$1$2/' *
(You'll want to use -n first to test that it does what you want.)
That removes all characters from the file name other than A-Za-z0-9. and then pulls out only the extension and the digits at the beginning.
The reason the Unicode match doesn't work is that rename operates on byte strings, not Unicode strings, since not all Unix paths are guaranteed to be valid Unicode. (In addition, \u4e00 is not a code point escape in Perl, which writes it as \x{4e00}.) Therefore, unless you have to, it's easier to simply filter out the byte values that you don't want rather than convert them to Unicode.
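As a rough sketch of the same byte-filtering idea for systems where the Perl rename is not installed, the two steps can be reproduced with tr and sed. The function name short_name is made up for this illustration, and the loop only prints the proposed names; it renames nothing.

```shell
#!/usr/bin/env bash
# short_name NAME: print NAME reduced to its leading digits plus its extension.
# (short_name is a hypothetical helper, not part of rename.)
short_name() {
    printf '%s\n' "$1" |
        # Step 1: delete every byte that is not an ASCII letter, digit, dot
        # or the trailing newline. LC_ALL=C makes tr work on raw bytes.
        LC_ALL=C tr -cd 'A-Za-z0-9.\n' |
        # Step 2: keep only the leading digits and the final extension.
        sed -E 's/^([0-9]+).*(\.[a-z]+)$/\1\2/'
}

# Preview only: show what each sample name from the question would become.
for f in '【004】ssd水电费.txt' '【006】佛山市,地方cd2.txt' '【022】风sf.pdf'; do
    printf '%s -> %s\n' "$f" "$(short_name "$f")"
done
```

Once the preview looks right, the printf line could be swapped for mv -- "$f" "$(short_name "$f")".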
I'm writing a script that at one point needs to remove a literal string (no regex matching needed) from a file. Both the string and file path are in script variables.
The problem is that the string is multi-line, and may also contain special characters. Additionally, not all lines in the string are unique in the file (but the string as a whole is), so I cannot go line by line to delete each individual one from the file.
For example, when I echo "$stringToDelete" from my script I get something like:
values(
/the/#quick/[]/"fox",
//jumped/over/the,
//lazy/{},
)
I've tried some approaches using sed and awk but any attempt fails with issues around either the newlines or the special characters. I've seen some answers invoking perl but couldn't get that to work.
Also I can easily read the full file's content into a variable, so the solution doesn't need to edit the file directly - a way to remove the stringToDelete from another string var is fine too.
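One possible approach, sketched here with plain shell parameter expansion rather than sed, awk or perl: a quoted variable inside a ${...%%...} or ${...#...} expansion is treated as a literal string, so newlines and characters such as [, ], { and / need no escaping. The function name remove_literal is hypothetical, and the sketch assumes the file content fits comfortably in a variable.

```shell
#!/usr/bin/env bash
# remove_literal CONTENT STRING: print CONTENT with the first occurrence of
# STRING removed. (remove_literal is a made-up name for this sketch.)
remove_literal() {
    case $1 in
        *"$2"*)
            # Quoting "$2" turns off pattern matching for it, so it is
            # compared byte for byte, newlines and brackets included.
            rl_before=${1%%"$2"*}   # text before the first occurrence
            rl_after=${1#*"$2"}     # text after the first occurrence
            printf '%s' "$rl_before$rl_after"
            ;;
        *)
            # STRING does not occur: pass CONTENT through unchanged.
            printf '%s' "$1"
            ;;
    esac
}
```

Calling remove_literal "$fileContent" "$stringToDelete" (with fileContent standing for whatever variable holds the file's text) prints the content without the block. Keep in mind that $(...) drops trailing newlines when a file is read into a variable.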
I have found a simple solution to my actual requirement, but I would still like to understand how to use the regex equivalent of the single character wildcard ? which we use for filtering ... in say ls
I would like to rename a group of files which differ by one character.
FROM
Impossible-S01E01-x264.mkv
Impossible-S01E02-x264.mkv
Impossible-S01E03-x264.mkv
Impossible-S01E04-x264.mkv
Impossible-S01E05-x264.mkv
TO
Impossible-S01E01.mkv
Impossible-S01E02.mkv
Impossible-S01E03.mkv
Impossible-S01E04.mkv
Impossible-S01E05.mkv
As I said above, my simple solution is:
rename s/-x264// *.mkv
That sorts out my needs - all good and well - but I really want to understand my first approach:
To list the files, I can use:
ls Impossible-S01E0?-x264.mkv
So what I was trying for the rename was:
rename s/Impossible-S01E0?-x264.mkv/Impossible-S01E0?.mkv/ *.mkv
I have read up here:
How do regular expressions differ from wildcards used to filter files
And here:
Why does my regular expression work in X but not in Y?
I see this:
. matches any character (or any character except a newline).
I just can't seem to wrap my head around how to use that - hoping someone will explain for my education.
{ edit: missed a backslash \ }
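So, regular expressions aren't globs. The regex counterpart of the glob ? is . (any single character); in a regex, ? instead makes the preceding item optional, so E0? means E optionally followed by 0, and a ? in the replacement is inserted literally. If you wanted to keep the middle (e.g. catch the season/ep) and replace everything else, you'd need to use capture groups, e.g. s/^.*(S\d+E\d+).*\.(.*?)$/Foo-$1.$2/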
This would extract an SxxExx and the file extension, throw everything else away, and compose a new filename.
In a bit more detail it:
Matches everything from the start until an SxxExx (where xx is actually one or more digits)
Captures the contents of SxxExx
Matches everything until the final literal .
Non-greedily matches everything after the ., which it captures.
For your specific case of removing a suffix, this is likely overkill, though.
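To connect this back to the original attempt, here is a minimal sketch of the glob-to-regex translation, using sed -E so it can be tried without touching any files. The ? of the glob becomes ., the matched text is captured with ( ) and reinserted with a backreference, and the literal dots are escaped. The function name strip_x264 is invented for the example; the rename call in the comment assumes the Perl-based rename.

```shell
#!/usr/bin/env bash
# strip_x264 NAME: print NAME with "-x264" removed if NAME matches the pattern,
# otherwise print NAME unchanged. (strip_x264 is a hypothetical helper.)
#
# Glob:   Impossible-S01E0?-x264.mkv
# Regex:  ^(Impossible-S01E0.)-x264\.mkv$
#
# Assumed equivalent with the Perl rename (dry run):
#   rename -n 's/^(Impossible-S01E0.)-x264\.mkv$/$1.mkv/' *.mkv
strip_x264() {
    printf '%s\n' "$1" |
        sed -E 's/^(Impossible-S01E0.)-x264\.mkv$/\1.mkv/'
}

strip_x264 'Impossible-S01E03-x264.mkv'
```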
I've got a ton of files as follows
audiofile_drums_1-ktpcwybsh5c.wav
soundsample_drums_2-fghlkjy57sa.wav
noise_snippet_guitar_5-mxjtgqta3o1.wav
louder_flute_9-mdlsiqpfj6c.wav
I want to remove everything between and including the "-" and the .wav file extension, to be left with
audiofile_drums_1.wav
soundsample_drums_2.wav
noise_snippet_guitar_5.wav
louder_flute_9.wav
I've tried to delete everything following and including the character "-" using
rename 's/-.*//' *
Which gives me
audiofile_drums_1
soundsample_drums_2
noise_snippet_guitar_5
louder_flute_9
Rather than renaming all the files a second time to add the .wav extension back, I am hoping there is a slicker way to do this with one nifty command in one stage instead of two.
Any suggestions?
Thanks
You can use rename 's/-[^.]*\.wav$/.wav/' *
The first part, -[^.]*\.wav$, searches for a - followed by any number of characters that are not ., followed by .wav at the end of the file name. The .wav and the end-of-name anchor are not strictly needed, but they help avoid renaming files you don't want to rename.
The replacement, .wav, preserves the extension.
Please note that rename is not a standard utility. This syntax belongs to the Perl-based rename, which may not be available on every Linux system; some distributions ship a different rename from util-linux that does not accept Perl expressions.
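Where the Perl rename is unavailable, a similar result can probably be had with shell parameter expansion alone. This is a different technique from the regex above, not a drop-in copy: ${name%-*} cuts at the last hyphen in the name. strip_id is a made-up helper name.

```shell
#!/usr/bin/env bash
# strip_id NAME: for names like foo-<id>.wav, print foo.wav;
# any other name is printed unchanged. (strip_id is hypothetical.)
strip_id() {
    case $1 in
        *-*.wav)
            # ${1%-*} removes the shortest suffix that starts with "-",
            # i.e. everything from the last hyphen onward.
            printf '%s.wav\n' "${1%-*}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Preview with the sample names from the question.
for f in audiofile_drums_1-ktpcwybsh5c.wav louder_flute_9-mdlsiqpfj6c.wav; do
    printf '%s -> %s\n' "$f" "$(strip_id "$f")"
done

# Possible use on real files (not run here):
#   for f in ./*-*.wav; do mv -- "$f" "$(strip_id "$f")"; done
```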
This works in my specific case, and the same idea should work for any file extension.
rename -n 's/-.*(?=\.wav$)//' *
The command matches the - symbol and every character after it, and uses a positive lookahead**, (?=\.wav$), to require that the match is followed by the file extension at the end of the file name (denoted by $). The matched text is then replaced with nothing, which removes it.
** NOTE: A positive lookahead is a zero-width assertion. It affects the match but is not included in the replacement, so the '.wav' part is not erased.
In this example (?=\.wav$) is the positive lookahead. The dollar sign $ anchors the match at the end of the string, which makes it a good fit for a file extension.
I would like to see the actual file contents without it being formatted to print. For example, to show:
\n0.032,170\n0.034,290
Instead of:
0.032,170
0.34,290
Is there a command to echo the file's actual data in bash? I've tried using head, cat, more, etc. but all those seem to echo the "print-formatted" text. For example:
$ cat example.csv
0.032,170
0.34,290
How can I print the actual characters within the file?
This reads as if you misunderstand what the "actual characters in the file" are. You will not find the characters \ and n in that file, only a line feed, which is a single specific character. So utilities like cat do output exactly the characters in the file.
Putting it the other way around: if you really had those two characters literally in the file, then a utility like cat would actually output them. I just checked that, just to be sure.
You can easily check that yourself by opening the file in a hex editor. There you will see the byte 0A (decimal 10), which is the line feed character. You will not see the pair of characters \ and n anywhere in that file.
Many programming languages and also shell environments use escape sequences like \n in string definitions and identify those as control characters which would not be typable otherwise. So maybe that is where your impression comes from that your files should contain those two characters.
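If no hex editor is at hand, od gives much the same view from the command line. A small sketch, with the sample data taken from the question and written to a temporary file; the helper names show_bytes and hex_bytes are only for illustration.

```shell
#!/usr/bin/env bash
# show_bytes FILE: dump FILE byte by byte; od -c prints a line feed as \n.
show_bytes() {
    od -c "$1"
}

# hex_bytes FILE: the same bytes in hexadecimal; a line feed shows up as 0a.
hex_bytes() {
    od -An -tx1 "$1"
}

# Demo on a throwaway copy of the data from the question.
sample=$(mktemp)
printf '0.032,170\n0.34,290\n' > "$sample"
show_bytes "$sample"
hex_bytes "$sample"
rm -f "$sample"
```

The \n that od -c prints is again only a display convention for the single byte 0a; the file itself contains no backslash.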
To display newlines as \n, you might try:
awk 1 ORS='\\n' input-file
This is not the "actual characters in the file", as \n is merely a conventional method of displaying a newline, but this does seem to be what you want.
I have a folder that was created automatically. The user unintentionally provided smart (curly) quotes as part of the name, and the process that sanitizes the inputs did not catch these. As a result, the folder name contains the smart quotes. For example:
this-is-my-folder’s-name-“Bob”
I'm now trying to rename/remove said folder on the command line, and none of the standard tricks for dealing with files/folders with special characters (enclosing in quotes, escaping the characters, trying to rename it by inode, etc.) are working. All result in:
mv: cannot move this-is-my-folder’s-name-“Bob” to this-is-my-folders-name-BOB: No such file or directory
Can anyone provide some advice as to how I can achieve this?
To get the name in a format you can copy-and-paste into your shell:
printf '%q\n' this*
...will print out the filename in a manner the shell will accept as valid input. This might look something like:
$'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235'
...which you can then use as an argument to mv:
mv $'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235' this-is-my-folders-name-BOB
Incidentally, if your operating system works the same way mine does (when running the test above), this would explain why using single-character globs such as ? for those characters didn't work: They're actually more than one byte long each!
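The copy-and-paste step can also be scripted. The following is only a sketch, run in a scratch directory created with mktemp so that it is safe to try: it recreates a folder with the same name as in the question, prints the %q form, and feeds that form back to the shell through eval, which stands in for pasting it by hand. The target name is the one from the question; everything else is an assumption for the demo.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)

# Recreate the problem folder. The octal escapes are the UTF-8 bytes of
# the three smart quotes in the name.
mkdir "$scratch/"$'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235'

(
    cd "$scratch" || exit 1
    for name in this*; do
        # %q prints the name in a form the shell can read back in.
        quoted=$(printf '%q' "$name")
        printf 'shell-readable form: %s\n' "$quoted"
        # eval re-parses the quoted form, as if it had been pasted.
        eval "mv -- $quoted this-is-my-folders-name-BOB"
    done
)

ls "$scratch"
```

The exact text printed for the %q form depends on the locale: with a UTF-8 locale bash may print the smart quotes as they are, while in the C locale it prints the octal escapes.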
You can use the shell globbing token ? to match any single character, so matching the smart quotes with ? should work (in a UTF-8 locale, where each smart quote counts as one character):
mv this-is-my-folder?s-name-?Bob? new_name
Here each smart quote is replaced with ? so that the pattern matches the file name.
There are several possibilities.
If an initial substring of the file name ending before the first quote is unique within the directory, then you can use filename completion to help you type an appropriate command. Type "mv" (without the quotes) and the unique initial substring, then press the TAB key to request filename completion. Bash will complete the filename with the correct characters, correctly escaped.
Use a graphical file browser. Then you can select the file to rename by clicking on it. (Details of how to proceed from there depend on the browser.) If you don't have a graphical terminal and can't get one, then you may be able to do the same with a text-mode browser such as Midnight Commander.
A simple glob built with the ? or * wildcard should be able to match the filename.
Use a more complex glob to select the filename, and perhaps others with the same problem. Maybe something like *[^a-zA-Z0-9-]* would do. Use a pattern substitution to assign a new name. Something like this:
for f in *[^a-zA-Z0-9-]*; do
  # -- stops mv from treating a name that starts with - as an option
  mv -- "$f" "${f//[^a-zA-Z0-9-]/}"
done
The substitution replaces every character that is not a decimal digit, an uppercase or lowercase Latin letter, or a hyphen with nothing (i.e. it strips them). Do take care before you use this, though, to make sure you're not going to make more changes than you intend.
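Before running the loop for real, a dry run along these lines may help. The helper names clean_name and preview_renames are hypothetical; the sketch prints the mv commands it would run and flags names that would collide or end up empty. Note that the pattern also strips dots, underscores and spaces, so file extensions lose their dot.

```shell
#!/usr/bin/env bash
# clean_name NAME: print NAME with everything except ASCII letters,
# digits and hyphens removed. (Hypothetical helper.)
clean_name() {
    printf '%s\n' "${1//[^a-zA-Z0-9-]/}"
}

# preview_renames: list the renames the loop would perform in the current
# directory without changing anything. (Hypothetical helper.)
preview_renames() {
    local f new
    for f in *[^a-zA-Z0-9-]*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        new=$(clean_name "$f")
        if [ -z "$new" ] || [ -e "$new" ]; then
            printf 'SKIP %s (target "%s" is empty or already exists)\n' "$f" "$new"
        else
            # %q quotes both names so the line can be pasted into a shell.
            printf 'mv -- %q %q\n' "$f" "$new"
        fi
    done
}

clean_name $'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235'
```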