How to run multiple commands on file pairs using a shell script? - linux

I've looked at this post and this post, among a few others. They provided some information on how to get started, but unfortunately my use case is slightly more complex.
I have pairs of video files that I run ffmpeg commands on. The command I run transfers metadata from the original files to the converted files and looks as follows:
christian$ ffmpeg -i file_1.mov -i file_1.mp4 -map 1 -map_metadata 0 -c copy file_1_fixed.mp4
This post explains what the command does, should an explanation be required. Basically the use case is that both the original file and the converted file have the same name, but different extensions. How would I go about writing a shell script that finds all such pairs in a directory and composes and runs the command specified above for each pair?
I assume that, logically, I would need to loop through all the pairs, get the file names, do some sort of string manipulation (if this is even possible with shell scripting), compose the command, and run it.
Any resources you could point me towards would be deeply appreciated. Just to clarify, some pseudo code:
for (file in directory) {
string name = file.getname
string command = "ffmpeg -i " + name + ".mov -i " + name + ".mp4 -map 1
-map_metadata 0 -c copy " + name + "_fixed.mp4"
run(command)
}
Hope this makes sense, please let me know if I should clarify more. Thank you for reading.

As you tagged this question with bash, here is a sketch of a bash script. It should work in general, but you may need to adjust it to your actual needs:
#!/bin/bash
# for debugging remove the hash from next line:
#set -x
# loop over all .mov files
for FILE in *.mov; do
    # ${FILE%.mov} strips only the trailing .mov, even if ".mov" also occurs earlier in the name
    FILE_MP4="${FILE%.mov}.mp4"
    FILE_FIXED="${FILE%.mov}_fixed.mp4"
    ffmpeg -i "$FILE" -i "$FILE_MP4" -map 1 -map_metadata 0 -c copy "$FILE_FIXED"
done
Notes/Hints:
for FILE in *.mov loops over all files with the extension .mov and no others. Because the generated _fixed.mp4 files never match *.mov, the loop does not pick up its own output when the script is run again in the same directory.
The for loop searches the current directory. You may use cd to change to a specific directory first. (Handling absolute or relative file paths instead of bare names is also possible.)
The quotes are chosen with care. Quoting in bash is very powerful but definitely not easy.
To check this script, you may prefix the ffmpeg command with echo. This acts as a dry run: you will see the commands that would be run without the echo prefix.
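The hints above (dry run, pairing by name) can be combined into one sketch. It additionally skips any .mov file that has no .mp4 partner. The names fix_pairs and DRY_RUN are made up for this example, and the demonstration at the end only uses empty placeholder files in a scratch directory:

```shell
#!/usr/bin/env bash
# Sketch only: fix_pairs and DRY_RUN are illustrative names.

DRY_RUN=1   # set to an empty string to actually run ffmpeg

fix_pairs() {
    local dir=$1 mov base
    for mov in "$dir"/*.mov; do
        [ -e "$mov" ] || continue      # the glob matched nothing
        base=${mov%.mov}               # remove only the trailing ".mov"
        if [ ! -f "$base.mp4" ]; then
            echo "skipping $mov: no matching .mp4" >&2
            continue
        fi
        # ${DRY_RUN:+echo} expands to "echo" while DRY_RUN is non-empty,
        # so the command is printed rather than executed.
        ${DRY_RUN:+echo} ffmpeg -i "$mov" -i "$base.mp4" \
            -map 1 -map_metadata 0 -c copy "${base}_fixed.mp4"
    done
}

# Small demonstration in a scratch directory with empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/file_1.mov" "$demo_dir/file_1.mp4" "$demo_dir/orphan.mov"
planned=$(cd "$demo_dir" && fix_pairs . 2>/dev/null)
printf '%s\n' "$planned"
```

Once the printed commands look right, setting DRY_RUN to an empty string and calling fix_pairs on the real directory runs them.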

Related

Bash Scripting with xargs to BACK UP files

I need to copy a file from multiple locations to the BACK UP directory while retaining its directory structure. For example, I have a file "a.txt" at the following locations /a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt, and I now need to copy this file from those locations to the backup directory /tmp/backup. The end result should be:
When I list /tmp/backup/a, it should contain /b/a.txt /c/a.txt /d/a.txt & /e/a.txt.
For this, I had used the command: echo /a/*/a.txt | xargs -I {} -n 1 sudo cp --parent -vp {} /tmp/backup. This is throwing the error "cp: cannot stat '/a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt': No such file or directory"
The -I option is taking the complete input from echo instead of individual values (as -n 1 does). I would find help debugging this issue more useful than an alternative command.
Use rsync with the --relative (-R) option to keep (parts of) the source paths.
I've used a wildcard for the source to match your example command rather than the explicit list of directories mentioned in your question.
rsync -avR /a/*/a.txt /tmp/backup/
Do the backups need to be exactly the same as the originals? In most cases, I'd prefer a little compression. [tar](https://man7.org/linux/man-pages/man1/tar.1.html) does a great job of bundling things including the directory structure.
tar cvzf /path/to/backup/tarball.tgz /source/path/
tar can't update compressed archives, so if you want to update an existing archive, skip the compression:
tar uf /path/to/backup/tarball.tar /source/path/
This gives you versioning of a sort: it only appends files that have changed, but keeps both the before and after versions.
If you have time and cycles and still want the compression, you can decompress before and recompress after.
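On the original question of why the xargs command fails: with -I, xargs splits its input on newlines only, and echo prints every match on a single line, so cp receives the whole list as one file name. Printing one path per line avoids that, and -n 1 is then unnecessary because -I already runs the command once per input line. The sketch below reproduces the effect in scratch directories; the layout is a stand-in for /a/*/a.txt, and cp --parents is the GNU coreutils option from the question, used here without sudo:

```shell
#!/usr/bin/env bash
# Scratch directories standing in for / and /tmp/backup (illustrative only).
src_root=$(mktemp -d)
backup=$(mktemp -d)
mkdir -p "$src_root/a/b" "$src_root/a/c"
touch "$src_root/a/b/a.txt" "$src_root/a/c/a.txt"

cd "$src_root" || exit 1

# echo joins all matches with spaces on ONE line; -I treats that line as one item.
items_from_echo=$(echo a/*/a.txt | xargs -I {} echo "item: {}" | grep -c '^item:')

# printf '%s\n' emits one path per line, so each path becomes its own item.
items_from_printf=$(printf '%s\n' a/*/a.txt | xargs -I {} echo "item: {}" | grep -c '^item:')

# The same fix applied to the copy (GNU cp; --parents recreates a/b, a/c, ...).
printf '%s\n' a/*/a.txt | xargs -I {} cp --parents -p {} "$backup"

echo "echo: $items_from_echo item(s), printf: $items_from_printf item(s)"
```

For names that may contain spaces, quotes or newlines, a NUL-separated variant (printf '%s\0' piped into xargs -0 -I {}) is likely the more robust choice where xargs -0 is available.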

Facing issues in making a bash script work

I'm new to Bash scripting. My script's intended role is to access a given path and then run some software (RTG, Real Time Genomics) commands on the data in that path. However, when I try to execute the script from the CLI, it gives me the following error:
ERROR:There were invalid input file paths
The path I have provided in the script is accurate. That is, in the original directory, where the program 'RTG' resides, I have made folders accordingly like /data/reads/NA19240 and placed both *_1.fastq and *_2.fastq files inside NA19240.
Here is the script:
#!/bin/bash
for left_fastq in /data/reads/NA19240/*_1.fastq; do
right_fastq=${left_fastq/_1.fastq/_2.fastq}
lane_id=$(basename ${left_fastq/_1.fastq})
rtg format -f fastq -q sanger -o ${lane_id} -l ${left_fastq} -r ${right_fastq} --sam-rg "#RG\tID:${lane_id}\tSM:NA19240\tPL:ILLUMINA"
done
I have tried many workarounds but have still not been able to get past this error. I would be really grateful if you could help me fix this problem. Thanks.
After adding set -aux to the bash script for debugging purposes, I'm now getting the following output:
adnan#adnan-VirtualBox[Linux] ./format.sh
+ for left_fastq in '/data/reads/NA19240/*_1.fastq'
+ right_fastq='/data/reads/NA19240/*_2.fastq'
++ basename '/data/reads/NA19240/*'
+ lane_id='*'
+ ./rtg format -f fastq -q sanger -o '*' -l '/data/reads/NA19240/*_1.fastq' -r '/data/reads/NA19240/*_2.fastq' --sam-rg '#RG\tID:*\tSM:NA19240\tPL:ILLUMINA'
Error: File not found: "/data/reads/NA19240/*_1.fastq"
Error: File not found: "/data/reads/NA19240/*_2.fastq"
Error: There were 2 invalid input file paths
You need to set the nullglob option in the script, like so:
shopt -s nullglob
By default, a glob that matches nothing is left as the literal pattern. The output you got after adding set -aux shows that /data/reads/NA19240/*_1.fastq was passed through literally, which only happens when no files match and nullglob is disabled.
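A small sketch of that difference, using an empty scratch directory (the variable names are made up for the example):

```shell
#!/usr/bin/env bash
# Compare how an unmatched glob expands with and without nullglob.
empty_dir=$(mktemp -d)            # scratch directory with no files in it

shopt -u nullglob                 # bash default
default_matches=( "$empty_dir"/*_1.fastq )

shopt -s nullglob
nullglob_matches=( "$empty_dir"/*_1.fastq )
shopt -u nullglob                 # restore the default

echo "default:  ${#default_matches[@]} word(s): ${default_matches[*]}"
echo "nullglob: ${#nullglob_matches[@]} word(s)"
```

Note that nullglob only makes the for loop skip its body when nothing matches. It does not explain why the files were not found in the first place.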
In the original directory, where the program 'RTG' resides, I have
made folders accordingly like /data/reads/NA19240 and placed both
*_1.fastq and *_2.fastq files inside NA19240.
So you say, your data folders are in the original directory (whatever that may be), but in the script you wrongly specify them to be in the root directory (by the leading /).
Since you start the script in the original directory, just drop the leading / and use a relative path:
for left_fastq in data/reads/NA19240/*_1.fastq
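As a hedged sketch of how the pairing could be checked before calling rtg at all: the helper below (list_pairs is an illustrative name, not part of RTG) prints one tab-separated line per complete read pair and reports an error when the directory yields none. The demonstration builds the data/reads/NA19240 layout under a scratch working directory with empty placeholder files:

```shell
#!/usr/bin/env bash
# Sketch: list_pairs is an illustrative helper, not part of RTG.
list_pairs() {
    local reads_dir=$1 left right lane_id found=0
    for left in "$reads_dir"/*_1.fastq; do
        [ -e "$left" ] || continue             # unmatched glob: nothing to do
        right=${left%_1.fastq}_2.fastq
        lane_id=$(basename "$left" _1.fastq)
        if [ ! -f "$right" ]; then
            echo "no mate for $left" >&2
            continue
        fi
        found=1
        printf '%s\t%s\t%s\n' "$lane_id" "$left" "$right"
    done
    if [ "$found" -eq 0 ]; then
        echo "no read pairs under $reads_dir (cwd: $PWD)" >&2
        return 1
    fi
}

# Demonstration: the data directory lives under the working directory,
# so it is reached through a relative path.
work_dir=$(mktemp -d)
mkdir -p "$work_dir/data/reads/NA19240"
touch "$work_dir/data/reads/NA19240/lane1_1.fastq" \
      "$work_dir/data/reads/NA19240/lane1_2.fastq"
pairs=$(cd "$work_dir" && list_pairs data/reads/NA19240)
printf '%s\n' "$pairs"
```

Each printed line carries the lane id and both file paths, which can then be passed to the rtg format call from the question.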

Run additional command when rsync detects a file

I am currently running the following script to make an automatic backup of my Music:
#!/bin/bash
while :; do
rsync -ruv /mnt/hdd1/Music/ /mnt/hdd2/Music/
done
Whenever a new file is added to my music folder, it is detected by rsync and it is copied to my other disk. This script runs fine, but I would also like to convert the detected file to an ogg opus file for putting on my phone.
My question is: How do I run a command on a new file found by rsync -u?
I will also accept answers which work totally differently, but have the same result.
rsync -ruv /mnt/hdd1/Music /mnt/hdd2/ | sed -n 's|^Music/||p' >~/filelist.tmp
while IFS= read -r filename
do
    # names in the list are relative to the source directory
    [ -f "/mnt/hdd1/Music/$filename" ] || continue
    # do something with file
    echo "Now processing '$filename'"
done <~/filelist.tmp
With the -v option, rsync prints the names of files it copies to stdout. I use sed to capture just those filenames, excluding the informational messages, to a file. The filenames in that file can be processed later as you like.
The approach with sed above depends on rsync displaying filenames starting with the final part of the source directory, e.g. "Music/" in my example above, which is then removed assuming that you don't need it. Alternately, one could try an explicit approach for excluding noise messages.
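One possible form of such an explicit approach, as a sketch: rsync's --out-format='%n' option prints just the name of each updated item, so the informational messages do not have to be filtered out. The function names sync_and_list and process_new_files are made up for this example, and the echo line is a placeholder for the actual conversion step:

```shell
#!/usr/bin/env bash
# Sketch: sync_and_list and process_new_files are illustrative names.

# Print one transferred path per line, relative to the source directory.
# --out-format='%n' asks rsync for the file name only.
sync_and_list() {
    rsync -ru --out-format='%n' "$1" "$2"
}

# Read relative paths on stdin and handle those that are regular files
# under the source directory given as $1.
process_new_files() {
    local src_dir=${1%/} name
    while IFS= read -r name; do
        [ -f "$src_dir/$name" ] || continue     # skip directories and anything else
        echo "Now processing '$src_dir/$name'"  # replace with the conversion step
    done
}

# Intended use (paths from the question):
#   sync_and_list /mnt/hdd1/Music/ /mnt/hdd2/Music/ | process_new_files /mnt/hdd1/Music
```

Piping the list directly into the loop also avoids the temporary file, at the cost of not keeping a record of what was copied.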

Monitoring directory changes with iwatch - getting new file's name not the full path?

I'm using iwatch to monitor changes in a directory: as soon as a new video file is added, I grab that file and use ffmpeg to add an overlay sound to it. Here is the script:
iwatch -e close_write -c "/root/bin/ffmpeg -i %f -i /var/www/video/sound.mp3 -map 0 -map 1 -codec copy -shortest /var/www/new/video/${%f:15}" /var/www/video
But I have a problem. I save the newly created file to a different directory and need to keep the same file name, but iwatch only has the %f variable, which returns the full path. Knowing that the first part of the path will always be "/var/www/video/", I thought I could use ${%f:15} to get the substring with the file name.
But the script above doesn't work: bash says "bad substitution". So the problem is here, where I try to write the file to the new directory:
/var/www/new/video/${%f:15}
What is the correct syntax, or another way to achieve my goal?
You can use the output of the basename command:
/var/www/new/video/`basename %f`
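${%f:15} fails because substring expansion only works on the name of a shell variable, and %f is a placeholder that iwatch fills in, not a variable. The sketch below shows the equivalent forms on an ordinary variable; path and clip.mp4 are made-up stand-ins for whatever iwatch substitutes for %f:

```shell
#!/usr/bin/env bash
# "path" stands in for the full path that iwatch substitutes for %f
# (clip.mp4 is a made-up file name).
path=/var/www/video/clip.mp4

name_by_offset=${path:15}            # skip the 15 characters of "/var/www/video/"
name_by_pattern=${path##*/}          # drop everything up to the last "/"
name_by_basename=$(basename "$path") # external command, same result here

echo "$name_by_offset $name_by_pattern $name_by_basename"
```

One caveat: inside double quotes the calling shell expands backticks and ${...} itself, before iwatch ever starts. Assuming iwatch hands its -c string to a shell after replacing %f, the command string probably needs single quotes so that basename runs once per event on the real path.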

How to directly overwrite with 'unexpand' (spaces-to-tabs conversion)?

I'm trying to use something along the lines of
unexpand -t 4 *.php
but am unsure how to write this command to do what I want.
Weirdly,
unexpand -t 4 file.php > file.php
gives me an empty file. (i.e. overwriting file.php with nothing)
I can specify multiple files okay, but don't know how to then overwrite each file.
I could use my IDE, but there are ~67000 instances of four spaces to be replaced over 200 files, and this will take a while.
I expect that the answers to my question(s) will be standard unix fare, but I'm still learning...
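You can very seldom use output redirection to replace the input: the shell truncates the output file before the command starts, so the command ends up reading an already empty file. Replacing in place only works with commands that support it internally (they then perform the two basic steps themselves). From the shell level, it's far better to work in two steps, like so: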
Do the operation on foo, creating foo.tmp
Move (rename) foo.tmp to foo, overwriting the original
This will be fast. It will require a bit more disk space, but if you do both steps before continuing to the next file, you will only need as much extra space as the largest single file. That should not be a problem.
Sketch script:
for a in *.php
do
    # quote "$a" so names with spaces survive; only rename if unexpand succeeded
    unexpand -t 4 "$a" >"$a-notab" &&
        mv "$a-notab" "$a"
done
You could do better (error-checking, and so on), but that is the basic outline.
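As one possible way to add that error checking, here is a sketch of a helper (retab_file is an illustrative name) that starts the temporary file as a copy of the original, so its permissions carry over, and cleans up if unexpand fails. The demonstration runs it on a scratch file:

```shell
#!/usr/bin/env bash
# Sketch: retab_file is an illustrative name.
# Write the converted text to a temporary file next to the original and
# rename it over the original only if unexpand succeeded.
retab_file() {
    local file=$1
    local tmp=$file.tmp.$$
    # cp -p first, so the temporary file carries the original's mode;
    # the redirection below then only replaces its contents.
    cp -p "$file" "$tmp" || return 1
    if unexpand -t 4 "$file" >"$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demonstration on a scratch file with one 4-space indented line.
demo_file=$(mktemp)
printf '    indented\n' >"$demo_file"
retab_file "$demo_file"
```

In a real run the helper would be called once per file, for example from the for loop over *.php.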
Here's the command I used:
for p in $(find . -iname "*.js")
do
unexpand -t 4 $(dirname $p)/"$(basename $p)" > $(dirname $p)/"$(basename $p)-tab"
mv $(dirname $p)/"$(basename $p)-tab" $(dirname $p)/"$(basename $p)"
done
This version changes all files within the directory hierarchy rooted at the current working directory.
In my case, I only wanted to make this change to .js files; you can omit the -iname clause from find if you wish, or use different arguments to cast your net differently.
My version wraps filenames in quotes, but it doesn't use quotes around 'interesting' directory names that appear in the paths of matching files.
To get it all on one line, add a semicolon after lines 1, 3, & 4.
This is potentially dangerous, so make a backup or use git before running the command. If you're using git, you can verify that only whitespace was changed with git diff -w.
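If 'interesting' names are a concern, one alternative is to let find pass the file names to an inner shell as arguments, which keeps spaces in file and directory names intact. This is a sketch under that assumption, with retab_tree as a made-up name and a scratch tree for the demonstration:

```shell
#!/usr/bin/env bash
# Sketch: retab_tree is an illustrative name. find hands the file names to
# the inner shell as arguments, so they are never split on whitespace the
# way the output of $(find ...) is.
retab_tree() {
    find "$1" -type f -iname '*.js' -exec sh -c '
        for f in "$@"; do
            unexpand -t 4 "$f" >"$f-tab" && mv "$f-tab" "$f"
        done
    ' sh {} +
}

# Demonstration in a scratch tree whose directory name contains a space.
tree=$(mktemp -d)
mkdir -p "$tree/my scripts"
printf '    return 1;\n' >"$tree/my scripts/app.js"
retab_tree "$tree"
```

The same backup advice applies before pointing retab_tree at a real source tree.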

Resources