Extract metadata from multiple videos - Linux

I am faced with a challenge that requires multiple aspects of bash. I work on Linux (Debian Stretch, to be precise). Here is the situation (for each point/problem I also note the solution I have considered so far, but I'm open to other ideas):
I have videos of various types (and various upper/lower case extensions), such as .mp4, .mov, .MOV, .MP4, .avi, ..., located in a directory (and spread across an almost unstructured tree of directories). To find them all I tried to use the find command.
For each video, I need to extract some metadata (i.e. the file name, the duration of the video, the file size and the date of creation/last modification). The package mediainfo yields (among a lot of other things) the required fields.
The output of mediainfo is a long list of fields with the format <Tag>\t : <value>. I need to extract the values of the fields Complete name, Duration, File size and Encoded date.
So with all this information, I must filter out the values of the required fields and put them in a CSV file. I considered using sed.
My goal is to achieve all these tasks either in a script or in a small number of separate commands.
The idea in code (this code is hideously wrong, but you can get the idea):
find . -type f -name "*.[mp4|MP4|mov|MOV|avi|AVI]" -exec mediainfo {} | sed '/Complete name|Duration|File size|Encoded date/p' > myfile.csv \;
Would you have any idea how to perform this task? I feel terribly lost combining find, exec and sed, and outputting to a CSV...
Thanks in advance for your help!

So I finally managed to write a script that does this. Probably not the best way to do it, but here it is:
resFile="myresult.csv"
dstDir="./destination/"
srcDir="./source/"
#first copy all files at same level in dstDir (with preserve and update)
#this is somehow necessary, relative name for MOV files and mediainfo
#do not seem to work together.
find $srcDir -type f \( -name "*.mp4" -o -name "*.mov" -o -name "*.MOV" -o -name "*.avi" \) -exec cp -up {} $dstDir \;
#then for each file, output mediainfo of file and keep only interesting tags. add ### between each file.
find $dstDir -type f \( -name "*.mp4" -o -name "*.mov" -o -name "*.MOV" -o -name "*.avi" \) \
-exec sh -c "mediainfo --Output=XML {} | sed '1,15!d;/Duration\|Complete\|File_size\|Encoded_date/!d' >> $resFile && echo '########' >> $resFile" \;
#removes tags : <Duration>42s 15ms</Duration> -> 42s 15ms
sed -i 's/^<.*>\(.*\)<.*>/\1/I' $resFile
#Extract the bare filename (not the relative path)
sed -i 's/^\.\/.*\/\(.*\)\.\(mp4\|mov\|avi\)$/\1/I' $resFile
#Puts fields for a file on a unique line separated with commas
sed -i 'N;s/\n/,/;N;s/\n/,/;N;s/\n/,/;N;s/\n/,/' $resFile
#remove all trailing ###
sed -i 's/,#*$//' $resFile
I would still be interested if anyone has ideas to improve the code.
I "minimized" it a little bit; my actual code is a bit more modular and performs a few checks.

Try this. Due to limited time I was not able to complete it; you just have to send the output to a CSV.
for c in $(locate --basename .mp4 .mkv .wmv .flv .webm .mov .avi)
do
Complete_name=$(mediainfo --Output=XML "$c" | xml_grep 'Complete_name' --text_only | awk 'BEGIN{FS="/"}{print $NF}')
echo "$Complete_name"
Duration=$(mediainfo --Output=XML "$c" | xml_grep 'Duration' --text_only --nb_results 1)
echo "$Duration"
File_size=$(mediainfo --Output=XML "$c" | xml_grep 'File_size' --text_only)
echo "$File_size"
Encoded_date=$(mediainfo --Output=XML "$c" | xml_grep 'Encoded_date' --text_only --nb_results 1 | awk '{print $2}')
echo "$Encoded_date"
done
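To finish the job as described, the echo lines above could be replaced by one CSV row per file, e.g. (a sketch reusing the variables above; myfile.csv is the target name from the question):
printf '%s,%s,%s,%s\n' "$Complete_name" "$Duration" "$File_size" "$Encoded_date" >> myfile.csv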

Related

How to read out a file line by line and for every line do a search with find and copy the search result to destination?

I hope you can help me with the following problem:
The Situation
I need to find files in various folders and copy them to another folder. The files and folders can contain white spaces and umlauts.
The filenames contain an ID and a string like:
"2022-01-11-02 super important file"
The filenames I need to find are collected in a text file named ids.txt. This file contains only the IDs, not the whole filename as a string.
What I want to achieve:
I want to read out ids.txt line by line.
For every line in ids.txt I want to do a find search and copy (cp) the result to the destination.
So far I tried:
for n in $(cat ids.txt); do find /home/alex/testzone/ -name "$n" -exec cp {} /home/alex/testzone/output \; ; done
while read -r ids; do find /home/alex/testzone -name "$ids" -exec cp {} /home/alex/testzone/output \; ; done < ids.txt
The output folder remains empty. Not using -exec also gives no (search)results.
I was thinking that -name "$ids" is the root cause here. My files contain the ID plus a string, so I should search for names containing the ID plus a variable string (star *).
As the argument for -name I also tried "$ids *", "$ids"" *" and so on, with no luck.
Is there an argument that I can use in conjunction with find instead of using the star in the -name argument?
Do you have any solution for me to automate this process in a bash script that reads the ids.txt file, searches for the filenames and copies them over to the specified folder?
In the end I would like to create a bash script that takes ids.txt and the search-folder and the output-folder as arguments like:
my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone/ /home/alex/testzone/output
EDIT:
This is some example content of the ids.txt file where only ids are listed (not the whole filename):
2022-01-11-01
2022-01-11-02
2020-12-01-62
EDIT II:
Going on with the solution from tripleee:
#!/bin/bash
grep . $1 | while read -r id; do
echo "Der Suchbegriff lautet:"$id; echo;
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/ausgabe \;
done
In case my ids.txt file contains empty lines, -name "$id*" becomes -name "*", which in turn finds and copies all files.
Trying to prevent empty lines from being read does not seem to work; they should be filtered out by the grep . $1 | part. What am I doing wrong?
If your destination folder is always the same, the quickest and absolutely most elegant solution is to run a single find command to look for all of the files.
sed 's/.*/-o\n-name\n&*/' ids.txt |
xargs sh -c 'find /home/alex/testzone \( -false "$@" \) -exec cp -t /home/alex/testzone/output {} +' sh
The -false predicate is a bit of a hack to allow the list of actual predicates to start with -o (as in "or").
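With the three example IDs from the question, each generated find invocation is effectively (quotes added for readability):
find /home/alex/testzone \( -false -o -name '2022-01-11-01*' -o -name '2022-01-11-02*' -o -name '2020-12-01-62*' \) -exec cp -t /home/alex/testzone/output {} +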
This could fail if ids.txt is too large to fit into a single xargs invocation, or if your sed does not understand \n to mean a literal newline.
(Here's a fix for the latter case:
xargs printf '-o\n-name\n%s*\n' <ids.txt |
...
Still, the inherent problem with using xargs with find like this is that xargs could split the list between -o and -name, or between -name and the actual file name pattern, if it needs to run more than one find command to process all the arguments.
A slightly hackish solution to that is to ensure that each pair is a single string, and then separately split them back out again:
xargs printf '-o_-name_%s*\n' <ids.txt |
xargs bash -c 'arr=("$@"); find \( -false ${arr[@]/-o_-name_/-o -name } \) -exec cp {} "$0" \;' /home/alex/testzone/ausgabe
where we temporarily hold the arguments in an array in which each file name pattern and its flags form a single item, and then split the flags back out into separate tokens. This still won't work correctly if the file names you operate on contain literal shell metacharacters like * etc.)
A more mundane solution fixes your while read attempt by adding the missing wildcard in the -name argument. (I also took the liberty to rename the variable, since read will only read one argument at a time, so the variable name should be singular.)
while read -r id; do
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/output \;
done < ids.txt
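Wrapped up as the requested my-id-search.sh (a minimal sketch, without argument validation):
#!/bin/bash
#usage: my-id-search.sh ids.txt /search/folder /output/folder
ids=$1 src=$2 dst=$3
grep . "$ids" | while read -r id; do    #grep . filters out empty lines
find "$src" -name "$id*" -exec cp {} "$dst" \;
done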
Please try the following bash script, copier.sh:
#!/bin/bash
IFS=$'\n' # make newlines the only separator
set -f # disable globbing
file="files.txt" # name of file containing filenames
finish="finish" # destination directory
while read -r n ; do (
du -a | awk '{for(i=2;i<=NF;++i)printf $i" " ; print " "}' | grep $n | sed 's/ *$//g' | xargs -I '{}' cp '{}' $finish
);
done < $file
which recursively copies all the files named in files.txt from . and its subdirectories to ./finish.
This new version works even if there are spaces in the directory names or file names.
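(A shorter spaces-safe sketch of the same idea, using find's null-delimited output instead of reconstructing names from the du output:
while read -r n ; do
find . -name "$n" -print0 | xargs -0 -I '{}' cp '{}' "$finish"
done < "$file"
)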

Need guidance with a bash script to check log files in a certain directory for a certain string

I would like to preface this by saying I am a complete noob with scripting. So I have a situation where I need to manually look for a phone number that could live in one of hundreds of files.
The logs live in the following directory:
/actlogs/sbclogger_archive
That directory contains 31 subdirectories labeled "01"-"31", and all the files in them are zipped.
Each numbered directory holds tons of files, but the only ones I want to search are "sipd.logthenthedate.gz" and "sipmsg.logthenthedate.gz".
The script I am using is below, please let me know what I could do to make this work.
#!/bin/bash
read -p "Enter a phone number: " text
read -p "Enter directory of log file's, Hint it should be /actlogs/sbclogger_archive: " directory
#arr=( $(find $directory -type f -exec grep -l "$text" {} \; | sort -r) )
#find $directory -type f -exec grep -qe "$text" {} \; -exec bash -c '
file=$(find $directory -type f -name 'sipd.log*' -exec grep -qe "$text" {} \; -exec bash -c 'select f; do echo $f; break; done' find-sh {} +;)
if [ -z "$file" ]; then
echo "No matches found."
else
echo "select tool:"
tools=("nano" "less" "vim" "quit")
select tool in "${tools[@]}"
do
case $tool in
"quit")
break
;;
*)
$tool $file
break
;;
esac
done
fi
This would give you the list of files matching:
find \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
-exec sh -c 'gunzip -c {} | grep -m1 -q 888333' \; -print
./18/sipd.log20200118.gz
./7/sipd.log20200107.gz
Note: -m1 tells grep to stop after the first match; since you only need the file name in this case, that's enough.
If you have zgrep, you can shorten it to:
find \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
-exec zgrep -l '888333' {} \;
./18/sipd.log20200118.gz
./7/sipd.log20200107.gz
Also, some of the tools you are suggesting do not support gzip files (nano, and some variants of less, for example). In that case you might need to decompress the file and compress it again when done.
Also, you might want to consider a loop if you want to be able to "quit". Feeding the whole file list to the tool doesn't make sense.
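For instance, a minimal sketch of such a loop (assuming $text and $directory are set as in the question, and a bash with mapfile):
mapfile -t files < <(find "$directory" \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) -exec zgrep -l "$text" {} \;)
select f in "${files[@]}" quit; do
[ "$f" = quit ] && break
gunzip -c "$f" | less    #decompress to a pipe; less can read standard input
done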
Note: AFAIK zgrep doesn't do recursive:
DESCRIPTION
Zgrep invokes grep on compressed or gzipped files. These grep options will cause zgrep to terminate with an error code: (-[drRzZ]|--di*|--exc*|--inc*|--rec*|--nu*). All other options specified are passed directly to grep. If no file is specified, then the standard input is decompressed if necessary and fed to grep. Otherwise the given files are uncompressed if necessary and fed to grep.
so zgrep -rl "$text" "$directory" or zgrep -rl --include 'sipd.log*.gz' "$text" {01..31} won't work unless you have a special zgrep.
As you must unzip before using your tool, I would divide the problem into two blocks.
First, I would expand the paths you need (looking under <directory> for the phone <text>), and then iterate to apply the tool (because some tools like vim or nano cannot be piped).
Try something like this:
#!/bin/bash
#...
# text/directory input stuff
#...
tmpdir=$(mktemp -d)
trap 'rm -rf ${tmpdir}' EXIT
while IFS= read -r file; do
unzipped=${tmpdir}/$(basename "${file}" .gz)
gunzip -c "${file}" > "${unzipped}"
${tool} "${unzipped}"
done < <(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null)
Above is the inverted form proposed by Charles Duffy, following this Bash FAQ.
If you prefer to iterate an array, you could build in this way:
# shellcheck disable=SC2207
files=( $(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null) )
for file in "${files[@]}"; do
# etc.
since in our particular case the files to match have no spaces in their names, the shellcheck warning is not so important (hence hidden above).
BRs

Concatenate directory and file name in GNU parallel

I have the following directory structure:
dir1/file.ogg
dir2/file.ogg
dir3/file.ogg
file2.ogg
I would like to convert all .ogg files to .wav with GNU Parallel. Here's how far I've got:
find . -name '*.*' -type f -print0 | parallel -0 ffmpeg -i {} outputdir/{/.}.wav
The problem here is that although the directories obviously have different names, the files inside have the same name. The aforementioned command will continuously overwrite the contents of the output directory. What I'd like instead is:
outputdir/dir1_file.ogg
outputdir/dir2_file.ogg
outputdir/dir3_file.ogg
outputdir/dir3_file2.ogg
Essentially, I'd like to extract the subdirectory name and concatenate it with file basename and put my own extension.
Any ideas?
Using a perl transformation, this command should achieve the required effect:
find . -name '*.ogg' -type f -print0 | parallel -0 echo ffmpeg -i {} outputdir/{= '$_ =~ s[^\./][]; $_ =~ s[/][_]g; $_ =~ s[ogg$][wav]g;' =}
Remove the echo to actually run the ffmpeg command.
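With the echo left in, the dry run should print commands along these lines (derived from the three substitutions, not captured output):
ffmpeg -i ./dir1/file.ogg outputdir/dir1_file.wav
ffmpeg -i ./file2.ogg outputdir/file2.wav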

Bash Script Watermarking all .jpg without thumbnails

The problem:
/a.jpg
/a-150x150.jpg
/a-300x300.jpg
/b.jpg
/b-150x150.jpg
/b-300x300.jpg
How do I write a bash script that watermarks only the main image, without the 150x150 and 300x300 versions being watermarked?
I tried some code but it's not working; the 300x300 files remain watermarked:
file -i *.jpg | grep image | awk -F':' '{ print $1 }' | while read IMAGE
I just want
/a.jpg
/b.jpg
/c.jpg
You can use find to select exactly the files you want and then feed them to a while loop using process substitution:
#!/bin/bash
while read -r filename ; do
echo "$filename" # your code here
done < <( find . -name '*.jpg' -and -not -regex '.*-[0-9]+x[0-9]+\.jpg' )
This code uses -name to select all .jpg images and -regex to exclude the ones with a -<number>x<number> suffix (the regex has to match the whole filename).
If your code needs to safeguard against exotic filenames, even ones containing newlines, you should use the -print0 switch with find and the -d $'\0' switch with read.
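A sketch of that null-safe variant, with a hypothetical ImageMagick watermarking step filled in (wm.png and the composite options are placeholders, not part of the original question):
#!/bin/bash
while IFS= read -r -d '' filename ; do
#overlay wm.png in the bottom-right corner at 30% opacity
composite -dissolve 30 -gravity southeast wm.png "$filename" "$filename"
done < <( find . -name '*.jpg' -and -not -regex '.*-[0-9]+x[0-9]+\.jpg' -print0 )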

How do I rename lots of files changing the same filename element for each in Linux?

I'm trying to rename a load of files (I count over 200) that have the company name either in the filename or in the text contents. I basically need to change any references to "company" to "newcompany", maintaining capitalisation where applicable (i.e. "Company" becomes "Newcompany", "company" becomes "newcompany"). I need to do this recursively.
Because the name could occur pretty much anywhere I've not been able to find example code anywhere that meets my requirements. It could be any of these examples, or more:
company.jpg
company.php
company.Class.php
company.Company.php
companysomething.jpg
Hopefully you get the idea. I not only need to do this with filenames, but also the contents of text files, such as HTML and PHP scripts. I'm presuming this would be a second command, but I'm not entirely sure what.
I've searched the codebase and found nearly 2000 mentions of the company name in nearly 300 files, so I don't fancy doing it manually.
Please help! :)
bash has powerful looping and substitution capabilities:
for filename in `find /root/of/where/files/are -name '*company*'`; do
mv "$filename" "${filename/company/newcompany}"
done
for filename in `find /root/of/where/files/are -name '*Company*'`; do
mv "$filename" "${filename/Company/Newcompany}"
done
For the file and directory names, use for, find, mv and sed.
For each path (f) that has company in the name, rename it (mv) from f to the new name where company is replaced by newcompany.
for f in `find -name '*company*'` ; do mv "$f" "`echo $f | sed s/company/newcompany/`" ; done
For the file contents, use find, xargs and sed.
For every file, change company by newcompany in its content, keeping original file with extension .backup.
find -type f -print0 | xargs -0 sed -i.backup 's/company/newcompany/g'
I'd suggest you take a look at man rename, an extremely powerful perl utility for, well, renaming files.
Standard syntax is
rename 's/\.htm$/\.html/' *.htm
the clever part is that the tool accept any perl-regexp as a pattern for a filename to be changed.
You might want to run it with the -n switch, which will make the tool only report what it would have changed.
I can't figure out a nice way to keep the capitalization right off-hand, but since you can already search through the file structure, issue several rename passes with different capitalizations until all files are changed, as sketched below.
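For example (assuming the perl rename; -n only previews, and find -depth renames files before their parent directories):
find . -depth -name '*company*' -exec rename -n 's/company/newcompany/g' {} +
find . -depth -name '*Company*' -exec rename -n 's/Company/Newcompany/g' {} +
Drop the -n switches once the preview looks right.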
To loop through all files below the current folder and search for a particular string, you can use
find . -type f -exec grep -n -i STRING_TO_SEARCH_FOR /dev/null {} \;
The output from that command can be directed to a file (after some filtering to just extract the file names of the files that need to be changed).
find . -type ... > files_to_operate_on
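(A simpler way to get just the matching file names is grep -l:
find . -type f -exec grep -l -i STRING_TO_SEARCH_FOR {} \; > files_to_operate_on
)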
Then wrap that in a while read loop and do some perl-magic for inplace-replacement
while read -r file
do
perl -pi -e 's/stringtoreplace/replacementstring/g' "$file"
done < files_to_operate_on
There are few right ways to recursively process files. Here's one:
while IFS= read -d $'\0' -r file ; do
newfile="${file//Company/Newcompany}"
newfile="${newfile//company/newcompany}"
mv -f "$file" "$newfile"
done < <(find /basedir/ -iname '*company*' -print0)
This will work with all possible file names, not just ones without whitespace in them.
Presumes bash.
For changing the contents of files I would advise caution because a blind replacement within a file could break things if the file is not plain text. That said, sed was made for this sort of thing.
while IFS= read -d $'\0' -r file ; do
sed -i -e 's/company/newcompany/g;s/Company/Newcompany/g' "$file"
done < <(find /basedir/ -iname '*company*' -print0)
For this run I recommend adding some additional switches to find to limit the files it will process, perhaps
find /basedir/ \( -iname '*company*' -and \( -iname '*.txt' -or -iname '*.html' \) \) -print0
