I am trying to detect silence at the end of an audio file.
I have made some progress with the ffmpeg library. Here I used silencedetect to list all the silences in an audio file.
ffmpeg -i audio.wav -af silencedetect=n=-50dB:d=0.5 -f null - 2> /home/aliakber/log.txt
Here is the output of the command:
--With silence at the front and end of the audio file--
[silencedetect @ 0x1043060] silence_start: 0.484979
[silencedetect @ 0x1043060] silence_end: 1.36898 | silence_duration: 0.884
[silencedetect @ 0x1043060] silence_start: 2.57298
[silencedetect @ 0x1043060] silence_end: 3.48098 | silence_duration: 0.908
[silencedetect @ 0x1043060] silence_start: 4.75698
size=N/A time=00:00:05.56 bitrate=N/A
--Without silence at the front and end of the audio file--
[silencedetect @ 0x106fd60] silence_start: 0.353333
[silencedetect @ 0x106fd60] silence_end: 1.25867 | silence_duration: 0.905333
[silencedetect @ 0x106fd60] silence_start: 2.46533
[silencedetect @ 0x106fd60] silence_end: 3.37067 | silence_duration: 0.905333
size=N/A time=00:00:04.61 bitrate=N/A
But I want something more flexible so that I can manipulate the output and perform further tasks depending on the result.
I want the output to be something like true or false: if a certain period of silence exists at the end of the audio file it should return true, and false otherwise.
Can someone suggest an easy way to achieve this?
Try this:
ffmpeg -i audio.wav -af silencedetect=n=-50dB:d=0.5 -f null - 2>&1 | grep -Eo "silence_(start|end)" | tail -n 1 | grep "start" | wc -l
Output:
1 - there is silence at the end
0 - there is no silence at the end
Explanation:
As I can see, in the case where the file ends with silence there is no silence_end at the end of the log.
2>&1 - redirect stderr to stdout
grep -Eo "silence_(start|end)" - filter the log and keep only the silence_start and silence_end markers, each on its own line.
tail -n 1 - get the last line, if there is one. So now there are 3 possible states: 'silence_start', 'silence_end', <empty>.
grep "start" - keep the line only if it contains start (2 cases left: 'silence_start', <empty>).
wc -l - count the lines (1 in the 'silence_start' case and 0 in the <empty> case).
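If you want a plain true/false answer for use in a script, a minimal wrapper around the same idea could look like this (the file path is just a placeholder):
#!/bin/bash
# Sketch: print "true" if the last silencedetect event is a silence_start,
# i.e. the detected silence runs to the end of the file.
file="$1"
last=$(ffmpeg -i "$file" -af silencedetect=n=-50dB:d=0.5 -f null - 2>&1 | grep -Eo "silence_(start|end)" | tail -n 1)
if [ "$last" = "silence_start" ]; then
    echo "true"
else
    echo "false"
fi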
The answer from @tarwirdur-turon doesn't work for me (in 2023 and ffmpeg version 5.1.2).
I came up with a somewhat convoluted script to do it. Convoluted, because it does error checking.
It uses two calls, ffprobe + ffmpeg, to reliably find the duration of the audio file, and tests it against the last silence_end by dividing the two values; the quotient should be very close to 1.00. You can change the scale used for the division and various other values at the beginning of the script.
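For example (made-up numbers), bc truncates the quotient at the chosen scale:
bc <<< "scale=2; 4.61 / 4.60"   # prints 1.00 -> the file ends with silence
bc <<< "scale=2; 4.61 / 3.37"   # prints 1.36 -> it does not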
#! /bin/bash
set -e
INPUT="$1"
NOISE_FLOOR="-60db"
MIN_DUR=0.1
SCALE=2
[ -z "$INPUT" ] && echo "Needs audio file !" && exit 1
echo -n "$INPUT ends with silence: "
dur=$(ffprobe -i "$INPUT" -show_entries format=duration -v quiet -of csv="p=0" 2>&1)
if [ -z "$dur" ]; then
    echo "FALSE" && exit 1
fi
last_silence_end=$(ffmpeg -i "$INPUT" -af silencedetect=noise=$NOISE_FLOOR:d=$MIN_DUR -f null - 2>&1 | grep silence_end | tail -n 1 | cut -d ' ' -f 5)
if [ -z "$last_silence_end" ]; then
    echo "FALSE" && exit 0
fi
factor=$(bc <<<"scale=$SCALE; $dur / $last_silence_end")
if [ "$factor" == "1.00" ]; then
    echo "TRUE"
else
    echo "FALSE"
fi
exit 0
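Assuming the script is saved as, say, ends_with_silence.sh (the name is arbitrary) and made executable, usage would look like:
chmod +x ends_with_silence.sh
./ends_with_silence.sh audio.wav
# prints e.g.: audio.wav ends with silence: TRUE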
Related
I am learning bioinformatics.
I want to find GC content from a fasta file using Bash script.
GC content is basically (number of (g + c)) / (number of (a + t + g + c)).
I am trying to use the wc command, but I was not able to get an answer.
Edit 17th Feb 2023.
After going through documentation and videos, I came up with a solution.
filename=$@ # collecting all the filenames passed as parameters
for f in $filename # Looping over the files
do
    echo " $f is being processed..."
    gc=$( grep -v ">" "$f" | grep -io 'g\|c' | wc -l) # grep -v skips header lines that start with ">". grep -io matches either g or c (case-insensitive) and outputs each match on its own line. wc -l counts the matches, i.e. the number of g and c bases. This is stored in a variable.
    total=$( grep -v ">" "$f" | tr -d ' \t\r\n' | wc -c) # Spaces, tabs, carriage returns and newlines are removed with tr, then the remaining characters are counted by wc -c
    percent=$( echo "scale=2;100*$gc/$total" | bc -l) # bc -l gives the answer in float format. scale=2 sets the number of decimal places.
    echo " The GC content of $f is: $percent%"
    echo
done
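Assuming the loop above is saved as gc_content.sh (a hypothetical name) and made executable, it would be called with one or more fasta files as arguments:
chmod +x gc_content.sh
./gc_content.sh sample1.fasta sample2.fasta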
Do not reinvent the wheel. For common bioinformatics tasks, use open-source tools that are specifically designed for these tasks, are well-tested, widely used, and handle edge cases. For example, use the EMBOSS infoseq utility. EMBOSS can be installed easily, for example using conda.
Example:
Install EMBOSS package (do once):
conda create --name emboss emboss --channel iuc
Activate the conda environment and use EMBOSS infoseq, here to print the sequence name, length and percent GC:
source activate emboss
cat your_sequence_file_name.fasta | infoseq -auto -only -name -length -pgc stdin
source deactivate
This prints into STDOUT something like this:
Name Length %GC
seq_foo 119 60.50
seq_bar 104 39.42
seq_baz 191 46.60
...
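If you only need the numbers for further processing, the table can be post-processed with standard tools; for example (assuming the whitespace-separated layout shown above), to keep just the name and %GC columns:
cat your_sequence_file_name.fasta | infoseq -auto -only -name -length -pgc stdin | awk 'NR > 1 { print $1, $3 }'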
This should work:
#!/usr/bin/env sh
# Adapted from https://www.biostars.org/p/17680
# Fail on error
set -o errexit
# Disable undefined variable reference
set -o nounset
# ================
# CONFIGURATION
# ================
# Fasta file path
FASTA_FILE="file.fasta"
# Number of digits after decimal point
N_DIGITS=3
# ================
# LOGGER
# ================
# Fatal log message
fatal() {
    printf '[FATAL] %s\n' "$@" >&2
    exit 1
}
# Info log message
info() {
    printf '[INFO ] %s\n' "$@"
}
# ================
# MAIN
# ================
{
    # Check command 'bc' exist
    command -v bc > /dev/null 2>&1 || fatal "Command 'bc' not found"
    # Check file exist
    [ -f "$FASTA_FILE" ] || fatal "File '$FASTA_FILE' not found"
    # Count number of sequences
    _n_sequences=$(grep --count '^>' "$FASTA_FILE")
    info "Analyzing $_n_sequences sequences"
    [ "$_n_sequences" -ne 0 ] || fatal "No sequences found"
    # Remove sequence wrapping
    _fasta_file_content=$(
        sed 's/\(^>.*$\)/#\1#/' "$FASTA_FILE" \
            | tr --delete "\r\n" \
            | sed 's/$/#/' \
            | tr "#" "\n" \
            | sed '/^$/d'
    )
    # Vars
    _sequence=
    _a_count_total=0
    _c_count_total=0
    _g_count_total=0
    _t_count_total=0
    # Read line by line
    while IFS= read -r _line; do
        # Check if header
        if printf '%s\n' "$_line" | grep --quiet '^>'; then
            # Save sequence and continue
            _sequence=${_line#?}
            continue
        fi
        # Count
        _a_count=$(printf '%s\n' "$_line" | tr --delete --complement 'A' | wc --bytes)
        _c_count=$(printf '%s\n' "$_line" | tr --delete --complement 'C' | wc --bytes)
        _g_count=$(printf '%s\n' "$_line" | tr --delete --complement 'G' | wc --bytes)
        _t_count=$(printf '%s\n' "$_line" | tr --delete --complement 'T' | wc --bytes)
        # Add current count to total
        _a_count_total=$((_a_count_total + _a_count))
        _c_count_total=$((_c_count_total + _c_count))
        _g_count_total=$((_g_count_total + _g_count))
        _t_count_total=$((_t_count_total + _t_count))
        # Calculate GC content
        _gc=$(
            printf 'scale = %d; a = %d; c = %d; g = %d; t = %d; (g + c) / (a + c + g + t)\n' \
                "$N_DIGITS" "$_a_count" "$_c_count" "$_g_count" "$_t_count" \
                | bc
        )
        # Add 0 before decimal point
        _gc="$(printf "%.${N_DIGITS}f\n" "$_gc")"
        info "Sequence '$_sequence' GC content: $_gc"
    done << EOF
$_fasta_file_content
EOF
    # Total data
    info "Adenine total count: $_a_count_total"
    info "Cytosine total count: $_c_count_total"
    info "Guanine total count: $_g_count_total"
    info "Thymine total count: $_t_count_total"
    # Calculate total GC content
    _gc=$(
        printf 'scale = %d; a = %d; c = %d; g = %d; t = %d; (g + c) / (a + c + g + t)\n' \
            "$N_DIGITS" "$_a_count_total" "$_c_count_total" "$_g_count_total" "$_t_count_total" \
            | bc
    )
    # Add 0 before decimal point
    _gc="$(printf "%.${N_DIGITS}f\n" "$_gc")"
    info "GC content: $_gc"
}
The "Count number of sequences" and "Remove sequence wrapping" codes are adapted from https://www.biostars.org/p/17680
The script uses only basic commands except for bc to do the precision calculation (See bc installation).
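If bc is missing, it is usually available from the system package manager; on a Debian/Ubuntu-based system (an assumption about your platform) that would be something like:
sudo apt-get install bc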
You can configure the script by modifying the variables in the CONFIGURATION section.
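For example, a hypothetical alternative configuration (the file name is made up):
FASTA_FILE="my_sequences.fasta"
N_DIGITS=2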
Because you haven't indicated which one you want, the GC content is calculated both for each sequence and for the file as a whole. Therefore, get rid of anything that isn't necessary :)
Despite my lack of a bioinformatics background, the script successfully parses and analyzes a fasta file.
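Since it is plain POSIX sh, the script can be run with any sh-compatible shell, assuming it is saved as, say, gc.sh (the name is arbitrary); it logs one [INFO ] line per sequence and the totals at the end:
sh gc.sh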
This question already has answers here:
Bash script stops execution of ffmpeg in while loop - why?
(3 answers)
Execute "ffmpeg" command in a loop [duplicate]
(3 answers)
Closed 7 days ago.
I am trying to split audio files by their chapters. I downloaded this as audio with yt-dlp with its chapter information included. I tried this very simple script to do the job:
#!/bin/sh
ffmpeg -loglevel 0 -i "$1" -f ffmetadata meta # take the metadata and output it to the file meta
cat meta | grep "END" | awk -F"=" '{print $2}' | awk -F"007000000" '{print $1}' > ends #
cat meta | grep "title=" | awk -F"=" '{print $2}' | cut -c4- > titles
from="0"
count=1
while IFS= read -r to; do
    title=$(head -$count titles | tail -1)
    ffmpeg -loglevel 0 -i "$1" -ss $from -to $to -c copy "$title".webm
    echo $from $to
    count=$(( $count+1 ))
    from=$to
done < ends
You can see that I echo out $from and $to because I noticed they are just wrong. Why is this? When I comment out the ffmpeg command in the while loop, the variables $from and $to turn out to be correct, but when it is uncommented they just become garbage.
Commented output:
0 465
465 770
770 890
890 1208
1208 1554
1554 1793
1793 2249
2249 2681
2681 2952
2952 3493
3493 3797
3797 3998
3998 4246
4246 4585
4585 5235
5235 5375
5375 5796
5796 6368
6368 6696
6696 6961
Uncommented output:
0 465
465 70
70 890
890 08
08 1554
1554 3
3 2249
2249
2952
2952 3493
3493
3998
3998 4246
4246 5235
5235 796
796 6368
6368
I tried lots of other stuff thinking it might be the problem, but nothing changed anything. One thing I remember is that I tried having $from and $to in the form %H:%M:%S, which, again, gave the same result.
Thanks in advance.
Here is an untested refactoring; hopefully it can at least help steer you in another direction.
Avoid temporary files.
Avoid reading the second input file repeatedly inside the loop.
Refactor the complex Awk scripts into a single script.
To be on the safe side, add a redirection from /dev/null to prevent ffmpeg from eating the input data.
#!/bin/sh
from=0
ffmpeg -loglevel 0 -i "$1" -f ffmetadata - |
awk -F '=' '/END/ { s=$2; sub(/007000000.*/, "", s); end[++i] = s }
/title=/ { t=$2; sub(/^([^-]-){3}/, "", t); title[++j] = t }
END { for (n=1; n<=i; n++) { print end[n]; print title[n] } }' |
while IFS="" read -r end; do
IFS="" read -r title
ffmpeg -loglevel 0 -i "$1" -ss "$from" -to "$end" -c copy "$title".webm </dev/null
from="$end"
done
The Awk script reads all the data into memory, and then prints one "end" marker followed by the corresponding title on the next line; I can't be sure what your ffmpeg -f ffmetadata command outputs, so I just blindly refactored what your scripts seemed to be doing. If the output is somewhat structured you can probably read one record at a time.
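As a side note, if your ffmpeg build supports it, the -nostdin option is another way to keep ffmpeg from reading the loop's standard input, instead of (or in addition to) the /dev/null redirection, e.g.:
ffmpeg -nostdin -loglevel 0 -i "$1" -ss "$from" -to "$end" -c copy "$title".webm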
This has probably been done before, and better than mine. I have a directory where files are created for a few milliseconds before being removed. I researched and couldn't find what I was looking for, so I made something to do it myself and added a few more features.
How it works:
You run the script, input the directory, and input the time you want it to run, from 10 seconds to 6000 seconds (100 minutes). It validates what you enter to make sure the directory is real and that you don't exceed or go below that time range. Using sdiff -s, it compares the state of the directory when the script began to a new snapshot of it every 0.001 seconds. If there are changes, it will tell you.
I wanted to share it since others may find it useful, and, more importantly, to ask if you have improvements. I have been doing a lot of self-taught (mostly using Stack Exchange) bash scripting for almost a year and I really love it. I am always looking to improve my code. I am new to interactive scripts, so if you have recommendations for input validation I'd love to hear them. I couldn't figure out how to combine the 'if' statements that check the time in seconds into a single check for anything less than 10 or greater than 6000, despite trying a lot of things, so I just made them separate. The 'sed' portions are kind of wonky and I didn't do a great job optimizing them; I just worked on them until the output was what I wanted.
EDIT: I don't have inotify and I don't think I could get it on this locked down system.
#!/bin/bash
# Directory Check Script
# Created 13 Aug 2022
CLISESSID=$$
export CLISESSID
### DEFINE A LOCATION WHERE FILES CAN BE TEMPORARILY MADE ###
tmp=/tmp
temp1=$tmp/temp1.txt
temp2=$tmp/temp2.txt
echo "This script will check a directory to see if any files were added for the length of time you specify"
read -ep 'What is the full directory you would like to verify? ' dir
if [ ! -d "$dir" ] ; then
echo "Directory does not exist. Exiting."
exit
fi
read -ep '(This must be between 10-6000. i.e. 5 minutes = 300, 10 minutes = 600, 100 minutes = 6000)
How many seconds would you like to check for? ' seconds
if [[ "$seconds" -lt 10 ]] ; then
echo "Seconds must be between 10 and 6000"
exit
fi
if [[ "$seconds" -gt 6000 ]] ; then
echo "Seconds must be between 10 and 6000"
exit
fi
echo "checking $dir for $seconds seconds."
ls --full-time $dir | tail -n +2 > $temp1
SECONDS=0
echo "Checking for changes to $dir every 0.001 seconds for $seconds seconds."
until [[ $(ls --full-time $dir | tail -n +2) != $(cat "$temp1") ]] > /dev/null 2>&1
do
if (( SECONDS > $seconds ))
then
echo "Exceded defined time of $seconds seconds. Exiting."
exit 1
fi
sleep 0.001
done
ls --full-time $dir | tail -n +2 > $temp2
if [[ $(sdiff -w 400 -s $temp1 $temp2 | grep " |" | wc -l) -gt 0 ]] ; then
echo "
File has been modified in $dir:"
sdiff -w 400 -s $temp1 $temp2 | sed 's/|/\n/' | sed 's/^ *//g' | sed '1~ i Before:' | sed '3~ i After:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//'
fi
if [[ $(sdiff -w 400 -s $temp1 $temp2 | grep " >" | wc -l) -gt 0 ]] ; then
echo "
File has been added to $dir:"
sdiff -w 400 -s $temp1 $temp2 | sed 's/>/\n/' | grep -v " |" | sed 's/^ *//g' | sed '1~ i Added file:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//' | sed '/./!d'
fi
if [[ $(sdiff -w 400 -s $temp1 $temp2 | grep " <" | wc -l) -gt 0 ]] ; then
echo "
File has been removed from $dir:"
sdiff -w 400 -s $temp1 $temp2 | sed 's/</\n/' | grep -v " |" | sed 's/^ *//g' | sed '1~ i Removed file:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//' | sed '/./!d' | sed 's/ *$//'
fi
rm -f $temp1 $temp2
I made a short bash program to download podcasts and keep only the last 20 seconds of each one.
The strange thing is that it fails to download on every other iteration. There seems to be a problem with the function trim_nsec, because when I remove it from the loop, all the rest works correctly.
Edit: added double quotes, which doesn't solve the problem.
#!/bin/bash
# Get podcast list
wget -O feed http://www.rtl.fr/podcast/on-n-est-pas-forcement-d-accord.xml
function trim_nsec () {
# arguments : 1 : mp3file - 2 : duration - 3 : outputfile
duration=$(ffprobe -i "${1}" -show_entries format=duration -v quiet -of csv="p=0")
nth_second=$(echo "${duration} - ${2}"|bc)
ffmpeg -i "${1}" -ss "${nth_second}" "${3}"
}
cpt=1
# let's work only on the 4th first files
grep -Po 'http[^<]*.mp3' feed|grep admedia| head -n 4 > list
cat list | while read i
do
year=$(echo "$i" | cut -d"/" -f6)
day=$(echo "$i" | cut -d"/" -f7)
fullname=$(echo "$i" | awk -F"/" '{print $NF}')
fullnameend=$(echo "$fullname" |sed -e 's/\.mp3$/_end\.mp3/')
new_name=$(echo "$year"_"$day"_"$fullnameend")
# let's download
wget -O "$fullname" "$i"
# let's trim last 20 sec
trim_nsec "$fullname" 20 "$new_name"
echo "$cpt file processed"
#delete orig. file :
rm "$fullname"
((cpt++))
done
Any idea ?
The problem is most likely due to the fact that, on errors, ffmpeg will try to read input from the user, which consumes the input provided by cat list. See a similar question here or here. To prevent trim_nsec from consuming the input from cat list, you could do:
cat list | while read i
do
year=$(echo "$i" | cut -d"/" -f6)
day=$(echo "$i" | cut -d"/" -f7)
fullname=$(echo "$i" | awk -F"/" '{print $NF}')
fullnameend=$(echo "$fullname" |sed -e 's/\.mp3$/_end\.mp3/')
new_name=$(echo "$year"_"$day"_"$fullnameend")
# let's download
wget -c -O "$fullname" "$i"
# let's trim last 20 sec
trim_nsec "$fullname" 20 "$new_name" <&3
echo "$cpt file processed"
#delete orig. file :
#rm "$fullname"
((cpt++))
done 3<&1
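An alternative (not part of the original fix) is to leave the loop untouched and instead stop ffmpeg itself from reading stdin inside trim_nsec, for example by redirecting its input from /dev/null:
function trim_nsec () {
    # arguments : 1 : mp3file - 2 : duration - 3 : outputfile
    duration=$(ffprobe -i "${1}" -show_entries format=duration -v quiet -of csv="p=0")
    nth_second=$(echo "${duration} - ${2}" | bc)
    # </dev/null keeps ffmpeg from swallowing the list that the while loop is reading
    ffmpeg -i "${1}" -ss "${nth_second}" "${3}" </dev/null
}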
I made a simple script that divides an flv file into multiple parts, converts them all to .mp4 individually, and then merges them to form a final mp4 file. I did this to save time and convert large files in parallel.
However, I am stuck because the ffmpeg command that runs normally on the command line doesn't run via the script.
I am kind of stuck here and would appreciate some assistance.
#!/bin/bash
#sleep 5
filenametmp=$1;
filename=`echo "$filenametmp" | awk '{split($0,a,"."); print a[1]}'`
echo $filename
output="$filename-output"
filenamewithoutpath=`echo "$output" | awk '{split($0,a,"/"); print a[4]}'`
echo $output $filenamewithoutpath
/usr/bin/ffmpeg -i $filenametmp -c copy -map 0 -segment_time $2 -f segment $output%01d.flv
#sleep 10
#echo "/bin/ls -lrt /root/storage/ | /bin/grep $filenamewithoutpath | /usr/bin/wc -l"
filecounttmp=`/bin/ls -lrt /opt/storage/ | /bin/grep $filenamewithoutpath | /usr/bin/wc -l`
filecount=`expr $filecounttmp - 1`
echo $filecount
for i in `seq 0 $filecount`
do
suffix=`expr 0000 + $i`
filenametoconvert="$output$suffix.flv"
convertedfilename="$output$suffix.mp4"
echo $filenametoconvert
/usr/bin/ffmpeg -i $filenametoconvert -c:v libx264 -crf 23 -preset medium -vsync 1 -r 25 -c:a aac -strict -2 -b:a 64k -ar 44100 -ac 1 $convertedfilename > /dev/null 2>&1 &
done
sleep 5
concatstring=""
for j in `seq 0 $filecount`
do
suffix=`expr 0000 + $j`
convertedfilenamemp4="$output$suffix.mp4"
#concatstring=`concat:$concatstring|$convertedfilenamemp4`
echo "file" $convertedfilenamemp4 >> $filename.txt
#ffmpeg -i concat:"$concatstring" -codec copy $filename.mp4
#ffmpeg -f concat -i $filename.txt -c copy $filename.mp4
done
echo $concatstring
ffmpeg -f concat -i $filename.txt -c copy $filename.mp4
rm $output*
rm $filename.txt
I run the script on any flv file like this:
./ff.sh /opt/storage/tttttssssssssss_573f5b1cd473202daf2bf694.flv 20
I get this error message :
moov atom not found
I am on Ubuntu 14.04 LTS version, standard installation of ffmpeg.