Remove human voice from audio/video with software or command line


I'm seeking a way to remove a human voice from a video. Initially, I had the following:
video1.mp4
voice1.mp3
video1 has images and only non-human-voice sounds, while voice1 has only one human voice. Then I combined video1 with voice1 to create video2.mp4, so in video2 I can hear both the audio from video1 and the audio from voice1. It is worth mentioning that both video1 and voice1 have the same length of about 2 minutes.
This was one year ago. I deleted video1.mp4 accidentally, but I still have video2 and voice1. Now I need to get video1.mp4 again. In other words, how can I remove voice1 from video2? How can I remove the human voice from video2?
I don't care if this is through software, command line, or even computer code (maybe Python; I've heard that Python can do cool stuff with audio).
Note: there is a similar question here in StackOverflow (Removal of Human Voice from a video or audio file), but it doesn't explain how to remove the audio.

The Problem
Rather than thinking about this as a problem of removing an unwanted voice, I would think of it as simply undoing the sum of two signals. At the moment we have three audio signals to consider; let's call them:
A: The audio track to video1.mp4
B: The audio of voice1.mp3
C: The sum of A and B (i.e. C = A + B) which is now the audio track to video2.mp4
We no longer have access to A, but we still have B and C.
The ideal case
The ideal case assumes:
A is the same length as B
Summing of the two signals was done without any filtering
Solution
The solution in this case is fairly trivial: all we need to do is multiply B by a gain value of -1 (i.e. invert it) and sum that with the signal C.
if
C = A + B
then
A = C - B
A = C + (B * -1)
Given you summed these signals in the first place, I assume you have access to some audio/video editing software. To invert B you could import the file into one of the following:
Audacity
GarageBand
FFmpeg
Adobe Premiere / Audition
Final Cut Pro
Any software that can edit audio should also be capable of inverting audio signals. It would probably be ideal to have C and B in the same project for whatever DAW you are using to make tweaks on the fly.
Caveats
If the gain of B was changed (i.e. C = A + xB), then the solution is still fairly trivial, as you just multiply -B by the factor x before summing.
If B is not the same length as A, you will need to align -B with C correctly in order for the signals to cancel.
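As a rough sketch of the subtraction described above, assuming both files have already been decoded to plain lists of samples at the same sample rate. The function name subtract_track and its gain and offset parameters are made up for illustration and do not come from any library:

```python
def subtract_track(c, b, gain=1.0, offset=0):
    """Return A = C - gain * B, with B starting `offset` samples into C.

    c, b   -- sequences of numeric samples (same sample rate assumed)
    gain   -- the factor x from C = A + xB (1.0 in the ideal case)
    offset -- where B starts inside C, in samples (0 if they are aligned)
    """
    a = list(c)  # copy so the caller's data is left untouched
    for i, sample in enumerate(b):
        j = i + offset
        if 0 <= j < len(a):  # ignore any part of B that falls outside C
            a[j] -= gain * sample
    return a


# Tiny synthetic check: build C from a known A and B, then undo the sum.
a = [10, -20, 30, 0]
b = [4, 6]
c = [10, -20 + 2 * 4, 30 + 2 * 6, 0]  # B mixed in at twice the gain, one sample late
recovered = subtract_track(c, b, gain=2, offset=1)
```

Here recovered should equal a again. With real recordings the gain and offset usually have to be found by ear or by experiment.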
The non-ideal case
If some process has been applied to B, the solution is a little more involved. This process could be anything: filtering (EQ), delay, reverb, pitch shift, speed shift. Let's call this process a function H(); if B is the input to that process, then its output is H(B).
i.e.
C = A + H(B)
We can no longer simply invert B. We now need to apply exactly the same process to B first and then invert the result, as H(-B) may not be equal to -H(B).
If for whatever reason you can't remember the process applied to B, then that leaves you a little stuck. Your best bet in that case is to try to re-create the process by trial and error. There is likely to be some remnant of H(B) in the result unless you match the process exactly.
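The trial and error can be partly automated. The sketch below assumes, purely for illustration, that H() was nothing more than a delay of a few samples, and that A is mostly unrelated to B, so the correct guess leaves the least energy behind after subtraction. The names delay, residual, energy and best_delay are hypothetical helpers, not library functions:

```python
def delay(signal, n, length):
    """Hypothetical H(): shift `signal` right by n samples, zero-padded to `length`."""
    shifted = [0] * n + list(signal)
    shifted += [0] * (length - len(shifted))
    return shifted[:length]


def residual(c, hb):
    """What is left of C after subtracting a candidate H(B)."""
    return [x - y for x, y in zip(c, hb)]


def energy(signal):
    """Sum of squared samples, used here as a crude 'how much is left' score."""
    return sum(x * x for x in signal)


def best_delay(c, b, max_delay):
    """Try every delay from 0 to max_delay and keep the one with the quietest residual."""
    return min(
        range(max_delay + 1),
        key=lambda n: energy(residual(c, delay(b, n, len(c)))),
    )


# Synthetic check: B was mixed into A two samples late.
a = [1, 0, -1, 0, 1, 0, -1, 0]
b = [5, -7, 3]
c = [x + y for x, y in zip(a, delay(b, 2, len(a)))]
guess = best_delay(c, b, max_delay=4)
recovered = residual(c, delay(b, guess, len(c)))
```

A real H() with EQ or reverb has far more parameters than a single delay, so this only shows the shape of the search, not a general solution.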


Avoid impossible states with a data model for a card game in Haskell

I am trying to implement the Jass card game in Haskell and would like to make the game state as explicit as possible and avoid impossible states by using types.
The standard variant involves four players. The players opposite each other form a team.
In a game rounds are played until one team reaches the target score.
At the start of each round each player is dealt 9 cards; one player gets to choose the trump and leads the first trick.
In a trick every player has to play one card. The cards are compared with respect to the leading suit and trump, and the player with the highest card wins the trick. The points of the trick are added to the score of the winner's team. The winner also leads the next trick. In a round, tricks are played until no player has any cards left.
There is a more detailed explanation on Wikipedia and a German explanation on jassa.at.
My data model for the player looks something like this
data Player = Player
{ playerID :: PlayerID
, name :: String
, cards :: Set Card
, sendMessage :: Message -> IO ()
, receiveAction :: IO Action
}
This way I can just use different send and receive functions for, say, command-line or network players.
I would like to represent the game as a state machine with a pure update function. However, in most states only one action from only one player is valid, so I think a big part of the update function would just be handling invalid inputs.
I don't know what would be the best way to represent these parts of the game state.
The Players.
My first idea was to use a simple list and rotate this list every time so the current player is always the head. This is easy to do, but I also think it's easy for something to go wrong: what if the list is empty, or there is only one player, and so on...
Use an Array for the players and use the index for the current player. This has the advantage that to get to the next player I just have to increase the index. Somewhere I have to use mod to cycle the array, but that's not really a problem. I also tried this with DataKinds to have the size of the array on the type level, so that the nextPlayer function can handle the mod. This also has the advantage that for the teams I can just use the even and odd player indices. But with this I have to store an extra Map from playerID or index to the cards of the players, so now the array and the map can get out of sync.
Trick
I am not sure if I should store a list of played cards and who played them, or just a record with the lead, highest card, winner and the value of the played cards, and keep track of all played cards in a writer monad.
Rounds and Tricks
This is kinda the same for both: should I store a list with all played rounds and tricks, or only the current round and trick plus the sum of the points from the previous rounds/tricks?

There is a temptation to have the type checker prove your program is perfect. However, that will become an endless pursuit. There really are not the hours in the day to spend teaching a computer how to be sure what you're sure is sure is for sure for every surety.
What I do is start with some way of solving the problem and then learn as I go what the pain points are. Which mistakes am I actually prone to making? In your case, I would implement the game (or part of) in some way I can figure out, and then from that experience I will know how to do it better.
cycle :: [a] -> [a] takes a list and repeats it forever. You can do this for your players and take the head of the list forever.
For non-empty lists there is Data.List.NonEmpty.
A way to only construct valid game states is to define an abstract data type. Rather than exporting the data constructors for a type, you only export functions of your own definition which can construct the type. That way, you can do whatever (runtime) checking or fixing you want.
Another tool is unit testing. Encoding propositions as types is difficult, especially in Haskell, and so the vast majority will not be. Instead, you can use property-based unit testing to recover a modicum of assurance.

Detection of similar sequences in ordered event lists

I have logs from a bunch (millions) of small experiments.
Each log contains a list (tens to hundreds) of entries. Each entry is a timestamp and an event ID (there are several thousand event IDs, each of which may occur many times in the logs):
1403973044 alpha
1403973045 beta
1403973070 gamma
1403973070 alpha
1403973098 delta
I know that one event may trigger other events later.
I am researching this dataset. I am looking for "stable" sequences of events that occur often enough in the experiments.
Is there a way to do this without writing too much code and without using proprietary software? The solution should be scalable enough, and work on large datasets.
I think that this task is similar to what bioinformatics does — finding sequences in a DNA and such. Only my task includes many more than four letters in an alphabet... (Update, thanks to #JayInNyc: proteomics deals with larger alphabets than mine.)
(Note, BTW, that I do not know beforehand how stable and similar I want my sequences, what is the minimal sequence length etc. I'm researching the dataset, and will have to figure this out on the go.)
Anyway, any suggestions on the approaches/tools/libraries I could use?
Update: Some answers to the questions in comments:
Stable sequences: found often enough across the experiments. (How often is enough? Don't know yet. Looks like I need to calculate a top of the chains, and discard rarest.)
Similar sequences: sequences that look similar. "Are the sequences 'A B C D E' and 'A B C E D' (minor difference in sequence) similar according to you? Are the sequences 'A B C D E' and 'A B C 1 D E' (sequence of occurrence of selected events is same) also similar according to you?" — Yes to both questions. More drastic mutations are probably also OK. Again, I'd like to be able to calculate a top and discard the most dissimilar...
Timing: I can discard timing information for now (but not order). But it would be cool to have it in a similarity index formula.
Update 2: Expected output.
In the end I would like to have a rating of most popular longest stablest chains. A combination of all three factors should have effect in the calculation of the rating score.
A chain in such rating is, obviously, rather a cluster of similar enough chains.
A synthetic example of a chain-cluster:
alpha
beta
gamma
[garbage]
[garbage]
delta
another:
alpha
beta
gamma|zeta|epsilon
delta
(or whatever variant did not come to my mind right now.)
So, the end output would be something like that (numbers are completely random in this example):
Chain cluster ID | Times found | Time stab. factor | Chain stab. factor | Length | Score
A | 12345 | 123 | 3 | 5 | 100000
B | 54321 | 12 | 30 | 3 | 700000

I have thought about this setup for the past day or so: how to do it in a sane, scalable way in bash, etc. The answer is really driven by the relational information you want to draw from the data and the apparent size of the dataset you currently have. The cleanest solution will be to load your datasets into a relational database (MariaDB would be my recommendation).
Since your data already exists in a fairly clean format, you have two options for getting the data into a database: (1) if the files have the data in a usable row-by-column layout, you can simply use LOAD DATA INFILE to bring your data into the database; or (2) parse the files with bash in a while read line; do loop to get the data into the table format you desire, and use mysql batch mode to load the information directly into the database in a single pass. The general form of the bash command would be mysql -uUser -hHost database -Bse "your insert command".
Once in a relational database, you then have the proper tool for the job of being able to run flexible queries against your data in a sane manner instead of continually writing/re-writing bash snippets to handle your data in a different way each time. That is probably the best scalable solution you are looking for. A little more work up-front, but a lot better setup going forward.
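To give a feel for the load-then-query workflow, here is a minimal sketch. It swaps in SQLite from the Python standard library instead of MariaDB, only so that the example is self-contained; the table name events, its columns, and the helper names load_events and event_counts are all assumptions made for illustration:

```python
import sqlite3


def load_events(conn, log_id, lines):
    """Parse 'timestamp event' lines from one log and insert them in one batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (log_id INTEGER, ts INTEGER, event TEXT)"
    )
    rows = []
    for line in lines:
        ts, event = line.split()
        rows.append((log_id, int(ts), event))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)


def event_counts(conn):
    """One example of a flexible query: how often does each event occur overall?"""
    query = (
        "SELECT event, COUNT(*) FROM events "
        "GROUP BY event ORDER BY COUNT(*) DESC, event"
    )
    return conn.execute(query).fetchall()


sample = [
    "1403973044 alpha",
    "1403973045 beta",
    "1403973070 gamma",
    "1403973070 alpha",
    "1403973098 delta",
]
conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo
load_events(conn, 1, sample)
counts = event_counts(conn)
```

Once the rows are in a table, different questions become different SELECT statements rather than new parsing scripts.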

Wikipedia defines algorithm as 'a precise list of precise steps': 'I am looking for "stable" sequences of events that occur often enough in the experiments.' "Stable" and "often enough" without definition makes the task of giving you an algorithm impossible.
So I give you the trivial one to calculate the frequency of sequences of length 2. I will ignore the time stamp. Here is the awk code (pW stands for previous Word, pcs stands for pair counters):
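#!/usr/bin/awk -f
# Count how often each pair of consecutive events occurs.
# Read the first line by hand so that pW holds the first event.
BEGIN { getline; pW=$2; }
# For every following line, count the pair (previous event, current event).
{ pcs[pW, $2]++; pW=$2; }
END {
    # The two parts of each key are joined by SUBSEP, which is not printable by default.
    for (i in pcs)
        print i, pcs[i];
}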
I duplicated your sample to show something meaningful looking
1403973044 alpha
1403973045 beta
1403973070 gamma
1403973070 alpha
1403973098 delta
1403973044 alpha
1403973045 beta
1403973070 gamma
1403973070 beta
1403973098 delta
1403973044 alpha
1403973045 beta
1403973070 gamma
1403973070 beta
1403973098 delta
1403973044 alpha
1403973045 beta
1403973070 gamma
1403973070 beta
1403973098 delta
Running the code above on it gives:
gammaalpha 1
alphabeta 4
gammabeta 3
deltaalpha 3
betagamma 4
alphadelta 1
betadelta 3
which can be interpreted as alpha followed by beta and beta followed by gamma are the most frequent length two sequences each occurring 4 times in the sample. I guess that would be your definition of stable sequence occurring often enough.
What's next?
(1) You can easily adapt the code above to sequences of length N, and to find sequences occurring often enough you can sort (-k2nr) the output on the second column.
(2) To put a limit on N you can stipulate that no event triggers itself, which provides you with a cut-off point. Or you can place a limit on the timestamp, i.e. on the difference between consecutive events.
(3) So far those sequences were really strings and I used exact matching between them (CLRS terminology). Nothing prevents you from using your favourite similarity measure instead:
{ pcs[CLIFY(pW, $2)]++; pW=$2; }
CLIFY would be a function which takes k consecutive events and puts them into a bin ie maybe you want ABCDE and ABDCE to go to the same bin. CLIFY could of course take as an additional argument the set of bins so far.
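Points (1) and (3) could be sketched along the following lines in Python. This is not a translation of the awk script, just the same idea under my own assumptions: count_ngrams and unordered are made-up names, and unordered is only one possible binning function (it ignores order inside the window, so ABCDE and ABDCE land in the same bin):

```python
from collections import Counter


def count_ngrams(events, n, binning=tuple):
    """Count every run of n consecutive events, after mapping it to a bin."""
    counts = Counter()
    for i in range(len(events) - n + 1):
        counts[binning(events[i:i + n])] += 1
    return counts


def unordered(window):
    """Example binning function: treat a window as a bag of events."""
    return tuple(sorted(window))


log = ["alpha", "beta", "gamma", "alpha", "delta"]
pairs = count_ngrams(log, 2)  # exact matching, length 2

# Two logs that differ only by a swap of neighbouring events.
fuzzy = count_ngrams(list("ABCDE"), 5, unordered) + count_ngrams(list("ABDCE"), 5, unordered)
exact = count_ngrams(list("ABCDE"), 5) + count_ngrams(list("ABDCE"), 5)
```

Counter.most_common() then gives the ranking that sort -k2nr gives for the awk output.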
The choice of awk is for convenience. It won't be fast, but you can easily run several instances in parallel.
It is unclear what you want to use this for but a google search for Markov chains, Mark V Shaney would probably help.

What is the difference in Natural when it comes to MOVE and (=) in statements?

I have only been programming in Natural for a couple of weeks over the course of a couple of years. I only do enough of it to get myself by.
Question: What is the difference between the MOVE a TO b and the a = b?
Code:
MOVE A TO B
MOVE D TO Y
Or
A = B
C = D

If you are using a licensed product, you should have access to documentation at your site.
Software AG are the vendor. I found this with a simple internet search: http://documentation.softwareag.com/natural/nat638vms/general/print.htm
That is a manual for Natural on OpenVMS. It makes references to the Mainframe version, and looks good enough to answer your question.
It seems that, at the simplest level, they are the same. However, if you want to do a calculation, you need the COMPUTE statement (the = form); that can't be done with MOVE. There are various formats of the MOVE statement.
I have never used Natural, and can't test it. You have access to the product, that along with documentation will allow you to provide a full answer for yourself.

From what I can remember of Natural, they are basically the same. But I also remember that there are some differences.
For the most part I used = simply because, if you are used to C++, that is the more familiar way of writing it.
MOVE Your-Value TO Another-Value
is for the most part equal to
Another-Value = Your-Value
But I think where they differ slightly is in which computations you can and cannot perform with = as opposed to MOVE. You can MOVE to multiple variables, as below, but = can only assign to a single variable.
MOVE A TO C D BaseBallScore
This is very useful if you have to move a lot of values at one time to several different counters but you could move one at a time. Like below
MOVE A TO C
MOVE A TO D
MOVE A TO BaseBallScore
There are also some functions that you can use with the MOVE that make it a nice option. Such as rounding a number
MOVE ROUNDED Value To NewValue <-- ROUNDED can take different parameters
Here is another function, SUBSTRING, that lets you move part of a string into another string. Normally I use = just because that is how the boss does it, but the MOVE statement gives a programmer a bit more flexibility.
MOVE SUBSTRING(#A,5,8) TO #B
An online reference for the move is located here:
http://documentation.softwareag.com/natural/nat638vms/print/sm.pdf

How to generate unique string?

I have a set of devices, and for each device I wish to generate a string of the following format: XXXXXXXX. Each X is either B, G, or R. An example is GRBRRBRB. This gives me 3^8 = 6561 keys (roughly 7000) to work with, which is enough as I doubt I'll have more devices.
I was thinking I could generate them all beforehand, dump them in a file or something, and just get the next available key from that, but I wonder if there is a better way to do this.
I know there are better ways to do it if I don't need guaranteed uniqueness but I definitely need that so I'm not sure what the best way to do it is.

Treat it as a ternary representation of a number, where R=0, B=1, G=2. So when you're writing the nth ID, the first digit is R if n % 3 == 0, B if n % 3 == 1, G otherwise. The second digit is the same, except you're looking at (n / 3) % 3; then for the third digit look at (n / 3^2) % 3; etc.
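A minimal sketch of that scheme, assuming the devices can be numbered 0, 1, 2, ... somewhere. The name device_key and the length parameter are made up for illustration:

```python
DIGITS = "RBG"  # R=0, B=1, G=2, as in the mapping above


def device_key(n, length=8):
    """Return the n-th key, writing n in base 3 with the least significant digit first."""
    if not 0 <= n < 3 ** length:
        raise ValueError("n must be between 0 and 3**length - 1")
    chars = []
    for _ in range(length):
        chars.append(DIGITS[n % 3])  # current ternary digit
        n //= 3                      # move on to the next digit
    return "".join(chars)


first = device_key(0)
second = device_key(1)
last = device_key(3 ** 8 - 1)
all_keys = {device_key(n) for n in range(3 ** 8)}
```

Because every n in range maps to a different ternary number, uniqueness comes for free as long as the counter itself is never reused.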

If your devices have any unique sequential ID available, I would implement a deterministic algorithm for that 'unique string' retrieval.
I mean something like
GetDeviceRgbString(deviceid) { // deterministic algorithm returning appropriate value using device id to choose it }
Otherwise, consider storing the strings somewhere (this depends on your environment, about which you gave little information, but it could be a file, a database, ...) and marking them as used. Preferably also keep a record of which device each string was assigned to; you may need it later.

I'm going to assume you want them unique to identify them for some type of network (internet) activity. That being the case, I would have a web service that can take care of making sure each device is unique by handing out IDs. The software on each device would see if it has an ID (stored locally), and if not, request one from the web service.

Programmatically merging two pieces of audio

I have two arrays of samples of two different audio clips. If I just programmatically add them together, will this be the equivalent of layering one track over another in an audio editing suite? For example, I have one audio clip of a bass and another of a drum, and I want them playing together.
I would probably do something like this:
for (int i = 0; i < length_of_array; i++){
final_array[i] = first_array[i] + second_array[i];
}
If it is not done this way, could I get some indication of what would be the correct way?

This IS a correct way. Merging is called MIXING in audio jargon.
BUT:
If your samples are short (16 bit signed) - you will have to use int (32 bit signed) for addition and then clip the samples manually. If you don't, your values will wrap and you'll have so much fun listening to what you did :)
Here comes the code:
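#define LENGTH_OF_ARRAY 1024
short first_array[LENGTH_OF_ARRAY];
short second_array[LENGTH_OF_ARRAY];
short final_array[LENGTH_OF_ARRAY];
for (int i = 0; i < LENGTH_OF_ARRAY; i++)
{
    /* add in 32 bits so the sum of two 16-bit samples cannot wrap */
    int mixed = (int)first_array[i] + (int)second_array[i];
    /* clip to the signed 16-bit range */
    if (mixed > 32767) mixed = 32767;
    if (mixed < -32768) mixed = -32768;
    final_array[i] = (short)mixed;
}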
In MOST cases you don't need anything else for normal audio samples, as clipping will occur only in extremely rare conditions. I am speaking from practice, not from theory.

Your above merging method will work if the sample rates and desired mix level of the two audio clips are identical. If the desired mix levels are different, then a slightly more general form of your mixer would be something like:
mixed_result[i] = rescale_and_clip_fix( volume1 * input1[i] + volume2 * input2[i] );
Where rescale_and_clip_fix() might be a limiter or compressor, applied after making sure the scale after multiplication is correct for the result's data type. If the result array is an integer data type, then you may also want to do rounding or noise filtering while scaling.
If the sample rates are different, then you will need to do a sample rate conversion on one of the input channels first and/or the result afterwards.
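As a minimal sketch of the weighted mix, assuming both inputs are lists of signed 16-bit samples at the same sample rate. The function name mix is made up, and the simple hard clip used here merely stands in for rescale_and_clip_fix(); a limiter or compressor would be gentler on the ears:

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix(input1, input2, volume1=1.0, volume2=1.0):
    """Weighted sum of two sample lists, rounded and hard-clipped to 16-bit range."""
    mixed = []
    for s1, s2 in zip(input1, input2):
        value = int(round(volume1 * s1 + volume2 * s2))   # back to an integer sample
        mixed.append(max(INT16_MIN, min(INT16_MAX, value)))  # hard clip, no wrap-around
    return mixed


bass = [1000, 30000, -30000]
drum = [2000, 10000, -10000]
full_volume = mix(bass, drum)             # second and third samples hit the limits
half_volume = mix(bass, drum, 0.5, 0.5)   # halving both inputs leaves headroom
```

Python integers do not overflow, so the wrap-around problem from the C version cannot occur here; the clip only matters once the result is written back into a 16-bit format.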

In general, this will get you what you want - however, watch for clipping. That is, be careful not to end up with integer overflow; and don't avoid this by just limiting the value to the max/minimum of the type in question. You may need to apply a compressor to bring the values back into range after adding them.
