I am working on some voice recognition software as my first Haskell project, and I am looking for something that works cross-platform. My plan is as follows:
1. Record audio as a .wav file.
2. Use https://hackage.haskell.org/package/flac to convert the .wav file to a FLAC file.
3. Send a request to Google's Speech API (as described here: https://cloud.google.com/speech/docs/quickstart).
4. Receive the response and interpret it.
However, I am currently stuck on step 1, as it seems rather difficult to record microphone input in Haskell.
Since this is a learning experience, I wanted to find something simple that I could understand easily. After some googling I found this: https://codereview.stackexchange.com/questions/169577/sound-recorder-based-on-sdl2, which uses the Haskell SDL2 bindings and Codec.Audio.Wave. I copied the code to see if it would work, but alas it did not. The program ran without errors, but when I tried playing the wave file, nothing happened and it seemed completely empty. Nothing appears to have been recorded, as the file is only 44 bytes, i.e. just the WAV header.
I figured the issue was that the SDL open device spec could not find a suitable recording device. I first checked the frequency, in case my microphone could not handle 48000 Hz; since that did not help (it supports up to 96000 Hz), I tried changing the device channels, but that did not help either.
The problem is not my microphone, since recording audio with arecord (on Arch Linux) works just fine. As I said, this is my first Haskell project and I am a bit of a newbie, so although I have experience in other languages, it is entirely possible that I have done something really silly.
I am not opposed to any solutions not involving SDL2, but as I am looking for something (hopefully) simple and cross-platform, this seemed like a good fit.
Related
I've tried to use an at91-sama5d2 and a max9880 codec. Since drivers were lacking, I've written my own driver and now finally got sound. However, it sounds really bad; see:
Sound quality example
I'm failing to identify the error here. What kind of error makes this sound so bad?
I would like to create a utility in either PHP or Perl to convert an audio file created by Nortel's CallPilot voicemail system into a wave file. The problem is that the format, which has the .vbk file extension, is unknown to virtually any audio player; to date, I have not found one that will play a .vbk file. I've looked at audio file conversion libraries on CPAN and tried many of them, but they don't recognize the file. I was not successful with PHP's audio format manipulation either. Nortel does provide a converter, but it does not suit my needs: I would like to have this run via cron on a CentOS system. I don't know how to reverse engineer this format, and there seem to be just scraps of info about it on the web. This page indicates that it is "based on the H.232 format":
https://www.odesk.com/o/jobs/job/Reverse-Engineer-Nortel-VBK-Audio-Format_~~f501f11679f3f6bb/
I know this is a very old thread, but I've recently been looking into converting Nortel's vbk format as well. I was able to import the vbk files into Audacity with the raw data option, using Encoding: U-Law, Byte order: little-endian, Channels: 1 (Mono), Sample rate: 8000 Hz. Not sure if they have multiple formats for their vbk files, but mine were from a BCM50 phone system.
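If those import settings are correct, the conversion can also be scripted without Audacity. Here is a rough C sketch, assuming the .vbk payload really is headerless 8 kHz mono G.711 u-law (the file names are placeholders); it expands each u-law byte to a signed 16-bit little-endian PCM sample:
/* Rough sketch: decode headerless G.711 u-law (8 kHz mono) to raw 16-bit LE PCM. */
#include <stdio.h>

/* Standard G.711 u-law to linear PCM expansion. */
static short ulaw_to_linear(unsigned char u)
{
    int t;
    u = (unsigned char)~u;
    t = ((u & 0x0F) << 3) + 0x84;          /* mantissa plus bias */
    t <<= (u & 0x70) >> 4;                 /* apply the segment (exponent) */
    return (short)((u & 0x80) ? (0x84 - t) : (t - 0x84));
}

int main(void)
{
    FILE *in  = fopen("message.vbk", "rb");   /* placeholder input name */
    FILE *out = fopen("message.raw", "wb");   /* raw signed 16-bit LE PCM out */
    int c;
    if (!in || !out) return 1;
    while ((c = fgetc(in)) != EOF) {
        short s = ulaw_to_linear((unsigned char)c);
        fputc(s & 0xFF, out);                 /* low byte first (little-endian) */
        fputc((s >> 8) & 0xFF, out);
    }
    fclose(in);
    fclose(out);
    return 0;
}
From there, something like sox -r 8000 -e signed-integer -b 16 -c 1 message.raw message.wav should give you a playable file. If there turns out to be a header or container around the u-law data, you would need to skip it first.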
Well, this is the joy of closed proprietary systems. But there is a chance they could play nice. Try to contact Callpilot and see if they'll give you the format specs. It's worth a shot.
As for reverse engineering, you need to be able to generate known content: a constant tone at 60 Hz for exactly 1 second, then one at 50 Hz, then one lasting 10 seconds (a sketch for generating such tones follows below). Compare them and isolate the data from the metadata. There is going to be compression involved, so try a handful of common compression schemes; research into Nortel's practices will probably tell you more. If you can feed that into a player and get a tone back out, you're on your way.
There are probably more informed and structured ways to go about reverse engineering, but in my experience it's a lot of trial and error.
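To make the "known content" idea concrete, here is a small C sketch that writes exactly one second of a pure tone as raw 16-bit mono PCM; the frequency, sample rate, and output file name are arbitrary values chosen just for the experiment:
/* Generate one second of a pure sine tone as raw signed 16-bit mono PCM. */
#include <math.h>
#include <stdio.h>

#define SAMPLE_RATE 8000
#define FREQ_HZ     60.0
#define DURATION_S  1

int main(void)
{
    FILE *out = fopen("tone_60hz_1s.raw", "wb");   /* placeholder output name */
    int i;
    if (!out) return 1;
    for (i = 0; i < SAMPLE_RATE * DURATION_S; i++) {
        double x = sin(2.0 * 3.141592653589793 * FREQ_HZ * i / SAMPLE_RATE);
        short s = (short)(x * 30000);              /* leave a little headroom */
        fputc(s & 0xFF, out);                      /* little-endian */
        fputc((s >> 8) & 0xFF, out);
    }
    fclose(out);
    return 0;
}
Generate a second file at 50 Hz and a third lasting 10 seconds, record each one onto the CallPilot system, and compare the resulting .vbk files byte by byte: the parts that stay the same are likely header/metadata, and the parts that vary with the audio are the payload.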
I have searched many questions, but none seems to give the simplest, most uniform approach, so please do not close this as a duplicate.
My requirement is simple: I have a quiz app.
I want to include background music that plays continually, probably more than one track.
I also need occasional sounds played at specific events; they are very short in duration, maybe 4-5 in number.
What sound format do I use? [AAC, etc.]
How do I produce it? (optionally, get it from the internet, if free)
What is the best approach to incorporate it? [audio playback, OpenAL, etc.]
Forgive me if this is quite stupid, but I am keeping this very generic and can't seem to find a clear answer.
Thanks for the help!
For the sound format, use AAC or uncompressed 16-bit little-endian in a CAF container (avoid MP3, since it's difficult to make it loop cleanly). You can convert using the command-line tool 'afconvert':
Compressed:
afconvert -f caff -d aac sourcefile.wav destfile.caf
Uncompressed 16-bit:
afconvert -f caff -d LEI16 sourcefile.wav destfile.caf
For production, either record it yourself (using an audio program such as Audacity), get a professional to do it, or buy royalty free sounds/music.
To incorporate it, use AVAudioPlayer for music and OpenAL for sounds. OpenAL is difficult to use and doesn't decode compressed audio on its own, so you may want to use an audio library such as https://github.com/kstenerud/ObjectAL-for-iPhone
I want to start a hobby project that focuses on displaying the audio files in a folder in a certain fashion, can play such an audio file, and shows basic playback controls. However, I'm struggling to find a suitable programming language for this.
The displaying part shouldn't be too hard and can probably be done in most programming languages. The audio part is what concerns me the most, since it's not the main focus of the project and should only do limited things (so it shouldn't be too hard), and I don't know anything about sound support in the programming languages I currently know (Java, C and C++).
Specifically, I would like to be able to do these things:
Play a sound file
Stop/pause a playing song
Adjust volume
Show a bar that displays the current position in the song
Most files will be .mp3 files, but being able to process other formats is certainly a plus. Since this is just a small project, it's OK if it runs only on Windows. Scalability would be nice but is not required.
It would be nice to have a small overview of the audio support/audio libraries of various programming languages (I'm always up for something new) that can accomplish these simple things in a not-too-complicated way, as well as personal experiences.
In this way I hope to get a better understanding of which programming language fits my project best. (I would very much like not to have to change language midway through the project.)
--
Edit:
This is only for a later stage of the project, if the first part is successful: I will want to change the file names of the displayed audio files (to make them follow a specific format).
I haven't written many audio processing programs, but I know a lot of audio libraries exist for C and C++. Perhaps for Java too, but I don't know Java. I have used audio with SDL in a game, but SDL doesn't have that many features and I don't recommend it.
There's this question asking for a library in C, and there are a couple of similar questions that SO brings up on the side. You may want to take a look at those.
You would also need to look for a library that loads different file types. SDL, at least, only opens .wav files, which I believe most playback libraries support. For MP3, you will most likely need an additional library. I know Audacity uses LAME MP3, so I'm guessing that should be good.
Some of the functionality you want is also doable yourself. For example, knowing the length of the music and the amount you have already read, you know how far into the audio you are. Adjusting the volume is also just a multiplication (in the simplest case) that you can apply to the audio data if the library doesn't provide it.
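For example, for signed 16-bit PCM the volume adjustment really is just a per-sample multiplication with clamping; this is only a sketch, and the gain value you pass in is up to you:
#include <stddef.h>
#include <stdint.h>

/* Scale a buffer of signed 16-bit PCM samples by a gain factor,
   clamping to the valid range to avoid wrap-around distortion. */
static void apply_gain(int16_t *samples, size_t count, float gain)
{
    size_t i;
    for (i = 0; i < count; i++) {
        float v = samples[i] * gain;
        if (v > 32767.0f)  v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        samples[i] = (int16_t)v;
    }
}
The position bar works the same way: seconds played = bytes played / (sample rate * channels * bytes per sample).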
A very good choice seems to be PortAudio which is used by Audacity, and also recommended in the accepted answer of the question I mentioned above.
I've done audio apps in both Java and C++. Java development goes way faster because it's a more powerful language and has garbage collection, but JavaSound is a pretty awful solution for audio. Of course, there are wrappers for FFMPEG and other stuff, so you can get a lot of things working. Here's an example of a Java audio app: http://www.indabamusic.com/help/mantis
OTOH, C++ gives you lots of control, low latency, and a wealth of libraries. (Another answer mentioned PortAudio, which is, indeed, great.) But you will definitely find it also has a much longer development cycle.
You can certainly do everything you want to do with either language.
I'd like to play a sound and have some way of reliably telling how much of it has thus far been played.
I've looked at several sound libraries but they are all horribly underdocumented and only seem to export a "PlaySound, no questions asked" routine.
I.e., I want something like this:
a = Sound(filename)
PlaySound(a)
while True:
    print a.milliseconds_elapsed, a.length
    sleep(1)
C, C++ or Python solutions preferred.
Thank you.
I use BASS Audio Library (http://www.un4seen.com/)
BASS is an audio library for use in Windows and Mac OSX software. Its purpose is to provide developers with powerful and efficient sample, stream (MP3, MP2, MP1, OGG, WAV, AIFF, custom generated, and more via add-ons), MOD music (XM, IT, S3M, MOD, MTM, UMX), MO3 music (MP3/OGG compressed MODs), and recording functions. All in a tiny DLL, under 100KB in size.
A C program using BASS is as simple as
#include <stdio.h>
#include "bass.h"

int main(int argc, char **argv)
{
    HSTREAM str;
    QWORD pos;
    BASS_Init(-1, 44100, 0, 0, NULL);                 /* default output device, 44.1 kHz */
    BASS_Start();
    str = BASS_StreamCreateFile(FALSE, argv[1], 0, 0, 0);
    BASS_ChannelPlay(str, FALSE);
    while (BASS_ChannelIsActive(str) == BASS_ACTIVE_PLAYING) {
        pos = BASS_ChannelGetPosition(str, BASS_POS_BYTE);  /* playback position in bytes */
        printf("%.2f s elapsed\n", BASS_ChannelBytes2Seconds(str, pos));
    }
    BASS_Stop();
    BASS_Free();
    return 0;
}
This is most likely going to be both hardware-dependent (sound card, etc.) and OS-dependent (size of buffers used by the OS, etc.).
Maybe it would help if you said a little more about what you're really trying to achieve, and also whether we can make any assumptions about what hardware and OS this will run on?
One possible solution: assume that the sound starts playing more or less immediately and then use a reasonably accurate timer to determine how much of the sound has played (since it will have a known, fixed sample rate).
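A minimal sketch of that idea in C (using the POSIX monotonic clock); start_playback() is a hypothetical stand-in for whatever playback call your library provides, and the length would really come from the file's sample count and rate:
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds from a monotonic clock, so wall-clock adjustments don't skew the estimate. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

int main(void)
{
    double length_ms = 3000.0;            /* placeholder: derive from sample count / rate */
    /* start_playback("sound.wav"); */    /* hypothetical fire-and-forget playback call */
    double t0 = now_ms();
    while (now_ms() - t0 < length_ms) {
        printf("%.0f ms of %.0f ms elapsed\n", now_ms() - t0, length_ms);
        sleep(1);
    }
    return 0;
}
The estimate will be off by the output latency (the buffering mentioned above), which is typically tens of milliseconds.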
I'm also looking for a nice audio library where I can write directly to the sound card's buffer. I haven't had time to look into it myself yet, but PyAudio looks pretty nice; if you scroll down on the page you will see an example similar to yours.
With the help of the buffer size, number of channels, and sample rate, you can easily calculate how long each loop step lasts and print it out.
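The arithmetic is the same in any language; sketched in C with made-up example numbers:
#include <stdio.h>

/* Each buffer handed to the sound card covers a fixed slice of time:
   seconds per buffer = frames per buffer / sample rate
   (a frame holds one sample for every channel, so channels drop out here;
   if you count bytes instead, also divide by channels * bytes per sample). */
int main(void)
{
    const int sample_rate = 44100;        /* example values */
    const int frames_per_buffer = 1024;
    const double seconds_per_buffer = (double)frames_per_buffer / sample_rate;
    int buffers_played;

    for (buffers_played = 1; buffers_played <= 5; buffers_played++)
        printf("after buffer %d: %.3f s played\n",
               buffers_played, buffers_played * seconds_per_buffer);
    return 0;
}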