Pitch recognition games in the browser - audio

I'd really like to develop some games and exercises that run in the browser and use pitch detection to listen with the microphone and award points for correct answers.
Is this in the realms of fantasy? If not, what is the starting point?
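This is not fantasy at all: the Web Audio API gives the browser microphone access and raw sample buffers, which is everything pitch detection needs. Below is a minimal sketch in TypeScript using naive autocorrelation as the pitch estimator; a real game would want something more robust (e.g. YIN or the McLeod pitch method), and browsers may require a user gesture before audio starts.

```typescript
// Minimal sketch: estimate pitch from the microphone via the Web Audio API.
async function startPitchDetection(onPitch: (hz: number) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext(); // may need ctx.resume() after a user gesture
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  ctx.createMediaStreamSource(stream).connect(analyser);

  const buf = new Float32Array(analyser.fftSize);
  const tick = () => {
    analyser.getFloatTimeDomainData(buf);
    const hz = autocorrelate(buf, ctx.sampleRate);
    if (hz > 0) onPitch(hz); // compare against the target note, award points
    requestAnimationFrame(tick);
  };
  tick();
}

// Naive autocorrelation: returns the fundamental in Hz, or -1 if too quiet.
function autocorrelate(buf: Float32Array, sampleRate: number): number {
  const rms = Math.sqrt(buf.reduce((s, x) => s + x * x, 0) / buf.length);
  if (rms < 0.01) return -1; // not enough signal

  let bestLag = -1;
  let bestCorr = 0;
  // Lag 40 ≈ 1100 Hz ceiling at 44.1 kHz; half the buffer ≈ 43 Hz floor.
  for (let lag = 40; lag < buf.length / 2; lag++) {
    let corr = 0;
    for (let i = 0; i < buf.length - lag; i++) corr += buf[i] * buf[i + lag];
    if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
  }
  return bestLag > 0 ? sampleRate / bestLag : -1;
}
```

From there the game logic is ordinary JavaScript: map the detected frequency to the nearest note, compare it with the note you asked for, and score accordingly.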

Related

Decide Pixi.js or Phaser [closed]

One of my school projects is to make a real-time multiplayer web game. I'm currently having difficulty deciding whether to go with Pixi.js or Phaser for the game's graphics and controls. Could anyone talk a little about what each is good at and where one is better than the other?
Phaser uses Pixi for rendering, albeit an older and heavily modified version of it. Current versions of Pixi may give you better performance, but you'll have to implement by hand what's readily available in Phaser.
They differ in that Pixi is a rendering engine while Phaser is a game framework.
I'll quote Rich, the creator of Phaser:
Off the top of my head, here is what Phaser adds onto Pixi:
Choice of physics systems (arcade or full body)
A Game World and a Camera which can pan around it
Tilemap support
A particle system
Sound support (both web audio and legacy audio)
More advanced input handling (input priority, drag and drop, etc)
Keyboard and Gamepad inputs
Scale Manager to handle game / scene resizing + full screen support
Tween Manager for tweening game objects, hooked into the core clock (so it pauses properly when your game does)
Asset loader (supporting all kinds of file types) and Cache
A State Manager to let you swap between game states easily
Game clock + custom timers + timer events
And probably lots more I forgot. As someone has commented though, it depends entirely on what you want to make. Lots of people use Pixi who don't make games at all. However as this is a game dev forum, I'm going to suspect you do :)
I guess just try it. If you don't like it put it down to experience and just use Pixi "raw".
Source: http://www.html5gamedevs.com/topic/12656-phaser-pixi/#comment-72893
Depending on how long you can wait, you may actually want to hold off for Phaser 3 (Lazer), which is currently in the works and will have its own rendering engine. I think, however, that learning the current version of Phaser is a good starting point, and many things in Lazer will be the same.
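To make the distinction concrete, here is roughly what the same "hello world" looks like in raw Pixi versus Phaser. This is a sketch against the Pixi v4 and Phaser 2 APIs current at the time (both assumed loaded as globals from script tags); details vary by version.

```typescript
// Pixi "raw": you get a stage, a renderer and a ticker; everything else is yours.
const app = new PIXI.Application({ width: 800, height: 600 });
document.body.appendChild(app.view);
const bunny = PIXI.Sprite.fromImage('bunny.png');
app.stage.addChild(bunny);
app.ticker.add(() => { bunny.rotation += 0.01; }); // your own game-loop logic

// Phaser: the framework supplies the loop, loader, cache, input, physics, etc.
const game = new Phaser.Game(800, 600, Phaser.AUTO, '', {
  preload() { game.load.image('bunny', 'bunny.png'); }, // asset loader + cache
  create()  { game.add.sprite(400, 300, 'bunny'); },    // add to the game world
  update()  { /* per-frame logic, hooked into the core clock */ },
});
```

With Pixi you would go on to write your own loop, input handling and asset management; with Phaser those slots are already there waiting to be filled.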
Phaser gives you a full game framework. Pixi is a rendering engine as Kamen described above.
In my view, if you are a beginner in HTML5 game development, you can take two different approaches:
If you have a product ahead of you to complete, Phaser gives you more tools and therefore speed. It is the biggest sea to swim in for HTML5 game development, but it has its own limitations. Of course you can write your own tools, but in the end it is a framework, and like every framework it forces you to use its own flow and tools to run smoothly. It takes some time for a developer to understand that flow, pinpoint their needs and, if Phaser doesn't meet them, implement their own solutions. But many people use Phaser, and most probably there is already an answer to every problem a beginner will hit. Phaser originally used Pixi.js as its renderer, but now it has its own.
If you want to learn by digging deep into HTML5 renderers and game development, starting with Pixi.js might be the better decision. As mentioned, Pixi.js is only the renderer. It has cool features, but it needs more development on top of it to make games. It also gives you freedom: you mostly won't have to deal with the renderers (WebGL or canvas), but the rest is fully up to you. Personally, I started with Pixi.js; I knew about Phaser but didn't look deeper into it, and instead wrote my own framework. Once my framework was some way into development, I checked Phaser and realized that most of what I had in mind already existed in Phaser. Still, the experience gave me a deeper understanding of HTML5 game development.

How to automatically transcribe a Skype meeting, correctly attributed to each participant?

Assuming each participant agrees to the recording and transcription of the Skype call, is there a way to transcribe the meeting (live, offline, or both) so that it produces a text transcript in which each utterance is correctly attributed to its speaker? The transcript could then be fed into any variety of search or NLP algorithms.
The top 3 Google search hits for "automatically transcribe Skype" refer to apps which make manual transcription easier:
(1) http://www.dummies.com/how-to/content/how-to-convert-skype-audio-to-text-with-transcribe.html
(2) http://ask.metafilter.com/231400/How-to-record-and-transcribe-Skype-conversation
(3) https://www.ttetranscripts.com/blog/how-to-record-and-transcribe-your-skype-conversations
While it would be trivial to record the audio and send it to a speech-to-text engine, I doubt the result would be very high quality, because the best results usually come from speaker-dependent models (otherwise we wouldn't have to take the time to train Dragon NaturallySpeaking).
But before we can choose speaker-dependent transcription models, we need to know which segment of the audio belongs to which speaker. There are two ways this is solved:
There is an easy way to retrieve all the audio that came from each participant: you simply record the audio from each speaker's microphone during the call, and you don't have to do any segmentation.
If the first option isn't feasible or is prohibitive in some way, we have to use a speaker diarization algorithm, which segments the audio into N clusters/speakers (most algorithms can be told how many speakers are in the audio, but some can figure this out on their own). For a live transcript as the call goes on, I imagine we'd need a fancy real-time speaker diarization algorithm.
In any case, once the segmentation is solved, each participant has their trained speaker model, which is then applied to their portions of the audio. At the end of the day, everyone gets a nice conversation transcript, and later on we can do fancy things like topic analysis, or maybe Big Brother wants to sift through everyone's project meetings without having to listen to hours of audio.
My question is, what would be a way to implement this in practice?
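For what it's worth, option 1 above is straightforward wherever you can run code on each participant's machine: everyone records their own microphone locally and uploads the result tagged with their name, so attribution comes for free. A browser-flavoured sketch using the standard MediaRecorder API (the /transcribe endpoint and its per-speaker model lookup are hypothetical):

```typescript
// Each participant runs this locally; every upload is tagged with the
// participant's own name, so no diarization is needed.
async function recordForTranscription(speaker: string) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    const audio = new Blob(chunks, { type: 'audio/webm' });
    // Hypothetical server endpoint that forwards the audio to a
    // speech-to-text engine using this speaker's trained model.
    await fetch(`/transcribe?speaker=${encodeURIComponent(speaker)}`, {
      method: 'POST',
      body: audio,
    });
  };

  recorder.start();
  return () => recorder.stop(); // call when the meeting ends
}
```

The server side then only has to merge the per-speaker transcripts by timestamp to produce the attributed conversation transcript.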

How text-to-audio software works [closed]

I want to create software that can convert readable (non-English) text to audio output.
From my searches, I have realized that most existing audio readers sound too robotic and lack the qualities of human speech.
I am looking for an algorithm or paper that can give me some idea of how to proceed with implementing such a thing.
or
Does anyone know how some of the world's best text-reader software works?
My expectations are:
Less robotic-sounding, more human-like output
High quality output
Lightweight, yet fast processing
Please edit this question if anyone thinks some points are missing on this aspect.
Some small steps might help give you a basic idea of what happens:
You need to create a dictionary of words, each word with its name and its sound.
Create your own signal processor; this will help you add effects to the sound, e.g. if you want a robotic voice, a female version, or something else.
Parse the text you want to read into an array, splitting out each word and punctuation mark. E.g. "I want to die, this isn't a correct way to live." becomes the array {I:want:to:die:,:this:isn't:a:correct:way:to:live:.}
Use the punctuation to implement lifelike parameters, e.g. "," for a short pause and "." for a longer pause in your audio reader.
Use the words to fetch the audio from your database (the dictionary from point 1).
Play the whole array continuously, with a pause between each array element; this works like the spaces between words.
I think these are the major steps. To make it faster, you can use advanced sound-processing tools to cache small chunks of audio data and add data on the fly while you are modulating the sound signal.
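As a rough sketch of the steps above, browser-flavoured and assuming the dictionary of per-word clips already exists:

```typescript
// Step 1: dictionary mapping each known word to the URL of its recorded clip.
const dictionary: Record<string, string> = { /* "want": "clips/want.wav", ... */ };

// Step 3: split text into words and punctuation marks.
function tokenize(text: string): string[] {
  return text.match(/[\w']+|[.,!?]/g) ?? [];
}

const pause = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

function playClip(url: string): Promise<void> {
  return new Promise((resolve) => {
    const clip = new Audio(url);
    clip.onended = () => resolve();
    clip.play();
  });
}

// Steps 4-6: short pause for commas, longer for full stops; look up each
// word's clip and play the sequence with a small gap acting as the space.
async function speak(text: string) {
  for (const token of tokenize(text)) {
    if (token === ',') await pause(200);
    else if (token === '.') await pause(500);
    else if (dictionary[token.toLowerCase()]) {
      await playClip(dictionary[token.toLowerCase()]);
      await pause(50);
    }
  }
}
```

Output from word-by-word concatenation like this is exactly what sounds robotic; the signal processing in step 2 (smoothing the transitions, shaping pitch across the sentence) is where the human-like quality has to come from.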
Hope this helps.
It would be nice if you could tell us what kind of app you'll create (mobile, web, desktop) and also in what language you'll develop it (PHP, Java, C++, etc.), because if you search on Google you'll find a lot of plugins for websites that convert text to audio; you can download them and look at the code.
Also, it's hard to find an app that doesn't sound like a robot, and if you find one you'll probably have to pay for it.
The "robotic" aspect of text to speech that you are concerned about is a matter of the quality of "prosody". This is an active research area. You could probably get a PhD for working on improving prosody in TTS systems. If you would like to read about current research you can try searching for "improving prosody in text to speech".
A big part of the problem is having an accurate model of speech prosody in a given language. The thesis "MeLos: Analysis and Modelling of Speech Prosody and Speaking Style" by Nicolas Obin (2012) contains a survey of the state of the art in speech prosody modelling. Or try searching for "text to speech prosody survey state of the art".

Sensor programming [closed]

I have a question about sensor programming. I'm searching for a sensor that tells me, for example, whether a glass of water is more than half full. I've already googled this, but I can't find anything.
So my questions are:
Where can I buy such a sensor?
What programming language do I need to control such a sensor?
Thanks for any answers.
Update from comments below one of the answers:
What I really need it for is a big container with some corn in it. I want to use the sensor to tell me when the corn falls below a defined point in the container, so that I can calculate when I have to refill it.
Your sensor could be a level sensor. There are several principles on which level sensors work (see here), and some of them work with granular solid material. (For example, an ultrasonic range sensor could shoot a pulse at the surface of the corn mass, detect the reflection, and measure the round-trip time of flight.)
... or it could be a proximity sensor, as somebody had suggested above.
... or it could be a weight sensor. Here's an application note on weighing vessels.
If you google "level sensor for grains", you may find something useful.
What language to use depends on what you connect the sensor to. If it is connected to a microcontroller, the language would be C. If it is connected to a PC, it would depend a lot on the particular model of the sensor.
By the way, here's a web group dedicated to sensors.
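To make the ultrasonic option concrete: the pulse travels from the sensor down to the corn surface and back, so the distance is the speed of sound times half the round-trip time. A small sketch of the arithmetic, assuming the sensor is mounted at the top of the container looking down:

```typescript
const SPEED_OF_SOUND = 343; // m/s in air at roughly 20 °C

// Time of flight: the pulse covers the distance twice (down and back).
function distanceToSurface(roundTripSeconds: number): number {
  return (SPEED_OF_SOUND * roundTripSeconds) / 2;
}

// How full the container is, given its depth in metres.
function fillFraction(containerDepth: number, roundTripSeconds: number): number {
  return 1 - distanceToSurface(roundTripSeconds) / containerDepth;
}

// e.g. a 5 ms echo in a 1 m deep container:
// distanceToSurface(0.005) ≈ 0.86 m, so fillFraction(1.0, 0.005) ≈ 0.14
// -- the corn is below the refill threshold.
```

On a microcontroller the same arithmetic would be written in C, with the round-trip time measured by timing the sensor's echo pin.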
I would imagine you could use a mechanism similar to a car's fuel tank: a float in the container with an attached arm and a magnet on it. Using a Hall sensor, you can then observe the change in the Hall reading as the float rises or falls within the container.
"What I really need it for is a big container, in which is some corn."
Perhaps one of those sensors used to ensure garage entryways are clear before an automatic garage door is allowed to close; they use an optical beam of light.
Do you know the size of the glass in question? You could just get a scale and work out how heavy the glass would be when it is half full of water. My guess is that you could probably find a sensor that could do this, and it would most likely need to be programmed in C.
This guy seems to be having the same problems:
http://forums.makezine.com/comments.php?DiscussionID=6052
Good luck.
Also check out Arduino for micro controller electronics.

Ways to identify (musical) scores [closed]

I'm searching for ways to identify scores when someone is playing, e.g., guitar. How can I manage that?
I've heard that MIDI stores music data as musical scores. I wonder if it's a good solution.
MIDI does store musical scores, but it doesn't (normally) extract them from recorded sounds. You can't take an mp3 file and "convert it to MIDI", in a standard or entirely reliable way.
You create a MIDI file using a recorder (or "sequencer"), which might be a desktop application where you "write the score" like a composer does, or it might be a musical device like a keyboard, which records which keys you press, how hard and for how long, and interprets that as a score.
A MIDI player takes the data/score, and reproduces it using its own voice (or "sound font" if you like). So the advantage of MIDI data is firstly that the voice is already available on the playback device (and so the data is very compact), and second that the same data ("tune") can be played using different voices ("instruments")[*].
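Concretely, the data is tiny: a note-on event is just three bytes (status, note number, velocity), which is why a whole tune fits in a few kilobytes. A sketch:

```typescript
// A MIDI note-on message: status byte (0x90 | channel), note number, velocity.
function noteOn(channel: number, note: number, velocity: number): Uint8Array {
  return new Uint8Array([0x90 | channel, note, velocity]);
}

// The matching note-off uses status 0x80.
function noteOff(channel: number, note: number): Uint8Array {
  return new Uint8Array([0x80 | channel, note, 0]);
}

noteOn(0, 60, 100); // middle C (note 60), played moderately hard, channel 1
```

A Standard MIDI File is essentially a list of such events with delta times between them; the player's own voices turn them back into sound.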
I believe there are MIDI guitars, but I don't know how "good" they are. The tone of an electric guitar comes in part from resonances of the solid body. This could of course be imitated by the voice at playback time, but there are bound to be some things that you can do with an electric guitar but which the MIDI format cannot capture or represent (for example I'd guess feedback is impossible).
Software exists to extract MIDI data from recorded sound - this is a bit like the way OCR extracts ASCII character data from images of text. It's not a major means of recording someone's guitar-playing, but if what you want is to get a first approximation to the score/tabs, you could try it.
Here's a randomly-selected example, found by Googling "convert from wav to MIDI":
http://www.pluto.dti.ne.jp/~araki/amazingmidi/
[*] But members of the audience, you find yourselves wondering, "what is this mindless automaton which bangs out the tunes it's instructed to, without comprehension or any aesthetic sense". Ladies and gentlemen, Colin Sell at the piano.
Music recognition/retrieval is an extremely difficult and almost entirely unsolved AI problem. Try to extract the frequency from a signal file of someone playing a single unwavering note one time - it's much more difficult than just "apply Fourier transform, read off solution". Compound that with polyphony, noise, rubato, vibrato/portamento, plus the fact that (contrary to speech recognition) we don't even have a working a priori model of what music actually is, and you begin to see the difficulty. There are absolutely fascinating research papers and even entire conferences on the topic, but in the short term, you're just plain out of luck.
Are you aware you are attempting something extremely difficult? It's a very complex topic you could spend years researching yourself or pay $$$$$$ for existing commercial solutions.
MIDI is a reasonable choice for your output format.
For the rest you will need Fast Fourier Transforms working off a high-resolution capture of the input analogue sounds plus at least seven years of musical theory.
Good luck.
If the player is playing in tune, there will be very distinct frequencies in the signal, or at least frequencies with a mathematical separation. It may be possible to characterise a signal using spectral analysis to distinguish music from noise; or at least melodic music from noise - avant-garde experimental music may not pass ;). The distinction becomes more difficult with multiple instrumentalists, percussion, and non-standard or poor tuning; traditional Chinese or Indian music, for example, uses different scales from Western music.
Extracting the frequencies in the signal will require signal processing techniques such as the Fast Fourier Transform. Categorising the signal as music/not music could be done by statistical analysis, or by AI techniques such as neural networks or fuzzy logic.
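Once an FFT or autocorrelation has produced a candidate fundamental frequency, mapping it onto a score symbol is simple arithmetic on the equal-tempered scale, for example:

```typescript
// Map a detected fundamental frequency to the nearest MIDI note and name,
// using twelve-tone equal temperament with A4 = 440 Hz (MIDI note 69).
const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

function freqToMidi(hz: number): number {
  return Math.round(69 + 12 * Math.log2(hz / 440));
}

function midiToName(midi: number): string {
  return NOTE_NAMES[midi % 12] + (Math.floor(midi / 12) - 1);
}

midiToName(freqToMidi(440));    // "A4"
midiToName(freqToMidi(329.63)); // "E4", a guitar's high E string
```

The hard part, as the other answers say, is getting a reliable fundamental out of a real polyphonic recording in the first place.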
