I have an IP camera that sends audio sampled at 8000 Hz and H.264 video.
I wrote a program that generates a TS file from this camera's stream, and it plays fine in VLC and the Android media player, but not on iPhone or in Safari on Mac OS X. (The program works with an HLS server that I also wrote.)
On iPhone and in Safari the video plays fine, but the audio doesn't. (I can hear sound, but it does not play smoothly.)
I understand that the audio PTS in a TS packet should be based on the MPEG-2 system (PCR) clock (90000 Hz). The timestamps the IP camera sends are based on the sampling rate (8000 Hz), so I multiply each timestamp by 90000/8000 to convert it to the MPEG-2 clock when I write the audio PTS to the TS file.
Is multiplying the audio PTS by 90000/8000 the wrong approach?
Any help will be appreciated.
You are most likely suffering from rounding errors. The PTS in a TS must be exact, or many players will attempt to resynchronize playback with the reference clock; this often shows up as dropped samples or inserted silence.
Make sure your starting PTS is accurate by counting samples and converting to 90 kHz. Do your multiply before your divide, e.g. (sampleCount * 90000) / sampleRate (NOT sampleCount * (90000 / sampleRate)), and make sure you are using a 64-bit integer to avoid integer overflow. Or better yet, use av_rescale from libavutil.
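As a rough illustration of why the order of operations matters, the sketch below compares the two orderings using integer arithmetic, the way a C program with integer types would evaluate them. The function names are made up for this example and are not from the question. Python integers do not overflow, so the 64-bit concern does not show up here, only the truncation.

```python
PTS_CLOCK_HZ = 90_000      # MPEG-2 system clock rate used for PTS values
PTS_MODULUS = 1 << 33      # PTS is a 33-bit field, so it wraps around

def pts_from_samples(sample_count, sample_rate):
    """Multiply first, then divide, so nothing is truncated before the division."""
    return (sample_count * PTS_CLOCK_HZ // sample_rate) % PTS_MODULUS

def pts_divide_first(sample_count, sample_rate):
    """Error-prone ordering: 90000 // 8000 truncates 11.25 down to 11."""
    return (sample_count * (PTS_CLOCK_HZ // sample_rate)) % PTS_MODULUS

if __name__ == "__main__":
    rate = 8000  # sampling rate from the question
    for seconds in (1, 10, 60):
        n = seconds * rate
        good = pts_from_samples(n, rate)
        bad = pts_divide_first(n, rate)
        print(f"{seconds:>3} s: multiply-first {good}, divide-first {bad}, "
              f"drift {good - bad} ticks")
```

With integer division, the divide-first version falls 2000 ticks (about 22 ms) behind for every second of 8000 Hz audio, which is the kind of drift a strict player would try to correct.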
I'm trying to understand how to implement audio playback from scratch on an ATtiny85. The goal is to play a short sound (a cat's meow, so I want it to remain recognizable) from an array representing the strength of the audio signal sampled at a fixed interval.
As far as I understand, signal strength is linearly mapped to the voltage of the analogue audio signal. As far as I know, audio cards are digital-to-analogue converters, but the ATtiny85 probably doesn't have one.
I'm curious whether I can use PWM to play the sound back. Since PWM changes the average voltage by changing the duty cycle of alternating high and low phases of the signal, it would most likely result in a drop in audio quality. WAV sampling rates can range from 1 Hz to 4.3 GHz according to Google. The ATtiny85 has an internal clock with a frequency of up to 8 MHz (which I hope is the same for its PWM generator).
Considering the need to reconfigure the timer and PWM settings as well as loop over the array, what is the maximum audio sampling rate I can reliably play? And should I even try to do it with PWM, or are there better options?
Given a system clock of 8 MHz, you can use PWM to generate mono (single-channel) audio.
Consider a PWM period of 1000 clocks, giving you about 10 bit resolution. The sample rate will be 8000 Hz then, which gives you some kind of lo-fi audio.
If you reduce your signal resolution to 8 bits, you'll get 8 MHz / 2^8 = 31.25 kHz sample rate. This gets near hi-fi.
Synchronize your sample output with the PWM generator, and use an appropriate analogue filter.
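The trade-off between PWM resolution and sample rate described above can be checked with a few lines of arithmetic. This is only a sketch: it assumes one sample is output per PWM period and that the PWM timer runs at the 8 MHz system clock, and the helper names are invented for the example.

```python
import math

def pwm_sample_rate(clock_hz, period_clocks):
    """One sample per PWM period: the rate is the clock divided by the period length."""
    return clock_hz / period_clocks

def pwm_resolution_bits(period_clocks):
    """Number of distinct duty-cycle steps, expressed in bits."""
    return math.log2(period_clocks)

if __name__ == "__main__":
    clock_hz = 8_000_000  # assumed timer clock, same as the system clock
    for period in (1000, 256):
        rate = pwm_sample_rate(clock_hz, period)
        bits = pwm_resolution_bits(period)
        print(f"period {period:>4} clocks: {rate:.0f} Hz, about {bits:.1f} bits")
```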
Many years ago I built a digital doorbell with a sample rate of 8 kHz and 8-bit samples. It played nice sounds at telephone quality. The microcontroller was an 8051 derivative and it used an R-2R ladder as the DAC.
A simple sine-like tone can be generated by using a 50% PWM signal and varying the frequency. Given some filtering effect from the speaker, it would mimic a single-tone audio signal.
Making more advanced tones (needed for natural sound) quickly gets more complicated, and the duty cycle of the signal can also be used to trick the human ear into hearing harmonics. Check out the Arduino function tone() for some inspiration.
Be careful when connecting a small speaker to the Arduino; preferably a transistor, buffer, or small amplifier should be placed between the Arduino and the speaker.
My goal is to record audio using an electret microphone hooked into the analog pin of an esp8266 (12E) and then be able to play this audio on another device. My circuit is:
In order to check the output of the microphone I connected the circuit to the oscilloscope and got this:
In the "gif" above you can see the waves made by my voice when talking to microphone.
here is my code on esp8266:
void loop() {
  sensorValue = analogRead(sensorPin);
  Serial.print(sensorValue);
  Serial.print(" ");
}
I would like to play the audio in Audacity in order to get an understanding of the result. Therefore, I copied the numbers from the serial monitor and pasted them into Python code that maps the data to the (-1, 1) interval:
def mapPoint(value, currentMin, currentMax, targetMin, targetMax):
    currentInterval = currentMax - currentMin
    targetInterval = targetMax - targetMin
    valueScaled = float(value - currentMin) / float(currentInterval)
    return round(targetMin + (valueScaled * targetInterval),5)

class mapper():
    def __init__(self,raws):
        self.raws=raws.split(" ")
        self.raws=[float(i) for i in self.raws]
    def mapAll(self):
        self.mappeds=[mapPoint(i,min(self.raws),max(self.raws),-1,1) for i in self.raws ]
        self.strmappeds=str(self.mappeds).replace(",","").replace("]","").replace("[","")
        return self.strmappeds
This takes the string of numbers, maps them onto the target interval (-1, +1), and returns a space-separated string of data ready to import into Audacity (Tools > Sample Data Import, then select the text file containing the data). The result of importing the data from almost 5 seconds of voice:
which is about half a second long, and when I play it I hear unintelligible noise. I also tried lower frequencies, but there was only noise there, too.
The suspected causes for the problem are:
1- The ESP8266 is not capable of reading the analog pin fast enough to return meaningful data (which is probably not the case, since its clock speed is around 100 MHz).
2- The way the software gathers and outputs the data is not optimal (reading in the loop, Serial.print, etc.).
3- The microphone circuit output is too noisy. (It might be, but as the oscilloscope test shows, my voice should make a difference in the output audio, and that was not audible in Audacity.)
4- The way I mapped and prepared the data for Audacity.
Is there something else I could try?
Are there similar projects out there? (which to my surprise I couldn't find anything which was done transparently!)
What can be the right way to do this? (since it can be a very useful and economic method for recording, transmitting and analyzing audio.)
There are many issues with your project:
You do not set a bias voltage on A0. The ADC can only measure voltages between ground and VCC. With the microphone removed from the circuit, the voltage at A0 should be close to VCC/2. This is usually achieved by adding a voltage divider made of 2 resistors between VCC and GND, connected directly to A0, between the cap and A0.
Also, your circuit looks weird... Is the 47uF cap connected directly to the 3.3V? If that's the case, you should connect it to pin 2 of the microphone instead. This would also indicate that right now your ADC is only recording noise (no bias voltage will do that).
You do not pace your input, meaning that you do not have a constant sampling rate. That is a very important issue. I suggest you set yourself a realistic target that is well within the limits of the ADC and the limits of your serial port. The transfer rate in bytes/sec of a serial port is roughly baud-rate / 10 with the usual 8N1 framing (8 data bits plus a start bit and a stop bit). For 9600 baud, that's only about 960 bytes/sec, which means that once converted to text (up to four digits plus a separator per sample), your max transfer rate drops to roughly 200 to 300 samples per second. This issue needs to be addressed and the max calculated before you begin, as the max attainable overall sample rate is the minimum of the sample rate from the ADC and the transfer rate of the serial port.
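A quick way to do that calculation before writing any firmware is sketched below. It assumes 8N1 framing (10 bits on the wire per byte) and a fixed number of text characters per sample. The function names and the 10 kHz ADC rate are placeholders for illustration, not measured ESP8266 figures.

```python
def max_serial_sample_rate(baud, chars_per_sample, bits_per_byte=10):
    """Samples per second the serial link can carry when samples are sent as text.

    bits_per_byte=10 assumes 8N1 framing: 8 data bits plus start and stop bits.
    """
    bytes_per_second = baud / bits_per_byte
    return bytes_per_second / chars_per_sample

def overall_sample_rate(adc_rate_hz, serial_rate_hz):
    """The slower stage limits the whole chain."""
    return min(adc_rate_hz, serial_rate_hz)

if __name__ == "__main__":
    adc_rate_hz = 10_000  # hypothetical ADC rate, check the datasheet for the real one
    for baud in (9600, 115200, 921600):
        # e.g. "512 " is three digits plus a space: 4 characters per sample
        link = max_serial_sample_rate(baud, chars_per_sample=4)
        print(f"{baud:>7} baud: link allows {link:.0f} samples/s, "
              f"overall {overall_sample_rate(adc_rate_hz, link):.0f} samples/s")
```

Sending each sample as two raw bytes instead of text would roughly double the rate the link can carry at the same baud rate.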
The way to grab samples depends a lot on your needs and what you are trying to do with this project, your audio bandwidth, resolution and audio quality requirements for the application and the amount of work you can put into it. Reading from a loop as you are doing now may work with a fast enough serial port, but the quality will always be poor.
The way that is usually done is with a timer interrupt starting the ADC measurement and an ADC interrupt grabbing the result and storing it in a small FIFO, while the main loop transfers from this ADC fifo to the serial port, along the other tasks assigned to the chip. This cannot be done directly with the Arduino libraries, as you need to control the ADC directly to do that.
Here a short checklist of things to do:
Get the full ESP8266 datasheet from Espressif. Look up the actual specs of the ADC, mainly the sample rates and resolutions available with your oscillator, and also its electrical constraints, at least its input voltage range and input impedance.
Once you know these numbers, set yourself some targets; the math needed for a successful project needs input numbers. What is your application? Do you want to record audio or just detect a nondescript noise? What are the minimum requirements needed for things to work?
Look up in the Arduino documentation how to set up a timer interrupt and an ADC interrupt.
Look up in the datasheet which registers you'll need to access to configure and run the ADC.
Fix the voltage bias issue on the ADC input. Nothing can work before that's done, and you do not want to destroy your processor.
Make sure the input AC voltage (the 'swing' voltage) is large enough to give you the results you want. It is not unusual to have to amplify a mic signal (with an opamp or a transistor), just for impedance matching.
Then you can start writing code.
This may sound awfully complex for such a small task, but that's what the average day of an embedded programmer looks like.
[EDIT] Your circuit would work a lot better if you simply replaced the 47uF DC blocking capacitor with a series resistor. Its value should be in the 2.2k to 7.6k range, to keep the circuit impedance within the 10k Ohms or so needed for the ADC. This would ensure that the input voltage to A0 is within the operating limits of the ADC (GND-3.3V on the NodeMCU board, 0-1V with the bare chip).
The signal may still be too weak for your application, though. What is the amplitude of the signal on your scope? How many bits of resolution does that range cover once converted by the ADC? For example, for a 0.1 V peak-to-peak signal (SIG = 0.1), an ADC range of 0-3.3 V (RNG = 3.3) and 10 bits of resolution (RES = 1024), you'll have
binary-range = RES * (SIG / RNG)
             = 1024 * (0.1 / 3.3)
             = 1024 * 0.0303
             = 31.03
That is a range of 31, which means around log2(31) (~= 5) useful bits of resolution. Is that enough for your application?
As an aside: the ADC will give you positive values with a DC offset. You will probably need to filter the digital output with a DC blocking filter before playback. https://manual.audacityteam.org/man/dc_offset.html
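For a recording that is processed offline, the simplest way to remove the offset is to subtract the mean of the whole capture. That is not a true streaming DC blocking filter, just the minimal offline equivalent. The sketch below does this and then scales the result to the (-1, 1) range; the function names and sample values are invented for the example.

```python
def remove_dc_offset(samples):
    """Subtract the average value so the waveform is centered on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

def normalize(samples):
    """Scale so the largest absolute value becomes 1.0 (no-op for all-zero input)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    return [s / peak for s in samples]

if __name__ == "__main__":
    raw = [510, 512, 514, 512, 510]  # made-up ADC readings hovering around mid-scale
    centered = remove_dc_offset(raw)
    print(normalize(centered))
```

Unlike mapping min..max onto -1..+1, centering on the mean keeps silence at zero even when the positive and negative peaks are not symmetric.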
I need to play 4 audio clips in a web browser.
These clips last 150 ms, 300 ms, 450 ms and 600 ms.
I don't care about latency (if a clip starts 100 ms late, that's not important for my purpose).
But I do care about the duration of these clips: does the 150 ms clip last exactly 150 ms, or is there an error due to the sound card or other components?
I know for sure that there is an error (I saw it in a test on a Mac).
My question is: can anyone point me to a paper, an article or anything else that discusses the duration and tests different setups, or tell me whether this error is always very small (less than 10 ms, for example) across Windows, Mac, old devices and new devices?
In other words: if I play a 100 ms clip, how long does it really last (100 ms? more? less?)?
In what manner is the sound not lasting the correct amount of time?
Does the beginning or the end get cut off?
Does the sound play back slower or faster than it should?
In my experience, I've never heard an error in playback rate caused by the browser or sound card. But I have come across situations where a sound is played back at a different sample rate than the one at which it was encoded. For example, a sound encoded at 48000 fps and played back at 44100 fps will take about 9% longer to play and will be lower in pitch (by about one and a half semitones). I recommend, as a diagnostic step, confirming the audio format used at each end. How to do so will depend on the systems being used.
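To get a feel for the size of such a mismatch, the sketch below computes the stretched duration and the pitch shift for a clip encoded at one rate and played at another. It assumes the samples are played back unmodified with no resampling, and the function names are made up for this example.

```python
import math

def playback_duration_ms(nominal_ms, encoded_rate, playback_rate):
    """Duration when samples encoded at encoded_rate are clocked out at playback_rate."""
    return nominal_ms * encoded_rate / playback_rate

def pitch_shift_semitones(encoded_rate, playback_rate):
    """Pitch change in semitones; negative means lower than the original."""
    return 12 * math.log2(playback_rate / encoded_rate)

if __name__ == "__main__":
    shift = pitch_shift_semitones(48000, 44100)
    for nominal in (150, 300, 450, 600):  # clip lengths from the question
        actual = playback_duration_ms(nominal, 48000, 44100)
        print(f"{nominal} ms clip plays for {actual:.1f} ms, "
              f"pitch shift {shift:.2f} semitones")
```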
I'm trying to record a raw composite video signal to an audio file by connecting the yellow RCA cable from a player to the mic input of my PC. The idea is to then plug the cable into my audio output, connect it to the video input of an old CRT TV, and play the signal back so that I can view the original video.
But that didn't work, and I can only see random white lines.
Is that due to frequency limits in the audio format or in the onboard audio chip, or is the analog-to-digital conversion (and back again) during recording and playback damaging the signal?
Video signals operate in ranges above 1 MHz, whereas high-quality audio signals max out at about 96 kHz. Video signals would likely need to be encoded in a format that an audio recorder could pick up, then decoded back into a video signal before a television could render it properly. This answer on the Sound Design exchange may be of interest to you.
A very high bitrate uncompressed audio file may be able to store a low-fidelity video signal. A black and white signal could be stored at sub-VHS quality but might at least give a resolvable image. Recording component video may be possible, even though syncing the separate tracks would be hard.
I tried it.
The sampling rate is 192 kHz, so it can record up to 192/2 = 96 kHz.
I succeeded in capturing part of the luminance signal.
The color signal is at a very high frequency.
So we can't record the color signal using a sound card.
The video is very distorted.
However, we may be able to capture it more clearly using a sound card with a higher sampling rate.
https://m.youtube.com/watch?v=-Q_YraNAGhw&feature=youtu.be
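The limit can be illustrated with a small Nyquist check. The sketch below compares a sound card's usable bandwidth with a few well-known composite video frequencies; the table and function names are chosen for this example and are not taken from the posts above.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a sampler can represent: half the sampling rate."""
    return sample_rate_hz / 2

# Nominal composite video frequencies, in Hz
VIDEO_COMPONENTS_HZ = {
    "NTSC line rate": 15_734,
    "PAL line rate": 15_625,
    "NTSC color subcarrier": 3_579_545,
    "PAL color subcarrier": 4_433_619,
}

def capturable(sample_rate_hz, components=VIDEO_COMPONENTS_HZ):
    """Map each component name to True if it lies below the Nyquist limit."""
    limit = nyquist_hz(sample_rate_hz)
    return {name: freq < limit for name, freq in components.items()}

if __name__ == "__main__":
    for rate in (44_100, 192_000):
        print(f"{rate} Hz sampling, limit {nyquist_hz(rate):.0f} Hz:")
        for name, ok in capturable(rate).items():
            print(f"  {name}: {'below' if ok else 'above'} the limit")
```

The line rate fits comfortably, which is consistent with capturing coarse luminance structure, while both color subcarriers are far out of reach at 192 kHz.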
I created an app that plays a playlist of short tracks. Everything was working fine until the Windows Phone 8.1 update.
The problem is that there is a weird "tick" sound at the end of each track.
So I tried playing the track in the Xbox Music player, and it has the same tick. I also tried playing the audio on my PC and on an Android device, and there it is fine, so I think it's a WP8.1 issue or a compatibility issue with my MP3 tracks.
So, are there any specifications an MP3 must meet to be compatible with WP8.1?
Or is there a workaround in code? I was thinking about muting the sound just before the track ends. By the way, I'm using AudioPlayerAgent.
All audio rendering processes encounter this same problem. Root cause: sound is a curve that varies above and below a centerline (typically from -1 through 0 to +1, where the centerline is 0). If the clip ends at a value that is not close enough to the centerline, this pop/tick sound happens: the speaker is left in the lurch away from 0 and physically snaps back to 0 almost instantaneously, producing the tick. Solution: either the player helps by forcing the clip to end at the centerline, or you do the same as a preprocessing step on the source media. This ending transition can happen quickly, but not instantaneously, or you'd be back where you started with an instantaneous transition to 0. Silence is simply the media supplying a series of zeros (i.e. staying at the centerline).
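As a preprocessing step, a short fade-out is one common way to bring the end of a clip to the centerline. The sketch below applies a linear fade to decoded samples in the -1 to +1 range. It is a generic illustration with invented names and does not use the AudioPlayerAgent API or any MP3 decoder.

```python
def fade_out(samples, fade_len):
    """Return a copy whose last fade_len samples ramp linearly down to zero."""
    fade_len = min(fade_len, len(samples))
    out = list(samples)
    start = len(out) - fade_len
    for i in range(fade_len):
        # Gain steps down evenly and reaches exactly 0 on the final sample
        gain = (fade_len - 1 - i) / fade_len
        out[start + i] *= gain
    return out

if __name__ == "__main__":
    clip = [1.0] * 8  # worst case: the clip ends at full positive amplitude
    print(fade_out(clip, 4))
```

At 44100 samples per second, a fade of a few hundred samples lasts only a few milliseconds, which is usually short enough to be inaudible as a fade.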