I have a Canon Powershot S100 and I would like to be able to use it to play videos. The only format it supports is a 1080p, 24 FPS MOV file with H.264 video and Linear PCM audio (16-bit little-endian signed integer, 48000 Hz); any other format is recognized as "Unrecognized Image". I have a LOT of mp4 files I would like to convert to that specific format, but I cannot find anything online about converting mp4s to MOVs in that specific encoding.
I tried looking up online tools to achieve the same result, but none of them could get the specific audio format along with the video, and removing the audio from the MOV also makes the camera not recognize it.
It would be ideal if I could make a python script do the converting for me, as there are hundreds of mp4 files in a folder on my desktop.
If anyone could help, that would be greatly appreciated!
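One way to script this is to drive ffmpeg (a free command-line transcoder, installed separately) from Python. Below is a minimal sketch, not a tested S100 workflow: the flag values map directly onto the format described above, but the camera may have additional requirements (bitrate, profile) that need tuning. The folder path in the usage note is an assumption.

```python
import subprocess
from pathlib import Path

def build_cmd(src: str, dst: str) -> list:
    """ffmpeg arguments for the target format: H.264 video at 1080p/24fps,
    16-bit little-endian signed PCM audio at 48000 Hz, in a .mov container."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264",         # H.264 video
        "-vf", "scale=1920:1080",  # force 1080p
        "-r", "24",                # 24 frames per second
        "-c:a", "pcm_s16le",       # 16-bit little-endian signed PCM audio
        "-ar", "48000",            # 48000 Hz sample rate
        dst,
    ]

def convert_folder(folder: Path) -> None:
    """Convert every .mp4 in `folder` to a sibling .mov file."""
    for mp4 in sorted(folder.glob("*.mp4")):
        mov = mp4.with_suffix(".mov")
        subprocess.run(build_cmd(str(mp4), str(mov)), check=True)
```

Usage would be something like `convert_folder(Path.home() / "Desktop" / "videos")`, substituting the real folder of mp4 files.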
I need to record sound in WAV format, generating the waveform according to a certain rule. I figured out how the WAV header works, but I don't understand how the sound samples themselves are encoded. I chose 32 bits per sample; here is an example sample from a real file:
55 73 0A 0D 32 33 30 3D
In theory, it should be a fractional number from -1.0 to 1.0. My questions: how is the sign stored here, and where does the decimal point sit?
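If the format tag in the WAV `fmt ` chunk is 3 (WAVE_FORMAT_IEEE_FLOAT) rather than 1 (integer PCM), each 32-bit sample is an IEEE 754 single-precision float stored little-endian: bit 31 is the sign, bits 30-23 are the biased exponent, bits 22-0 are the mantissa. There is no fixed "comma" position; the exponent moves it. A sketch that decodes the eight bytes quoted above under that assumption (if the fmt chunk actually says format 1, they would instead be two 32-bit integers):

```python
import struct

raw = bytes.fromhex("55 73 0A 0D 32 33 30 3D".replace(" ", ""))

# '<2f' = two little-endian IEEE 754 single-precision floats
samples = struct.unpack("<2f", raw)

# Pick apart the bit fields of the second sample by hand:
bits = int.from_bytes(raw[4:8], "little")
sign     = bits >> 31           # 1 bit:  0 = positive, 1 = negative
exponent = (bits >> 23) & 0xFF  # 8 bits: biased by 127
mantissa = bits & 0x7FFFFF      # 23 bits: fraction of the implied 1.xxx
value = (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
```

With this interpretation both samples land in the expected -1.0 to 1.0 range, which is a good sign the file really is float WAV.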
I wrote a jpeg compressor/decompressor years ago, which can handle lossless and lossy jpeg files. It works well, but doesn't always decode jpeg streams in DICOM files correctly.
I know jpeg well, but I know little about DICOM. Lossless jpeg in DICOM can't possibly be compliant with the jpeg ISO standard. There must be some modification, either hard coded, or modified by a parameter somewhere in a DICOM file outside of the jpeg file stream.
My code fails on most of the sample DICOM files (compsamples_jpeg.tar) at:
ftp://medical.nema.org/MEDICAL/Dicom/DataSets/WG04/
Here's what happens when I decode the first lossless jpeg (IMAGES\JPLL\CT1_JPLL) in this set:
[dicom decoded image]
The left image is rendered from my code, the right was rendered by an online DICOM reader:
http://www.ofoct.com/viewer/dicom-viewer-online.html
(x)MedCon, an open source DICOM reader, fails at the exact same pixel as my code, so I'm not the only one who has this problem.
http://xmedcon.sourceforge.net
I have read this jpeg stream byte by byte, drawn the huffman tree, and calculated the huffman codes with pencil and paper, and my code does exactly what it is supposed to do. Here are the huffman codes:
0 00
4 01
3 100
5 101
1 1100
2 1101
6 1110
7 11110
8 111110
9 1111110
12 11111110
11 111111110
10 1111111110
15 11111111110
Here is the compressed data after the SOS marker:
ff 00 de 0c 00 (the 00 after ff is a stuffed byte)
11111111 11011110 00001100 00000000
11111111110 si=15
111100000110000 diff=30768
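For what it's worth, the sign handling in the walkthrough above matches the EXTEND procedure of ISO 10918-1: after reading the si-bit magnitude, a leading 0 bit marks a negative difference. A small sketch of that mapping, checked against the diff values quoted in this post:

```python
def extend(v: int, t: int) -> int:
    """ISO 10918-1 EXTEND: map a t-bit magnitude v to a signed difference.
    A magnitude whose top bit is 0 decodes to a negative difference."""
    if t == 0:
        return 0
    return v if v >= (1 << (t - 1)) else v - (1 << t) + 1
```

(si = 16 is a special case in lossless JPEG: the difference is 32768 and no magnitude bits follow.)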
The online viewer says the first pixel value is -3024. If this is correct, the first diff value should be -3024, but it is not.
After this, my code correctly decodes about 2/5 of the image, but then decodes a wildly inaccurate diff value:
d2 a1 fe ff 00 e0 (the 00 after ff is a stuffed byte)
1010111 10100001 11111110 11111111 11100000
101 si=5
01111 diff=-16
01 si=4
0000 diff=-15
111111110 si=11 ????
11111111111 diff=2047
If you look at the image decoded by the online viewer, there is no radical change in pixel intensity at this location, so the si=11 value can't be correct.
I am sure I have a good understanding of jpeg, but jpeg streams in DICOM don't seem to follow the jpeg standard. What extensions/changes are made to jpeg streams when they are embedded in DICOM files?
DICOM specifies the use of ISO 10918 just as it is written, so there is nothing magic about the use of lossless JPEG in DICOM images, other than two matters: reinterpreting the always-unsigned output of the decoded bitstream as signed (depending on Pixel Representation), and applying the Rescale Slope and Intercept to convert the decoded "stored pixel values" into whatever "values" a viewer might report (e.g., as Hounsfield Units), as Paolo describes. Or to put it another way: do not rely on the "pixel values" reported by a viewer being the same as the direct output of the decoded bitstream.
For reference, here are the sections in DICOM that address the use of 10918 in general:
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_8.2.html#sect_8.2.1
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_A.4.html#sect_A.4.1
DICOM encoders may split individual compressed frames into separate fragments, as in the case of this sample, which deliberately uses fragmentation to test decoding capability. I expect you know that and have taken care of reassembling the compressed bit stream across fragment boundaries (i.e., removing the fixed-length Item tags between fragments):
http://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_A.4.html
Though some encoders may be buggy, I don't think that is the case for IMAGES\JPLL\CT1_JPLL in the NEMA sample dataset, which I created many years ago using the Stanford PVRG codec.
My own decoder (minimal as it is) at http://www.dclunie.com/pixelmed/software/codec/ has no problem with it. The source is available, so if you want to recompile it with some of the debugging messages turned on to track each decoded value, predictor input value, restart at the beginning of each row, etc., to compare with your own logic, feel free.
Finally, since JPEG lossless is used rarely outside DICOM, you may find it hard to obtain other samples to test with. One such source that comes to mind is the USF digitized mammography collection (medical, but not DICOM), at http://marathon.csee.usf.edu/Mammography/Database.html.
David
PS. I did check which codec XMedCon is using at https://sourceforge.net/projects/xmedcon/ and it seems to use some copy of the Cornell lossless code; so it may be vulnerable to the same bug described in the post that BitBank referred to (https://groups.google.com/forum/#!topic/comp.protocols.dicom/Yl5GkZ8ggOE) or some other error. I didn't try to decipher the source code to see.
The first pixel's value is indeed -3024 as the online dicom viewer says:
You correctly decode the first amplitude as 30768, but the first pixel's predictor is initialized to 2^(P-1) = 32768, therefore its real value is 32768 + 30768 = 63536. This is an unsigned value.
Now, the Pixel Representation tag says that the file values are in two's complement (signed), so when we interpret the most significant bit as a sign bit the value becomes 63536 - 65536 = -2000.
When we apply the Rescale Intercept (-1024) from the rescale slope/intercept tags, the value of the first pixel becomes -2000 - 1024 = -3024.
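The three steps above can be written out as arithmetic (a sketch, with Rescale Slope assumed to be 1, as is typical for CT):

```python
# 1. Decoded bitstream output: predictor 2^(P-1) plus the first diff
stored = 32768 + 30768                                    # 63536, unsigned

# 2. Pixel Representation = 1: reinterpret as 16-bit two's complement
signed = stored - 65536 if stored >= 32768 else stored    # -2000

# 3. Apply Rescale Slope (1) and Rescale Intercept (-1024)
hounsfield = 1 * signed + (-1024)                         # -3024
```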
However, my codec doesn't find any amplitude of 2047 near row 179, so maybe your codec is going out of sync somehow: the loss of sync is also visible in the subsequent rows (they are all shifted to the right).
If I have a text file of sample amplitudes (0-26522), how can I create a playable audio file from them?
I have a vague recollection of tinkering with .pcm files and 8-bit samples way back in the nineties.
Is there any software to automatically create an audio file (PCM or other format) from my samples? I found SoX, but even after looking at the documentation I can't figure out whether it can do what I want, and if so, how...
There is a GUI audio workstation called Audacity that lets you do this:
File -> Import -> Raw Data
Encoding: Signed 16-bit PCM // even though your ints are unsigned it still works
Byte order: little endian
Channels 1 channel mono
then just hit Import
To confirm this works, in a text editor I just did a cut-and-paste (select all, then paste, paste, paste, paste) of the below list of ints about 10 times to generate several thousand ints in a vertical column ... this is my toy input file. After the above Import, just save by doing
File -> Export Audio
where you choose the output format (mp3, aac, PCM, ...). Once I did this, the output mp3 was playable ... using my toy input file I did hear a sine tone.
3
305
20294
11029
585
3
305
20294
11029
585
3
305
20294
11029
585
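If you would rather script the conversion than click through Audacity, here is a sketch using Python's standard wave module. It assumes one integer per line, mono, and an arbitrary 44100 Hz sample rate (the rate is a free choice, since the text file doesn't record one); because the stated maximum of 26522 fits in a signed 16-bit int, the values can be packed directly without offsetting.

```python
import struct
import wave

def text_to_wav(txt_path: str, wav_path: str, rate: int = 44100) -> int:
    """Write one 16-bit mono PCM frame per line of txt_path; return frame count."""
    with open(txt_path) as f:
        samples = [int(line) for line in f if line.strip()]
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        # '<h' = little-endian signed 16-bit; all values 0-26522 fit directly
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return len(samples)
```

The resulting .wav plays in any audio player; the pitch you hear depends on the rate you pick.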
I was reading THIS TUTORIAL on wav files and I have some confusion.
Suppose I use PCM_16_BIT as my encoding format. This should mean each of my sound samples needs 16 bits to represent it, shouldn't it?
But in this tutorial, the second figure shows 4 bytes as one sample. Why is that? I suppose it is showing the format for a stereo-recorded wav file, but what if I have a mono-recorded wav file? Are the left and right channel values equal in this case, or is one of the channel values 0? How does it work?
Yes, for 16-bit stereo you need 4 bytes per sample frame (2 bytes per channel). For mono, you just need two bytes per sample for 16-bit PCM. Check this out:
http://www.codeproject.com/Articles/501521/How-to-convert-between-most-audio-formats-in-NET
Also read here:
http://wiki.multimedia.cx/index.php?title=PCM
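To make the byte layout concrete, here is a sketch of how frames are packed for 16-bit PCM. In a mono file there is no zeroed or duplicated second channel; each frame simply holds one sample:

```python
import struct

left, right = 1000, -2000  # arbitrary example sample values

# Stereo: channels are interleaved, so one frame = L then R = 4 bytes
stereo_frame = struct.pack("<hh", left, right)

# Mono: one frame = one sample = 2 bytes; there is no second channel at all
mono_frame = struct.pack("<h", left)
```

So a mono 16-bit file advances 2 bytes per sample, a stereo one 4 bytes, which is exactly why the tutorial's stereo figure shows 4 bytes.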