Can an audio file contain line breaks? - string

From what I understand, \r and \n are special characters (correct?). Is there any possibility that an audio file can contain line breaks?
What I'm trying to do is send both newline-terminated strings and full audio files over my socket, and I'm curious whether detecting line breaks could ever stop me in the middle of a file.

An audio file may contain any byte or byte sequence, so trying to detect a specific sequence in the middle of the stream is bound to fail eventually. If we assume the audio is essentially random, the odds that any given 2-byte position equals \r\n are about 1 in 65536, and a file of a few megabytes has millions of such positions.
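To put a rough number on that, here is a small sketch of the arithmetic. It assumes the bytes are uniformly random and independent, which real audio is not, so treat the result as an order-of-magnitude estimate. The function name is made up for this example.

```python
def expected_crlf_count(n_bytes: int) -> float:
    """Expected number of CR LF pairs in n_bytes of uniformly random data."""
    positions = max(n_bytes - 1, 0)  # number of 2-byte windows in the data
    return positions / 65536         # each window matches with probability 1/256**2


if __name__ == "__main__":
    # A 5 MB file is expected to contain dozens of accidental CR LF pairs.
    print(expected_crlf_count(5_000_000))
```

So even a short clip is likely to contain the sequence somewhere.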

What's a "special character"? There is no such thing except in some relative sense you specify. Your audio stream might well contain bytes with value 10 (\n) or 13 (\r). You need some way to distinguish text from audio data in your stream other than looking for newlines.

A line break is just chr(10) (\n), possibly preceded by chr(13) (\r). Your data will very likely contain bytes with those values. However, you shouldn't be converting it to characters anyway; just read it as bytes.
And yes, injecting line breaks would corrupt the file, but if you read it as a stream of bytes and treat it as bytes, you'll be fine.

Interpretation of line break bytes only applies when processing textual data. Audio is binary data, so you should not be processing it as text.
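One common way to distinguish text from audio on the same socket is to prefix every message with a type and a length, so the receiver never has to search the payload for a delimiter. The sketch below is only an illustration: the type tags, the 5-byte header layout and the function names are assumptions made up for this example, not part of any standard protocol.

```python
import io
import struct

TYPE_TEXT = 0x01   # hypothetical type tags for this sketch
TYPE_AUDIO = 0x02
HEADER = struct.Struct("!BI")  # 1-byte type, 4-byte big-endian payload length


def pack_message(msg_type: int, payload: bytes) -> bytes:
    """Prefix the payload with its type and length."""
    return HEADER.pack(msg_type, len(payload)) + payload


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, or raise EOFError if the stream ends early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed in the middle of a message")
        buf.extend(chunk)
    return bytes(buf)


def read_message(stream):
    """Return (type, payload) for the next message on the stream."""
    msg_type, length = HEADER.unpack(read_exact(stream, HEADER.size))
    return msg_type, read_exact(stream, length)


if __name__ == "__main__":
    audio = b"\x00\r\n\xff\r\n"  # binary payload that happens to contain CR LF
    wire = pack_message(TYPE_TEXT, b"hello\r\n") + pack_message(TYPE_AUDIO, audio)
    stream = io.BytesIO(wire)    # with a real socket, sock.makefile("rb") could be used
    print(read_message(stream))
    print(read_message(stream))
```

The audio payload comes back intact even though it contains \r\n, because the receiver counts bytes instead of scanning for newlines.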

Related

What is the purpose of a N-byte 'magic' number?

When parsing NES roms, the first four bytes are a 'magic' number:
78/0x4E (N)
69/0x45 (E)
83/0x53 (S)
26/0x1A (DOS end of file character)
What purpose does this, or any other such number, serve?
So-called "magic numbers" are often specified by binary file formats as a kind of "signature" of the file format. Programs that read the file format can check for the magic number and reject the file as invalid if it doesn't match. This provides an easy way to catch garbage data early. It also allows programs that understand multiple formats to figure out what type of file they're reading, so it can be processed appropriately.
In the case of iNES format ROM images, it's no different. The purpose of the magic number in the header is to tell the emulator (or other program) that this is an iNES format image and that it should be parsed as such.
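As a minimal sketch of such a check, the snippet below compares the first four bytes against the iNES signature from the question. The function names and the small signature table are made up for illustration; a real program would support whatever formats it cares about.

```python
INES_MAGIC = b"NES\x1a"  # 'N', 'E', 'S', 0x1A as listed in the question

# Hypothetical table for a program that understands several formats.
SIGNATURES = {
    INES_MAGIC: "iNES ROM image",
    b"RIFF": "RIFF container (for example WAV)",
    b"\x89PNG": "PNG image",
}


def looks_like_ines(header: bytes) -> bool:
    """True if the data starts with the iNES magic number."""
    return header[:4] == INES_MAGIC


def detect_format(header: bytes) -> str:
    """Return a format name based on the first four bytes, or 'unknown'."""
    return SIGNATURES.get(header[:4], "unknown")


if __name__ == "__main__":
    print(detect_format(b"NES\x1a\x02\x01"))
    print(detect_format(b"garbage data"))
```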

Find the file offset for a character index, ignoring newline

I have a text file of 3GB size (a FASTA file with DNA sequences). It contains about 50 million lines of differing length, though most lines are 70 characters wide. I want to extract a string from this file, given two character indices. The difficult part is that newlines must not be counted as characters.
For good speed, I want to use seek() to reach the beginning of the string and start reading, but I need the offset in bytes for that.
My current approach is to write a new file, with all the newlines removed, but that takes another 3GB on disk. I want to find a solution which requires less disk space.
Using a dictionary that maps each character index to a file offset is not practical either, because there would be one key for every byte, which would take at least 16 bytes * 3 billion characters = 48GB.
I think I need a data structure that lets me retrieve the number of newline characters that come before the character at a given index; then I can add that number to the character index to obtain the file offset in bytes.
The SamTools fai index was designed for just this purpose. It is a very small index file with enough information to seek quickly to any point in any record of the FASTA file, as long as the file is properly formatted (within a record, every line except the last must have the same length).
You can create a SamTools index using the samtools faidx command.
You can then use other programs in the SamTools package to pull out subsequences or alignments very quickly using the index.
See http://www.htslib.org/doc/samtools.html for usage.
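The arithmetic behind such an index is simple enough to sketch by hand. For each record, a .fai file stores (among other columns) the byte offset of the first base, the number of bases per line, and the number of bytes per line including the newline. The function and parameter names below are made up for this example, and it assumes every line of the record except the last has the same length.

```python
import io


def byte_offset(seq_start: int, line_bases: int, line_width: int, pos: int) -> int:
    """File offset of the base at 0-based index pos within one record."""
    full_lines = pos // line_bases            # complete lines before pos
    return seq_start + full_lines * line_width + pos % line_bases


def fetch(handle, seq_start: int, line_bases: int, line_width: int,
          start: int, end: int) -> bytes:
    """Return bases [start, end) of the record, with newlines removed."""
    if end <= start:
        return b""
    first = byte_offset(seq_start, line_bases, line_width, start)
    last = byte_offset(seq_start, line_bases, line_width, end - 1)
    handle.seek(first)
    raw = handle.read(last - first + 1)
    return raw.replace(b"\n", b"").replace(b"\r", b"")


if __name__ == "__main__":
    # Tiny stand-in for a FASTA file: 5 bases per line, 6 bytes per line.
    fasta = io.BytesIO(b">chr1\nACGTA\nCGTAC\nGT\n")
    print(fetch(fasta, seq_start=6, line_bases=5, line_width=6, start=3, end=8))
```

Only three small integers per record are needed, so the index stays tiny no matter how long the sequences are.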

Reading a variable length record mainframe file

I have a mainframe data file in binary format, with variable-length records. No copybook works in this case, nor do I know where each line ends. How do I read such a file?
Assuming you're reading this file in a COBOL program running on the mainframe, this is really no problem. COBOL doesn't write null-delimited output. It writes variable-length records with the length embedded in the first two bytes of a 4-byte prefix area called a (R)ecord (D)escriptor (W)ord, which is NOT included in the record layout copybook. To read such a record back into another COBOL program, you just need a properly coded copybook.
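If the file has instead been copied off the mainframe with the RDWs preserved, the same prefix can be used to split it into records. The sketch below rests on several assumptions that should be checked against the actual file: the length is a big-endian 16-bit value that includes the 4 bytes of the RDW itself, the other two prefix bytes can be ignored, there are no block descriptor words in the file, and any text fields are EBCDIC (code page cp037 is used here purely as an example).

```python
import io
import struct


def iter_rdw_records(stream):
    """Yield the payload of each RDW-prefixed record in a binary stream."""
    while True:
        rdw = stream.read(4)
        if not rdw:
            return                            # clean end of file
        if len(rdw) < 4:
            raise ValueError("truncated record descriptor word")
        length, _reserved = struct.unpack(">HH", rdw)
        if length < 4:
            raise ValueError(f"implausible record length {length}")
        payload = stream.read(length - 4)     # length counts the RDW itself
        if len(payload) != length - 4:
            raise ValueError("truncated record")
        yield payload


if __name__ == "__main__":
    rec1 = "HELLO".encode("cp037")
    rec2 = "VARIABLE LENGTH".encode("cp037")
    data = b"".join(struct.pack(">HH", len(r) + 4, 0) + r for r in (rec1, rec2))
    for record in iter_rdw_records(io.BytesIO(data)):
        print(record.decode("cp037"))
```

Packed-decimal and binary fields inside each record still need the record layout to be interpreted correctly.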

File handling in tinyos or tossim

I need to read data from a text file in a TinyOS program (a nesC file). I searched a lot on the Internet but couldn't find a way.
Is there any way?
I don't know about TOSSIM, but with a real sensor board it's possible to do so.
What you could do is write a program in Java, C#, etc. that reads the file and passes the acquired data to the serial/USB port as a SERIAL PACKET. But you are limited to a maximum of 255 bytes per packet.
So you should design a simple protocol that takes care of splitting the data into chunks.
Of course, you need to know how to build a serial packet that the sensor board can read.
For that you need to read TEP 113. The short story is that a serial packet consists of:
HEADER + CONTENT + FOOTER
The header contains the protocol byte, destination and source addresses, etc.
The content is your message_t struct.
The footer has the CRC and some other stuff.
You have to take care of the CRC calculation and also of escaping the start/end delimiters (I believe byte 126, 0x7E, is the delimiter, that is, the marker for the start and end of a packet).
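The host-side work described above (chunking, CRC, escaping) might look roughly like the sketch below. Everything specific in it is an assumption to verify against TEP 113 and the TinyOS sources: 0x7E as the delimiter, 0x7D as the escape byte with XOR 0x20, a CRC-16 with polynomial 0x1021 and initial value 0, and the CRC sent low byte first. The frame body here is just opaque bytes; the real header fields are not modelled.

```python
FLAG = 0x7E      # assumed frame delimiter
ESCAPE = 0x7D    # assumed escape byte
XOR_MASK = 0x20  # assumed mask applied to escaped bytes


def crc16(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-16 with polynomial 0x1021 (assumed to match the mote side)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def escape(data: bytes) -> bytes:
    """Replace delimiter and escape bytes so they cannot appear in the body."""
    out = bytearray()
    for byte in data:
        if byte in (FLAG, ESCAPE):
            out += bytes([ESCAPE, byte ^ XOR_MASK])
        else:
            out.append(byte)
    return bytes(out)


def frame(body: bytes) -> bytes:
    """Wrap body + CRC between two delimiters (simplified layout)."""
    crc = crc16(body)
    trailer = bytes([crc & 0xFF, crc >> 8])  # low byte first (assumed)
    return bytes([FLAG]) + escape(body + trailer) + bytes([FLAG])


def chunks(data: bytes, size: int):
    """Split file contents into pieces small enough for one packet each."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


if __name__ == "__main__":
    file_data = b"line one\nline two\n~tilde is 0x7E\n"
    for piece in chunks(file_data, 16):
        print(frame(piece).hex())
```

A sequence number per chunk would also be needed in practice so the mote can detect lost or reordered packets.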

Distinguish between PCM and BWF file format?

How can we distinguish between PCM and BWF format?
Is it necessary for BWF to have "bext" header?
I have some streams that don't have a "bext" header but contain a "JUNK" header... Are these files BWF files?
Thank you.
The JUNK chunk is reserved space that allows a BWF file to be converted into an RF64 file on the fly if its size goes over 4GB. The JUNK chunk is the same size as a ds64 chunk, and will be replaced with a ds64 chunk if the conversion to RF64 is needed.
My reading of the BWF spec is that you have to have a bext chunk for it to be a BWF.
As far as I know, a broadcast wave file will have the 'bext' header extension.
If a file does not have the 'bext' header extension, it will be a normal WAV/AIFF or whatever file.
Broadcast wave headers are mainly useful when you want a file to carry more information about itself than can be seen from its name.
This info isn't needed for playback, only if you want to display or search the metadata somehow.
PCM isn't a file format. It is the uncompressed sample encoding, and any file that stores uncompressed audio holds PCM data,
such as WAV/BWF, AIFF or SD2 for example.
With encoded files like MP3 or AAC you get the raw PCM values after decoding.
Yes. The 'bext' chunk is what distinguishes a BWF file from a wav file.
Some manufacturers actually use '.bwf' as a file extension but mostly the '.wav' extension will be used. It is only the presence of this chunk that makes the difference.
Other chunks can also be present and a well designed player will ignore chunks that it doesn't recognize.
Generally the 'data' chunk containing the audio data will be the last one in the file. However I have seen a few examples of other chunks, usually xml metadata, appearing after the 'data' chunk. This confuses some players.
For more information search for tech3285.pdf from the European Broadcasting Union website (tech.EBU.ch).
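Checking for the 'bext' chunk only requires walking the chunk headers, not reading the audio. Below is a minimal sketch, assuming a plain RIFF/WAVE file (RF64 files, which start with "RF64", are not handled) and using function names invented for this example. Each chunk is a 4-byte ID, a 4-byte little-endian size, and the data, padded to an even length.

```python
import io
import struct


def list_riff_chunks(stream):
    """Return [(chunk_id, size), ...] for the top-level chunks of a WAVE file."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    found = []
    while True:
        head = stream.read(8)
        if len(head) < 8:
            break                                     # end of file
        chunk_id, size = struct.unpack("<4sI", head)
        found.append((chunk_id, size))
        stream.seek(size + (size & 1), io.SEEK_CUR)   # skip data and pad byte
    return found


def is_bwf(stream) -> bool:
    """True if the file contains a 'bext' chunk."""
    return any(chunk_id == b"bext" for chunk_id, _ in list_riff_chunks(stream))


if __name__ == "__main__":
    body = b"WAVE" + b"JUNK" + struct.pack("<I", 28) + bytes(28)
    body += b"data" + struct.pack("<I", 4) + b"\r\n\x00\x01"
    wav = b"RIFF" + struct.pack("<I", len(body)) + body
    print(list_riff_chunks(io.BytesIO(wav)))
    print(is_bwf(io.BytesIO(wav)))
```

Because the walk continues past the 'data' chunk, it also finds metadata chunks that appear after the audio.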
