xentop VBD_RD & VBD_WR output - linux

I'm writing a perl script that track the output of xentop tool, I'm not sure what is the meaning for VBD_RD & VBD_WR
Following http://support.citrix.com/article/CTX127896
VBD_RD number displays read requests
VBD_WR number displays write requests
Dose anyone know how read & write requests are measured in bytes, kilobytes, megabytes??
Any ideas?
Thank you

As far as I understood, xentop shows you two different measuers (two for read and write).
VBD_RD and VBD_WR's measures are unit. The number of times you tried to access to the block device. This does not say anything about the the number of bytes you have read or written.
The second measure read (VBD_RSECT) and write (VBD_WSECT) are measured in "sectors". You can find the size of the sectors by using xenstore-ls (https://serverfault.com/questions/153196/xen-find-vbd-id-for-physical-disks) (in my case it was 512).
The unit of a sector is in bytes (http://xen.1045712.n5.nabble.com/xen-3-3-testing-blkif-Clarify-units-for-sector-sized-blkif-request-params-td2620172.html).
So if the VBD_WR value is 2, VBD_WSECT value is 10, sector size is 512. We have written 10 * 512 bytes in two different requests (you tried to access to the block device two times but we know nothing about how much bytes were written in each request but we only know the total). To find disk I/O you can periodically check these values and take the derivative between those values.
I suppose the sector size might change for each block device somehow, so it might be worthy to check the xenstore-ls output for each domain but I'm not sure. You can probably define it in the cfg file too.
This is what I found out and understood so far. I hope this helps.

Related

Decoding incomplete audio file

I was given an uncompressed .wav audio file (360 mb) which seems to be broken. The file was recorded using a small usb recorder (I don't have more information about the recorder at this moment). It was unreadable by any player and I've tried GSpot (https://www.headbands.com/gspot/) to detect whether it was perhaps of a different format than wav but to no avail. The file is big, which hints at it being in some uncompressed format. It misses the RIFF-WAVE characters at the start of the file though, which can be an indication this is some other format or perhaps (more likely in this case) the header is missing.
I've tried converting the bytes of the file directly to audio and this creates a VERY noisy audio file, though voices could be made out and I was able to determine the sample rate was probably 22050hz (given a sample size of 8-bits) and a file length of about 4 hours and 45 minutes. Running it through some filters in Audition resulted in a file that was understandable in some places, but still way too noisy in others.
Next I tried running the data through some java code that produces an image out of the bytes, and it showed me lots of noise, but also 3 byte separations every 1024 bytes. First a byte close to either 0 or 255 (but not 100%), then a byte representing a number distributed somewhere around 25 (but with some variation), and then a 00000000 (always, 100%). The first 'chunk header' (as I suppose these are) is located at 513 bytes into the file, again close to a 2-power, like the chunk size. Seems a bit too perfect for coincidence, so I'm mentioning it as it could be important. https://imgur.com/a/sgZ0JFS, the first image shows a 1024x1024 image showing the first 1mb of the file (row-wise) and the second image shows the distribution of the 3 'chunk header' bytes.
Next to these headers, the file also has areas that clearly show structure, almost wave-like structures. I suppose this is the actual audio I'm after, but it's riddled with noise: https://imgur.com/a/sgZ0JFS, third image, showing a region of the file with audio structures.
I also created a histogram for the entire file (ignoring the 3-byte 'chunk headers'): https://imgur.com/a/sgZ0JFS, fourth image. I've flipped the lower half of the range as I think audio data should be centered around some mean value, but correct me if I'm wrong. Maybe the non-symmetric nature of the histogram has something to do with signed/unsigned data or two's-complement. Perhaps the data representation is in 8-bit floats or something similar, I don't know.
I've ran into a wall now. I have no idea what else I can try. Is there anyone out there that sees something I missed. Perhaps someone can give me some pointers what else to try. I would really like to extract the audio data out of this file, as it contains some important information.
Sorry for the bother. I've been able to track down the owner of the voice recorder and had him record me a minute of audio with it and send me that file. I was able to determine the audio was IMA 4-bit ADPCM encoded, 16-bit audio at 48000hz. Looking at the structure of the file I realized simple placing the header of the good file in front of the data of the bad file should be possible, and lo and behold I had a working file again :)
I'm still very much interested how that ADPCM works and if I can write my own decoder, but that's for another day when I'm strolling on wikipedia again. Have a great day everyone!

What is a quick way to check if file contents are null?

I have a rather large file (32 GB) which is an image of an SD card, created using dd.
I suspected that the file is empty (i.e. filled with the null byte \x00) starting from a certain point.
I checked this using python in the following way (where f is an open file handle with the cursor at the last position I could find data at):
for i in xrange(512):
if set(f.read(64*1048576))!=set(['\x00']):
print i
break
This worked well (in fact it revealed some data at the very end of the image), but took >9 minutes.
Has anyone got a better way to do this? There must be a much faster way, I'm sure, but cannot think of one.
Looking at a guide about memory buffers in python here I suspected that the comparator itself was the issue. In most non-typed languages memory copies are not very obvious despite being a killer for performance.
In this case, as Oded R. established, creating a buffer from read and comparing the result with a previously prepared nul filled one is much more efficient.
size = 512
data = bytearray(size)
cmp = bytearray(size)
And when reading:
f = open(FILENAME, 'rb')
f.readinto(data)
Two things that need to be taken into account is:
The size of the compared buffers should be equal, but comparing bigger buffers should be faster until some point (I would expect memory fragmentation to be the main limit)
The last buffer may not be the same size, reading the file into the prepared buffer will keep the tailing zeroes where we want them.
Here the comparison of the two buffers will be quick and there will be no attempts of casting the bytes to string (which we don't need) and since we reuse the same memory all the time, the garbage collector won't have much work either... :)

CRC16 collision (2 CRC values of blocks of different size)

The Problem
I have a textfile which contains one string per line (linebreak \r\n). This file is secured using CRC16 in two different ways.
CRC16 of blocks of 4096 bytes
CRC16 of blocks of 32768 bytes
Now I have to modify any of these 4096 byte blocks, so it (the block)
contains a specific string
does not change the size of the textfile
has the same CRC value as the original block (same for the 32k block, that contains this 4k block)
Depart of that limitations I may do any modifications to the block that are required to fullfill it as long as the file itself does not break its format. I think it is the best to use any of the completly filled 4k blocks, not the last block, that could be really short.
The Question
How should I start to solve that problem? The first thing I would come up is some kind of bruteforce but wouldn't it take extremly long to find the changes that will result in both CRC values stay the same? Is there probably a mathematical way to solve that?
It should be done in seconds or max. few minutes.
There are math ways to solve this but I don't know them. I'm proposing a brute-force solution:
A block looks like this:
SSSSSSSMMMMEEEEEEE
Each character represents a byte. S = start bytes, M = bytes you can modify, E = end bytes.
After every byte added to the CRC it has a new internal state. You can reuse the checksum state up to that position that you modify. You only need to recalculate the checksum for the modified bytes and all following bytes. So calculate the CRC for the S-part only once.
You don't need to recompute the following bytes either. You just need to check whether the CRC state is the same or different after the modification you made. If it is the same, the entire block will also be the same. If it is different, the entire block is likely to be different (not guaranteed, but you should abort the trial). So you compute the CRC of just the S+M' part (M' being the modified bytes). If it equals the state of CRC(S+M) you won.
That way you have much less data to go through and a recent desktop or server can do the 2^32 trials required in a few minutes. Use parallelism.
Take a look at spoof.c. That will directly solve your problem for the CRC of the 4K block. However you will need to modify the code to solve the problem simultaneously for both the CRC of the 4K block and the CRC of the enclosing 32K block. It is simply a matter of adding more equations to solve. The code is extremely fast, running in O(log(n)) time, where n is the length of the message.
The basic idea is that you will need to solve 32 linear equations over GF(2) in 32 or more unknowns, where each unknown is a bit location that you are permitting to be changed. It is important to provide more than 32 unknowns with which to solve the problem, since if you pick exactly 32, it is not at all unlikely that you will end up with a singular matrix and no solution. The spoof code will automatically find non-singular choices of 32 unknown bit locations out of the > 32 that you provide.

Serial port : not able to write big chunk of data

I am trying to send text data from one PC to other using Serial cable. One of the PC is running linux and I am sending data from it using write(2) system call. The log size is approx 65K bytes but the write(2) system call returns some 4K bytes (i.e. this much amount of data is getting transferred). I tried breaking the data in chunks of 4K but write(2) returns -1.
My question is that "Is there any buffer limit for writing data on serial port? or can I send data of any size?. Also do I need to continously read data from other PC as I write 4K chunk of data"
Do I need to do any special configuration in termios structure for sending (huge) data?
The transmit buffer is one page (took a look at Linux 2.6.18 sources) - which is 4K in most (if not all) cases.
The other end must read (don't know the size of the receive buffer), but more importantly you should not write faster than the serial port can transmit, if you are using 115200 bps 8-N-1 you can write the 4K chunk approximately 3 times a second. (115200 / 9 / 4096 = 3.125)
Yes, there is a buffer limit - but when you reach that limit, the write() should block.
When write() returns -1, what is errno set to?
Make sure that the receiver is reading.
You should update the current position it your buffer from the write(), and continue the next write from there. (Applies to all writes(), regardless if the fd is a serial port, tcp socket or a file.)
If you get an error back for subsequent writes. Judging by the manpage, its safe to retry the writes for the following errnos: EAGAIN, EINTR, and probably ENOSPC. Use perror() to see what you get. (..and post it, I am curious.)
EFBIG would seem to indicate that you are trying to write using a buffer (or rather count) that is too large, but that is probably much larger than 64k.
If the internal buffer is filled up, because you are writing to fast, try to (nano)sleep a little between the writes. There are several clever ways of doing this (like tcp does), but if the rate is known, just write at a fixed rate.
If you think the receiver is actually reading, but not much happens, have a look at the serial ports flow-control options and if the cable is wired for DTS/RTS.

Purpose of ibs/obs/bs in dd

I have a script that creates file system in a file on a linux machine. I see that to create the file system, it uses 'dd' with bs=x option, reads from /dev/zero and writes to a file. I think usually specifying ibs/obs/bs is useful to read from real hardware devices as one has specific block size constraints. In this case however, as it is reading from virtual device and writing to a file, I don't see any point behind using 'bs=x bytes' option. Is my understanding wrong here?
(Just in case if it helps, this file system is later on used to boot a qemu vm)
To understand block sizes, you have to be familiar with tape drives. If you're not interested in tape drives - for example, you don't think you're ever going to use one - then you can go back to sleep now.
Remember the tape drives from films in the 60s, 70s, maybe even 80s? The ones where the reel went spinning around, and so on? Not your Exabyte or even QIC - quarter-inch cartridge - tapes; your good old fashioned reel-to-reel half-inch tape drives? On those, block size mattered.
The data on a tape was written in blocks. Each block was separated from the next by an inter-record gap.
----+-------+-----+-------+-----+----
... | block | IRG | block | IRG | ...
----+-------+-----+-------+-----+----
Depending on the tape drive hardware and software, there were a variety of problems that could happen. For example, if the tape was written with a block size of 5120 bytes and you read the tape with a block size of 512 bytes, then the tape drive might read the first block, return you 512 bytes of it, and then discard the remaining data; the next read would start on the next block. Conversely, if the tape was written with a block size of 512 bytes and you requested blocks of 5120 bytes, you would get short reads; each read would return just 512 bytes, and if your software wasn't paying attention, you'd be reading garbage. There was also the issue that the tape drive had to get up to speed to read the block, and then slow down. The ASCII art suggests that the IRG was smaller than the data blocks; that was not necessarily the case. And it took time to read one block, overshoot the IRG, rewind backwards to get to the next block, and start forwards again. And if the tape drive didn't have the memory to buffer data - the cheaper ones did not - then you could seriously affect your tape drive performance.
War story: work prepared on newer machine with a slightly more modern tape drive. I wrote a tape using tar without a sensible block size (so it defaulted to 512 bytes). It was a large bit of software - must have been, oh, less than 100 MB in total (a long time ago, in other words). The tape wrote nicely because the machine was modern enough, and it took just a few seconds to do so. But, I had to get the material off the tape on a machine with an older tape drive, one that did not have any on-board buffer. So, it read the material, 512 bytes at a time, and the reel rocked forward, reading one block, and then rocked back all but maybe half an inch, and then read forwards to get to the next block, and then rocked back, and ... well, you could see it doing this, and since it took appreciable portions of a second to read each 512 byte block, the total time taken was horrendous. My flight was due to leave...and I needed to get that data across too. (It was long enough ago, and in a land far enough away, that last minute changes to flights weren't much of an option either.) To cut a long story short, it did get read - but if I'd used a sensible block size (such as 5120 bytes instead of the default of 512), I would have been done much, much quicker and with much less danger of missing the plane (but I did actually catch the plane, with maybe 20 minutes to spare, despite a taxi ride across Paris in the rush hour).
With more modern tape drives, there was enough memory on the drive to do buffering and getting a tape drive to stream - write continuously without reversing - was feasible. It used to be that I'd use a block size like 256 KB to get QIC tapes to stream. I've not done much with tape drives recently - let's see, not this millennium and not much for a few years before that, either; certainly not much since CD and DVD became the software distribution mechanisms of choice (when electronic download wasn't used).
But the block size really did matter in the old days. And dd provided good support for it. You could even transfer data from a tape drive that was written with, say, 4 KB block to another that you wanted to write with, say, 16 KB blocks, by specifying the ibs (input block size) separately from the obs (output block size). Darned useful!
Also, the count parameter is in terms of the (input) block size. It was useful to say 'dd bs=1024 count=1024 if=/dev/zero of=/my/file/of/zeroes' to copy 1 MB of zeroes around. Or to copy 1 MB of a file.
The importance of dd is vastly diminished; it was an essential part of the armoury for anybody who worked with tape drives a decade or more ago.
The block size is the number of bytes that are read and written at a time. Presumably there is a count= option, and that is specified in units of the block size. If there is a skip= or seek= option, those will also be in block size units. However if you are reading and writing a regular file, and there are no disk errors, then the block size doesn't really matter as long as you can scale those parameters accordingly and they are still integers. However certain sizes may be more efficient than others.
For reading from /dev/zero, it doesn't matter. ibs/obs/bs specify how many bytes will be read at a time. It's helpful to choose a number based on the way bytes are read/written in the operating system. For instance, Linux usually reads from a hard drive in 4096 byte chunks. If you have at least some idea about how the underlying hardware reads/writes, it might be a good idea to specify ibs/obs/bs. By the way, if you specify bs, it will override whatever you specify for ibs and obs.
In addition to the great answer by Jonathan Leffler, keep in mind that the bs= option isn't always a substitute for using both ibs= and obs=, in particular for the old [ugly] days of tape drives.
The bs= option reserves the right for dd to write the data as soon as it's read. This can cause you to no longer have identically sized blocks on the output. Here is GNU's take on this, but the behavior dates back as far as I can remember (80's):
(bs=) Set both input and output block sizes to bytes. This makes dd read and write bytes per block, overriding any ‘ibs’ and ‘obs’ settings. In addition, if no data-transforming conv option is specified, input is copied to the output as soon as it’s read, even if it is smaller than the block size.
For instance, back in the QIC days on an old Sun system, if you did this:
tar cvf /dev/rst0c /bla
It would work, but cause an enormous amount of back and forth thrashing while the drive wrote a small block, and tried to backup and read to reposition itself properly for the next write.
If you swapped this with:
tar cvf - /bla | dd ibs=16K obs=16K of=/dev/rst0c
You'd get the QIC drive writing much larger chunks and not thrashing quite so much.
However, if you made the mistake of this:
tar cvf - /bla | dd bs=16K of=/dev/rst0c
You'd run the risk of having precisely the same thrashing you had before depending upon how much data was available at the time of each read.
Specifying both ibs= and obs= precludes this from happening.

Resources