Getting the percentage of used space and used inodes in a mount - linux

I need to calculate the percentage of used space and used inodes for a mount path (e.g. /mnt/mycustommount) in Go.
This is my attempt:
var statFsOutput unix.Statfs_t
err := unix.Statfs(mount_path, &statFsOutput)
if err != nil {
	return err
}
totalBlockCount := statFsOutput.Blocks     // Total data blocks in filesystem
freeSystemBlockCount := statFsOutput.Bfree // Free blocks in filesystem
freeUserBlockCount := statFsOutput.Bavail  // Free blocks available to unprivileged user
Now the proportion I need would be something like this:
x : 100 = (totalBlockCount - free[which?]BlockCount) : totalBlockCount
i.e. x : 100 = usedBlockCount : totalBlockCount. But I don't understand the difference between Bfree and Bavail (what has an 'unprivileged' user got to do with filesystem blocks?).
For inodes my attempt:
totalInodeCount := statFsOutput.Files
freeInodeCount := statFsOutput.Ffree
// so now the proportion is
// x : 100 = (totalInodeCount - freeInodeCount) : totalInodeCount
How to get the percentage for used storage?
And is the inodes calculation I did correct?

Your comment expression isn't valid Go, so I can't really interpret it without guessing. With guessing, I read it as correct, but have I guessed what you actually mean, or merely what I think you mean? In other words, without actual code to look at, I can only imagine what your final code will be. If the code I imagine isn't the actual code, the correctness of the code I imagine is irrelevant.
That aside, I can answer your question here:
(what has an 'unprivileged' user got to do with filesystem blocks?)
The Linux statfs call uses the same fields as 4.4BSD. The default 4.4BSD file system (the one called the "fast file system") uses a blocks-with-fragmentation approach to allocate blocks in a sort of stochastic manner. This allocation process works very well on an empty file system, and continues to work well, without extreme slowdown, on somewhat-full file systems. Computerized modeling of its behavior, however, showed pathological slowdowns (amounting to linear search, more or less) were possible if the block usage exceeded somewhere around 90%.
(Later, analysis of real file systems found that the slowdowns generally did not hit until the block usage exceeded 95%. But the idea of a 10% "reserve" was pretty well established by then.)
Hence, if a then-popular large disk drive of 400 MB[1] gave 10% for inodes and another 10% for reserved blocks, that meant that ordinary users could allocate about 320 MB of file data. At that point the drive was "100% full", but it could go to 111% by using up the remaining blocks. Those blocks were reserved to the super-user, though.
These days, instead of a "super user", one can have a capability that can be granted or revoked. We don't use the same file systems these days either, so there may be no difference between Bfree and Bavail on your system.
[1] Yes, the 400 MB Fujitsu Eagle was a large (in multiple senses: it used a 19-inch rack mount setup) drive back then. People are spoiled today with their multi-terabyte SSDs. 😀
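Returning to the actual computation: below is a minimal, self-contained sketch of one way to get both percentages in Go. The usage helper is mine, not the asker's code, and measuring used space against used + Bavail is the df(1) convention, which treats the root-reserved blocks as unavailable; divide by Blocks instead if you want the filesystem's raw view.

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// usage returns the used-space and used-inode percentages for mountPath.
func usage(mountPath string) (spacePct, inodePct float64, err error) {
	var st unix.Statfs_t
	if err = unix.Statfs(mountPath, &st); err != nil {
		return 0, 0, err
	}

	// Blocks in use, from the filesystem's point of view.
	used := st.Blocks - st.Bfree

	// df(1)-style percentage: used blocks versus the blocks an
	// unprivileged user can actually reach (used + Bavail).
	if denom := used + st.Bavail; denom > 0 {
		spacePct = 100 * float64(used) / float64(denom)
	}

	// Inodes: exactly the proportion from the question,
	// x : 100 = (Files - Ffree) : Files.
	if st.Files > 0 {
		inodePct = 100 * float64(st.Files-st.Ffree) / float64(st.Files)
	}
	return spacePct, inodePct, nil
}

func main() {
	space, inodes, err := usage("/mnt/mycustommount")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Printf("space: %.1f%% used, inodes: %.1f%% used\n", space, inodes)
}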

Related

What is the reason for the corrupted fields problem in Splunk?

I have a problem with the search below, covering the last 25 days:
index=syslog Reason="Interface physical link is down" OR Reason="Interface physical link is up" NOT mainIfname="Vlanif*" "nw_ra_a98c_01.34_krtti"
Normally field7 values are like these ones:
Region  field7                  Date   mainIfname             Reason                         count
ASYA    nw_ra_m02f_01.34pndkdv  May 9  GigabitEthernet0/3/6   Interface physical link is up  3
ASYA    nw_ra_m02f_01.34pldtwr  May 9  GigabitEthernet0/3/24  Interface physical link is up  2
But recently they were like this:
00:00:00.599 nw_ra_a98c_01.34_krtti
00:00:03.078 nw_ra_a98c_01.34_krtti
I think the problem may be related to this: it started to happen after the disk-free alarm. (-Cri- Swap reservation, bottleneck situation, current value: 95.00% exceeds configured threshold: 90.00%. : 07:17 17/02/20)
Note that this is not actually about disk: it's about swap space. The application exhausts memory and then falls back to swap. Memory was increased before, but it was evidently insufficient, since the system is swapping again.
I need to understand why it uses so many resources.
Problematic one: [example event attached in original post]
Normal one: [example event attached in original post]
You need to provide example events, one from the normal situation, and one from the problematic situation.
It appears that someone in your environment has developed a field extraction for field7, which is incorrectly parsing the event.
Alternatively, the device sending the syslog data may have an issue and be reporting an error. Depending on the device, you may be better off using a TA from splunkbase.splunk.com to extract the relevant information from the event.

Buffer overflow exploitation 101

I've heard in a lot of places that buffer overflows and illegal indexing in C-like languages may compromise the security of a system. But in my experience all it does is crash the program I'm running. Can anyone explain how buffer overflows could cause security problems? An example would be nice.
I'm looking for a conceptual explanation of how something like this could work. I don't have any experience with ethical hacking.
First, buffer overflows (BOF) are only one method of gaining code execution. When one is exploited, the impact is that the attacker basically gains control of the process. This means the attacker can make the process execute arbitrary code with the current process's privileges (running the process as a high- or low-privileged user on the system will respectively increase or reduce the impact of exploiting a BOF in that application). This is why it is always strongly recommended to run applications with the least privileges needed.
Basically, to understand how a BOF works, you have to understand how the code you have built gets compiled into machine code (ASM) and how the data managed by your software is stored in memory.
I will try to give you a basic example of a subcategory of BOF called stack-based buffer overflows:
Imagine you have an application asking the user to provide a username.
This data will be read from user input and then stored in a variable called USERNAME, which has been allocated as a 20-byte array of chars.
For this scenario to work, we will assume the program does not check the length of the user input.
At some point during data processing, the user input is copied into the USERNAME variable (20 bytes), but since the user input is longer (let's say 500 bytes), the data around this variable is overwritten in memory.
Imagine a memory layout like this:
size in bytes 20 4 4 4
data [USERNAME][variable2][variable3][RETURN ADDRESS]
If you define the three local variables USERNAME, variable2, and variable3, they may be stored in memory as shown above.
Notice the RETURN ADDRESS: this 4-byte memory region stores the address to return to in the function that called your current function (thanks to this, when execution reaches the end of the current function, the program flow naturally goes back to the instruction just after the original call).
If the attacker provides a username of 24 'A' characters, the memory layout becomes something like this:
size in bytes 20 4 4 4
data [USERNAME][variable2][variable3][RETURN ADDRESS]
new data [AAA...AA][ AAAA ][variable3][RETURN ADDRESS]
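To make that adjacent-variable overwrite concrete, here is a contrived Go analogue (all names here - frame, uncheckedCopy, variable2 - are hypothetical; Go normally bounds-checks its slices, so the unsafe loop below stands in for an unchecked C copy such as strcpy()):

package main

import (
	"fmt"
	"unsafe"
)

// Two adjacent fields, laid out like USERNAME and variable2 in the diagram.
type frame struct {
	username  [20]byte
	variable2 [4]byte
}

// uncheckedCopy mimics an unchecked C copy loop: it keeps writing even
// after it has run past the end of the destination buffer.
func uncheckedCopy(dst *byte, src []byte) {
	p := unsafe.Pointer(dst)
	for i, b := range src {
		*(*byte)(unsafe.Add(p, i)) = b // no bounds check
	}
}

func main() {
	var f frame
	input := make([]byte, 24) // 4 bytes too many for username
	for i := range input {
		input[i] = 'A'
	}
	uncheckedCopy(&f.username[0], input)
	fmt.Printf("variable2 is now %q\n", f.variable2[:]) // "AAAA": overwritten
}

The same mechanism, continued a few bytes further, is what reaches the saved return address next.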
Now, if an attacker sends 50 'A' characters as the USERNAME, the memory layout looks like this:
size in bytes 20 4 4 4
data [USERNAME][variable2][variable3][RETURN ADDRESS]
new data [AAA...AA][ AAAA ][ AAAA ][ AAAA ][other AAA...]
In this situation, at the end of the function's execution, the program will crash because it tries to jump to the invalid address 0x41414141 (the char 'A' is 0x41): the overwritten RETURN ADDRESS no longer points to valid code.
If you replace the run of 'A's with carefully chosen bytes, you may be able to:
overwrite RETURN ADDRESS with an interesting location.
place "executable code" in the first 20 + 4 + 4 bytes
You could for instance set RETURN ADDRESS to the address of the first byte of the USERNAME variable (this method is mostly not usable anymore, thanks to the many protections that have been added both to OSes and to compiled programs, such as non-executable stacks and ASLR).
I know it is quite complex to understand at first, and this explanation is a very basic one. If you want more detail, please just ask.
I suggest you have a look at tutorials like this one, which are quite advanced but more realistic.

What does the IS_ALIGNED macro in the linux kernel do?

I've been trying to read the implementation of a kernel module, and I'm stumbling on this piece of code.
unsigned long addr = (unsigned long) buf;

if (!IS_ALIGNED(addr, 1 << 9)) {
        DMCRIT("#%s in %s is not sector-aligned. I/O buffer must be sector-aligned.", name, caller);
        BUG();
}
The IS_ALIGNED macro is defined in the kernel source as follows:
#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
I understand that data has to be aligned to the size of a datatype to be accessed efficiently, but I still don't understand what this code does.
It left-shifts 1 by 9, then subtracts 1, which gives 111111111 in binary. Then it bitwise-ANDs that with x.
Why does this code work? How is this checking for byte alignment?
In systems programming it is common to need a memory address to be aligned to a certain number of bytes, which means its lowest-order bits are zero.
Basically, IS_ALIGNED(addr, 1 << 9) checks whether addr is on a 512-byte (2^9) boundary, i.e. whether its last 9 bits are all zero; the ! in !IS_ALIGNED then makes the error path trigger when the address is not sector-aligned. This is a common requirement when erasing flash locations, because flash memory is split into large blocks which must be erased or written as a single unit.
Another application of this that I ran into: I was working with a DMA controller that has a modulo feature, meaning you can allow it to change only the last several bits of an address (the destination address in this case). This is useful for protecting memory from mistakes in the way you use the DMA controller. Problem is, I initially forgot to tell the compiler to align the DMA destination buffer to the modulo value. This caused some incredibly interesting bugs (random variables that have nothing to do with the thing using the DMA controller being overwritten... sometimes).
As far as "how does the macro code work?", if you subtract 1 from a number that ends with all zeroes, you will get a number that ends with all ones. For example, 0b00010000 - 0b1 = 0b00001111. This is a way of creating a binary mask from the integer number of required-alignment bytes. This mask has ones only in the bits we are interested in checking for zero-value. After we AND the address with the mask containing ones in the lowest-order bits we get a 0 if any only if the lowest 9 (in this case) bits are zero.
"Why does it need to be aligned?": This comes down to the internal makeup of flash memory. Erasing and writing flash is a much less straightforward process then reading it, and typically it requires higher-than-logic-level voltages to be supplied to the memory cells. The circuitry required to make write and erase operations possible with a one-byte granularity would waste a great deal of silicon real estate only to be used rarely. Basically, designing a flash chip is a statistics and tradeoff game (like anything else in engineering) and the statistics work out such that writing and erasing in groups gives the best bang for the buck.
At no extra charge, I will tell you that you will see a lot of this type of thing if you are reading driver and kernel code. It may be helpful to familiarize yourself with the contents of this article (or at least keep it around as a reference): https://graphics.stanford.edu/~seander/bithacks.html

How to create a large file on a VFAT partition efficiently in embedded Linux

I'm trying to create a large empty file on a VFAT partition by using the `dd' command on an embedded Linux box:
dd if=/dev/zero of=/mnt/flash/file bs=1M count=1 seek=1023
The intention was to skip the first 1023 blocks and write only 1 block at the end of the file, which should be very quick on a native EXT3 partition, and it indeed is. However, this operation turned out to be quite slow on a VFAT partition, along with the following message:
lowmem_shrink:: nr_to_scan=128, gfp_mask=d0, other_free=6971, min_adj=16
// ... more `lowmem_shrink' messages
Another attempt was to fopen() a file on the VFAT partition and then fseek() to the end to write the data, which has also proved slow, along with the same messages from the kernel.
So basically, is there a quick way to create the file on the VFAT partition (without traversing the first 1023 blocks)?
Thanks.
Why are VFAT "skipping" writes so slow?
Unless the VFAT filesystem driver were made to "cheat" in this respect, creating large files on FAT-type filesystems will always take a long time. To comply with the FAT specification, the driver has to allocate all data blocks and zero-initialize them, even if you "skip" the writes. That's because of the "cluster chaining" FAT does.
The reason for that behaviour is FAT's inability to support either:
UN*X-style "holes" in files (aka "sparse files")
That's what you're creating on ext3 with your testcase: a file with no data blocks allocated for the first 1 GB minus 1 MB of it, and a single 1 MB chunk of actually committed, zero-initialized blocks at the end.
NTFS-style "valid data length" information.
On NTFS, a file can have uninitialized blocks allocated to it, but the file's metadata will keep two size fields - one for the total size of the file, another for the number of bytes actually written to it (from the beginning of the file).
Without a specification supporting either technique, the filesystem would always have to allocate and zerofill all "intermediate" data blocks if you skip a range.
Also remember that on ext3, the technique you used does not actually allocate blocks to the file (apart from the last 1MB). If you require the blocks preallocated (not just the size of the file set large), you'll have to perform a full write there as well.
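For illustration, here is the same "seek and write one block" technique as the dd command from the question, written as a small Go program (the path is the asker's; on ext3/ext4 this completes almost instantly and leaves a sparse file, while on VFAT the driver must zero-fill every skipped cluster first):

package main

import (
	"io"
	"log"
	"os"
)

func main() {
	// Same effect as: dd if=/dev/zero of=/mnt/flash/file bs=1M count=1 seek=1023
	f, err := os.Create("/mnt/flash/file")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	const mib = 1 << 20
	// Seek 1023 MiB past the start; a filesystem that supports holes
	// allocates no data blocks for the skipped range.
	if _, err := f.Seek(1023*mib, io.SeekStart); err != nil {
		log.Fatal(err)
	}
	// Commit a single 1 MiB block at the end; the file size is now 1 GiB.
	if _, err := f.Write(make([]byte, mib)); err != nil {
		log.Fatal(err)
	}
}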
How could the VFAT driver be modified to deal with this?
At the moment, the driver uses the Linux kernel function cont_write_begin() to begin any write to a file, even an asynchronous one; this function looks like:
/*
 * For moronic filesystems that do not allow holes in file.
 * We may have to extend the file.
 */
int cont_write_begin(struct file *file, struct address_space *mapping,
                     loff_t pos, unsigned len, unsigned flags,
                     struct page **pagep, void **fsdata,
                     get_block_t *get_block, loff_t *bytes)
{
        struct inode *inode = mapping->host;
        unsigned blocksize = 1 << inode->i_blkbits;
        unsigned zerofrom;
        int err;

        err = cont_expand_zero(file, mapping, pos, bytes);
        if (err)
                return err;

        zerofrom = *bytes & ~PAGE_CACHE_MASK;
        if (pos+len > *bytes && zerofrom & (blocksize-1)) {
                *bytes |= (blocksize-1);
                (*bytes)++;
        }

        return block_write_begin(mapping, pos, len, flags, pagep, get_block);
}
That is a simple strategy, but also a pagecache trasher (your log messages are a consequence of the call to cont_expand_zero(), which does all the work and is not asynchronous). If the filesystem were to split the two operations - one task doing the "real" write and another doing the zero filling - it would appear snappier.
The way this could be achieved, while still using the default Linux filesystem utility interfaces, would be to internally create two "virtual" files - one for the to-be-zerofilled area and another for the actually-to-be-written data. The real file's directory entry and FAT cluster chain would only be updated once the background task is complete, by linking its last cluster with the first one of the "zerofill file" and the last cluster of that one with the first one of the "actual write file". One would also want to use a direct-I/O write for the zerofilling, to avoid trashing the pagecache.
Note: While all this is technically possible for sure, the question is how worthwhile it would be to make such a change. Who needs this operation all the time? What would the side effects be?
The existing (simple) code is perfectly acceptable for smaller skipping writes; you won't really notice its presence if you create a 1 MB file and write a single byte at the end. It'll bite you only if you go for file sizes on the order of the limits of what the FAT filesystem allows you to do.
Other options ...
In some situations, the task at hand involves two (or more) steps:
freshly format (e.g.) a SD card with FAT
put one or more big files onto it to "pre-fill" the card
(app-dependent, optional)
pre-populate the files, or
put a loopback filesystem image into them
In one of the cases I've worked on, we folded the first two - i.e. we modified mkdosfs to pre-allocate/pre-create files when making the (FAT32) filesystem. That's pretty simple: when writing the FAT tables, just create allocated cluster chains instead of clusters filled with the "free" marker. It also has the advantage that the data blocks are guaranteed to be contiguous, in case your app benefits from this. And you can decide to make mkdosfs not clear the previous contents of the data blocks. If you know, for example, that one of your preparation steps involves writing the entire data anyway, or doing ext3-in-file-on-FAT (a pretty common thing - Linux appliance, SD card for data exchange with a Windows app/GUI), then there's no need to zero out anything / double-write (once with zeroes, once with whatever else). If your use case fits this (i.e. formatting the card is a useful/normal step of the "initialize it for use" process anyway), then try it out; a suitably modified mkdosfs is part of TomTom's dosfsutils sources - see mkdosfs.c and search for the -N command line option handling.
When talking about preallocation, as mentioned, there's also posix_fallocate(). Currently on Linux with FAT, this will do essentially the same as a manual dd ..., i.e. wait for the zerofill. But the specification of the function doesn't mandate that it be synchronous. The block allocation (FAT cluster chain generation) would have to be done synchronously, but the VFAT on-disk dirent size update and the data block zerofills could be backgrounded/delayed (i.e. either done at low priority in the background, or only done if explicitly requested via fdatasync()/sync(), so that the app can e.g. allocate blocks and then write non-zero contents itself...). That's just technique/design though; I'm not aware of anyone having done that kernel modification yet, even just for experimenting.
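For completeness, requesting preallocation from Go looks like the sketch below (using the fallocate(2) wrapper from golang.org/x/sys/unix; the path is the asker's). As described above, on current VFAT this still ends up doing the synchronous zero-fill, so it is no faster there - the point is only that the interface itself would permit a smarter implementation:

package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Create("/mnt/flash/file")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Ask the filesystem to allocate 1 GiB of blocks; mode 0 means real
	// preallocation, i.e. the file size grows and blocks are committed.
	if err := unix.Fallocate(int(f.Fd()), 0, 0, 1<<30); err != nil {
		log.Fatal(err)
	}
}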

Purpose of ibs/obs/bs in dd

I have a script that creates a file system in a file on a Linux machine. I see that to create the file system, it uses `dd' with the bs=x option, reading from /dev/zero and writing to a file. I think specifying ibs/obs/bs is usually useful when reading from real hardware devices, as those have specific block-size constraints. In this case, however, since it is reading from a virtual device and writing to a file, I don't see any point in using the bs=x option. Is my understanding wrong here?
(Just in case it helps, this file system is later used to boot a qemu VM.)
To understand block sizes, you have to be familiar with tape drives. If you're not interested in tape drives - for example, you don't think you're ever going to use one - then you can go back to sleep now.
Remember the tape drives from films in the 60s, 70s, maybe even 80s? The ones where the reel went spinning around, and so on? Not your Exabyte or even QIC - quarter-inch cartridge - tapes; your good old fashioned reel-to-reel half-inch tape drives? On those, block size mattered.
The data on a tape was written in blocks. Each block was separated from the next by an inter-record gap.
----+-------+-----+-------+-----+----
... | block | IRG | block | IRG | ...
----+-------+-----+-------+-----+----
Depending on the tape drive hardware and software, there were a variety of problems that could happen. For example, if the tape was written with a block size of 5120 bytes and you read the tape with a block size of 512 bytes, then the tape drive might read the first block, return you 512 bytes of it, and then discard the remaining data; the next read would start on the next block. Conversely, if the tape was written with a block size of 512 bytes and you requested blocks of 5120 bytes, you would get short reads; each read would return just 512 bytes, and if your software wasn't paying attention, you'd be reading garbage. There was also the issue that the tape drive had to get up to speed to read the block, and then slow down. The ASCII art suggests that the IRG was smaller than the data blocks; that was not necessarily the case. And it took time to read one block, overshoot the IRG, rewind backwards to get to the next block, and start forwards again. And if the tape drive didn't have the memory to buffer data - the cheaper ones did not - then you could seriously affect your tape drive performance.
War story: the work was prepared on a newer machine with a slightly more modern tape drive. I wrote a tape using tar without a sensible block size (so it defaulted to 512 bytes). It was a large bit of software - must have been, oh, less than 100 MB in total (a long time ago, in other words). The tape wrote nicely because the machine was modern enough, and it took just a few seconds to do so. But I had to get the material off the tape on a machine with an older tape drive, one that did not have any on-board buffer. So it read the material 512 bytes at a time, and the reel rocked forward, reading one block, then rocked back all but maybe half an inch, then read forward to get to the next block, then rocked back, and... well, you could see it doing this, and since it took an appreciable fraction of a second to read each 512-byte block, the total time taken was horrendous. My flight was due to leave...and I needed to get that data across too. (It was long enough ago, and in a land far enough away, that last-minute changes to flights weren't much of an option either.) To cut a long story short, it did get read - but if I'd used a sensible block size (such as 5120 bytes instead of the default of 512), I would have been done much, much quicker and with much less danger of missing the plane (but I did actually catch the plane, with maybe 20 minutes to spare, despite a taxi ride across Paris in the rush hour).
With more modern tape drives, there was enough memory on the drive to do buffering and getting a tape drive to stream - write continuously without reversing - was feasible. It used to be that I'd use a block size like 256 KB to get QIC tapes to stream. I've not done much with tape drives recently - let's see, not this millennium and not much for a few years before that, either; certainly not much since CD and DVD became the software distribution mechanisms of choice (when electronic download wasn't used).
But the block size really did matter in the old days. And dd provided good support for it. You could even transfer data from a tape drive that was written with, say, 4 KB block to another that you wanted to write with, say, 16 KB blocks, by specifying the ibs (input block size) separately from the obs (output block size). Darned useful!
Also, the count parameter is in terms of the (input) block size. It was useful to say 'dd bs=1024 count=1024 if=/dev/zero of=/my/file/of/zeroes' to copy 1 MB of zeroes around. Or to copy 1 MB of a file.
The importance of dd is vastly diminished; it was an essential part of the armoury for anybody who worked with tape drives a decade or more ago.
The block size is the number of bytes that are read and written at a time. Presumably there is a count= option, and that is specified in units of the block size. If there is a skip= or seek= option, those will also be in block size units. However if you are reading and writing a regular file, and there are no disk errors, then the block size doesn't really matter as long as you can scale those parameters accordingly and they are still integers. However certain sizes may be more efficient than others.
For reading from /dev/zero, it doesn't matter. ibs/obs/bs specify how many bytes will be read at a time. It's helpful to choose a number based on the way bytes are read/written in the operating system. For instance, Linux usually reads from a hard drive in 4096 byte chunks. If you have at least some idea about how the underlying hardware reads/writes, it might be a good idea to specify ibs/obs/bs. By the way, if you specify bs, it will override whatever you specify for ibs and obs.
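If it helps to see those semantics in code, dd's core loop with bs= behaves essentially like this Go sketch (ddCopy is a hypothetical helper, not dd's actual source): each iteration issues one read of at most bs bytes and immediately writes out whatever arrived, which is why a short read produces a short output block.

package main

import (
	"io"
	"os"
)

// ddCopy copies src to dst one block at a time, like dd with bs=.
func ddCopy(dst io.Writer, src io.Reader, bs int) (int64, error) {
	buf := make([]byte, bs)
	var written int64
	for {
		n, rerr := src.Read(buf) // one read(2) of at most bs bytes
		if n > 0 {
			m, werr := dst.Write(buf[:n]) // may be a short block
			written += int64(m)
			if werr != nil {
				return written, werr
			}
		}
		if rerr == io.EOF {
			return written, nil
		}
		if rerr != nil {
			return written, rerr
		}
	}
}

func main() {
	ddCopy(os.Stdout, os.Stdin, 4096) // roughly: dd bs=4096
}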
In addition to the great answer by Jonathan Leffler, keep in mind that the bs= option isn't always a substitute for using both ibs= and obs=, in particular for the old [ugly] days of tape drives.
The bs= option reserves the right for dd to write the data as soon as it's read. This can cause you to no longer have identically sized blocks on the output. Here is GNU's take on this, but the behavior dates back as far as I can remember (the 80s):
(bs=) Set both input and output block sizes to bytes. This makes dd read and write bytes per block, overriding any ‘ibs’ and ‘obs’ settings. In addition, if no data-transforming conv option is specified, input is copied to the output as soon as it’s read, even if it is smaller than the block size.
For instance, back in the QIC days on an old Sun system, if you did this:
tar cvf /dev/rst0c /bla
It would work, but cause an enormous amount of back-and-forth thrashing while the drive wrote a small block, then tried to back up and read to reposition itself properly for the next write.
If you swapped this with:
tar cvf - /bla | dd ibs=16K obs=16K of=/dev/rst0c
You'd get the QIC drive writing much larger chunks and not thrashing quite so much.
However, if you made the mistake of this:
tar cvf - /bla | dd bs=16K of=/dev/rst0c
You'd run the risk of having precisely the same thrashing you had before depending upon how much data was available at the time of each read.
Specifying both ibs= and obs= precludes this from happening.
