Why are file sizes all different?
In Windows 10 I can see all of these sizes:
11,116 KB
10.8 MB
11,382,240 Bytes
11,382,784 Bytes
If I use the Console Window:
D:\My Programs\2017\MeetSchedAssist\Inno\Output>dir *.exe
Volume in drive D is DATA
Volume Serial Number is A8B0-A5C6
Directory of D:\My Programs\2017\MeetSchedAssist\Inno\Output
03/04/2018 08:50 11,382,240 MeetSchedAssistSetup.exe
1 File(s) 11,382,240 bytes
0 Dir(s) 719,837,487,104 bytes free
D:\My Programs\2017\MeetSchedAssist\Inno\Output>
I understand that on the physical media the file perhaps has to be rounded up to occupy a whole number of allocation units, but this line:
Size: 10.8 MB (11,382,240 bytes)
Huh? Why does it not say 11.38 MB?
Once upon a time, it was defined that
1 kB = 1024 B
1 MB = 1024 kB
If you divide your bytes figure all the way down to MB, you'll get all those figures.
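For the concrete numbers above, a minimal sketch of the arithmetic (the 4096-byte cluster size used for "size on disk" is an assumption; it is the NTFS default):

#include <stdio.h>

int main(void) {
    long long bytes = 11382240LL;  /* the exact size in bytes */
    double kb = bytes / 1024.0;    /* 11115.47... -> shown as 11,116 KB */
    double mb = kb / 1024.0;       /* 10.855...   -> shown as 10.8 MB */

    /* "Size on disk" is rounded up to whole allocation units: */
    long long cluster = 4096;
    long long on_disk = (bytes + cluster - 1) / cluster * cluster;  /* 11,382,784 */

    printf("%.2f KB, %.3f MB, %lld bytes on disk\n", kb, mb, on_disk);
    return 0;
}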
Since many people tend to walk into that trap, the unit prefixes have since been redefined, and new binary ones were introduced:
1 KiB = 1024 B
1 MiB = 1024 KiB
1 kB = 1000 B
1 MB = 1000 kB
but this scheme is not yet very widespread (it seems to be most common in the total size specifications of storage media).
Funny sidenote: I guess I am not the only one who has learned it the old way and now mixes it up with the current definition all the time. I'd say problems like this are the root cause for humanity being mostly conservatively oriented.
According to this documentation, CPU shares are calculated as
process_cpu.shares = min( 1024*(application_memory / 8 GB), 1024)
According to this formula, if an application is assigned 1GB of memory, it should get 128 CPU shares: 1024*(1/8). However, if we SSH into the application and check cpu.shares, we get 122:
cat /sys/fs/cgroup/cpu/cpu.shares
122
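For reference, a minimal sketch of that formula as I read it (the function and variable names are mine, not from the CF docs):

#include <stdio.h>

/* shares = min(1024 * app_memory / 8 GB, 1024), with memory given in MB */
static long expected_shares(long mem_mb) {
    long shares = 1024L * mem_mb / (8L * 1024);
    return shares > 1024 ? 1024 : shares;
}

int main(void) {
    printf("1GB   -> %ld\n", expected_shares(1024));  /* 128 */
    printf("1.5GB -> %ld\n", expected_shares(1536));  /* 192 */
    printf("8GB   -> %ld\n", expected_shares(8192));  /* 1024 */
    return 0;
}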
Here are the observations:

Memory          Calculated cpu.shares   Observed cpu.shares   Difference
1GB             128                     122                   ~5%
1.5GB/1536MB    192                     184                   ~5%
2GB             256                     256                   -
3GB             384                     384                   -
4GB             512                     512                   -
5GB             640                     634                   ~1%
5.5GB/5632MB    704                     696                   ~2%
8GB             1024                    1024                  -
Why is it that for some values there is this discrepancy (such as 1GB, 1.5GB, and 5GB), while others (2GB, 3GB, 4GB, and 8GB) are consistent with the calculation? I believe I am missing something about how cgroups do this calculation. Is this specific to CF, or does it have to do with the way Linux calculates resource allocation in cgroups in general? Is there always some headroom reserved?
I read the article about the Linux HugePages technology and misunderstood an important detail.
Here is the phrase:
For example, if you use HugePages with 64-bit hardware, and you want
to map 256 MB of memory, you may need one page table entry (PTE). If
you do not use HugePages, and you want to map 256 MB of memory, then
you must have 256 MB * 1024 KB/4 KB = 65536 PTEs.
I don't understand what the 1024 KB in this formula is. I think it should be just 256 MB / 4 KB to calculate the number of page table entries. Is there a typo in the formula, or am I wrong?
I agree that it is confusing. After reading it several times, I believe it is simply a matter of unit conversion. At school, the mathematics/physics/chemistry teachers always told us to use the same units when doing operations in order to obtain coherent results.
The value 256 is expressed in megabytes (MB). To divide it by 4, which is expressed in kilobytes (KB), you need to convert it into kilobytes as well. Hence the multiplication by 1024 KB (= 1 MB). So, literally, the operation is: (256 x 1024) / 4 = 65536, which is the simplification of: (256 x 1024 x 1024) / (4 x 1024).
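As a quick check, here is that conversion spelled out in code (a minimal sketch; the variable names are mine):

#include <stdio.h>

int main(void) {
    long region_mb = 256;                    /* region to map, in MB */
    long page_kb = 4;                        /* standard page size, in KB */
    long ptes = region_mb * 1024 / page_kb;  /* convert MB to KB first, then divide */
    printf("%ld PTEs\n", ptes);              /* prints 65536 */
    return 0;
}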
I'm trying to make sense of the GHC profiler. There is a rather simple app, which uses the wreq and lens-aeson libraries, and while learning about GHC profiling, I decided to play with it a bit.
Using different options (the time tool, +RTS -p -RTS, and +RTS -p -h) I obtained entirely different numbers for my app's memory usage. Having all those numbers, I'm now completely lost trying to understand what is going on and how much memory the app actually uses.
This situation reminds me of the phrase by Arthur Bloch: "A man with a watch knows what time it is. A man with two watches is never sure."
Could you please suggest how I should read all those numbers, and what each of them means?
Here are the numbers:
time -l reports around 19M
#/usr/bin/time -l ./simple-wreq
...
3.02 real 0.39 user 0.17 sys
19070976 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
21040 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
71 messages sent
71 messages received
2991 signals received
43 voluntary context switches
6490 involuntary context switches
Using the +RTS -p -RTS flag reports around 92M. Although it says "total alloc", it seems strange to me that a simple app like this one could allocate and release 91M:
# ./simple-wreq +RTS -p -RTS
# cat simple-wreq.prof
Fri Oct 14 15:08 2016 Time and Allocation Profiling Report (Final)
simple-wreq +RTS -N -p -RTS
total time = 0.07 secs (69 ticks @ 1000 us, 1 processor)
total alloc = 91,905,888 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
main.g Main 60.9 88.8
MAIN MAIN 24.6 2.5
decodeLenient/look Data.ByteString.Base64.Internal 5.8 2.6
decodeLenientWithTable/fill Data.ByteString.Base64.Internal 2.9 0.1
decodeLenientWithTable.\.\.fill Data.ByteString.Base64.Internal 1.4 0.0
decodeLenientWithTable.\.\.fill.\ Data.ByteString.Base64.Internal 1.4 0.1
decodeLenientWithTable.\.\.fill.\.\.\.\ Data.ByteString.Base64.Internal 1.4 3.3
decodeLenient Data.ByteString.Base64.Lazy 1.4 1.4
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 443 0 24.6 2.5 100.0 100.0
main Main 887 0 0.0 0.0 75.4 97.4
main.g Main 889 0 60.9 88.8 75.4 97.4
object_ Data.Aeson.Parser.Internal 925 0 0.0 0.0 0.0 0.2
jstring_ Data.Aeson.Parser.Internal 927 50 0.0 0.2 0.0 0.2
unstream/resize Data.Text.Internal.Fusion 923 600 0.0 0.3 0.0 0.3
decodeLenient Data.ByteString.Base64.Lazy 891 0 1.4 1.4 14.5 8.1
decodeLenient Data.ByteString.Base64 897 500 0.0 0.0 13.0 6.7
....
+RTS -p -h and hp2ps show me the following picture and two numbers: 114K in the header and something around 1.8MB on the graph.
And, just in case, here is the app:
module Main where
import Network.Wreq
import Control.Lens
import Data.Aeson.Lens
import Control.Monad
main :: IO ()
main = replicateM_ 10 g
where
g = do
r <- get "http://httpbin.org/get"
print $ r ^. responseBody
. key "headers"
. key "User-Agent"
. _String
UPDATE 1: Thanks everyone for the incredibly good responses. As suggested, I am adding the +RTS -s output, so the entire picture builds up for everyone who reads it.
#./simple-wreq +RTS -s
...
128,875,432 bytes allocated in the heap
32,414,616 bytes copied during GC
2,394,888 bytes maximum residency (16 sample(s))
355,192 bytes maximum slop
7 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 194 colls, 0 par 0.018s 0.022s 0.0001s 0.0022s
Gen 1 16 colls, 0 par 0.027s 0.031s 0.0019s 0.0042s
UPDATE 2: The size of the executable:
#du -h simple-wreq
63M simple-wreq
A man with a watch knows what time it is. A man with two watches is never sure.
Ah, but what do the two watches show? Are both meant to show the current time in UTC? Or is one of them supposed to show the time in UTC, and the other one the time at a certain point on Mars? As long as they are in sync, the second scenario wouldn't be a problem, right?
And that is exactly what is happening here. You compare different memory measurements:
the maximum residency
the total amount of allocated memory
The maximum residency is the highest amount of memory your program ever uses at a given time. That's 19MB. However, the total amount of allocated memory is a lot more, since that's how GHC works: it "allocates" memory for objects that are garbage collected, which is almost everything that's not unpacked.
Let us inspect a C example for this:
#include <stdlib.h>

int main() {
    int i;
    char *mem;
    for (i = 0; i < 5; ++i) {
        mem = malloc(19 * 1000 * 1000);  /* allocate 19 MB... */
        free(mem);                       /* ...and release it right away */
    }
    return 0;
}
Whenever we use malloc, we will allocate 19 megabytes of memory. However, we free the memory immediately after. The highest amount of memory we ever have at one point is therefore 19 megabytes (and a little bit more for the stack and the program itself).
However, in total, we allocate 5 * 19M = 95M. Still, we could run our little program just fine with 20 megs of RAM. That's the difference between total allocated memory and maximum residency. Note that the residency reported by time is always at least du <executable>, since the executable has to reside in memory too.
That being said, the easiest way to generate statistics is -s, which will show the maximum residency from the Haskell program's point of view. In your case, it will be the 1.9M, the number in your heap profile (or double that amount, due to profiling). And yeah, Haskell executables tend to get extremely large, since libraries are statically linked.
time -l is displaying the (resident, i.e. not swapped out) size of the process as seen by the operating system. This includes twice the maximum size of the Haskell heap (due to the way GHC's GC works), plus anything else allocated by the RTS or other C libraries, plus the code of your executable itself plus the libraries it depends on, etc. I'm guessing the primary contributor to the 19M in this case is the size of your executable.
total alloc is the total amount allocated onto the Haskell heap. It is not at all a measure of maximum heap size (which is what people usually mean by "how much memory is my program using"). Allocation is very cheap and allocation rates of around 1GB/s are typical for a Haskell program.
The number in the header of the hp2ps output "114,272 bytes x seconds" is something completely different again: it is the integral of the graph, and is measured in bytes * seconds, not in bytes. For example if your program holds onto a 10 MB structure for 4 seconds then that will cause this number to increase by 40 MB*s.
The number around 1.8 MB shown in the graph is the actual maximum size of the Haskell heap, which is probably the number you're most interested in.
You've omitted the most useful source of numbers about your program's execution, which is running it with +RTS -s (this doesn't even require it to have been built with profiling).
A file is mapped with the system call:
mmap(65536, 32768, READ, FLAGS, fd, 0)
Pages are 8 KB, so 4 pages worth of the file were mapped (32768/8k = 4 pages). Then the following call is carried out:
munmap(65536, 8192)
Which removes the specified part of the memory map. Which bytes of the file remain mapped? The answer key says that pages 2 and 3 remain, so only bytes 16384 through 32767 remain; however, I'm not sure this is right. Since the len argument (second arg) of the munmap call is 8192 bytes, shouldn't only page 0 be removed, leaving bytes 8192 through 32767?
Both you and the answer key are wrong, but in different ways.
Memory pages on most systems are 4 KB (4096 bytes), not 8 KB. I have never heard of a system with 8 KB memory pages.
This makes the entire mapping of 32768 bytes come out to 8 pages. Unmapping the first 8192 bytes (2 pages) would leave the remaining 6 pages (pages 2 through 7) in place.
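To make the arithmetic concrete, here is a minimal sketch (a 4 KB page size is assumed, and the file name data.bin is mine; it just needs to be at least 32768 bytes long):

#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) return 1;

    /* Map 32768 bytes = 8 pages of 4 KB each. */
    char *p = mmap(NULL, 32768, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;

    /* Unmap the first 8192 bytes = pages 0 and 1.
       Bytes 8192..32767 (pages 2..7) stay mapped. */
    munmap(p, 8192);

    printf("byte 8192 is still mapped: %d\n", p[8192]);
    close(fd);
    return 0;
}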
I have a computer with 2 Intel Xeon CPUs and 48 GB of RAM. The RAM is divided between the CPUs: two parts of 24 GB + 24 GB. How can I check how much of each specific part is used?
So, I need something like htop, which shows how fully each core is used (see this example), but for memory rather than for cores. Or something that would specify which parts (addresses) of memory are used and which are not.
The information is in /proc/zoneinfo, which contains very similar information to /proc/vmstat, except broken down by "Node" (NUMA ID). I don't have a NUMA system here to test it for you and provide sample output for a multi-node config; on a one-node machine it looks like this:
Node 0, zone DMA
pages free 2122
min 16
low 20
high 24
scanned 0
spanned 4096
present 3963
[ ... followed by /proc/vmstat-like nr_* values ]
Node 0, zone Normal
pages free 17899
min 932
low 1165
high 1398
scanned 0
spanned 223230
present 221486
nr_free_pages 17899
nr_inactive_anon 3028
nr_active_anon 0
nr_inactive_file 48744
nr_active_file 118142
nr_unevictable 0
nr_mlock 0
nr_anon_pages 2956
nr_mapped 96
nr_file_pages 166957
[ ... more of those ... ]
Node 0, zone HighMem
pages free 5177
min 128
low 435
high 743
scanned 0
spanned 294547
present 292245
[ ... ]
That is, a small statistic on total usage/availability, followed by the nr_* values that are also found at the system-global level in /proc/vmstat (which then allow a further breakdown of what exactly the memory is used for).
If you have more than one memory node, aka NUMA, you'll see these zones for all nodes.
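If you just want the headline number per node/zone, here is a minimal sketch that pulls the "pages free" lines out of /proc/zoneinfo (the field layout is assumed to match the output above):

#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *f = fopen("/proc/zoneinfo", "r");
    if (!f) { perror("/proc/zoneinfo"); return 1; }

    char line[256];
    char zone[128] = "?";
    long pages_free;

    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "Node", 4) == 0) {
            line[strcspn(line, "\n")] = '\0';  /* e.g. "Node 0, zone Normal" */
            snprintf(zone, sizeof zone, "%s", line);
        } else if (sscanf(line, " pages free %ld", &pages_free) == 1) {
            printf("%-30s %ld pages free\n", zone, pages_free);
        }
    }
    fclose(f);
    return 0;
}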
edit
I'm not aware of a frontend for this (i.e. something that is to vmstat what htop is to top, but NUMA-aware), but please comment if anyone knows of one!
The numactl --hardware command will give you a short answer like this:
node 0 cpus: 0 1 2 3 4 5
node 0 size: 49140 MB
node 0 free: 25293 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 49152 MB
node 1 free: 20758 MB
node distances:
node 0 1
0: 10 21
1: 21 10