unmarked block header after Garbage Collection sweep phase

I'm new to garbage collection and have a question about unmarked block headers after the sweep phase. Below is from my textbook:
Initially, the heap in the Figure consists of six allocated blocks, each of which
is unmarked. Block 3 contains a pointer to block 1. Block 4 contains pointers
to blocks 3 and 6. The root points to block 4. After the mark phase, blocks 1,
3, 4, and 6 are marked because they are reachable from the root. Blocks 2 and
5 are unmarked because they are unreachable. After the sweep phase, the two
unreachable blocks are reclaimed to the free list.
But I don't know what the status of blocks 3 and 6 is after the sweep: why are they not marked as "unmarked" just like block 1?
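For illustration, here is a minimal Rust sketch of mark-sweep over this six-block heap (block numbers and pointers as in the figure). It assumes the common convention that the sweep phase also clears the mark bit of every block it keeps, so each surviving block's header reads "unmarked" again by the time the next collection starts:

fn main() {
    // successors[b] lists the blocks that block b points to (blocks are
    // 1-based, index 0 is unused); the layout matches the textbook figure.
    let successors: [&[usize]; 7] = [&[], &[], &[], &[1], &[3, 6], &[], &[]];
    let root = 4; // the root points to block 4
    let mut marked = [false; 7];
    let mut allocated = [false, true, true, true, true, true, true];

    // Mark phase: a depth-first walk from the root marks blocks 1, 3, 4 and 6.
    let mut stack = vec![root];
    while let Some(b) = stack.pop() {
        if !marked[b] {
            marked[b] = true;
            stack.extend_from_slice(successors[b]);
        }
    }

    // Sweep phase: unmarked blocks (2 and 5) are reclaimed to the free list,
    // and the mark bit of each surviving block is cleared, so after the sweep
    // every remaining block is again "unmarked".
    for b in 1..7 {
        if allocated[b] && !marked[b] {
            allocated[b] = false; // reclaimed
        }
        marked[b] = false;
    }
    let live: Vec<usize> = (1..7).filter(|&b| allocated[b]).collect();
    println!("still allocated (all unmarked): {:?}", live); // [1, 3, 4, 6]
}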

Related

Why is the size of a tuple or struct not the sum of the members?

use std::mem;

assert_eq!(12, mem::size_of::<(i32, f64)>()); // fails
assert_eq!(16, mem::size_of::<(i32, f64)>()); // succeeds
assert_eq!(16, mem::size_of::<(i32, f64, i32)>()); // succeeds
Why is it not 12 (4 + 8)?
Does Rust have special treatment for tuples?
No. A regular struct can (and does) have the same "problem".
The answer is padding: on a 64-bit system, an f64 should be aligned to 8 bytes (that is, its starting address should be a multiple of 8). A structure normally has the alignment of its most constraining (largest-aligned) member, so the tuple has an alignment of 8.
This means your tuple must start at an address that's a multiple of 8, so the i32 starts at a multiple of 8, ends on a multiple of 4 (as it's 4 bytes), and the compiler adds 4 bytes of padding so the f64 is properly aligned:
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
[ i32 ] padding [ f64 ]
"But wait", you shout, "if I reverse the fields of my tuple the size doesn't change!".
That's true: the schema above is not accurate because by default rustc will reorder your fields to compact structures, so it will really do this:
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
[ f64 ] [ i32 ] padding
which is why your third attempt is 16 bytes:
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
[ f64 ] [ i32 ] [ i32 ]
rather than 24:
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
[ i32 ] padding [ f64 ] [ i32 ] padding
"Hold your horses" you say, keen eyed that you are, "I can see the alignment for the f64, but then why is there padding at the end? There's no f64 there!"
Well, that's so the computer has an easier time with sequences: a struct with a given alignment should also have a size that's a multiple of its alignment; this way, when you have multiple of them:
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
[ f64 ] [ i32 ] padding [ f64 ] [ i32 ] padding
they're properly aligned, and computing where to lay out the next one is simple (just offset by the size of the struct); it also avoids putting this information everywhere. Basically, an array / vec is never itself padded; instead, the padding is in the struct it stores. This allows packing to be a struct property and not infect arrays as well.
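A quick check of that property, assuming a typical 64-bit target: an array element occupies the full padded size, so the array itself needs no extra padding:

use std::mem;

fn main() {
    // Each element occupies the padded 16-byte size, so 2 * 16 = 32 bytes.
    assert_eq!(32, mem::size_of::<[(i32, f64); 2]>());
    // The array keeps the element alignment of 8.
    assert_eq!(8, mem::align_of::<[(i32, f64); 2]>());
}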
Using the repr(C) attribute, you can tell Rust to lay out your structures in exactly the order you gave (it's not an option for tuples, FWIW); a small sketch of the effect follows the list below.
That is safe, and while it is not usually useful, there are some edge cases where it's important; those I know of (there are probably others) are:
Interfacing with foreign (FFI) code, which expects a very specific layout; that is in fact the origin of the flag's name (it makes Rust behave like C).
Avoiding false sharing in high-performance code.
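Here is that sketch, a minimal check of the repr(C) effect. The struct names are mine, and the sizes assume a typical 64-bit target where f64 has an alignment of 8; the default-layout size is what rustc does in practice rather than a guarantee:

use std::mem;

#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: i32,
    b: f64,
    c: i32,
}

#[allow(dead_code)]
struct RustLayout {
    a: i32,
    b: f64,
    c: i32,
}

fn main() {
    assert_eq!(24, mem::size_of::<CLayout>()); // i32, pad, f64, i32, pad: source order kept
    assert_eq!(16, mem::size_of::<RustLayout>()); // f64, i32, i32: reordered in practice
    assert_eq!(8, mem::align_of::<CLayout>()); // most constraining member: f64
}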
You can also tell rustc to not pad the structure using repr(packed).
That is much riskier: it will generally degrade performance (most CPUs are rather cross with unaligned data) and might crash the program or return the wrong data entirely on some architectures. That is highly dependent on the CPU architecture, and on the system (OS) running on it; per the kernel's Unaligned Memory Accesses document:
Some architectures are able to perform unaligned memory accesses
transparently, but there is usually a significant performance cost.
Some architectures raise processor exceptions when unaligned accesses
happen. The exception handler is able to correct the unaligned access,
at significant cost to performance.
Some architectures raise processor exceptions when unaligned accesses
happen, but the exceptions do not contain enough information for the
unaligned access to be corrected.
Some architectures are not capable of unaligned memory access, but will
silently perform a different memory access to the one that was requested,
resulting in a subtle code bug that is hard to detect!
So "Class 1" architectures will perform the correct accesses, possibly at a performance cost.
"Class 2" architectures will perform the correct accesses, at a high performance cost (the CPU needs to call into the OS, and the unaligned access is converted into an aligned access in software), assuming the OS handles that case (it doesn't always in which case this resolves to a class 3 architecture).
"Class 3" architectures will kill the program on unaligned accesses (since the system has no way to fix it up.
"Class 4" will perform nonsense operations on unaligned accesses and are by far the worst.
Another common pitfall of unaligned accesses is that they tend to be non-atomic (since they need to expand into a sequence of aligned memory operations and manipulations of those), so you can get "torn" reads or writes even for otherwise atomic accesses.
While @Masklinn already provided a general answer that it's due to alignment, here's The Rust Reference:
Size and Alignment
All values have an alignment and size.
The alignment of a value specifies what addresses are valid to store the value at. A value of alignment n must only be stored at an address that is a multiple of n. For example, a value with an alignment of 2 must be stored at an even address, while a value with an alignment of 1 can be stored at any address. Alignment is measured in bytes, and must be at least 1, and always a power of 2. The alignment of a value can be checked with the align_of_val function.
The size of a value is the offset in bytes between successive elements in an array with that item type including alignment padding. The size of a value is always a multiple of its alignment. The size of a value can be checked with the size_of_val function.
[...]
– The Rust Reference - Type Layout - Size and Alignment
(emphasis mine)
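A minimal demonstration of the two functions named in the quote, using the tuple from the question (values assume a typical 64-bit target):

use std::mem;

fn main() {
    let t = (1i32, 2.0f64);
    // align_of_val and size_of_val, as mentioned in the Reference above.
    assert_eq!(8, mem::align_of_val(&t)); // alignment of the most constraining member
    assert_eq!(16, mem::size_of_val(&t)); // padded size, matching the asserts above
}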

Bounded Buffer problem using semaphores in Linux

I'm trying to understand the bounded buffer problem (consumer/producer) more clearly. As I understand it, one of the solutions to this problem is using 3 semaphores:
1. FULL - which counts the filled slots in the array
2. EMPTY - which counts the available slots in the array
3. MUTEX - which holds the number 1 or 0
Could someone explain further what it means, for example, if the value of FULL is negative, or the value of EMPTY is negative?
Could it be that MUTEX is not 1 or 0? If so, what does that mean?
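For reference, below is a minimal sketch of the three-semaphore scheme in Rust; the standard library has no semaphore type, so a counting semaphore is built from Mutex + Condvar, and all names and the capacity are illustrative. In a blocking implementation like this the counters never go negative; in the textbook accounting where wait() decrements first and then blocks on a negative result, a negative FULL or EMPTY value means that many threads are blocked waiting on it, and even MUTEX (initialised to 1) can go negative when several threads queue for the critical section.

use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// A counting semaphore built from Mutex + Condvar; wait() blocks at zero
// instead of letting the counter go negative.
struct Semaphore {
    count: Mutex<i32>,
    cond: Condvar,
}

impl Semaphore {
    fn new(n: i32) -> Self {
        Semaphore { count: Mutex::new(n), cond: Condvar::new() }
    }
    fn wait(&self) {
        let mut c = self.count.lock().unwrap();
        while *c == 0 {
            c = self.cond.wait(c).unwrap();
        }
        *c -= 1;
    }
    fn signal(&self) {
        *self.count.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

fn main() {
    const CAPACITY: i32 = 4;
    let empty = Arc::new(Semaphore::new(CAPACITY)); // EMPTY: free slots
    let full = Arc::new(Semaphore::new(0));         // FULL: filled slots
    let mutex = Arc::new(Semaphore::new(1));        // MUTEX: 1 = buffer free
    let buffer = Arc::new(Mutex::new(VecDeque::new()));

    let (e, f, m, b) = (empty.clone(), full.clone(), mutex.clone(), buffer.clone());
    let producer = thread::spawn(move || {
        for item in 0..8 {
            e.wait();   // take a free slot; blocks when the buffer is full
            m.wait();   // enter the critical section
            b.lock().unwrap().push_back(item);
            m.signal(); // leave the critical section
            f.signal(); // announce one more filled slot
        }
    });

    for _ in 0..8 {
        full.wait();    // take a filled slot; blocks when the buffer is empty
        mutex.wait();
        let item = buffer.lock().unwrap().pop_front().unwrap();
        mutex.signal();
        empty.signal(); // announce one more free slot
        println!("consumed {}", item);
    }
    producer.join().unwrap();
}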

Stress-ng: RAM testing commands

Stress-ng: Can we test RAM using stress-ng? What are the commands used to test RAM on a MIPS 32 device?
There are many memory based stressors in stress-ng:
stress-ng --class memory?
class 'memory' stressors: atomic bsearch context full heapsort hsearch
lockbus lsearch malloc matrix membarrier memcpy memfd memrate memthrash
mergesort mincore null numa oom-pipe pipe qsort radixsort remap
resources rmap stack stackmmap str stream tlb-shootdown tmpfs tsearch
vm vm-rw wcs zero zlib
Alternatively, one can use the VM based stressors too:
stress-ng --class vm?
class 'vm' stressors: bigheap brk madvise malloc mlock mmap mmapfork mmapmany
mremap msync shm shm-sysv stack stackmmap tmpfs userfaultfd vm vm-rw
vm-splice
I suggest looking at the vm stressor first, as this contains a large range of stressor methods that exercise memory patterns and can possibly find broken memory:
-m N, --vm N
start N workers continuously calling mmap(2)/munmap(2) and writing to
the allocated memory. Note that this can cause systems to trip the
kernel OOM killer on Linux systems if enough physical memory and swap
is not available.
--vm-bytes N
mmap N bytes per vm worker, the default is 256MB. One can spec‐
ify the size as % of total available memory or in units of
Bytes, KBytes, MBytes and GBytes using the suffix b, k, m or g.
--vm-ops N
stop vm workers after N bogo operations.
--vm-hang N
sleep N seconds before unmapping memory, the default is zero
seconds. Specifying 0 will do an infinite wait.
--vm-keep
do not continually unmap and map memory, just keep on re-writing
to it.
--vm-locked
Lock the pages of the mapped region into memory using mmap
MAP_LOCKED (since Linux 2.5.37). This is similar to locking
memory as described in mlock(2).
--vm-madvise advice
Specify the madvise 'advice' option used on the memory mapped
regions used in the vm stressor. Non-linux systems will only
have the 'normal' madvise advice, linux systems support 'dont‐
need', 'hugepage', 'mergeable' , 'nohugepage', 'normal', 'ran‐
dom', 'sequential', 'unmergeable' and 'willneed' advice. If this
option is not used then the default is to pick random madvise
advice for each mmap call. See madvise(2) for more details.
--vm-method m
specify a vm stress method. By default, all the stress methods
are exercised sequentially, however one can specify just one
method to be used if required. Each of the vm workers has 3
phases:
1. Initialised. The anonymously mapped memory region is set to a
known pattern.
2. Exercised. Memory is modified in a known predictable way.
Some vm workers alter memory sequentially, some use small or
large strides to step along memory.
3. Checked. The modified memory is checked to see if it matches
the expected result.
The vm methods containing 'prime' in their name have a stride of
the largest prime less than 2^64, allowing them to thoroughly
step through memory and touch all locations just once, while also
avoiding touching adjacent memory cells one after another. This
strategy exercises the cache and page non-locality.
Since the memory being exercised is virtually mapped then there
is no guarantee of touching page addresses in any particular
physical order. These workers should not be used to test that
all the system's memory is working correctly either, use tools
such as memtest86 instead.
The vm stress methods are intended to exercise memory in ways to
possibly find memory issues and to try to force thermal errors.
Available vm stress methods are described as follows:
Method Description
all iterate over all the vm stress methods
as listed below.
flip sequentially work through memory 8
times, each time just one bit in memory
flipped (inverted). This will effec‐
tively invert each byte in 8 passes.
galpat-0 galloping pattern zeros. This sets all
bits to 0 and flips just 1 in 4096 bits
to 1. It then checks to see if the 1s
are pulled down to 0 by their neighbours
or if the neighbours have been pulled up
to 1.
galpat-1 galloping pattern ones. This sets all
bits to 1 and flips just 1 in 4096 bits
to 0. It then checks to see if the 0s
are pulled up to 1 by their neighbours
or if the neighbours have been pulled
down to 0.
gray fill the memory with sequential gray
codes (these only change 1 bit at a time
between adjacent bytes) and then check
if they are set correctly.
incdec work sequentially through memory twice,
the first pass increments each byte by a
specific value and the second pass
decrements each byte back to the origi‐
nal start value. The increment/decrement
value changes on each invocation of the
stressor.
inc-nybble initialise memory to a set value (that
changes on each invocation of the stres‐
sor) and then sequentially work through
each byte incrementing the bottom 4 bits
by 1 and the top 4 bits by 15.
rand-set sequentially work through memory in 64
bit chunks setting bytes in the chunk to
the same 8 bit random value. The random
value changes on each chunk. Check that
the values have not changed.
rand-sum sequentially set all memory to random
values and then summate the number of
bits that have changed from the original
set values.
read64 sequentially read memory using 32 x 64
bit reads per bogo loop. Each loop
equates to one bogo operation. This
exercises raw memory reads.
ror fill memory with a random pattern and
then sequentially rotate 64 bits of mem‐
ory right by one bit, then check the
final load/rotate/stored values.
swap fill memory in 64 byte chunks with random
patterns. Then swap each 64 byte chunk with a
randomly chosen chunk. Finally, reverse the
swap to put the chunks back in their original
place and check if the data is correct. This
exercises adjacent and random memory
load/stores.
move-inv sequentially fill memory 64 bits at a time
with random values, and then check if the
memory is set correctly. Next, sequentially
invert each 64 bit pattern and again check if
the memory is set as expected.
modulo-x fill memory over 23 iterations. Each
iteration starts one byte further along
from the start of the memory and steps
along in 23 byte strides. In each
stride, the first byte is set to a ran‐
dom pattern and all other bytes are set
to the inverse. Then it checks to see if
the first byte contains the expected
random pattern. This exercises cache
store/reads as well as seeing if neigh‐
bouring cells influence each other.
prime-0 iterate 8 times by stepping through mem‐
ory in very large prime strides clearing
just one bit at a time in every byte.
Then check to see if all bits are set to
zero.
prime-1 iterate 8 times by stepping through mem‐
ory in very large prime strides setting
just one bit at a time in every byte.
Then check to see if all bits are set to
one.
prime-gray-0 first step through memory in very large
prime strides clearing just one bit
(based on a gray code) in every byte.
Next, repeat this but clear the other 7
bits. Then check to see if all bits are
set to zero.
prime-gray-1 first step through memory in very large
prime strides setting just one bit (based
on a gray code) in every byte. Next,
repeat this but set the other 7 bits.
Then check to see if all bits are set to
one.
rowhammer try to force memory corruption using the
rowhammer memory stressor. This fetches
two 32 bit integers from memory and
forces a cache flush on the two
addresses multiple times. This has been
known to force bit flipping on some
hardware, especially with lower fre‐
quency memory refresh cycles.
walk-0d for each byte in memory, walk through
each data line setting them to low (and
the others are set high) and check that
the written value is as expected. This
checks if any data lines are stuck.
walk-1d for each byte in memory, walk through
each data line setting them to high (and
the others are set low) and check that
the written value is as expected. This
checks if any data lines are stuck.
walk-0a in the given memory mapping, work
through a range of specially chosen
addresses working through address lines
to see if any address lines are stuck
low. This works best with physical mem‐
ory addressing, however, exercising
these virtual addresses has some value
too.
walk-1a in the given memory mapping, work
through a range of specially chosen
addresses working through address lines
to see if any address lines are stuck
high. This works best with physical mem‐
ory addressing, however, exercising
these virtual addresses has some value
too.
write64 sequentially write memory using 32 x 64
bit writes per bogo loop. Each loop
equates to one bogo operation. This
exercises raw memory writes. Note that
memory writes are not checked at the end
of each test iteration.
zero-one set all memory bits to zero and then
check if any bits are not zero. Next,
set all the memory bits to one and check
if any bits are not one.
--vm-populate
populate (prefault) page tables for the memory mappings; this
can stress swapping. Only available on systems that support
MAP_POPULATE (since Linux 2.5.46).
So to run 1 vm stressor that uses 75% of memory, exercising all the vm stress methods with verification, for 10 minutes with verbose mode enabled, use:
stress-ng --vm 1 --vm-bytes 75% --vm-method all --verify -t 10m -v

How to divide block in piece when they overlap

Some input: I'm looking to build a simple, minimal BitTorrent client.
I've been reading the protocol spec for 2-3 days now.
Here's my understanding of it thus far. Assume that the torrent has a piece length of 26000 bytes, and that according to the unofficial spec the block size is 16384.
Now a request for a block of a piece would look like this:
piece 0
block offset 0
block length 16384
So far so good.
Now, for the next block, which overlaps pieces 0 and 1, what should the request look like?
piece 0 ## since the starting byte is in piece 0, use piece 0 instead of piece 1
block offset 16384
block length 16384
Now on the receiving end I need to recreate the 26000-byte piece so that I can compare its hash against the corresponding entry in pieces to verify the piece for correctness.
Is my understanding correct?
Also, let's suppose the piece verification failed, maybe because of the first block, i.e. Block 0 (which is faulty or corrupt);
then I should requeue Block 0 and Block 1 (which was valid, btw, and also a part of piece 1) to be retransmitted.
And now suddenly the piece and block distribution becomes a bit more complex than I assumed it would be, and I'm hoping there is a simpler solution to this.
Any thoughts?
I will use the more distinct term 'chunk' instead of the ambiguous 'block'.
A torrent is divided into pieces.
A piece is divided into chunks.
A chunk is cut from one piece.
A torrent is divided into pieces when it's created. With the Request message, a piece is in turn further divided into chunks by the downloading BitTorrent client.
How the client cuts the chunks out of a piece doesn't matter, as long as no single chunk is larger than 16 KB (16384 bytes).
The simplest and most rational way to divide a piece is to do it in as few chunks as possible, by dividing it into 16 KB chunks and letting the last chunk of the piece be smaller if necessary.
The Request message format: <len=0013><id=6><Piece_index><Chunk_offset><Chunk_length>
<Piece_index> integer specifying the zero-based piece index
<Chunk_offset> integer specifying the zero-based byte offset within the piece
<Chunk_length> integer specifying the requested number of bytes
When requesting a chunk:
the whole chunk must be within the piece specified by the Piece_index,
i.e. Chunk_offset + Chunk_length must be less than or equal to the size of that specific piece*.
the Chunk_length cannot be larger than 16 KB (16384 bytes) and must be at least 1 byte
the peer that gets the request must have the piece specified by the Piece_index
If any of the conditions is not met, the peer receiving the request will close the connection.
* For all pieces except the very last one that is the 'piece length' defined in the info-dictionary.
The size of the last piece can by calculated as:
size_last_piece = size_of_torrent - (number_of_pieces - 1) * 'piece length'
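To make the message format concrete, here is a minimal Rust sketch of encoding a Request (encode_request is a hypothetical helper, not part of any library; all integers in the protocol are 4-byte big-endian):

fn encode_request(piece_index: u32, chunk_offset: u32, chunk_length: u32) -> Vec<u8> {
    let mut msg = Vec::with_capacity(17);
    msg.extend_from_slice(&13u32.to_be_bytes()); // <len=0013>: 13 payload bytes
    msg.push(6); // <id=6>: Request
    msg.extend_from_slice(&piece_index.to_be_bytes());  // <Piece_index>
    msg.extend_from_slice(&chunk_offset.to_be_bytes()); // <Chunk_offset>
    msg.extend_from_slice(&chunk_length.to_be_bytes()); // <Chunk_length>
    msg
}

fn main() {
    // A 26000-byte piece is fetched with two requests, both against piece 0:
    // 16384 bytes at offset 0, then the remaining 9616 bytes at offset 16384.
    let first = encode_request(0, 0, 16384);
    let second = encode_request(0, 16384, 26000 - 16384);
    assert_eq!(17, first.len());
    assert_eq!(17, second.len());
}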
The maximum block size commonly accepted by clients is 16KiB. Clients are free to make smaller requests.
Pieces are commonly a multiple of 16KiB, but the current spec does not require it (this changes with BEP52) and some people use prime numbers or similar things for fun, so they do exist in the wild.
Blocks only exist in the sense that you need multiple requests to get a complete piece that is larger than 16KiB. In other words, blocks are the same thing as whatever you decide to request. You could request 500 bytes, then 1017 bytes and then 13016 bytes, ... until you got a complete piece. They are arbitrary subdivisions within a piece - there is no overlap - that you need to keep track of between the start of downloading a piece and finishing the piece.
They do not participate in hashing, they do not factor into the HAVE or BITFIELD messages. Only REQUEST, PIECE, CANCEL and REJECT messages concern themselves with blocks. And instead of blocks you could also call them sub-piece offset-length tuples or something to that effect.
The last block in a piece may be smaller than the transfer block size, i.e. 26000 - 16384 = 9616 bytes should be requested in the second Request message. As soon as all 26000 bytes have been received, the SHA-1 hash should be calculated and compared with the corresponding checksum from the pieces section of the metainfo dictionary. If the checksum does not match, you have no means to know which block contained invalid data and should re-download all blocks from this piece.
My advice would be not to depend on some particular partitioning of the piece, because:
1) peers may use a different transfer block size when requesting data
2) the SHA-1 algorithm is block-based, and the digester had better use a bigger block size (otherwise calculations will take more time)
A proper abstraction for a piece would be a generic data range with the following methods:
read(from:int, length:int):byte[]
write(offset:int, block:byte[]):()
Then you'll be able to read/write arbitrary subranges of data.
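A minimal Rust sketch of that abstraction (the type and method names are mine, and it assumes incoming chunks never overlap, so a byte counter can track completeness):

struct Piece {
    data: Vec<u8>,
    received: usize,
}

impl Piece {
    fn new(len: usize) -> Self {
        Piece { data: vec![0; len], received: 0 }
    }
    // write(offset, block): store one downloaded chunk at its offset.
    fn write(&mut self, offset: usize, chunk: &[u8]) {
        self.data[offset..offset + chunk.len()].copy_from_slice(chunk);
        self.received += chunk.len();
    }
    // read(from, length): serve an arbitrary sub-range to a requesting peer.
    fn read(&self, from: usize, length: usize) -> &[u8] {
        &self.data[from..from + length]
    }
    // Once complete, hash self.data with SHA-1 and compare against the
    // corresponding checksum from the metainfo's pieces string.
    fn is_complete(&self) -> bool {
        self.received >= self.data.len()
    }
}

fn main() {
    let mut piece = Piece::new(26000);
    piece.write(0, &[1u8; 16384]);
    piece.write(16384, &[2u8; 9616]);
    assert!(piece.is_complete());
    assert_eq!(16384, piece.read(0, 16384).len());
}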

FIO Flexible IO tester for repetitive data access patterns

I am currently working on a project and I need to test my prototype with repetitive data access patterns. I came across fio, which is a flexible I/O tester for Linux (1).
Fio has many options, and I want it to produce a workload which accesses the same blocks of a file, the same number of times, over and over again. I also need those accesses to not be equal among these blocks. For instance, if fio creates a file named "test.txt"
and this file is divided into 10 blocks, I need the workload to read a specific subset of these blocks, with a different number of IOs each, over and over again. Let's say that it chooses to access blocks 3, 7 and 9. Then I want to access these in a specific order and a specific number of times each, over and over again. If this workload can be described by N passes, then I want it to be something like this:
1st pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
2nd pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
...
N-pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
Question 1: Can the above workload be produced with Fio? If yes, How?
Question 2: Is there a mailing list, forum, website, community for Fio users?
Thank you,
Nick
http://www.spinics.net/lists/fio/index.html is the website where you can follow the mailing list.
The http://www.bluestop.org/fio/HOWTO.txt link will also help you.
This is actually quite a tricky thing to do. The closest you'll get with parameters is using one of the non-uniform distributions (see random_distribution in the HOWTO), but you'll only be saying "re-read blocks A, B, C more than blocks X, Y, Z" and you won't be able to control the exact counts.
An alternative is to write an iolog that can be replayed that has the exact sequence you're looking for (see Trace file format v2 in the HOWTO).
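For instance, a rough sketch: assuming a hypothetical 4096-byte block size, block 3 starts at offset 12288, block 7 at 28672 and block 9 at 36864, and one pass could look like the version-2 trace below, with each read line duplicated to reach the 10/5/2 counts and the whole group repeated for N passes (see 'Trace file format v2' in the HOWTO for the exact syntax):

fio version 2 iolog
test.txt add
test.txt open
test.txt read 12288 4096
test.txt read 28672 4096
test.txt read 36864 4096
test.txt close

Such a trace can then be replayed with something like fio --name=replay --read_iolog=trace.log.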
