Game Boy: What is the purpose of instructions that don't modify anything (e.g. AND A)?

I've been working on a Game Boy emulator, and I've noticed that certain opcodes exist that would never change any values, such as LD A, A, LD B, B, etc., and also AND A. The first ones obviously don't change anything, as they load the value of a register into the same register, and since the AND is performed against the A register itself, AND A will always return A. Is there any purpose for these operations, or are they essentially the same as NOP after each cycle?

As Jeffrey Bosboom and Hans Passant pointed out in their comments, the reason is simplicity. More specifically, hardware simplicity.
LD r,r' instructions copy the content of source register (r') to destination register (r). LD r,r' opcodes follow this form:
------------------------------------------
| BIT    | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
------------------------------------------
| OPCODE | 0 | 1 |     r     |    r'     |
------------------------------------------
Destination and source registers can assume these values:
-----------
| BIT | REG |
-----------
| 111 | A |
-----------
| 000 | B |
-----------
| 001 | C |
-----------
| 010 | D |
-----------
| 011 | E |
-----------
| 100 | H |
-----------
| 101 | L |
-----------
In order to implement these instructions in hardware we just need a multiplexer that receives bits 0-2 to select the source register and another multiplexer that receives bits 3-5 to select the destination register.
To check whether bits 0-2 and bits 3-5 point to the same register, you would have to add more logic to the CPU. And as we all know, resources were more limited in the '80s :P
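For emulator authors, the same idea can be sketched in software. This is only a minimal illustration, not a full decoder: the names REGISTER_CODES and execute_ld_r_r are made up for this example, and it handles only the seven register codes from the table above.

```python
# Register codes from the table above. Code 110 is not in that table
# (on real hardware it does not name a plain register), so this sketch rejects it.
REGISTER_CODES = {
    0b111: "A", 0b000: "B", 0b001: "C", 0b010: "D",
    0b011: "E", 0b100: "H", 0b101: "L",
}


def execute_ld_r_r(opcode, regs):
    """Execute an LD r,r' opcode (bit pattern 01 rrr r'r'r') on a dict of registers."""
    if (opcode >> 6) != 0b01:
        raise ValueError(f"not an LD r,r' opcode: {opcode:#04x}")
    dst_code = (opcode >> 3) & 0b111  # bits 3-5 select the destination
    src_code = opcode & 0b111         # bits 0-2 select the source
    if dst_code not in REGISTER_CODES or src_code not in REGISTER_CODES:
        raise ValueError(f"register code outside the table: {opcode:#04x}")
    dst, src = REGISTER_CODES[dst_code], REGISTER_CODES[src_code]
    # No special case for dst == src: the copy simply has no visible effect.
    regs[dst] = regs[src]
    return dst, src
```

With this layout, 0x7F decodes to LD A,A and 0x78 to LD A,B. The first leaves every register as it was without the decoder ever having to compare the two fields.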
Please note that loading instructions such as LD A,A, LD B,B, LD C,C, LD D,D, LD E,E, LD H,H, and LD L,L behave like NOP. However AND A and OR A DO NOT behave like NOP, since they affect the flag register, and their execution might change the internal machine state.

Instructions like LD A,A and AND A may appear to be NOPs, but some of them (AND A, for instance) also change the processor flags and can be used to test the value of a register.
Be sure to check the instruction set documentation carefully for such side effects.

There is actually a purpose to the AND A (as well as OR A) instruction: it sets the Z flag when A is zero and clears it otherwise. So both AND A and OR A are frequently used for this purpose.
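In an emulator this means AND A and OR A cannot be treated as NOPs. Here is a rough Python sketch of what they do to the flags. The function names and the dict-based flag representation are invented for illustration, and the N, H and C values follow the commonly documented Game Boy behaviour for AND and OR, so verify them against your opcode reference.

```python
def and_a(a):
    """Return (new A, flags) for AND A; A itself never changes."""
    result = (a & a) & 0xFF
    # Z reflects whether A is zero; AND is assumed to set H and clear N and C.
    flags = {"Z": result == 0, "N": False, "H": True, "C": False}
    return result, flags


def or_a(a):
    """Return (new A, flags) for OR A; A itself never changes."""
    result = (a | a) & 0xFF
    # Z reflects whether A is zero; OR is assumed to clear N, H and C.
    flags = {"Z": result == 0, "N": False, "H": False, "C": False}
    return result, flags
```

Because the carry flag ends up cleared either way, code that relied on a carry set by an earlier instruction would behave differently if the emulator skipped these opcodes.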

Related

Perf output strange memory addresses

I was using perf for some profiling work, but I ran into two problems:
1. a lot of weird memory addresses in the output
2. some user-space addresses are not translated to symbols
I compiled the program with -fno-omit-frame-pointer ... and -g, but I still get this problem.
Can anyone take a look? How can I fix these two problems?
The perf output is:
9.28% gserver gserver [.] 0x000000000013bb20
2.36% gserver libpthread-2.19.so [.] pthread_mutex_lock
|
--- pthread_mutex_lock
|
|--28.31%-- 0x0
| |
| |--38.16%-- 0x3
| |
| |--37.72%-- 0x0
| | |
| | |--90.05%-- 0x25
| | | |
| | | |--53.41%-- 0x100000001
| | | | std::_Sp_counted_ptr<Buffer*, (__gnu_cxx::_Lock_policy)2>::~_Sp_counted_ptr()
| | | | 0x1f0fc35de58948
It doesn't matter, because those are in library code, which you didn't build, and you can't fix.
You can see it's spending 2.36% of its time in pthread_mutex_lock, meaning it's waiting for something.
That's insignificant.
I assume you're looking for significant stuff.
I use this technique.

How to generate a table in Haddock documentation

I am writing some documentation with Haddock, and I need to include a multi-column table with some values in it. How can I do this with Haddock? I cannot find any information about it, and embedding HTML as an alternative does not seem possible either.
Haddock bundled with GHC 8.4 or newer (Haddock version >= 2.18.2) supports tables. As per the pull request where this was added, the syntax is based on RST Grid tables.
Sample use:
module Sample where
-- | A table:
--
-- +------------------------+------------+----------+----------+
-- | Header row, column 1   | Header 2   | Header 3 | Header 4 |
-- | (header rows optional) |            |          |          |
-- +========================+============+==========+==========+
-- | body row 1, column 1   | column 2   | column 3 | column 4 |
-- +------------------------+------------+----------+----------+
-- | body row 2             | Cells may span columns.          |
-- +------------------------+------------+---------------------+
-- | body row 3             | Cells may  | \[                  |
-- +------------------------+ span rows. | f(n) = \sum_{i=1}   |
-- | body row 4             |            | \]                  |
-- +------------------------+------------+---------------------+
sample :: ()
sample = ()
Turns into
Haddock "markup" doesn't currently support tables, see also Haddock User Guide - Chapter 3. Documentation and Markup. There is an open issue to add support for simple tables.

How do I find value of register and flags in Assembler?

Program code  | Zero-Flag | Sign-Flag | Register A | Register HL
              | 0         | 0         | 00h        | 00 00h
--------------|-----------|-----------|------------|------------
MOV HL, 00ffh |           |           |            |
DEC HL        |           |           |            |
ADD 81h       |           |           |            |
CP A          |           |           |            |
SUB 02h       |           |           |            |
I have part of a program in MC8 assembler (the CPU of the MC8 is the 8-bit Zilog Z80 processor of the training board). What values will the flags and the registers have after each instruction is executed? The values before execution are given in the first row.
Can someone work through it and write an explanation?
What Intel calls MOV, Zilog calls LD; otherwise you can work out the answer by looking at e.g. this instruction table.
MOV HL, 00ffh is LD HL, 00ffh in Zilog terms, so look up the appropriate LD HL. It's instruction 0x21 and it tells you that it loads the value into HL and doesn't affect any flags. So that's row one sorted. DEC HL over at 0x2b also doesn't affect any flags (which almost always catches me out, for the record) but ADD A,* does so that's where sign and zero might change, depending on what you think happens to A.
Just look up each instruction in turn, see what it does, do that thing, then consider what the flags will be if that's an instruction that affects the flags.
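To show what that lookup-and-apply process looks like, here is a small Python sketch that traces only the five instructions in the question. It is an illustration under simplifying assumptions, not a Z80 emulator: the run function and the tuple encoding of instructions are invented here, and it tracks only the zero and sign flags, using the standard Z80 rules that 16-bit LD and DEC leave flags alone while ADD, SUB and CP set Z on a zero result and S from bit 7.

```python
def run(program):
    """Trace A, HL and the Z and S flags for a tiny subset of Z80 instructions."""
    state = {"A": 0x00, "HL": 0x0000, "Z": 0, "S": 0}  # initial row of the table
    trace = []

    def set_flags(result):
        state["Z"] = int(result == 0)    # zero flag: result is 0
        state["S"] = (result >> 7) & 1   # sign flag: copy of bit 7

    for op, operand in program:
        value = state["A"] if operand == "A" else operand
        if op == "LD HL":                # 16-bit load: flags untouched
            state["HL"] = value & 0xFFFF
        elif op == "DEC HL":             # 16-bit decrement: flags untouched
            state["HL"] = (state["HL"] - 1) & 0xFFFF
        elif op == "ADD":
            state["A"] = (state["A"] + value) & 0xFF
            set_flags(state["A"])
        elif op == "SUB":
            state["A"] = (state["A"] - value) & 0xFF
            set_flags(state["A"])
        elif op == "CP":                 # compare: flags as for SUB, A unchanged
            set_flags((state["A"] - value) & 0xFF)
        else:
            raise ValueError(f"unsupported instruction: {op}")
        trace.append((op, dict(state)))
    return trace


PROGRAM = [("LD HL", 0x00FF), ("DEC HL", None), ("ADD", 0x81),
           ("CP", "A"), ("SUB", 0x02)]
for op, s in run(PROGRAM):
    print(f"{op:6} Z={s['Z']} S={s['S']} A={s['A']:02X}h HL={s['HL']:04X}h")
```

The interesting rows are ADD 81h, which sets the sign flag because bit 7 of the result is 1, and CP A, which sets the zero flag (A minus A is zero) while leaving A itself unchanged.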

Flipped switch statements?

Suppose you have 10 boolean variables of which only one can be true at a time, and each time any one is 'switched on', all others must be 'turned off'. One of the problems that immediately arises is:
How can you quickly test which variable is true without necessarily
having to linearly check all the variable states each time?
For this, I was thinking if it was possible to have something like:
switch(true)
{
    case boolean1:
        //do stuff
    ...
    //other variables
}
This looks like a bad way of testing for 10 different states of an object, but I think there're cases where this kind of feature may prove useful and would like to know if there's any programming language that supports this kind of feature?
There isn't a language feature that offers this behavior. But as an alternative, you could use the Command Pattern, in conjunction with a Priority Queue. This assumes that you would be able to prioritize what checks should be done.
Traditionally, when you have such radio button boolean values you use an integer to represent them:
+------------+---------+--------------------+
| BINARY | DECIMAL | BINARY-LOGARITHMIC |
+------------+---------+--------------------+
| 0000000001 | 1 | 0 |
| 0000000010 | 2 | 1 |
| 0000000100 | 4 | 2 |
| 0000001000 | 8 | 3 |
| 0000010000 | 16 | 4 |
| 0000100000 | 32 | 5 |
| 0001000000 | 64 | 6 |
| 0010000000 | 128 | 7 |
| 0100000000 | 256 | 8 |
| 1000000000 | 512 | 9 |
+------------+---------+--------------------+
Let's call the variable holding this value flag. If it stores the binary-logarithmic index (0 to 9) rather than the bit pattern itself, we can quickly jump to some code by indexing a random-access array of functions:
var functions = [ function0
                , function1
                , function2
                , function3
                , function4
                , function5
                , function6
                , function7
                , function8
                , function9
                ];
functions[flag](); // quick jump
However, you will have to pay for the function call overhead.
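If the state is kept as the one-hot bit pattern from the BINARY column instead, the index can be recovered before the lookup. A minimal Python sketch of that variant follows; make_handler, handlers and dispatch are hypothetical names used only for this illustration.

```python
def make_handler(index):
    """Build a placeholder handler for state number `index`."""
    def handler():
        return f"state {index} active"
    return handler


# One handler per state, in the same order as the table above.
handlers = [make_handler(i) for i in range(10)]


def dispatch(one_hot):
    """Call the handler for a value with exactly one of its low 10 bits set."""
    # x & (x - 1) clears the lowest set bit, so it is zero only for one-hot values.
    if one_hot <= 0 or one_hot & (one_hot - 1) or one_hot >= 1 << len(handlers):
        raise ValueError(f"not a valid one-hot state: {one_hot}")
    index = one_hot.bit_length() - 1  # binary logarithm of a power of two
    return handlers[index]()
```

For example, dispatch(0b0000000100) runs the handler for state 2. No linear scan over the ten states is needed, and switching state is a single assignment, which automatically turns every other state off.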

How is a memory barrier used in the Linux kernel?

There is an illustration in kernel source Documentation/memory-barriers.txt, like this:
CPU 1                   CPU 2
======================= =======================
        { B = 7; X = 9; Y = 8; C = &Y }
STORE A = 1
STORE B = 2
<write barrier>
STORE C = &B            LOAD X
STORE D = 4             LOAD C (gets &B)
                        LOAD *C (reads B)
Without intervention, CPU 2 may perceive the events on CPU 1 in some
effectively random order, despite the write barrier issued by CPU 1:
+-------+       :      :                :       :
|       |       +------+                +-------+  | Sequence of update
|       |------>| B=2  |-----       --->| Y->8  |  | of perception on
|       |  :    +------+     \          +-------+  | CPU 2
| CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
|       |       +------+       |        +-------+
|       |   wwwwwwwwwwwwwwww   |        :       :
|       |       +------+       |        :       :
|       |  :    | C=&B |---    |        :       :       +-------+
|       |  :    +------+   \   |        +-------+       |       |
|       |------>| D=4  |    ----------->| C->&B |------>|       |
|       |       +------+       |        +-------+       |       |
+-------+       :      :       |        :       :       |       |
                               |        :       :       |       |
                               |        :       :       | CPU 2 |
                               |        +-------+       |       |
    Apparently incorrect --->  |        | B->7  |------>|       |
    perception of B (!)        |        +-------+       |       |
                               |        :       :       |       |
                               |        +-------+       |       |
    The load of X holds --->    \       | X->9  |------>|       |
    up the maintenance           \      +-------+       |       |
    of coherence of B             ----->| B->2  |       +-------+
                                        +-------+
                                        :       :
I don't understand. Since there is a write barrier, every earlier store must have taken effect by the time C = &B is executed, which means B would already equal 2. So for CPU 2, B should be 2 by the time it reads C as &B. Why would it perceive B as 7? I am really confused.
The key missing point is the mistaken assumption that for the sequence:
LOAD C (gets &B)
LOAD *C (reads B)
the first load has to precede the second load. A weakly ordered architecture can act "as if" the following happened:
LOAD B (reads B)
LOAD C (reads &B)
if( C!=&B )
    LOAD *C
else
    Congratulate self on having already loaded *C
The speculative "LOAD B" can happen, for example, because B was on the same cache line as some other variable of earlier interest or hardware prefetching grabbed it.
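The "as if" sequence can be replayed as a deterministic toy model. The Python below is only a thought experiment, not a description of real hardware: pointers are modelled as variable names, cpu2_reads is a made-up function, and the read_barrier switch simply discards the early copy of B, which is a simplified stand-in for a barrier between CPU 2's two loads.

```python
def cpu2_reads(read_barrier):
    """Toy replay of the scenario; returns (C, the value CPU 2 sees for *C)."""
    # Initial state from the example: { B = 7; X = 9; Y = 8; C = &Y }
    memory = {"A": 0, "B": 7, "C": "Y", "D": 0, "X": 9, "Y": 8}

    # CPU 2 happens to pick up B early (prefetch, shared cache line, ...).
    early_copy = {"B": memory["B"]}

    # CPU 1's stores then become visible in memory, in program order.
    for name, value in [("A", 1), ("B", 2), ("C", "B"), ("D", 4)]:
        memory[name] = value

    c = memory["C"]                 # LOAD C (gets &B, modelled as the name "B")
    if read_barrier:
        early_copy.clear()          # assumed effect: stale early loads are dropped
    value = early_copy.get(c, memory[c])  # LOAD *C, reusing the early copy if any
    return c, value
```

In this model cpu2_reads(False) returns ("B", 7), the apparently incorrect perception from the diagram, while cpu2_reads(True) returns ("B", 2). The write barrier on CPU 1 is honoured in both runs; only the behaviour on CPU 2's side differs.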
From the section of the document titled "WHAT MAY NOT BE ASSUMED ABOUT MEMORY BARRIERS?":
There is no guarantee that any of the memory accesses specified before a
memory barrier will be complete by the completion of a memory barrier
instruction; the barrier can be considered to draw a line in that CPU's
access queue that accesses of the appropriate type may not cross.
and
There is no guarantee that a CPU will see the correct order of effects
from a second CPU's accesses, even if the second CPU uses a memory
barrier, unless the first CPU also uses a matching memory barrier (see
the subsection on "SMP Barrier Pairing").
What memory barriers do (in a very simplified way, of course) is make sure neither the compiler nor in-CPU hardware perform any clever attempts at reordering load (or store) operations across a barrier, and that the CPU correctly perceives changes to the memory made by other parts of the system. This is necessary when the loads (or stores) carry additional meaning, like locking a lock before accessing whatever it is we're locking. In this case, letting the compiler/CPU make the accesses more efficient by reordering them is hazardous to the correct operation of our program.
When reading this document we need to keep two things in mind:
1. A load means transmitting a value from memory (or cache) to a CPU register.
2. Unless the CPUs share the cache (or have no cache at all), it is possible for their cache systems to be momentarily out of sync.
Fact #2 is one of the reasons why one CPU can perceive the data differently from another. Cache systems are designed to provide good performance and coherence in the general case, but they might need some help in specific cases like the ones illustrated in the document.
In general, as the document suggests, barriers in systems involving more than one CPU should be paired to force the system to synchronize the perception of both (or all participating) CPUs. Picture a situation in which one CPU completes loads or stores and the main memory is updated, but the new data has yet to be transmitted to the second CPU's cache, resulting in a lack of coherence across both CPUs.
I hope this helps. I'd suggest reading memory-barriers.txt again with this in mind and particularly the section titled "THE EFFECTS OF THE CPU CACHE".
