Is there a way to programmatically determine addressing mode from an opcode for the 6502?

I.e. are the different addressing modes encoded somehow in the opcodes? Can they be extracted programmatically or does this info only exist in the documentation of the 6502? I'm writing an emulator and I'm not concerned with performance. It would be nice to have a function that takes an opcode and returns the addressing mode, if possible.
So far I've not come across any indication that there's a pattern in the codes, except that all zero page instructions seem to have their third bit set.

Yes, there is. The addressing mode is encoded in three bits at positions 4-2 of the opcode byte (i.e. xxxAAAxx). How those bits decode depends on the other bits of the opcode, but they follow a (mostly) regular pattern, so they can be run through a lookup table to determine the mode for each instruction group.
This page has a full description of the various patterns and how to decode in each case.
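As a minimal sketch of the idea: for the common "group one" instructions (those whose low two bits cc are 01, e.g. LDA, STA, ADC, CMP), the bbb bits at positions 4-2 map directly to a mode. The table below follows the aaabbbcc layout described on the page linked above; the function name and the other-groups behavior are my own illustration, not a complete decoder.

```python
# Addressing modes for "group one" 6502 opcodes (aaabbbcc with cc == 01).
GROUP1_MODES = {
    0b000: "(zp,X)",   # indexed indirect
    0b001: "zp",       # zero page
    0b010: "#imm",     # immediate
    0b011: "abs",      # absolute
    0b100: "(zp),Y",   # indirect indexed
    0b101: "zp,X",     # zero page, X
    0b110: "abs,Y",    # absolute, Y
    0b111: "abs,X",    # absolute, X
}

def addressing_mode(opcode):
    cc = opcode & 0b11             # instruction group (low two bits)
    bbb = (opcode >> 2) & 0b111    # addressing-mode bits 4-2
    if cc == 0b01:
        return GROUP1_MODES[bbb]
    raise NotImplementedError("groups cc=00 and cc=10 need their own tables")

# Example: LDA #$42 is opcode 0xA9 -> aaa=101 (LDA), bbb=010, cc=01
print(addressing_mode(0xA9))   # #imm
print(addressing_mode(0xA5))   # zp   (LDA $nn)
```

Groups cc=00 and cc=10 use similar but slightly different bbb tables, plus a handful of single-byte exceptions, which is why a per-group lookup table is the usual approach.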

Is there a way to program do not care bits in python for an embedded system?

I am writing a driver for a DDS chip (AD9959, Analog Devices) in Python. Some of the modes described in the datasheet specify certain register bits as "don't care". What is the correct way to implement this? I tried using 'x', but the bits are not being written to the chip.
The term "don't care" does not mean some special value for a bit. A bit can only be 0 or 1.
When reading it means that you can ignore the value. Commonly you mask it away.
When writing it means that you are free to choose any value. Most developers write a 0.

How do text editors store data above 1 byte?

The basic question is, how does notepad (or other basic text editors) store data. I ran into this because I was trying to compare file size of different compression techniques, and realized something isn't quite right.
To elaborate..
If I save a text file with the following contents:
a
The file is 1 byte. This one happens to be 97, or 0x61.
I create a text file with the following contents:
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Which is all the characters from 0-255, or 0x00 to 0xFF.
This file is 256 bytes. 1 byte for each character. This makes sense to me.
Then I append the following character to the end of the above string.
†
A character not contained in the above string; all 8-bit characters were already used. This character is 8224, or 0x2020: a 2-byte character.
And yet, the file size has only changed from 256 to 257 bytes. In fact, the above character saved by itself only shows 1 byte.
What am I missing?
Edit: Please note that in the second text block, many of the characters do not show up on here.
In ANSI encoding (an 8-bit, Microsoft-specific encoding), each character is stored in one byte (8 bits).
ANSI is also called Windows-1252, or Windows Latin-1.
Have a look at the ANSI table in the ANSI Character Codes Chart or the Windows-1252 article.
So for the † character, the code is 134, byte 0x86.
Using one byte to encode a character only makes sense on the surface. It works okay if you speak English, but it is a fair disaster if you speak Chinese or Japanese. Unicode today has definitions for 110,187 typographic symbols, with room to grow up to 1.1 million. A byte is not a good way to store a Unicode symbol, since it can encode only 256 distinct values.
Accordingly, text editors must always encode text when they store it to a file. Encoding is required to map 110,187 values onto a byte-oriented storage medium. Inevitably that takes more than 1 byte per character if you speak Chinese.
There have been lots and lots of encoding schemes in common use. Popular in the previous century were code pages: language-specific mappings that try as hard as they can to need only 1 byte of storage per character, by picking the 256 characters most likely to be needed in that language. Japanese, Korean and Chinese used multi-byte mappings because they had to; other languages used 1 byte.
Code pages have been an enormous disaster: a program cannot properly read a text file that was encoded in another language's code page. That worked when text files stayed close to the machine that created them; the Internet in particular broke that usage. Japanese was particularly prone to this disaster, since it had more than one code page in common use. The result is called mojibake: the user sees gibberish in the text editor. Unicode came around in 1992 to try to solve this disaster with one new standard to replace all the others, which tends to invoke another kind of disaster.
You are subjected to that kind of disaster, particularly if you use Notepad, a program that tries to be compatible with text files created over the past 30 years. Google "bush hid the facts" for a hilarious story about that. Note the dialog you get when you use File > Save As: it has an extra combobox titled "Encoding". The default is ANSI, a broken name from the previous century that means "code page". As you found out, that character indeed needs only 1 byte in your machine's default code page; depending on where you live, that is code page 1252 in Western Europe and the Americas. You'd see 0x86 if you looked at the file with a hex viewer.
Given that the dialog gives you a choice, and you should not favor ANSI's mojibake anymore, always pick UTF-8 instead. Maybe they'll update Notepad some day so it uses a better default; that is very hard to do.
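You can reproduce the observation directly in Python: the same † character takes 1 byte in Windows-1252 (what Notepad calls ANSI) but 3 bytes in UTF-8.

```python
ch = "\u2020"                    # † DAGGER, Unicode code point 8224 (0x2020)
cp1252 = ch.encode("cp1252")     # b'\x86' -> the single byte 0x86
utf8 = ch.encode("utf-8")        # b'\xe2\x80\xa0' -> three bytes
print(len(cp1252), len(utf8))    # 1 3
```

This is exactly why appending † grew the ANSI-encoded file by only one byte: the encoding, not the code point, determines the on-disk size.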

Addressing mode in IA-32

I have searched for addressing modes in IA-32, but I haven't found any website or article that explains the addressing modes simply.
I need an article or something that explains the matter simply, with pictures of memory as it changes and the addressing mode identified in each picture.
I know that in IA-32 the general form of an address is:
Segment + Base + (index * scale) + displacement
I want to know the exact meaning of the displacement, scale, index and, finally, the base.
As I don't know English well, I was forced to search for these terms, but I didn't find their technical meaning for this case (in the assembly programming language, I mean).
Finally, I want a simple explanation of the addressing modes in IA-32, preferably illustrated with pictures of the memory, its offsets and so on.
I am learning assembly from the book A Guide to Assembly Programming in Linux.
So thanks.
Found this image in this PowerPoint presentation.
This means that you can have addresses like [eax + ecx * 2 + 100]. You don't necessarily have to use all of these fields.
See also Referencing the contents of a memory location. (x86 addressing modes)
The scale factor is encoded into machine code as a 2-bit shift count. ESP can't be an index because of special cases for indicating the presence of a SIB byte and for a SIB byte with no index. See rbp not allowed as SIB base? for a rundown on the special cases.
Segmentation can be ignored in 32/64-bit mode under normal OSes like Linux.
The segment register is selected automatically depending on the base register in the addressing mode, or with segment override prefix (e.g. ds:, cs:).
But Linux uses a flat memory model so the segment base is always 0 for all segments (other than fs or gs, used for thread-local storage). The segment base is added to the "offset" calculated from base, index, scale and displacement to get the final linear address. So normally the "offset" part is the whole linear address.
That linear address is a virtual address, which the hardware translates to physical via the page tables / TLB (managed by the kernel).
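As a sketch of the arithmetic only (the names are illustrative, not an emulator API), the offset part of an operand like [eax + ecx*2 + 100] works out like this:

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    # scale is encoded in the SIB byte as a 2-bit shift count,
    # so only 1, 2, 4 and 8 are possible.
    assert scale in (1, 2, 4, 8)
    return base + index * scale + disp

eax, ecx = 0x1000, 0x20                       # example register values
addr = effective_address(base=eax, index=ecx, scale=2, disp=100)
print(hex(addr))   # 0x1000 + 0x40 + 0x64 = 0x10a4
```

Under a flat memory model the segment base is 0, so this offset is already the linear (virtual) address that gets translated via the page tables.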

What are the advantages and disadvantages of right-justified I2S mode over left-justified I2S mode and basic I2S mode?

I2S specifications provide different modes of operation. I tried looking for advantages of one mode over the others but could not find any good literature. Can anyone explain the advantage of one mode over another?
With samples that are aligned to the most significant bit, the sample length does not matter (because if the sent and expected sample lengths differ, any ignored or read-as-zero bits are the LSBs).
Otherwise, all choices are arbitrary and do not have any advantage whatsoever over each other.
(This is why hardware designers choose one I²S mode at random, instead of selecting an obviously best mode.)
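A quick sketch of why MSB alignment makes the sample length irrelevant (the sample values are made up for illustration):

```python
sample_24 = 0x123456      # a 24-bit sample, MSB-justified in its slot

# A 16-bit receiver simply never clocks in the last 8 bits,
# so it sees the top 16 bits: the same waveform at lower resolution.
received_16 = sample_24 >> 8
print(hex(received_16))   # 0x1234 -> only the LSBs were lost

# With a right-justified (LSB-aligned) stream, sender and receiver
# must agree on the sample length just to locate the MSB at all.
```

In other words, MSB alignment degrades gracefully across mismatched sample lengths, while right justification does not.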

Asking about a statement in the "BASIC" language

I don't know if anyone here has experience programming in the BASIC language. I am reading a manual for a device that uses an enhanced BASIC, and I have no experience at all. The statement looks like:
OUTPUT 621 USING "#, K, 1024(W)";
I wonder what the USING statement is for. What do '#', 'K' and 1024(W) really mean? Sorry, but the manual is so old that some of the pages are lost, and I can't tell more from the context.
In BASIC, the USING statement was typically used for output formatting, so you can read that as "output the number 621 using the formatting "#, K, 1024(W)"".
What that formatting means is, I think, totally dependent on the BASIC dialect; you have to consult its reference manual. # means "number", for sure, and the rest probably specifies how that number should be formatted.
There is an example of PRINT USING in the TrueBASIC manual (PDF, found via Google).
As a totally wild guess, it could mean: use the suffix K after dividing by 1024, rounding as specified by (W). If so, the number 621 is probably a byte count, and the output is wanted in kilobytes.
It looks to me like this is a statement to write output to some type of external storage, any of the myriad types of tapes and disks that existed 35 or 40 years ago. Before things became more standardized with the advent of operating systems like CP/M and MS-DOS, there were hundreds (I'm guessing at the number) of companies building and marketing computers with their own proprietary operating systems. Each one would have its own commands and syntax for reading and writing to peripherals (as any storage outside the RAM was called in those days).
621 is probably the code for the particular tape drive, disk pack or floppy disk they wanted to write output to. K is probably just a parameter for an option of some sort. I'm pretty sure that 1024(W) refers to the length in bytes to be allocated on the disk or tape for each record written, and I'm even more certain that (W) means to access the device in write-only mode.