Acquiring Processor ID in Linux

On Microsoft Windows you can get the Processor ID (not the Process ID) via WMI, which in this case (only when acquiring the Processor ID) is based on the CPUID instruction.
Is there a similar method to acquire this ID on Linux?

I do not know WMI or the MS-Windows "CPUID instruction", since I do not know or use MS-Windows (few users here do). So I cannot say for sure whether this offers the same information, but try cat /proc/cpuinfo. If you require a specific value you can grep it out easily.
If you need to do this from within a program, you can use the ordinary file utilities to read this information. Always keep in mind one of the most basic principles of 'unix'-style operating systems: everything is a file.
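For example, a minimal sketch in Python that reads it like any other file (the field names assumed here are the ones /proc/cpuinfo prints on x86):

# Sketch: read selected fields from /proc/cpuinfo like any other file.
wanted = {"vendor_id", "cpu family", "model", "stepping", "flags"}
info = {}
with open("/proc/cpuinfo") as f:
    for line in f:
        if ":" not in line:
            continue                            # blank lines separate per-CPU blocks
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in wanted and key not in info:   # keep the first CPU only
            info[key] = value
print(info)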

For context of the OP's question, ProcessorID value returned by WMI is documented thus:
Processor information that describes the processor features. For an
x86 class CPU, the field format depends on the processor support of
the CPUID instruction. If the instruction is supported, the property
contains 2 (two) DWORD formatted values. The first is an offset of
08h-0Bh, which is the EAX value that a CPUID instruction returns with
input EAX set to 1. The second is an offset of 0Ch-0Fh, which is the
EDX value that the instruction returns. Only the first two bytes of
the property are significant and contain the contents of the DX
register at CPU reset—all others are set to 0 (zero), and the contents
are in DWORD format.
As an example, on my system:
C:\>wmic path Win32_Processor get ProcessorId
ProcessorId
BFEBFBFF000206A7
Note that the ProcessorID is simply a binary-encoded format of information usually available in other formats, specifically the signature (Family/Model/Stepping/Processor type) and feature flags. If you only need the information, you may not actually need this ID -- just get the already-decoded information from /proc/cpuinfo.
If you really want these 8 bytes, there are a few ways to get the ProcessorID in Linux.
With root/sudo, the ID is contained in the output of dmidecode:
<snip>
Handle 0x0004, DMI type 4, 35 bytes
Processor Information
Socket Designation: CPU Socket #0
Type: Central Processor
Family: Other
Manufacturer: GenuineIntel
ID: A7 06 02 00 FF FB EB BF
<snip>
Note that the order of the bytes is reversed: Windows presents the value as one big-endian number, while dmidecode lists the bytes in the little-endian order in which they are stored.
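If all you want is the Windows-style string, a small Python sketch of that byte-order conversion, using the ID line from the dmidecode output above:

# Sketch: turn dmidecode's "ID: A7 06 02 00 FF FB EB BF" into the
# big-endian hex string Windows reports.
dmi_id = "A7 06 02 00 FF FB EB BF"
raw = bytes.fromhex(dmi_id.replace(" ", ""))
print(raw[::-1].hex().upper())                  # BFEBFBFF000206A7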
If you don't have root permissions, it is almost possible to reconstruct the ProcessorID from /proc/cpuinfo by binary-encoding the values it reports. For the "signature" (the low four bytes of the Windows value, i.e. the first four bytes in the dmidecode output above) you can binary-encode the identification values extracted from /proc/cpuinfo to conform to Figure 5-2 of the Intel documentation (other manufacturers use the same layout for compatibility):
Stepping is in bits 3-0
Model is in bits 19-16 and 7-4
Family is in bits 27-20 and 11-8
Processor type is in bits 13-12 (/proc/cpuinfo doesn't tell you this, assume 0)
Similarly, you can populate the remaining four bytes by iterating over the feature flags (flags key in /proc/cpuinfo) and setting bits as appropriate per Table 5-5 of the Intel doc linked above.
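A Python sketch of that signature encoding; it assumes the family/model/stepping numbers are the decoded values /proc/cpuinfo shows (which already combine the extended and base fields, so they have to be split back apart):

# Sketch: rebuild the CPUID leaf-1 EAX signature from /proc/cpuinfo values,
# e.g. cpu family 6, model 42, stepping 7 for the CPU in the example above.
def encode_signature(family, model, stepping, proc_type=0):
    ext_family, base_family = (family - 15, 15) if family >= 15 else (0, family)
    if base_family in (6, 15):                  # extended model is only used here
        ext_model, base_model = model >> 4, model & 0xF
    else:
        ext_model, base_model = 0, model
    return ((ext_family << 20) | (ext_model << 16) | (proc_type << 12)
            | (base_family << 8) | (base_model << 4) | (stepping & 0xF))

print(hex(encode_signature(6, 42, 7)))          # 0x206a7, matching the example CPU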
Finally, you can install the cpuid package (e.g., on Ubuntu, sudo apt-get install cpuid), run cpuid -r (raw), and parse its output. You would combine the values from the EAX and EDX registers returned for an initial EAX value of 1:
$ cpuid -r
CPU 0:
0x00000000 0x00: eax=0x0000000d ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
0x00000001 0x00: eax=0x000206a7 ebx=0x00020800 ecx=0x9fba2203 edx=0x1f8bfbff
<snip>
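A Python sketch of that parsing, assuming the output format shown above (it takes the leaf-1 line of the first CPU):

import re, subprocess

# Sketch: pull EAX/EDX for input EAX=1 out of `cpuid -r` and format them
# the way WMI does (EDX first, then EAX, as one big-endian hex string).
out = subprocess.run(["cpuid", "-r"], capture_output=True, text=True).stdout
m = re.search(r"^\s*0x00000001 0x00: eax=0x([0-9a-f]{8}) .* edx=0x([0-9a-f]{8})",
              out, re.MULTILINE)
eax, edx = m.group(1), m.group(2)
print((edx + eax).upper())                      # 1F8BFBFF000206A7 for the output above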

Related

What is the final signature in the boot sector? Is it 0xaa55 or 0x55aa?

As I was studying the BIOS and booting I found that 0xaa55 is used as a signature to check whether the 512-byte boot sector is corrupted or not. Now in some cases it is mentioned as 0xaa55 and in others as 0x55aa, and I am confused. I think it is because of the endianness of the CPU, but shouldn't the endianness apply to the word? Hence either 0x0000aa55 or 0x55aa0000 (assuming a 32-bit word). Or does the boot sector inherently use 16-bit words?
Historically, that functionality of the PC BIOS predates 32-bit x86 CPUs and the boot process as we know it dates back to the 16-bit 8086. Some elements may be older than that, but that's a bit before my time.
You're right that the confusion is endian related. The last byte of the 512-byte boot sector is aa and the byte before it is 55. So it's common to refer to the signature as 55aa as that's the order it appears byte-wise in the boot sector. It's also the order it appears in memory as individual bytes - when the boot sector is loaded at 0000:7c00, the byte at 7dff is aa.
However, when expressed as a 16-bit word on the little-endian x86, it's aa55. Because of that ordering, it's common to put the end-of-block signature in an assembly file like this:
.word 0xaa55 # this is AT&T syntax
because it's simpler than two .byte directives, which would be in reverse order:
.byte 0x55
.byte 0xaa
so people sometimes refer to the signature in that manner.
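A small Python sketch makes the two views concrete (boot.img is a hypothetical 512-byte boot sector image, named here only for illustration):

import struct

# Sketch: the last two bytes of a boot sector are 55 aa on disk, but read
# back as a little-endian 16-bit word they come out as 0xaa55.
with open("boot.img", "rb") as f:                     # hypothetical image file
    sector = f.read(512)
print(sector[510:512].hex())                          # '55aa' (byte order on disk)
print(hex(struct.unpack_from("<H", sector, 510)[0])) # '0xaa55' (as a 16-bit word)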

Linux x86 CPU Instruction Layout Confusion

In x86, I understand multi-byte objects are stored in memory little endian style.
Now generally speaking, when it comes to CPU instructions, the opcode determines the purpose of the instruction, and data/memory addresses may follow the opcode in its encoded form. My understanding is that the opcode portion of the instruction should be the most significant byte and thus appear at the highest address of any given instruction encoding.
Can someone explain the memory layout in this x86 Linux gdb example? I would imagine that the opcode 0xb8 would appear at a higher address due to it being the most significant byte.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x08048080 <+0>: mov eax,0x11223344
(gdb) x/1xb _start+0
0x8048080 <_start>: 0xb8
(gdb) x/1xb _start+1
0x8048081 <_start+1>: 0x44
(gdb) x/1xb _start+2
0x8048082 <_start+2>: 0x33
(gdb) x/1xb _start+3
0x8048083 <_start+3>: 0x22
(gdb) x/1xb _start+4
0x8048084 <_start+4>: 0x11
It appears the instruction mov eax, 0x11223344 is encoded as 0x11 0x22 0x33 0x44 0xb8.
Questions.
1.) How does the CPU know how many bytes the instruction will take up if the first byte it sees is not an opcode?
2.) I'm wondering if perhaps x86 CPU instructions do not even have endianness and are considered some type of string? (probably way off here)
x86 is a variable-length instruction set: you start with a single byte, which has no endianness; it is wherever it is.
Then, depending on the opcode, there may be more bytes, and those might for example be a 32-bit immediate; if that group of bytes is an immediate or address of some sort, THOSE bytes will be little-endian. Say you have the five bytes ABCDE (no endianness, think array or string). The A byte is the opcode, the B byte would then be the lower 8 bits of the immediate and the E the upper 8 bits of the immediate.
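For the example above, a quick Python sketch of that parse, using the bytes gdb showed:

import struct

# Sketch: decode the bytes gdb showed for `mov eax, 0x11223344`.
code = bytes([0xb8, 0x44, 0x33, 0x22, 0x11])
opcode = code[0]                               # 0xb8 = mov eax, imm32
imm32 = struct.unpack_from("<I", code, 1)[0]   # next 4 bytes, little-endian
print(hex(opcode), hex(imm32))                 # 0xb8 0x11223344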
Opcode is a hard term to use precisely. In these older 8/16-bit CISC processors like x86, the entire byte was an opcode that you basically looked up in a table to see what it meant (and inside the processor a table really was used to figure out how to execute it). When you look at MIPS or ARM or other (certainly RISC) instruction sets, only a portion of the 32 bits is the "opcode", and in neither of those cases is it the same set of bits from one instruction to another; you have to look at various places in the instruction (yes, there is overlap to keep the decoding sane). MIPS is a lot more consistent: you have one blob in one place to look at, but one of its patterns requires you to look at another blob of bits to fully decode. With ARM you start at a common bit and, as you work your way across, you further decode the instruction; then you may have to grab some random-looking spots to fully decode it. The rest of the bits are operands: what register to use, an immediate, or whatever, the kind of thing that in a CISC you need a lookup table for (they are implied by the opcode but not defined by bits within it).
1) The next byte after the prior instruction will be interpreted as an opcode even if it was not intended to be one (if execution continues to that byte and doesn't branch away). I don't remember my x86 table offhand to know whether there are any undefined instructions; if a byte is undefined it will hit a handler, otherwise the CPU will decode whatever it finds as machine code, and if those are not properly formed instructions it will likely crash. Sometimes you get lucky and it just messes something up and keeps going, or you are even luckier and you can't tell that it almost crashed.
2) You are right: for these 8/16-bit CISC or similar instruction sets, the instructions are treated more like strings that you parse through linearly.

What exactly is the class byte in JavaCard?

I've started to work with JavaCards and am trying to grasp the meaning of the CLA byte.
Reading section 5.4.1 (Class byte) of ISO/IEC 7816-4:
5.4.1 Class byte
According to table 8 used in conjunction with table 9, the class byte
CLA of a command is used to indicate to what extent the command and the response comply with this part of ISO/IEC 7816 and when applicable (see table 9), the format of secure messaging and the logical channel number.
So... the CLA byte is used for indication, but of what exactly? The table and description are rather difficult for a beginner. I understand that the following CLA bytes are usually used: 0x00, 0x80, 0x84.
For example, reading the content from the table:
'0X'           Structure and coding of command and response according to this part of ISO/IEC 7816 (for coding of 'X' see table 9)
'10' to '7F'   RFU
'FF'           Reserved for PTS
I understand that to develop properly I should read the GlobalPlatform specification, the specification for the exact card (mine is an NXP one) and other related materials, but I must admit that the content is difficult to understand.
I expected something like the following (pseudo-table):
0x00 -> for reading streams from file system
0x01 -> for writing byte buffer to memory blocks
0x02 -> call AES/RSA methods
The CLASS byte is defined in ISO 7816-4. The most significant bit being 0 indicates the interindustry class, and Java Card applets operate within this interindustry scheme. GlobalPlatform is another specification, used to manage and maintain the card, and all of its commands have a class byte in the range 0x80 - 0x8F. Class byte 0xFF is used for communication with the card reader in some cases and is otherwise invalid for a card.
The interindustry CLA byte serves 3 major functions:
Function 1: Chaining
Bit 5 = 1 signals that the current command is not the last command of a chain, meaning that multiple APDUs belong together and the card may therefore do additional things.
Function 2: Secure Messaging
Bits 4+3 signal the secure messaging status of the current command. This means that the APDU is authenticated (e.g. MACed) and the data is encrypted (e.g. with a block cipher). The command header is never encrypted.
Function 3: Logical Channel
Bits 2+1 identify the logical channel number. Logical channels are parallel communication interfaces to the card, so an applet A can be selected on channel 0 and an applet B on channel 1 while both applets retain their internal state (no RAM is reset). Most cards either do not support logical channels or require you to enable them explicitly.
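Putting the three functions together, a minimal Python sketch of decoding a first-interindustry-class CLA byte (0x00-0x0F), using the same bit numbering as above (b1 is the least significant bit):

# Sketch: decode the interindustry fields of a CLA byte in the 0x00-0x0F range.
def decode_cla(cla):
    return {
        "chaining":         bool(cla & 0x10),    # b5: more commands follow
        "secure_messaging": (cla >> 2) & 0x03,   # b4-b3: secure messaging indication
        "logical_channel":  cla & 0x03,          # b2-b1: channel number 0-3
    }

print(decode_cla(0x00))   # the plain case most applets see
print(decode_cla(0x0C))   # secure messaging indicated, channel 0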
The CLA byte is a typical trap for Java Card beginners, and it's usually best to leave it at 0x00 to start with.

How to disassemble these instructions

I am writing a little disassembler using riscv-spec-v2.0 and have some questions about the following instructions and how to correctly disassemble them:
1.
FENCE instruction has "pred" and "succ" bit fields in imm
2.
AMO instructions have "aq" and "rl" bits in funct7
3.
Float instructions have a "rm" bit field in funct3
All of these bit fields seem to lack mappings in the assembler.
E.g. page 50 just says "FENCE" but not what to do with the immediate.
Or page 33 has an example of putting .aq or .rl at the end but not what to do if both are present.
4.
SCALL, SBREAK are the same as ECALL, EBREAK
but there is also ERET, so why not drop SCALL and SBREAK and just use ECALL, EBREAK and ERET? Otherwise it is hard to disassemble these opcodes.
The current RISC-V assembler is terse for common defaults:
1. "FENCE" with no arguments is treated as a full fence (all bits set).
2. It is OK to have both on the same instruction.
3. The rounding mode is not shown if it is not specified.
4. ECALL and EBREAK will be the new standard names (this will be clarified in the revised user ISA manual).
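Following those conventions, a Python sketch of how a disassembler could render the FENCE fields and the AMO suffixes (the bit positions are taken from the riscv-spec-v2.0 instruction layouts; treat them as an assumption to check against the spec):

# Sketch: render FENCE pred/succ and AMO aq/rl the way the assembler writes them.
def iorw(bits):                                  # 4-bit field -> "iorw" subset
    return "".join(c for c, b in zip("iorw", (8, 4, 2, 1)) if bits & b)

def fence_text(insn):
    pred, succ = (insn >> 24) & 0xF, (insn >> 20) & 0xF
    if pred == succ == 0xF:                      # full fence: print no arguments
        return "fence"
    return "fence %s,%s" % (iorw(pred), iorw(succ))

def amo_suffix(insn):
    aq, rl = (insn >> 26) & 1, (insn >> 25) & 1
    return "" if not (aq or rl) else "." + "aq" * aq + "rl" * rl

print(fence_text(0x0ff0000f))                    # fence
print(fence_text(0x0310000f))                    # fence rw,w
print(amo_suffix(1 << 26))                       # .aq
print(amo_suffix(3 << 25))                       # .aqrl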

x64 opcodes and scaled byte index

I think I'm getting the ModR/M byte down but I'm still confused by the effective memory address/scaled indexing byte. I'm looking at these sites: http://www.sandpile.org/x86/opc_rm.htm, http://wiki.osdev.org/X86-64_Instruction_Encoding. Can someone encode an example with the destination address being in a register where the SIB is used? Say, for example, adding an 8-bit register to an address in an 8-bit register with the SIB used?
Also, when I use the ModR/M byte of 0x05, is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64-bit mode?
Is the SIB always used with a source or destination address?
A memory address is never in an 8-bit register, but here's an example of using SIB:
add byte [rax + rdx], 1
This is an instance of add rm8, imm8, 80 /0 ib. /0 indicates that the reg field in the ModR/M byte is zero. We must use a SIB here but don't need an immediate offset, so we can use 00b for the mod and 100b for the rm, forming 04h for the ModR/M byte (44h and 84h also work, but waste space encoding a zero offset). Looking in the SIB table now, there are two registers, both with "scale 1", so the base and index are mostly interchangeable (rsp cannot be an index, but we're not using it here). So the SIB byte can be 10h or 02h.
Just putting the bytes in a row now:
80 04 10 01
; or
80 04 02 01
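A Python sketch of how those fields pack into the bytes above (field layout per the ModR/M and SIB tables at the links in the question):

# Sketch: pack the ModR/M and SIB fields for `add byte [rax + rdx], 1`.
def modrm(mod, reg, rm):
    return (mod << 6) | (reg << 3) | rm

def sib(scale, index, base):
    return (scale << 6) | (index << 3) | base

RAX, RDX = 0, 2
insn = bytes([0x80,                      # add rm8, imm8
              modrm(0b00, 0, 0b100),     # mod=00, reg=/0, rm=100 -> SIB follows
              sib(0b00, RDX, RAX),       # scale 1, index=rdx, base=rax
              0x01])                     # imm8
print(insn.hex(" "))                     # 80 04 10 01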
Also when I use the ModR/M byte of 0x05 is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64 bit mode?
Yes. You saw the note, I'm sure. So it can be either, depending on whether you used an address-size override or not. In every reasonable case it will be rip + sdword. Using the other form gives you a truncated result; I can't immediately imagine any circumstances under which that makes sense to do (for general lea math, sure, but not for pointers). Probably (this is speculation though) that possibility only exists to make the address-size override work reasonably uniformly.
Is the SIB always used with a source or destination address?
Depends on what you mean. Certainly, if you have a SIB, it will encode a source or destination (because what else is there?) (you might argue that the SIB that can appear in nop rm encodes nothing because nop has neither sources nor destinations). If you mean "which one does it encode", it can be either one. Looking over all instructions, it can most often appear in a source operand. But obviously there are many cases where it can encode the destination - example: see above. If you mean "is it always used", well no, see that table that you were looking at.
