How does the instruction pointer work with a stack? - Befunge

I recently read about Befunge, an esoteric programming language that operates on a stack. The Wikipedia article contains this text:
The instruction pointer begins at a set location (the upper-left corner of the playfield) and is initially travelling in a set direction (right). As it encounters instructions, they are executed. Befunge-93 has no jumps farther than two cells, so control flow is done by altering the direction of the program counter, sending it to different literal code paths. (Befunge-98 has far jumps, but because of inertia in the community, these are rarely used.) The following, for example, is an infinite loop:
>v
^<
I don't understand what "instruction pointer" means. Can somebody explain it to me in simpler terms?

This is what the linked Wikipedia page says right at the beginning. Emphasis is mine.
A Befunge program is laid out on a two-dimensional playfield of fixed size. The playfield is a rectangular grid of ASCII characters, each generally representing an instruction. The playfield is initially loaded with the program.
Execution proceeds by the means of a program counter (-93) or instruction pointer (-98). This points to a grid cell on the playfield.
The terms program counter and instruction pointer mean the same thing: an abstract "variable" that holds a pointer to the next instruction to execute. You can literally use your pointer finger (sic!) to read the next character of the program (a grid cell), which encodes the instruction.
In this case the instruction pointer needs to hold at least two indices, one for the row and one for the column of the current cell. One would also store the current direction in order to advance.
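As a rough sketch (in C, with invented names - this is not any particular interpreter's code), the instruction pointer is just a row, a column, and a direction, advanced one cell per step:

/* Sketch of the state a Befunge-93 interpreter keeps for its instruction
 * pointer. The playfield is conventionally 80x25 and wraps at the edges. */
#define WIDTH  80
#define HEIGHT 25

enum direction { RIGHT, LEFT, UP, DOWN };

struct instruction_pointer {
    int row, col;        /* which grid cell we are about to execute */
    enum direction dir;  /* which way we are currently travelling  */
};

/* Move one cell in the current direction, wrapping around the playfield. */
static void advance(struct instruction_pointer *ip) {
    switch (ip->dir) {
    case RIGHT: ip->col = (ip->col + 1) % WIDTH;           break;
    case LEFT:  ip->col = (ip->col + WIDTH - 1) % WIDTH;   break;
    case DOWN:  ip->row = (ip->row + 1) % HEIGHT;          break;
    case UP:    ip->row = (ip->row + HEIGHT - 1) % HEIGHT; break;
    }
}

/* The interpreter's main loop would then look roughly like:
 *   char op = playfield[ip.row][ip.col];
 *   execute(op);        // e.g. '>' sets ip.dir = RIGHT, 'v' sets DOWN
 *   advance(&ip);
 */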


Many of the Rust compiler's target definitions use "p270:32:32-p271:32:32-p272:64:64" inside the data layout - what does it mean?

A few days ago, a question about the basic structure of the data layout strings of the Rust compiler - or, more specifically, of the underlying LLVM - was mostly resolved on Stack Overflow. Unfortunately, one thing is still unclear.
Many Rust compiler targets include p270:32:32-p271:32:32-p272:64:64 inside their data layout string. Examples are i686-unknown-uefi, x86_64-uwp-windows-msvc, x86_64-unknown-uefi, x86_64-unknown-linux-gnu, x86_64-fuchsia, and x86_64-apple-darwin.
(These targets can be found here https://github.com/rust-lang/rust/tree/1.52.1/compiler/rustc_target/src/spec.)
The LLVM Language Reference explains:
p[n]:<size>:<abi>:<pref>:<idx>
This specifies the size of a pointer and its <abi> and <pref>erred alignments for address space n. The fourth parameter <idx> is the size of the index that is used for address calculation. If not specified, the default index size is equal to the pointer size. All sizes are in bits. The address space, n, is optional, and if not specified, denotes the default address space 0. The value of n must be in the range [1,2^23).
I don't understand this. What is so special about p270 to p272? To which "address space" do they refer?
These data layout strings were committed to Rust on 2020-01-07. The commit message says "Update data layouts to include new X86 address spaces". After more research I found that the underlying functionality was merged into LLVM in 2019. These new address spaces model MSVC's __ptr32, __ptr64, __sptr, and __uptr extensions [1].
Quote from the LLVM discussion:
The numbers 270-272 are more or less arbitrary; I picked them because they're near 256-258, which are the current existing address spaces.
If you look into X86.h in LLVM's source code, you can see that these numbers are used as identifiers and were chosen arbitrarily rather than for technical reasons.
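For reference, here is a tiny C sketch of the MSVC extensions mentioned above (MSVC-only keywords; the exact mapping of each qualifier onto address space 270, 271, or 272 comes from that commit and is not something this snippet verifies):

/* Compile with MSVC (cl.exe) targeting x64. __ptr32 and __ptr64 are
 * Microsoft-specific keywords; __sptr / __uptr additionally control whether
 * a 32-bit pointer is sign- or zero-extended to 64 bits. LLVM's address
 * spaces 270-272 exist so these mixed pointer sizes can be represented. */
#include <stdio.h>

int main(void) {
    int * __ptr32 p32 = 0;   /* 32-bit pointer, even in a 64-bit build */
    int * __ptr64 p64 = 0;   /* ordinary 64-bit pointer                */

    printf("sizeof p32 = %zu, sizeof p64 = %zu\n", sizeof p32, sizeof p64);
    return 0;
}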

How to represent a method call in a loop in a sequence diagram?

I know that one uses a box with "loop" in the corner when a loop involves several objects. Is it the same with just one object?
Yes. You can place a loop fragment over a self-call, and that simply means a repeated call. A guard placed in square brackets (top left of the fragment) tells when the loop ends.
However, modelling such details graphically isn't the best idea at all. For such trivial cases a note or pseudo code is better suited. In most cases that information is obvious from the context anyway and should not be presented graphically. Don't underestimate the programmers doing the coding in the end.

Conflicting alignment rules for structs vs. arrays

As the title implies, the question concerns the alignment of aggregate types on x86-64 Linux.
In our lecture, the professor introduced the alignment of structs (and of their elements) with the attached slide. Hence, I would assume (in accordance with Wikipedia and other lecture material) that for any aggregate type, the alignment corresponds to that of its largest member. Unfortunately, this does not seem to be the case in a past exam question, which said:
"Assuming that each page table [4kB, each PTE 64b] is stored in memory
at a “naturally aligned” physical address (i.e. an address which is an
integer multiple of the size of the table), ..."
How come that for a page table (which AFAIK is basically an array of 8-byte values in memory), the alignment is not according to the largest element but to the size of the whole table?
Clarification is greatly appreciated!
Felix
Why page tables are aligned on their size
For a given level in the process of translating a virtual address, requiring the current page table to be aligned on its size in bytes speeds up the indexing operation.
The CPU doesn't need to perform an actual addition to find the base of the next-level page table; it can scale the index and then replace the lowest bits of the current level's base.
You can convince yourself this is indeed the case with a few examples.
It's not a coincidence that x86 CPUs follow this alignment too.
For example, regarding the 4-level paging for 4KiB pages of the x86 CPUs, the Page Directory Pointer field of a 64-bit address is 9 bits wide.
Each entry in that table (a PDPTE) is 64 bits, so the table size is 512 * 8 = 4096 bytes (4 KiB) and the last entry has offset 511 * 8 = 4088 (0xff8 in hex), i.e. at most 12 bits are used.
The address of a Page Directory Pointer table is given by a PML4 entry; these entries don't specify the lower 12 bits of the base (which are used for other purposes), only the upper bits.
The CPU can then simply replace the lower 12 bits of the base from the PML4 entry with the offset of the PDPTE, since we have seen that this offset fits in 12 bits.
This is fast and cheap to do in hardware (no carry, easy to do with registers).
Assume that a country has ZIP codes made of two fields: a city code (C) and a block code (D), added together.
Also, assume that there can be at most 100 block codes for a given city, so D is 2 digits long.
Requiring that the city code is aligned on 100 (which means that the last two digits of C are zero) makes C + D like replacing the last two digits of C with D.
(1200 + 34 = 12|34).
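To see the trick in code, here is a small C sketch (constants follow x86-64 4-level paging; the variable names are my own): because a naturally aligned table base has its low 12 bits clear, OR-ing in the scaled index gives the same result as an addition, with no carry to propagate.

#include <stdint.h>
#include <stdio.h>

/* Sketch only: shows why natural alignment lets the page walker merge the
 * scaled index into the table base. 512 entries of 8 bytes = a 4096-byte
 * table, so offsets fit in 12 bits and the base's low 12 bits are zero. */
int main(void) {
    uint64_t table_base = 0x12345A000ULL;  /* 4 KiB aligned: low 12 bits are 0 */
    uint64_t index      = 0x1B3;           /* 9-bit index from the virtual address */
    uint64_t offset     = index << 3;      /* scale by the 8-byte entry size, <= 4088 */

    /* No carry can occur, so OR and ADD agree - but OR is trivial in hardware. */
    printf("OR : %#llx\n", (unsigned long long)(table_base | offset));
    printf("ADD: %#llx\n", (unsigned long long)(table_base + offset));
    return 0;
}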
Relation with the alignment of aggregates
A page table is not regarded as an aggregate, i.e. as an array of 8-byte elements. It is regarded as a type of its own, defined by the ISA of the CPU, which must satisfy the requirements of the particular part of the CPU that uses it.
The page walker finds it convenient to have page tables aligned on their size, so that is the requirement.
The alignment of aggregates is a set of rules used by the compiler to allocate objects in memory; it guarantees that every element's alignment is satisfied, so that instructions can access any element without alignment penalties/faults.
The execution units for loads and stores are a different part of the CPU than the page walker, so they have different needs.
You should use the aggregate alignment rules to know how the compiler will align your structs and then check whether that's enough for your use case.
Exceptions exist
Note that the professor went out of their way to explain what being aligned on their natural boundary means for page tables.
Exceptions exist: if you are told that a datum must be aligned on X, you can assume there's some hardware trick/simplification involved and try to see which one, but in the end you just honor the alignment and move on.
Margaret explained why page tables are special; I'm only answering this other part of the question:
"according to the largest element"
That's not the rule for normal structs either. You want max(alignof(member)) not max(sizeof(member)). So "according to the most-aligned element" would be a better way to describe the required alignment of a normal struct.
e.g. in the i386 System V ABI, double has sizeof = 8 but alignof = 4, so alignof(struct S1) = 4 (see footnote 1).
Even if the char member had been last, sizeof(struct S1) still has to be padded to a multiple of its alignof(), so all the usual invariants are maintained (e.g. sizeof(array of N structs) = N * sizeof(struct S1)), and so stepping by sizeof always gets you to a sufficiently aligned boundary for the start of a new struct.
Footnote 1: That ABI was designed before CPUs could efficiently load/store 8 bytes at once. Modern compilers try to give double and [u]int64_t 8-byte alignment, e.g. as globals or locals outside of structs. But the ABI's struct layout rules fix the layout based on the minimum guaranteed alignment for any double or int64_t object, which is alignof(T) = 4 for those types.
x86-64 System V has alignof(T) = sizeof(T) for all the primitive types, including the 8-byte ones. This makes atomic operations on any properly-aligned int64_t possible, for example, simplifying the implementation of C++20 std::atomic_ref to not have to check for sufficient alignment. (Why is integer assignment on a naturally aligned variable atomic on x86?)
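To make that concrete, here is a small C sketch (illustrative only - this is not the S1 from the slide, whose definition isn't reproduced here). The reported numbers depend on the target ABI: roughly sizeof 12 / alignof 4 when built under the i386 System V rules (e.g. gcc -m32) versus 16 / 8 under x86-64 System V.

#include <stdio.h>
#include <stdalign.h>

/* Illustrative struct, similar in spirit to the slide's S1. */
struct example {
    char   c;
    double d;
};

int main(void) {
    /* x86-64 System V: alignof == 8, sizeof == 16 (alignof(double) == 8).
     * i386 System V (gcc -m32): alignof == 4, sizeof == 12, because that ABI
     * only guarantees 4-byte alignment for double inside structs. */
    printf("alignof = %zu, sizeof = %zu\n",
           alignof(struct example), sizeof(struct example));
    return 0;
}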

STEP (ISO 10303-21) Unset Attributes

I've been building a parser for STEP-formatted data (specifically the ISO 10303-21 standard), but I've run into a roadblock revolving around a single character - '$'.
A quick Google search reveals that in STEP this character denotes an 'unset' value. I interpreted this as an uninitialized value, but I don't know exactly what I should do with it.
For example, take the following definitions:
#111=AXIS2_PLACEMENT_3D('Circle Axis2P3D',#109,#110,$) ;
#109=CARTESIAN_POINT('Axis2P3D Location',(104.14,0.,0.)) ;
#110=DIRECTION('Axis2P3D Direction',(1.,-0.,0.)) ;
I cannot see how this would even work: the reference direction is uninitialized, so an x-axis cannot be derived, meaning that anything using this Axis2Placement would also have undefined data.
Normally when I reach this point, I would just define some sort of default data for the given data type (vertices (0,0,0), directions (1,0,0), etc.); however, this doesn't seem to work here, because there's a chance that my default direction would conflict with the supplied data.
I have Googled what to do in this scenario, only to come up with nothing.
I also have a PDF for a very similar STEP format (ISO 10303-42), but it too doesn't mention any sort of default data or provide any more insight into how this works.
So, to explicitly state my question: what do I do with uninitialized data in STEP (ISO 10303-21) data?
You need to be able to represent 'unset' as a distinct value. It doesn't mean the same thing as an uninitialized value or a default value. For example, you might represent AXIS2_PLACEMENT_3D as an object whose data members are pointers to CARTESIAN_POINT and DIRECTION objects, and the $ means that the corresponding pointer will be null in your representation.
How you deal with null values will depend on the context. It may be some kind of error if the data is really necessary. Or it may be that the data isn't necessary, such as when you don't need the axis to be oriented and a point and a direction are sufficient to represent the data.
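As a minimal C sketch of that representation (the type and member names are invented for illustration and do not come from the ISO schema), $ simply becomes a null pointer:

#include <stddef.h>

/* Hypothetical in-memory types for a parser; names are made up and are not
 * taken from ISO 10303. */
typedef struct { double x, y, z; }    cartesian_point;
typedef struct { double dx, dy, dz; } direction;

typedef struct {
    const char      *name;
    cartesian_point *location;       /* #109 */
    direction       *axis;           /* #110 */
    direction       *ref_direction;  /* '$' in the file becomes NULL here */
} axis2_placement_3d;

/* Consumers then decide, per context, what a missing attribute means. */
static int has_ref_direction(const axis2_placement_3d *p) {
    return p->ref_direction != NULL;
}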
In this case, #111 is a local coordinate system with the following 4 attributes:
a name (character string);
a pointer to the origin (#109);
a pointer to the axis (#110);
a pointer to the second axis (the reference direction).
If #111 is the coordinate system of a circle (which I guess from the 'name' value), the axis is the normal of the circle's plane, while the reference direction points to the start of the circle (the position of the zero t parameter of the circle). Since a circle is a closed curve, you can place this zero-t position anywhere; it has no influence on the circle's geometric shape. The reference direction in this case is not mandatory and is therefore omitted. It is your choice how you handle this situation.
When the $ sign is used, the value is not required. Specifically, if there is a series of optional values and you want to specify a value for, let's say, the 3rd optional argument but not for the 1st and 2nd, you can use the $ sign for those two.
Take a look here for a better description.

How is integer overflow exploitable?

Does anyone have a detailed explanation of how integer overflows can be exploited? I have been reading a lot about the concept; I understand what an overflow is, and I understand buffer overflows, but I don't understand how one could reliably modify memory, or alter application flow, by making an integer larger than its defined storage...
It is definitely exploitable, but it depends on the situation, of course.
Old versions of ssh had an integer overflow which could be exploited remotely. The exploit caused the ssh daemon to create a hash table of size zero and overwrite memory when it tried to store some values in there.
More details on the ssh integer overflow: http://www.kb.cert.org/vuls/id/945216
More details on integer overflow: http://projects.webappsec.org/w/page/13246946/Integer%20Overflows
I used APL/370 in the late 60s on an IBM 360/40. APL is a language in which essentially everything is a multidimensional array, and there are amazing operators for manipulating arrays, including reshaping from N dimensions to M dimensions, etc.
Unsurprisingly, an array of N dimensions had index bounds of 1..k, with a different positive k for each axis, and k was legally always less than 2^31 (positive values in a 32-bit signed machine word). Now, an array of N dimensions has a location assigned in memory. Attempts to access an array slot using an index too large for an axis are checked against the array's upper bound by APL. And of course this also applied to arrays of N dimensions where N == 1.
APL didn't check whether you did something incredibly stupid with the RHO (array reshape) operator. APL only allowed a maximum of 64 dimensions. So you could make an array of 1-64 dimensions, and APL would do it if the array dimensions were all less than 2^31. Or you could try to make an array of 65 dimensions. In that case, APL goofed, and surprisingly gave back a 64-dimensional array, but failed to check the axis sizes.
(This is in effect where the "integer overflow" occurred.) This meant you could create an array with axis sizes of 2^31 or more... but since they were interpreted as signed integers, they were treated as negative numbers.
The right RHO operator incantation applied to such an array could reduce the dimensionality to 1, with an upper bound of, get this, "-1". Call this array a "wormhole" (you'll see why in a moment). Such a wormhole array has a place in memory, just like any other array. All array accesses are checked against the upper bound... but the bound check turned out to be done by APL with an unsigned compare. So you could access WORMHOLE[1], WORMHOLE[2], ... WORMHOLE[2^32-2] without objection. In effect, you could access the entire machine's memory.
APL also had an array assignment operation with which you could fill an array with a value.
WORMHOLE[]<-0 thus zeroed all of memory.
I only did this once, as it erased the memory containing my APL workspace, the APL interpreter, and obviously the critical part of APL that enabled timesharing (in those days it wasn't protected from users)... The terminal room went from its normal state of mechanically very noisy (we had 2741 Selectric APL terminals) to dead silent in about 2 seconds.
Through the glass into the computer room I could see the operator look up, startled, at the lights on the 370 as they all went out. Lots of running around ensued.
While it was funny at the time, I kept my mouth shut.
With some care, one could obviously have tampered with the OS in arbitrary ways.
It depends on how the variable is used. If you never make any security decisions based on integers you have computed from input integers (where an adversary could provoke an overflow), then I can't think of how you would get into trouble (but this kind of stuff can be subtle).
Then again, I have seen plenty of code like this that doesn't validate user input (although this example is contrived):
int pricePerWidgetInCents = 3199;
int numberOfWidgetsToBuy = int.Parse(/* some user input string */);
int totalCostOfWidgetsSoldInCents = pricePerWidgetInCents * numberOfWidgetsToBuy; // KA-BOOM!
// potentially much later
int orderSubtotal = whatever + totalCostOfWidgetsSoldInCents;
Everything is hunky-dory until the day you sell 671,299 widgets for -$21,474,817.95. Boss might be upset.
A common case would be code that guards against a buffer overflow by asking for the number of inputs that will be provided and then trying to enforce that limit. Consider a situation where I claim to be providing 2^30+10 integers. The receiving system allocates a buffer of 4*(2^30+10) bytes, which wraps around modulo 2^32 to just 40 bytes (!). Since the memory allocation succeeded, I'm allowed to continue. The input-count check won't stop me when I send my 11th input, since 11 < 2^30+10, yet I will overflow the actually allocated buffer.
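Here is a hedged C sketch of that scenario (names are invented; the wrap only happens when the size is computed in 32-bit arithmetic, e.g. with a 32-bit size_t):

#include <stdint.h>
#include <stdlib.h>

/* Illustration of the scenario above, not code from any real product.
 * The size is deliberately computed in 32-bit arithmetic to mirror a
 * 32-bit size_t: 4 * (2^30 + 10) wraps modulo 2^32 to just 40 bytes. */
int read_inputs(uint32_t claimed_count, const int *stream) {
    uint32_t bytes = claimed_count * (uint32_t)sizeof(int);  /* wraps! */

    int *buf = malloc(bytes);          /* tiny allocation succeeds */
    if (buf == NULL)
        return -1;

    /* The per-element check trusts claimed_count, so element 11, 12, ...
     * are all "in range" - and all past the end of the 40-byte buffer. */
    for (uint32_t i = 0; i < claimed_count; i++)
        buf[i] = stream[i];

    free(buf);
    return 0;
}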
I just wanted to sum up everything I have found out about my original question.
The reason things were confusing to me is that I know how buffer overflows work and can understand how you can easily exploit them. An integer overflow is a different case - you can't exploit the integer overflow itself to inject arbitrary code and force a change in the flow of an application.
However, it is possible to overflow an integer that is used - for example - to index an array, and thereby access arbitrary parts of memory. From there, it may be possible to use that mis-indexed array to overwrite memory and alter the execution of the application to suit your malicious intent.
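A contrived C sketch of that last point (names invented; note that signed overflow is technically undefined behaviour in C - the example only shows the typical wrap-around outcome an attacker relies on):

#define TABLE_SIZE 64

static int table[TABLE_SIZE];

/* Contrived: an attacker-influenced addition wraps to a negative value,
 * slips past the signed bounds check (which forgets to test idx >= 0),
 * and the write lands outside the array. */
void store(int base, int attacker_offset, int value) {
    int idx = base + attacker_offset;   /* may wrap to a negative number */

    if (idx < TABLE_SIZE)               /* negative idx passes this check */
        table[idx] = value;             /* out-of-bounds write */
}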
Hope this helps.

Resources