Deadlock questions (OS related) - resources

Consider a computer with five individual resources named R1, …, R5. Let five processes P1, …, P5 make requests in the following order:
i. P1 requests R2
ii. P4 requests R3
iii. P3 requests R1
iv. P2 requests R4
v. P5 requests R5
vi. P4 requests R2
vii. P5 requests R3
viii. P3 requests R5
Assume a process is granted a requested resource whenever that resource is currently available; otherwise it waits.
Is there deadlock and if so at what point did it occur and which processes did it involve?
Can anyone please help me out? For the first one I was thinking there's no deadlock, but I'm not sure how to prove it.
Thanks!

Assuming that's the actual sequence of events, there is no deadlock there. Initially, all resources are free. Running those requests in sequence:
1. P1 requests R2: p1 has r2.
2. P4 requests R3: p4 has r3.
3. P3 requests R1: p3 has r1.
4. P2 requests R4: p2 has r4.
5. P5 requests R5: p5 has r5.
6. P4 requests R2: p4 has r3, awaiting r2(p1).
7. P5 requests R3: p5 has r5, awaiting r3(p4).
8. P3 requests R5: p3 has r1, awaiting r5(p5).
So the current state is:
p1 has r2.
p2 has r4.
p3 has r1, awaiting r5(p5).
p4 has r3, awaiting r2(p1).
p5 has r5, awaiting r3(p4).
and the chain of waits is (waiter -> blocker):
p4 -> p1, p5 -> p4, p3 -> p5
or:
p3 -> p5 -> p4 -> p1
Because there is no cycle in there, deadlock has not happened.
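If you want to check that mechanically, you can encode the wait-for edges and look for a cycle. Here is a minimal C sketch, not part of the original answer: the waits_for table encodes the state above, and the two-pointer cycle check (Floyd's tortoise-and-hare) is just one convenient way to walk chains where every process waits on at most one other.

#include <stdio.h>

#define NPROC 5

int main(void) {
    /* waits_for[i] = index of the process that P(i+1) is blocked on, or -1 if it
       is not waiting. State after step 8: P3 -> P5, P4 -> P1, P5 -> P4. */
    int waits_for[NPROC] = { -1, -1, 4, 0, 3 };

    for (int start = 0; start < NPROC; start++) {
        int slow = start, fast = start;
        /* Every node has at most one outgoing edge, so following the chain with
           two pointers (one moving twice as fast) detects any cycle. */
        while (fast != -1 && waits_for[fast] != -1) {
            slow = waits_for[slow];
            fast = waits_for[waits_for[fast]];
            if (slow == fast) {
                printf("deadlock: cycle reachable from P%d\n", start + 1);
                return 1;
            }
        }
    }
    printf("no cycle in the wait-for graph, so no deadlock\n");
    return 0;
}

For the state above it prints the "no cycle" line; if you add an edge that closes the loop (say P1 waiting on P3, i.e. waits_for[0] = 2), it reports the deadlock instead.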
Further proof can be obtained by simply having the non-waiters release their resources and following the chain of events:
9. p1 releases r2, frees it up for p4 (p3 and p5 blocked, p4 has r3/r2).
10. p4 releases r3, frees it up for p5 (p3 blocked, p5 has r5/r3).
11. p5 releases r5, frees it up for p3 (nobody is blocked, p3 has r5).
The final state after those steps is:
p1 has nothing.
p2 has r4.
p3 has r1/r5.
p4 has r2.
p5 has r3.
Now there are no blockages and each thread can simply release whatever resources it still has allocated.
Now, were you to expand your question to ask if deadlock were possible if those operations could happen in any order (while maintaining order within each thread), the answer is still no.
It's a basic tenet of multi-threading that you can avoid deadlock by ensuring all your resources are allocated in the same order in every thread. From the operations you gave, the individual threads allocate their resources as follows (order must be maintained within a thread):
P1: R2
P2: R4
P3: R1 R5
P4: R3 R2
P5: R5 R3
So, how can we ensure they're all allocating in the same sequence? We just need to find a sequence that matches. First, we start with the above list but add spaces so that like resources are lined up in columns and no resource appears on both sides of another resource:
P1:             R2
P2: R4
P3:    R1 R5
P4:          R3 R2
P5:       R5 R3
    R4 R1 R5 R3 R2 <==
And there's your sequence. Every thread is allocating resources in the order 4, 1, 5, 3, 2. Not every thread allocates every resource but that's irrelevant here.
That's also not the only solution: R4 is stand-alone, so it could go anywhere in the list, while every other resource is involved in a single dependency chain (1, 5, 3, 2), so their relative positions are fixed.
However, it's sufficient to prove that every thread is allocating the resources in a specific order, hence deadlock is impossible.
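In real code the same rule just means taking the locks in the agreed global order. Here is a minimal pthreads sketch, purely illustrative and not from the question, showing P4 and P5 doing their two acquisitions under the R4, R1, R5, R3, R2 ordering (the mutex names and the empty critical sections are made up):

#include <pthread.h>

/* Mutexes for the resources P4 and P5 use; the agreed global order is R4, R1, R5, R3, R2. */
static pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t r3 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t r5 = PTHREAD_MUTEX_INITIALIZER;

static void *p4(void *arg) {
    pthread_mutex_lock(&r3);   /* needs R3 and R2; R3 comes first in the order */
    pthread_mutex_lock(&r2);
    /* ... use both resources ... */
    pthread_mutex_unlock(&r2);
    pthread_mutex_unlock(&r3);
    return arg;
}

static void *p5(void *arg) {
    pthread_mutex_lock(&r5);   /* needs R5 and R3; R5 comes first in the order */
    pthread_mutex_lock(&r3);
    /* ... use both resources ... */
    pthread_mutex_unlock(&r3);
    pthread_mutex_unlock(&r5);
    return arg;
}

int main(void) {
    pthread_t t4, t5;
    pthread_create(&t4, NULL, p4, NULL);
    pthread_create(&t5, NULL, p5, NULL);
    pthread_join(t4, NULL);
    pthread_join(t5, NULL);
    return 0;
}

Because neither thread ever takes a later lock before an earlier one, the wait-for graph can never contain a cycle, which is exactly the argument above.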

Related

Can more than seven arguments be passed to system call in arm linux?

In ARM Linux (EABI), the system call number is passed in r7 and the arguments are passed in registers r0-r6.
The table below, from syscall(2), shows the registers used to pass system call arguments.
arch/ABI    arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
──────────────────────────────────────────────────────────────
alpha       a0    a1    a2    a3    a4    a5    -
arc         r0    r1    r2    r3    r4    r5    -
arm/OABI    a1    a2    a3    a4    v1    v2    v3
arm/EABI    r0    r1    r2    r3    r4    r5    r6
I am just curious whether seven is the maximum number of arguments that can be passed to a system call on ARM Linux. Is it possible to pass more arguments?
For system calls, passing more than 3-4 arguments is uncommon. The reason for using registers to pass arguments to a system call is that, normally, switching to kernel mode changes the stack, so you would have to access parameters stored on the user stack by inefficient means. When you need to pass more information than fits in 7 registers, you normally pass a pointer to a structure that holds all of it (you have probably already seen this with some system calls on the system you use).
For normal procedure calls the stack is always there, so the maximum number of parameters is not an issue.
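To make the structure-pointer idea concrete, the user-space side looks roughly like the sketch below. SYS_my_op and struct my_op_args are made-up names purely for illustration (no such system call exists, so the call simply fails with ENOSYS); real examples of the pattern are calls like sigaction, which take a pointer to a structure instead of many separate arguments.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical syscall number, for illustration only. */
#define SYS_my_op 4242

/* When an operation needs more values than fit in the argument registers,
   the kernel defines a structure and user space passes a single pointer
   to it, consuming just one register (r0 on arm/EABI). */
struct my_op_args {
    long a, b, c, d, e, f, g, h, i, j;   /* ten "arguments" in one block */
};

int main(void) {
    struct my_op_args args = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    long ret = syscall(SYS_my_op, &args);  /* one register carries the pointer */
    if (ret == -1)
        printf("as expected for a made-up number: %s\n", strerror(errno));
    return 0;
}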

Why is ADD 4 cycles on the Z80?

I am using this ALU block diagram as learning material: http://www.righto.com/2013/09/the-z-80-has-4-bit-alu-heres-how-it.html
I am not familiar with electronics. My current understanding is that a clock cycle is needed to move data from a register or latch to another register or latch, possibly through a network of logic gates.
So here is my understanding of what happens for an ADD:
Cycle 1: move the registers to the internal latches
Cycle 2: move the low nibbles of the internal latches to the internal result latch (through the ALU)
Cycle 3, in parallel:
move the high nibbles of the internal latches to the destination register (through the ALU)
move the internal result latch to the register
I think the cycle 3 operations are done in parallel because there are two 4-bit buses (for the high and low nibbles) and the register bus seems to be 8 bits wide.
Per the z80 data sheet:
The PC is placed on the address bus at the beginning of the M1 cycle.
One half clock cycle later the MREQ signal goes active. At this time
the address to the memory has had time to stabilize so that the
falling edge of MREQ can be used directly as a chip enable clock to
dynamic memories. The RD line also goes active to indicate that the
memory read data should be enabled onto the CPU data bus. The CPU
samples the data from the memory on the data bus with the rising edge
of the clock of state T3 and this same edge is used by the CPU to turn
off the RD and MREQ signals. Thus, the data has already been sampled
by the CPU before the RD signal becomes inactive. Clock state T3 and
T4 of a fetch cycle are used to refresh dynamic memories. The CPU uses
this time to decode and execute the fetched instruction so that no
other operation could be performed at this time.
So it appears mostly to be about memory interfacing to read the opcode rather than actually doing the addition; decode and execution occur entirely within clock states T3 and T4. Given that the Z80 has a 4-bit ALU, it takes two passes to perform an 8-bit addition, which likely explains the use of those two cycles.

Easiest way to access secure (TrustZone) instructions from privileged context on Cortex A8/DM3730

I have a pretty weird thing I need to do: access some "secure" instructions for things that don't really need to be done in a secure context. In short: I need to get into Secure Mode, but not because I want hardware TPM-ish functionality or anything. I just need access to certain instructions that I wouldn't otherwise have.
We're doing this on Gumstix Overo FireSTORM COMs. It is my understanding that these boot securely but then switch to non-secure mode somewhere (MLO? u-boot?), though I could be wrong. The point is that we're certainly doing this from non-secure (but privileged, see below) mode.
(I authored this question, about direct access to the GHB/BTB of the A8 branch predictor, if you're curious about what I'm looking to do: Write directly to the global history buffer (GHB) or BTB in the branch predictor of a ARM Cortex A8?)
Now, all of this will be done from u-boot (we've got Overo FireSTORM COMs), so luckily I have "privileged" execution. No worries there. I've looked at other StackOverflow questions, but there doesn't seem to be anything on how, exactly, to get to secure mode. All I really want to do is access some CP15 registers and then go back to non-secure mode (and potentially repeat the process).
I've looked into the SMC instruction, but I can't find any documentation on how to appropriately trap the call/where the call goes to/how to set that up, etc.
Is that information anywhere?
To recap, here's what I want to do:
FROM PRIVILEGED EXECUTION:
Do stuff
Tweak GHB // requires secure execution
Do more stuff
Tweak GHB
Do more stuff
...
...
...
Do stuff
Any help would CERTAINLY be appreciated!
Thanks to @artlessnoise, I found this file in the u-boot source: /u-boot/arch/arm/cpu/armv7/nonsec_virt.S.
It contains the following code:
/*
 * secure monitor handler
 * U-boot calls this "software interrupt" in start.S
 * This is executed on a "smc" instruction, we use a "smc #0" to switch
 * to non-secure state.
 * We use only r0 and r1 here, due to constraints in the caller.
 */
.align 5
_secure_monitor:
    mrc     p15, 0, r1, c1, c1, 0        @ read SCR
    bic     r1, r1, #0x4e                @ clear IRQ, FIQ, EA, nET bits
    orr     r1, r1, #0x31                @ enable NS, AW, FW bits
#ifdef CONFIG_ARMV7_VIRT
    mrc     p15, 0, r0, c0, c1, 1        @ read ID_PFR1
    and     r0, r0, #CPUID_ARM_VIRT_MASK @ mask virtualization bits
    cmp     r0, #(1 << CPUID_ARM_VIRT_SHIFT)
    orreq   r1, r1, #0x100               @ allow HVC instruction
#endif
    mcr     p15, 0, r1, c1, c1, 0        @ write SCR (with NS bit set)
#ifdef CONFIG_ARMV7_VIRT
    mrceq   p15, 0, r0, c12, c0, 1       @ get MVBAR value
    mcreq   p15, 4, r0, c12, c0, 0       @ write HVBAR
#endif
    movs    pc, lr                       @ return to non-secure SVC
Presumably if I modified the mask for the mcr p15 instruction, I could simply "turn off" the move to nonsecure mode. This will probably kill u-boot, however.
So the question is, then: How do I set the appropriate vector so when I make the SMC call, I jump back into secure mode, and am able to do my GHB/BTB tinkering?
Any other help is appreciated!
The DM3730 on the Gumstix is a GP (general purpose) device, which means it has TrustZone disabled. There's no way you can get into it.
See https://stackoverflow.com/a/8028948/6839

Should I use Parallel Computing Toolbox or Matlab Distributed Computing server?

I have the following pseudocode (a loop with a variable step size) that I am trying to implement using the Matlab Parallel Computing Toolbox or Matlab Distributed Computing Server. I already have Matlab code for this loop that works in ordinary Matlab 2013a.
Given: u0, t_0, T (the initial and final time values), and the initial step size h0
while t_0 < T
% the first step is to compute U1, U2, which depend on t_0 and some known parameters
U1(t_0, h0, u0, parameters)
U2(t_0, h0, u0, parameters)
% so U1 and U2 are independent, which can be computed in parallel using Matlab
% the next step is to compute U3, U4, U5, U6 which depends on t_0, U1, U2, and known parameters
U3(t_0, h0, u0, U1, U2, parameters)
U4(t_0, h0, u0, U1, U2, parameters)
U5(t_0, h0, u0, U1, U2, parameters)
U6(t_0, h0, u0, U1, U2, parameters)
% so U3, U4, U5, U6 are independent, which can be also computed in parallel using Matlab
%finally, compute U7 and U8 which depend on U1,U2,..,U6
U7(t0, u0,h0, U1,U2,U3,U4,U5,U6)
U8(t0, u0,h0,U1,U2,U3,U4,U5,U6)
% so U7 and U8 are also independent, and we can compute them in parallel as well.
%Do step size control here, then assign h0:=h_new
t0=t0+h_new
end
Could you please suggest the best way to implement the above loop in parallel Matlab?
By the best way, I mean the one that gives the largest speedup for the whole computation.
(I have access to the LEO III supercomputer, which has 162 compute nodes with a total of 1944 cores, i.e. 12 cores per node.)
My idea is to compute U1 and U2 at the same time on two separate workers (cores) that have their own memory. Using the results for U1 and U2, one can do the same for U3, U4, U5, U6, and finally for U7 and U8. For that I think I need to use PARFOR within matlabpool? But I do not know how many loop indices (corresponding to the number of cores/processors) I would need.
My questions are:
I can use the supercomputer mentioned above, so can I use Matlab Distributed Computing Server?
For this code, should I use the Parallel Computing Toolbox or Matlab Distributed Computing Server?
I mean, with the Parallel Computing Toolbox (local workers) I cannot specify which workers will compute U1 and U2 (likewise for U3, U4, ...) since they share memory and run interactively. Is that right?
If I use the proposed idea, how many workers will I need? Probably 8 cores?
Is it better to use 1 compute node and ask for 9 cores (8 workers plus one for the Matlab session), or to use 8 compute nodes?
I am a beginner with Matlab Parallel Computing. Please give your suggestions!
Thanks!
Peter
I suggest parallelizing the while-loop, since you want to distribute many iterations among the nodes. parfor is the easiest way to start with parallel computing, and it does a good job on straightforward problems like yours. Only go with the server if there are a lot of time steps that each take significant time, because any parallelization comes with a certain overhead.
Computing locally allows you to make use of 12 cores in recent versions of Matlab; make sure that you have enough RAM to keep 13 copies of your loop body in memory. With good processor architecture and with no other programs competing for resources, it is fine to run on all cores.
Thus:
timeSteps = t0:h:T;
parfor timeIdx = 1:length(timeSteps)
    t0 = timeSteps(timeIdx);
    %# calculate all your u's here
    %# collect the output
    result{timeIdx, 1} = U7;
    result{timeIdx, 2} = U8;
end
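If I remember right, parfor needs an open pool of workers first; in R2013a that would be something like matlabpool open 12 before the loop (newer releases replaced matlabpool with parpool).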
All computations of U1, ..., U8 call a function that does matrix-vector multiplications. Let's say we do not care how long those take for the moment (not much, in my case). The problem is that, for the previous methods, U1, ..., U8 are not independent (they are dependent!): to compute U_{i+1} you need U_i, so you have to compute them sequentially, one after the other. Now I have constructed a method that allows U1 and U2 to be computed at the same time (they are independent), and the same holds for U3, ..., U6 and for U7, U8. So I want to save CPU time for the whole computation. That is why I think one could use Matlab parallel computing.

need help writing a program

I am taking a class on microprocessors and am having some trouble writing a program that will hold a value in a port for two seconds before moving on to the next port.
Can anyone help me make sense of this?
I thought of using NOPs but realized that's a bit unrealistic. I have tried ACALL DELAY, but for some reason it comes up as an unknown command.
I am stumped at this point and would appreciate any help I could get.
I am using the DS89C450 with an 11 MHz clock. I've tried asking the professor, and he tells me it's a piece of cake and I should have no problem with it, but reading and writing code is brand new to me; I've only been doing it for two weeks. When I look at the book it's almost like it's written in Chinese, and my fellow classmates are just as stumped as I am, so I figured my last resort would be to ask someone online who might have had a similar problem, or who has a little more insight and might be able to point me in the right direction.
I know I need to load each port with the specified value; my problem lies in switching the ports and giving them the 2-second delay.
My program looks like this:
MOV P0, #33H
MOV P1, #7FH
MOV P2, #0B7H
MOV P3, #0EFH
With these four ports loaded with these values, I need P0's value to go to P1, P1's to P2, and so on; when it gets to P3, its value needs to go to P0, and the whole thing loops. I was going to use SJMP to loop back to the start so the program is always running.
While doing this, each value stays in each port for only two seconds; that two-second delay is the part that's still fuzzy. Does the rest sound right?
I have done something similar on a PIC 16F84 microcontroller.
To make a delay you have two options: either use interrupts or use loops.
Since you know the instructions per second, you can use a loop that executes the required number of instructions to take the required amount of time.
This link illustrates how to determine the loop indexes (you might need nested loops if the required number of instructions is large; on the PIC I had to execute about 1 million instructions to make a delay of 1 second).
I've never done this with that particular chip (and I don't know the assembly syntax it supports), but a pseudocode approach would be something like this:
Load initial values into ports
Initialize counter with (delay in seconds * clock ticks per second) / (clock ticks in loop)
While counter != 0
    Decrement counter
Swap port values:
    P3 -> temp, P2 -> P3, P1 -> P2, P0 -> P1, temp -> P0
Loop (4 times?)
I think this is all you really need for the structure. Based on my 10-minute reading of 8051 assembly, the delay loop would look something like:
        MOV  A, #0B6H     ; 0B6H = 182 iterations, at an assumed ~11 ms per pass (~91 passes/sec) ≈ 2 s
DELAY:  DEC  A
        JNZ  DELAY        ; NOP-style busy-wait delay loop
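One sanity check on the counter value, as rough arithmetic: two seconds is a lot of machine cycles. If I have the timing right, a classic 8051 core at 11 MHz runs about 11,000,000 / 12 ≈ 917,000 machine cycles per second, so a 2-second delay is roughly 1.8 million machine cycles (and the DS89C450 core executes one machine cycle per clock, so it is more like 22 million). An 8-bit DEC/JNZ loop tops out at 256 iterations, so you will need nested loops (as in the PIC answer above) or, better, one of the on-chip hardware timers to get anywhere near two seconds.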
