Driver scheduling (public transportation): enforcing 30 min break after 4 h of driving time - constraint-programming

We're struggling with some aspects of the following problem:
a public transportation bus timetable consists of shifts (~ track sections) each with fixed start and end times
bus drivers need to be assigned to each of those shifts
[constraint in question] legal regulations demand that each bus driver has a 30 min break after 4 hours of driving (i.e. after driving shifts)
put differently, a driver accrues driving time while driving shifts; that accrued time must not exceed 4 h unless the driver takes a 30 min break, in which case the accrued time is "reset to zero"
In summary, we need to track the accrued driving time of each driver in order to suppress shift assignments to enforce the 30 min break.
The underlying problem seems to sit halfway between a job shop and an assignment problem:
Like job shop problems, it has shifts (or tasks, jobs) with many no-overlap and precedence constraints between them...
...BUT our shifts (~tasks/jobs) are not pre-assigned to drivers; in job shop problems, by contrast, the tasks (~shifts) must be executed on specific machines (~drivers) and are therefore pre-assigned, so assigning them is not part of the problem
Like assignment problems, we need to assign shifts to as few drivers as possible...
...BUT we also need to handle the aforementioned no-overlap and precedence constraints, which are not taken into account in assignment problems (a rough sketch of this assignment/no-overlap skeleton follows below).
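To make that structure concrete, here is a rough CP-SAT sketch (Python, OR-Tools) of the assignment/no-overlap part we already know how to model; the shift data and the number of candidate drivers are made-up placeholders, and the break constraint itself is what the question below is about:
from ortools.sat.python import cp_model

shifts = [(0, 90), (60, 150), (150, 300)]   # fixed (start, end) times in minutes (placeholders)
num_drivers = 2                             # placeholder pool of candidate drivers

model = cp_model.CpModel()

# assign[s, d] is true iff shift s is driven by driver d; every shift needs exactly one driver.
assign = {}
for s, (start, end) in enumerate(shifts):
    for d in range(num_drivers):
        assign[s, d] = model.NewBoolVar(f"assign_s{s}_d{d}")
    model.Add(sum(assign[s, d] for d in range(num_drivers)) == 1)

# Per driver: optional intervals that only exist if the shift is assigned to that driver,
# plus a no-overlap constraint so one driver never drives two shifts at the same time.
for d in range(num_drivers):
    intervals = [
        model.NewOptionalIntervalVar(start, end - start, end, assign[s, d], f"itv_s{s}_d{d}")
        for s, (start, end) in enumerate(shifts)
    ]
    model.AddNoOverlap(intervals)

# Use as few drivers as possible.
used = [model.NewBoolVar(f"used_d{d}") for d in range(num_drivers)]
for d in range(num_drivers):
    model.AddMaxEquality(used[d], [assign[s, d] for s in range(len(shifts))])
model.Minimize(sum(used))

solver = cp_model.CpSolver()
print(solver.Solve(model) == cp_model.OPTIMAL)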
So my question is: how best to model the above constraint in a constraint program with OR-Tools?
Thanks in advance!

One general technique for specifying patterns in constraint programming is the regular constraint (available in Gecode, Choco, and MiniZinc, among others; unsure of the status for OR-Tools), where patterns of variables are specified using finite automata (DFAs and NFAs) or regular expressions.
In your case, assuming that you have a sequence of variables representing what a certain driver does at each time-point, it is fairly straightforward to specify an automaton that accepts any sequence of values that does not contain more than four consecutive hours of driving. A sketch of such an automaton:
States:
Driving states Dn representing n time units driving (for some resolution of time units), up to n=4 hours.
Break states DnBm for a break of length m after n time units of driving, up to m=30 minutes.
Start state is D0.
Transitions:
Driving: driving for 1 unit of time moves from state Dn to D(n+1), and, from a break state DnBm with m shorter than 30 minutes, back to D(n+1).
Break: taking a break for 1 unit of time moves from Dn to DnB1 and from DnBm to DnB(m+1), unless the 30 minutes of break time have been reached, in which case the transition goes back to D0.
Other actions handled mostly as self-loops, depending on desired semantics.
Of course, details will vary for your specific use-case.
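For what it's worth, OR-Tools' CP-SAT solver exposes this idea via its automaton constraint (AddAutomaton). A minimal sketch, assuming a 30-minute time grid (so 4 h of driving = 8 slots and any single break slot already counts as a full 30-minute break; the finer-grained DnBm states above would be needed for a smaller resolution) and one 0/1 "break or drive" variable per slot:
from ortools.sat.python import cp_model

model = cp_model.CpModel()
horizon = 48  # one day in 30-minute slots (placeholder)
# activity[t] == 1 means the driver is driving in slot t, 0 means break/idle.
activity = [model.NewIntVar(0, 1, f"slot_{t}") for t in range(horizon)]

# Automaton states 0..8 = number of consecutive driving slots accrued so far.
transitions = []
for s in range(8):
    transitions.append((s, 1, s + 1))   # one more 30-minute slot of driving
for s in range(9):
    transitions.append((s, 0, 0))       # a break slot resets the counter
# State 8 has no outgoing "driving" transition, so a 9th consecutive driving
# slot (i.e. more than 4 h without a 30-minute break) is infeasible.

model.AddAutomaton(activity, 0, list(range(9)), transitions)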

PDDL2.1: Purpose of `over all`

I'm working with PDDL2.1 durative-actions and I'm having difficulty understanding the purpose of over all.
I have a function charge_level which is updated at 10 Hz. In a durative-action move, I state the condition (over all (>= (charge_level) 12)).
I interpreted this as "while performing the action, verify that charge_level is greater than or equal to 12, otherwise, move fails and the planner should find a new action with the condition at start (< (charge_level) 12)". However, the planner does not seem to plan that way. I appreciate any clarity on this.
Thanks!
The semantics of the over all condition is indeed as @haz says in his answer (it prevents the planner from scheduling another action in parallel with your move action that would violate the over all condition), but what I think is confusing you is the difference between planning and plan execution. During plan execution, the (charge_level) may drop below 12 at any point unexpectedly, due to a malfunctioning battery, a faulty sensor, etc. On such an occasion, your plan execution should stop the move action (and therefore the whole plan) and re-plan. At that point, the planner could choose any action whose precondition is satisfied in the new state - so not necessarily one with at start (< (charge_level) 12).
The PDDL durative action cannot be stopped or paused by the planner while computing the plan. However, if you tell the planner how the (charge_level) changes over time, it can compute the longest possible duration of the move action and then do something else, e.g. recharge the battery, before scheduling another instance of the move action into the same plan. In that approach, there are no failures involved, just reasoning about how long a given action can last in order to achieve the goal without violating any constraints, including the over all conditions.
If that is the behavior you want, you will need to model the (charge_level) as a continuously changing function. If you want to see an example, look at the power generator or the coffee machine domains. Here is a peek at the Generator domain:
The generator must not run out of fuel:
(over all (>= (fuel-level ?g) 0))
The fuel decreases by 1 unit every unit of time #t.
(decrease (fuel-level ?g) (* #t 1))
Given the initial (fuel-level), it is a simple calculation to figure out the maximum duration of the action. For that flexibility, you will need to leave the action duration unconstrained, i.e. :duration (>= ?duration 0), as in the coffee machine domain.
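To make that "simple calculation" concrete (purely illustrative numbers, not taken from any particular domain file): with the linear decrease above, the longest admissible duration is simply the initial fuel level divided by the consumption rate.
initial_fuel = 100.0   # assumed initial value of (fuel-level ?g)
rate = 1.0             # fuel units consumed per time unit, from (decrease ... (* #t 1))
max_duration = initial_fuel / rate   # (over all (>= (fuel-level ?g) 0)) holds up to this point
print(max_duration)    # 100.0 time units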
Now, to be able to process such a model, including continuous numeric effects, you will need a planner that supports the :continuous-effects requirement, for example OPTIC or POPF.
If you just want to prevent the action from happening based on a condition, then you use at start. The over all is for conditions that must hold for the full duration of the action. So you could interpret your condition as, "for the entire duration of moving, never let the battery level go below 12".

Estimating WCET of a task on Linux

I want to approximate the Worst Case Execution Time (WCET) for a set of tasks on Linux. Most professional tools are either expensive (thousands of dollars) or don't support my processor architecture.
Since I don't need a tight bound, my line of thought is that I:
disable frequency scaling
disable unnecessary background services and tasks
set the program affinity to run on a specified core
run the program 50,000 times with various inputs, profiling it and storing the total number of cycles it took to execute
Given the largest clock cycle count and knowing the core frequency, I can compute an estimate.
Is this a sound, practical approach?
Secondly, to account for interference from other tasks, I will run the whole task set (40 tasks) in parallel, with each task randomly assigned to a core, and do the same thing 50,000 times.
Once I get the estimate, a 10% safety margin will be added to account for unforeseeable interference and untested paths. This 10% margin is suggested in the paper "Approximation of Worst Case Execution Time in Preemptive Multitasking Systems" by Corti, Brega and Gross.
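To illustrate, this is roughly the measurement loop I have in mind (just a sketch; the binary path, core id and run count are placeholders, and it assumes taskset and perf are available):
import subprocess

RUNS = 50_000
CMD = ["taskset", "-c", "2",                      # pin to core 2 (placeholder)
       "perf", "stat", "-x,", "-e", "cycles",     # CSV output, cycle counter only
       "--", "./my_task"]                         # placeholder binary

worst_cycles = 0
for _ in range(RUNS):
    result = subprocess.run(CMD, capture_output=True, text=True, check=True)
    for line in result.stderr.splitlines():       # perf stat writes its counters to stderr
        fields = line.split(",")
        if len(fields) >= 3 and fields[2].startswith("cycles") and fields[0].isdigit():
            worst_cycles = max(worst_cycles, int(fields[0]))

core_hz = 2.4e9                                   # assumed fixed clock (frequency scaling disabled)
print("estimate + 10% margin:", 1.10 * worst_cycles / core_hz, "s")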
Some comments:
1) Even attempting to compute worst case bounds in this way means making assumptions that there aren't uncommon inputs that cause tasks to take much more or even much less time. An extreme example would be a bug that causes one of the tasks to go into an infinite loop, or that causes the whole thing to deadlock. You need something like a code review to establish that the time taken will always be pretty much the same, regardless of input.
2) It is possible that the input data does influence the time taken to some extent. Even if this isn't apparent to you, it could happen because of the details of the implementation of some library function that you call. So you need to run your tests on a representative selection of real life data.
3) When you have got your 50K test results, I would draw some sort of probability plot - see e.g. http://www.itl.nist.gov/div898/handbook/eda/section3/normprpl.htm and links off it. I would be looking for isolated points that show that in a few cases some runs were suspiciously slow or suspiciously fast, because the code review from (1) said there shouldn't be runs like this. I would also want to check that adding 10% to the maximum seen takes me a good distance away from the points I have plotted. You could also plot time taken against different parameters from the input data to check that there wasn't any pattern there.
4) If you want to try a very sophisticated approach, you could try fitting a statistical distribution to the values you have found - see e.g. https://en.wikipedia.org/wiki/Generalized_Pareto_distribution. But plotting the data and looking at it is probably the most important thing to do.
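For points 3) and 4), a small sketch of what that could look like in Python (assuming the 50,000 per-run cycle counts have been saved to a file; "cycles.txt" is a placeholder name):
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

cycles = np.loadtxt("cycles.txt")

# 3) Normal probability plot: isolated points far off the line are the
#    suspiciously fast or slow runs worth investigating.
stats.probplot(cycles, dist="norm", plot=plt)
plt.title("Normal probability plot of cycle counts")
plt.show()

# 4) Fit a Generalized Pareto distribution to the exceedances over a high
#    threshold, and compare the observed maximum (plus 10%) with the tail.
threshold = np.percentile(cycles, 99)
excesses = cycles[cycles > threshold] - threshold
shape, loc, scale = stats.genpareto.fit(excesses, floc=0)
print(f"GPD tail fit: shape={shape:.3f}, scale={scale:.1f}")
print(f"observed max: {cycles.max():.0f} cycles, max + 10%: {1.1 * cycles.max():.0f}")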

Jmeter - how to get higher randomize effect?

I need to simulate "real traffic" on a Web farm; in other words, I need to generate high peaks as well as periods with few or even no HTTP requests (hits) at all. The reason is to test some automated mechanisms for adding and reducing CPU and memory on the Web servers themselves (that is another story). That is why I need "totally random" scenarios where I have load, but also periods with little or no traffic (so I can add or reduce compute power).
This is the situation I get now: as you can see, I always have some average load, always around some number of hits, even if I change from 10 to 100 threads. The results will always hover around some average value. There are no periods with less or more traffic separated by 10 minutes or so, only by a few seconds.
Current situation
I would like to get higher variation in hits/requests, with some time breaks in between.
Situation that I want: i.stack.imgur.com/I4LhU.png
I tried several timers with no success, and I do not want to use the "Ultimate Thread Group" and similar components because I want the test to be totally random and not predefined with time breaks and pause periods (thread delays). I would like a test that randomizes itself - one that could, for example, generate from 1 to 100 users per XY time.
This is my current JMeter setup: i.stack.imgur.com/I4LhU.png
I do not know if I am missing some parameter in the current setup or whether there is a totally different way to do this.
Thanks a lot!
If this is something you really want (I strongly believe that a test needs to be repeatable, not random), I would suggest using the Constant Throughput Timer for this. Despite the word "Constant" in its name, you can use a function or a variable there, for instance __Random(), and you will get different, controllable "spikes" each iteration.
Moreover, you can put a __P() function there and amend its value via the Beanshell Server while the test is running.
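For example (illustrative values, using JMeter's standard function syntax), the timer's "Target throughput" field could hold
${__Random(60,3600,)}
to pick a random target in samples per minute, or
${__P(throughput,600)}
so that the throughput property can then be changed from the Beanshell server while the test runs.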

Why are niceness values inversely related to process priority?

The niceness of a process decreases with increasing process priority.
Extract from Beginning Linux Programming, 4th Edition, p. 169:
The default priority is 0. Positive priorities are used for background tasks that run when no other higher priority task is ready to run. Negative priorities cause a program to run more frequently, taking a larger share of the available CPU time. The range of valid priorities is -20 to +20. This is often confusing because the higher the numerical value, the lower the execution precedence.
Is there any special reason for negative values corresponding to higher process priority (as opposed to priority increasing with higher niceness values)?
@Ewald's answer is correct, as is confirmed by Jerry Peek et al. in Unix Power Tools (O'Reilly, 2007, p. 507):
This is why the nice number is usually called niceness: a job with a high niceness is very kind to the users of your system (i.e., it runs at low priority), while a job with little niceness hogs the CPU. The term "niceness" is awkward, like the priority system itself. Unfortunately, it's the only term that is both accurate (nice numbers are used to compute the priorities but are not the priorities themselves) and avoids horrible circumlocutions ("increasing the priority means lowering the priority...").
Nice has had this meaning since at least V6 Unix, but the V6 manual never explains this explicitly. The range of allowed values was -220 through +20, with negative numbers reserved for the superuser. The range was changed to -20 through +20 in V7.
Hysterical reasons - I mean historical... I'm pretty sure it started with numbers going up from 0 to 20, and the lowest available number was taken first. Then someone came to the conclusion, "Hmm, what if we need to make something MORE important?" - well, we have to go negative.
You want priority to be a sortable value, so if you start with "default is zero", you have to decide which direction the numbers go. In daily speech "priority 1" is higher than "priority 2" - when your boss says "Make this your number 1 priority", it does mean it's important, right? So, being a computer, clearly priority 0 is higher than priority 1, and priority -1 is higher than priority 0.
In the end, it's an arbitrary choice. Maybe Ken Thompson, Dennis Ritchie or one of those guys will be able to say for sure why they chose that particular sequence, and not 0..255, for example.
First of all, the answer is a little bit long, but only for clarification.
In the Linux kernel, every conventional process has a static priority, ranging from 100 (highest) to 139 (lowest), so there are basically 40 priorities that can be assigned to a process.
When a process is created it inherits its parent's priority; if the user wants to change that priority, it can be done with the help of the nice(nice_value) system call.
The reason behind your question is that every process gets a base time quantum, which determines how much CPU time (in milliseconds) the process receives for its execution, and it is calculated as:
base time quantum = (140 - static_priority) * 20 ms   if static_priority < 120
base time quantum = (140 - static_priority) * 5 ms    if static_priority >= 120
The sys_nice() service routine handles the nice() system call. Although the nice_value may have any value, absolute values larger than 40 are trimmed down to 40. Traditionally, negative values correspond to requests for priority increments and require superuser privileges, while positive ones correspond to requests for priority decreases. In the case of a negative nice_value, the function invokes the capable() function to verify whether the process has the CAP_SYS_NICE capability. Moreover, the function invokes the security_task_setnice() security hook. So in the end the nice_value is used to calculate the static priority, and this static priority is then used to calculate the base time quantum.
So it is clear that negative values are used to increase the priority and therefore require superuser access, while positive values are used to decrease the priority and need no superuser access.
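To make the arithmetic above concrete, a small illustrative calculation (the 120 + nice mapping and the quantum formula are the ones described above; the function name is just for illustration):
def base_time_quantum_ms(nice_value):
    static_priority = 120 + nice_value      # nice -20..+19 maps to static priority 100..139
    if static_priority < 120:
        return (140 - static_priority) * 20
    return (140 - static_priority) * 5

print(base_time_quantum_ms(-20))  # 800 ms: most CPU time, needs superuser/CAP_SYS_NICE
print(base_time_quantum_ms(0))    # 100 ms: the default
print(base_time_quantum_ms(19))   # 5 ms: the "nicest" process gets the least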
Yes - it gets NICER as the number goes up and MEANER as the number goes down. So the process is seen as "friendlier" when it's not taking up all the resources and "nasty" as it gets greedier with resources.
Think of it as "nice" points - the nicer you are to others, the more points you have.

Progress bar and multiple threads, decoupling GUI and logic - which design pattern would be the best?

I'm looking for a design pattern that would fit my application design.
My application processes large amounts of data and produces some graphs.
Data processing (fetching from files, CPU-intensive calculations) and graph operations (drawing, updating) are done in separate threads.
Graph can be scrolled - in this case new data portions need to be processed.
Because there can be several series on a graph, multiple threads can be spawned (two threads per series: one for the dataset update and one for the graph update).
I don't want to create multiple progress bars. Instead, I'd like to have a single progress bar that informs about global progress. At the moment I can think of MVC and Observer/Observable, but it's a little bit blurry :) Maybe somebody could point me in the right direction, thanks.
I once spent the best part of a week trying to make a smooth, non-hiccupy progress bar over a very complex algorithm.
The algorithm had 6 different steps. Each step had timing characteristics that were seriously dependent on A) the underlying data being processed, not just the "amount" of data but also the "type" of data, and B) the number of CPUs: 2 of the steps scaled extremely well with an increasing number of CPUs, 2 steps ran in 2 threads and 2 steps were effectively single-threaded.
The mix of data effectively had a much larger impact on execution time of each step than number of cores.
The solution that finally cracked it was really quite simple. I made 6 functions that analyzed the data set and tried to predict the actual run-time of each analysis step. The heuristic in each function analyzed both the data sets under analysis and the number of cpus. Based on run-time data from my own 4 core machine, each function basically returned the number of milliseconds it was expected to take, on my machine.
f1(..) + f2(..) + f3(..) + f4(..) + f5(..) + f6(..) = total runtime in milliseconds
Now, given this information, you can effectively know what percentage of the total execution time each step is supposed to take. If step 1 is supposed to take 40% of the execution time, you basically need to find out how to emit 40 1% events from that algorithm. Say the for-loop is processing 100,000 items; you could probably do:
for (int i = 0; i < numItems; i++) {
    // Emit one 1% progress event every (numItems / percentageOfTotalForThisStep)
    // items, e.g. 40 evenly spaced events when this step accounts for 40%.
    if (i % (numItems / percentageOfTotalForThisStep) == 0) {
        emitProgressEvent();
    }
    // ... do the actual processing ...
}
This algorithm gave us a silky smooth progress bar that performed flawlessly. Your implementation technology can have different forms of scaling and features available in the progress bar, but the basic way of thinking about the problem is the same.
And yes, it did not really matter that the heuristic reference numbers were worked out on my machine - the only real problem is if you want to change the numbers when running on a different machine. But you still know the ratio (which is the only really important thing here), so you can see how your local hardware runs differently from the one I had.
Now the average SO reader may wonder why on earth someone would spend a week making a smooth progress bar. The feature was requested by the head salesman, and I believe he used it in sales meetings to get contracts. Money talks ;)
In situations with threads or asynchronous processes/tasks like this, I find it helpful to have an abstract type or object in the main thread that represents (and ideally encapsulates) each process. So, for each worker thread, there will presumably be an object (let's call it Operation) in the main thread to manage that worker, and obviously there will be some kind of list-like data structure to hold these Operations.
Where applicable, each Operation provides the start/stop methods for its worker, and in some cases - such as yours - numeric properties representing the progress and expected total time or work of that particular Operation's task. The units don't necessarily need to be time-based; if you know you'll be performing 6,230 calculations, you can just think of these properties as calculation counts. Furthermore, each task will need some way of updating its owning Operation about its current progress through whatever mechanism is appropriate (callbacks, closures, event dispatching, or whatever your programming language/threading framework provides).
So while your actual work is being performed off in separate threads, a corresponding Operation object in the "main" thread is continually being updated/notified of its worker's progress. The progress bar can update itself accordingly, mapping the total of the Operations' "expected" times to its total, and the total of the Operations' "progress" times to its current progress, in whatever way makes sense for your progress bar framework.
Obviously there's a ton of other considerations/work that needs be done in actually implementing this, but I hope this gives you the gist of it.
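As a minimal sketch of that idea (Python here purely for illustration; the approach itself is language-agnostic), each worker updates its own Operation and the single progress bar shows the ratio of summed progress to summed expected work:
import threading

class Operation:
    """Main-thread representation of one worker's task."""
    def __init__(self, expected_work):
        self.expected_work = expected_work   # e.g. item count or estimated milliseconds
        self._done = 0
        self._lock = threading.Lock()

    def report(self, units_done):            # called from the worker thread
        with self._lock:
            self._done = min(self.expected_work, self._done + units_done)

    @property
    def done(self):
        with self._lock:
            return self._done

def overall_progress(operations):
    """Fraction in [0, 1] for the single global progress bar."""
    total = sum(op.expected_work for op in operations)
    done = sum(op.done for op in operations)
    return done / total if total else 1.0

# The GUI thread polls overall_progress(...) (or is notified via events)
# and sets the one progress bar accordingly.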
Multiple progress bars aren't such a bad idea, mind you. Or maybe a complex progress bar that shows several threads running (like download manager programs sometimes have). As long as the UI is intuitive, your users will appreciate the extra data.
When I try to answer such design questions I first try to look at similar or analogous problems in other application, and how they're solved. So I would suggest you do some research by considering other applications that display complex progress (like the download manager example) and try to adapt an existing solution to your application.
Sorry I can't offer more specific design, this is just general advice. :)
Stick with Observer/Observable for this kind of thing. Some object observes the various series processing threads and reports status by updating the summary bar.
