Continuous-time finite-horizon MDP - dynamic-programming

Is there any algorithm for solving a finite-horizon semi-Markov-Decision-Process?
I want to find the optimal policy for a sequential decision problem with a finite action space, a finite state space, and a deadline. Critically, different actions take different amounts of time and for one of the actions this duration is stochastic. I can model time as being discrete or continuous depending on which methods are available.
I am aware of algorithms for discounted infinite-horizon semi-MDPs, but I cannot find any work on finite-horizon semi-MDPs. Has this class of problems been studied before?

As with almost any MDP, backward dynamic programming should work. You could discretize your finite horizon in small steps from 0 to the deadline and then recursively update the values starting from the deadline. In the state space you'll have to track the current action, the total time spend on that action, and the already completed actions. The number of possible states may be quite large.
In the dynamic program you can maybe exploit that you can select the value function for the state at the time the action is completed.

Related

Traveling Salesman Alternate - How would one code it if the cities were all the same distances from each other?

First time asking question, apologies if incorrect.
What would be the best way to approach this problem (Similar to travelling salesman, but I'm not sure if it runs into the same issues).
You have a list of "tasks" at certain locations (Cities) and a group of "people" that can complete those tasks (Salesmen). This is structured over a day, where some tasks may need to be completed before a specific time and may require specific "tools" (Set number available). The difference is that the length between each location is the same in all circumstances, but they all have to return to the start. Therefore, rather than trying to minimise the distance travelled, instead you want to maximise the time each salesmen spends moving and stays at the initial staring node. This also gives you pre-defined requirements.
The program doesn't need to find an optimal solution, just an acceptable one (Greater than a certain value.) Would you just bash out each case? If so, what would be the best language to use for bashing out the solutions?
Thanks
EDIT - Just to confirm, the pre-requisite where all the cities are the same distance from each other is just for simplification of the problem, not reflective of real life.

Procedural modelling classical Chinese visions of political order

The problem I’m dealing with at the moment involves a system described in the Guanzi. A large section of the book is about how governments should work to extract a surplus from the economy which they can redistribute to ensure the loyalty of existing followers and gain new ones. Under this system, whoever can redistribute the most wealth becomes the overall leader. However, he also has to out-compete the other individuals in the system: they are all busy trying to establish their own redistribution networks.
The result is a series of pyramid-shaped redistribution networks, both independent and nested.
Simplified visual representation of the expected outcome
These are dynamic across time and space. Gaining resources lets you acquire more followers, which in turn gives you access to more resources. There is also a random component involved: a bad harvest or a war may wipe out your resources. If one leader runs out of resources (whether as a result of a disaster or because he redistributed them too generously among his followers), he will either be supplanted by a follower or his network will collapse and its members leave to join other networks.
I think it is possible to model this algorithmically.
We can assume that willingness to share resources is innate.
Generosity = propensity score
An individual acquires followers as a function of both the surplus resources he possesses and his willingness to share them.
Followers[tn] = Surplus[t-1] * Generosity
It is worth noting that growth is endogenous in this model. It is a product of whatever economic growth coefficient is deemed realistic given technology and natural resources (a), as well as of the previous cycle’s surplus and the number of followers an individual has, on the basis that these constitute factors of production. (Note: I'm not interested in getting actual monetary values out of this, just modelling the relationships. I understand that if you plugged real numbers into it people would end up redistributing more than they own.)
Growth = a (Surplus[t-1] * Followers[t-1])
At T=0 the surplus enjoyed by each individual in the system must be generated randomly.
Surplus[t0] = randomly generated number
Followers generate additional resources for their leader, but they also need to be remunerated, meaning that they simultaneously deplete their leader’s resources, proportional to his generosity propensity score. A random component must also be included, as mentioned above, to account for famines, bumper crops, wars etc.
Surplus[tn] = Random Component (Surplus[t-1] + Growth) – (Followers[t-1] * Generosity)
Once these relationships have been defined, then the algorithm is relatively simple:
T1:
Each individual checks the Surplus*Generosity score of the nearest individual who is not already following him. If Individual A’s SG > Individual B’s SG, then Individual B moves closer to Individual A and becomes his follower. (Note: If individual B has followers of his own, he carries them with him. Also: Followers automatically re-check their leader's SG in every round, since he is the closest individual to them. They will leave his network to become free agents once more if his SG drops below their own.)
Otherwise, he does nothing.
T2 :
Each individual’s stats (Followers, Surplus) are recalculated based on the new situation.
Step 1 is repeated.
T3 :
Repeat previous step
One would expect the individuals with the optimal generosity score to build the biggest networks, as they acquire followers without completely depleting their resources.
I suspect – but am not sure – that this model’s characteristics are similar to those of an L-system model.
Individuals are programmed with a simple instruction: “If the person closest to you has a higher S*G score than you do, approach and follow him.”
On the basis of this the individuals form structures (from the perspective of the individual with the optimal S*G score, they appear to cluster around him in a semi-structured way)
These structures grow with every successive time period
They collapse after depleting their own resources, or when a random disaster strikes.
After a collapse, the process automatically begins again.
However, I'm not a maths or a computing guy (I'm a Chinese philosophy guy) so I'm not sure if I'm just being fooled by a superficial resemblance or not. Is this a genuine example of string rewriting or am I just convincing myself it is because you get tree-like structures out of it? Is this even a model that can work at all? Have I totally messed up my equations? (I haven't done this since high school, so it's highly probable.)
All help is gratefully received.

Temporal logic for modelling events at discrete points in time causing states/changes over a period of time

I am looking for an appropriate formalism (i.e. a temporal logic) to model the following kind of situation
There can be events happening at discrete events in time (subject to conditions to be detailed below).
There is state. This state cannot be expressed by a fixed number of variables. However, it is possible to express it with a linear list/array, where each entry consists of a finite number of variables.
Before any events have happened, the state is fixed.
At any point in time, events are possible. They have a fixed structure (with a few variables). The possible events are constrained by the current state.
Events will cause an immediate change of the state.
Events can also cause continuous state changes. For example, a variable (of one of the entries of the array mentioned above) changes its value from 0 to 1 over some time (either immediately or after a specified delay).
It should also be possible to specify discrete points in time in the form "the earliest point in time after event E where some condition C holds", and to start a continuos state change at such a point.
Is there an existing temporal logic to model something like this?
It should also be possible to express desired conditions, like the following:
Referring to a certain point in time: The sum of a specific variables of all the entries of the array may not exceed a certain threshold.
Referring to change over time: For all possible time intervals, the value of a certain variable (again, from each entry of said array) [realistically, rather of some arithmetic expression computed for each entry] must not change faster than a given threshold.
There should exist a model checker that can check whether for all possible scenarios, all the conditions are met. If this is not the case, it should print one possible scenario and tell me which condition is not met. In other words, it should distinguish between conditions describing the possible scenarios, and conditions that have have to be fulfilled in those scenarios, and not just tell me "not possible".
You need a model checker with more flexible language. Technically speaking model checking of systems of infinite state space is open research problem and in general case algorithmically undecidable. The temporal logic is more typically related to propreties under the question.
Considering limited info you shared about your project, why do not you try Spin/Promela it is loosely inspired by C and has 'buffers' which can be considered to be arrays. At the least you might be able to simulate your system?

Given measurements from a event series as input, how do I generate an infinite input series with the same profile?

I'm currently working with a system that makes scheduling decisions based on a series of requests and the state of the system.
I would like to take the stream of real inputs, mock out some of the components, and run simulations against the rest. The idea is to use it for planning with respect to system capacity (i.e. when to scale certain components), tracking down certain failure modes, and analyzing the effects of changes to the codebase (i.e. simulations with version A compared to simulations with version B).
I can do everything related to this, except generate a suitable input stream. Replaying the exact input from production hasn't been very helpful because it's hard to get a long enough data stream to tease out some of the behavior that I'm trying to find. In other words, if production falls over at 300 days of input, I don't have enough data to find out until after it fell over. Repeating the same input set has been considered; but after a few initial tries, the developers all agree that the simulation seems to "need more random".
About this particular system:
The input is a series of irregularly spaced events (i.e. a stochastic process with discrete time and continuous state space).
Properties are not independent of each other.
Even the more independent of the properties are composites of other properties that will always be, by nature, invisible to me (leading to a multi-modal distribution).
Request interval is not independent of other properties (i.e. lots of requests for small amounts of resources come through in a batch, large requests don't).
There are feedback loops in it.
It's provably chaotic.
So:
Given a stream of input events with a certain distribution of various properties (including interval), how do I generate an infinite stream of events with the same distribution across a number of non-independent properties?
Having looked around, I think I need to do a Markov-Chain Monte-Carlo Simulation. My problem is figuring out how to build the Markov-Chain from the existing input data.
Maybe it is possible to model the input with a Copula. There are tools that help you doing so, e.g. see this paper. Apart from this, I would suggest to move the question to http://stats.stackexchange.com, as this is a statistical problem and will likely draw more attention over there.

How to predict when next event occurs based on previous events? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
Basically, I have a reasonably large list (a year's worth of data) of times that a single discrete event occurred (for my current project, a list of times that someone printed something). Based on this list, I would like to construct a statistical model of some sort that will predict the most likely time for the next event (the next print job) given all of the previous event times.
I've already read this, but the responses don't exactly help out with what I have in mind for my project. I did some additional research and found that a Hidden Markov Model would likely allow me to do so accurately, but I can't find a link on how to generate a Hidden Markov Model using just a list of times. I also found that using a Kalman filter on the list may be useful but basically, I'd like to get some more information about it from someone who's actually used them and knows their limitations and requirements before just trying something and hoping it works.
Thanks a bunch!
EDIT: So by Amit's suggestion in the comments, I also posted this to the Statistics StackExchange, CrossValidated. If you do know what I should do, please post either here or there
I'll admit it, I'm not a statistics kind of guy. But I've run into these kind of problems before. Really what we're talking about here is that you have some observed, discrete events and you want to figure out how likely it is you'll see them occur at any given point in time. The issue you've got is that you want to take discrete data and make continuous data out of it.
The term that comes to mind is density estimation. Specifically kernel density estimation. You can get some of the effects of kernel density estimation by simple binning (e.g. count the number events in a time interval such as every quarter hour or hour.) Kernel density estimation just has some nicer statistical properties than simple binning. (The produced data is often 'smoother'.)
That only takes care of one of your problems, though. The next problem is still the far more interesting one -- how do you take a time line of data (in this case, only printer data) and produced a prediction from it? First thing's first -- the way you've set up the problem may not be what you're looking for. While the miracle idea of having a limited source of data and predicting the next step of that source sounds attractive, it's far more practical to integrate more data sources to create an actual prediction. (e.g. maybe the printers get hit hard just after there's a lot of phone activity -- something that can be very hard to predict in some companies) The Netflix Challenge is a rather potent example of this point.
Of course, the problem with more data sources is that there's extra legwork to set up the systems that collect the data then.
Honestly, I'd consider this a domain-specific problem and take two approaches: Find time-independent patterns, and find time-dependent patterns.
An example time-dependent pattern would be that every week day at 4:30 Suzy prints out her end of the day report. This happens at specific times every day of the week. This kind of thing is easy to detect with fixed intervals. (Every day, every week day, every weekend day, every Tuesday, every 1st of the month, etc...) This is extremely simple to detect with predetermined intervals -- just create a curve of the estimated probability density function that's one week long and go back in time and average the curves (possibly a weighted average via a windowing function for better predictions).
If you want to get more sophisticated, find a way to automate the detection of such intervals. (Likely the data wouldn't be so overwhelming that you could just brute force this.)
An example time-independent pattern is that every time Mike in accounting prints out an invoice list sheet, he goes over to Johnathan who prints out a rather large batch of complete invoice reports a few hours later. This kind of thing is harder to detect because it's more free form. I recommend looking at various intervals of time (e.g. 30 seconds, 40 seconds, 50 seconds, 1 minute, 1.2 minutes, 1.5 minutes, 1.7 minutes, 2 minutes, 3 minutes, .... 1 hour, 2 hours, 3 hours, ....) and subsampling them via in a nice way (e.g. Lanczos resampling) to create a vector. Then use a vector-quantization style algorithm to categorize the "interesting" patterns. You'll need to think carefully about how you'll deal with certainty of the categories, though -- if your a resulting category has very little data in it, it probably isn't reliable. (Some vector quantization algorithms are better at this than others.)
Then, to create a prediction as to the likelihood of printing something in the future, look up the most recent activity intervals (30 seconds, 40 seconds, 50 seconds, 1 minute, and all the other intervals) via vector quantization and weight the outcomes based on their certainty to create a weighted average of predictions.
You'll want to find a good way to measure certainty of the time-dependent and time-independent outputs to create a final estimate.
This sort of thing is typical of predictive data compression schemes. I recommend you take a look at PAQ since it's got a lot of the concepts I've gone over here and can provide some very interesting insight. The source code is even available along with excellent documentation on the algorithms used.
You may want to take an entirely different approach from vector quantization and discretize the data and use something more like a PPM scheme. It can be very much simpler to implement and still effective.
I don't know what the time frame or scope of this project is, but this sort of thing can always be taken to the N-th degree. If it's got a deadline, I'd like to emphasize that you worry about getting something working first, and then make it work well. Something not optimal is better than nothing.
This kind of project is cool. This kind of project can get you a job if you wrap it up right. I'd recommend you do take your time, do it right, and post it up as function, open source, useful software. I highly recommend open source since you'll want to make a community that can contribute data source providers in more environments that you have access to, will to support, or time to support.
Best of luck!
I really don't see how a Markov model would be useful here. Markov models are typically employed when the event you're predicting is dependent on previous events. The canonical example, of course, is text, where a good Markov model can do a surprisingly good job of guessing what the next character or word will be.
But is there a pattern to when a user might print the next thing? That is, do you see a regular pattern of time between jobs? If so, then a Markov model will work. If not, then the Markov model will be a random guess.
In how to model it, think of the different time periods between jobs as letters in an alphabet. In fact, you could assign each time period a letter, something like:
A - 1 to 2 minutes
B - 2 to 5 minutes
C - 5 to 10 minutes
etc.
Then, go through the data and assign a letter to each time period between print jobs. When you're done, you have a text representation of your data, and that you can run through any of the Markov examples that do text prediction.
If you have an actual model that you think might be relevant for the problem domain, you should apply it. For example, it is likely that there are patterns related to day of week, time of day, and possibly date (holidays would presumably show lower usage).
Most raw statistical modelling techniques based on examining (say) time between adjacent events would have difficulty capturing these underlying influences.
I would build a statistical model for each of those known events (day of week, etc), and use that to predict future occurrences.
I think the predictive neural network would be a good approach for this task.
http://en.wikipedia.org/wiki/Predictive_analytics#Neural_networks
This method is also used for predicting f.x. weather forecasting, stock marked, sun spots.
There's a tutorial here if you want to know more about how it works.
http://www.obitko.com/tutorials/neural-network-prediction/
Think of a markov chain like a graph with vertex connect to each other with a weight or distance. Moving around this graph would eat up the sum of the weights or distance you travel. Here is an example with text generation: http://phpir.com/text-generation.
A Kalman filter is used to track a state vector, generally with continuous (or at least discretized continuous) dynamics. This is sort of the polar opposite of sporadic, discrete events, so unless you have an underlying model that includes this kind of state vector (and is either linear or almost linear), you probably don't want a Kalman filter.
It sounds like you don't have an underlying model, and are fishing around for one: you've got a nail, and are going through the toolbox trying out files, screwdrivers, and tape measures 8^)
My best advice: first, use what you know about the problem to build the model; then figure out how to solve the problem, based on the model.

Resources