Weighted Interval Scheduling Variant with "multitasking" - dynamic-programming

I have a variant of weighted interval scheduling that I couldn't find anything about. The inputs are the intervals during which people are present in a target area, their importance (the "weight"), and the permitted number of visits to the target area. The output should be the visit times that meet the people of greatest total importance (a visit is a single point in time; you meet all people present at the moment of the visit; anyone present at multiple visits is counted only once).
I tried solving this recursively without any memoization, and it works, but I am sure there must be a better way. Also, does this type of problem have its own name?
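A minimal dynamic-programming sketch in R (all names illustrative): it assumes closed intervals and uses the fact that any visit can be slid left onto some interval's start time without losing anyone, so only start times need to be considered as candidate visit times.
best_visits <- function(start, end, weight, k) {
  cand <- sort(unique(start))            # optimal visits can sit on start times
  m <- length(cand)
  # gain[p + 1, t]: total weight of people met at cand[t] who were NOT already
  # met at an earlier visit at cand[p] (p = 0 means "no earlier visit")
  gain <- matrix(0, nrow = m, ncol = m)
  for (p in 0:(m - 1)) {
    prev <- if (p == 0) -Inf else cand[p]
    for (t in (p + 1):m) {
      met <- start > prev & start <= cand[t] & end >= cand[t]
      gain[p + 1, t] <- sum(weight[met])
    }
  }
  # dp[j, t]: best total weight using j visits, the last one at cand[t]
  dp <- matrix(-Inf, nrow = k, ncol = m)
  dp[1, ] <- gain[1, ]
  if (k > 1) for (j in 2:k) {
    dp[j, 1] <- dp[j - 1, 1]             # a repeat visit at the same time adds nothing
    if (m > 1) for (t in 2:m) {
      dp[j, t] <- max(dp[j - 1, t],      # the j-th visit is not needed
                      dp[j - 1, 1:(t - 1)] + gain[2:t, t])
    }
  }
  max(dp[k, ])
}
# Example: people present on [1,3], [2,4], [5,6]; two visits reach all three
best_visits(c(1, 2, 5), c(3, 4, 6), c(10, 20, 30), k = 2)   # 60
This fills the table in O(k·m²) after an O(n·m²) gain precomputation, for n intervals and m distinct start times; memoizing your existing recursion on (visits left, time of last visit) achieves the same effect. As for a name, the problem resembles a weighted, budgeted form of interval stabbing/piercing (choose k points to maximize the total weight of intervals hit), which may help as a search term.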

Traveling Salesman Alternate - How would one code it if the cities were all the same distance from each other?

First time asking a question; apologies if anything is incorrect.
What would be the best way to approach this problem? (It's similar to the travelling salesman problem, but I'm not sure if it runs into the same issues.)
You have a list of "tasks" at certain locations (cities) and a group of "people" who can complete those tasks (salesmen). This is structured over a day, where some tasks may need to be completed before a specific time and may require specific "tools" (a set number are available). The difference is that the distance between each pair of locations is the same in all circumstances, but everyone has to return to the start. Therefore, rather than trying to minimise the distance travelled, you instead want to minimise the time each salesman spends moving and maximise the time they stay at the initial starting node. This also gives you pre-defined requirements.
The program doesn't need to find an optimal solution, just an acceptable one (greater than a certain value). Would you just bash out each case? If so, what would be the best language to use for bashing out the solutions?
Thanks
EDIT - Just to confirm: the prerequisite that all the cities are the same distance from each other is just a simplification of the problem, not reflective of real life.
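For a sense of what "bashing out each case" might look like, here is a toy R sketch that enumerates every assignment of tasks to salesmen and keeps the acceptable ones; the unit travel time and the deadline rule are illustrative assumptions, not part of the original question.
tasks <- data.frame(id = 1:4, deadline = c(1, 1, 2, 3))  # deadlines in travel-time units
n_people <- 2
# every way to assign each task to one person: n_people ^ n_tasks rows
assignments <- expand.grid(rep(list(1:n_people), nrow(tasks)))
feasible <- apply(assignments, 1, function(a) {
  all(sapply(1:n_people, function(p) {
    mine <- sort(tasks$deadline[a == p])  # visit own tasks earliest-deadline-first
    all(mine >= seq_along(mine))          # with unit legs, the k-th stop is reached at time k
  }))
})
sum(feasible)   # 8 of the 2^4 assignments meet every deadline
The n_people^n_tasks blow-up is why pure enumeration only works for small instances; the choice of language matters far less than pruning infeasible branches early, so whichever language you are comfortable with should do.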

Why can't I target the complement of my goal in Optimizely?

Optimizely's Sample Size calculator shows that a higher baseline conversion rate leads to a smaller required sample size for an A/B test. So, instead of maximizing my conversion goal, I'd like to minimize its opposite, i.e. not reaching the goal.
For every goal with a conversion rate less than 50%, its complement would be higher than 50% and would thus require a smaller sample size if targeted.
An example: instead of measuring all users that visit payment-success.html, I'd rather measure all users that don't visit it, and try to minimize that. That would usually require a much smaller sample size, if my reasoning is correct!
Optimizely only lets me target pageviews as goals, not the absence of a pageview.
I realize I'm probably missing or misunderstanding something important here, but if so, what is it?
Statistically there's nothing wrong with your approach, but unfortunately it won't have the desired effect of lowering the duration.
While you'll reduce the margin of error, you'll proportionately decrease the lift, causing you to take the same amount of time to reach confidence.
Since the lift is calculated as a percentage of the baseline conversion rate, the same change in conversion rate of a larger baseline will produce a smaller lift.
Say your real conversion rate is 10% and the test winds up increasing it to 12%. The inverse conversion rate would be 90%, which gets lowered to 88%. In both cases the change is two percentage points, but that is a much greater relative change to 10% (a 20% lift) than to 90% (only a -2.22% lift).
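A quick check of that arithmetic (R; lift here is the relative change from the baseline):
lift <- function(baseline, new) (new - baseline) / baseline
lift(0.10, 0.12)   #  0.2000 -> a 20% lift on the goal itself
lift(0.90, 0.88)   # -0.0222 -> only a -2.22% "lift" on the complement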
Practically, you run a much larger risk of incorrectly bucketing people into the goal with the inverse. You know that someone who hits the success page should be counted toward the goal. I'm pretty sure what you're suggesting would cause every pageview that wasn't on the success page after the user saw the experiment to count as a goal.
Say you're testing the home page. Person A and B both land on the home page and view the experiment.
Person A visits 1 other page and leaves
Person B visits 1 other page and buys something
If your goal was set up on the success page, only person B would trigger the goal. If the goal was set up on all other pages, both people would trigger it. That's obviously not the inverse.
In fact, if there are any pages necessary to reach the success page after the user first sees the experiment (so unless you're testing the final step of checkout), everyone will trigger the inverse pageview goal (whether they hit the success page or not).
Optimizely pageview goals aren't just for pages included in the URL Targeting of your experiment. They're counted for anyone who's seen the experiment and at any point afterward hit that page.
Just to answer whether this is possible (not addressing whether your setup will produce the same outcome): you're right that Optimizely's pageview goals don't allow for "not", but you can probably use the Regex match type to achieve what you want (see 'URL Match Type' in point 3 here). In this case it would look like the pattern below, taken from this answer here (which also explains the complexity involved with negative matching in Regex, suggesting why Optimizely hasn't built "not" pageviews into the product).
^((?!payment-success\.html).)*$
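A quick sanity check of the pattern (R; perl = TRUE enables the negative lookahead):
pat <- "^((?!payment-success\\.html).)*$"
grepl(pat, "https://example.com/pricing", perl = TRUE)               # TRUE
grepl(pat, "https://example.com/payment-success.html", perl = TRUE)  # FALSE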
Hopefully that helps you get to where you want.

How to deal with violation of proportional hazards assumption in Cox PH, R 3.1.3 survfit

I'm performing survival analysis in R using the 'survival' package and coxph. My goal is to compare survival between individuals with different chronic diseases. My data are structured like this:
id, time, event, disease, age.at.dx
1, 342, 0, A, 8247
2, 2684, 1, B, 3879
3, 7634, 1, A, 3847
where 'time' is the number of days from diagnosis to event, 'event' is 1 if the subject died, 0 if censored, 'disease' is a factor with 8 levels, and 'age.at.dx' is the age in days when the subject was first diagnosed. I am new to using survival analysis. Looking at the cox.zph output for a model like this:
combi.age <- coxph(Surv(time, event) ~ disease + age.at.dx, data = combi)
Two of the disease levels violate the PH assumption, having p-values <0.05. Plotting the Schoenfeld residuals over time shows that for one disease the hazard falls steadily over time, and with the second, the line is predominantly parallel, but with a small upswing at the extreme left of the graph.
My question is how to deal with these disease levels? I'm aware from my reading that I should attempt to add a time interaction to the disease whose hazard drops steadily, but I'm unsure how to do this, given that most examples of coxph I've come across only compare two groups, whereas I am comparing 8. Also, can I safely ignore the assumption violation of the disease level with the high hazard at early time points?
I wonder whether this is an inappropriate way to structure my data, because it does not preclude a single individual appearing multiple times in the data - is this a problem?
Thanks for any help, please let me know if more information is needed to answer these questions.
I'd say you have a fairly good understanding of the data already and should present what you found. This sounds like a descriptive study rather than one where you will be presenting to the FDA with a request to honor your p-values. Since your audience will (or should) be expecting that the time-course of risk for different diseases will be heterogeneous, I'd think you can just describe these results and talk about the biological/medical reasons why the first "non-conformist" disease becomes less important with time and the other non-conforming condition might become more potent over time. You have already done a more thorough analysis than most descriptive articles in the medical literature exhibit; I rarely see a description of the nature of the non-proportionality.
The last question, regarding the structure that "does not preclude a single individual appearing multiple times in the data", may require some more thorough discussion. A first approach would be to account for the within-patient correlation with the cluster() function (which gives a robust variance estimate rather than stratifying).
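A hedged sketch of both points in R, reusing the data and variable names from the question (the log(time) interaction is only one common choice of time-transform):
library(survival)
# (1) Robust sandwich variance for patients who appear in several rows;
# cluster() adjusts the standard errors, it does not stratify:
combi.robust <- coxph(Surv(time, event) ~ disease + age.at.dx + cluster(id),
                      data = combi)
# (2) Let the disease effect change with time via a time-transform term;
# here the dummy-coded disease levels are interacted with log(t):
combi.tt <- coxph(Surv(time, event) ~ disease + age.at.dx + tt(disease),
                  data = combi,
                  tt = function(x, t, ...) model.matrix(~ x)[, -1] * log(t))
A clearly non-zero coefficient on a tt() term is itself evidence that the corresponding effect varies over time. Alternatively, strata(disease) sidesteps the PH assumption for disease entirely, at the cost of no longer estimating its effect.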

Crowdsourcing reliability measurements - spam/fraud detection

I'd like to collect some kind of geographical information from website users: for a given set of data, they will mark a checkbox indicating whether a place has a given property or not. Are there any tools/frameworks for detecting fraudulent or spam submissions based on the whole collected data set (and possibly other info)? I'd like to get filtered, more reliable data.
Not sure if that's exactly what you're asking for, but here are some tips from my experience using Amazon Mechanical Turk:
There are several academic papers dealing with such problems; here is a good one.
In addition, based on the following general recommendations, I've created a custom procedure which worked on my data:
a. Include an open question, and filter out cases where it wasn't answered. It's harder to answer such a question automatically, and it might also be more time-consuming, thus less attractive, for a fraudster.
b. If possible, don't use a binary scale (i.e. a checkbox), but some grade (e.g. 1-4 or 1-6). This would give you more data to work with.
c. If available, filter out cases where the time spent filling in your form was too short. (This is especially useful if you include the open question.)
d. If you have multiple inputs per user, check for repetitive answers, and for users who consistently give far-from-average answers.
If each user submits only a single "form", consider putting more than a single element/question in it, so you'll get multiple submissions per user.
e. If you have only a single submission per user or user ID, your options are more limited. I can suggest filtering out outliers (e.g. data points farther than 3 standard deviations from the average), in case you have enough data.
f. After all the filtering, check the agreement or disagreement in your data (e.g. by checking what proportion of your data points fall within x standard deviations from the average). In case of agreement, use the average; in case of disagreement, collect some more data.
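To make (e) and (f) concrete, here is a minimal R sketch, assuming one numeric grade per submission in a vector grades (the 3-SD and 80% cutoffs are illustrative):
filter_and_summarise <- function(grades, sd_cut = 3, agree_frac = 0.8) {
  mu <- mean(grades)
  s <- sd(grades)
  kept <- grades[abs(grades - mu) <= sd_cut * s]    # (e) drop outliers
  # (f) agreement: share of remaining points within 1 SD of their own average
  agreement <- mean(abs(kept - mean(kept)) <= sd(kept))
  if (agreement >= agree_frac) mean(kept) else NA   # NA -> collect more data
}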
Hope it helps,

How to change to use Story Points for estimations in Scrum [closed]

Having used "days" as the unit for estimation of tasks in Scrum I find it hard to change to using Story Points. I believe story points should be used as they are more comparable to each other - being less dependent on the qualifications of whoever addresses the task etc. However, it isn't easy to make a team start using Story Points when they're used to estimating in days.
So, how to make a team change to Story Points? What should motivate the team members to do so, and how should we apply the switch?
When I switched to points, I decided to do it only if I could meet the two following conditions: 1) find an argument that justified the switch and would convince the team; 2) find an easy method for using them.
Convincing
It took me a lot of reading on the subject, but I finally found the argument that convinced me and my team: it's nearly impossible to find two programmers who will agree on the time a task will take, but the same two programmers will almost always agree on which task is the biggest when shown two different tasks.
This is the only skill you need to ‘estimate’ your backlog. Here I use the word ‘estimate’ but at this early stage it’s more like ordering the backlog from tough to easy.
Putting Points in the Backlog
This step is done with the participation of the entire scrum team.
Start dropping the stories one by one in a new spreadsheet while keeping the following order: the biggest story at the top and the smallest at the bottom. Do that until all the stories are in the list.
Now it's time to put points on those stories. Personally I use the Planning Poker scale (1/2, 1, 2, 3, 5, 8, 13, 20, 40, 100), so this is what I will use for this example. At the bottom of the list you'll probably have micro-tasks (things that take 4 hours or less to do). Give every micro-task the value 1/2. Then continue up the list, giving the value 1 (next in the scale) to stories until it is clear that a story is much bigger (a 2 instead of a 1, so twice as big). Now using the value 2, continue up the list until you find a story that should clearly have a 3 instead of a 2. Continue this process all the way to the top of the list.
NOTE: Try to keep the vast majority of the points between 1 and 13. The first time you might have a bunch of big stories (20, 40 and 100), and you'll have to break them down into chunks smaller than or equal to 13.
That is it for the points and the original backlog. If you ever have a new story, compare it to that list to see where it fits (bigger/smaller process) and give it the value of its neighbors.
Velocity & Estimation
To estimate how long it will take you to go through the backlog, do the first sprint planning. Add up the points attached to the stories the team picked and voilà: that's your first velocity measure. You can then divide the total points in the backlog by that velocity to know how many sprints will be needed.
That velocity will change and settle during the first 2-3 sprints, so it's always good to keep an eye on that value.
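To make the arithmetic concrete (R, with made-up numbers):
velocity <- 23                         # points the team completed in sprint 1
backlog <- 187                         # points left in the backlog
ceiling(backlog / velocity)            # about 9 sprints at this velocity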
If you want to change to using story points instead of duration, you just have to start estimating in story points. (I'm assuming here you have the authority to make that decision for your team.)
Pick a scale: it could be small/medium/large, the Fibonacci sequence, or 1 to 5. Whatever it is, pick one and use it for several sprints; this will give you your velocity. If you keep changing from one scale to another, velocities between scales are not going to be comparable (i.e. don't do it). These estimates should involve your whole Scrum team.
Having said that, you still need an idea of how much this is going to cost. There aren't many accountants who will accept the answer "I'll tell you how much this is going to cost in 6 months", so you still need to estimate the project in duration as well; this will give you the cost. This estimate is probably going to be done by a senior person on the team.
Then every month your velocity will tell you and the accountants how accurate that first cost estimate was and you can adapt accordingly.
Start by making one day equal one point (or some strict ratio). It is a good way to get started. After a couple of sprints you can start encouraging them to use more relative points (i.e. how big is this compared to that other thing?).
The problem is that story points define effort.
Days are duration.
The two have an almost random relationship: duration = f(effort). That function is based on the skill of the person actually doing the work.
A person knows how long they will take to do the work. That's duration. In days.
They don't know this abstract 'effort' thing. They don't know how long a hypothetical person of average skills will require to do it.
The best you can do is both -- story points (effort) and days (duration).
You can't replace one with the other. If you try to use only effort, then you'll eventually need to get to days for planning purposes. You'll have to apply a person to the story points and compute a duration from the effort.
