Time-to-event analysis with missing data that could mean one event with a break in data or two separate events - survival-analysis

I am analyzing wolf pack "survival" data to determine what factors contribute to packs dissolving (i.e. "dying") versus surviving. I have annual presence/absence data for packs that sometimes has gaps, which could mean the pack was not monitored in those years, or it could mean that the pack dissolved and a new pack formed a few years later. Are there any analysis methods that can account for these two possible scenarios in the data? Or is the only option for me to manually define when packs dissolved and when they were simply not monitored? An example of the data is described below:
Here, pack #1 was present and observed every year. Pack #2 was not observed in years 4-6, which could mean either that it was not detected or that the pack dissolved in the third year and a new pack formed in that same area (hence receiving the same name as the original pack) in year 7. I would like to know if there is a special approach I could use to automatically differentiate between the two scenarios for pack #2 (I have several packs with data like this, and each case could be a different scenario). Any help is much appreciated!
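Purely as an illustration of the ambiguity (this is not a method endorsed in the thread): one could code the gap both ways and compare the results as a sensitivity analysis. A minimal Python sketch follows, assuming the lifelines package; the seven-year span, column names, and codings are hypothetical and simply mirror the pattern described above.

    # Sensitivity check: encode pack #2's gap under both readings and compare curves.
    import pandas as pd
    from lifelines import KaplanMeierFitter

    # Scenario A: the gap was a monitoring gap; pack #2 persisted through year 7.
    scenario_a = pd.DataFrame({
        "pack":     ["1", "2"],
        "duration": [7, 7],   # years until dissolution or end of study
        "event":    [0, 0],   # 0 = still present at last observation (censored)
    })

    # Scenario B: pack #2 dissolved in year 3 and a new pack ("2b") formed in year 7.
    scenario_b = pd.DataFrame({
        "pack":     ["1", "2", "2b"],
        "duration": [7, 3, 1],
        "event":    [0, 1, 0],  # 1 = dissolution observed
    })

    kmf = KaplanMeierFitter()
    for name, df in [("gap = not monitored", scenario_a), ("gap = dissolution", scenario_b)]:
        kmf.fit(df["duration"], event_observed=df["event"], label=name)
        print(kmf.survival_function_)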

Related

statistical test for an effect across groups, data are nested, not of normal distribution

What is the best statistical test for an effect across groups when the data are nested and may not be normally distributed? I get a highly significant effect using the Kruskal-Wallis test, but it does not account for the fact that the data points come from several locations, each contributing data for several years, and that in every year the data were pooled into age groups.
I think you can categorize the data by year and change the data structure so that it is non-nested, making it easier to process. I agree that the Kruskal-Wallis test is a good choice for testing the cross-group effect.
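As an illustration of that suggestion only (the file and column names below are hypothetical), a minimal Python sketch: split the data by year so each slice is no longer nested in time, then run a Kruskal-Wallis test across the groups within each year.

    # One Kruskal-Wallis test per year, so the year-level nesting is handled
    # by stratifying rather than ignored.
    import pandas as pd
    from scipy.stats import kruskal

    # Hypothetical columns: year, group, value
    df = pd.read_csv("measurements.csv")

    for year, year_df in df.groupby("year"):
        samples = [g["value"].values for _, g in year_df.groupby("group")]
        stat, p = kruskal(*samples)
        print(f"year {year}: H = {stat:.2f}, p = {p:.4f}")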

Adding weight to variables in a line graph in Tableau

I have a dataset consisting of calls going to agents (actually 10 of them) per day. These agents can either answer calls or transfer them to a call center. What we are interested in is whether each of these agents answers more calls than he transfers. In order to answer this, I have created a variable for each of these agents:
Answered/Transferred
I am using a line graph to depict these variables per agent over time.
Now if this variable is less than 1, the agent transferred more calls than he answered. The problem is that this is not a reliable way to measure the overall impact of transferred calls, because the traffic pertaining to agents 1, 2, 3 is far greater than that pertaining to agents 5, 6, 7 and so on. Therefore, I am trying to come up with a way to "weight" the variables I created before, that is, to somehow include the total number of calls reaching each agent (irrespective of whether they are transferred or answered) in my calculations. That means that if one agent is getting 5 calls per day while another is getting 5,000 per day, I should find a way to depict this in my graphs.
Do you guys have any ideas?
The easiest would be to drag the weight measure to Colors and choose something like a temperature-diverging palette. Depending on your viz, you can also drag the weight measure to Size and, for example, make bars or lines thicker to show there are more records there.
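If it helps to prepare the data before it reaches Tableau, here is a minimal Python/pandas sketch (not from the thread; the file and column names are hypothetical) that keeps the Answered/Transferred ratio for the line and carries the total call volume per agent and day as a separate measure that can then be dragged to Size or Color as suggested above.

    # Compute the ratio and a volume weight per agent per day for the Tableau extract.
    import pandas as pd

    # Hypothetical input columns: agent, date, answered, transferred
    calls = pd.read_csv("calls.csv")

    daily = calls.groupby(["agent", "date"], as_index=False).agg(
        answered=("answered", "sum"),
        transferred=("transferred", "sum"),
    )
    # < 1 means the agent transferred more calls than they answered
    daily["ratio"] = daily["answered"] / daily["transferred"]
    # Total traffic, to be used as the Size or Color weight in the viz
    daily["total_calls"] = daily["answered"] + daily["transferred"]

    daily.to_csv("calls_weighted.csv", index=False)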

Is It Possible To Reference TFS Work Item Fields More Than Once Within The Same Work Item?

We are currently in the process of upgrading from TFS 2008 to TFS 2012. When TFS 2008 was set up, the people involved didn't understand a lot of what the work item fields were for, and we ended up with very heavily customised templates and in fact lost a lot of default fields. As part of the upgrade to 2012 we are trying to return to the out-of-the-box templates as much as possible, to ensure we get to use as many of the features as possible; however, there are a small number of custom fields that we need to include for reporting purposes.
Our product development process involves a roadmap for upcoming releases which includes new work as well as bug fixes. When a bug is assigned to be worked on by the developers we would like to be able to choose which release we're targeting the fix for - as far as I can see, Iteration is best suited for this. At the point the bug is closed though, we would also like to track what release it was actually fixed in, since things often get bumped from one release to the next if higher priority bugs or change requests come in, but this is where we come unstuck since I can't seem to assign Iteration to both fields such that the two show different values.
If possible we would prefer not to have global lists that have to be constantly updated with release numbers across our product range (we have around 8 different products which are constantly in development, each with their own release numbers), and leaving one of them as a text field leaves open the possibility that we will get inconsistencies in what people enter, e.g. 1.01 versus 1.1, which will show up in reporting as two different releases. As the fields are just looking up a set of values in the background, is there no way that the iteration list can be used twice? Or does someone have an alternative suggestion as to how we can get round this?
What I think I'd suggest in this case is using a COPY rule on a state change event, so that when you move your work item into the Closed state, it would populate your custom field with the value currently in your Iteration field.
This would give you a snapshot of the value at the right point in time which then wouldn't be altered if the iteration was later changed, along with a history entry if it was opened & closed multiple times over its lifetime.
As iteration is time-limited and release is perpetual, there is an inherent mismatch of purpose in using iteration here. Iteration is for planning.
You would be better off creating a release list with the versions that you release.
If you are sprinting, for example, you may not know up front which release you will end up in before you start. If you are not sprinting, then you are just kidding yourself that you know.

Dynamics CRM 2011 Import Data Duplication Rules

I have a requirement in which I need to import data from Excel (CSV) into Dynamics CRM regularly.
Instead of using some simple Data Duplication Rules, I need to implement a point system to determine whether a record is considered a duplicate or not.
For example, these are the particular rules for the import:
First Name, exact match, 10 pts
Last Name, exact match, 15 pts
Email, exact match, 20 pts
Mobile Phone, exact match, 5 pts
And then the Threshold value => 19 pts
Now, if a record has the same First Name and Last Name as an old record in the entity, the points will be 25 pts, which is higher than the threshold (19 pts), and therefore the record is considered a duplicate.
If, for example, the record only has the same First Name and Mobile Phone, the points will be 15 pts, which is lower than the threshold, and it is thus considered a non-duplicate.
What is the best approach to achieve this requirement? Is it possible to utilize the default Import Data functionality in MS CRM? Is there any 3rd-party add-on that answers my requirement above?
Thank you for all the help.
Updated
Hi Konrad, thank you for your suggestions, let me elaborate here:
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Nice one, but I don't think it is really workable in my case; the data will be coming in regularly from the client in moderate numbers (hundreds to thousands). Typically the client won't check the data for duplication.
Workflow. Run a process removing any instance calculated as a duplicate.
A workflow is a good idea; however, since it is processed asynchronously, my concern is that in some cases the user may already have made updates/changes to the inserted data before the workflow finishes working, therefore creating data inconsistency or at the very least a confusing user experience.
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).
I like this approach. So I just import as usual (for example, to the contact entity), but I already have a plugin in place that gets triggered every time a record is created; the plugin will check whether the record is duplicate-ish or not and take the necessary action.
I haven't been fiddling a lot with duplicate detection, but looking at your criteria you might be able to make rules that match those: pretty much three rules to cover your cases, a full name match, a last name and mobile phone match, and an email match.
If you want to do the points system I haven't seen any out of the box components that solve this, however CRM Extensions have a product called Import Manager that might have that kind of duplicate detection. They claim to have customized duplicate checking. Might be worth asking them about this.
Otherwise it's custom coding that will solve this problem.
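If it does come down to custom code, the scoring rule itself is small. Below is a minimal sketch in plain Python (not a CRM plugin; the field names and weights follow the example in the question) of the point system described above.

    # Points-based duplicate check as described in the question: each exact
    # field match adds its weight; reaching the threshold flags a duplicate.
    FIELD_WEIGHTS = {
        "first_name": 10,
        "last_name": 15,
        "email": 20,
        "mobile_phone": 5,
    }
    THRESHOLD = 19

    def match_score(new_record: dict, existing_record: dict) -> int:
        score = 0
        for field, weight in FIELD_WEIGHTS.items():
            a, b = new_record.get(field), existing_record.get(field)
            if a and b and str(a).strip().lower() == str(b).strip().lower():
                score += weight
        return score

    def is_duplicate(new_record: dict, existing_records: list) -> bool:
        # >= is assumed here; the question only specifies the 25 pt and 15 pt cases.
        return any(match_score(new_record, r) >= THRESHOLD for r in existing_records)

    # First Name + Last Name    = 25 pts -> duplicate
    # First Name + Mobile Phone = 15 pts -> not a duplicate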
I can think of the following approaches to the task (depending on the number of records, the repetitiveness of the import, automation requirements, etc.); they may all be good in some way. Would you care to elaborate on the current conditions?
Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.
Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).
Workflow. Run a process removing any instance calculated as a duplicate.
You also need to consider the implications of such elimination of data. There's a mathematical issue. Suppose that the uniqueness radius (i.e. the threshold in this 1-D case) is 3. Consider the following set of numbers (it's listed twice, just in a different order).
1 3 5 7 -> 1 _ 5 _
3 1 5 7 -> _ 3 _ 7
Are you sure that's the intended result? Under some circumstances, you can even end up with sets of records of different sizes (depending only on the order). I'm a bit curious about why and how this setup came up.
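A small Python sketch of that effect (illustrative values only): the same greedy elimination with a distance threshold of 3 keeps different records, and even a different number of records, depending only on the order in which they arrive.

    # Greedy de-duplication in 1-D: a value is dropped if it lies within the
    # radius (3) of a value that has already been kept.
    RADIUS = 3

    def greedy_dedupe(values):
        kept = []
        for v in values:
            if all(abs(v - k) >= RADIUS for k in kept):
                kept.append(v)
        return kept

    print(greedy_dedupe([1, 3, 5, 7]))  # [1, 5]
    print(greedy_dedupe([3, 1, 5, 7]))  # [3, 7]
    print(greedy_dedupe([0, 2, 4]))     # [0, 4] -- two survivors
    print(greedy_dedupe([2, 0, 4]))     # [2]    -- one survivor from the same set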
Personally, I'd go with the plugin, if the above is OK by you. If you need to make sure that some of the unique-ish elements never get omitted, you'd probably be best off applying a test algorithm to a backup of the data. However, that may defeat its purpose.
In fact, it sounds so interesting that I might create the solution for you (just to show it can be done) and blog about it. What's the deadline?

Limit data to be generated in cognos cube generation by year

I've been working with Cognos for a year now, but I have stumbled into another problem that needs an urgent workaround. I have a model that automatically generates a cube (using a batch file) every day. The generation is successful, although the main issue is the time it takes to generate the cube. It is somewhat acceptable considering the data involved (the data includes historical transactions going back about 7 years). Now the main idea we came up with is to keep the data from previous years in the model but leave it untouched in the everyday cube generation (since no changes are expected in these data). Only data from the current year should be processed, and this would then be added to the historical data. So far I have tried manipulating the IQD, but the results do not include the historical data in the generated cube. Also, I am using both version 7.4 and the later IBM Cognos 10, so I would appreciate ideas on how to work on this in both versions.
What you're looking for is time-based partitioning. Search Google for "transformer time based partition". For example, this link is for 8.4, but you should be able to find documentation for v10.
http://publib.boulder.ibm.com/infocenter/c8bi/v8r4m0/index.jsp?topic=/com.ibm.swg.im.cognos.ug_cogtr.8.4.0.doc/ug_cogtr_id7761DefiningTBPCube.html
This allows you to build your prior years once; they will exist as multiple individual MDC files. You then only regularly build your current-year cube, which refreshes just one MDC file. These are all transparently stitched together via a master dummy MDC and VCD file.
Apparently v10 has some build optimisations, so I would do this in v10 rather than v7 if possible.
