Wednesday, September 16, 2009
survival II
a good survival analysis model is supposed to identify the potential candidates for any event and predict their event period correctly. say a model does all of that. is the job done? there are hidden dangers in applying survival analysis to any data: even when successful, it might identify people who don't want to know about the event or can do nothing about it. for instance, your top decile of people for whom the event is most likely to occur might contain patients beyond any help, half-broken machines, or casual shoppers. no one is more likely than candidates in these groups to undergo the event, but is their identification useful? note that this question is most relevant when the event in question is negative and a treatment exists to counter it. treatments cost money, and it is better if the money is well spent.

perhaps this is why there is a general feeling that anti-attrition policies do not work: maybe we are not targeting the right people. maybe the people who get the highest scores are already over the edge for whatever reason, and it's better to target a group that is less likely to attrite but more likely to respond to an anti-attrition offer. so how do we decide whom to target? the first decile, the second, the third, or lower still? a natural answer would be to do some profiling and leave out those who fit the profile of the most likely candidates for the event. each domain has some amount of past knowledge, and this shouldn't be too difficult. however, our focus in this blog is to find abstract answers purely on the basis of logic and thought experiments. in other words, we want to see how far we can go with existing information before we allow new information to come in.
Tuesday, September 15, 2009
survival I
just back from doing some survival analysis. the model produces good predictions, having been fit with a lognormal distribution. however, the timing predictions are typically off by one period. this would have been pretty good for any other kind of prediction, I think, but not for timing: we don't want to give the treatment one period off. maybe it's better to produce the probability of an event occurring in an interval rather than at a point. or maybe make the periods broader, which amounts to the same thing. what about producing the median failure time in each period? something like: 50% of those surviving the 2nd period are likely to fail within the 8th period.
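the interval-probability and conditional-median ideas above can be sketched with scipy's lognormal. the parameters below are made up for illustration, not the fitted model from the project:

```python
import numpy as np
from scipy.stats import lognorm

# hypothetical fitted lognormal: shape = sigma, scale = exp(mu)
sigma, mu = 0.8, 2.0
dist = lognorm(s=sigma, scale=np.exp(mu))

def interval_prob(a, b):
    """P(event in (a, b] | survived past a) -- probability over an
    interval rather than a point prediction."""
    return (dist.sf(a) - dist.sf(b)) / dist.sf(a)

def conditional_median(t):
    """median failure time among those surviving period t:
    the time m with S(m) = 0.5 * S(t)."""
    return dist.isf(0.5 * dist.sf(t))
```

so a statement like "50% of those surviving the 2nd period fail within the 8th" is just `conditional_median(2)` landing somewhere in period 8.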
the chances of something happening or not are modelled using logistic regression. the chances of something happening in a particular period are modelled by studying the distribution of occurrences over time. what if we model this as an evolving bernoulli distribution over periods: a stochastic process?
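one way to sketch the evolving-bernoulli idea is a discrete-time hazard model: expand each subject into one person-period row per period at risk, so each row is a bernoulli trial, and let logistic regression with period dummies give a different bernoulli probability per period. the data below is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical data: observed duration and event flag per subject (0 = censored)
durations = np.array([1, 3, 2, 4, 2, 3])
events    = np.array([1, 0, 1, 1, 0, 1])

# expand into person-period rows: one bernoulli trial per period at risk,
# labelled 1 only in the period the event actually occurred
periods, labels = [], []
for d, e in zip(durations, events):
    for t in range(1, d + 1):
        periods.append(t)
        labels.append(1 if (e == 1 and t == d) else 0)

n_periods = int(durations.max())
X = np.eye(n_periods)[np.array(periods) - 1]  # period dummies
y = np.array(labels)

# the fitted per-period bernoulli probability is the discrete hazard
clf = LogisticRegression().fit(X, y)
hazards = clf.predict_proba(np.eye(n_periods))[:, 1]
```

the `hazards` vector is exactly a bernoulli probability evolving over periods; adding subject covariates to `X` lets the process vary by individual.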
Agenda:
1. read Kalbfleisch and Prentice
2. read other approaches to survival analysis, like the book viewing survival analysis as a case of stochastic processes
3. study what Fader et al. have done on this