Unit 3: Nonparametric Estimation
Notes largely based on Statistical Methods for Reliability Data by W. Q. Meeker and L. A. Escobar, Wiley, 1998, and on their class notes.
Ramón V. León
9/3/2009 Stat 567: Unit 3 - Ramón V. León

Unit 3 Objectives
Show the use of the binomial distribution to estimate F(t) from interval and singly right-censored data, without assumptions on F(t). This is called nonparametric estimation.
Explain and illustrate how to compute standard errors for $\hat{F}(t)$ and approximate confidence intervals for F(t).
Show how to extend nonparametric estimation to allow for multiply right-censored data.
Illustrate the Kaplan-Meier nonparametric estimator for data with observations reported as exact failures.
Describe and illustrate a generalization that provides a nonparametric estimator of F(t) with arbitrary censoring.
Data for Plant 1 of the Heat Exchanger Tube Crack Data

A Nonparametric Estimator of F(t_i) Based on Binomial Theory for Interval Singly-Censored Data
Plant 1 Estimate of CDF

Comments on the Nonparametric Estimate of F(t_i)
Confidence Intervals

Some Characteristic Features of Confidence Intervals
The level of confidence expresses one's confidence (not probability) that a specific interval contains the quantity of interest.
The actual coverage probability is the probability that the procedure will result in an interval containing the quantity of interest.
A confidence interval is approximate if the specified level of confidence is not equal to the actual coverage probability.
With censored data, most confidence intervals are approximate. Better approximations require more computation.
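
The distinction between the nominal level of confidence and the actual coverage probability can be checked numerically. The sketch below is an illustration only: it assumes an uncensored binomial setting with hypothetical values n = 100 and p = 0.05, and it computes the exact coverage of the usual normal-approximation interval by summing binomial probabilities over the outcomes whose interval contains p. The function name `wald_coverage` is made up for this example.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_coverage(n, p, conf=0.95):
    """Exact coverage of the normal-approximation interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)
    coverage = 0.0
    for k in range(n + 1):
        p_hat = k / n
        half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
        # Add P(K = k) only when the interval built from k failures contains p.
        if p_hat - half_width <= p <= p_hat + half_width:
            coverage += comb(n, k) * p**k * (1.0 - p) ** (n - k)
    return coverage


# Hypothetical values, not taken from the notes.
coverage = wald_coverage(n=100, p=0.05)
print(f"nominal level 0.95, actual coverage {coverage:.3f}")
```

For these assumed values the actual coverage falls below the nominal 95%, which is the sense in which such an interval is approximate.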
Pointwise Binomial-Based Confidence Interval for F(t_i)

Pointwise Normal-Approximation Confidence Interval for F(t_i)
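
A minimal sketch of the normal-approximation interval named in the slide title above, assuming the binomial estimate $\hat{F}(t_i)$ = (number of failures up to $t_i$) / n with standard error $\sqrt{\hat{F}(1-\hat{F})/n}$. The counts are hypothetical and are not the Plant 1 data; the function name is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def binomial_cdf_estimate(cum_failures, n, conf=0.95):
    """Return (F_hat, se, lower, upper) for F(t_i) under single censoring."""
    f_hat = cum_failures / n
    se = sqrt(f_hat * (1.0 - f_hat) / n)
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)
    return f_hat, se, f_hat - z * se, f_hat + z * se


# Hypothetical counts: 5 of 100 units have failed by t_i.
f_hat, se, lower, upper = binomial_cdf_estimate(5, 100)
print(f"F_hat={f_hat:.4f} se={se:.4f} interval=[{lower:.4f}, {upper:.4f}]")
```

Nothing in this formula keeps the endpoints inside [0, 1]; for small $\hat{F}$ or small n the lower endpoint can be negative, which is one motivation for the logit-based interval.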
Plant 1 Heat Exchanger Tube Crack Nonparametric Estimate with Conservative Pointwise 95% Confidence Intervals Based on Binomial Theory

Calculation of the Nonparametric Estimate of F(t_i) for Plant 1 from the Heat Exchanger Tube Crack Data
Integrated Circuit (IC) Failure Times in Hours
Data from Meeker (1987)
Lfp1370.ld

Nonparametric Estimator of F(t) Based on Binomial Theory for Exact Failures and Singly Right-Censored Data
JMP Analysis
[Figure: Failing (0.000 to 0.020) versus Hours (0 to 1300)]
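Comments on the Nonparametric Estimate of F(t)

Delta Method and Derivative of the Logit of the CDF
Delta method:
$$\operatorname{Var}\left[f(\hat{\theta})\right] \approx \left[f'(\hat{\theta})\right]^2 \operatorname{Var}(\hat{\theta})$$
Derivative of the logit function:
$$f(x) = \log\left(\frac{x}{1-x}\right) = \log x - \log(1-x)$$
$$f'(x) = \frac{1}{x} + \frac{1}{1-x} = \frac{1}{x(1-x)}$$
Applying the delta method with $f$ equal to the logit and $\hat{\theta} = \hat{F}$:
$$\operatorname{se}_{\operatorname{logit}(\hat{F})} = \frac{\operatorname{se}_{\hat{F}}}{\hat{F}\,(1-\hat{F})}$$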
Pointwise Normal-Approximation Confidence Interval for F(t_i) Based on the Logit Transformation
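
A sketch of how the logit-based interval can be computed, assuming the interval is formed on the logit scale as logit($\hat{F}$) ± z · se / [$\hat{F}(1-\hat{F})$] and then mapped back to the probability scale. Writing w = exp{z · se / [$\hat{F}(1-\hat{F})$]}, the endpoints become $\hat{F}/[\hat{F} + (1-\hat{F})w]$ and $\hat{F}/[\hat{F} + (1-\hat{F})/w]$. The input values and the function name are hypothetical.

```python
from math import exp
from statistics import NormalDist


def logit_ci(f_hat, se, conf=0.95):
    """Pointwise interval for F(t_i) obtained by inverting a normal interval for logit(F)."""
    if not 0.0 < f_hat < 1.0:
        raise ValueError("f_hat must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)
    # se / (f_hat * (1 - f_hat)) is the delta-method standard error of logit(f_hat).
    w = exp(z * se / (f_hat * (1.0 - f_hat)))
    lower = f_hat / (f_hat + (1.0 - f_hat) * w)
    upper = f_hat / (f_hat + (1.0 - f_hat) / w)
    return lower, upper


# Hypothetical estimate and standard error.
lower, upper = logit_ci(0.05, 0.0218)
print(f"[{lower:.4f}, {upper:.4f}]")
```

Because the endpoints are back-transformed from the logit scale, they always lie strictly between 0 and 1, and the interval is not symmetric about $\hat{F}$.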
Nonparametric Estimate for the IC Data with Normal-Approximation Pointwise 95% Confidence Interval Based on the Logit Transformation

Notation Example
$n = 13$: sample size
$d_i = 3$: number of failures in the $i$th interval
$r_i = 2$: number of right-censored observations at $t_i$
$n_i = 7$: risk set at $t_{i-1}$, where $n_i = n - \sum_{j=0}^{i-1} d_j - \sum_{j=0}^{i-1} r_j$
$\hat{p}_i = 3/7$: estimate of the probability of failing in the $i$th interval, given that the item has survived to the beginning of the interval
A Nonparametric Estimate of F(t_i) Based on Interval Data and Multiple Right Censoring

Pooling of the Heat Exchanger Tube Crack Data
Calculation of the Nonparametric Estimate of F(t_i) for the Heat Exchanger Tube Crack Data
$\hat{p}_1 = 0.0133$, $\hat{q}_1 = 0.9867$
$\hat{p}_2 = 0.0254$, $\hat{q}_2 = 0.9746$
$\hat{p}_3 = 0.0206$, $\hat{q}_3 = 0.9794$

Nonparametric Estimate for the Heat Exchanger Tube Crack Data
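
The calculation can be sketched in a few lines: at each interval the conditional failure probability is $\hat{p}_j = d_j / n_j$, the survival estimate is the running product of $(1 - \hat{p}_j)$, and the risk set shrinks by the failures and the right-censored units. The counts below are an assumption: they were chosen because they reproduce the conditional probabilities 0.0133, 0.0254, and 0.0206 shown above, and they are not read from the pooled data table itself.

```python
def interval_cdf_estimate(n, d, r):
    """Nonparametric estimate of F(t_i) from interval data with multiple right censoring.

    n: initial sample size
    d: failures in each interval
    r: units right-censored at the end of each interval
    Returns a list of (risk_set, p_hat, S_hat, F_hat) tuples, one per interval.
    """
    rows = []
    at_risk = n
    surv = 1.0
    for d_j, r_j in zip(d, r):
        p_j = d_j / at_risk           # conditional probability of failing in interval j
        surv *= 1.0 - p_j             # S_hat(t_j) = product of q_hat up to j
        rows.append((at_risk, p_j, surv, 1.0 - surv))
        at_risk -= d_j + r_j          # risk set entering the next interval
    return rows


# Assumed counts, consistent with the p_hat values on the slide.
rows = interval_cdf_estimate(n=300, d=[4, 5, 2], r=[99, 95, 95])
for risk_set, p_hat, s_hat, f_hat in rows:
    print(f"n_j={risk_set:3d} p_hat={p_hat:.4f} S_hat={s_hat:.4f} F_hat={f_hat:.4f}")
```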
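Approximate Variance of Estimated CDF
Recall that $\hat{F}(t_i) = 1 - \hat{S}(t_i)$, so $\operatorname{Var}[\hat{F}(t_i)] = \operatorname{Var}[\hat{S}(t_i)]$.
Also,
$$\hat{S}(t_i) = \prod_{j=1}^{i} (1 - \hat{p}_j) = \prod_{j=1}^{i} \hat{q}_j \quad \text{and} \quad S(t_i) = \prod_{j=1}^{i} q_j .$$
Then a first-order Taylor series approximation of $\hat{S}(t_i)$ is
$$\hat{S}(t_i) \approx S(t_i) + \sum_{j=1}^{i} \left. \frac{\partial S(t_i)}{\partial q_j} \right|_{q_j} (\hat{q}_j - q_j) = S(t_i) + \sum_{j=1}^{i} \frac{S(t_i)}{q_j} (\hat{q}_j - q_j) .$$

Approximate Variance of Estimated CDF
Then it follows that
$$\operatorname{Var}[\hat{S}(t_i)] \approx \sum_{j=1}^{i} \left[ \frac{S(t_i)}{q_j} \right]^2 \operatorname{Var}(\hat{q}_j) = \sum_{j=1}^{i} \left[ \frac{S(t_i)}{q_j} \right]^2 \frac{q_j p_j}{n_j}$$
because the $\hat{q}_j$ are approximately uncorrelated binomial proportions. (The $\hat{q}_j$ values are asymptotically uncorrelated as $n \to \infty$.) Hence
$$\operatorname{Var}[\hat{S}(t_i)] \approx [S(t_i)]^2 \sum_{j=1}^{i} \frac{p_j}{n_j q_j} = [S(t_i)]^2 \sum_{j=1}^{i} \frac{p_j}{n_j (1 - p_j)} .$$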
Estimating the Standard Error of the Estimated CDF

Standard Errors for the Estimated CDF of the Heat Exchanger Tube Crack Data
$\hat{p}_1 = 0.0133$, $\hat{S}(t_1) = 0.9867$
$\hat{p}_2 = 0.0254$, $\hat{S}(t_2) = 0.9616$
$\hat{p}_3 = 0.0206$, $\hat{S}(t_3) = 0.9418$
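
A sketch of the standard error computation, assuming the estimate is obtained by plugging $\hat{p}_j$ and $\hat{S}(t_i)$ into the approximate variance $[S(t_i)]^2 \sum_j p_j / [n_j (1 - p_j)]$ and taking the square root. The counts are the same assumed values that reproduce the $\hat{p}_j$ and $\hat{S}(t_j)$ listed above; they are not read from the data table itself.

```python
from math import sqrt


def cdf_standard_errors(n, d, r):
    """Return a list of (S_hat, se_F_hat) per interval using the plug-in variance formula."""
    out = []
    at_risk = n
    surv = 1.0
    running_sum = 0.0
    for d_j, r_j in zip(d, r):
        p_j = d_j / at_risk
        surv *= 1.0 - p_j
        running_sum += p_j / (at_risk * (1.0 - p_j))   # p_j / (n_j * q_j)
        out.append((surv, surv * sqrt(running_sum)))   # se of F_hat equals se of S_hat
        at_risk -= d_j + r_j
    return out


# Assumed counts, consistent with the values on the slide.
results = cdf_standard_errors(n=300, d=[4, 5, 2], r=[99, 95, 95])
for s_hat, se in results:
    print(f"S_hat={s_hat:.4f} se={se:.5f}")
```

The standard errors grow with time here because each additional interval adds a positive term to the sum while the risk set shrinks.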
Recall: Pointwise Normal-Approximation Confidence Interval for F(t_i) Based on the Logit Transformation

Normal-Approximation Pointwise Confidence Intervals of the Heat Exchanger Tube Crack Data
JMP Analysis
Shock Absorber Failure Data
First reported in O'Connor (1985).
Failure times, in kilometers of use, of vehicle shock absorbers.
Two failure modes, denoted by M1 and M2.
One might be interested in the distribution of time to failure for mode M1, for mode M2, or in the overall failure-time distribution of the part.
Data: Table C.2 in the Appendix, page 630.
Here we do not differentiate between modes M1 and M2. We will estimate the distribution of time to failure by either mode M1 or M2.
Failure Pattern in the Shock Absorber Data: Failure Mode Ignored
Nonparametric Estimates for the Shock Absorber Data up to 12,220 km
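
For exact failure times with multiple right censoring, the same product-form estimate applies with one "interval" per distinct failure time. The sketch below is a minimal Kaplan-Meier calculation under the usual convention that a unit censored at time t is still in the risk set for a failure at t. The six observations are hypothetical and are not the shock absorber data; the function name is made up for this example.

```python
def kaplan_meier(times, failed):
    """Kaplan-Meier estimate from exact failure and right-censored times.

    times:  observed times
    failed: True for a failure, False for a right-censored observation
    Returns a list of (time, risk_set, failures, S_hat) at each distinct failure time.
    """
    data = sorted(zip(times, failed))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(1 for tt, f in data if tt == t and f)       # failures at t
        c = sum(1 for tt, f in data if tt == t and not f)   # censored at t
        if d > 0:
            surv *= 1.0 - d / at_risk
            steps.append((t, at_risk, d, surv))
        at_risk -= d + c
        i += d + c
    return steps


# Hypothetical data: distances in thousands of km, True = failure.
times = [6, 7, 7, 9, 10, 12]
failed = [True, False, True, True, False, True]
steps = kaplan_meier(times, failed)
for t, n_j, d_j, s_hat in steps:
    print(f"t={t:2d} n_j={n_j} d_j={d_j} S_hat={s_hat:.4f} F_hat={1 - s_hat:.4f}")
```

The estimate only changes at failure times; censored observations affect it indirectly by reducing the risk set for later failures.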
JMP Analysis
Theory of Simultaneous Confidence Bands
SPLIDA GRAPH: Turbine Wheel Crack Initiation Data with Nonparametric Pointwise 95% Confidence Bands
[Figure: Fraction Failing versus Hundreds of Hours]

SPLIDA GRAPH: Turbine Wheel Crack Initiation Data with Nonparametric Simultaneous 95% Confidence Bands
[Figure: Fraction Failing versus Hundreds of Hours]
JMP Analysis

Combined
Start Time   End Time   Survival   Failure   SurvStdEr
10.0000      10.0000    0.9302     0.0698    0.0337
14.0000      14.0000    0.9302     0.0698    0.0473
18.0000      18.0000    0.9041     0.0959    0.0345
22.0000      22.0000    0.8333     0.1667    0.0680
26.0000      26.0000    0.7778     0.2222    0.0657
30.0000      30.0000    0.7778     0.2222    0.0650
34.0000      34.0000    0.5385     0.4615    0.1383
38.0000      38.0000    0.4190     0.5810    0.0865
42.0000      42.0000    0.4190     0.5810    0.0766
46.0000      46.0000    0.4165     0.5835    0.0822
Omitted Topic in Chapter 3
Uncertain censoring time:
We have assumed that censoring takes place at the end of the observation intervals.
One can instead assume that censoring happens in the middle of the observation intervals.
This leads to the actuarial, or life table, nonparametric estimate of the cdf. See Table 3.6, page 64.
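
For orientation only, here is a sketch of one common form of the actuarial adjustment. It assumes that units censored during interval j count as being at risk for half of that interval, so the effective risk set is $n_j - r_j/2$ and $\hat{p}_j = d_j / (n_j - r_j/2)$. This is an assumption about the omitted topic, not a transcription of Table 3.6, and the counts are illustrative.

```python
def actuarial_cdf_estimate(n, d, r):
    """Life-table estimate of F(t_i), treating censoring as occurring mid-interval.

    n: initial sample size
    d: failures in each interval
    r: units right-censored in each interval
    Returns a list of (p_hat, F_hat) tuples, one per interval.
    """
    out = []
    at_risk = n
    surv = 1.0
    for d_j, r_j in zip(d, r):
        effective = at_risk - r_j / 2.0   # censored units count for half an interval
        p_j = d_j / effective
        surv *= 1.0 - p_j
        out.append((p_j, 1.0 - surv))
        at_risk -= d_j + r_j
    return out


# Illustrative counts (assumed, not taken from the notes).
actuarial = actuarial_cdf_estimate(n=300, d=[4, 5, 2], r=[99, 95, 95])
for p_hat, f_hat in actuarial:
    print(f"p_hat={p_hat:.4f} F_hat={f_hat:.4f}")
```

Because the effective risk set is smaller than $n_j$ whenever $r_j > 0$, each $\hat{p}_j$ is at least as large as under the end-of-interval assumption, so the estimated cdf is at least as large as well.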