Below is the Statement made by Professor Robin Sibson to the Inquiry on Wednesday 14th July. It can be downloaded as a pdf from here.

 Proposed Pressure Reduction Installation at Tirley

Appeal Hearing

Written statement by Professor Robin Sibson MA PhD CStat

1. Personal background

1.1 I am resident a few miles from the site of the proposed PRI, and accordingly have an interest in the proposal.

1.2 I hold the degrees of Master of Arts (Mathematical Tripos) and Doctor of Philosophy (pure mathematics and mathematical statistics) from the University of Cambridge, and am a Chartered Statistician.

1.3 Although I am not specifically an expert in risk analysis, an understanding of the statistical issues is part of the expertise of every professional statistician.

2. Scope of evidence and summary of conclusions

2.1 My evidence is concerned entirely with the safety case as presented in documents NG4, NG5 and NG6 submitted as proofs of evidence presented by David McCollum on behalf of National Grid, and further explored in correspondence between Joseph Gabbott (Secretary of CAPRI) and Hammonds LLP (solicitors acting on behalf of NG).

2.2 The safety issues most likely to be of concern to residents near the site of the proposed PRI are those the impact of which extends beyond the boundaries of the PRI itself. These are characterised by NG as ‘extremely infrequent risks’ and are contrasted with ‘credible risks’, those where the impact is contained within the site (NG6 Appendix 4 paragraph 5.1) – a somewhat tendentious use of terminology. My evidence focuses on the former.

2.3 The overall level of risk associated with an adverse event is composed of the likelihood (strictly, probability if it is a one-off event, or hazard rate if it is an event that can occur at any time) and the impact. The safety case is based on the idea that a risk where the impact is high can be tolerated if the likelihood is low enough.

2.4 Analysis of a risk accordingly requires: (a) a likelihood calculation; (b) an assessment of circumstances where that calculation may be bypassed (‘disorderly’ rather than ‘orderly’ failure); and (c) an impact assessment.

2.5 I shall show that there are serious defects in some aspects of the likelihood (failure rate) calculations carried out by NG, in an area where these calculations are susceptible to independent analysis. This leads to the suspicion that equally serious defects may lie hidden in areas that are not accessible to external scrutiny. If that is the case, the necessary adjustments might well produce figures for individual and societal risks that are a factor of ten worse than those calculated by NG, and thus approach or exceed unacceptable levels.

2.6 NG completely fail to provide any discussion of disorderly failure and how it might be mitigated. I shall illustrate by reference to a number of recent major accidents that disorderly rather than orderly failure needs to be given proper consideration.

2.7 I shall identify a number of weaknesses in the impact assessment, typically resulting in the safety case as presented by NG offering a more favourable picture than may be justified.

3. Failure rate calculations

3.1 A risk of high impact must be shown to have low likelihood if it is to be regarded as acceptable. The estimation of low rates of occurrence is inherently problematic, requiring either very large amounts of data, or potentially questionable methods of combining rates of occurrence for component events, or both. I shall focus on the first of these issues.


3.2 An accessible example of a large data requirement is to be found in NG6 Appendix 4 Paragraph 7.3, which I quote for ease of reference.

7.3 With respect to the catastrophic failure of filter, flow meter or valve body, National Grid has never experienced such a failure in the operating experience of the National Transmission System. This equates to an estimated failure rate of less than 3×10⁻⁷ per year (1 in 3,333,333 per year). Catastrophic failure of these items is not therefore considered further in the risk assessment.

3.3 It is implicit in the way that this is expressed that a common rate of catastrophic failure is assumed for these items and that this does not change with the age of the item (rather, in practice, that routine replacement takes place before any increase in failure rate becomes material). I shall provisionally accept these assumptions.

3.4 Any claim that a failure rate is less than 3×10⁻⁷ per item per year must clearly be based on a very large number of item-years of operating experience, even if no failures have been recorded during that period. Clarification was accordingly sought from NG on a number of points relating to this claim. The response was as follows.

The estimation has been carried out by utilising the inventory of high pressure line valves on the National Transmission System and multiplying that figure by the time that they have been at risk of failure. This figure has then had a factor applied to it to adjust it for wider gas industry experience. Professional judgement has then been applied to consider the qualitative issues that need to be considered before making a judgement to discount catastrophic failure of filters, flow meters and valve bodies. The qualitative considerations in the judgement have been given greater weight than the quantitative evidence. National Grid can demonstrate 0.32 million item years of failure free operation, based on 7960 valves in service for 40 years. By dividing the 0.32 million item years service by one, on the assumption that we have had one failure in this time even though we have not had one, gives a first estimate of failure rate to be 3 × 10⁻⁶ per year. It is known that the European gas pipeline network is 17 times the size of the UK NTS network, and the US gas transmission network is 68 times the size of the UK NTS network. National Grid is not aware of a spontaneous complete failure of a valve body leading to a full bore pipeline rupture in Europe, though 2 have been reported on transmission systems in the US. This information has been gathered through reviews of incident databases. Such databases are the HSE MHIDAS database which collects incident data collected under the RIDDOR regulations and the COMAH regulations in the UK and their European equivalents under the Seveso directive in the European Community, the Marsh Loss Reports on Large Property Damage in the Hydro-Carbon chemical industries or the US Department of Transportation and Hazardous Materials Safety Administration (PHMSA). In the case of the Seveso directive in the European Community and the requirements of the PHMSA in the US, operators are required by law to report such incidents.
Therefore, it is reasonable to take into account this much greater operating experience and apply a further reduction factor to the assumed failure rate. If it was assumed that the US and European valve operating experience is proportional to the length of the pipeline systems, this would allow the UK experience to be multiplied by 85 (17 + 68). This would give 27,200,000 item years operating experience and with two failures would produce a failure rate of 7 × 10⁻⁸ per year. In order to be cautious, Mr McCollum has only applied a reduction factor of 10 to the UK estimate to obtain the result of 3 × 10⁻⁷ per year.

There are also qualitative considerations to be taken into account. Valve and filter bodies are typically two or more times the thickness of the linepipe to which they are attached; so the pipe is much more likely to fail than the valve body. It is standard practice not to include valve, filter and meter failures in Quantitative Risk Assessments (QRA). This is consistent with National Grid’s QRA methodology, which they also apply to their Control of Major Hazards (COMAH) establishments. The COMAH regulations require a QRA to be submitted with the site safety report for those sites to which the regulations apply. The QRAs are then assessed by the HSE. Within UK transmission, there are five COMAH sites, two gas terminals and three LNG sites, each containing a PRI within its boundary. All the QRAs for these sites have been accepted by HSE. All the QRAs for these sites discounted catastrophic valve, filter and meter body failure. The methodology applied is also consistent with the TNO Purple Book, the guide to risk assessment, developed by the authorities in the Netherlands, which is respected world wide. The TNO Purple Book states the requirements for all the elements of a QRA. This consists of the equipment on site to be considered, the consequence methods to be applied, the failure rates to be applied and the risk calculations to be carried out. Failure rates are given for loss of full containment failure for pipelines, pressure vessels, storage tanks and other types of equipment. No failure rates are given for valves, but the advice given for the pipeline failure rate is that the data quoted of 1 × 10⁻⁷ per metre per year includes the flange connections on pipelines. This recognises that the potentially weaker part of the valve connection to the pipeline is included in the pipeline failure rate and does not have to be accounted for separately.

The evidence discussed in the answer to No 4 above does not lend itself to purely statistical analysis. Therefore, some degree of judgement is required to reach a conclusion. The figures quoted in the risk assessment were intended to give a guide to the magnitude of the failure rate in comparison with the pipe failure rate. The manner in which such items are physically constructed shows that they will be less likely to fail than the pipe to which they are attached. It is common practice not to include such failures in QRAs. This is supported by the TNO Purple Book from where the pipe failure rates were sourced and the professional judgement of safety professionals and national safety regulators such as the HSE.

3.5 A total of 7960 valves in service for 40 years gives 318,400 item-years of operation, approximately 0.32 million years as claimed. However, it would be surprising if the network had always been the same size, and this calculation would be valid only if the figure of 7960 were the average number of valves operating during the 40-year period. If in fact the network had grown from a small start to a current size with 7960 valves, the number of item-years might well be half that claimed. There has not been an opportunity to obtain further clarification from NG on this point.
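The arithmetic in 3.5 can be checked with a short sketch. The constant-inventory case uses the figures from the NG response. The linear-growth case is purely an illustrative assumption (an inventory rising steadily from zero to 7960 valves over the 40 years), not a description of how the network actually developed. The function names are hypothetical.

```python
# Sketch: item-years of operating experience under two assumed inventory histories.

FINAL_VALVES = 7960  # valve count quoted in the NG response
YEARS = 40           # period of service quoted in the NG response


def item_years_constant(valves: int, years: int) -> float:
    """Item-years if the same number of valves was in service throughout."""
    return float(valves * years)


def item_years_linear_growth(final_valves: int, years: int,
                             initial_valves: int = 0) -> float:
    """Item-years if the inventory grew in a straight line (assumed pattern).

    The exposure is the average inventory multiplied by the period.
    """
    average_inventory = (initial_valves + final_valves) / 2
    return average_inventory * years


constant = item_years_constant(FINAL_VALVES, YEARS)
grown = item_years_linear_growth(FINAL_VALVES, YEARS)

print(f"constant inventory: {constant:,.0f} item-years")
print(f"linear growth:      {grown:,.0f} item-years")
print(f"ratio:              {grown / constant:.2f}")
```

Under the growth assumption the exposure is exactly half the claimed figure, which would double any failure-rate bound derived from it.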

3.6 The wording of 7.3 unequivocally implies that the inferred failure rate is based on UK experience. However, it is clear from the NG response that this is not so, and indeed that UK data is a very small component of the total set of data on which the inference is based. This constitutes a serious misrepresentation.

3.7 In general using more rather than less data is commendable – provided that you state correctly what you are doing, and that the additional data does not have different properties. But in this case there is a particular reason for concern, over the issue of data quality. Regulatory requirements ensure reporting of failures – none in Europe and two in the US – but there is no such certainty over the number of item-years of operation. NG have assumed that the (as I suppose, current) lengths of the European and US networks relative to the UK network can simply be used as factors to scale up the number of item-years of operation from the UK figure. This involves assuming (a) that the number of items scales with network length, and (b) that the networks have operated over the same period. The first assumption may be particularly questionable for the US, where the distances between installations that use valves may well be much greater, leading to a smaller number of valves per unit length – we simply do not know. In any event, steps should have been taken to obtain proper figures, especially in view of the aggressive step of dividing the UK ‘first estimate’ by 10 that the Europe+US data are used to justify, see 3.14 below.

3.8 ‘The qualitative considerations in the judgement have been given greater weight than the quantitative evidence’ (in 3.4 above). This is a classic symptom of denial, no different in principle from the reluctance of proponents of homeopathic medicine to see their methods subjected to double-blind clinical trials, or of spiritualists to have their séances observed by a professional conjurer. Of course professional judgement has to be the context in which safety assessments are made, but that does not provide any licence for a cavalier attitude towards the data on which they must be based, and the wording as a whole of the response quoted in 3.4 suggests real dangers in this respect.

3.9 But in general, the issues raised up to this point about the actual calculation of failure rates could, in principle at least, be overcome by the provision of properly quality-assured data on which the calculations could be re-run, and the making of a few apologies. I now turn to a much more serious matter. First some background.


3.10 If the available data comprises the number of item-years of operation and the number of failures that have occurred, how should the failure rate be assessed? At first sight it is almost irresistible simply to take the ratio, the observed failure rate. But in a safety context that is a very misguided approach in any event, and is clearly nonsensical if there have in fact been no failures, as in both the UK and Europe data. The reason why it is misguided can be explained in a number of ways. It must be understood that the observed data are random: regardless of the actual failure rate, the number of observed failures will fluctuate as between different periods of observation. Thus it is quite possible for the observed failure rate to be lower than the actual failure rate on which we would like to base safety considerations. When the number of failures is small, the discrepancy can easily be large.
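The point about random fluctuation can be illustrated numerically. The sketch below assumes that failures follow a Poisson process with a constant rate per item-year (in line with the assumptions provisionally accepted in 3.3) and asks how likely it is that no failures at all would be seen in 320,000 item-years for a few hypothetical true rates. The rates chosen are illustrative values only, not figures from the evidence.

```python
import math

ITEM_YEARS = 320_000  # UK exposure quoted in the NG response


def prob_at_most(r: int, rate: float, item_years: float) -> float:
    """P(at most r failures) under an assumed Poisson model, mean = rate * item_years."""
    mu = rate * item_years
    return math.exp(-mu) * sum(mu ** k / math.factorial(k) for k in range(r + 1))


# Hypothetical true failure rates per item-year, chosen only for illustration.
for true_rate in (3e-6, 1e-5, 3e-5):
    p_zero = prob_at_most(0, true_rate, ITEM_YEARS)
    print(f"true rate {true_rate:.0e}: P(no failures observed) = {p_zero:.4f}")
```

A failure-free record is therefore quite compatible with a true rate well above the naive ratio, which is why the observed rate alone is not a safe basis for a safety case.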

3.11 We have no direct access to the actual failure rate, and so our safety considerations cannot be based directly on it. What can be done, however, is to use a failure rate value derived from the data that allows little possibility for the true but unknown failure rate to be any higher – a ‘prudent bound’ for the failure rate. I have prepared a technical note, attached as Appendix A to this statement, that shows how to carry out this process. Where there have been some failures, then it can be thought of as providing a safety factor to apply to the observed failure rate in order to reach a safe level, but it also provides a consistent and meaningful way of handling the zero-failure case, where the concept of a safety factor does not apply, because zero multiplied by anything is still zero. What that level of safety should be is a matter of judgement. Certainly it should be far more rigorous than the conventional levels at which statistical tests are commonly conducted, and it is arguable that if we are seeking annual failure rates in the region of 10⁻⁶, then the test size should itself be of this order of magnitude. The technical note shows that the safety factor grows relatively slowly with increased levels of prudence, so it is not unduly onerous to demand a high level.

3.12 With this as background, it is now instructive to examine how NG have analysed the data. This is set out in the response quoted in 3.4. Starting with the UK data, they have obviously realised that, with zero failures, they cannot just quote a failure rate of zero; some way of suggesting a non-zero failure rate has to be found. So they have invented a failure, let us call it a ‘virtual failure’. They can now obtain a non-zero value by dividing a count of 1 for the number of failures (their one virtual failure) by 320,000 for the number of item-years, to give a ‘first estimate’ of the failure rate of about 3×10⁻⁶ per item-year (they actually describe this process as dividing 320,000 by 1). In Appendix A it is pointed out that this is equivalent to using a figure of 1 in place of the actual figures in the small table dealing with the case of no failures. The actual figures in that table range from 2.9957 for a test of size 5% (far too lax a level of prudence in the present context) to 13.8155 for a test size of 10⁻⁶ (a level that is reasonably conservative). To obtain the corresponding prudent bounds on the failure rate, these figures are divided by the 320,000 item-years of failure-free operation, giving prudent bounds of 9.4×10⁻⁶ and 4.3×10⁻⁵ respectively. In terms of this approach NG’s ‘first estimate’ is over-optimistic (that is, too small) by a factor of between 3 and 14.
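The zero-failure figures in 3.12 can be reproduced with a few lines of arithmetic. Appendix A is not reproduced here, so the formula used is an assumption: under a Poisson model with no observed failures, the largest rate that a test of size alpha would not reject is -ln(alpha) divided by the item-years of exposure. That assumption does recover the quoted multipliers of 2.9957 and 13.8155. The function names are hypothetical.

```python
import math

ITEM_YEARS = 320_000  # UK failure-free exposure quoted in the NG response


def zero_failure_multiplier(alpha: float) -> float:
    """Multiplier for the zero-failure case under the assumed Poisson model.

    With no failures, exp(-mu) = alpha gives mu = -ln(alpha).
    """
    return -math.log(alpha)


def zero_failure_bound(alpha: float, item_years: float) -> float:
    """Prudent upper bound on the failure rate per item-year."""
    return zero_failure_multiplier(alpha) / item_years


ng_first_estimate = 1 / ITEM_YEARS  # NG's 'virtual failure' calculation

for alpha in (0.05, 1e-3, 1e-6):
    bound = zero_failure_bound(alpha, ITEM_YEARS)
    print(f"test size {alpha:g}: multiplier {zero_failure_multiplier(alpha):.4f}, "
          f"bound {bound:.1e}, ratio to NG estimate {bound / ng_first_estimate:.1f}")
```

Because the multiplier is logarithmic in the test size, tightening the test from 5% to one in a million increases the bound by less than a factor of five, which is the sense in which a high level of prudence is not unduly onerous.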

3.13 It cannot be emphasised too strongly that the approach adopted by NG has no foundation whatsoever in proper statistical practice. They clearly have no understanding of the data analysis problem they are trying to deal with, yet the need to produce some sort of answer has driven them to adopt an approach that is entirely arbitrary and artificial. They even try to imply that, because they have introduced a virtual failure where there was in fact no failure, their approach is a conservative one, whereas it can be seen that it substantially fails to provide a prudent bound.

3.14 However, even the ‘first estimate’ is not a small enough figure for the purpose of establishing an adequate safety level. This is the point at which the European and US data are introduced into the calculation, on the basis already criticised in 3.7 above. The number of item-years obtained by scaling up the UK figure of 0.32 million by a factor of (17 + 68) is 27.2 million, and with two real failures now in the data, no virtual failure need be introduced. By dividing 2 by 27.2 million they obtain a failure rate estimate of 7.4×10⁻⁸ which they quote as 7×10⁻⁸. That figure is clearly helpful to NG’s claims, and allows the headroom for another entirely arbitrary step with no justification, which is to claim that if the Europe+US figure is that good, then it is safe to divide the UK ‘first estimate’ by a factor of 10 (Why 10? There is, of course, no answer to that) to reach the final claimed figure of 3×10⁻⁷. Again, it is hard to believe that in an industry critically dependent on proper safety analysis, a procedure of this kind can pass muster.

3.15 The figures I have calculated in the large table in Appendix A allow us to perform a soundly-based analysis of the Europe+US data. The relevant pair of rows is that for two failures (r=2), and the first of those two rows contains a figure of 6.2958 for a test size of 5% and 19.1292 for a test size of 10⁻⁶. The prudent bounds for the failure rate are obtained by dividing each of those figures by the number 27.2 million of item-years during which those two failures have occurred. This gives a figure of 2.3×10⁻⁷ at the 5% level and 7.0×10⁻⁷ at the 10⁻⁶ level. These figures would be very slightly improved (reduced) by incorporating the UK data as well into the overall calculation, since the number of item-years would rise by 320,000 but the number of failures would not increase. The prudent bound at the 10⁻⁶ level is ten times (worse, that is) the figure on which NG rely in order arbitrarily to divide their already meaningless figure of 3×10⁻⁶ obtained from the UK data by (as it happens) 10.
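The two-failure figures in 3.15 can be reproduced in the same spirit. Appendix A is not reproduced here, so the method below is an assumption: it treats the number of failures as Poisson and finds, by bisection, the mean at which the probability of seeing two or fewer failures falls to the test size. That assumption recovers the quoted multipliers of 6.2958 and 19.1292. The function names and the tolerance are hypothetical choices.

```python
import math

ITEM_YEARS = 27.2e6  # Europe+US exposure as scaled up in the NG response
FAILURES = 2         # failures reported in the US


def poisson_cdf(r: int, mu: float) -> float:
    """P(at most r events) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * sum(mu ** k / math.factorial(k) for k in range(r + 1))


def upper_bound_mean(r: int, alpha: float, tol: float = 1e-10) -> float:
    """Largest Poisson mean not rejected at size alpha, given r observed events."""
    lo, hi = 0.0, 1.0
    while poisson_cdf(r, hi) > alpha:  # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:               # the CDF decreases as mu increases
        mid = (lo + hi) / 2
        if poisson_cdf(r, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


ng_estimate = FAILURES / ITEM_YEARS  # NG's simple ratio

for alpha in (0.05, 1e-6):
    multiplier = upper_bound_mean(FAILURES, alpha)
    bound = multiplier / ITEM_YEARS
    print(f"test size {alpha:g}: multiplier {multiplier:.4f}, bound {bound:.1e}, "
          f"ratio to NG estimate {bound / ng_estimate:.1f}")
```

With zero failures the same routine returns -ln(alpha), so one function covers both the UK and the Europe+US cases under the stated assumption.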

3.16 The conclusion from a proper analysis is that if the number of item-years of data can indeed be taken on trust, then the Europe+US data do provide a good level of assurance that the valves are very safe components, with a prudent bound for the failure rate of 7.0×10⁻⁷ per item-year. That is not the point. The concern is that NG have reached their conclusion about safety by an entirely meaningless process, showing that those who carried out the analysis do not understand what they are doing. What confidence can anyone have in the parts of the safety case that cannot readily be exposed to public gaze in this way?

4. Disorderly failure

4.1 It is commonly the case that the subsequent inquiry into a major accident reveals that it was caused by events that cannot easily be built into a quantitative risk calculation. Often multiple failures are involved, and in most cases human rather than purely technological factors play a major part. These may arise in the immediate context of the accident or through management decisions that allow a dangerous situation to develop – an accident waiting to happen – or both. Causal factors may range from a surprisingly minor and not necessarily culpable human error that proves to have unforeseen consequences, through negligence, failure to act on safety alerts, failure to adopt an adequately fail-safe design, to recklessness, and even deliberate sabotage. Acts of terrorism fall into the last of those categories, but do not exhaust it. For example, the first major fire in the Channel Tunnel appears to have been caused deliberately in pursuance of an industrial dispute.

4.2 There are many instructive examples of accidents that fall into the category of disorderly failure but that have not been caused deliberately. The Chernobyl disaster has been attributed to recklessness on the part of staff at the plant, who by-passed safety precautions in order to perform tests on the reactor. The Potters Bar rail crash has been blamed on negligent maintenance. Major factors in the Paddington rail crash were lack of automatic fail-safe braking devices, and lack of response to repeated reports that a critical signal was hard to observe, leading to a situation where there was a high risk of driver error and nothing to protect against it. The first of two recent gas pipeline explosions in Texas was reported to have been caused by puncture of the pipeline by contractors installing an electricity power-line, and it has been claimed that the contractors were working on the basis of inaccurate plans. Although it is too early to be confident of the full causes of the BP oil rig disaster in the Gulf of Mexico, it does appear to be well-established that engineers working on the project were expressing very grave concerns about it in the months before the disaster and that these were eliciting little response from senior management.

4.3 It is not acceptable for an organisation to neglect to address issues of this kind in a safety case, simply because they are nearly impossible to quantify and hard to deal with even qualitatively. Procedures do exist to mitigate the risk that disorderly failure will occur. For example, in a process analogous to financial audit, any safety case should be subject to challenge by a team independent of those producing it, either internal to the company or possibly external through the use of safety consultants. HSE should also have a responsibility here, so it is worrying that they have apparently not yet responded to the safety case for the Tirley site.

5. Impact

5.1 The issue of impact is much closer to the engineering side of the safety case than to the statistical side, and I am accordingly not in a position to comment in depth on this aspect. However, one of the generalist skills of a statistician is to be able to identify anomalies in quantitative reasoning almost regardless of subject content, and I have noted a number of points about which I am uneasy.

5.2 The type of outcome considered appears to be exclusively that of loss of life. Quite properly, that should have primacy, but public concern will also relate (at least) to serious injury and to damage to property and the environment. If a major accident occurred, it would be impossible initially to know the pattern of immediate fatality, serious injury, and minor injury. So even if all those affected were immediate fatalities, the ambulance services and hospitals would still need to respond on a ‘major incident’ scale. If there were many serious but not immediately fatal injuries, pressure would then move to the A&E departments and intensive care services of local hospitals. The safety case gives no consideration to the adequacy of local resources in this regard.

5.3 Rate of escape. Various calculations assume that on becoming aware of an incident, individuals will flee at a speed of 2.5 m s⁻¹, or 9 km h⁻¹ (NG6 App 4 4.7, fifth bullet). This is a jogging pace. The assumption is unrealistic because: (a) individuals may not reliably respond in this way to the first indications of an incident; (b) obstacles such as hedges may prevent flight directly away from the incident; and (c) the elderly and infirm may not be able to maintain this pace. An acceptably conservative assumption in my view would be that individuals do not move. This would clearly increase the finally calculated risk, but in the absence of further information it is not possible to judge whether the difference would be material.

5.4 Road traffic. In calculating the societal risk, road traffic levels along the B4211 and B4213 are used (NG6 App 4 4.10). However, these are generic figures for rural B-roads. Highway authorities routinely measure traffic levels automatically using pressure-sensitive strips fixed temporarily to the road surface. The cost of doing this is presumably quite modest. Real site-specific data could accordingly have been collected and used in place of generic data. Again, in the absence of further information it is not possible to judge whether this would have given a more or less favourable view of traffic levels. Note that among the traffic passing the site is a regular public bus service. Note also that in this context calculations have been carried out using LD01, the 1% lethality distance.

5.5 Distances over which impact may be apparent. Three distances are quoted in tabulations (see NG6 App 4 6.1): the piloted ignition distance, which is the maximum distance at which buildings could be ignited; the LD50 distance at which 50% lethality would arise (although calculated on the assumption that individuals would attempt to escape, see above); and the lower flammable limit distance which is the distance beyond which the gas/air mixture would be too dilute to ignite if it had not already done so. It appears from NG6 App 4 4.7 bullet 5 that lethality calculations take account of distance in a somewhat more graduated manner, although it is not easy to be confident of this from the information given, but certainly it would be interesting to have the values of, say, LD10 and LD01 as well as LD50. Also, the individual risk figure at the nearest property as quoted in NG6 App 4 8.2 is reasonably small only because the likelihood of high-impact events is taken to be small, not because the impact is limited, that is, only orderly failure is considered. What would happen at the nearest property in a worst-case scenario at the PRI, given that it is around 300m from the PRI (figures given in NG6 App 4 6.2.1 and 6.2.3.5)? The figures in the table in NG6 App 4 6.2.3.4 suggest that anywhere within about a 1km radius of the PRI is not a good place to be.

5.6 Crater fires. The event tree shown in NG6 App 5 shows ignited rupture outcomes as ‘crater fires’, with or without fireball. Although rupture is considered as a failure mode in NG6 App 4 6.2.3.2, this is all under the heading ‘Jet Fires’, and I can see no mention of crater fires in App 4. Why? What would be the effect of a crater fire resulting from the rupture of a 1200mm pipe operating at 94barg?

5.7 Validation. It is claimed in NG6 App 4 4.8 that ‘All the mathematical consequence models that are employed in this analysis have been validated by data from large and full scale experiments’. Does this mean that NG have real experimental data from full bore releases from a 1200mm pipe operating at 94barg? If not, what does it mean?

Robin Sibson
2010-07-12


 
© CAPRI 2010