Proposed
Pressure Reduction Installation at Tirley
Appeal Hearing
Written statement by Professor
Robin Sibson MA PhD CStat
1. Personal background
1.1 I am resident a few miles
from the site of the proposed pressure reduction installation (PRI), and accordingly have an interest in the proposal.
1.2 I hold the degrees of Master
of Arts (Mathematical Tripos) and Doctor of Philosophy (pure mathematics and
mathematical statistics) from the University
of Cambridge, and am a Chartered
Statistician.
1.3 Although I am not
specifically an expert in risk analysis, an understanding of the statistical
issues is part of the expertise of every professional statistician.
2. Scope of evidence and summary of
conclusions
2.1 My evidence is concerned
entirely with the safety case as presented in documents NG4, NG5 and NG6
submitted as proofs of evidence by David McCollum on behalf of
National Grid (NG), and further explored in correspondence between Joseph Gabbott
(Secretary of CAPRI) and Hammonds LLP (solicitors acting on behalf of NG).
2.2 The safety issues most
likely to be of concern to residents near the site of the proposed PRI are
those the impact of which extends beyond the boundaries of the PRI itself.
These are characterised by NG as ‘extremely infrequent risks’ and are
contrasted with ‘credible risks’, those where the impact is contained within
the site (NG6 Appendix 4 paragraph 5.1) – a somewhat tendentious use of
terminology. My evidence focuses on the former.
2.3 The overall level of risk
associated with an adverse event is composed of the likelihood (strictly,
probability if it is a one-off event, or hazard rate if it is an event that can
occur at any time) and the impact. The safety case is based on the idea that a
risk where the impact is high can be tolerated if the likelihood is low enough.
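To make the distinction between probability and hazard rate concrete, the short Python sketch below converts a hazard rate into the probability of at least one event over a period. It assumes, for illustration only, that the rate is constant and that events occur independently as a Poisson process; the function name and the 50-year period are hypothetical.

```python
import math

def prob_at_least_one(rate_per_item_year: float, items: int, years: float) -> float:
    """Probability of at least one event in the period, assuming a constant
    hazard rate (events arriving as a Poisson process, independently per item)."""
    expected_events = rate_per_item_year * items * years
    # 1 - exp(-x), computed with expm1 for accuracy when x is tiny
    return -math.expm1(-expected_events)

# Hypothetical illustration: one item, hazard rate 1e-6 per year, 50 years.
p = prob_at_least_one(1e-6, items=1, years=50)
print(f"{p:.3e}")
```

For small values the result is very close to rate × items × years, which is why, for rare events, a hazard rate and an annual probability are often quoted interchangeably.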
2.4 Analysis of a risk
accordingly requires: (a) a likelihood calculation; (b) an assessment of
circumstances where that calculation may be bypassed (‘disorderly’ rather than
‘orderly’ failure); and (c) an impact assessment.
2.5 I shall show that there are
serious defects in some aspects of the likelihood (failure rate) calculations
carried out by NG, in an area where these calculations are susceptible to
independent analysis. This leads to the suspicion that equally serious defects
may lie hidden in areas that are not accessible to external scrutiny. If that
is the case, the necessary adjustments might well produce figures for individual
and societal risks that are a factor of ten worse than those calculated by NG,
and thus approach or exceed unacceptable levels.
2.6 NG completely fail to
provide any discussion of disorderly failure and how it might be mitigated. I
shall illustrate by reference to a number of recent major accidents that
disorderly rather than orderly failure needs to be given proper consideration.
2.7 I shall identify a number of
weaknesses in the impact assessment, typically resulting in the safety case as
presented by NG offering a more favourable picture than may be justified.
3. Failure rate calculations
3.1 A risk of high impact must
be shown to have low likelihood if it is to be regarded as acceptable. The
estimation of low rates of occurrence is inherently problematic, requiring
either very large amounts of data, or potentially questionable methods of
combining rates of occurrence for component events, or both. I shall focus on
the first of these issues.
3.2 An accessible example of a
large data requirement is to be found in NG6 Appendix 4 paragraph 7.3, which I
quote for ease of reference.
7.3 With
respect to the catastrophic failure of filter, flow meter or valve body,
National Grid has never experienced such a failure in the operating experience
of the National Transmission System. This equates to an estimated failure rate
of less than 3×10⁻⁷ per year (1 in 3,333,333 per year). Catastrophic
failure of these items is not therefore considered further in the risk
assessment.
3.3 It is implicit in the way
that this is expressed that a common rate of catastrophic failure is assumed
for these items and that this does not change with the age of the item (rather,
in practice, that routine replacement takes place before any increase in
failure rate becomes material). I shall provisionally accept these assumptions.
3.4 Any claim that a failure
rate is less than 3×10⁻⁷ per item per year must clearly be based on
a very large number of item-years of operating experience, even if no failures
have been recorded during that period. Clarification was accordingly sought
from NG on a number of points relating to this claim. The response was as
follows.
The
estimation has been carried out by utilising the inventory of high pressure
line valves on the National Transmission System and multiplying that figure by
the time that they have been at risk of failure. This figure has then had a
factor applied to it to adjust it for wider gas industry experience.
Professional judgement has then been applied to consider the qualitative issues
that need to be considered before making a judgement to discount catastrophic
failure of filters, flow meters and valve bodies. The qualitative
considerations in the judgement have been given greater weight than the
quantitative evidence. National Grid can demonstrate 0.32 million item years of
failure free operation, based on 7960 valves in service for 40 years. By
dividing the 0.32 million item years service by one, on the assumption that we
have had one failure in this time even though we have not had one, gives a first estimate of failure rate to be 3 x 10⁻⁶
per year. It is known that the European gas pipeline network is 17 times the
size of the UK NTS network, and the US gas transmission network is 68 times the
size of the UK NTS network. National Grid is not aware of a spontaneous
complete failure of a valve body leading to a full bore pipeline rupture in
Europe, though 2 have been reported on transmission systems in the US. This
information has been gathered through reviews of incident databases. Such
databases are the HSE MHIDAS database which collects incident data collected
under the RIDDOR regulations and the COMAH regulations in the UK and their European equivalents
under the Seveso directive in the European Community, the Marsh Loss Reports on
Large Property Damage in the Hydro-Carbon chemical industries or the US
Department of Transportation and Hazardous Materials Safety Administration
(PHMSA). In the case of the Seveso directive in the European Community and the
requirements of the PHMSA in the US, operators are required by law to
report such incidents. Therefore, it is reasonable to take into account this
much greater operating experience and apply a further reduction factor to the
assumed failure rate. If it was assumed that the US and European valve operating
experience is proportional to the length of the pipeline systems, this would
allow the UK experience to be multiplied by 85
(17 + 68). This would give 27,200,000 item years operating experience and with
two failures would produce a failure rate of 7 x 10⁻⁸ per year. In
order to be cautious, Mr McCollum has only applied a reduction factor of 10 to
the UK estimate to obtain the result of 3 x 10⁻⁷ per year.
There
are also qualitative considerations to be taken into account. Valve and
filter bodies are typically two or more
times the thickness of the linepipe to which they are attached; so the pipe is
much more likely to fail than the valve body. It is standard practice not to
include valve, filter and meter failures in Quantitative Risk Assessments
(QRA). This is consistent with National Grid’s QRA methodology, which they also
apply to their Control of Major Hazards (COMAH) establishments. The COMAH
regulations require a QRA to be submitted with the site safety report for those
sites to which the regulations apply. The QRAs are then assessed by the HSE.
Within UK
transmission, there are five COMAH sites, two gas terminals and three LNG
sites, each containing a PRI within its boundary. All the QRAs for these sites
have been accepted by HSE.
All the QRAs for these sites discounted catastrophic valve, filter and meter
body failure. The methodology applied is also consistent with the TNO
Purple Book, the guide to risk assessment, developed by the authorities in the Netherlands,
which is respected world wide. The TNO
Purple Book states the requirements for all the elements of a QRA. This
consists of the equipment on site to be considered, the consequence methods to
be applied,
the
failure rates to be applied and the risk calculations to be carried out.
Failure rates are given for loss of full containment failure for pipelines,
pressure vessels, storage tanks and other types of equipment. No failure rates
are given for valves, but the advice given for the pipeline failure rate is
that the data quoted of 1 x 10⁻⁷ per metre per year includes the
flange connections on pipelines. This recognises that the potentially weaker
part of the valve connection to the pipeline is included in the pipeline
failure rate and does not have to be accounted for separately.
The
evidence discussed in the answer to No 4 above does not lend itself to purely
statistical analysis. Therefore, some degree of judgement is required to reach
a conclusion. The figures quoted in the risk assessment were intended to give a
guide to the magnitude of the failure rate in comparison with the pipe failure
rate. The manner in which such items are physically constructed shows that they
will be less likely to fail than the pipe to which they are attached. It is
common practice not to include such failures in QRAs. This is supported by the TNO Purple Book from where the pipe
failure rates were sourced and the professional judgement of safety
professionals and national safety regulators such as the HSE.
3.5 A total of 7960 valves in
service for 40 years gives 318,400 item-years of operation, approximately 0.32
million item-years as claimed. However, it would be surprising if the network had
always been the same size, and this calculation would be valid only if the
figure of 7960 were the average number of valves operating during the 40-year
period. If in fact the network had grown steadily from a small start to a current size of
7960 valves, the average number in service would have been about half the current number, and so the number of item-years might well be about half that claimed. There
has not been an opportunity to obtain further clarification from NG on this
point.
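A minimal sketch of the point, using hypothetical growth profiles: item-years are the sum, over the years of operation, of the number of valves in service in each year. If the count had risen steadily from 199 to 7960 (an assumed profile for illustration, not NG data), the total would be only a little over half the figure obtained by assuming 7960 valves throughout.

```python
def item_years(valves_in_service_by_year):
    """Item-years of exposure: sum over years of the number of valves in service."""
    return sum(valves_in_service_by_year)

YEARS = 40
CURRENT_VALVES = 7960

# NG's implicit assumption: 7960 valves in service in every year.
constant = [CURRENT_VALVES] * YEARS
# Hypothetical alternative: steady growth from 199 to 7960 valves.
linear = [CURRENT_VALVES * (year + 1) / YEARS for year in range(YEARS)]

print(item_years(constant), item_years(linear),
      item_years(linear) / item_years(constant))
```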
3.6 The wording of 7.3
unequivocally implies that the inferred failure rate is based on UK
experience. However, it is clear from the NG response that this is not so, and
indeed that UK
data is a very small component of the total set of data on which the inference
is based. This constitutes a serious misrepresentation.
3.7 In general using more rather
than less data is commendable – provided that you state correctly what you are
doing, and that the additional data does not have different properties. But in
this case there is a particular reason for concern, over the issue of data
quality. Regulatory requirements ensure reporting of failures – none in Europe
and two in the US
– but there is no such certainty over the number of item-years of operation. NG
have assumed that the (as I suppose, current) lengths of the European and US
networks relative to the UK network can simply be used as factors to scale up
the number of item-years of operation from the UK figure. This involves
assuming (a) that the number of items scales with network length, and (b) that
the networks have operated over the same period. The first assumption may be
particularly questionable for the US,
where the distances between installations that use valves may well be much
greater, leading to a smaller number of valves per unit length – we simply do
not know. In any event, steps should have been taken to obtain proper figures,
especially in view of the aggressive step, which the Europe+US data are used to justify,
of dividing the UK ‘first estimate’ by 10 (see 3.14 below).
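The scaling criticised in 3.7 can be written out explicitly. In this sketch the length ratios 17 and 68 are NG's figures; the valve-density and operating-period ratios are hypothetical parameters, set to 1.0 by default to mirror NG's implicit assumptions (a) and (b). Halving the assumed US valve density, purely as an illustration, removes 40% of the item-years.

```python
UK_ITEM_YEARS = 320_000  # NG's rounded UK figure (7960 x 40 = 318,400)

def scaled_item_years(length_ratio, density_ratio=1.0, period_ratio=1.0):
    """Item-years for another network, scaled from the UK figure.

    length_ratio:  network length relative to the UK NTS (NG use 17 for Europe, 68 for the US)
    density_ratio: valves per unit length relative to the UK (hypothetical; NG implicitly use 1.0)
    period_ratio:  operating period relative to the UK period (hypothetical; NG implicitly use 1.0)
    """
    return UK_ITEM_YEARS * length_ratio * density_ratio * period_ratio

ng_basis = scaled_item_years(17) + scaled_item_years(68)
# Hypothetical: US valves only half as dense per unit length as in the UK.
half_density_us = scaled_item_years(17) + scaled_item_years(68, density_ratio=0.5)

print(f"{ng_basis:,.0f} {half_density_us:,.0f}")
```

Any failure rate or prudent bound derived from these figures scales inversely with the item-years, so under that hypothetical density it would rise by a factor of 27.2/16.32, about 1.7.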
3.8 ‘The qualitative
considerations in the judgement have been given greater weight than the quantitative
evidence’ (quoted in 3.4 above). This is a classic symptom of denial, no different in
principle from the reluctance of proponents of homeopathic medicine to see
their methods subjected to double-blind clinical trials, or of spiritualists to
have their séances observed by a professional conjurer. Of course professional
judgement has to be the context in which safety assessments are made, but that
does not provide any licence for a cavalier attitude towards the data on which
they must be based, and the wording as a whole of the response quoted in 3.4
suggests real dangers in this respect.
3.9 But in general, the issues
raised up to this point about the actual calculation of failure rates could, in
principle at least, be overcome by the provision of properly quality-assured
data on which the calculations could be re-run, and the making of a few
apologies. I now turn to a much more serious matter. First some background.
3.10 If the available data
comprises the number of item-years of operation and the number of failures that
have occurred, how should the failure rate be assessed? At first sight it is
almost irresistible simply to take the ratio, the observed failure rate. But in
a safety context that is a very misguided approach in any event, and is clearly
nonsensical if there have in fact been no failures, as in both the UK
and Europe data. The reason why it is misguided can be
explained in a number of ways. It must be understood that the observed data are
random: regardless of the actual failure rate, the number of observed failures
will fluctuate between different periods of observation. Thus it is quite
possible for the observed failure rate to be lower than the actual failure rate
on which we would like to base safety considerations. When the number of
failures is small, the discrepancy can easily be large.
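The randomness described in 3.10 can be quantified. Assuming, as in 3.3, a constant failure rate, with failures occurring as a Poisson process, the probability of observing no failures at all in 320,000 item-years is exp(−rate × item-years). The true rates below are hypothetical values chosen for illustration.

```python
import math

def prob_no_failures(true_rate, item_years):
    """Poisson probability of observing zero failures at the given true rate."""
    return math.exp(-true_rate * item_years)

UK_ITEM_YEARS = 320_000
# Hypothetical true rates per item-year, chosen for illustration.
for rate in (3e-6, 1e-5, 3e-5):
    print(f"true rate {rate:.0e}: P(no failures) = "
          f"{prob_no_failures(rate, UK_ITEM_YEARS):.3g}")
```

Even with a true rate of 3×10⁻⁶ per item-year, close to NG’s ‘first estimate’, a failure-free record of this length would occur with probability of about 0.38, so observing no failures is entirely unremarkable at that rate.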
3.11 We have no direct access to
the actual failure rate, and so our safety considerations cannot be based
directly on it. What can be done, however, is to use a failure rate value
derived from the data that allows little possibility for the true but unknown
failure rate to be any higher – a ‘prudent bound’ for the failure rate. I have
prepared a technical note, attached as Appendix A to this statement, that shows
how to carry out this process. Where there have been some failures, then it can
be thought of as providing a safety factor to apply to the observed failure
rate in order to reach a safe level, but it also provides a consistent and
meaningful way of handling the zero-failure case, where the concept of a safety
factor does not apply, because zero multiplied by anything is still zero. What
that level of safety should be is a matter of judgement. Certainly it should be
far more rigorous than the conventional levels at which statistical tests are
commonly conducted, and it is arguable that if we are seeking annual failure
rates in the region of 10⁻⁶, then the test size should itself be of
this order of magnitude. The technical note shows that the safety factor grows
relatively slowly with increased levels of prudence, so it is not unduly
onerous to demand a high level.
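Appendix A is not reproduced in this section. The multipliers quoted from it in 3.12 and 3.15 (2.9957 and 13.8155 for no failures; 6.2958 and 19.1292 for two failures) agree numerically with the standard one-sided upper confidence limit for a Poisson mean: the mean at which r or fewer failures would have probability exactly equal to the test size α. The sketch below assumes that is the construction used; the function names are illustrative. Dividing the multiplier by the number of item-years gives the prudent bound on the failure rate.

```python
import math

def poisson_cdf(r, mu):
    """P(X <= r) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for k in range(1, r + 1):
        term *= mu / k
        total += term
    return total

def prudent_multiplier(r, alpha):
    """Poisson mean mu at which P(X <= r; mu) equals alpha (the test size).

    Any larger mean would make r or fewer failures less likely than alpha.
    """
    if r == 0:
        return -math.log(alpha)  # closed form, since P(X = 0) = exp(-mu)
    lo, hi = 0.0, 1.0
    while poisson_cdf(r, hi) > alpha:  # the CDF falls as mu grows
        hi *= 2.0
    for _ in range(200):  # bisection
        mid = (lo + hi) / 2.0
        if poisson_cdf(r, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def prudent_bound(failures, item_years, alpha):
    """Prudent (one-sided upper) bound on the failure rate per item-year."""
    return prudent_multiplier(failures, alpha) / item_years

for r in (0, 2):
    for alpha in (0.05, 1e-6):
        print(r, alpha, f"{prudent_multiplier(r, alpha):.4f}")
```

For zero failures the multiplier is simply −ln(α), so each thousandfold tightening of the test size adds only ln(1000), about 6.9, to it; this is the slow growth with increasing prudence referred to above.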
3.12 With this as background, it
is now instructive to examine how NG have analysed the data. This is set out in
the response quoted in 3.4. Starting with the UK
data, they have obviously realised that, with zero failures, they cannot just
quote a failure rate of zero; some way of suggesting a non-zero failure rate
has to be found. So they have invented a failure, let us call it a ‘virtual
failure’. They can now obtain a non-zero value by dividing a count of 1 for the
number of failures (their one virtual failure) by 320,000 for the number of
item-years, to give a ‘first estimate’ of the failure rate of about 3×10⁻⁶
per item-year (they actually describe this process as dividing 320,000 by 1).
In Appendix A it is pointed out that this is equivalent to using a figure of 1
in place of the actual figures in the small table dealing with the case of no
failures. The actual figures in that table range from 2.9957 for a test of size
5% (far too lax a level of prudence in the present context) to 13.8155 for a
test size of 10⁻⁶ (a level that is reasonably conservative). To
obtain the corresponding prudent bounds on the failure rate, these figures are
divided by the 320,000 item-years of failure-free operation, giving prudent
bounds of 9.4×10⁻⁶ and 4.3×10⁻⁵ respectively. In terms of
this approach NG’s ‘first estimate’ is over-optimistic (that is, too small) by
a factor of between 3 and 14.
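The comparison in 3.12 can be checked directly. With zero failures the prudent multiplier is −ln(α), so the sketch below needs only the standard library; NG’s ‘first estimate’ corresponds to a multiplier of 1. The variable names are illustrative.

```python
import math

UK_ITEM_YEARS = 320_000  # failure-free item-years claimed by NG

ng_first_estimate = 1 / UK_ITEM_YEARS  # one 'virtual failure', i.e. a multiplier of 1

# With zero failures the prudent multiplier has the closed form -ln(alpha).
bounds = {alpha: -math.log(alpha) / UK_ITEM_YEARS for alpha in (0.05, 1e-6)}

for alpha, bound in bounds.items():
    print(f"test size {alpha:g}: prudent bound {bound:.1e} per item-year, "
          f"{bound / ng_first_estimate:.1f} times NG's first estimate")
```

Because the first estimate uses a multiplier of 1, the ratio printed in each case is just the multiplier itself (about 3.0 and 13.8), which is where the factor of between 3 and 14 comes from.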
3.13 It cannot be emphasised too
strongly that the approach adopted by NG has no foundation whatsoever in proper
statistical practice. They clearly have no understanding of the data analysis problem
they are trying to deal with, yet the need to produce some sort of answer has
driven them to adopt an approach that is entirely arbitrary and artificial.
They even try to imply that, because they have introduced a virtual failure
where there was in fact no failure, their approach is a conservative one,
whereas it can be seen that it substantially fails to provide a prudent bound.
3.14 However, even the ‘first estimate’ is not a small
enough figure for the purpose of establishing an adequate safety level. This is
the point at which the European and US data are introduced into the calculation, on the
basis already criticised in 3.7 above. The number of item-years obtained by
scaling up the UK figure of 0.32 million by a factor of (17 + 68) is 27.2 million, and with two real
failures now in the data, no virtual failure need be introduced. By dividing 2
by 27.2 million they obtain a failure rate estimate of 7.4×10⁻⁸, which
they quote as 7×10⁻⁸. That figure is clearly helpful to NG’s claims,
and provides the headroom for another entirely arbitrary step with no
justification, which is to claim that if the Europe+US figure is that good,
then it is safe to divide the UK ‘first estimate’ by a factor of 10 (Why 10? There
is, of course, no answer to that) to reach the final claimed figure of 3×10⁻⁷.
Again, it is hard to believe that in an industry critically dependent on proper
safety analysis, a procedure of this kind can pass muster.
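For transparency, the arithmetic of NG’s procedure, as described in the response quoted in 3.4, can be retraced step by step. This sketch only reproduces their steps, with illustrative variable names; it is not an endorsement of them.

```python
UK_ITEM_YEARS = 320_000
LENGTH_FACTOR = 17 + 68  # Europe + US network length relative to the UK NTS

uk_first_estimate = 1 / UK_ITEM_YEARS              # one 'virtual failure'
combined_item_years = UK_ITEM_YEARS * LENGTH_FACTOR
combined_rate = 2 / combined_item_years            # two real US failures
ng_final = uk_first_estimate / 10                  # NG's chosen divisor of 10

print(f"{combined_item_years:,} item-years, ratio {combined_rate:.1e}, "
      f"final claim {ng_final:.1e}")
```

The last value, about 3.1×10⁻⁷, is the figure NG round to 3×10⁻⁷; nothing in the calculation itself determines the divisor of 10.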
3.15 The figures I have calculated in the large table
in Appendix A allow us to perform a soundly based analysis of the Europe+US data. The relevant pair of rows is that for two failures (r=2),
and the first of those two rows
contains a figure of 6.2958 for a test size of 5% and 19.1292 for a test size
of 10⁻⁶. The prudent bounds for the failure rate are obtained by
dividing each of those figures by the 27.2 million item-years during
which those two failures have occurred. This gives a figure of 2.3×10⁻⁷ at the 5% level and 7.0×10⁻⁷
at the 10⁻⁶ level. These figures would be very slightly improved
(reduced) by incorporating the UK
data as well into the overall calculation, since the number of item-years would
rise by 320,000 but the number of failures would not increase. The prudent bound
at the 10⁻⁶ level is ten times larger (that is, worse) than the figure of 7×10⁻⁸ on which
NG rely in order arbitrarily to divide their already meaningless figure of 3×10⁻⁶
obtained from the UK data by (as it happens) 10.
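Assuming the figures in Appendix A are one-sided upper confidence limits for a Poisson mean (they agree numerically with that construction), the two-failure bounds in 3.15 can be recomputed, together with the very slight improvement from adding the 320,000 failure-free UK item-years. The item-year totals are NG’s figures, taken on trust as in 3.16; the helper function is illustrative and performs its own bisection so that the block stands alone.

```python
import math

def upper_poisson_mean(r, alpha):
    """Poisson mean mu at which P(X <= r; mu) equals alpha, found by bisection."""
    def cdf(mu):
        term = total = math.exp(-mu)
        for k in range(1, r + 1):
            term *= mu / k
            total += term
        return total

    lo, hi = 0.0, 1.0
    while cdf(hi) > alpha:  # the CDF falls as mu grows
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if cdf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

EUROPE_US_ITEM_YEARS = 27_200_000  # NG's scaled figure, taken on trust
UK_ITEM_YEARS = 320_000            # failure-free: adds exposure but no failures
FAILURES = 2

bounds = {}
for alpha in (0.05, 1e-6):
    m = upper_poisson_mean(FAILURES, alpha)
    bounds[alpha] = (m / EUROPE_US_ITEM_YEARS,
                     m / (EUROPE_US_ITEM_YEARS + UK_ITEM_YEARS))
    print(f"test size {alpha:g}: Europe+US {bounds[alpha][0]:.2e}, "
          f"with UK added {bounds[alpha][1]:.2e}")
```

Adding the UK exposure lowers the bound at the 10⁻⁶ level only from about 7.03×10⁻⁷ to about 6.95×10⁻⁷.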
3.16 The conclusion from a
proper analysis is that if the number of item-years of data can indeed be taken
on trust, then the Europe+US data do provide a good
level of assurance that the valves are very safe components, with a prudent
bound for the failure rate of 7.0×10⁻⁷ per item-year. That is not
the point. The concern is that NG have reached their conclusion about safety by
an entirely meaningless process, showing that those who carried out the
analysis do not understand what they are doing. What confidence can anyone have
in the parts of the safety case that cannot readily be exposed to public gaze
in this way?
4. Disorderly failure
4.1 It is commonly the case that
the subsequent inquiry into a major accident reveals that it was caused by
events that cannot easily be built into a quantitative risk calculation. Often
multiple failures are involved, and in most cases human rather than purely
technological factors play a major part. These may arise in the immediate
context of the accident or through management decisions that allow a dangerous
situation to develop – an accident waiting to happen – or both. Causal factors
may range from a surprisingly minor and not necessarily culpable human error
that proves to have unforeseen consequences, through negligence, failure to act
on safety alerts, failure to adopt an adequately fail-safe design, to
recklessness, and even deliberate sabotage. Acts of terrorism fall into the
last of those categories, but do not exhaust it. For example, the first major
fire in the Channel Tunnel appears to have been caused deliberately in
pursuance of an industrial dispute.
4.2 There are many instructive
examples of accidents that fall into the category of disorderly failure but
that have not been caused deliberately. The Chernobyl
disaster has been attributed to recklessness on the part of staff at the plant,
who by-passed safety precautions in order to perform tests on the reactor. The
Potters Bar rail crash has been blamed on negligent maintenance. Major factors
in the Paddington rail crash were lack of automatic fail-safe braking devices,
and lack of response to repeated reports that a critical signal was hard to
observe, leading to a situation where there was a high risk of driver error and
nothing to protect against it. The first of two recent gas pipeline explosions
in Texas was reported to have
been caused by puncture of the pipeline by contractors installing an
electricity power-line, and it has been claimed that the contractors were
working on the basis of inaccurate plans. Although it is too early to be
confident of the full causes of the BP oil rig disaster in the Gulf of Mexico,
it does appear to be well established that engineers working on the project were
expressing very grave concerns about it in the months before the disaster and
that these were eliciting little response from senior management.
4.3 It is not acceptable for an
organisation to neglect to address issues of this kind in a safety case, simply
because they are nearly impossible to quantify and hard to deal with even
qualitatively. Procedures do exist to mitigate the risk that disorderly failure
will occur. For example, in a process analogous to financial audit, any safety
case should be subject to challenge by a team independent of those producing
it, either internal to the company or possibly external through the use of
safety consultants. HSE should also have a
responsibility here, so it is worrying that they have apparently not yet responded
to the safety case for the Tirley site.
5. Impact
5.1 The issue of impact is much
closer to the engineering side of the safety case than to the statistical side,
and I am accordingly not in a position to comment in depth on this aspect.
However, one of the generalist skills of a statistician is to be able to
identify anomalies in quantitative reasoning almost regardless of subject
content, and I have noted a number of points about which I am uneasy.
5.2 The type of outcome considered appears to be exclusively that
of loss of life. Quite properly, that should have primacy, but public concern
will also relate (at least) to serious injury and to damage to property and the
environment. If a major accident occurred, it would be impossible initially to
know the pattern of immediate fatality, serious injury, and minor injury. So
even if all those affected were immediate fatalities, the ambulance services
and hospitals would still need to respond on a ‘major incident’ scale. If there
were many serious but not immediately fatal injuries, pressure would then move
to the A&E departments and intensive care services of local hospitals. The
safety case gives no consideration to the adequacy of local resources in this
regard.
5.3 Rate of escape. Various calculations assume that on becoming
aware of an incident, individuals will flee at a speed of 2.5 m s⁻¹,
or 9 km h⁻¹ (NG6 App 4 4.7, fifth bullet). This is a jogging pace. The
assumption is unrealistic because: (a) individuals may not reliably respond in
this way to the first indications of an incident; (b) obstacles such as hedges
may prevent flight directly away from the incident; and (c) the elderly and
infirm may not be able to maintain this pace. An acceptably conservative
assumption in my view would be that individuals do not move. This would clearly
increase the finally calculated risk, but in the absence of further information
it is not possible to judge whether the difference would be material.
5.4 Road traffic. In calculating the societal risk, road traffic
levels along the B4211 and B4213 are used (NG6 App 4 4.10). However, these are
generic figures for rural B-roads. Highway authorities routinely measure
traffic levels automatically using pressure-sensitive strips fixed temporarily
to the road surface. The cost of doing this is presumably quite modest. Real
site-specific data could accordingly have been collected and used in place of
generic data. Again, in the absence of further information it is not possible
to judge whether this would have given a more or less favourable view of
traffic levels. Note that among the traffic passing the site is a regular
public bus service. Note also that in this context calculations have been
carried out using LD01, the 1% lethality distance.
5.5 Distances over which impact may be apparent. Three distances
are quoted in tabulations (see NG6 App 4 6.1): the piloted ignition distance,
which is the maximum distance at which buildings could be ignited; the LD50
distance at which 50% lethality would arise (although calculated on the assumption
that individuals would attempt to escape, see above); and the lower flammable
limit distance which is the distance beyond which the gas/air mixture would be
too dilute to ignite if it had not already done so. It appears from NG6 App 4
4.7 bullet 5 that lethality calculations take account of distance in a somewhat
more graduated manner, although it is not easy to be confident of this from the
information given, but certainly it would be interesting to have the values of,
say, LD10 and LD01 as well as LD50. Also, the individual risk figure at the
nearest property as quoted in NG6 App 4 8.2 is reasonably small only because
the likelihood of high-impact events is taken to be small, not because the
impact is limited, that is, only orderly failure is considered. What would
happen at the nearest property in a worst-case scenario at the PRI, given that
it is around 300m from the PRI (figures given in NG6 App 4 6.2.1 and 6.2.3.5)?
The figures in the table in NG6 App 4 6.2.3.4 suggest that anywhere within
about a 1km radius of the PRI is not a good place to be.
5.6 Crater fires. The event tree shown in NG6 App 5 shows ignited
rupture outcomes as ‘crater fires’, with or without fireball. Although rupture
is considered as a failure mode in NG6 App 4 6.2.3.2, this is all under the
heading ‘Jet Fires’, and I can see no mention of crater fires in App 4. Why?
What would be the effect of a crater fire resulting from the rupture of a
1200mm pipe operating at 94barg?
5.7 Validation. It is claimed in NG6 App 4 4.8 that ‘All the
mathematical consequence models that are employed in this analysis have been
validated by data from large and full scale experiments’. Does this mean that
NG have real experimental data from full
bore releases from a 1200mm pipe operating at 94barg? If not, what does it
mean?
Robin Sibson
2010-07-12