1.5. Reliability prediction during project life cycle#

1.5.1. Reliability prediction framework underlying this handbook#

Any reliability prediction should be clear, specific and useful.

  • “Clear” means that the assumptions underlying the prediction are clearly defined and communicated together with the prediction results, allowing for a meaningful interpretation.

  • “Specific” means that the characteristics of the product and mission under analysis are considered appropriately, and the prediction accounts for the relevant variables.

  • “Useful” means that the prediction gives answers to the questions that are relevant for the required use of the prediction results.

To achieve this, the framework of this handbook starts the prediction process with a discussion of the assumptions underlying the prediction (Section 1.5.2), allowing the prediction to be specific to the problem at hand while remaining clear regarding the meaning and interpretation of the prediction results. A key concept is the intended use of the prediction, which drives the required scope and focus of the prediction so that it is useful for the trade-offs it will support (Section 1.5.3). Throughout the project life cycle, the prediction serves different purposes, and different reliability prediction tasks are performed to coordinate the prediction throughout the system development (Section 1.5.4).

1.5.2. Assumptions and ground rules for the prediction#

When planning a reliability prediction, the ground rules and key assumptions underlying the prediction must be agreed upon as a first step. Which assumptions are appropriate depends on the following aspects:

  • the characteristics of the system or item under analysis,

  • the project life cycle phase in which the prediction is performed and

  • the intended use of the prediction results.

Thus, sufficient information on these three points should be available before starting with the prediction, and even before defining any ground rules and assumptions.

Table 1.5.1 gives an overview of different areas in which assumptions need to be made before starting a system level reliability prediction. The associated ground rules give a rough indication of the general principles that should be followed to achieve maximum accuracy (i.e. realistic predictions). It is not mandatory to be compliant with each ground rule, but the assumptions made should be justified considering the characteristics of the system, the project life cycle phase in which the prediction is made and the intended use of the prediction results.

It should be clear that the assumptions made can have a tremendous impact on the results. If the main goal is to obtain reliability predictions that are comparable (e.g. between different manufacturers or suppliers), one must ensure that the same assumptions are used. At least the key assumptions with the largest impact on the results should be agreed by all parties, and/or specified in the supply chain.

Any reliability prediction report should provide full information on the assumptions made for the prediction, as listed in Table 1.5.1. This ensures that the required elements of a reliability prediction, as defined in [NR_METHODO_11], Clause 4, are provided with the prediction. The relation between each of these required elements, the assumptions listed in Table 1.5.1 and the relevant sections of this handbook is given in Table 1.5.2.

Table 1.5.1 List of assumptions to be agreed upon and associated ground rules for predictions at system level (equipment level or higher).#
| No. | Assumption | Ground rule |
|---|---|---|
| 1 | Basic information needed to define the assumptions | |
| 1.1 | System definition | The system or item under analysis shall be clearly defined, with a description of all relevant characteristics needed for performing the prediction. |
| 1.2 | Project life cycle phase | The project life cycle phase during which the prediction is performed, as well as the associated reviews, shall be clearly defined. |
| 1.3 | Intended use of the prediction | The reliability prediction objectives or intended use(s) of the prediction results shall be clearly defined. |
| 2 | Assumptions related to the reliability prediction coverage | |
| 2.1 | Mission phases coverage | The prediction shall cover all mission phases that affect reliability and are relevant for the supported reliability prediction use cases. |
| 2.2 | Elements coverage | The prediction shall cover all spacecraft elements unless their contribution to overall system (un-)reliability is negligible. |
| 2.3 | Failure modes coverage | The prediction shall cover all failure modes with a relevant effect on the state (or performance) of the overall system. |
| 2.4 | Failure mechanisms coverage | The prediction shall cover all failure mechanisms with a relevant contribution to the occurrence of the considered failure modes. |
| 2.5 | Failure root causes coverage | The coverage in terms of failure root causes shall be defined depending on the reliability prediction use. |
| 3 | Assumptions related to the reliability prediction input | |
| 3.1 | Mission definition | The mission definition shall specify the functions as well as the performance levels (degraded system modes) to be analysed. |
| 3.2 | Design lifetime | The design lifetime shall be clearly specified. For lifetime extensions, the analysis shall account for accumulated time and stresses. |
| 3.3 | Operational conditions | In each project phase, the operational conditions shall be defined or updated based on all available information. |
| 3.4 | Environmental conditions | In each project phase, the environmental conditions shall be defined or updated based on all available information. |
| 3.5 | Product design information | The available product design information, as of the current project phase, shall be used to build or update the reliability model. |
| 3.6 | Methods, models | The prediction methods and models shall be selected based on the technologies, use conditions and available information. |
| 3.7 | Data | Relevant test or field return data should be used to build or update the reliability models (when available). |
| 4 | Assumptions related to the reliability modelling | |
| 4.1 | Redundancy considerations | Redundancies shall be modelled considering the specific type of redundancy and appropriate input from lower levels. |
| 4.2 | Degraded system modes | Degraded system modes shall be modelled explicitly, considering reliability as a function of the required performance level. |
| 4.3 | Dormant phases | Dormant phase modelling shall account for the difference between the stresses in active and passive mode. |
| 4.4 | Common cause effects | Common cause effects shall be considered, accounting for the system layout, use conditions and considered categories of failures. |
| 4.5 | Distribution functions | The selection of distribution functions to model reliability in time shall be justified considering the technologies and relevant failure mechanisms. |
| 5 | Assumptions related to the reliability prediction outputs | |
| 5.1 | Prediction metrics | The prediction metrics (e.g. failure rate, reliability in time, probability of failure on demand) shall be consistent with the reliability modelling. |
| 5.2 | Prediction uncertainties | The most relevant epistemic uncertainties associated with the prediction shall be identified and communicated together with the prediction results. |
| 5.3 | Conservatism | The required accuracy or conservatism (realistic vs. conservative prediction) shall depend on the intended use of the prediction. |
Table 1.5.2 Relation between the assumptions listed in Table 1.5.1, the required elements of a prediction according to IEEE 1413:2010 and the relevant sections of this handbook.#

| Required element of a reliability prediction ([NR_METHODO_11], Clause 4) | Table 1.5.1 assumptions | Handbook sections |
|---|---|---|
| Identification and description of the item for which the prediction is made and the life cycle phase upon which the prediction is performed | System definition; Project life cycle phase | Section 1.4.1.1, Section 1.5.4 |
| Intended use of the prediction results | Intended use of the prediction | Section 1.5.3.2 |
| RP coverage: no required element in [NR_METHODO_11] | Mission phases coverage; Elements coverage; Failure modes coverage; Failure mechanisms coverage; Failure root causes coverage | Section 1.4.1, Section 1.4.2, Section 2.4, Section 1.4.3, Section 1.4.4 |
| List of inputs used for the selected methodologies | Mission definition; Design lifetime; Operational conditions; Environmental conditions; Product design information; Methods, models; Data | Section 1.4.3.4, Section 2.4 |
| Modelling: no required element in [NR_METHODO_11] | Redundancy considerations; Degraded system modes; Dormant phases; Common cause effects; Distribution functions | Section 9 |
| Prediction metrics: definitions and values | Prediction metrics | Section 2.4 |
| Uncertainties and limitations of the prediction | Prediction uncertainties | Section 2.6 |
| Statistical confidence in the prediction | Conservatism | Section 2.6 |

1.5.3. Scope and focus of the prediction for different reliability prediction uses#

The scope and focus of a reliability prediction can be defined in terms of different axes, see Table 1.5.2 above (RP coverage).

As a general rule, all elements and associated failure modes and mechanisms need to be covered by the prediction, unless their contribution to overall system (un-)reliability – or to the decision that will be supported – may be assumed to be negligible for practical purposes. Similar considerations hold for the coverage of mission phases during the prediction. The required coverage in terms of root causes (failure categories, see Section 1.4.3 for classification) depends on the intended use of the prediction. This is discussed in the following subsections.

1.5.3.1. Reliability prediction versus reliability management#

Achieving a high reliability product is an important objective during the design and production of any space system. Considering all root causes of failure is a prerequisite, and different mitigation processes are in place to avoid the occurrence of each of them, see Table 1.5.3 for examples. Apart from measures to avoid the different root causes, system level design aims at mitigating the effect of lower level failures on the success of the mission.

The objective of reliability predictions is to provide quantitative estimates for the (remaining) probability of failure despite the implementation of these measures. Some of the mitigation measures are explicitly considered in the prediction, e.g. quality level of EEE parts, or redundancy at system level. Others may be used as a justification to neglect certain root causes in the prediction, provided that the mitigation measures are sufficiently effective to avoid their occurrence. To give an example, calculations from radiation engineering may provide evidence that the rate of destructive Single Event Effects is negligible compared to the random failure rate. Similar considerations become relevant for wear-out failures of EEE components, which can in most cases be effectively avoided by safe life qualification (with appropriate margins), at least when the prediction is limited to the specified design lifetime.

In addition, depending on the intended use of the prediction, there may be no added value in making a quantitative prediction for a certain root cause if it does not make a difference for the trade-offs that will be supported by the prediction. These aspects are discussed in the following sections.

Table 1.5.3 Failure categories (from Table 4-2) with examples of mitigation measures.#

| Failure category | Root cause | Mitigation |
|---|---|---|
| RANDOM FAILURE (RF) | Unknown residual defect / weakness: consistent with quality level; under normal stresses (refer to data sheet); one-off event | Space qualification; part quality selection; derating; redundancy; FDIR |
| SYSTEMATIC FAILURE (SF) | Design error; manufacturing error; operations error | Robust design; quality assurance (during design, manufacturing and operations); qualification & verification processes |
| WEAR-OUT FAILURE (WO) | Normal physical process → time/equivalent time: operations-related (e.g. on/off, duty cycle); environment-related (e.g. radiation) | Components and materials selection; design calculations and margins; lifetime qualification with margins |
| EXTRINSIC FAILURE (EF) | Vacuum (outgassing, cold welding, heat transfer); thermal (solar radiation, solar albedo, Earth IR radiation); magnetic field; mechanical vibrations / shocks (launcher, pyro activation); atomic oxygen (erosion → considered as WO); radiation (cumulated effects → considered as WO); UV (degradation → considered as WO); plasma (ESD); SEE (destructive / non-destructive); micrometeorites; debris | Components and materials selection; design calculations and margins; qualification and verification testing; thermal control; shielding (thermal, radiation, debris); radiation engineering; debris impact predictions; avoidance manoeuvres; ... |

1.5.3.2. Overview on different reliability prediction uses#

Different reliability prediction uses become relevant throughout the project life cycle of a space mission; see Table 1.5.4 for an overview. The table includes some classical reliability prediction uses, related to the management and verification of reliability requirements or to the support of design trade-off decisions. However, some newer needs are also addressed, e.g. related to the design of constellations or to the safe disposal of satellites.

Table 1.5.4 Possible reliability prediction uses throughout the project life cycle.#

| Reliability prediction use | Description | Phases |
|---|---|---|
| Input to the design, support of trade-offs and comparisons | During the design phase, reliability prediction may be used to compare competing designs or trade-off options, to identify weak parts of the design and to assess the impact of design changes. The level of detail increases with the project phases. | A - D |
| Establishment, management and verification of quantitative reliability requirements | The purpose of quantitative reliability requirements and their management and verification is to ensure acceptable (as specified) reliability of space products through contractual specification. | 0 - D |
| Support of decisions on the choice of engineering design margins | Reducing excessive margins is one way to reduce cost, but should be justified by an appropriate rationale or analysis. On the other hand, it may sometimes be reasonable to increase a specific margin to avoid a catastrophic single point failure. | A - C |
| Choosing a test strategy at part, equipment or higher levels | Another way to reduce cost is to reduce the effort dedicated to testing. Additional tests may be useful, e.g. to verify a design for identified stresses and avoid costly redesigns. These decisions, too, should be justified by an appropriate rationale or analysis. | D |
| Support of business planning for single spacecraft and for the design of constellations | Reliability predictions - or specified reliability requirements - form an important input to a space customer's business planning. This holds particularly for the design of constellations as "systems of systems", e.g. to decide on the number of spares, replenishment scenarios or redundancy management. | 0 - E |
| Health monitoring and decision making on lifetime extension vs. safe disposal | Spacecraft are designed to be reliable for a specified lifetime, and any lifetime extension needs to be justified based on a reliability prediction. To support the decision, the reliability prediction needs to be revisited, in particular for the functions relevant for satellite safe disposal (space debris mitigation). | E - F |

The reliability prediction methodology presented in this handbook intends to embrace different reliability prediction uses, although the focus is clearly on the "classical" uses related to the development and design of a single spacecraft. The first use listed in Table 1.5.4 - reliability prediction for design support - is considered the base case. Recommendations on the root cause coverage required for this use are provided in Section 1.5.3.2.1, followed by a discussion of the remaining uses in the subsequent subsections.

1.5.3.2.1. Reliability prediction as input to the design#

Providing input to the design of a spacecraft may be seen as the classical use for reliability predictions in space applications, and is the focus of this handbook. To be useful for design, the methodology needs to account for the relevant design variables in order to support the required trade-offs, and for the affected categories of failures.

Guidance on the root cause coverage required for this reliability prediction use is given in Table 1.5.5 below. It should be noted that the recommendation made for systematic failure modelling is driven by the limitations of the available modelling approaches, which do not account for the relevant decision variables (e.g. impact of maturity category, test strategy). Other design decisions, such as redundancy sizing or the margin policy, are not effective in avoiding systematic failures. For these reasons, the added value of considering this failure category for design support is small, despite its clear relevance for the overall failure count.

Table 1.5.5 Root causes coverage for reliability prediction as input to the design.#

| Failure category | Required coverage for reliability prediction as design support |
|---|---|
| Random failures | Full coverage required. |
| Systematic failures | Not generally required, unless full root causes coverage is needed, e.g. to support the design of a constellation (Section 1.5.3.2.5). |
| Wear-out failures | Wear-out after the specified lifetime is out of scope for this use (see Section 1.5.3.2.6 for lifetime extensions). Premature wear-out (excluding systematic failures) needs to be considered for technologies for which safe life qualification is not possible, or not fully effective. |
| Extrinsic failures | Relevant stress contributors resulting from the spacecraft environment should be considered in the prediction of random and wear-out failures. Explicit consideration of extrinsic failures with dedicated models is only required if the rate of occurrence of additional failure modes (e.g. destructive SEE, space debris impact) cannot be neglected when compared to the random failure rate. |

The recommendations regarding root causes coverage are generally valid also for preliminary reliability predictions, e.g. for the Preliminary Design Review (PDR). However, the level of detail used in the modelling can be reduced in this context, to limit the prediction effort and to account for the limited input available in early project phases.

1.5.3.2.2. Establishment and verification of quantitative reliability requirements#

To allow for a meaningful verification, the specification of quantitative reliability requirements should always go hand in hand with a clear definition of the requirement’s scope in terms of failure categories, as well as elements coverage.

The recommendations on root cause coverage for design support in Table 1.5.5 may be used also in this context. The quantitative reliability requirements specified between customers, prime contractors and suppliers can then directly be used as a design driver, with the goal to find the best architecture and detailed design to comply with the requirements under the given schedule and budget constraints.

After completion of the detailed design, the prediction may be extended to account for systematic failures as well, e.g. when full root causes coverage is needed to support the business planning for the owner of a single satellite or a satellite constellation (Section 1.5.3.2.5).

1.5.3.2.3. Reliability prediction supporting the choice of engineering design margins#

The recommendations made in Table 1.5.5 are based on the assumption that the occurrence of certain root causes is effectively reduced by different mitigation measures, as listed in Table 1.5.3. However, one possible use of quantitative reliability predictions is to assess the risk associated with a reduction of design margins, or the benefit of increasing a specific margin. The reliability prediction then needs to account for the effect of this margin policy decision, which may require the consideration of additional root causes. Safe life qualification to avoid wear-out failures before end-of-life is a case in point: wear-out models based on Physics of Failure allow quantifying the effect of the associated margins (e.g. radiation margins) on the item's reliability. More generally, the effect of design margins can be quantified with the aid of any reliability model that considers the effect of the stress contributors addressed by the margin.
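As a minimal illustration of this idea, the sketch below assumes a single wear-out mechanism following a Weibull law whose characteristic life scales with the applied safe-life margin; all parameter values are illustrative assumptions, not handbook data.

```python
import math

def weibull_unreliability(t, eta, beta):
    """Probability of wear-out failure before time t (Weibull CDF)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

t_design = 15.0   # design lifetime in years (assumed value)
beta = 8.0        # steep Weibull shape typical of wear-out (assumed value)

# Characteristic life eta scaled by the safe-life qualification margin:
for margin in (1.5, 2.0, 3.0):
    eta = margin * t_design
    pof = weibull_unreliability(t_design, eta, beta)
    print(f"margin {margin:.1f}: P(wear-out before end-of-life) = {pof:.2e}")
```

Under these assumptions, larger margins drive the probability of wear-out before end-of-life down rapidly, which is the quantitative effect a margin trade-off would weigh against mass or cost.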

1.5.3.2.4. Reliability prediction supporting the choice of a test strategy#

The choice of a suitable test strategy may also be based on quantitative reliability predictions, to assess the effect of testing on the reliability of the flight item. Part level tests, such as lot acceptance tests, reliability tests, lifetime tests or radiation tests, generally have a clear relation to a specific failure category, and the risk associated with a specific test plan (e.g. sample size, duration) can be quantified using statistical methods.
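For instance, for a zero-failure (success-run) test plan, the classical binomial relation links the sample size to the reliability demonstrated at a given confidence level; the sketch below is a minimal illustration, with the numeric values chosen only as an example.

```python
import math

def zero_failure_sample_size(r_demonstrated, confidence):
    """Units to test without failure (success-run relation) to demonstrate
    a reliability of r_demonstrated over the test duration at the given
    confidence level: n >= ln(1 - C) / ln(R)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(r_demonstrated))

# e.g. demonstrating R >= 0.99 with 90 % confidence needs 230 failure-free units
print(zero_failure_sample_size(0.99, 0.90))
```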

System level qualification and verification tests are performed mainly as a means of quality control, to identify possible design and manufacturing errors that may otherwise lead to systematic failures during operations in orbit. However, with the approaches for systematic failure modelling presented in the current handbook, it is not possible to consider the effect of testing and thus to quantify the risk associated with a specific test strategy.

1.5.3.2.5. Reliability prediction as input for business planning and design of constellations#

To support business planning on the customer side (for single satellites, and especially for constellations), or for the insurance of space systems, reliability predictions need to be as realistic as possible. To achieve this, the scope of the prediction should follow the recommendations made for design support (Table 1.5.5), with an extension to account for systematic failures as well.

1.5.3.2.6. Reliability prediction to support decisions about life time extensions#

The limited coverage of wear-out effects proposed in Table 1.5.5 for design support is justified by the fact that components are generally qualified for the specified lifetime. This condition is violated in the case of a lifetime extension, requiring additional considerations for the associated reliability predictions.

To support decisions on life time extensions, it may not be required to revisit the scope of the prediction for all spacecraft elements; e.g. for space debris mitigation only the functions needed for safe disposal are of interest, and health monitoring may be used to better assess the risk of failure in redundant system architectures.

Where a quantitative prediction is required, the scope of the prediction needs to be extended to account for additional wear-out failures that may become relevant due to the lifetime extension.

1.5.4. Reliability prediction during project life cycle#

This section explains how the system reliability prediction process interacts with the system development process and which activities and deliverables should be performed throughout the system life cycle. The typical system life cycle of space products according to [BR_METHODO_1] consists of seven phases, as shown in Fig. 1.5.1 (see also Section 1.4.1.1).

Fig. 1.5.1 Interaction of the reliability process with the system life cycle#

The establishment and cascading of reliability requirements is explained in Section 1.5.4.2.1. The evaluation of system architectures from a reliability perspective in the early design phase is introduced in Section 1.5.4.3. The verification of system reliability requirements is covered in Section 1.5.4.4, and the aspects of reliability prediction for lifetime extension and safe disposal are handled in Section 1.5.4.5. The contribution of reliability prediction to the system development during the different life cycle phases is shown in Section 1.5.4.1.

1.5.4.1. Deliverables of the reliability prediction process during Life Cycle#

The scope and aim of each system life cycle phase is highlighted in Fig. 1.5.2, taken from [BR_METHODO_1].

Fig. 1.5.2 System Life Cycle.#

The contribution of reliability prediction to each phase and the associated reviews is explained in Table 1.5.7. Table 1.5.6 gives an overview of the reliability documents to be provided per review during the system life cycle.

Table 1.5.6 Reliability Deliverables per Project Milestones/Reviews#
| Document title | ECSS document | Delivered at (review) |
|---|---|---|
| Failure modes and effects analysis / failure modes, effects and criticality analysis (as input for system level analysis, e.g. FTA) | ECSS-Q-ST-30-02 | SRR *), PDR, CDR |
| Fault tree analysis (FTA) (to support reliability prediction) | ECSS-Q-ST-40-12C | PDR, CDR, QR, AR |
| Reliability prediction | ECSS-Q-ST-30C | PRR *), SRR *), PDR, CDR, QR, AR |

*) Although ECSS-Q-ST-30C mentions a reliability prediction at PRR and SRR, this is not done for all projects and may be used only to assist the apportionment of requirements to lower levels. An FMEA at SRR may be required for specific missions and can be used e.g. to assist safety analysis.

Table 1.5.7 Contribution of reliability prediction in different mission phases#
Phase 0: Mission analysis - needs identification

Main objectives:

  • Define mission needs and expected performance

  • Identify constraints and boundary conditions with respect to the physical and operational environment

  • Define possible mission concepts

Associated reviews: Mission Definition Review (MDR): definition of the mission baseline.

Reliability prediction tasks: In Phase 0 the reliability prediction activities focus on capturing top level requirements and boundary conditions. The MDR provides the mission profile that is to be used for the reliability prediction. Top level reliability requirements are derived from customer needs. A first high level reliability prediction may be performed, e.g. for quotation.

Inputs: Mission profile. At this stage of the development process no system architecture is available; only data from similar projects can be used.

Outputs: Top level requirements included in the preliminary technical specification; a first rough reliability estimate.

Phase A: Feasibility

Main objectives:

  • Assess the technical and programmatic feasibility of the possible concepts by identifying constraints related to implementation, costs, schedule, organization, operations, maintenance, production and disposal.

  • Identify critical technologies and propose pre-development activities.

Associated reviews: Preliminary Requirement Review (PRR): assess the feasibility of user requirements to allow a solid start of the preliminary design.

Reliability prediction tasks: In Phase A the reliability activities address the following points:

  • Assessment of the feasibility of achieving the system level reliability requirement

  • Breakdown of the system level reliability requirement to lower levels, to establish a requirement basis for the preliminary definition in Phase B, as input for the PRR

Inputs: A concept of the system architecture, as the detailed design is usually not yet available, nor an FMEA/FMECA; system reliability requirements. To perform the reliability assessment and the partitioning of reliability requirements, historical data of similar systems and components are often used as initial values.

Outputs: Reliability requirement breakdown to lower levels. A reliability prediction in this early phase may be performed as a rough estimate to check feasibility and to support the apportionment of requirements to lower levels.

Phase B: Preliminary definition

Main objectives:

  • Conduct trade-off studies and select the preferred system concept, together with the preferred technical solution(s) for this concept.

  • Establish a preliminary design definition for the selected system concept and the preferred technical solution(s).

Associated reviews: System Requirements Review (SRR): freeze of high level requirements. Preliminary Design Review (PDR): freeze of the mission baseline and of requirements down to subsystem level; freeze of the design concept at system level.

Reliability prediction tasks: In this early phase of the development, quantitative methods are used to support the definition of the system and to refine the allocation of requirements. During the development activities, decisions and trade-offs are supported by reliability prediction. For each proposed system solution a preliminary reliability prediction is performed as decision basis to support the PDR. Preliminary versions of the FMEA/FMECA are to be prepared as input for system level analysis.

Inputs: Preliminary system architecture, preliminary versions of the FMEA/FMECA and preliminary reliability data at sub-system, equipment and component level; top level reliability requirements; system level reliability requirements.

Outputs: The allocation of system level reliability requirements to lower levels, refined as the system architecture evolves; for the SRR the reliability requirements are to be validated. Preliminary reliability prediction, including FTA, to support the PDR.

Phase C: Detailed definition

Main objectives:

  • Completion of the detailed design definition at all levels in the customer-supplier chain.

  • Production, development testing and pre-qualification of selected critical elements and components.

  • Production and development testing of engineering models, as required by the selected model philosophy and verification approach.

Associated reviews: Critical Design Review (CDR): confirmation of the detailed design, release of the final design; authorisation to complete qualification and build flight units.

Reliability prediction tasks: During Phase C, the reliability assessment is updated based on the detailed system definition to demonstrate that the reliability requirements are met, supporting the CDR. FMEA/FMECA and component reliability predictions are to be prepared based on the detailed design.

Inputs: Updated input data based on the detailed system architecture, including FMEA, FMECA, FMES and component level reliability predictions.

Outputs: Reliability prediction including fault tree analysis or other equivalent methods for system reliability assessment.

Phase D: Qualification and production

Main objectives:

  • Complete qualification testing and associated verification activities.

  • Complete manufacturing, assembly and testing of flight hardware/software and associated ground support hardware/software.

Associated reviews: Qualification Review (QR): demonstrate that the system meets all requirements and that the verification proof is complete. Acceptance Review (AR): acceptance of the system by the customer. Operational Readiness Review (ORR): verify the readiness of the operational teams and procedures and their compatibility with the flight system.

Reliability prediction tasks: The reliability prediction is updated for customer acceptance, considering test results from qualification.

Inputs: Final system design with updated FMEA, FMECA and FMES.

Outputs: Reliability prediction including an updated fault tree analysis or other equivalent methods for system reliability assessment.

Phase E: Operations / utilization

Main objectives:

  • Prepare the launch of the system

  • Perform launch and in-orbit testing

  • Perform in-orbit operations

Associated reviews: Flight Readiness Review (FRR): verify that the flight and ground segments are ready for launch. Launch Readiness Review (LRR): performed right before launch to provide the authorization to proceed with the launch. Commissioning Results Review (CRR): verify system performance after in-orbit testing. End-of-Life Review (ELR): verify that the system has completed its useful life and ensure safe disposal.

Reliability prediction tasks: Depending on the results of in-orbit testing during commissioning, for example if redundancies are not available, the reliability assessment may have to be re-evaluated to support the CRR. Further tasks are the update of the reliability prediction based on in-orbit feedback and the reliability assessment for safe disposal.

Inputs: FMEA, FMES; in-orbit test results; in-orbit reliability data.

Outputs: Updated reliability prediction with in-orbit data for the deorbiting function, to support the End-of-Life Review (ELR) and decision making on life extension.

1.5.4.2. Management of reliability requirements#

1.5.4.2.1. Establishment of reliability requirements#

In the following, the process steps to support the establishment of appropriate reliability requirements are explained according to [BR_METHODO_5]. The first step is the classification of the type of space mission, as shown in Table 1.5.8. Details on the coverage of this handbook in terms of mission types can be found in Section 1.4.2.1.

Table 1.5.8 Space Mission Classification#

| Class | Mission type |
|---|---|
| Class A | Human flight or material transport flight |
| Class B | Telecommunication, observation and navigation for applications with high integrity requirements |
| Class C | Telecommunication, observation missions, space probes |
| Class D | Test and demonstration missions |
| Class L | Launchers, launch bases |

Within each class, the missions can be further categorized based on the following criteria [BR_METHODO_5]:

Table 1.5.9 Space Mission Categorization#
| Criterion | Category 1 | Category 2 | Category 3 |
|---|---|---|---|
| Development time | More than 2 years | More than 2 years | Less than 2 years |
| Criticality to customer strategic objectives | High | Medium | Low |

The type of mission as well as the mission category within each class (Table 1.5.9) should be taken into account for the establishment of reliability requirements. As a general rule, the larger the economic loss a failure of the mission would cause, the more stringent the reliability requirement should be. Furthermore, the criticality of mission success to the strategic objectives of the customer may justify a higher reliability requirement.

Reliability requirements are expressed as an expected probability of success over a given time period, under consideration of:

  • Customer commercial objectives, revenue and return on investment

  • Insurer requirements

  • Regulation, e.g. avoidance of space debris and safe disposal

  • Technical feasibility

  • Cost, weight and volume constraints

For example:

The probability that the satellite achieves its performance requirements shall be better than 90% after a mission duration of 10 years in orbit.
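Assuming, purely for illustration, a constant failure rate (exponential model) at system level - an assumption that is not part of the requirement itself - such a requirement can be translated into an equivalent maximum failure rate:

\[R(t) = e^{-\lambda t} \geq 0.90 \quad \Rightarrow \quad \lambda \leq -\frac{\ln 0.90}{10\,\text{yr}} \approx 1.05 \times 10^{-2}\,\text{yr}^{-1} \approx 1.2 \times 10^{-6}\,\text{h}^{-1} \;\text{(about 1200 FIT)}\]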

Each quantitative requirement should be linked to an explanation to ensure correct interpretation. This includes definition of the scope, principles and boundary conditions that are to be applied for the reliability prediction, as discussed with the ground rules and assumptions for reliability prediction in Section 1.5.2.

The reliability requirements could also be defined considering partial losses leading to reduced functional capability and graceful degradation, for example:

The probability that at least \(k\) out of \(n\) antenna links are operative after \(y\) years in orbit shall be greater than 98%.

where \(n\) denotes the total number of units installed and \(k\) defines an acceptable degraded mode, with \(k < n\). The duration \(y\) is usually defined in years and corresponds to the end of mission, end of in-orbit testing or end of LEOP, but it can also refer to operating cycles or hours.
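Under the simplifying assumption of independent and identical units, this type of requirement can be evaluated with the binomial distribution; a minimal sketch (the unit reliability of 0.95 and the 10-out-of-12 configuration are illustrative assumptions):

```python
import math

def k_out_of_n_reliability(k, n, r):
    """Probability that at least k of n independent, identical units are
    operative, each having reliability r at the time of interest."""
    return sum(math.comb(n, j) * r**j * (1.0 - r)**(n - j)
               for j in range(k, n + 1))

# e.g. at least 10 of 12 antenna links operative, each with R(y) = 0.95:
print(f"{k_out_of_n_reliability(10, 12, 0.95):.4f}")  # 0.9804
```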

1.5.4.2.2. Allocation of reliability requirements#

The allocation of requirements may start by assigning historical data from similar systems at sub-system and equipment level, and is then refined as more details become available.

The allocation of reliability requirements then consists of the following steps:

  1. Analysis of the input requirements to formulate functional and performance requirements.

  2. Definition of the functional architecture to ensure system performance.

  3. Functional failure analysis at system and sub-system level to identify failure scenarios that would lead to a violation of reliability requirements.

  4. Creation of a high level system model that consists of the relevant subsystems, based on the functional failure analysis.

  5. Assignment of reliability targets to sub-functions. Besides the use of historical data of similar systems, different approaches can be used to assign initial reliability targets, for example:

    • Equal allocation

    • Proportional allocation (ARINC method)

    • Feasibility-Of-Objectives (FOO) method

  6. Review of the sub-system targets with regard to feasibility, cost, schedule etc., refining the allocation if deemed necessary. This may involve iterations to find a well-balanced apportionment of reliability targets.

The allocation of requirements to lower levels starts with the identification of system functions. Based on the customer's top level reliability requirements, an overall system function for the design can be identified. This overall system function is decomposed into its sub-functions [BR_METHODO_4]. The functional failure analysis determines the failure effects and repercussions at system level. The following generic failures can be used as a guideline to assess each function and sub-function:

  • Total loss of function

  • Partial loss of function

  • Un-commanded or spurious functioning

  • Erroneous functioning

The results of the functional failure analysis allow identifying the functions whose failures would affect the ability to perform the required system function. The cascading of the top-level system reliability requirement to the contributing sub-functions should consider the results of the functional failure analysis. That is, the reliability targets allocated to functions and sub-functions should consider the relevant failure modes to ensure functional integrity.

An example of a functional failure analysis for the power supply system is shown in Table 1.5.10. A similar analysis needs to be performed for all functions of the satellite; in early phases of the development the main functions are considered, and as the design evolves the functional breakdown is refined and more details are included.

Table 1.5.10 Example Functional Failure Analysis Power Supply#

Function: Provide electrical power

| Function / Sub-function | Functional failure | Failure effect |
|---|---|---|
| 1. Provide electrical power | Total loss of electrical power | Total loss of power supply. No data communication. Loss of satellite control. |
| 1.1 Photovoltaics | Total loss of photovoltaic capabilities | Total loss of power supply. No payload. Total loss of satellite. |
| 1.1 Photovoltaics | Partial loss of photovoltaic capabilities | Degraded performance; battery not fully charged; power interruption in Earth's shadow possible. Payload interruptions. |
| 1.2 Charge battery | Loss of battery charging | No power supply in Earth shadow in one orbit. Interruption of payload. Possible loss of satellite. |
| 1.2 Charge battery | Overcharge of battery | Permanent damage to battery possible. No power supply in Earth shadow in one orbit. Interruption of payload. Possible loss of satellite. |
| 1.3 Output voltage regulation | Erroneous output regulation - voltage too low | Insufficient voltage supply to satellite. No data communication. |
| 1.3 Output voltage regulation | Erroneous output regulation - voltage too high | Permanent damage to electronic components possible if no over-voltage protection is implemented. |

The analysis of functional failures can also be represented with a reliability block diagram. Quantitative methods are used to support the system definition and to refine the allocation of requirements to sub-system level.
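As an illustration of how a reliability block diagram translates into numbers, the sketch below combines independent blocks in series and in (hot-redundant) parallel; the architecture and the block reliabilities are illustrative assumptions only:

```python
def series(*blocks):
    """Series blocks: all must work, so reliabilities multiply."""
    r = 1.0
    for b in blocks:
        r *= b
    return r

def parallel(*blocks):
    """Hot-redundant parallel blocks: the group fails only if all blocks fail."""
    q = 1.0
    for b in blocks:
        q *= (1.0 - b)
    return 1.0 - q

# e.g. a power function: solar array in series with a redundant battery pair
print(f"{series(0.995, parallel(0.98, 0.98)):.5f}")  # 0.99460
```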

In the following, different approaches for requirement allocation are introduced, including:

  • Equal allocation

  • Proportional allocation (ARINC method)

  • Feasibility-Of-Objectives (FOO) Method

These methods are applicable to serial system structures only. For more complex system architectures, the reliability allocation should also make use of system level reliability assessment methods (see Section 9).

The apportionment of the system level reliability target to sub-systems is based on the following relation for a serial system:

Equation

(1.5.1)#\[\hat{R}_{i}(t) = (\hat{R}_{S}(t))^{w_{i}}\]
  • \(\hat{R}_{S}(t)\) denotes the reliability target on system level

  • \(\hat{R}_{i}(t)\) denotes the reliability target on sub-system level, and

  • \(w_{i}\) denotes the weighting factor of sub-system \(i\)

Note that different combinations of sub-system reliabilities \(R_{i}(t)\) can yield the same system reliability \(R_{S}(t)\), so there are infinitely many ways to allocate the system level reliability target to the sub-system level. The weighting factors must sum to one, since \(\prod_{i}\hat{R}_{i}(t) = (\hat{R}_{S}(t))^{\sum_{i}w_{i}}\) has to recover the system target. Practical methods are therefore needed to assist the system design in the apportionment of reliability requirements.

Equal Allocation

This method distributes the system reliability target equally over all sub-systems below system level [MIL-HDBK-338B]. The weighting factor is the same for each sub-system and equals the reciprocal of the number of sub-systems, \(w_{i} = 1/n\). The reliability target for the sub-systems is given by

Equation

(1.5.2)#\[\hat{R}_{i}(t) = (\hat{R}_{S}(t))^{\frac{1}{n}}\]
  • \(\hat{R}_{S}(t)\) denotes the reliability target on system level

  • \(\hat{R}_{i}(t)\) denotes the reliability target on sub-system level, and

  • \(n\) denotes the number of sub-systems.

For example, a reliability target of 0.9 at system level results in a target of 0.9826 for each sub-system if the system consists of 6 sub-systems. If the target at system level is given as a failure probability or a failure rate, the target at sub-system level is obtained from the following equations (the failure probability relation is an approximation valid for small failure probabilities; the failure rate relation is exact for constant failure rates).

Equation

(1.5.3)#\[\hat{F}_{i}(t) = \frac{\hat{F}_{S}(t)}{n}\]
  • \(\hat{F}_{S}(t)\) denotes the failure probability target on system level

  • \(\hat{F}_{i}(t)\) denotes the failure probability target on sub-system level

  • \(n\) denotes the number of sub-systems.

Equation

(1.5.4)#\[\hat{\lambda}_{i}(t) = \frac{\hat{\lambda}_{S}(t)}{n}\]
  • \(\hat{\lambda}_{S}(t)\) denotes the failure rate target on system level

  • \(\hat{\lambda}_{i}(t)\) denotes the failure rate target on sub-system level, and

  • \(n\) denotes the number of sub-systems.

The equal allocation is very easy to apply, but does not consider technical feasibility or experience from similar projects. This can result in very stringent requirements for some sub-systems that cannot be achieved, or only with disproportionate effort. Thus, it might be necessary to refine the allocation in order to arrive at a more balanced share between sub-systems.
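The equal allocation relations above are straightforward to apply; a minimal sketch reproducing the numeric example from the text:

```python
def equal_allocation(r_system_target, n):
    """Equal allocation per Eq. (1.5.2): identical target for each sub-system."""
    return r_system_target ** (1.0 / n)

# A system target of 0.9 shared equally among 6 sub-systems:
print(f"{equal_allocation(0.9, 6):.4f}")  # 0.9826, as in the example above
```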

Proportional Allocation (ARINC Method)

The proportional allocation, also known as the ARINC method, takes historical data on sub-system reliability into account when distributing the reliability target to sub-system level. The weighting factor is determined by the ratio of the observed sub-system failure probability to the total system failure probability, as shown in Eq. (1.5.5). The new reliability target is allocated to each sub-system in proportion to this factor.

Equation

(1.5.5)#\[w_{i} = \frac{F_{i.old}(t)}{F_{S.old}(t)}\]
  • \(F_{S.old}(t)\) denotes the historical failure probability on system level

  • \(F_{i.old}(t)\) denotes the historical failure probability on sub-system level

Given that 15% of the system failures are caused by failures of the power supply sub-system, a reliability target of 98.43% is derived for the power supply system using Eq. (1.5.1), to achieve a reliability of 90% at system level, as shown in Table 1.5.11. Note that in this example the failure probability targets of the sub-systems were derived from the minimal cut set approximation to simplify the calculation.

Table 1.5.11 Example of proportional allocation of reliability targets to sub-systems#

| Sub-system | Weighting factor \(w_{i}\) | Failure probability target (approximation) | Reliability target \(\hat{R}_{i}(t) = (\hat{R}_{S}(t))^{w_{i}}\) |
|---|---|---|---|
| Power | 0.15 | 0.015 | 0.9843 |
| Tele-Command/Telemetry | 0.20 | 0.02 | 0.9791 |
| Propulsion | 0.10 | 0.01 | 0.9895 |
| Orbit Control | 0.15 | 0.015 | 0.9843 |
| Structure | 0.10 | 0.01 | 0.9895 |
| Data Communication (Payload) | 0.30 | 0.03 | 0.9689 |
| System level | 1.00 | 0.10 | 0.90 |
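A minimal sketch reproducing the proportional allocation of Table 1.5.11 (the weighting factors are the historical failure shares given in the table):

```python
# Proportional (ARINC) allocation, Eq. (1.5.1) with historical weights.
weights = {
    "Power": 0.15, "Tele-Command/Telemetry": 0.20, "Propulsion": 0.10,
    "Orbit Control": 0.15, "Structure": 0.10, "Data Communication": 0.30,
}
r_system_target = 0.90
for name, w in weights.items():
    r_i = r_system_target ** w            # reliability target, Eq. (1.5.1)
    f_i = w * (1.0 - r_system_target)     # minimal cut set approximation
    print(f"{name:24s} w={w:.2f}  F_i~{f_i:.3f}  R_i={r_i:.4f}")
```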

Feasibility-Of-Objectives (FOO) Method

The FOO method allows users to assign grading factors to sub-systems and their components in order to determine how reliability targets are cascaded from top level to lower levels. A sub-system with high grading factors is allocated a lower reliability target than a sub-system with low grading factors. Default grading categories are complexity, technology level (state of the art), operating time and environmental conditions; users may change these categories. Each category is ranked on a scale from 1 to 10, estimated using both design engineering and expert judgment [NR_METHODO_1]:

  1. System complexity. Complexity is evaluated by considering the probable number of parts or components making up the sub-system, and is also judged by the assembled intricacy of these parts or components. The least complex sub-system is rated at 1, and a highly complex sub-system is rated at 10.

  2. Technology level. The state of present engineering progress in all fields is considered. The least developed design or method receives a value of 10, and the most highly developed is assigned a value of 1.

  3. Operating Time. An element that operates for the entire mission time is rated 10, and an element that operates the least time during the mission is rated 1.

  4. Environmental conditions are also rated from 10 through 1. Elements expected to experience harsh and very severe environments during their operation are rated as 10, and those expected to encounter the least severe environments are rated as 1.

The first stage for this allocation method is to calculate the total grading value for each sub-system. This is obtained by multiplying the grading factors from each category:

Equation

(1.5.6)#\[G_{i} = \prod_{j}^{n} g_{ij}\]

Equation

(1.5.7)#\[w_{i} = \frac{\prod_{j}^{n} g_{ij}}{\sum_{i} G_{i}}\]

where \(g_{ij}\) denotes the grading value of sub-system \(i\) in category \(j\). An example of reliability allocation using the FOO method is shown in Table 1.5.12. The system level reliability target of 0.9 is distributed to the sub-systems based on the weighting factors obtained from Eq. (1.5.7).

Table 1.5.12 Example of graded allocation of reliability targets to sub-systems#
| Sub-system | Complexity | Technology level | Operating time | Environmental conditions | Weighting factor \(w_{i}\) | Sub-system target \(\hat{R}_{i}(t)\) |
|---|---|---|---|---|---|---|
| Power | 6 | 5 | 5 | 5 | 0.12550 | 0.98686 |
| Tele-Command/Telemetry | 6 | 4 | 7 | 6 | 0.16867 | 0.98239 |
| Propulsion | 5 | 6 | 5 | 5 | 0.12550 | 0.98686 |
| Orbit Control | 8 | 8 | 5 | 7 | 0.37483 | 0.96128 |
| Structure | 4 | 2 | 10 | 8 | 0.10710 | 0.98878 |
| Payload | 7 | 6 | 7 | 2 | 0.09839 | 0.98969 |
| System level | | | | | 1.0 | 0.90 |
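A minimal sketch reproducing the graded allocation of Table 1.5.12 from the grading values and Eqs. (1.5.6) and (1.5.7):

```python
import math

# Grading values per sub-system: (complexity, technology level,
# operating time, environmental conditions), each ranked 1..10.
gradings = {
    "Power": (6, 5, 5, 5),
    "Tele-Command/Telemetry": (6, 4, 7, 6),
    "Propulsion": (5, 6, 5, 5),
    "Orbit Control": (8, 8, 5, 7),
    "Structure": (4, 2, 10, 8),
    "Payload": (7, 6, 7, 2),
}
r_system_target = 0.90

G = {name: math.prod(g) for name, g in gradings.items()}  # Eq. (1.5.6)
total = sum(G.values())
for name, g_total in G.items():
    w = g_total / total                                    # Eq. (1.5.7)
    print(f"{name:24s} w={w:.5f}  R_i={r_system_target ** w:.5f}")
```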

1.5.4.3. Reliability assessment for system architecture development during conceptual design#

The outcome of the conceptual design phase is a set of concepts that will be implemented during the next stages of the system development. The selection of the preferred system architecture is essentially a trade-off among the various architecture options. According to [BR_METHODO_3], a trade-off report should contain the result of the evaluation of every identified alternative design solution with regard to the key technical requirements. For each alternative design solution, the following should be performed:

  • Assessment of all the key technical requirements / evaluation criteria,

  • Presentation of the pros and cons of the design solution, and

  • Identification of the technical and programmatic risks.

Reliability prediction is an important part of these trade-off studies and can be applied to support system engineering, subsystem engineering and equipment level design engineering. It provides a quantitative assessment of alternative design solutions regarding the achievement of reliability requirements. In a trade-off study, a sensitivity analysis can support system engineering by quantifying how system level reliability changes if certain parameters change; importance measures (see Section 9) can be used to perform such a sensitivity analysis. Furthermore, the reliability prediction for the trade-off should identify the equipment failure modes that significantly impact system reliability. If the correlation between equipment failure modes and system reliability can be established, the aim of system reliability improvement is to eliminate or significantly reduce these failure modes by improving equipment quality or by reconfiguring the system architecture. Different design concepts may introduce different types of failure modes, and system designers should be aware of the underlying failure causes to achieve a robust design. To achieve the desired quality, the development process should be accompanied by an appropriate quality assurance procedure, see e.g. [BR_METHODO_2].
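One way to perform such a sensitivity analysis is the Birnbaum importance measure, the partial derivative of system reliability with respect to each block's reliability. The sketch below evaluates it numerically for an illustrative two-block series architecture; the structure function and the reliability values are assumptions for demonstration only.

```python
def system_reliability(r):
    """Illustrative RBD structure function: two blocks in series."""
    return r[0] * r[1]

def birnbaum_importance(i, r, eps=1e-6):
    """Numerical Birnbaum importance: dR_system / dR_i."""
    r_up = list(r)
    r_up[i] += eps
    return (system_reliability(r_up) - system_reliability(r)) / eps

r = [0.95, 0.99]
for i in range(len(r)):
    print(f"block {i}: Birnbaum importance = {birnbaum_importance(i, r):.4f}")
# The block with the higher importance is where a reliability improvement
# pays off most at system level.
```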

1.5.4.4. Reliability requirement verification for compliance demonstration#

The verification activities should demonstrate that the design and architecture are compliant with the reliability requirements at the corresponding level. This includes demonstrating that:

  • All requirements have been taken into account and that the design is compliant with the requirements.

  • There is sufficient confidence that the final product will meet the requirements.

  • There is sufficient confidence in the correctness of the design on system level so that the specification and design of the next lower level can progress.

Verification should also demonstrate that the sub-systems on the next lower level are compliant with the requirements for that level. It is important that compliance of each building block with its requirements is substantiated and that potential non-compliances are identified.

For reliability requirements, verification is performed through quantitative analysis, see Section 9. To finally demonstrate compliance with the top level reliability requirements, the final reliability prediction should be performed once the failure behaviour data for the sub-systems and components are available. Depending on the performance requirements, a consideration of degraded system operability might be useful.

1.5.4.5. Reliability assessment for life time extension and safe disposal#

To support lifetime extension decisions, the reliability prediction may need to be updated; in particular, the reliability of the functions needed for safe disposal has to be considered to demonstrate compliance with space debris mitigation requirements, see also Section 1.5.3.2.6.

The following aspects should be considered:

  • If the life extension exceeds the qualification time, the assumptions made at the beginning of the project have to be revisited.

  • Wear-out may need to be considered for lifetimes beyond the qualification time.

  • The results of in-orbit testing and reliability estimates based on in-orbit feedback have to be taken into account to support the decision on the lifetime extension.

It is important to note that the probability of success of safe disposal has to be demonstrated already during the design phase, as part of the space debris mitigation requirements. The analysis for safe disposal is then reassessed when the in-orbit lifetime is completed and is to be extended further, or when a failure has occurred during the lifetime. In other words, the requirements for safe disposal determine to what extent a lifetime extension is possible.