It is, perhaps, your dream job: doing software testing for positive, world-changing applications such as space exploration. But it comes with additional concerns, such as lives at stake and hardware too far away to repair.
Every quality tester worries about the cost of missing defects. But imagine the scenario when lives are at stake, and when embedded flaws can be expensive or impossible to fix. That’s what it’s like for QA testing at NASA – and it applies to equipment such as rocket engines, fuel mixes, satellites, space habitats, as well as to ordinary computer software and hardware.
What makes NASA's testing requirements unique? Here’s a take-off point – and how the U.S. space agency’s methods can help not-for-space testers and QA practitioners.
Testing and QA within NASA is handled by a mix of dedicated departments, including the Office of Safety and Mission Assurance (OSMA).
I had the pleasure of chatting with two NASA test and QA experts: Tim Crumbley, Software Assurance Technical Fellow, who supports the NASA headquarters OSMA (located at NASA's Marshall Space Flight Center in Alabama); and Jeannette Plante, Quality Engineering Technical Fellow, who also supports OSMA – she works on Hardware Assurance – located at NASA headquarters in Washington, DC.
"Most of the mission software is mission-critical and often also safety-critical." --Tim Crumbley, NASA Software Assurance Technical Fellow
Crumbley started at NASA in 1987, working on the International Space Station (ISS) and its software development. “I was primarily in the engineering side of the house for 31 years; and then in early 2019, I moved to OSMA to improve and automate how we do software assurance."
"The scope of my domain is mission hardware: space flight and aeronautics hardware," says Plante. "This includes launch vehicles like SLS space launch systems, and processes like welding. I have a background in electrical engineering parts and assemblies."
Plante started working at NASA in 1987 as an electrical parts engineer. "I have studied how things fail, which has a lot to do with how they're built and their manufacturing process attributes," says Plante. "I expanded my knowledge from electronic parts to assemblies, and then to supply chains, and found my way to having a view of all hardware manufacturing."
In other words, some testing activities at NASA are the same as for us groundhogs, while other aspects apply uniquely to their high-stakes missions and environmental concerns.
"Approximately 40-50% of the total software development project lifecycle cost involves testing, which is in line with industry software cost models," says Crumbley. "Therefore, from a software lifecycle cost perspective, software testing becomes a large part of our budget."
"NASA embeds quality assurance throughout the entire software lifecycle," says Crumbley. "This totals approximately 6% of our software development cost."
Testing also represents a large part of the hardware budget, says Plante. "The number of people involved in writing the requirements in the early stages, especially for hardware...you have them for raw materials, piece parts, and sub-assemblies."
"The type of work that QA does is less about inspections than you might think," says Plante.
A lot of work goes into decisions about what’s wanted and the measures that determine conformance. "If you set your requirements for a mission too high, for something like a high-risk R&D experimental payload, and you don't have a big budget, when you find a non-conformance somewhere in the lifecycle, now you have to burn money to determine if you really needed that physical condition in the finished hardware,” Plante explains. “You could run out of money on the ground trying to achieve a feature you may not actually need and that you can't afford. And of course the opposite is true. If you don’t set the limit or tolerance right, you may accept hardware that will fail later.”
Situations like this, says Plante, are among the reasons for angst around how much authority Quality requirements have late in the lifecycle, when the cost to rework is so high. “When and how do we decide we need to depart from an established quality requirement? This is why getting the requirements right first and building in those quality levels is critical for NASA," she adds.
"One of the biggest keys to generating good quality software is making sure we have good detailed software requirements," says Crumbley. "For what we do, that means requirements that are detailed enough to represent all the software functionality and capabilities; and good detailed software requirements contribute to making the software testable."
Also, stresses Crumbley, "From a software perspective, it's not just writing good software requirements for the software people, but writing requirements that the system and sub-system engineers can understand.” The objective of a well-written detailed software specification is that the hardware engineer understands how the software controls the system.
How are test and QA different for NASA versus other industries?
"We consider software quality as part of the process from start to finish, not something we just do at the end," says Crumbley. "Good, complete software testing helps us ensure that we have a good quality software product as we go forward. This isn't unique to NASA – but for NASA, it has to always be our approach, since the mission, and, often, lives, are at stake.”
"For most of our missions, the spacecraft and software code created for the mission is one of a kind," says Crumbley. "Software engineering is a core capability and key enabling technology for NASA's missions and supporting infrastructure."
Take, for example, the software for NASA's Space Launch System (SLS), being developed to provide heavy-lift capability. "For the SLS software testing, we use a lot of simulations and models," says Crumbley. "In the early stages of the software testing we simulate the inputs. The second part of testing includes engineering units and actual hardware units to test the software operations and interfaces. This can include details like having the correct cable lengths for the data buses to ensure proper timing during testing."
Hardware and simulations are also used to test the software end to end, from ground commands to the final separation phase. "During ascent, it's a ten-minute flight – but that's a critical ten minutes for the software operation," Crumbley adds.
"Most NASA hardware doesn't lend itself to in-service repair," says Plante. "That creates a different paradigm that makes building in quality more urgent than in other industry sectors."
The budget is always an issue. "Our mission costs are high, and system complexity increases as it progresses through its lifecycle," says Plante. "So the later in the lifecycle a problem is found, the more it costs to fix it or to make other adjustments to overcome the quality shortfall."
Scheduling adds yet another constraint, Plante points out. "Time and timing are critical – not just in terms of hitting orbital launch windows, but also for logistics constraints like being ready for your schedule spot on the launch pad or in a unique testing facility like a thermal vacuum chamber."
Finally, materials and physical processes may behave differently in space, such as in micro or zero gravity, under multi-G boost acceleration, or at extreme temperatures. This means more tests, often unique ones, have to be conducted.
"The bad things that we are looking for are called 'defects,'" notes Plante. "If a problem slips through because the standard test that we used wasn't looking at an attribute, that's called a 'quality escape,' as in, 'our normal routines, tests, and process controls have let this escape us.' We have to ask how we can close that gap."
One famous example of an escape, recounts Plante, is in the ceramic capacitor world, where defects in the layers can elude a test. "The part would later crack and become dysfunctional. We've been working for years on how to detect these defects and prevent capacitor failures after they've been installed into systems," she adds.
Crumbley offers a story from building the Chandra X-Ray Observatory space telescope. "We found during on-the-ground testing that the Chandra spacecraft data bus was sending out random bits, which in the correct order could result in a valid command. So we were occasionally getting an extra command executed during ground testing that we had not sent or did not expect.”
"We spent a long time trying to determine what was happening and how this was occurring," says Crumbley. "After months of investigation, we determined that we had a bad data bus hardware connection. During thermal changes the bad connection would generate erroneous signals on the data. Good thorough testing found the problem and allowed us to launch Chandra.”
Once Chandra was aloft, Crumbley notes, "It was up there for a number of years before we found any software defects."
"Most of the mission software is mission-critical and often also safety-critical," says Crumbley.
"You get what you pay for," says Plante. "Requirements, whether they are form/fit/function, process controls, the controls associated with test and inspection, or tolerance on a pass/fail limit...these all cost money.”
“You have to be aware of what the context is, what you are trying to accomplish,” Plante says. “What performance elements are the most critical? How reliable does it have to be? You have to be in touch with the mission objectives, as well as commercial production objectives. These drive where you put your requirements. Quality Assurance responds to the requirements; you can't say it's low quality if you didn't specify the goal up front. The requirements are how we all get on the same page – having QA objectives align with engineering and science objectives and be in sync with programmatic strategies that include how much money is available and the most value-added ways to invest in QA.”
Dream job, right? Perhaps you'd like to apply to the space agency.
"If you want to develop good software or perform software assurance for NASA, study the industry software engineering development process models. Learn how and what is involved in performing good software engineering," suggests Crumbley. "What's the right way to do software, what are all the activities involved in software development.” Good software engineering is more than just understanding the latest programming language or software auto-coding tool, after all. “The key is understanding the software and system engineering processes. How the pieces of a project work together, how they relate, and how your part fits into the overall system and operations."
Crumbley recommends the CMMI Institute's Capability Maturity Model Integration (CMMI) as a good process model. "We use the CMMI model as a tool to see how our software development practices compare with other industries, and what software practice areas we need to watch to reduce our overall risk on a project."
"The focus for an engineer would be on understanding processes and where the controls are," says Plante. "Consider that all the manufacturing processes have attributes of control, and control limits. You have to learn about the physics of failure, how do things fail, for example, based on what environments they are in. And how do you find defects in highly-engineered products."
Also, urges Plante, learn about new QA techniques and methods. "We are on the cusp of digital transformation in Quality Assurance, often being referred to as Quality 4.0, which is related to Industry 4.0. You should also learn about 'Digital Twins' – creating digital replicas of things, products, processes and systems."
These people work in an environment that captures every science fiction fan’s imagination. Surely they have favorite science fiction that resonates with their jobs?
"Apollo 13," says Crumbley, "for the way the movie was laid out: the attention to why we do what we do and how important it is, and what steps were necessary to recover from the manufacturing quality defects that were built into those things."
Crumbley and Plante’s experiences and insights can be helpful for testers and QA specialists even if they aren’t heading to infinity or beyond. Perhaps it’s useful advice for your down-to-earth tasks.
Learn why test automation is so critical for the modern software development lifecycle. Our introduction to testing methodologies covers the evolving context of bug finding and fixing, including both manual testing and test automation.