Hi, and welcome to another episode of CTO Talks. I'm a Co-Founder and CTO of Glorium Technologies, an international technology company hosting tonight's event. This video podcast is for technical leaders and those who want to become them someday. We have guests with extensive experience in technology, and tonight's guest is Rosalind Radcliffe. But before we start our discussion, please consider subscribing to our YouTube channel. We have lots of great content coming soon, including future installments of CTO Talks.
Now let's talk about our guest, Rosalind Radcliffe, an IBM Fellow who has been with the company for 35 years. Well, 35 years is a long time to work for the same company. It means that you're doing something right, that's for sure. And actually, before we continue, the main focus of the conversation will be this book, “Enterprise Bug Busting: From Testing Through CI/CD to Deliver Business Results.” And even though it says “bug busting,” I think we won't talk much about bugs today. We will take our conversation in a slightly different direction, and you will understand why we're not emphasizing bugs. Because, Rosalind, we will talk today about quality, right?
Absolutely, all about quality, and I'm very happy to be here today with you.
Why Testing Does Not Mean Quality
Excellent. Thank you very much for joining us, and thank you very much for this great opportunity to talk about your book, your experience, your knowledge and skills, and everything else. In one of the first chapters, you say something that most people understand but, for some reason, do not apply in practice: that testing is not quality, right? Testing is just a measurement of how well quality is done. And that's the majority of the discussion we're going to have today. But before we go there, in your experience, how many people do you think actually assume, or act as if, testing is quality, approximately? Just out of curiosity.
If you look at the industry, most people think testing equals quality. If I have good testing, I have good quality, even if it isn't necessarily the right answer. But that is a very prominent thought process, unfortunately.
Quality is delivering capability to end-users that provides the function they expect, need, or want, depending on the definition. And it is secure, it's reliable, and it provides the capabilities when they need them. So it's all about the function, the capability, and the security and reliability for them to be able to do what they want to do.
Good quality also includes a good user interface or access, easy to use, within reason. Some things are not easy to do, so they're not as easy to use, but it's that focus on the actual capability that is delivered to the end-user. And testing helps you verify that, but you have to start with quality and design quality from the beginning.
Okay, that's what people are confused about. They are confused about the quality assurance, validation, and verification processes. It is all mixed up. Everything is treated as the same; they're just synonyms, with no difference and no borders. And that's where all the confusion comes from. In your view, quality is meeting business expectations, right?
It absolutely has to do with meeting business expectations while also meeting the security and risk profile, all of those things that some people might not list as business requirements, but the non-functionals that are required as well.
Non-functionals, yeah, but there are still business requirements, right? If you have a financial business, you have specific requirements for security. If it's some…, I mean, I don't want to negate social networking or something like this, they do need to follow some security, but they're different from the financial industry.
Actually, it's a good question who is doing better from the security standpoint; from my experience, I would question that. I was slightly confused when I was reading your book. It states that metrics are important, not just because you measure something correctly but, I guess, because you send a proper message to the team, the business, and the stakeholders: some metrics might be wrong, some metrics might be good, and some metrics can motivate people to do bad things.
It really depends, and you systematize those metrics in the following structure: availability metrics, business value, customer satisfaction, development, learning happiness, operational metric, quality, security, value delivery speed, and offering. You actually list quality as a separate category of metrics. I thought business value, availability, development, operations, and security all mean quality. So why did you put it as a separate metric here?
They absolutely are a part of quality. Part of the problem is the industry uses the word quality metrics for a set of terms. So, when we measure things like the number of defects and code coverage, we call those quality metrics.
But everything is quality, which is the same problem with quality assurance teams. We're using the term for a testing team, yet that team usually isn't the one responsible for quality as a whole. They are partially responsible for quality. But we can't totally remove ourselves from the fact that the industry uses some terms. Quality assurance teams exist, and things called quality metrics exist, but we need to look at the bigger picture, which is part of the book's point. Let's look bigger; let's look at quality first, just like we should look at security first. It's everybody's job.
Great Wall Between Business and Development
So if a business person is coming and saying, “You're not doing your work properly,” it means that you're not doing it either, right? Something is wrong in between. Since we did touch on the business aspect of quality, there is usually a separation between the business and development domains. And by development, I mean everybody - developers, the quality assurance department, DevOps, designers, and everyone else.
And for some reason, there is a huge Chinese wall between these two domains. So, what can you say about this Chinese wall? How does it affect quality? How should we, I don’t know, break this wall, and what should we do about this?
Those brick walls: when we think about DevOps, we talk about breaking down the brick wall between developers and operations. But it should be Biz-Dev-SEC-INF-QA-Ops, and I may be missing somebody there. The point is to break down all the silos. And that's why I like product teams and the idea that everyone works together to deliver the capabilities, and business has to be part of it.
We must have that partnership with the business. Otherwise, we end up throwing things over the wall. And in any place where you throw things over the wall, you give it to a different team; you have an opportunity to get it wrong; you have a chance for confusion; you don't get that fast feedback. If you really have a product team, if you really have a group of people working together, then everybody understands.
The other problem with that brick wall is that the business doesn't understand what it takes to do the work. So unrealistic requirements come out, and we all have lived with a few of those. If you're working together, then you understand what it takes to deliver value, and so you can focus on the highest business need: what do I really want first? Or maybe something is not important once I see how this process works. What is the way to drive that value? What's the best experience for my end-users for the capabilities that I'm delivering?
Well, that's why a partnership is the only way to solve that problem. If business and IT literally work together, they can understand that better. They can collaborate better, and they don't have that misunderstanding. I mean, sometimes you end up with, "I have to have this." Well, why do you have to have that? And a lot of times that requirement comes through as, "I have to have it," and they're leaving out the part that says, "Oh, the legislation just changed, and a law was just passed, so I must do X, and I have to do it by that date. That wasn't a made-up business date; that was legal regulation. So, we want to do it before the regulation comes in because we don't want to miss that date." Those kinds of things.
But at the same time, if your team cannot deliver it, then from the business standpoint, yes, you will be late for the regulation. Yes, you will be in violation of some compliance requirement, and then you need to do something from the business standpoint, not just stand and demand, "I just need it because the law requires it." I guess reasonable expectations and communication between both parties provide good solutions.
Okay, good. You dedicated a lot of space in your book to continuous integration and deployment. The majority of the book is focused on mainframes, not the desktop or regular cloud computing. What is the value of continuous integration and continuous deployment from your standpoint? Why do we need it? Why is it so important?
Well, if we don't continuously validate or understand quickly what's going wrong, we spend way too much time doing the wrong thing or being unsure. By doing continuous integration and continuous delivery, we have to do continuous tests; we have to do continuous validation. We have to partner better with the business to be delivering that frequently. So it helps push this change in culture: "I'm going to build something small, I'm going to deliver it, I'm going to see the value, and I'm going to get that feedback."
And that's a change in mentality. I've always said you can do scrum fall, you can do lots of things. But if you push to having to deliver frequently, and you pick a week (and I know that's not really continuous delivery), even at a week's delivery, you can't fake it. You have to automate things. You can't get enough manual time in that process. And so, if you're pushing the process, you have to automate. We remove all those manual errors. We remove all those users in the system making changes that I have to go deal with later. We remove all those problems from the system.
And by getting continuous delivery, I can test quickly. I can really understand. I can do experiments. I can understand if users like this way of doing something or not. I can pull it back quickly by delivering an old version if needed. I've got lots of flexibility if I have a continuous delivery option, and if I don't, then a lot of people are probably spending a lot of manual time doing a lot of things, and nobody enjoys repetitive manual tasks. If we have a pipeline and automation, then people can focus on the things that excite them and that deliver business value.
Let me summarize, and you tell me if I'm wrong, and I will sprinkle some of my ideas here. So, the CI/CD process is actually not just a process to have or to do something, as you say, like small product deliveries and everything else. It's actually a process that forces you to take specific steps that improve quality. It enhances quality through the decomposition of the task. You deliver those small tasks instead of a huge one where you have no idea how it's working. It makes sure that you didn't break any previous automated test or anything like this, and it verifies your work on the fly.
But in the same way, many other parts, I don't know, risk management, control management, many other frameworks like this, just improve the quality. It's not because I want it, it's not because I like it, it's not just because I do it. I mean, it's just quality, right? Is that a correct summary?
Yeah, it absolutely helps you change the way of working, change the way of thinking. And when we think about risk, and we think about separation of duties, all sorts of things that are regulations and/or best practices and/or lots of things throughout the years, by doing things with automation, we remove the errors that are happening in the system because someone's doing it manually. And that is a quality improvement, removing those repetitive manual tasks. And realistically, I haven't seen this statistic recently, but at one point in time, it was somewhere near 56 percent of the errors in production that were caused by manual configuration errors, not by problems in the code itself.
We didn't deploy it automatically. We deployed it manually, and so a user made a mistake, which happened. If you do something manually, you're opening yourself to mistakes, and if you get rid of all the manual parts (assuming you've tested your automated deployment, yes), then you don't have those problems, at least, yeah.
It's actually very interesting. Failures of manual deployment are a manifestation of a missing piece that controls the quality of your product in general, but for some reason, those manifestations are ignored. People think, "Oh, next time, do it better." If the mistake was made once, it will arise again, maybe not next time, but every 10th time, every hundredth time. It doesn't really matter, but it will happen. It’s not “if” but “when.”
Let's talk about how the industry evolved over time. You saw a lot of innovations, changes, and mind shifts in your experience during those 35 years at IBM. I can mention a lot of different technologies and framework approaches, but let's just go to the old-fashioned waterfall and Scrum. You mentioned Scrum fall. So, please tell us what Scrum fall is and why you emphasize it in your book.
When I started in development, we didn't do an actual waterfall. We did something much closer to what we talk about today. We had teams who built the function, delivered the function, and ran the function. And then problems occurred, or security happened, or “take your choice,” or applications got bigger. And we developed waterfall.
Over the years, regulations, challenges, errors all built up this waterfall process to ensure there were all the quality gates, make sure there was the separation of duties, make sure there were all those requirements, and so we handed our work over. Never was that a good idea, but it was a way to solve particular problems.
When agile came out, everybody thought, "Wonderful, I'm going to get rid of all that process," except most large waterfall companies had all these teams, and so they converted to scrum fall, as I put it. The development team would become agile and they would have their sprints. And then they'd hand it over to the quality assurance team, who would do their two months of manual testing anyway, so it was still scrum fall.
It was scrum followed by waterfall. And so scrum fall is something I see all over the industry, and it really is not a good idea. It's probably the worst of all choices in many ways because I'm backing up all this function from development and not having it tested. So when we go to agile and modern development practices, we have to take that whole life cycle and bring it together, and we have to think again about why we did things the way we did.
So, for those of us who've been around long enough and can remember the old days, there was automated testing and some of the other things that we used to do, which were honestly easier because we had one big centralized system. When I started, we had mainframes doing the computing, and then we had this PC invention. And then we did distributed computing, and all these things made it more complex and harder because of all these piece parts. Now, as we move on and people are using the cloud, it's simpler in one way because you don't have to deal with the infrastructure, but it's harder in another way because I've got to deal with all these services.
So, we have to deal with this change. The ability to deliver quickly is great, but if I don't have the right automated testing, if I don't focus on quality from the beginning, if I don't have the right architectural design in the beginning, I'll end up with just as big a mess in the end.
So, this change has to remove the bad parts of waterfall and the over-the-wall throwing, but we still have to focus on design, and we still have to understand the business requirements so that what we deliver quickly is actually of value, is manageable, and is monitorable, so that I can make sure I don't have a problem and I can provide the availability that my application requires. Not all applications have to be available all the time. I mean, credit card processing does, but there are plenty of things that don't have to be available all the time, so you've got to provide the availability required for the thing you're delivering.
Resistance to Change
Okay, so how do you fight this? Especially in the enterprise environment, it's very hard, right? There are a lot of established practices, and managers know how to do it properly. And who are you, exactly? Are you going to teach me how to do it properly? So what do you do?
Not invented here? “I've always done it this way. I've always done it this way! Why do I need to change? It works!” Actually, the 'I've always done it this way, and it works' is the biggest problem because it really doesn't work. It has worked, but we've been working slower than the business needs. We need to be able to move faster. And so yeah, it used to workish, but it can't work today if I have to be able to make changes every day. I can't have a manual testing cycle if I have to deliver every day. If I have to change frequently, that's one reminder of why they need to change.
Another reminder that they need to change is the discussion that's come out from the auditors about the fact that pipelines are actually better for automation, for auditors, and for separation of duties. It means humans aren't doing the deployment, and this can make things better. And so getting past "the auditors say I have to do it this way" is the other thing you have to get over in large enterprises. And there's been more information out from auditors about this being a good thing to do. That's also helped with this push to say no, we can move faster with quality because we're going to focus on quality from the very beginning. And the pipeline is going to help ensure our separation of duties and our tracking and our process controls in a way that's much better and easier than all of the old steps that we used to have.
We just need to remember to change the process documentation when we change how we work. Otherwise, you will fail your audit because you didn't follow your process.
Sometimes people use compliance as an excuse: "Compliance makes me do waterfall." From my experience, there was a very funny story. About 10 years ago, a development team was discussing moving to agile, and the team was split half-and-half.
The first half says no, we can’t, it's a medical device, it's a gated system, it's gate 128, it's only waterfall, we cannot do this. It's only waterfall, that's it. And the compliance person approached us and said, “Guys, I am a compliance officer. My instruction doesn’t say that you cannot do agile. So, I don't know what you're talking about. I don't know why you rethink me or my department. I'm not putting any restrictions on you. Do whatever you want. Yes, you will have a gate system, but even within this gate system, you can still run agile. Nothing prevents you from doing this.” So this happens, right?
It really does. Who can I blame so that I don't have to change? Because change is hard, and that’s it. I mean, that's why I wrote the book; that's why many books come out about this change process. The reason I've been at IBM for 35 years is that I like change. I get to do lots of different things. The more we, as technologists, focus on change, the better. Let's do things better; let's continuously improve and learn. We'll all be better off and do better for society, even.
Yep, that's true. Now my favorite topic - uncertainty. I love this topic, and we'll discuss it regarding quality. So, uncertainty and quality, how do you think they are connected?
How do you think they conflict with each other? Because quality assumes expected behavior, predictable results, what the business needs, and so on. But uncertainty, on the other side, says, "Well, guys, life is uncertain, business is uncertain, it's your problem. So, I don't care," right?
Well, there is no perfect world and there's never any perfect code. So we really have to balance our focus on quality with risk acceptance and with understanding. One of the reasons I like chaos engineering, sort of, is the idea that uncertainty is always true. You have to pay attention to the fact that something will fail in the system, so you have to design for that fact. You have to plan for that fact.
And again, this leads to doing design correctly in the first place, but making sure you understand when something fails, what's going to happen to the rest of the system. And you have to plan that from the beginning. This is why, if you start with a quality focus, think about what my requirements are. Do I have to be available all the time? What are my considerations, and what are all the piece parts of my solution?
If I'm running my own hardware, okay, I've got the hardware. I can understand my own hardware. If I'm running in the cloud, I've still got to understand it, because I'm still running on hardware. It's just not mine. Somebody else is running the hardware. I've got networks. What are the network considerations? You have to consider all those pieces and assume something's going to go wrong, because it will.
There were a couple of reasons. One is, there's at least one picture of a book behind me. I've written books before, but without my name on them. They said, 'author IBM' because, in the old world, author IBM was the choice. So, one part of me said I wanted to publish a book with my own name on it. That was part of the reason.
Here is the second reason. I was spending a lot of time on the road, talking to many clients and working with a lot of people, and it's hard to tell the stories and get the information out to everybody traveling around the world, and then COVID hit. And so, I was sitting at home, I didn't have to be on planes all the time, I wasn't traveling all over the world, and I still needed to get the message out. So, if I publish a book, more people can read it, see it, and consume it, and I'm sitting at home. So, while I was sitting at home working on things, I wrote the book.
And I like telling stories, explaining what's happened in the years that I've worked. Why we got to where we got to, why we need to change, and why we need to focus on quality. It was something fun to do. So, I worked on it and got the book out during COVID, which is the least fun time to put a book out. You don't get to do book signings and all those other fun things, but I can share the information and get people to receive it.
I wrote it to talk about the fact that quality is not quality assurance. It is not a quality assurance team. So, the value, the concept, the fundamental principle is that quality is everyone's role, and even in large enterprises that have been around for 110 years or so, you can change; you can do it.
This means you can adopt parts of some of the mainframe stories, particularly for large organizations who didn't understand how they'd gotten to where they were. Because if you're new to technology or have never been in a large company, how on Earth is it possible they'd be doing it that way? Well, you know, if you've done things, been in the industry, and certain errors have happened, that's how we got to where we were. It wasn't because people were not trying to do the right thing. They were. We just got to this point, and we needed to do a reset. We need to change the way we work and change that culture to be this continuous improvement, continuous learning, continuous testing, continuous everything culture.
Is QA Needed?
All testing is absolutely critical. It just needs to be earlier; we don’t need to wait. We have to do unit testing and automated testing, and testers are there from the very beginning. Write your test first, and then go from there. There are all sorts of ways to shift it all the way left, so I get fast feedback. And then there's some testing, like performance testing and scalability testing, that requires really key skills.
So when we say quality assurance doesn't give us quality, there are some skills that are really important. To ensure a high-quality outcome, there are specific skills within the quality assurance process that are critical. These skills include scalability testing and performance testing. By testing how the system performs under load, we can determine whether it's ready for production or if it needs further refinement.
It's also essential to test early on in the development process, so issues can be identified and resolved from the beginning. Even if you roll out a function to a small subset of users, it's important to test it thoroughly to ensure it can handle that subset. So, in short, testing is critical, but don't wait until the end. Test from the beginning.
We have not improved our quality. We must focus from the beginning, look at the application, and understand the design. If we want to improve quality, we need to know what we have, and then we need to focus on the areas that are causing us trouble, the areas that are causing the most problems. Is it user experience? Is it performance? Is it actual code quality?
Because there are different things that can cause a bad experience. You could have the best code, but the experience is lousy, so okay, it's perceived as bad quality. And it's all of those things that can lead to that.
If we want to focus on quality, if somebody says "time to focus on quality," I'm like, "Okay, hold on a minute, what do you really mean?"
But I do want to take the other twist. Sometimes the quality happens in life cycles, you start something new, and usually, you cut too many corners in the beginning, and you gotta get out. And so there are times when you say, "Wait, I do need to focus on quality," but it's not from the standpoint of everybody, right?
Yes, so you come back and fix the technical debt, and you come back and add those automated tests that you didn't do because you were trying to get out really quick. It's that kind of thing. Okay, I have a sprint that says I gotta get something out because there's some really big business requirement, and I circumvent some things I shouldn't have, but we do that, everybody does.
You come back and fix that so it doesn't keep getting worse and worse. Especially when you put out early MVPs and didn't do everything you should have done in the MVP. Making sure you go back and fix that technical debt as quickly as possible so that you can now manage that application going forward is really important.
There is a danger when business assumes that the development team can deliver any feature requested within a short timeframe. Putting off technical debt to the next year may seem convenient, but it can lead to a dangerous cycle that's difficult to exit. It's crucial to understand the risks involved and not to play with fire. Be aware of what you're getting into, as it can be very dangerous.
Do you want to add something else about your book? Or do you want to send a message to the audience or say something additionally on top of what we just discussed?
I believe that quality is the key point. It's our collective responsibility to think about it from the very beginning. We should prioritize quality and focus on it from the start. Just like with Security, Quality should be our first priority. When we transitioned to Agile, some people mistakenly thought that Agile meant no more design, but that's not entirely true. We need to start with design, even if we don't spend three months on a detailed architecture. It's important to understand what we're building and the non-functional requirements that need to be met. This way, we have a better chance of building something of value.
We must take the time to understand the aspects, criteria, and true non-functional requirements of what we're building. We need to identify the scope and use cases to ensure that what we're building can provide quality in the first place. This is crucial, especially since technology is now pervasive in our lives. Computers are in every device we use, from our cars to our homes. Software plays a significant role in people's lives, and we must pay attention to its quality.
It's possible that you don't think what you're building matters to someone's life, but there are things you don't know that could affect someone's life or livelihood. So it's essential that we all focus on quality and understand what we're delivering because it matters. It matters a lot.