Saturday, February 26, 2022

More to lifecycle for testing

With work on recruiting, I have seen a lot of CVs recently. A lot of CVs mention "experience with SDLC" - the software development lifecycle - and everyone has a varying set of experiences of what it really means in practice to work in agile or waterfall. So this week I've done my share of conversations with people, modeling differences in how we set up the "lifecycle" to test, and how others are doing it. Here's a sampling of what I have learned.

Sample 1. Oh So Many Roles. 

This team has embraced the idea of test automation, and defined their lifecycle around that. Per feature, they have a tester writing test cases, a test automation engineer implementing these written test cases in code, and an agreement in place where the results of test automation on day-to-day app development belong to the developers to look at.

My conclusion: not contemporary exploratory testing or even exploratory testing, but very much test-case driven. It leverages specialized skills, and while you need more people, specializing people allows you to scale your efforts. Not my choice of style, but I can see how some teams would come to this.

Sample 2. So many amigas

This team has embraced the idea of scoping and specifying before implementing, and has so many amigas participating in the four amigas sessions. Yes, some might call this three amigos, but a story refinement workshop can have more than three people and they are definitely not all men. So we should go for a gender-neutral feminine expression, right?

For every story refinement, there is the before and after thinking for every perspective, even if the session itself is all together and nicely collaborative. People aren't at their best when thinking on their feet. 

My conclusion: too much before implementation, and too many helpers. Cut down the roles, lighten up the process, make the pieces smaller, and this fits my idea of contemporary exploratory testing and leaves documentation around as automation.

Sample 3. Prep with test cases, then test

This team gets a project with many features in one go, and prepares by writing test cases. If the features come quicker than test cases can be written, the team writes a checklist, to be filled in as proper step-by-step test cases later. The star marks the focus of effort - in preparing and analyzing.

My conclusion: not exploratory testing, not contemporary exploratory testing, not agile testing. A lot of wait and prep, and a little learning time. It would not be my choice of mode, but I have worked in a mode like this.

Sample 4. Turn prep to learn time

This team never writes detailed test cases; instead they create a lighter checklist (and are usually busy with other projects while the prep time is ongoing). Overall time and effort is lower, but otherwise this is very similar to sample 3. The star marks the focus of effort - during test execution, to explore around the checklists.

My conclusion: exploratory testing, not contemporary exploratory testing, not agile testing. You can leave prep undone, but you can't make the tail at the end any longer, and thus you are always squeezed for time.

Conclusion overall

We have significantly different frames we do testing in, and when we talk about only the most modern ones at conferences, we have a whole variety of testers who aren't on board with the frame. And frankly, they can be powerless to change the frame they work from. We could do better.


Core tasks for test positions

In the last weeks, I have been asking candidates in interviews and colleagues in conversations a question:

How Would You Test This? 

I would show them the user interface. I would show them a single approval test against the settings API returning the settings. And I would show them the server side configuration file. The thing to test is the settings.

I have come to learn that I am not particularly happy with the way people would test this. In most cases, the chosen test approach is best described as:

Doing what developers did already locally in end to end environment. 

For a significant portion, you would see application of two additional rules. 

Try unexpected / disallowed values. 

Find value boundary and try too small / too large / just right. 
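Applied to a numeric setting like the latitude value discussed later in this post, those two rules fit in a handful of lines. This is a minimal sketch under assumptions: `validate_latitude` is a hypothetical helper standing in for whatever validation the product actually does.

```python
# Hypothetical helper: stands in for whatever the settings API does
# with a configured latitude value. Not the product's real code.
def validate_latitude(value):
    """Accept a latitude if it parses as a number within [-90, 90]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    return -90 <= number <= 90

# Rule 1: try unexpected / disallowed values.
for bad in [None, "", "abc", "90,0"]:
    assert validate_latitude(bad) is False

# Rule 2: find the value boundary, then try too small / too large / just right.
assert validate_latitude(-90.000001) is False   # too small
assert validate_latitude(90.000001) is False    # too large
assert validate_latitude(60.1699) is True       # just right (Helsinki)
```

Note how quickly the two rules run out: everything above is routine that a developer could have written alongside the feature, and none of it yet centers results.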

I've been talking about resultful testing (contemporary exploratory testing), and from that perspective, I have been disappointed. None of these three approaches center results; they center routine.

For a significant portion of people centering automation, they would apply an additional rule. 

Random gives discovery a chance. 

I had a few shining lights amongst the many conversations. In the best ones, people ground what they see in the world they know ("Location, I'll try my home address") and seek understanding of concepts ("Latitude and longitude, what do they look like?"). The better automation testers would have some ideas of how to know if it worked as it was supposed to for their random values, and in implementing that they may stand a chance of creating a way to reveal how it breaks even if they could not explain it.
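The "random with an oracle" idea can be sketched as a tiny property-style test. Everything here is illustrative: `format_and_parse` is a hypothetical stand-in for writing the settings file and reading the values back through the settings API, and the two oracles are the only checks that always hold for random coordinates.

```python
import random

def in_valid_range(lat, lon):
    """Oracle: ranges we can always check, whatever the random values are."""
    return -90 <= lat <= 90 and -180 <= lon <= 180

def format_and_parse(lat, lon):
    # Hypothetical stand-in for the settings file + settings API round trip.
    # Here it is just a plain text round trip, so the sketch stays runnable.
    text = f"{lat},{lon}"
    parsed_lat, parsed_lon = (float(part) for part in text.split(","))
    return parsed_lat, parsed_lon

random.seed(7)  # reproducible runs help when a random value does find a bug
for _ in range(100):
    lat = random.uniform(-90, 90)
    lon = random.uniform(-180, 180)
    out = format_and_parse(lat, lon)
    assert out == (lat, lon)      # round-trip oracle: nothing lost in transit
    assert in_valid_range(*out)   # range oracle: values still make sense
```

With oracles like these in place, random values get their chance at discovery; without them, a run that silently mangles a coordinate looks exactly like a passing run.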

Looking at this from the point of view of having reported bugs on it myself, with my team having fixed many bugs, I know that most of the testers I have shown this to would have waited for a realistic customer, trying to configure this for their unfortunate location, to find out the locations in the world that don't work.

I've come to see that many professional testers overemphasize negative testing (input validation) and pay all too little attention to positive testing, which is much more than a single test with the values given as defaults.

As what we discovered was essentially different, we also documented that. Whether we need to is another topic for another day.

This experience of disappointment leads me into thinking about core tasks for positions. When I hire for a (contemporary exploratory) tester position, the core task I expect them to be able to do is resultful testing. Their main assignment is to find some of what others may have missed, and when they miss out on all information when there is information to find, I would not want to call them a tester. Their secondary assignment is to document in automation, to support discovery at scale over iterations.

At the same time, I realize not all testers are contemporary exploratory testers. Some are manual testers. Their main assignment is to do in a test environment what devs may have done locally, and document it in test cases. In later rounds, they then use the test cases again as documented to ensure no regression with changes. There is an inherent value also in being the persistently last one to check things before delivering them forward, especially in teams with little to no test automation.

Some testers are also traditional exploratory testers. Their main assignment is to find some of what others may have missed, but combined with a lack of time and skills in programming, this leaves out the secondary assignment I require of a contemporary exploratory tester.

We would be disappointed in a contemporary exploratory tester if they did not find useful insights in a proportion that helps us not leak all problems to production, and contribute to automation baseline. We would be disappointed in a manual tester if they did not leave behind evidence of systematically covering basic scenarios and reporting blockers on those. We would be disappointed in a traditional exploratory tester if they did not find a trustworthy set of results, providing some types of models to support the continued work in the area. 

What then are the core tasks for automation testers? If we are lucky, the same as for contemporary exploratory testers. Usually we are not lucky, though, and their main assignment is to document basic scenarios in automation in a test environment. Their secondary assignment is to maintain automation and ensure the right reactions to the feedback automation gives, and the resultful aspect is delayed until the first feedback on results we are missing.

I find myself in a place where I am hoping to get all in one, yet find potential in manual testers or automation testers growing into contemporary exploratory testers. 

I guess we still need to mention pay. I don't think the manual tester or the automation tester should be paid what developers are paid, unless the automation testers are developers choosing to specialize in the testing domain. A lot of automation testers are neither very strong developers nor strong testers. I have also heard a proposal on framing this differently: let's pay our people for the position we want them to be in, hire on potential, and guide on expectations to do a different role than what their current experience is.

Sunday, February 20, 2022

How My Team Tests for Now

I'm with a new team, acting as the resident testing specialist. We're building a new product and our day to day work is fairly collaborative. We get a feature request (epic/story), developers take whatever time it takes to add it, and cycle through tasks of adding features, adding tests for the features and refactoring to better architecture. I, as the team's tester, review pull requests to know what is changing, note failing test automation to know what changes surprise us, and test the growing system from user interfaces, APIs and even units, extending test automation through mentions of ideas, through issues, or through a pull request adding to the existing tests.

For a feature that is ready on Wednesday, my kind of testing happens on the previous Friday, but I can show up any day in either pre-production or production environments and find information that makes changes to whatever we could be delivering the next week. While our eventual target is to be a day away from production ready, the reality now is two weeks. We have just started our journey of tightening our cycles. 

I tried drawing our way of working with testing into a picture. 

On the left, the "Improve claims" process is one of our continuously ongoing dual tracks. I personally work a lot with the product owner in ensuring we understand our next requested increments, increasingly with examples. As important as understanding the scope (and how we could test it) is asking how we can split it smaller. As we are increasingly adding examples, we are also increasingly making our requests smaller. We start with epics and stories, but are working towards merging the two, thus making stories something that we can classify into ongoing themes.

In the middle are the four layers of perspectives that drive testing. Our developers and our pipelines test changes continuously, and developers document their intent in unit, API and UI tests in different scopes of integration. Other developers, including me as a developer specializing in testing, comment, and if seeing the integrated result helps as external imagination, they can take a look. For now at least, a PR is usually multiple commits, and the team has a gate at PR level expecting someone other than the original developer to look at it. All tests we require for evidence of testing are already included at the PR level.

The two top parts, change and change(s) in pull request are the continuous flow. They include the mechanism of seeing whatever is there any day. We support these with the two bottom parts. 

We actively move from a developer's intent and interpretation to a test specialist centering testing and information, to question and improve how well we did with clarified claims ending up in the implementation. Looking at added features somewhere in the chain of changes and pull requests, we compare to the conversations we had while clarifying the claims, with claims coverage testing. If lucky, developer intent matched. If not, conversations correct developer intent. As applying external imagination goes, you see different things when you think about the feature (new value you made available) and the theme (how it connects with similar things).

When the team thinks they have a version they want out, they promote a release candidate and work through the day of final tests we're minimizing to make the release candidate a release, properly archived. 

With the shades of purple post-its showing where in the team the center of responsibility is, a good question is whether the tester (medium purple) is a gatekeeper in our process. The tester feeds into the developer intent (deep purple) with added information, but often not at the end of it all - rather throughout, and not stopping at release. The work on omissions continues while in production, exploring logs and feedback. There is also team work we have managed to truly share with all (light purple), supporting automations (light blue), and common decisions (black).

There is no clearly defined time in this process. It's less of an instruction on what exactly to do, and more of a description of perspectives we hold space for, for now. There are many changes on our road still: tightening the release cycle, keeping unfinished work under the hood, connecting requirements and some selection of tests with BDD, making smaller changes, timely and efficient refinement, growing capabilities of testing to models and properties, growing environments … the list will never be completely done. But where we are now is already good, and it can and should be better.

Friday, February 18, 2022

The Positive Negative Split Leads Us Astray

As I teach testing to various groups in ensemble and pair formats, I have unique insight into what people do when they are asked to test. As I watch any of my students, I know already what I would do, and how many of my other students have done things. Noticing students miss out on something, I get to have those conversations:

"You did not test with realistic data at all. Why do you think you ended up with that?" 

"You focused on all the wrong things you can write into a number input, but very little on the numbers you could write. Why do you think you ended up with that?"

"You tested the slow and long scenario first, that then fails so you need to do it again. Why do you think you ended up with that?" 

As responses, I get to hear their resident theory of why - either why they did not yet but would if there was more time, or more often, why they don't need to do that, and how they think there are no options to what they do, as if they followed an invisible book of rules for proper testing. The most popular theory is that developers test the positive flows so testers must focus only on negative tests, often without consulting the developers on what they actually focus on.

I typically ask this question knowing that the tester I am asking is missing a bug. A relevant bug. Or doing testing in a way that will make them less efficient overall, delaying feedback of the bug that they may or may not see. 

I have watched this scenario unfold in so many sessions that today I am ready to call out a pattern: the ISTQB Test Design oversimplification of equivalence classes and boundary values hurts our industry.

Let me give you a recent example, from my work. 

My team had just implemented a new user interface that shows a particular identifier of an airport called the ICAO code. We had created a settings API making this information available from the backends, and a settings file in which this code is defined.

Looking at the user interface, this code was the only airport information we were asked to display for now. Looking at the settings API and the settings file, there was other information related to the airport in question, like its location as latitude and longitude values. Two numbers, each showing a value of 50.9 that someone had typed in. How would you test this?

I showed it around, asking people this.

One person focused on the idea of random values you can place in automation, ones that would be different every run time and mentioned the concept of valid and invalid values. They explained that the selection of values is an acceptance tester's job, even if the project does not have such separation in product development. 

One person focused on the idea that you would try valid and invalid values, and identified that there are positive and negative values, and that the coordinates can have more than one decimal place. We tested together for a while, and they chose a few positive scenarios with a negative value combined with decimal places before calling it done.

I had started by asking myself what kind of real locations and coordinates there are, and how I could choose a representative sample of real airport locations. I googled for ICAO codes to find a list of three examples, and without any real reason, chose the third on the list, which happened to be an airport in Chicago. I can't reproduce the exact google search that inspired me to pick that one, but it was one where the little info box of the page on google already showed me a few combos of codes and coordinates, where I chose 41.978611, -87.904724. I also learned while googling that latitudes range from -90 to 90 and longitudes from -180 to 180.

It turned out that it did not work at all. A lucky accident brought me, with my first choice, to discover a combination of four things that needed to be put together to reveal a bug.
  • The second number had to be negative
  • The second number had to have more than four digits
  • The second number had to be less than 90
  • The first number had to be positive
Serendipity took me to a bug that was high priority: a real use case that fails. Every effort in analyzing with the simple ISTQB-style equivalence classes and boundary values failed; you needed the BBST-style idea of risk-based equivalence and combination testing to identify this. The random numbers may have found it, but I am not sure if it would have motivated an immediate fix like the fact that a real airport location, for a functionality that describes airports, does not work.
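A combination over the dimensions that mattered here is easy to mechanize once you have identified them. A sketch, using the coordinates from the story - the dimensions chosen (sign of each number, number of decimals) are the ones the bug turned out to hinge on, and the base values are just illustrative:

```python
# Risk-based combination testing sketch: cross the sign of each coordinate
# with the precision of the typed value. The base coordinates are the
# Chicago airport values from the story; everything else is illustrative.
from itertools import product

signs = [1, -1]
decimals = [1, 6]  # "50.9"-style values versus "41.978611"-style values

cases = []
for lat_sign, lon_sign, places in product(signs, signs, decimals):
    lat = round(lat_sign * 41.978611, places)
    lon = round(lon_sign * 87.904724, places)
    cases.append((lat, lon))

# Eight combinations cover every choice of signs and precision, and the
# failing real-world case (+lat, -lon, many decimals) is among them.
assert (41.978611, -87.904724) in cases
assert len(cases) == 8
```

Eight tests instead of one default value - and unlike the textbook boundary analysis, this set contains the realistic coordinate that actually failed.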

Our time is limited, and ISTQB-style equivalence classes overfocus us on the negative tests. When the positive tests fail, your team jumps. When the negative tests fail, they remember to add error handling, if nothing more important is ongoing.

After I had already made up my mind on the feature, I showed it to two more testers. One started with the coordinates of their home - real locations - and I am sure they would have explored their way to the other 3/4 of the globe that Finnish coordinates would not cover. The other, being put on the spot, fell into the negative tests trap, disconnected from the represented information, but when I pointed this out, found additional scenarios of locations that are relevantly different. Now I know some airports are below sea level - the third value to define, one that I had not personally properly focused on.

Put the five of us together and we have the resultful testing I call for with contemporary exploratory testing. But first, unlearn the oversimplistic positive / negative split and overfocus on the negative. The power of testing well lies in your hands when you test. 

Sunday, February 13, 2022

Doing Security Testing

This week Wednesday, as we were kicking off the BrowserStack Champions program with a meeting of program participants around a fireside chat, something in the conversation, in relation to all things going on at work, pushed an invisible button in me. We were talking about security testing as if it was something separate and new. At work, we have a separate responsibility for security, and I have come to experience over the years that a lot of people assume and expect that testers know little of security. Those who are testers love to box security testing separate from functional testing, and when asked for security testing, only think in terms of penetration testing. Those who are not testers love to make space for security by hiring specialists in that space, and by the Shirky Principle, the specialists will preserve the problem to which they are a solution.

Security is important. But like other aspects of quality, it is too important to leave only to specialists. And the ways we talk about it under the one term "security" or "security testing" are, in my experience, harmful for our intentions of doing better in this space.

Like with all testing, with security we work with *risks*. And what we have at stake when we take a risk can differ. Saying we risk money is too straightforward. We risk:

  • other people's discretionary money, until we take corrective action. 
  • our own discretionary money, until we take corrective action.
  • money, lives, and human suffering where corrective actions don't exist.
We live with the appalling software quality in production because a lot of the problems we imagine we have are about the first, and may escalate to the second, and while losing one customer is sad, we imagine others at scale. When we hear RISK, we hear REWARD in taking a risk, and this math works fine while corrective actions exist. Also, connecting testing with the bad decisions we make in this space feels like a way of the world, assuming that bug advocacy as part of testing would lead to companies doing the right things knowing the problems. Speaking with 25 years of watching this unfold, the bad problems we see out there weren't the result of insufficient testing, but of us choosing the RISK in hopes of REWARD. Because risk is not certain, we could still win.

The third category of problems is unique. While I know of efforts to assign a financial number to a human life or suffering, those don't sit well with me. The 100 euros of compensation for the victims of cybercriminals stealing psychotherapy patient data is laughable. The existence of the company limiting liability to the company going bankrupt is unsettling. The amount of money the police use on investigating is out of our control. The fear of having your most private ideas out there will never start to spark joy.

Not all security issues are in the third category, and what upsets me about the overemphasis on security testing is that we should be adding emphasis to all problems in the third category.

A few years ago I stepped as far away as I possibly could from anyone associating with "security", after feeling attacked on a Finnish security podcast. Back then, I wrote a post discussing the irony of my loss / company's loss categories, proposing that my losses should be the company's losses by sending them a professional-level services bill, but a select group of security folks decided that ridiculing my professionalism, over me running into a problem that was a company's loss, was a worthwhile platform. While I did report this as slander (a crime) and learned it wasn't, the rift remains. Me losing money for a bug: testing problem. The company losing money for a bug: security problem. I care for both.

As much as I can, I don't think in terms of security testing. But I have a very practical way of including the functional considerations of undesired actors. 

We test for having security controls. And since testing is not only about explicit requirements but also about ensuring we haven't omitted any, I find myself leading conversations about the timing of implementing security controls in incremental development from the perspective of risks. We need security controls - named functionalities to avoid, detect, counteract and minimize the impacts of undesired actors.

We test for the software update mechanism. It connects with security tightly, with the idea that in a software world riddled with dependencies on 3rd party libraries, our efforts alone, without the connected ecosystem, are in vain. We all have late discoveries despite our best efforts, but we can have the power of reacting, if only we are always able to update. Continuous delivery is necessary to protect customers from the problems we dropped at their doorstep, along with our own lovely functionalities.

We test for secure design and implementation. Threat modeling still remains an activity that brings together security considerations and exploratory testing of the assumptions we base our threat modeling decisions on - a superb pair. Secure programming - avoiding typical errors for a particular language - shows up as teams sharing lists of examples. Addressing something tangible - in readable code - is a lot more straightforward than trying to hold all ideas in your head all the time. Thus we need both. And security is just one of the many perspectives where we have opportunities to explore patterns out of the body of code.

We integrate tools into pipelines. Security scanners for static and dynamic perspectives exist, and some scanners you can use in the scale of the organization, not just a team. 

We interpret standards for proposals of controls and practices that the whole might entail. This alone, by the way, can be work of a full time person. So we make choices on standards, we make choices of detail of interpretation. 

We coordinate reactions to new emerging information, including both external and internal communication. 

We monitor the ecosystem to know that a reaction from us is needed. 

We understand legal implications as well as reasons for privacy as its own consideration, as it includes a high risk in the third category: irreversible impacts. 

And finally, we may do some penetration testing. Usually its purpose is less to find problems and more to say we tried. In addition, we may organize a marketplace for legally hunting our bugs, and for selling the bugs with high implications to us rather than to the undesired actors, through a bug bounty program.

So you see, talking about security testing isn't helpful. We need more words rather than fewer. And we need to remove the convolution of assuming all security problems are important, just as much as we need to remove the convolution of assuming all functional problems aren't.