A Seasoned Tester's Crystal Ball

Monday, April 13, 2020

Blood on the Terrace

On a Finnish summer evening, a group of friends got together at a summer cottage. They enjoyed their time and each other's company, with a few drinks. But as things unfold in unexpected ways, all things coming together, one of them decided it was a good idea to play with a hammer. End result: hammer hitting a foot, a lot of blood flowing onto the terrace.

The other person on the terrace quickly assessed the situation and, with a feeling of panic, summed it up simply: blood on the terrace. And what do you do when you have blood on the terrace? You go and get a bucket and a rag to clean it up. After all, you will have to do this. Blood on the terrace could ruin the terrace! All of this while the friend in need of patching was still in need of patching, bleeding on the terrace. "Hey, go get Anna", coming from the person bleeding on the terrace, corrected the action, and the incident became a funny story to recount to people.

I could not stop laughing as my sister told me this story of something she did. It was a misplaced reaction made under a sense of helplessness and panic. Something that was necessary to do, but not necessary to do right there and then, turned into a funny story of how people behave. I'm telling this here with her permission and for a reason.

Misplaced reactions are common when we build software. Some of the worst reactions happen when we feel afraid, panicked and out of control. When we don't know what the right thing to do is, or how to figure that out in the moment. And we choose to do something, because something is better than nothing.

You might recognize this situation: a bug report telling you to hide a wrong text. You could do what the report says and hide the text. Or, you could stop to think why that text is there, why it should not be there and why it being there is a problem in the first place. If you did more than what the immediate ask was, you would probably be better off. Don't patch symptom by symptom, but rather understand the symptoms and fix the cause.

While a lot of times the metaphor we use is adding bandaids, blood on the terrace describes the problem we face better. It is not just that we are doing something that really isn't addressing the problem in its full scale. It is that we are, under pressure and with limited knowledge, making rash judgements among all the things we could be doing, and doing the right thing at the wrong time.

When you feel rushed, you do what you do when time is of the essence: something RIGHT NOW. What would it take to approach a report you get so that your first step is to really pay attention and understand what the problem is, why it is a problem, and whether whoever told you it was a problem was a definitive source? Or whether that matters?

It was great that the terrace did not end up ruined. The foot healed too. All levels of damage controlled.

Friday, April 10, 2020

Reporting and Notetaking

Exploratory Testing. The wonderful transformation machinery that takes in someone skilled and multidisciplinary and time, crunches whatever activities necessary and provides an output of someone more skilled, whatever documentation and numbers we need and information about testing.

What happens in the machinery can be building tools and scripts, it can be operating and observing the system under test but whatever it is, it is chosen under the principle of opportunity cost and focused on learning.

When the world of exploratory testing meets the world of test automation, we often frame the same activities and outputs differently. When a test automation specialist discusses reporting, it seems they often talk about what an exploratory tester would describe as note taking.

Note taking is when we keep track of the steps we do and results we deliver and explain what we tested to other people who test. Reporting is when we summarize steps and results over a timeframe and interpret our notes to people who were not doing the testing.

Note taking produces:

freeform or structured text
screenshots and conceptual images
application logs
test automation that can be thrown away or kept for later

Reporting produces something that says what quality is, what coverage is, how long what we say now stays true, and what we recommend we would do about that.

Automation can take notes, count the passes and fails, summarize use of time. The application under test takes notes (we call that logging) and it can be a central source of information on what happened.
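To make the note-taking side concrete, here is a minimal sketch in Python of automation counting passes and fails and summing up use of time. The `CheckNote` shape and the `summarize` helper are hypothetical names made up for this illustration, not something from a particular tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CheckNote:
    """One note automation could take about a single check (hypothetical shape)."""
    name: str
    outcome: str    # "pass" or "fail"
    seconds: float  # time spent running the check


def summarize(notes):
    """Count outcomes and total time. This is note taking, not interpretation."""
    counts = Counter(note.outcome for note in notes)
    total_seconds = sum(note.seconds for note in notes)
    return {"pass": counts["pass"], "fail": counts["fail"], "seconds": total_seconds}


# Made-up notes, only to show the shape of the summary.
notes = [
    CheckNote("login works", "pass", 1.5),
    CheckNote("price is shown", "fail", 0.5),
    CheckNote("logout works", "pass", 1.0),
]
summary = summarize(notes)
print(summary)
```

The summary still says nothing about what quality is or what we would recommend doing about it. That interpretation is the reporting part.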

When automation does reporting, that usually summarizes numbers. I find our managers care very little for these numbers. Instead, the reporting they seek is releases and their contents.

When automation logs test case execution, we can call that a report. But it is a report very different in scope from what I mean by a report from exploratory testing - including automation.

Thursday, April 9, 2020

It does what it is supposed to do, but is this all we get?

Yesterday, in a testing workshop that I ran for the 3rd time in this format online, something interesting happened. I noticed a pattern, and I am indebted to the women who made it so evident I could not escape the insight.

We tested two pieces of software, to have an experience on testing that enabled us to discuss what testing is, why it is important and if it interests us. On my part, the interest is evident, perhaps even infectious. The 30 women participating were people from the Finnish MimmitKoodaa program, introducing women new to creating software to the skills in that space.

The first software we tested was the infamous Park Calculator. The insight that I picked up on came quite late to a bubbling discussion on how many problems and what kind of problems we were seeing, when someone framed it as a question: Why would the calculator first ask the type of parking and then give the cost of it, when a user would usually start with the idea that they know when they are traveling, and would benefit from a summary of all types of parking for that timeframe? The answer seems to be that both approaches would fit some way of thinking around the requirements, and what we had was the way whoever implemented this decided to frame that requirement. We found a bug that would lead to a redesign of what we had, even if we could reuse many of the components. Such feedback would be more welcome early on, but if not early, discussing it to create awareness was still a good option.

The second software we tested was the GildedRose. A piece of code, intertwined with lovely clear requirements, often used as a way of showing how a programmer can get code under tests without even reading the requirements. Reading the requirements leads us to a different, interesting path though. One of the requirements states that the quality of an item can never be negative and another one says its maximum is 50. Given a list of requirements where these two are somewhere in the middle, the likelihood of a tester picking these up to test first is very high - a fascinating pattern in itself. However, what we learn from those is that there is no error message on an input or output beyond the boundaries, as we might expect. Instead, given inputs outside the boundaries, the software keeps the input's level of wrongness without making it worse, and given inputs right at the boundaries, it blocks the outputs from changing to wrong. This fits the requirement, but comes across as a confusing design choice.
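The boundary behavior described above can be sketched in a few lines of Python. This is a simplified illustration with hypothetical `degrade` and `improve` functions, assuming a quality that moves by one per update. It is not the actual GildedRose code.

```python
MIN_QUALITY = 0   # "quality of an item can never be negative"
MAX_QUALITY = 50  # "maximum 50"


def degrade(quality):
    """Lower quality by one, but only while it is above the minimum."""
    if quality > MIN_QUALITY:
        return quality - 1
    return quality


def improve(quality):
    """Raise quality by one, but only while it is below the maximum."""
    if quality < MAX_QUALITY:
        return quality + 1
    return quality


# At the boundaries, the output is blocked from changing to wrong.
print(degrade(0), improve(50))
# Beyond the boundaries, there is no error: the wrong input stays as wrong as it was.
print(degrade(-5), improve(55))
```

A guard like this never rejects a bad value. It only refuses to move a value further in the wrong direction, which is why no error message ever shows up.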

The two examples together bring out a common pattern we see in testing: sometimes what we have is what we intended to have, and fits our requirements. However, we can easily imagine a better way of interpreting that requirement, and would start a discussion on this as a bug, knowing it will stretch some folks' ideas of what a bug is.

With both of our test targets yesterday, we identified a bug where it fits a requirement but the requirement was not up to par what users would want. You get what you ask for, not what you could ask for if you understood how software works.
— Maaret Pyhäjärvi (@maaretp) April 9, 2020

Tuesday, April 7, 2020

Developer-centric way of working with three flight levels

The way we have been working for the last years can best be described as a developer-centric way of working.

Where other people draw processes as filters from customer idea to the development machinery, the way I illustrate our process is like I have always illustrated exploratory testing - putting the main actor in the center. With the main actor, good things either become reality or they fail doing so.

In the center of software development are two people:

the customer willing to pay for whatever value the software is creating for them
the developer turning ideas into code

Without code, the software isn't providing value. Without good ideas of value, we are likely to miss the mark for the customer.

Even with developers in the center, they are not alone. There are other developers, and other roles: testers, UX, managers, sales, support, just to mention a few. But it is with the developer that the idea gets turned into code and moved through a release machinery (also code). Only through making changes to what we already have can something new be done with our software.

As I describe our way of working, I often focus on the idea of having no product owner. We have product experts and specialists crunching through data of a versatile customer base, but given those numbers, the product owner isn't deciding what is the next feature to implement. The data drives the team to make those decisions. At the Brewing Agile conference, I realized there was another way of modeling our approach that would benefit people trying to understand what we do and how we get it done: flight levels.

Flight levels is an idea that when describing our (agile) process, we need to address it from three perspectives:

Doing the work - the team level perspective, often a focus in how we describe agile team working and finding the ways to work together
Getting the work coordinated - the cross team level of getting things done in scale larger than a single team
Leading the business - the level where we define why the organization exists and what value it exists to create and turn into positive finances

As I say "no product owner", I have previously explained all of this dynamic in the level of the team doing the work, leaving out the two other levels. But I have come to realize that the two other levels are perhaps more insightful than the first.

For getting work coordinated, we build a network from every single team member to other people in the organization. When I recognize a colleague is often transmitting messages from another development team in test automation, I recognize they fill that hole and I serve my team focusing on another connection. We share what connections give us, and I think of this as "going fishing and bringing the fish home". The fluency and trust in not having to be part of all conversations but tapping into a distributed conversation model is the engine that enables a lot of what we achieve.

For leading the business, we listen to our company high level management and our business metrics, comparing to telemetry we can create from the product. Even if the mechanism is more of broadcast and verify, than co-create, we seem to think of the management as a group serving an important purpose of guiding the frame in which we all work for a shared goal. This third level is like connecting networks serving different purposes.

The three levels in place, implicitly, enabled us to be more successful than others around us. Not just the sense of ownership and excellence of skills, but the system that supports it and is quite different from what you might usually expect to see.

Saturday, March 28, 2020

A Python Koans Learning Experiment

I'm curious by nature. And when I say curious, I mean I have a hard time sticking to doing what I was doing because I keep discovering other things.

When I'm curious while I test, I call it exploratory testing. It leads me to discover information other people benefit from, and would be without if I didn't share my insights.

When I'm curious while I learn a programming language, I find myself having trouble completing what I intended, and come off a learning activity with a thousand more things to learn. And having a good plan isn't half the work done, it is not having started the work.

On my list of activities I want to complete on learning Python, I have had Python Koans. Today I want to complete that activity by reporting on its completion and what I learned with it.

Getting Set Up

The Python Koans I wanted to do were ones created by Felienne Hermans. On this round of learning yet-another-programming-language (I survived many with passing grades at Helsinki University of Technology as a Computer Science major), I knew what I wanted to do. I picked Koans as the learning mechanism because:

Discovery learning: learning sticks in me much better when instead of handing me theory to read, I get examples illustrating something and I discover the topic myself
Small steps: making steady progress through material over getting stuck on a concept - while Koans grow, they are usually more like a flashlight pointed at topics than requiring a significant step between one and the other
Test first: as failing test cases, they motivate a tester like myself to discover puzzles in a familiar context
Great activity paired: social learning and learning through another person's eyes in addition to one's own is highly motivating.
Exploratory programming: you do what you need to do, but you can also do whatever else you learn you need to do. Experimenting away from whatever you have toward deeper understanding works for me.

This time I found a pair and mechanism that worked to get us through it. Searching for another learner with similar background (other languages, tester) on Twitter paired me up with Mesut Durukal, and we worked pretty consistently an hour a day until we completed the whole thing in 15 hours.

The way we worked together was sharing screen and solving the Koans actively together. After completing each, we would explore around the concept with different values or extending with lessons we had learned earlier in Koans, testing if what we thought was true was true. And we wrote down our lessons after each Koan on a shared document.

The Learning

Being able to look back to doing this with the document as well as tweets two months after we completed the exercise is interesting. I picked up some key insights from Twitter.

On Koans for Python comprehensions, this one really threw us off for a moment. Still loving discovery learning, 6 hours in with @DurukalMesut. pic.twitter.com/53Zbvcfqd0

— Maaret Pyhäjärvi (@maaretp) January 21, 2020

Writing python I’m appreciating how human connection helps me recall. Working through Koans with @DurukalMesut gets connected with how I know remember how to do a particular thing.
— Maaret Pyhäjärvi (@maaretp) January 22, 2020

While doing Python Koans, there has been numerous times when we are supposed to assert type of exception without knowing which exact type of exception would come out. When we see it, we know if it is right. Exploratory programming.
— Maaret Pyhäjärvi (@maaretp) January 30, 2020

Looks like completing #Python Koans with a pair is a 14-15 hours of effort. It has been a lot of fun! https://t.co/1Z1N42RKZ6
— Maaret Pyhäjärvi (@maaretp) January 31, 2020
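The tweet above about asserting exception types describes a small loop of exploratory programming: run the code, see which exception comes out, then decide if it is the right one. A minimal sketch of that loop in Python follows. The `discover_exception_type` helper is a hypothetical name for this illustration and not part of the Koans.

```python
def discover_exception_type(action):
    """Run the action and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as error:
        return type(error).__name__
    return None


# Explore first: see what actually comes out before asserting anything.
observations = {
    "text to number": discover_exception_type(lambda: int("koan")),
    "missing key": discover_exception_type(lambda: {}["missing"]),
    "empty list": discover_exception_type(lambda: [][0]),
    "no failure": discover_exception_type(lambda: 1 + 1),
}
for label, name in observations.items():
    print(label, "->", name)
```

Once the type has been seen and judged right, it can be pinned down in a regular assertion.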

Looking at our private document, the numbers are fascinating: 382 observations of learning something.

With 15 hours, that gives us an average of 25 things in an hour.

On top of those 15 hours, I had a colleague wanting to discuss our learning activity, and multiple whiteboarding sessions to discuss differences of languages the learning activity inspired.

Next up, I have so many options for learning activities. Better not make promises, because no matter how publicly I promise, the only thing keeping me accountable is activities that we complete together. Thanks for the super-fun learning with you, Mesut!

Users test your code

In a session on introduction to testing (not testers), I simplified my story to having two kinds of testing:

Your pair of eyes on seeing problems
Someone else's pair of eyes on seeing problems

My own experience in 99% of what I have ended up doing in my 25-year career is that I'm providing that second pair of eyes, and working as that has made me a tester by profession.

Sometimes the second pair of eyes spends only a moment on your code as they are making their own changes adding features (another developer) and you do what you do for testing yourself. Sometimes it becomes more of a specialty (tester). And while the second pair of eyes often is used to bring in perspectives you may be lacking (domain knowledge), there is nothing preventing that second pair of eyes having as strong or stronger programming knowledge than you do.

You may not even notice whether your company has a second pair of eyes, as there's you and then production. Then whatever you did not test gets tested in production, by the users. And it is only a problem if they complain about the quality, with feelings strong enough to act.

To avoid complaining or extensive testing done slowly after making changes, modern developers write tests as code. As any second pair of eyes notices something is missing, while adding that, we also add tests as code. And we run them, all the time. Being able to rely on them is almost less of a thing about testing and quality, and more of a thing about peace of mind to move faster.

In the last year or so, my team's developers have gotten to a point where they no longer benefit from having a tester around - assuming a non-programmer tester covering the features like a user would. While one is around, it is easy to not do the work yourself, creating the self-fulfilling prophecy of needing one.

Over and over again, I go back to thinking of one of my favorite quotes:

"Future is already here, it is just not equally divided"

I believe the future is without testers, with programmers co-creating both application software and software that tests it. I believe I live at least one foot in that future. It does not mean that I might not find myself using 80% of my time testing and creating testing systems. It means that the division is more fluid, and we are all encouraged to grow to contribute beyond our abilities of the day.

The past was without testers but also without testing. To see the difference between past and future, you need to see how the customer perceives value and speed. Testing (not testers) is the way to improve both.

Wednesday, March 25, 2020

One Eight Fraction of a Tester

As I was browsing through LinkedIn, I spotted a post with an important message. With appropriate emphasis, the post delivered its intended point: TEST AUTOMATION IS A FULL TIME JOB. I agree.

The post, however, brought me in touch with a modeling problem I was working through, for work. How would I explain that the four testers we had were all valuable yet so very different? The difference was not in their seniority - all four are seniors, with years and years of experience. But it is in where we focus. Because, TEST AUTOMATION IS A FULL TIME JOB. But also, because OTHER TESTING IS A FULL TIME JOB.

As part of me pondering this all, I posted on Twitter:

I'm reconsidering my position around need of testers in teams, and as a tester, I am not doing this lightly. I am starting to believe that with agile (and learning cycles), developers get smart enough not to need manual testers around.
— Maaret Pyhäjärvi (@maaretp) March 23, 2020

The post started a lively discussion on where (manual) testers are moving, naming the two directions: quality coaches teaching others to build and test quality and product owners confirming features they commenced.

The Model of One Eight Fraction of a Tester

Taking the concepts I was using to clarify my thinking about different testers, a discussion with Tatu Aalto over a lovely refreshing beverage enjoyed remotely together drew the mental image of a model I could use to explain what we have. With two dimensions of 4x2 boxes, I'm naming the model "One Eight Fraction of a Tester".

1st Data Point

In our team, we have six developers and only one full-time manual tester. I use the word manual very intentionally, to emphasize that they don't read or write code. They are too busy with other work! The other work comes from the 6 super-fast developers (who also test their own things, and do it well!) and 50+ other developers working in the same product ecosystem. Just listing what goes on as changes on a daily basis is a lot of work, let alone seeing those changes in action. Even when you leave all regression testing for automation.

The concern here is that story and release testing both in our context could be intertwined with creating test automation. For level 1 testing to see features with human eyes, that could also happen while creating automation.

Yet as the context goes, it is really easy to find oneself in the wheel, chipping away level 1 story testing "I saw it work, maybe even a few times", story after story, and then repeating pieces of it with releases.

2nd Data Point

A full time exploratory tester in the team, taking a long hard look at where their time goes, is now confessing that the amount of testing they get done is small and the testing is level 1 in nature. The coverage of stories and releases is far from the tester focusing there full time. Instead, where time goes is enabling others in building the right thing incrementally (product owner perspective) and creating space for great testing to happen (quality coach perspective). While they read code, they struggle to find time to write it, and they use code for targeted coaching rather than automating or testing.

The concern here is that no testing is getting done by themselves. Even if they could do deeper story testing, they never practically find the time.

As the context goes, they are in a wheel that they aren't escaping, even if they recognize they are in it.

3rd Data Point

A most valued professional in the team, a spine of most things testing, is the test automation specialist. They find themselves recognizing tests we don't yet have and turning those ideas into code. While they've found, with support of the whole team, particularly developers, time to add to coverage and not only keep things functional, maintenance of tests and coordinating that is a significant chunk of their work. While they automate, they will test the same thing manually. While they run the automation, they watch automation run to spot visual problems programmatic checks are hard to create for. That is their form of "manual testing" - watch it run and focus on things other than what the script does.

The concern here is that all testing is level 1. Well, with the number of stories flying around, even with all groups of developers having someone like this writing executable documentation on expectations, they still have a lot of work as is.

As context goes, they too are in a wheel of their own with their idea of priorities that make sense.

4th Data Point

Automation and Infrastructure is a significant enabler, and it does not stay around any more than any other software unless it is maintained and further developed. The test automation programmer creates and maintains a script here and there, tests a thing here and there, but finds that creating that new functionality we all could benefit from needs someone to volunteer for it. Be it turning manually configured Jenkins to code in a repository, or our most beloved test automation telemetry to deal with the scale, there is work to be done. As frameworks are best being used by many, they make their way to sharing and enabling others too.

The concern here is that no testing gets done with a framework alone. But without a framework, it is also slower and more difficult than it should be. There are always at least three main infrastructure contributions they could make when they can fit one into their schedule, like any developer.

They have a wheel of their own they are spinning and involving everyone in.

Combining the data points

In a team of 10 people, we have 10 testers, because every single developer is a tester. With the four generalizing specializing testers, we cover quite many of the Eights.

The concern here is that we are not always being intentional in how we design this to work; it is more of a product of being lucky with very different people.

The question remains for me: is the "Story Testing lvl 10" as necessary and needed as I would like to believe it is? Is the "Story Testing lvl 1" as unnecessary to separate from automation creation as I believe it is? And how do things change when one is pulled out - who will step up to fill the gaps?

How do you model your team's testing?