Friday, November 22, 2019

35 Years of Exploratory Testing

Exploratory Testing is 35 years old this year. I've celebrated this by organizing two peer conferences to discuss what it is today, and my summary is that it is:

  • more confused than ever
  • different to different people
  • still relevant and important
Cem Kaner described it 12 years ago this way: 

The core of what I pick up from this is emphasis on individual tester without separation of cognitive sequence, optimizing value through opportunity cost awareness and learning supporting test design and execution throughout the work being done.  

I dropped words like project (because in my world of agile continuous delivery, they don't exist), and added words like opportunity cost to emphasize the continuous choice of time on something being time away from something else. I also brought in the concept of cognitive load separation as the idea is to not separate work into roles but build skills and knowledge in the people through doing the work. 

Exploratory Testing after 23 years

Cem's notes are available in the presentation, and I wanted to summarize them here. He identified four areas to describe from his circles:

Areas of agreement
  • Definitions
  • Everyone does it to a degree
  • Approach, not a technique
  • Antithesis to scripting
Areas of controversy
  • Not quicktesting - packaged recipes around particular theory of error; requires domain and application knowledge to do well
  • Not only functional testing - quality beyond functional is of concern to exploring
  • Uses tools - test automation is a tool, but there could also be tools specifically in support of exploratory testing
  • Not only test execution - not a technique but an evolutionary approach to software testing. You can do all things testing in an exploratory and not-exploratory way. 
  • Complex tests requiring prep included - cycle of learning is not in the moment but varies
  • Certifications - don't understand this style of testing and can be worthless or counterproductive for the industry trying to do exploratory testing
Areas of progress
  • Understanding of quicktests - from works of Whittaker, Hendrickson and Bach
  • Oracle problem - thinking around oracles has evolved
  • Learning and cognition in the focus of ET - individual and paired work
  • Multiple guiding models - everyone with their own
Areas of ongoing concern

  • Modeling an area of early understanding - talk of it goes on
  • Myths - making way to understand it is cognitively challenging, skilled and multidisciplinary
  • Tracking and reporting status - dashboards and time-boxed approaches were the whim of the day
  • Individual tester performance - we don't know how to assess that
  • No standard test tool suite - tools guiding thinking in the smart way

Exploratory Testing after 35 years

Previous areas of agreement are not that anymore. 
Some folks have decided that Exploratory Testing is deprecated. 
Other folks are rediscovering Exploratory Testing as the smart way of testing that was relevant 35 years ago and looks different today, when automation is part of how we explore rather than a separate technique. 

We still agree that it is an approach, not a technique, just not on whether it is a necessary separation. Wasteful practices of testing both with and without automation are still popular, and exploratory testing is a cost-aware approach to casting nets to identify quality-related information. 

Previous areas of controversy are old lessons and not particularly controversial. Some of them (like certifications) describe a divided industry that no longer gets the same attention. Most of what was controversial 12 years ago is now part of defining the approach. 

Areas of progress seem far from done from today's perspective. Quicktests give less value with the emergence of automation. Oracles have been a focus of Cem Kaner's teaching for a decade after the previous summary, and we are still working to understand them with respect to automation being closely intertwined in our testing. Paired testing was a whim of the moment as a focus for really understanding the cognitive side of testing, but in addition to paired work, we now have mob testing - work in a group. 

Concerns are still concerns, except for tracking and reporting status. With automation integrated into exploratory testing and continuous delivery, status is not a problem, but the industry is further divided into those doing fast-paced deliveries and those who are not. 

What do we agree on, what are the controversies, what progress can we hope we are still making and what concerns do we have at 35 years of exploratory testing? 

Areas of agreement
  • All testing is exploratory to a degree
  • Exploratory testing is skilled, multidisciplinary and cognitively challenging and finds unknown unknowns
Areas of controversy
  • Is exploratory testing even a necessary concept in the world of continuous development, when the forms of it happening can be labeled as "automation", "pairing", "production monitoring", "learning", and "test automation maintenance"? 
  • The early experts cling to materials and lessons from the early 2000s and fail to connect well with a wider community, creating a tester community isolated from the overall software communities
Areas of progress
  • New voices thinking and sharing from practice first, in agile development delivering continuously: Anne-Marie Charrett, Alex Schladebeck, Maaret Pyhäjärvi and many others are working actively to move the area further
  • Developers are doing exploratory testing, the split to this being tester specialty is giving way to working together and learning together
Areas of concern
  • Lack of shared learning and seeking common understanding. The field feels more like a competition of attribution than learning to do testing well in modern circumstances.
  • Trainings on exploratory testing are hard to select for organizations as they include very different things in a similar-looking box. A field as big as exploratory testing should have a whole series of trainings, not one that introduces the concept again and again. 
  • Testers, in particular those without coding skills, are dropping out of the industry. We are losing people who believe they don't code without helping them see their contributions through collaboration skills. People dropping out are disproportionately women. The old adage of "coding not being trainable" to many testers is harmful. 

Testing Computer Software, 2nd ed

I'm collecting some of the history of Exploratory Testing and for that, going into selected references that brought the idea to me. The first book I read on testing (I'm a newbie, only 22 years in the industry :D ) was
Cem Kaner et al. Testing Computer Software, 2nd ed. published: 1999
For research purposes, I have had my hands on the 1st edition too, which is from 1988 but it's been a while.

For the second edition, I go to the index and look for exploratory testing.

exploratory testing
   - generally 6, 7-11, 215
   - boundary conditions 7-11
   - discover the program's limits 215, 241

Page 6 is in the chapter "The first cycle of testing", at Step 4: Do some testing "on the fly". Here the book speaks of running out of formally planned test cases. It emphasizes that regardless, you can keep testing. It encourages you to think of new things without spending much time preparing or explaining the tests. It encourages trusting instincts and trying anything that feels promising.

It introduces the example in question as something where you switched from formal to informal testing very early on as the program crashed. And then it introduces the concept:
"Rather than gambling away the planning time, try some exploratory tests - what ever comes in mind."
It then pulls a core concept out:
"Always write down what you do and what happens when you run exploratory tests."
It moves on to give examples of how surprising problems, ones you did not anticipate when planning, make your formal test series useless for figuring out what the problems with the software are.

Pages 7-11 go into Step 5: Summarize what you know about the program and its problems. It opens with a table "Further exploratory tests". It then discusses a possible implementation as the reason for the problems testing is finding, explaining ASCII character codes and linking boundaries to them. Finally, it describes a second cycle of testing after fixes and reminds that there will be multiple cycles and that you want to select tests for each cycle based on what the programmer said she changed. The book emphasizes that it is not a mere act of selecting but that you need to come up with new tests too.

Of a total of 16 pages on the first sequence of testing, the book discusses exploratory tests for 5, at a time when the rest of the world was still speaking only of scripted tests.

Page 215 talks about Initial development of test materials. It talks of parallel work on testing and the test plan, saying "you never let one get far ahead of the other". It moves on to talk about the first prompts you should consider: testing against documentation, starting to create a function list and analyzing limits. It never mentions exploratory testing, but describes how a plan is created as per "our approach", which reads as exploratory.

Page 241 is on components of test planning documents and specifically three kinds of matrices: environment matrix, input combination matrix and error message and keyboard matrix. For input combination matrix, it elaborates:
"Our approach is more experiential. We learn a lot about input combinations as we test. We learn about natural combinations of these variables, and about variables that seem totally independent. We also go to the programmers with the list of variables and ask which ones are supposed to be totally independent."
The book was a collaboration between Cem Kaner, Jack Falk and Hung Quoc Nguyen, and as such it does not outline if exploratory testing was coined by one or all of them, but it being published under their names says there was some level of agreement to allow such a concept in the book.

While these are the specific parts marked for "Exploratory testing", the way I read it, the whole book is a description of exploratory testing as it was known then - the smart way of testing with care but attention to costs and nature of testing in product companies that lived or died with wasteful practices.

Thursday, November 21, 2019

Tools Guide Thinking

I have spent a significant chunk of my day today in thinking about how exploratory testing is the approach that ties together all the testing activities even when test automation plays a significant role. Discussing this with my team from every developer to test automation specialists to the no-code-for-me-tester, finding a common chord isn't hard. But explaining it to people who have a different platform for their experiences isn't always easy. Blogging is a great way of rehearsing that explanation. 

I frame my thinking today around the idea I again picked up from Cem Kaner's presentation on Exploratory Testing after 23 years - presented 12 years ago. 
"Tools guide thinking" - Cem Kaner
Back then, Cem discussed tools that would support exploratory thinking, giving examples like mindmaps and atlas.ti. But looking back at that insight today, the tools that guide a rich multi-dimensional thinking can be tools we think of as test automation.

We have a tool we refer to as TA - short-hand for Test Automation. It is more than a set of scripts doing testing, but it is also a set of scripts doing testing. To shortly describe the parts:

  • machinery around spawning virtual environments 
  • job orchestration and remote control of the virtual machines
  • test runners and their extensions for versatile logging
  • layers of scripts to run on the virtual environments
  • execution status, both snapshot and event-based timelining
Having a tool like this guides thinking. 

When we have a testing question we can't see from our existing visualizations, we can go back to event telemetry (both from product and TA) and explore the answers without executing new tests. 

When we want to see something still works, we can check the status from the most recent snapshot automatically made available. 

When we want to explore on top of what the scripts checked, we can monitor the script in real time in the orchestration tooling, seeing what it logs, or remote to the virtual machine it is running on and watch. Or we can stop it from running and do whatever attended testing we need. 

We can explore a risky change seeing what the TA catches and move either back or forward based on what we are learning. 

We can explore a wide selection of virtual environments simultaneously, running TA on a combination we just designed. 

When we want a fresh image to test on without any scripted actions going on, we take a virtual environment that is at hand, ready to run in the 2 seconds it takes to type it into a remote desktop tool. 

It makes sense to me to talk about all of this as exploratory testing, and split it to parts that are by design attended and unattended. A mix of those two extends my exploration reach. 

With every test I attend to either by proactive choice or reactive choice being called in by a color other than blue (or unexpected blue knowing the changes), I learn after every test. I learn about the product and its quality, about the domain and for exploratory testing most importantly, I learn about what more information I want to test for. 

Tool guides my thinking. But this tooling does not limit my thinking, it enables it. It makes me a more powerful explorer, but it does not take away my focus on the attended testing. That is where my head is needed to learn, to take things where they should go. Calling *this* manual is a crude underrepresentation of what we do. 

A Good Week for Exploratory Testing

Exploratory Testing is a 35-year-old approach to testing, created back in its days to describe a style of testing common in Silicon Valley companies that was clearly different from what we saw as mainstream.

In the 35 years of Exploratory Testing, we've grown to understand it better.

We now understand that the core of exploratory testing is the degree to which learning from the test we do now impacts our choice of what we do next.

We understood clearly that the old and traditional way of testing where people design and document tests to be later executed effectively is not what we would call exploratory testing. Exploratory testing is something else.

Where we still struggle is understanding the relationship of test automation and exploratory testing.

Test automation is a process in which we learn. Exploratory testing is a process in which we learn. When we center incremental and iterative, and learning in multiple dimensions, they are the two sides of the same coin.

This was a good week for exploratory testing, because someone many of us follow raised it up by writing about it.
Fowler describes it as:
  • a style of testing that emphasizes a rapid cycle of learning, test design, and test execution
  • avoiding separation of script creation and execution
  • avoiding predetermined expected behavior (being open to more than what we predetermine)
  • seeking to find new behaviors not covered by already defined tests
  • seeking failures defined tests don't catch
  • informal but relying on discipline to do it well
  • something that is good to consider as a task type of its own
  • requiring skill, curiosity and a learning mindset
Sounds like how great test automation is created! But this is exploratory testing? YES!

Fowler concludes his article: 
Even the best automated testing is inherently scripted testing - and that alone is not good enough.
I argue that the best automated testing is inherently exploratory testing. The bad automated testing is inherently scripted testing. Because test automation, when it really works out, is a product of documenting what we are learning in an executable artifact.

I specialize in exploratory testing. While I read and even occasionally write automation code, it is a platform for a great exploratory testing. It extends my reach. Exploratory testing includes test automation.

In testing, we start with a baseline of knowledge and tools. I start my day with a very different baseline than someone new to my organization. Where exploratory and scripted approaches differ is in where our feedback loops are, and what learning is welcomed. A scripted approach learns about testing in the end, rather than continuously.

Saturday, November 16, 2019

From Single Scenario to Feedback in Test Automation

Understanding your system and environment is core to designing a strategy of how you balance unattended testing with test automation against attended testing.

Imagine you just built a new feature that really sums up to a single scenario: your user can say "Do a scan on Wednesdays at 10.21 every second week of the month".

To test that basic positive scenario as attended testing, you kick up a Windows machine you will test on. It just happens to be an out-of-the-box-clean US 64-bit Windows 10 with the latest OS patches all in it. You confirm that indeed there is a scan on Wednesday (as it happened to be today) at 10.21 and that it looks like it is supposed to happen again in a month. Having chosen a realistic scenario, one that a user would definitely be likely to set up, you leave the machine waiting and make a note to test again in one month, and verify there were no surprise runs in the meanwhile.

If you know something about exploratory testing, you would probably look at the basic positive scenario as a starting point, and explore a lot more around it:

  • Explore Day, Time and Cadence
  • Explore type of task, is "Scan" always the same thing, realizing it scans something that could be entirely different
  • Explore plausible (and less plausible) error scenarios around wrong values, missing values, mixed up values, unintended use
  • Explore other things that could happen on the computer simultaneously and could have an impact
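Exploring Day, Time and Cadence starts with writing down what the schedule could even mean. The following is a minimal sketch in Python, not the product's implementation: it assumes "every second week of the month" means the second Wednesday of each month, and the names `nth_weekday` and `next_scan` are made up for this illustration.

```python
from datetime import date, datetime, time, timedelta

WEDNESDAY = 2  # date.weekday() counts Monday as 0


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th given weekday of a month (n starts at 1)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def next_scan(now, weekday=WEDNESDAY, n=2, at=time(10, 21)):
    """Return the first scheduled scan strictly after `now`.

    Assumption: the schedule fires once a month, on the n-th weekday.
    """
    year, month = now.year, now.month
    while True:
        candidate = datetime.combine(nth_weekday(year, month, weekday, n), at)
        if candidate > now:
            return candidate
        # Move on to the next month, rolling over the year when needed.
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Right after the scan on Wednesday 13 November 2019 at 10.21:
print(next_scan(datetime(2019, 11, 13, 10, 21)))  # 2019-12-11 10:21:00
```

Even this small sketch raises questions worth exploring: reading "second week" as a calendar week gives different dates, and asking for a fifth Wednesday can spill into the following month.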
It all works. Are you done?

You are not. It works on your machine. Your machine that is just one kind of machine there could be.

And it worked today. With the version you had now. It will change. 

This is the thinking that drives us to do test automation, or TA as we seem to lovingly call it. 

For every scenario, it gets run on:
  • Versions of Windows for Workstations and  Windows for Servers
  • 64bit and 32bit - same code, different executables that can only be tested with the right bitness. 
  • Mixes of patch level and "computer cleanliness"
Sometimes, it gets run on more. There's a nice list of Windows OS versions that matter in what we are testing. 
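The way a single scenario multiplies across environments can be sketched as a simple cross product. This is an illustration only: the dimension values below are placeholders, not the actual list of environments, and `environment_matrix` and `plan_runs` are hypothetical names.

```python
from itertools import product

# Placeholder values for each dimension; the real lists are longer.
OS_FAMILIES = ["Windows workstation", "Windows server"]
BITNESS = ["64-bit", "32-bit"]
PATCH_LEVELS = ["latest patches", "older patches"]
CLEANLINESS = ["clean image", "long-lived machine"]


def environment_matrix(*dimensions):
    """Return every combination of the given dimensions as a list of tuples."""
    return list(product(*dimensions))


def plan_runs(scenarios, environments):
    """Pair every scenario with every environment it should run on."""
    return [(scenario, env) for scenario in scenarios for env in environments]


environments = environment_matrix(OS_FAMILIES, BITNESS, PATCH_LEVELS, CLEANLINESS)
runs = plan_runs(["scheduled scan"], environments)
print(len(environments), "environments,", len(runs), "runs for one scenario")
```

With only two values per dimension, one scenario already turns into 16 runs, and every added OS version multiplies that further.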

No human can attend to all that. 

For our 550 scenarios, we ended up with 178745 tests run on the one day that was busiest last week. A very small percentage of them fail, but the failures are usually crash dumps related to timing that are super hard to reveal by human repetition, or information on changes and their side effects.

This takes us from 'works on my machines' to 'works on some thousands of our machines'. Yet it does not mean that things work in production in full. 

The basic scenario gets tested on "my machine" as I implement the automation. The exploration that must happen to script a scenario qualifies as testing. 

The other dimensions of attended exploratory testing are relevant. But so is the tooling to enable unattended exploratory testing, one that covers new environments. The tooling that aids us calls us when we need to attend to it. 

Yesterday, it was 0.9% of our tests that called us to attend to them. And knowing these dimensions we work in, I was very happy to see the new telemetry give us the list of priorities of what would make most impact if we attended to it. 
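A prioritized list like this could fall out of telemetry in many ways. Here is a minimal sketch, assuming each run that needs attention is recorded with a failure signature; the record shape, the sample data and the function names are hypothetical, not the actual TA telemetry.

```python
from collections import Counter


def attention_rate(total_runs, runs_needing_attention):
    """Share of test runs that called for a human, as a percentage."""
    return 100.0 * runs_needing_attention / total_runs


def prioritize(failures):
    """Rank failure signatures by how many runs they affect, most first."""
    counts = Counter(signature for _scenario, _environment, signature in failures)
    return counts.most_common()


# Made-up records: (scenario, environment, failure signature)
failures = [
    ("scheduled scan", "server, 64-bit", "crash dump at startup"),
    ("scheduled scan", "workstation, 32-bit", "crash dump at startup"),
    ("manual scan", "workstation, 64-bit", "timeout waiting for UI"),
    ("scheduled scan", "workstation, 64-bit", "crash dump at startup"),
]

for signature, count in prioritize(failures):
    print(count, signature)
```

Grouping by signature rather than by individual test is one design choice among many: it points attention at the problem that shows up in the most places first.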

And at the same time, I was still doing very traditional exploratory testing to find problems that automation was not the best fit for. 

Thursday, November 14, 2019

Seasonal Fluctuation

It's that time of the year in Finland where days are feeling shorter and shorter, with the dark period taking up more than half of the day. The lesser amount of light is bound to have its impact: it's dark when we go to work, and it is dark when we get home from work.

Just like people's moods have seasonal fluctuation in response to the visible external impacts, a phenomenon very similar happens for our testing at work.

The easy season to spot for a lot of us in software is ahead of us, with Black Friday starting the Christmas shopping season and webshops all around the world under heavy load from eager shoppers. We just had a lovely discussion Lisa Crispin from MABL hosted online to cover that one.

But let's think of the visible external impacts a little more. There's more to seasonal fluctuation than the day of heavy load e-commerce sites prep for, and knowing your company's and business's annual clock gives you insight into priorities that can really make you shine in your choices of how you test right now.

Budgeting and End of Year

I think we've all received the emails towards the year closing that we need to remember to invoice whatever expenses we have. Budgeting and end of year are such a common cycle in companies that it is hard to miss.

You want to ask for a raise, your odds of getting that are very much tied to the annual cycle.

You want to go to a conference, and you could notice a pattern where it's either easy at the end of the year ("we have to use our budgets, or we won't get one next year!") or next to impossible before the year changes.

The computer systems have a lot of logic around the year changing, and the budgeting seasons are a specific feature of financial systems of many sorts. When on a normal month we run some batch processing sets, the end of year can be a very special one and we only get to rehearse it once a year, and with little rehearsal, there is often a surprise ahead.

Christmas Freeze

A seasonal feature I have seen a lot is driven by a company's IT and operations department, scheduling annually a freeze time when changes to production don't happen. For other companies, the same period seems to be the time of taking the huge new systems into production, while everything else around is not allowed to change so that granularity for identifying problems is better.

The time just before Christmas freeze is often critical. Make sure you're risk averse at that time, because fixing is slower and harder. It starts with the assumption it should not happen unless it must happen.

The Events

You might have noticed that the plans of the company you work for have a spike at some time of the year, and especially the new-cool-feature is something that creates a special kind of buzz at a certain time.

The big event your company must show new stuff at is bound to have a spike in what is expected of you. The requests of "just one more thing" come from many directions. The emphasis of how important this is eats up a significant chunk of focus. For those of us with a bit of nerves, the seasonal spike promises we must watch out for symptoms of burnout, and create those personal ways of managing the stress the managers of all levels popping in to check progress cause.

The Two Spikes A Year Principle

A more subtle fluctuation I have noticed is that a lot of companies find it necessary to tell something big and important to customers two times a year. If you recognize those times, there's relevant goodwill to collect by paying attention to these spikes and ensuring all the changes we deliver in an agile fashion sum up to a story worth telling. That story also gives food for thinking about tests we might not have considered in the continuous flow of changes, and a timely heads up can be greatly appreciated.

The Human Tendency to Seek Achievements

Two more spikes I see annually come from categorizing people into two boxes. People on the management side bring out a spike around October, ensuring all the dreams and wishes they have been planning for the whole year get done before the year ends. This is a time when I see a lot of plans, meetings, workshops and the like. For people on the doing side, the spike comes earlier, right after summer vacations, with fully refreshed minds tackling problems they now have ideas and energy to improve on.

Removing fluctuation

Over the years, agile and continuous delivery have changed a lot of the seasonal fluctuation to feel less intense, and I welcome the less stressed, continuously flowing value approach that can come with it. But the fluctuation is there, and if we notice it, and are able to react appropriately, it can really amplify the impact we have as individuals, and the perceptions of how helpful and useful other people around us consider us.

What is your annual cycle? How about chatting about that the next time you sit down for a cup of coffee at work? The powers that impact us are good to know. 

Tuesday, November 12, 2019

A Feature Was Born

There is a fascinating phenomenon that I am following, one I would call "Overanalyze, Underdeliver". I find it fascinating because I catch myself often wanting to slow things down to think, not trusting the others on their thinking and assuming delivering something could be the end of it.

When delivering truly continuously, there is no beginning, and no end. There are just steps on our journey to build something our users find more valuable, rather than less.

There is an evolving product vision we work against. It wasn't defined by product management, but it is most certainly influenced by them channeling different stakeholders - with their particular kind of filter. It is emerging from discussions with many different parties, including customers we actively seek out.

From this foundation, a single developer can have a great idea of how to make things better.

In the last week, I have been following a feature being born and the discussions and actions we take around it.

Awareness of such a feature was born two years ago, and the wishful thinking of it was cut down before it bloomed. It needed people in a particular team to have time for it and they had higher priority work.

In two years, things change. Not that the particular team would have time, they don't. But we took our internal open source practices to the next level, where we don't only share components in our main programming language, but bravely go polyglot-one-more and, with the right motivation, can make changes beyond our previous scope.

So it bloomed again. The "we need to think this through" meaning "I can't think this through right now" came about again. But this time instead of spending time on thinking it through in an abstract way, a developer molded the thing in code.

Today came the time to think it through - demoing, testing and improving the feature as a group. Tomorrow is the time to get it in, in a pull request.

A feature was born. It was born in a time where, choosing the discussion route, we would still be discussing. What fascinates me most is how much power there is in breaking off the defaults and reorganizing the flow.

We probably saw the earliest exploratory testing & fixing we have been capable of so far - before ever making that pull request.

An early Christmas for a Testing Dreamer. And just a happy day for a process rebel.

Monday, November 11, 2019

What the Testing We Do Looks Like?

Back in the days before releasing frequently was a thing, testing looked very different. The main difference was that we thought of automation as something that was replacing attended testing, whereas we now see it more as a way of introducing scale of unattended testing we could never have done attended.

The stuff we attend to changed as the world around us shifted. We still do "testing" with just as many hours, but the work is split on working heavily on unit tests, test automation and other quality practices.

I try to explain the change I see with a metaphor: pool is not a bigger bathtub. Both these are containers of water (just like software development is just making and delivering code changes), but what you can do in a pool is very different than what you can do in a bathtub. Imagine a bath guard - kind of hilarious. Or a bath party - very eccentric. We create new things we can do with just a bigger container of water, and bringing down the release cycle does something very similar to the ways we work. The whole conceptual model of what makes sense changes.

In efforts to describe what seems to make sense right now, I work on finding words to describe what testing I do looks like to me. It is not a tool for sense making the whole world of all testing, and I don't need a tool for that - I don't work in the whole world, I don't consult in the whole world. I merely describe my lessons from where I am for the organization that I work for but also in my understanding of how the world comes together.

For me, it all is customer-obsessed and developer-centric. We all care about where the money comes from for our business. And we recognize and appreciate that we have something out there that people are already paying for that we want to always improve and never make worse. But we already have something valuable.

Smart Developers Turn Ideas Into Code

Since it is already out there, I can't describe a phase of requirements gathering, but rather it begins with a vision of value. There is always, even if informally, a mostly shared base of understanding of the types of things we are providing for the customers. I recognize vision not from a document someone wrote, but from discussions with multiple members of the community bringing perspectives driving us in a similar direction. Similar, not same, as vision is in its most powerful state when it guides individual actions without excessive coordination.

From having something out there working for the customers and vision, we come to a set of ideas on what we could try to make things better. This part of the process of coming up and acting on the ideas, turning them into code, is where the change gets made.

Some ideas are bigger, and they are hard to grasp alone. Smart people are not alone, but surrounded by other smart people. Together, we can make ideas better, and as such, the resulting code better. Or we can discover the ideas in the first place.

Let me share a small example of how I can reasonably expect my days to be from today. 
I had a Windows machine I had kicked up from an image last week, forgetting that the images in that particular set give me a fairly old version of Windows, one where our .NET dependency for showing a full UI stops me from seeing the UI that I wanted to test. I ran a Windows update tool on Friday and, since that takes its time, did something else and went back to the computer today. As I remoted to the computer, I saw the IP but did not remember the name of the computer. There is one very simple way for me to find it: logging into our security management portal and seeing it there. So I did.  
I found the computer and got the things I needed, but also noticed we had started showing both IPv4 and IPv6 addresses. Having been elsewhere, I had not followed that detail, but just looking at it I said that I didn't like a detail in how they were ordered. Five minutes after I casually mentioned this, I had a pull request to review that fixed the problem. We added a bit of discussion around what I would test around more network interfaces, something that was hard to simulate, and another pull request with an additional unit test was created. 
My tester-contribution can happen anywhere, anytime, without constraint of a process. It will happen from a foundation of me (the tool) being in the right mind in the right place to connect things. And I cultivate the chances of that lucky accident through discussions but also, hands-on with my external imagination - the product - either directly or through a set of scripts known as test automation (TA). 
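To give a feel for how small a fix and unit test like the one in the example above can be, here is a sketch of ordering a machine's addresses. It is not the actual change: which order was preferred is not stated, so the rule here (IPv4 before IPv6, each group in numeric order) is an assumption, and `order_addresses` is a made-up name.

```python
import ipaddress


def order_addresses(addresses):
    """Sort address strings: IPv4 first, then IPv6, numerically within each."""
    parsed = [ipaddress.ip_address(text) for text in addresses]
    # Sorting on (version, integer value) avoids comparing IPv4 with IPv6
    # objects directly, and avoids the pitfalls of sorting as plain strings.
    parsed.sort(key=lambda ip: (ip.version, int(ip)))
    return [str(ip) for ip in parsed]


print(order_addresses(["fe80::1", "10.0.0.12", "9.0.0.1", "2001:db8::2"]))
```

A unit test with a couple of interfaces, each carrying both address families, covers the case that is hard to simulate on a real machine.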

On doing my work, I rely on an existing structure that we have built and are enhancing as we learn what we are missing:
  • A Smart Developer creates code to change the application 
  • Unit tests on local machine to 80-90% coverage 
  • Pull Request Review - a minimum of second pair of eyes on change
  • Static tools of various flavors in the pipeline
  • Unit tests in the pipeline
  • TA in the pipeline (with TA telemetry)
  • CI environment application telemetry
  • Change-log driven exploratory testing
  • Changes out in other product line continuous beta
  • Changes out in internal pilot
  • Changes out in early access
  • Synthetic monitoring aka. production TA (or machines, in production, continuously monitored)
  • Production Telemetry, positive events and error events 
A line like "TA in the pipeline" is more than a few scripts being run, and I will dedicate a post of its own to it later.

With every change, we try to leave things better than they were before. Caring developers pulling in the help they need, and making the help readily available is what testing looks like to me. Testers are developers. We change things, but we also change the other developers' perceptions.

We see things other people don't. 

Saturday, November 9, 2019

A Career Retrospective

With a few decades in the software industry, I have something I would call a career. I could not have seen in advance how my career would unfold, and I could never have described what I do as a path I wanted to take.

A few heuristics have served me well:
  • Do something you enjoy (and some of it should bring money in)
  • Always be learning. 
  • Have a goal, just to recognize whether what you enjoy takes you there. If not, change the goal. 
For some years, I had a goal of becoming an international keynote speaker and I made a lot of my choices around that goal. I chose jobs that built me a platform of experiences to speak on, doing things hands on with product development rather than consulting. And I became a keynote speaker who wants to quit speaking and replace her contribution of 20-30 talks a year with 20-30 new speakers who start from a platform of mentoring.

My current goal goes under the working name "Maaret to Wikipedia" and if you have ideas on how to do that, I am open to suggestions. Currently it feels funny, self-indulgent and next to impossible to see the route, and it is already making an impact on how I prioritize - within things I enjoy - the things I end up doing. The best thing about this goal is how supported I feel for it at work with my close colleagues and how it helps me see some people who always lift me up when I need it (Marit van Dijk, I'm thinking of you).

A lot of the work I do is invisible.

I help people who are speakers to get their messages out better. Sometimes giving people the transformative insight into what they are speaking on takes me literally a few minutes, and the people seeing their own experience in a different light completely miss the shine I added. They would not be as well off had they not run into me - usually very intentionally on my part.

At work I facilitate a developer-centric way of working in a way that mixes holding space for good things, injecting good things and leading by doing some of the things. The work I do leaves behind very few pull requests, but many things others do shine a little better because of the stuff I do.

I change jobs fairly often, leaving behind people who, I would hope, know something more because our paths have crossed. The companies' interest is not in lifting their people up, but rather in abstracting us under a brand.

I write blogs, articles and all kinds of texts. It changes what some people do. Yet when they do it, doing it is their own success.

I build talks of my experiences, I show up to share, and I discuss with people. Meeting people gives me a lot of energy, new ideas and drive.

I'm not exactly a one-trick pony within software - but software all in all is my trick. My interests are manifold. I speak and write about exploratory testing, test automation, teaching programming, mob & pair programming, agile, management, self-development, conference organizing, speaking, diversity, and any observations around software that I feel like. I'm usually known for things I do on the side, rather than the things I focus on.

What I'm particularly proud of is my ability to re-invent myself and see my belief systems shattered - with my own initiative. Listing things that I believed to be true that aren't so is one of my favorite pastimes.

A Vague Timeline

In recent reflections, I have come to appreciate how large chunks of the work I have done during my career have been left to oblivion, because of how things are and because of my personal choices of not sticking with them. They've all given me a platform to observe things from. They also bring out feelings of wishing someone had taught my younger self some of the things I now know. But I also recognize that my younger self did what she could with the conditions she was under, and every experience I have had has made me the person I am today. Hindsight bias makes us feel like we could have known things, and if there is one thing exploratory testing really enforces in learning, it is that missing things is the reality and outcomes are unpredictable.


  • Describing test automation at work as a baseline for returning to research - on applying AI in testing, and applying testing in AI-based systems
  • Building a self-organized developer-centric team with modern agile practices that have enough structure for the powerless
  • Writing further my books on Mob Programming, Strong-Style Pair Programming and Exploratory Testing
  • Organizing a conference as experimentation platform to change the world of conferences
  • Helping aspiring speakers by finding them mentors with SpeakEasy (or mentoring them myself)
Before, each step going sort of backwards in time in a way that makes sense to me:
  • Becoming an expert in exploratory testing. I've done this all my career, and it is the one thing that has been my continued focus. 
  • Becoming an expert in engineering management. I did not realize I had been learning this in my test manager role before. A few decades of reading every book on the topic to manage up effectively as a tester did help. 
  • Becoming an expert in test automation. Moving it from none to some, and from some to better. Knowing well what better looks like. 
  • Speaking in conferences, meetups and delivering training sessions that total 399 sessions. 
  • Discussing (and improving) conference proposals in 15 minute time-slots over three years with about 500 people and discovering a process I call "Call for Collaboration". 
  • Popularizing "Testers don't break your code, they break your illusions about the code" by speaking about it, elaborating it with samples from my professional life, beyond testing conferences. The guy who said it did not do the work I did around it. Google for evidence and stop assuming my work belongs to him. 
  • Introducing frequent product releases where it was "impossible" as release updates computers in the millions. 
  • Introducing daily product releases where it was "impossible" as there was no test automation. 
  • Organizing 5 years of European Testing Conference to learn how (if) conferences should pay the speakers, to create a true networking conference and to bring together developers and testers on a shared testing agenda.
  • Becoming an expert in pair and mob testing (and programming). 
  • Teaching programming (in Java) to women over 30 and kids with the Intentional method using pair and mob programming as core instruments in teaching.
  • Teaching Software Testing at Aalto University of Applied Sciences / Helsinki University of Technology both as main lecturer but also as visiting industry speaker
  • Doing my first keynote to only be known as the woman the other keynoter spent their keynote bashing "out of respect and surprise how alike we think". 
  • Building and teaching a 22-day on-site Testing training program to enable unemployed career changers into the industry. Delivering a second iteration as independent trainer. 
  • Running Finnish Association for Software Testing for a decade and letting it wither away as a man was rewarded and thanked for starting the thing. Starting Software Testing Finland (Ohjelmistotestaus ry) to start over, only to realize that there was no correcting as any communities around the topic in Finland are intertwined in people's minds. 
  • Becoming an expert in complex test environments. If you ever feel like talking about the kinds of environments that cost a million and take a minimum of 6 months to deliver, then we have similar experiences. 
  • Becoming an expert in defect management and bug advocacy. Analyzing a large set of defect management tools in order to select one against requirements gathered in a fairly large organization. 
  • Becoming an expert in acceptance testing. I know how to give domain experts who are clueless on testing just enough structure to excel and not waste effort, and how to impact the quality at the start of acceptance testing through contracts and collaboration. I spent some years intensively learning it. 
  • Becoming an expert in test management. Running multi-million projects as test manager, but also running smaller ones. I did this for different companies to get the crux of it.
  • Becoming an expert in software contract quality and testing-related aspects. If you ever want to spend a few hours on discussing how badly contractors can behave and how you recognize loopholes in contracts around this, I'm your person. 
  • Becoming an expert in software processes leading up to agile. When Alistair Cockburn asked who had read his work on Crystal, there were not many others in the room that had. Research gives you chances to read and think deeply about what others are saying. 
  • Becoming an expert in benchmarking with the TPI-model. Analyzing 25 Finnish companies with TPI-model and doing a benchmark on state of testing in Finland. I can still speak on the details because I did the work even if the company kept me in the background. 
  • Doing my first talk on the topic of Extreme Programming in 2001. 
  • Researching (and publishing) on software product development, and (exploratory) testing
  • Becoming an expert in localization testing. I spent years running localization testing projects and doing it myself and learning everything I could read on then and since on how localization testing works. 

Even if I have my "Maaret to Wikipedia" project, it serves more as a way of thinking through what there is that I could even do. At the end of the day, I go back to my heuristics: do what you enjoy, and always be learning. Goals move, but appreciation of learning with great people remains. 

Rethinking Test Automation - From Radiators to Telemetry

Introducing Product Telemetry

A week after we started our "No Product Owner" experiment a few years back, the developers, each now playing their part as product owner, decided they were no longer comfortable making product decisions on hunches. In a now-common, no-hassle way, they made a few pull requests to change how things were, and our product started sending telemetry data on its use.

As is so often the case, things in the background were a little more complex. There was another product doing the pioneer work on what kind of events to send and on sending them, so we could ride on their lessons learned and, to a large extent, their implementation. The thing I have learned to appreciate most in hindsight is the pioneer work they did on creating us an approach to care for privacy and consent as key design principles. I've come to appreciate it only through other players asking us how we do it.

The data-driven ways took hold of us, and transformed the ways we built some of the features. It showed us that what our support folks know and what our real customers know can be very far apart, and we as a devops team could change the customer reality without support in the middle.

The concept of Telemetry was a central one. It is a feature of the product that enables us to extend other features so that they send us event information about their use.

At first, product telemetry was telling us about positive events. Someone wanted to use our new feature, yay! From the positive, we could also deduce the negative: we created this awesome feature and this week only a handful of people used it, what are we not getting here? We learned that based on those events, we did not need to ask all questions beforehand, but we could go back exploring the data to learn patterns that confirmed or rejected our ideas.

We soon came to the conclusion that events about error scenarios would also tell us a lot, and experimented with building abilities to fix things so that the users wouldn't have to do the work of complaining.
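As a rough illustration of the idea, and not our actual implementation, event telemetry with consent as a design principle could be sketched like this. The class names, the event names and the `kind` values are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TelemetryEvent:
    name: str  # hypothetical event name, e.g. "new_feature_opened"
    kind: str  # "positive" or "error"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Telemetry:
    """Records events only when the user has consented (assumed design)."""

    def __init__(self, consent_given):
        self.consent_given = consent_given
        self.events = []

    def send(self, name, kind="positive"):
        if not self.consent_given:
            return False  # no consent: nothing is recorded
        self.events.append(TelemetryEvent(name, kind))
        return True

    def counts(self, kind):
        """How many times each event of the given kind was seen."""
        return Counter(e.name for e in self.events if e.kind == kind)


telemetry = Telemetry(consent_given=True)
telemetry.send("new_feature_opened")
telemetry.send("new_feature_opened")
telemetry.send("update_failed", kind="error")
print(telemetry.counts("positive"))
print(telemetry.counts("error"))
```

The counts are the starting point for the exploration described above: a feature with a handful of positive events in a week is a question to ask, and an error event is a complaint the user did not have to write.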

This was all new to us and as such cool, but it is not like we invented this. We just did what the giants did before us, adapting it to ensure it fits the ideas of how we are working with our customers.

We Could Do This in CI!

As telemetry was a product feature, we tested it as a feature, but did not at first realize that it could have other dimensions. It took us a while to realize that if we collected the same product telemetry from our CI (testing) environment as we did in production, it would not tell us about our customers but it would tell us about our testing.

As we did that, we learned that the way we test (with automation in particular), at the scale we do it, creates fascinating coverage patterns. There were events that would never be triggered. There was a profile of events that was very different to that of production. A whole new layer of coverage discussions was available.
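The two observations above, events never triggered in test and a different event profile, can be sketched as a small comparison. This is an illustration under assumptions: the function names and the sample event names are made up, and real telemetry would come from a data store rather than lists.

```python
from collections import Counter


def event_profile(events):
    """Share of each event name among all events, as a fraction of 1."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def never_triggered(production_events, ci_events):
    """Event names seen in production that the CI environment never sent."""
    return sorted(set(production_events) - set(ci_events))


def profile_difference(production_events, ci_events):
    """CI share minus production share for every event name seen anywhere."""
    production = event_profile(production_events)
    ci = event_profile(ci_events)
    names = set(production) | set(ci)
    return {name: ci.get(name, 0.0) - production.get(name, 0.0) for name in names}


# Hypothetical sample data.
production = ["scan_started"] * 6 + ["settings_opened"] * 3 + ["license_renewed"]
ci = ["scan_started"] * 9 + ["settings_opened"]

print(never_triggered(production, ci))
print(profile_difference(production, ci))
```

A positive difference means the tests exercise something more than customers do, a negative one means less, and anything in the never-triggered list is a coverage discussion waiting to happen.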

This was a different use of the same product feature in test than in production.

The Test Automation Frustration

To test the product we are creating, we have loads of unit tests to do a lot of heavy lifting on giving feedback on mistakes we may make when changing things. As useful as unit tests are, we still need the other kinds of testing, and we bundle this all together in a system we lovingly call TA. As you may imagine, TA is shorthand for Test Automation, but I rarely hear the long form at work, while TA is all around.

"We need to change TA for this."
"We need to add this to TA."
"TA is not blue. Let's look at it."

TA for us is a fairly complex system, and I'm not trying to explain it all today. Just to give some keywords of it: Python3, Nosetest, DVMPS/KVM, Jenkins, and Radiators.

Radiator is something you can expect to see in every team room. The ones we're using were built by some consultants back in the days when this whole thing was new, and I have only recently seen modernized versions someone else built in some of the teams. It's a visual into all of the TA jobs we have and a core part of TA as such.

The Radiator builds on a core principle of how we would want to do things. We would want it to be blue. As you see from the image of its state yesterday as I was leaving the office, it isn't.

When a box in that view is not blue, you know a Jenkins job is failing. You can click on the job, and check the results. Effectively you read a log telling you what failed.

A lot of times what failed is that some part the TA relies on in its infrastructure was overloaded. "Please work on the infrastructure, or try again later."

A lot of times what failed is that while we test our functionalities, they rely on others. They may be unavailable or broken. Effectively we do acceptance testing of other folks' changes in the system context.

Some people love this. I love it with huge reservations, meaning I complain about it. A lot. It frustrates me.

It turns me into someone who either ignores a red or risks overlapping work. It requires a secretary that communicates for it. It begs people to ignore it unless reminded. It casts a wide net with poor granularity. It creates silent maintenance work where someone is continuously turning it back blue, which hides the problems and does not enable us to fix the system that creates the pain.

I admire the few people we have that open a box and routinely figure out what the problem was. I just wish it already said the problem.

And as I get to complaining about the few people, I get to complain about the logs. They are not visitor friendly. I don't even want to get started on how hard it is for people to tell me what tests we have for X (I ask to share my pain) or for me to read that code (which I do). And logs reflect the code.

From Radiator to Telemetry

A month ago, I was facilitating a session to figure out how to improve what we have now in TA. My list of gripes is long, but I do recognize that what we do is great, lovely, wonderful and all that. It just can be better.

The TA we have:

  • spawns 14 000 Windows virtual machines a day (older number, I am in the process of checking a newer one)
  • serves three teams, where my team is just one 
  • runs 550 unique tests for my team across a number of Windows flavors on pull request
  • tests all the 15 products we are delivering from my team
  • runs 100 000 - 150 000 tests a day for my team
  • finds crashes and automatically analyzes them
  • finds regression bugs in important flows
  • enables us to not die of boredom repeating the same tests over and over again
  • allows us to add new OS support and new products very efficiently

The meeting concluded it was time for us to introduce telemetry to TA - and some of the numbers above on the unique tests and number of runs daily are our first results of that telemetry in action.

Just as with the product, we changed the TA product to include a feature that allows us to send event telemetry. 

We see things like passes and fails now in the context of the large numbers, instead of the latest results within a box on the radiator. 

We see results from multiple radiator boxes combined into the reason that we previously needed to verify from the logs. 

We see which tests take long. We see which tests pass and which fail. 
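To show what kind of questions such events can answer, here is a minimal sketch, not our TA code. The event fields (`test`, `job`, `passed`, `seconds`, `reason`) and the sample values are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunEvent:
    test: str       # test name
    job: str        # the Jenkins job (radiator box) it ran in
    passed: bool
    seconds: float
    reason: str = ""  # failure reason, empty when the run passed


def summarize(events):
    """Per test: number of runs, fail rate and average duration."""
    totals = defaultdict(lambda: {"runs": 0, "fails": 0, "seconds": 0.0})
    for event in events:
        entry = totals[event.test]
        entry["runs"] += 1
        entry["fails"] += 0 if event.passed else 1
        entry["seconds"] += event.seconds
    return {
        test: {
            "runs": entry["runs"],
            "fail_rate": entry["fails"] / entry["runs"],
            "avg_seconds": entry["seconds"] / entry["runs"],
        }
        for test, entry in totals.items()
    }


def failing_jobs_by_reason(events):
    """Group failing jobs under a shared reason, across radiator boxes."""
    grouped = defaultdict(set)
    for event in events:
        if not event.passed:
            grouped[event.reason].add(event.job)
    return {reason: sorted(jobs) for reason, jobs in grouped.items()}


# Hypothetical sample events.
events = [
    RunEvent("install", "win10-pr", True, 40.0),
    RunEvent("install", "win7-pr", False, 120.0, "infrastructure overloaded"),
    RunEvent("update", "win81-pr", False, 90.0, "infrastructure overloaded"),
    RunEvent("update", "win10-pr", True, 30.0),
]
print(summarize(events))
print(failing_jobs_by_reason(events))
```

Grouping by reason is the part the radiator could not do: two red boxes with the same cause show up as one thing to fix instead of two logs to read.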

And we have only gotten started.

The historic date of the feature going live was Thursday this week. I'm immensely proud of my colleague Tatu Aalto for driving through the code changes to make it possible, and of the tweets where he corrects my optimism by warning that he had a few bugs he has already fixed. I'm delighted that my colleague Oleg Fedorov got us to see a solution through seeing things. And I can't wait to see what we make out of it. 

Monday, November 4, 2019

A meeting culture transformation

As I was looking into mob programming some years back, we summarized a common theme of complaints into a little cartoon with people discussing in a meeting room.
Person 1:
My team is interested in trying Mob Programming.
The idea is everyone works together on one computer.
The person at the keyboard is just typing what the whole team tells them to. So everyone is involved, instead of 5 people watching 1 person work.
You rotate quickly, every 5 minutes, to develop cross-functional teams and eliminate knowledge silos.
Ideas get implemented the best way the team can no matter who has them.
Misunderstandings and bugs are minimized.

Person 2:
Sounds like I'd be paying 5 people to do 1 job.
Now let's stop talking such nonsense. I still have a lot of slides to go through.

The word around is that managers hate mob programming. As a manager who wants my team to do mob programming while the team refuses, I think we love blaming managers for our own assumptions we did not keep in check.

Up until this morning when I came to the office, I was discussing how Mob Programming is different from a meeting. What changed this morning is that a colleague read my latest Mob Programming Guidebook and pointed out that while we don't really do full-on mob programming, we have managed to transform our meetings into little mob sessions.

It's funny how you need someone else's eyes to see how you're different.

For the last three years here, I have not gone to a single meeting with slides prepared.
I don't go unprepared. But I never ever write an agenda in advance.
When I start a meeting, we build an agenda. It might be that we actively take time to build it. Or it might be that we build it by parking themes that pop up that are relevant but not about the thing we are trying to sort out right now.
We work the agenda within a timebox either by doing the most important work first, or by doing just enough of it that the rest can happen offline, outside the meeting without others losing context completely.

As my colleague points out: all our meetings are little mob sessions. How about yours?

Sunday, November 3, 2019

Mobbing with an Audience

I've run some hundreds of mob programming and testing sessions with new groups for purposes of conference talks and trainings, and while I prefer setting up a full day session so that I can mob with the whole group of 25 people, sometimes I end up splitting the group for demo purposes. I was writing about this for the new version of Mob Programming Guidebook, and thought it might make useful content just as a blog post. 

Mob programming with an audience is a special setup that is a useful tool, especially for someone teaching mob programming, teaching any software development skills hands-on in a way that makes new kinds of sessions available for conferences, or generally running demo sessions with partial participant involvement. As conference speakers and trainers, a lot of our mob programming experience comes from facilitating mob programming sessions with various groups. For a training, we usually set up the whole group into a mob where everyone rotates. For conference sessions where time constraints limit participant numbers for effective mobbing, we use mobbing with an audience.


For mobbing with an audience, you split the room into two groups:
  • The Mob. The most effective mob made of complete strangers is small. You want to have a diverse set of mob programmers. These are the people doing the work. 
  • The Audience. The rest of the group sits in rows as the audience. The role of the audience is to watch and make observations, and their participation is welcome when doing a retrospective.
For the mob, you will set up a basic mob setup in the front of the room with chairs for each person and the whiteboard furthest away from the computer, so that the physical setup ensures the designated navigator speaks at sufficient volume.

For this setup, you will need a room with chairs that are freely moving. Make sure text on the screen is big enough not only for the mob to see, but the audience to follow as well.


As we have run some hundreds of sessions with various groups in this format, we have had things go wrong in many ways.

Things you can do in advance to ensure fewer problems
  • If the room is big, ask for a microphone for both the driver and designated navigator. It is essential that people in the room can hear their dialog. While there are no decisions allowed on the driver seat, speaking back to the navigators pointing out things you see and they don’t is often necessary. 
  • If you have only one microphone, give that to the designated navigator. Even in smaller rooms, the microphone can work as a talking stick the designated navigator passes around for other navigators and can help create an atmosphere where everyone in the mob gets to contribute. 
  • Make sure the text on the screen is visible from the back row. Avoid dark theme, it does not serve you well for live coding and testing in front of an audience. 
  • When selecting the diverse mob, what you need to do for this depends on who you are. If you are a white man facilitator and want women, start with inviting women or facilitate mob member selection in a way that gives you a diverse set of mob programmers. As a white woman, women volunteer for me in ways they don’t for the men and I need to work on other aspects of diversity. 
  • For a demo mob, you may want to demo a group with experience working on the problem and even together. If that is your aim, invite the people you want for the mob in advance. 
  • A new mob with different experiences highlights many powerful lessons around collaboration and people helping each other, and setting up a fluent demo is probably only rarely your goal. The new programmers exclaiming “they now know how to do TDD” as equal contributors is a powerful teaching tool. 
Things you can do while mobbing to improve the experience
  • Encourage people in the audience who want to navigate from the audience to join the mob. To be more exact, demand that they either join or hold on to their perspective, as navigating from the audience can be very disruptive. 
  • If you want to introduce who is in the mob, you can do that on first round of rotation. If you want deeper introduction, you can have a different question to tell about themselves on each round of rotation. 
  • When people rotate, ask them to tell what they continue on. It helps to enforce the "yes, and" rule and is sometimes necessary when nervous participants have been building their private plan waiting for the hot seat. 
  • When the group is stuck, ask questions. “Does it compile?”, “What should you do next?”, “Did you run the tests?”, “What are your options now?”. Your goal is not to do things for them but get them to see what they could be doing. 
  • When the group is stuck in not knowing how to do a thing, say “Let me step in to navigate” and model how to do a thing for a short timeframe. Expect the group to do that themselves the next time. 
Things you can do in retrospective to save a messy session
  • Facilitate the retrospective towards discussing the reasons for the lack of progress and what we could learn from them
  • Introduce theories or ideas of how you could try doing things differently the next time. 
  • Find your own style of facilitating groups of strangers. Having seen multiple people facilitate, there are style differences where one person’s approach would feel off on another. Strong-handed “supporting progress” and light-handed “enabling discovery” will result in sessions that are different. 

Saturday, November 2, 2019

Never Stop Learning

I have full-time work that I enjoy, and I very carefully review my own satisfaction with the impact I have at work. I require of myself a balance of being productive and generative. Not one over the other, but a balance of these two.

I'm being productive when I:

  • strategize testing and communicate strategies so that we are better aware of problems I will be looking for
  • test (possibly documenting as test automation) to add to coverage of what might work but particularly identify things that did not
  • have the gazillion discussions leading over time to a process improvement or someone else's raise
  • fix problems, be it in the program or in the way people interact
I'm being generative when I: 
  • teach others how they do better testing when I am not around to do it
  • lead people into insights that make them do things in a way that is more productive
  • bring in ideas that inspire me and through me, us overall
The way I control my work weeks is that I try to be mindful doing things that are directly for my employer the 40 hours a week, and then have 'hobbies' that resemble work but are fully my choice, my control - even though these activities benefit my employer too. 

Realistically, I cannot split work and fun. Work is fun. So I manage my own expectations of what I do, and try being mindful of the work-life balance when the lines are blurred by my own choices.

Doing stuff that resembles work and could be work 140% is a better framing. On top of that there's family, friends and stuff that does not resemble work. Writing a blog post on a Saturday resembles work. 

I do this because my interests are divided. While I love the impact we are building for at work that I have defined as my purpose (while there, for now), I also love making a dent in the world outside helping new speakers get started, building my own talks, writing articles beyond what can fit in my work day frame. 

In theory, I could be giving more for the purpose at work. The 100% time I give them could arguably be more awake, more focused if I wasn't doing all the other things. But thinking this way would be shortsighted because the 40% time gives me learnings that change who I am and what I can do, both in providing motivation and actual skills. 

Having discussed this with a colleague with similar yet different profile, I'm taking a learning from it: 

It's not the hours and their efficiency today, it's the continuous growth in our ability to deliver. 

It's the math of never stopping learning.