Saturday, May 23, 2020

Five Years of Mob Testing, Hello to Ensemble Testing

With my love of reflection cycles and writing about them, I come back to a topic I have cared a great deal about for the last five years: Mob Testing.

Mob Testing is the idea that instead of doing our testing solo or paired, we could bring together a group of people for any testing activity, using a specific mechanism that keeps everyone engaged. That mechanism, strong-style navigation, insists that the work is not driven by the person at the keyboard but by someone off the keyboard using their words, enabling everyone to be part of the activity.

From Mob Programming to Mob Testing

In 2014, I was organizing the Tampere Goes Agile conference and invited a keynote speaker from the USA with this crazy idea of whole-team programming that his team called Mob Programming. I remember sitting in the room listening to Woody Zuill speak, and thinking the idea was just too insane and it would never work. The reaction I had forced my usual response: I have to try this, as it clearly was not something I could reason with.

By August 2015, I had tried mob programming with my team where I was the only tester in the whole organization, and was telling myself I did it to experience it, that I did not particularly enjoy it, and that it was all for the others. True to my style, I gave an interview to Valerie Silverthorne, introduced through Lisa Crispin and said: "I'm not certain if I enjoy this style of working in the long term."

September 2015 saw me moving my experimenting with the approach away from my workplace into the community. In September, I ran a session on Mob Testing at the CITCON open space conference in Helsinki, Finland. A week later, I ran another session on Mob Testing at the Testival open space conference in Split, Croatia. A week later, in Jyväskylä, Finland. By October 22nd, I had established what I called Mob Testing, as I was using it on my commercial course as part of TinyTestBash in Brighton, UK.

I was hooked on Mob Testing, not necessarily as a way of doing testing, but as a way of seeing how other people do testing, for learning and teaching. With something that carries as much implicit knowledge and as many assumptions as testing does, doing the work together gave me an avenue to learn how others thought while they were testing, what tools they were using and what mechanisms they were relying on. As a teacher, it allowed me to see if a model I taught was something the group could apply. But more than teaching, it created groups that learned together, and I learned with them.

I found Mob Testing at a time when I felt alone as a tester, in a group of programmers. Later, as I changed jobs and was no longer the only one of my kind, Mob Testing was my way of connecting with the community beyond chitchat of conceptual talk and definition wars. While I ran some trainings specifically on Mob Testing, I was mostly using it to teach other things testing: exploratory testing (incl. an inkling to documenting as automation), and specific courses on automating tests.

Mob Testing was something I was so excited about that I would travel to talk about it in Philadelphia, USA as well as Sydney, Australia, and a lot of different places between those. In November 2017, I took my Mob Testing course to Potsdam, Germany for Agile Testing Days. I remember this group as a particularly special one, as it had Lisi Hocke as a participant, and from learning what I had learned, she has taken Mob Testing further than I could have imagined. We both have our day jobs in our organizations, and training, speaking and sharing is a hobby more than work.

A year ago, I learned that Joep Schuurkes and Elizabeth Zagroba were running Mob Testing sessions at their workplace, and was delighted to listen to them speak of their lessons on how it turned out to be much more about learning than contributing.

We've seen the community of Mob Programming as well as Mob Testing grow, and I love noticing how many different organizations apply this. When I meet a group to talk about anything testing, it is more the rule than the exception that they mention how their trying out this crazy thing somehow links back to me sharing my experiences. Community is powerful.

Personally, I like to think of Mob Testing as a mechanism to give me two things:
  1. Learning about testing
  2. Gateway to mob programming

I work to break teams of testers and grow appreciation of true collaboration where developers and testers work so closely that it becomes easy to rename everyone developers.

Over the years, I wrote a few good pieces on this to get people started:

With a heavy heart, I have listened to parts of the community so often silenced on the idea that mob programming and testing as terms are anxiety-inducing, and I agree. They are great terms for finding this particular style of programming or testing, but they need replacing. I was working between two options: group programming/testing and ensemble programming/testing. For recognizability, I go for the latter. I can't take out all the material I have already created with the old label, but will work to have new materials with the new label. Because I care for the people who care about stuff like this.

Feature and Release Testing

Back in the day, we used to talk about system testing. System testing was the work done by testers, with an integrated system where hardware and software were both closer to whatever we would imagine having in production. It usually came with the idea that it was a phase after unit and integration testing. In many projects, integration testing was the same testing as system testing but finding a lot of bugs, system testing was to find a few bugs, and acceptance testing ended up being the same tests, now run by the customer organization, finding more bugs than system testing could find.

I should not say "back in the day", as in the testing field's certification courses these terms are still being taught as if they were the core of smartassery testers need. I'm just feeling very past the terms and find them unhelpful and adding to the confusion.

The idea that we can test our software as isolated units of software and in various degrees of integrated units towards a realistic production environment is still valid. And we won't see some of the problems unless things are integrated. We're not integrating only our individual pieces, but 3rd party software and whatever hardware the system runs on. And even if we seek problems in our own software, the environment around matters for what the right working software for us to build is.

With the introduction of agile, continuous integration, and continuous delivery, the testing field very much clung to the words we had grown up with. This resulted in articles like the ones I wrote back when agile was new to me, showing that we do smaller slices of the same but still do the same.

I'm calling that unhelpful now.

While unit-integration-system-acceptance is something I grew up with as a tester, it isn't that helpful when you get a lot of builds, one from each merge to master. You are making your testing way through a new kind of jungle, where the world around you won't stop just so that you can get through testing the feature you are on, on a build that won't even be the one production will see.

We repurposed unit-integration-system-acceptance to test automation, and I wish we didn't. Giving less loaded names to things that are fast to run or take a little longer to run would have helped us more.

Instead of system testing, I found myself talking about feature/change testing (anything you could test for a change or a group of changes comprising a feature that would reach the customer's hands when we were ready) and release testing (anything that we still needed to test when we were making a release, hopefully just a run of test automation but also a check of what is about to go out).

For a few years, I was routinely making a checklist for release testing:

  • Minimize the tests needed now, getting to running only automation and observing automation running as the form of visual, "manual" testing
  • Split into features in general and features being introduced, shining a special light on features being introduced by writing user-oriented documentation on what we were about to introduce to them

We went from tens of scenarios that the team felt needed to be manually / visually confirmed to running a matrix test automation run on the final version, visually watching some of it and confirming the results match expectations. One automated test more at a time. One taken risk at a time, with feedback on its foundation.

Eventually, release testing turned into the stage where the feature/change testing that was still leaking and not completed was done. It was the moment of stopping just barely enough to see that the new things we are making promises on were there. 

I'm going through these moves again. Separating the two, establishing what belongs in each box and how that maps into the work of "system testers". That's what a new job gives me - appreciation of tricks I've learned so well I took them for granted. 

Thursday, May 21, 2020

Going beyond the Defaults

With a new job comes a new mobile phone. The brand new version of iPhone X is an upgrade to my previous iPhone 7, except for color - the pretty rose gold I had come to love is no longer with me. The change experience is smooth: a few clicks and credentials, and all I need is time for stuff to sync to the new phone.

As I start using the phone, I can't help but notice the differences. Wanting to kill an application that is stuck, I struggle when there is no button to double click to get to all running applications. I call out for my 11-year-old daughter to come to the rescue, and she teaches me the right kind of swipes.

Two weeks later, it feels as if there was never a change.

My patterns of phone use did not change as the model changed. If there's more power (features) to the new version, I most likely am not taking advantage of them, as I work on my defaults. Quality-wise, I am happy as long as my defaults work. Only the features I use can make an impact on my perception of quality.

When we approach a system with the intent of testing it, our own defaults are not sufficient. We need a superset of everyone's defaults. We call out to requirements to get someone else's model of all the things we are supposed to discover, but that is only a starting point.

For each claim made in requirements, different users can approach it with different expectations, situations, and use scenarios. Some make mistakes, some do bad things intentionally (especially for purposes of compromising security). There are many ways it can be used right ("positive testing") and many ways it can be used wrong ("negative testing" - hopefully leading to positive testing of error flows).

Exploratory testing says we approach this reality knowing we, the testers, have defaults. We actively break out of our defaults to see things at the scale of all users. We use requirements as the skeleton map, and outline more details through learning as we test. We recognize some of our learnings would greatly benefit us in repeating things, and leave behind our insights documented as test automation. We know we weren't hired to do all testing, but to get all testing done, and we actively seek everyone's contributions.

We go beyond the defaults knowing there is always more out there. 

Monday, May 18, 2020

The Foundation Moves - Even in Robot Framework with Selenium

As my work as a tester shifts with a new organization, so do the things I am watching over. One of the things the shift made me watch more carefully is the Robot Framework community. While I watch it with the eyes of a new person joining, I also watch it with the eyes of someone with enough experience to see patterns and draw parallels.

The first challenge I pick on is quite a good one to have around: old versions out there and people not moving forward from those. It's a good one because it shows the tool and its user community have history. It did not emerge out of the blue yesterday. But history comes with its set of challenges.

Given a tool with old versions out there that the main maintainers do their best to deprecate, the maintainers have little power over the people in the ecosystem:

  • Course providers create their materials with a version, and for free and open courses, may not update them. Google remembers the versions with outdated information.
  • People in companies working with automating tests may have learned the tricks of their trade with a particular version, and they might not have realized that we need to keep following the versions and the changes. Maintaining tests isn't just keeping them running in our organization but proactively moving them to the latest versions.

Robot Framework has a great, popular example of this: the Robot Framework Selenium Library. Having had the privilege of working side by side in my previous place with the maintainer Tatu Aalto, I have utmost respect for the work he does.

The Robot Selenium Library is built on top of Selenium. Selenium has moved a lot in recent years: first to Selenium WebDriver over Selenium RC, and later to versions 3 and 4 of Selenium WebDriver, moving Selenium towards a browser-controlling standard integrated quite impressively with the browsers. As something built on top, the library can abstract a lot of this change away from its users, for both good and bad.

The good is, the users can rely on someone else's work (your contributions welcome, I hear!) to keep looking at how Selenium is moving. As a user, you can get the benefits of changes by updating to the newer version.

The bad is, there might be many changes that allow you to not move forward, and it is common to find versions you would expect to be deprecated in code created recently. 

Having the routine of following your dependencies is an important one. Patterns of recognizing that you are not following your dependencies are sometimes easy to spot. 

With Robot Framework, let me suggest a rule: 
If your Robot Framework code introduces a library called Selenium2Library, it is time for you to think about removing the 2 in the middle and learning how you can create yourself a cadence of updating to the latest version regularly.
Sometimes it is just as easy as allowing for new releases (small steps help with that!). Sometimes it requires you to join information channels and follow up on what goes on.
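
The rule above is easy to check mechanically. The following is a minimal sketch, not part of Robot Framework or of the Selenium library: the function names are made up for illustration, and it assumes your files use the common space- or tab-separated format with imports written as `Library    Selenium2Library`. It only flags the old name and does not attempt any migration.

```python
import re
from pathlib import Path

# Matches an import line such as "Library    Selenium2Library".
# Assumption: cells are separated by a tab or by two or more spaces;
# the pipe-separated format is not handled by this sketch.
OLD_IMPORT = re.compile(r"^\s*Library(?:\t| {2,})\s*Selenium2Library\b")


def find_old_imports(text):
    """Return the 1-based line numbers that import Selenium2Library."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_IMPORT.match(line)
    ]


def scan_tree(root):
    """Map each .robot or .resource file under root to its flagged line numbers."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".robot", ".resource"}:
            text = path.read_text(encoding="utf-8", errors="replace")
            lines = find_old_imports(text)
            if lines:
                hits[str(path)] = lines
    return hits
```

Calling `scan_tree(".")` at the root of a test repository would list the files that still need the 2 removed. Running something like this in a build is one way to make the update cadence visible.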

In the last years, we have said goodbye to Python 2 (I know, not all of us - get to that now, it is already gone) and Selenium has taken leaps forward. Understanding your ecosystem and being connected with the community isn't a good thing to abstract away.

In case you did not know, Robot Framework has a nice Slack group - say hi to me in case you're there!

Monday, April 13, 2020

Blood on the Terrace

On a Finnish summer evening, a group of friends got together at a summer cottage. They enjoyed their time and each other's company, with a few drinks. But as things unfold in unexpected ways, all things coming together, one of them decided it was a good idea to play with a hammer. End result: hammer hitting a foot, a lot of blood flowing onto the terrace.

The other person on the terrace quickly assessed the situation and, with a feeling of panic, summed it up simply: blood on the terrace. And what do you do when you have blood on the terrace? You go and get a bucket and a rag to clean it up. After all, you will have to do this. Blood on the terrace could ruin the terrace! All of this while the friend in need of patching was still in need of patching, bleeding on the terrace. "Hey, go get Anna", coming from the person bleeding on the terrace, corrected the action, and the incident became a funny story to recount to people.

I could not stop laughing as my sister told me this story of something she did. It was a misplaced reaction made under a sense of helplessness and panic. Something that was necessary to do, but not necessary to do right there and then, turned into a funny story of how people behave. I'm telling this here with her permission and for a reason.

Misplaced reactions are common when we build software. Some of the worst reactions happen when we feel afraid, panicking and out of control. When we don't know what the right thing to do is, or how to figure that out in a moment. And we choose to do something, because something is better than nothing.

You might recognize this situation: a bug report telling you to hide a wrong text. You could do what the report says and hide the text. Or, you could stop to think why that text is there, why it should not be there and why it being there is a problem in the first place. If you did more than what the immediate ask was, you would probably be better off. Don't patch symptom by symptom, but rather understand the symptoms and fix the cause.

While a lot of times the metaphor we use is adding bandaids, blood on the terrace describes the problem we face better. It is not just that we are doing something that doesn't address the problem in its full scale. It is that we are, under pressure and with limited knowledge, making rash judgements out of all the things we could be doing, and timing the right action for the wrong time.

When you feel rushed, you do what you do when time is of the essence: something RIGHT NOW. What would it take to approach a report you get so that your first step is to really pay attention and understand what the problem is, why it is a problem, and whether whoever told you it was a problem was a definitive source? Or whether that matters.

It was great that the terrace did not end up ruined. The foot healed too. All levels of damage controlled.

Friday, April 10, 2020

Reporting and Notetaking

Exploratory Testing. The wonderful transformation machinery that takes in time and someone skilled and multidisciplinary, crunches whatever activities are necessary, and provides as output someone more skilled, whatever documentation and numbers we need, and information about testing.

What happens in the machinery can be building tools and scripts, or it can be operating and observing the system under test. Whatever it is, it is chosen under the principle of opportunity cost and focused on learning.

When the world of exploratory testing meets the world of test automation, we often frame the same activities and outputs differently. When a test automation specialist discusses reporting, it seems they often talk about what an exploratory tester would describe as note taking.

Note taking is when we keep track of the steps we do and results we deliver and explain what we tested to other people who test. Reporting is when we summarize steps and results over a timeframe and interpret our notes to people who were not doing the testing. 

Note taking produces:
  • freeform or structured text
  • screenshots and conceptual images
  • application logs
  • test automation that can be thrown away or kept for later

Reporting produces something that says what quality is, what coverage is, how long what we say now stays true, and what we recommend doing about that.

Automation can take notes, count the passes and fails, summarize use of time. The application under test takes notes (we call that logging) and it can be a central source of information on what happened.
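
As an illustration of how little this kind of automated summary says on its own, here is a minimal sketch. The `Note` record and the `summarize` function are hypothetical names invented for this example and do not come from any particular test runner. It counts passes and fails and totals the time, and does nothing more.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Note:
    """One automated note: which test ran, how it ended, how long it took."""
    test: str
    outcome: str  # "pass" or "fail"
    seconds: float


def summarize(notes):
    """Count outcomes and total the time spent; nothing here interprets quality."""
    counts = Counter(note.outcome for note in notes)
    return {
        "passed": counts["pass"],
        "failed": counts["fail"],
        "seconds": sum(note.seconds for note in notes),
    }
```

The output is numbers about the notes. It does not say what quality is, what coverage is, or what we would recommend, which is the part that still needs a person interpreting the notes.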

When automation does reporting, it usually summarizes numbers. I find our managers care very little for these numbers. Instead, the reporting they seek is releases and their contents.

When automation produces a log of test case execution, we can call that a report. But it is a report very different in scope from what I mean by a report from exploratory testing - including automation.

Thursday, April 9, 2020

It does what it is supposed to do, but is this all we get?

Yesterday, in a testing workshop I ran for the third time in this format online, something interesting happened. I noticed a pattern, and I am indebted to the women who made it so evident I could not escape the insight.

We tested two pieces of software to get an experience of testing that enabled us to discuss what testing is, why it is important, and whether it interests us. On my part, the interest is evident, perhaps even infectious. The 30 women participating were from the Finnish MimmitKoodaa program, which introduces women new to creating software to the skills in that space.

The first software we tested was the infamous Park Calculator. The insight that I picked up on came quite late to a bubbling discussion on how many problems and what kind of problems we were seeing, when someone framed it as a question: Why would the calculator first ask the type of parking and then give the cost of it, when a user would usually start with the idea that they know when they are traveling, and would benefit from a summary of all types of parking for that timeframe? The answer seems to be that both approaches would fit some way of thinking around the requirements, and what we had was the way whoever implemented this decided to frame that requirement. We found a bug that would lead to a redesign of what we had, even if we could reuse many of the components. Such feedback would be more welcome early on, but if not early, discussing this to be aware was still a good option.

The second software we tested was the GildedRose. A piece of code, intertwined with lovely clear requirements, often used as a way of showing how a programmer can get code under tests without even reading the requirements. Reading the requirements leads us to a different, interesting path though. One of the requirements states that the quality of an item can never be negative, and another one says it is at most 50. Given a list of requirements where these two are somewhere in the middle, the likelihood of a tester picking these up to test first is very high - a fascinating pattern in itself. However, what we learn from those is that there is no error message on an input or output beyond the boundaries, as we might expect. Instead, given inputs outside the boundaries, the software stays at the input's level of wrongness without making it worse, and given inputs right at the boundaries, it blocks the outputs from changing to wrong values. This fits the requirement, but comes across as a confusing design choice.
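
The boundary behavior described above can be sketched in a few lines. This is a simplified illustration written for this post, not the kata's actual code: the function names are made up, and the sell-by date rules and special items are left out on purpose.

```python
MIN_QUALITY = 0
MAX_QUALITY = 50


def degrade(quality):
    """One day for an ordinary item: lose 1 quality, but only while above the floor."""
    if quality > MIN_QUALITY:
        return quality - 1
    return quality


def improve(quality):
    """One day for an item that gets better with age: gain 1, but only while below the cap."""
    if quality < MAX_QUALITY:
        return quality + 1
    return quality
```

With guards like these, `degrade(0)` stays at 0 and `improve(50)` stays at 50, so values at the boundary never cross it. An input that is already wrong, such as `degrade(-5)` or `improve(60)`, comes back unchanged: no error, no correction, just no further damage. An alternative design would validate or clamp the input, which is the kind of discussion the observation opens up.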

The two examples together bring out a common pattern we see in testing: sometimes what we have is what we intended to have, and fits our requirements. However, we can easily imagine a better way of interpreting that requirement, and would start a discussion on this as a bug, knowing it will stretch some folks' ideas of what a bug is.