Thursday, July 2, 2020

Never tested an API? - A Python Primer from My Summer Trainee

With the first of our releases, I taught my summer trainee the most straightforward way I could to test an API. I gave them a URL (explaining what a URL is), showed the different parts of it, indicating where you connect and what you are asking for, and then left the office for four hours, letting them test the latest changes just as the rest of the team wanted to leave for their summer vacation. They did great with just that in my absence, even if the responsibility of releasing weighed on them. 

No tools. No Postman. Just a browser and an address. Kind of like this:

The API we were testing returned a lot more values. We were testing 20000 items as the built-in limit for that particular release, and it was clear that the approach to determining correctness was sampling. 

Two weeks later, today, we returned to that API with the idea that it was time to do something more than just look at results in the browser. 

Python, in the interpreter

We started off by opening a command line, and starting python. 

As we typed in import requests, I explained that we were taking a library into use. Similarly, I explained print(requests.get("")), forgetting the closing parenthesis at first and adding it on a line after. 

With the 200 response, I explained that this code meant the call was ok, but we'd need more to see the message we had earlier seen in a browser, and that while we could also use this for testing, we'd rather move our code to a file in an IDE. 

Python like a script, in Pycharm

As we opened Pycharm and created a .py file to write things in, the very first lines were exactly the same ones we had been running from the command line. We created two files: first requirements.txt, in which we only wrote requests, and then a second file for the code itself. As soon as the two lines were in, Pycharm suggested installing what requirements.txt defined, and we ensured the code still ran just the same. At first we found the Run menu in the IDE; later the little green play buttons started to seem more appealing, as did the keyboard shortcut for something done this often. 

We replaced the print with a variable that could keep our response so we could explore it further:
response = requests.get("")
Typing response. and ctrl+space, we could see options of what to do with it, and settled on response.text. 
At this point, we could see the same text we had seen before in the browser, verify it visually just as much as with the browser, and were ready to move on. 

Next we started working on the different pieces of the URL, as we wanted to test the same things in different environments, and our API had a few more options than the one I use for educational purposes here. 

We pulled the address out into a variable, the rest of it into another, and concatenated them together for the call. 
import requests
address = ""
rest_of_it ="us/90210"
whole_thing = address + rest_of_it
response = requests.get(whole_thing)
The API we were playing with had a lot more pieces. With environments, names, ids, dates, limits and their suffixes in the call, we had a few more moving parts to pull out with the very same pattern. 
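As a sketch of that same pattern with more moving parts, all names and values below are made up for illustration rather than taken from our real API:

```python
# Hypothetical moving parts of an API address, pulled into variables
environment = "https://test.example.com/"  # which environment we test against
resource = "items/"                        # what we ask for
item_id = "12345"                          # a specific id
limit_suffix = "?limit=20000"              # the built-in limit from the release

whole_thing = environment + resource + item_id + limit_suffix
# requests.get(whole_thing) would then make the call, exactly as before
```

Changing environments then means changing one variable, while the rest of the call stays intact.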

As we were now able to run this for one set of values, our next step was to see it run for another set. On our API, we're working on a data-specific bug that gives us a different status code, 500, and we wanted to see that difference here. 

Making the status code visible, we started our work to have calls where whole_thing wasn't just what we had started with but had multiple options. 
#rest_of_it ="us/90210"
rest_of_it = "fi/00780"
Every option we tried got documented, but changing one line into a comment and another into the one we would use was not a state we'd settle for. 

We wanted two things: 
  • a method that would take in the parts and form the whole_thing for us
  • a way of saving the results of calls 
We started with keeping a part of the results, introducing pytest by writing it into requirements.txt as a second line. 
Again we clicked ok to add what our environment was missing as Pycharm pinged us on it, and saved the response code by codifying it into an assert. We remembered to try other values and see it fail, to trust it in the first place. 
assert response.status_code == 200
With us still wanting the two things above, I interrupted our script creation to move us a step in a different direction. 

Python like a Class and Methods, in Pycharm

We googled for "pytest class example" under my instructions, and after not liking the first glance of the first hits, we ended up on a page:

We copied the example as file contents on our IDE. 

We hit a mutual momentary hiccup, figuring out three things: 
  1. We needed to set pytest as our default test runner from File | Settings | Tools | Python integrated tools | Default test runner. 
  2. The file must have Test in its name for it to be recognized as tests. 
  3. We could run a single test from the green play button next to it. 
The original example illustrating setup and teardown had a little too much noise, so we cleaned that up before starting to move our script into the structure.
class TestClass():
    def setup_class(self):
        pass

    def teardown_class(self):
        pass

    def setup_method(self):
        pass

    def teardown_method(self):
        pass

    def test_one(self):
        assert True
We moved everything from the script we had created inside test_one() 
def test_one(self):
    import requests
    address = ""
    # rest_of_it ="us/90210"
    rest_of_it = "fi/00780"
    whole_thing = address + rest_of_it
    response = requests.get(whole_thing)
    assert response.status_code == 200
And we moved the import from inside the test to the beginning of the file to have it available for what we expected to be multiple tests. With every step, we ran the tests to see they were still passing. 

Next, I asked the trainee to add a line right after def test_one(self): that would look like what we imagined we'd like to call to get our full address. We ended up with
define_address("foo", "bar")
representing us giving two pieces of text that would end up forming the changing parts of the address. 

A little red light bulb emerged in the IDE next to our unimplemented method (interjecting TDD here!) and we selected Define function from the little menu of options on the light bulb. The IDE created a method frame for us.
def define_address(param, param1):
We had already been through the idea of Refactor | Rename, coming up with even worse names and following the "let's rename every time we know a name that is better than what we have now" principle. I wouldn't allow just typing in a new name, but always going through Refactor, to teach the discipline that benefits from the tooling. Similarly, I would advise against typing whole words, allowing the IDE to complete what it can. 

We moved the piece concatenating the two parts together into the method (ours had a few more parts than the example). 
def define_address(part1, part2):
    whole_thing = part1 + part2
    return whole_thing
and were left with a test case where we had to call the method with the relevant parts of the address:
def test_one(self):
    # rest_of_it ="us/90210"
    response = requests.get(define_address("", "fi/00780"))
    assert response.status_code == 200
The second test we'd wanted as a comment in the first became obvious, and we created it. 
def test_two(self):
    response = requests.get(define_address("", "us/90210"))
    assert response.status_code == 200
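Our real helper had more parts than the two in this example. As a sketch of how the same idea generalizes (the parameter handling here is my own invention, not what we actually wrote):

```python
def define_address(*parts):
    # joins any number of address pieces in order, same idea as part1 + part2
    return "".join(parts)

# two parts behave exactly like the version above
assert define_address("", "fi/00780") == "fi/00780"
# and more moving parts fit the same call
assert define_address("https://test.example.com/", "us/", "90210") == "https://test.example.com/us/90210"
```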
Verifying the response.text

Now that we had established the idea of test cases in a test class and structure of a class over writing just a script with a hint of TDD, we moved our attention to saving results of the calls we were making. Seeing "200 success" isn't quite what we'd look for. 

In the final step of the day, we introduced approvaltests into requirements.txt file.
We edited two lines of our file, adding
from approvaltests.approvals import verify
and changing print to verify
We ran the tests from the terminal once to see them fail (as we saw them be ignored without this step on the usual run):
pytest --approvaltests-use-reporter='PythonNative'
We saw a file TestClass.test_one.received.txt emerge in our files, and after visually verifying that it captured what we had seen printed before, we renamed the file to TestClass.test_one.approved.txt. We ran the tests again from the IDE to see them now pass, edited the approved file to see them fail, and corrected it back to verifying that our results match. 
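The approval step is, mechanically, just a rename. A stand-in sketch with made-up content, using a temporary directory instead of the project folder:

```python
import pathlib
import tempfile

# approvaltests writes a .received.txt and compares it to a .approved.txt
workdir = pathlib.Path(tempfile.mkdtemp())
received = workdir / "TestClass.test_one.received.txt"
received.write_text("the response text we visually verified")

# the manual approval: rename received -> approved
approved = workdir / "TestClass.test_one.approved.txt"
received.rename(approved)

# future verify() calls now compare against the approved content
assert approved.read_text() == "the response text we visually verified"
```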

As a finalization of the day, we added verification to our second test, again visually verifying and keeping the approved file around. 
def test_one(self):
    response = requests.get(define_address("", "fi/00780"))
    assert response.status_code == 200
    verify(response.text)
And finally, we defined an approvaltests_config.json file to tell approvaltests where the files it creates should go:
{
  "subdirectory": "approved_files"
}
These steps give us what we could do in a browser, and allow us to explore. They also help us save results for the future with minimal effort, and introduce a baseline from which we can reuse things we've created. 

Looking forward to seeing where our testing takes us next with the trainee. 

Wednesday, July 1, 2020

Learning about Learning

As an exploratory tester, I've come to appreciate that at the core of my skills is that I have been learning about learning. Having practiced mostly learning about products, technology, organizations, businesses and people for a quarter of a century, I have somewhat of a hang of it.

Having a hang of it shows particularly when I change organizations, like I did 2 months ago. Even if I say so myself, I've taken in the new organization at a good pace and have been contributing since the beginning, reaching my expected level of exceeding expectations starting from the second month. 

Even though I still consider testing (and software productivity) more my professional core, I find that the stuff I am learning about learning applies just as much to other roles. Today I took a moment to deliver a 30-minute broadcast inside my organization, talking just about learning. Since most of you could not join an internal session, I decided on a blog. 

Foundation, the Math

Imagine you were awesome. Your results are great. You know how to get the job done. Every day when you come to work, you deliver steadily. Sounds great? 

Many of us are awesome and deliver steadily. We are as productive today as we are in a year. Solid delivery. 

But learning changes the game. 

Imagine you and your colleague are equally awesome. You both deliver steadily today. But your colleague, unlike you, takes time away from every single working day to improve their results. They find a way to become 1% better every week, shaving off 4 minutes of time from completing something of significance. In a year, you're still awesome like you were before. But your colleague is 1.7 times their past self due to learning. 

1% a week may sound like a lot, or a little, but the learning accumulates. If we learned in ways that transform our results 1% each day, a year gives us 37.8 times our past selves. 
This splits our working days into two activities: we are either learning or contributing. Both are valuable. We could use most of our office hours on learning and, with that 1% improvement every day, still match our past selves in a year. The investment in learning is worthwhile.
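The compounding above is quick to check; the percentages come from the text, the code is plain arithmetic:

```python
# 1% improvement compounding weekly vs. daily over a year
weekly = 1.01 ** 52   # 52 weeks of 1% a week
daily = 1.01 ** 365   # 365 days of 1% a day

print(round(weekly, 2))  # ≈ 1.68, the colleague at 1.7 times their past self
print(round(daily, 1))   # ≈ 37.8 times our past selves
```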

From Learning Alone to Learning Together

Now that we've established the idea that learning is a worthwhile investment, we can discuss our options for using that investment. Learning does not happen only while we take special learning time to show up on a course; most of it is on the job. Volunteering to do that cloud configuration you've never done before - now you have. Volunteering to take a first effort at the UX design even though you're not a UX designer - now that you are in control of the task and your learning, those with more experience can help you learn. Learning is a deliberate action. 

The usual way we work is solo. We bring our best and worst into the outcome we're producing, and the traditional way of approaching this is that others join in after you, giving you feedback on things you may have missed. 
Every comment on a pull request helps you address something you missed now, and learn for later cycles. Every bug someone reports after you, internally or externally, does the same. Every time a new requirement emerges because the application serves as someone else's external imagination, you learn about how you could see things coming, and become able to make informed rather than accidental choices. 

With the traditional solo-handoff style, every one of us needs to learn just enough about the work to be able to contribute our bit. If we don't know much about the work, that limits our contribution to what we know. 

Imagine instead learning through pairing. Building the understanding of the task together. Not filtering the feedback based on what makes sense to ask once you have already implemented it one way. Instead of getting the best out of you into the work you're doing, you get the best out of both of you. 
Ensemble programming brings a pair to more people, a whole team, and sees the curve positively flatten as everyone is learning and contributing, provided we first learn how to listen and to work well together. 

From individuals to seeing the system

Learning on its own is a little abstract. What is it that we are learning about? 

What I talked about today is that we're learning everything we can to optimize the meaningful outcomes of software development. It might be learning a keyboard shortcut to save time in completing an action (microlearning). It might be learning to innovate how collaboration works in our organization. And to frame that in software development, it helps to understand it as a process where smart people transform ideas into code without being alone with all this responsibility. 

Nothing changes for the users unless we change the code. 

If we know the right idea to change the code without people other than developers making the change, we could do just that. But we understand that fine-tuning ideas is where the rest of the organization comes to play, and that software does not exist in a vacuum without the services around it. 

Some of those percents of betterment come from stopping looking only at ourselves and starting to look at the system of people that co-creates the value. 

Learning Never Stops

The final piece I discussed today was about the idea of a Senior vs. Junior. It's not that the first knows more than the latter in some basic absolute scale. Knowing something is multidimensional, and even those of us who are seniors don't know everything. Partially this comes from the fact that there is already too much to know for a person, but also from the fact that more to know emerges every day. 

Just like a senior takes on work they need to figure out how to do, so does the junior. The complexity of the tasks expected to be figured out is very different, but one of the powers of great seniors is that we can accelerate the learning of juniors. We don't have to put them through our struggles; they can find a new, innovative struggle even when the latest of how we enable them is in place. 
Even if a senior knows more things, there are still things they can learn from the junior if they listen and pay attention.  

Ideas to take this further

As part of my in-company broadcast series on things I want to talk about, allowing people to join me and have a conversation, today's conversation part was particularly successful. My theme today was ROI (Return on Investment) of Learning, and three themes stood out from the comments: 
  • Unlearning to make space for new learning - can take double the effort and requires listening to new people giving hints on things you may need to act on
  • New to industry or new to an organization - no need to deliberately look for things to learn, the work already stretches you. 
  • Microlearning - more examples of the little stretches, more stories of things we didn't know but learned would help us a long way. 
There's a whole book I'm writing on this in the context of Exploratory Testing. I'm always open for a good conversation on this and prefer video call over wall of text, wall of text in public over private, and twitter-size over a wall of text. 

Sunday, June 14, 2020

Automation First Microheuristic

Developers announce a new feature is available in the build and it could use a second pair of eyes. What is the first thing to do? Changing companies made me realize I have a heuristic for deciding when I automate test cases as part of exploratory testing. 

Both automating and not automating bring in that second pair of eyes, that seeking to understand the feature and how it shows in the relevant flows. The first level of the choice of whether to start with automating is whether you are capable of automating. Capability makes the choice available on an individual level, and only after that can it be a choice. 

When that choice is available, these things could impact choosing Automation First. 
  • Belief that a change in the basic flow matters beyond anything else you imagine wrong with it
    • When automating, you will visually and programmatically verify the basic flow as you are building it. Building it to a good, reliable level takes longer than just looking at it, but it then remains around to see if changes in the software change its status. 
  • Availability of quality dimensions (reliability, environment coverage) through automation
    • If your application domain's typical issues are related to timing of use or a multitude of environments where one works while others may not, automating first gives you a wider scope than doing it manually ever could. 
  • Effort difference isn't delaying feedback. 
    • With an existing framework and pipeline, extending it is an effort to consider. Without them, having to set things up can easily become the reason why automating takes so long that it makes sense to first provide feedback without it, ensuring the feature can work.
  • Brokenness of application
    • Humans work around broken / half-baked features whereas writing automation against it may be significantly harder. 
I was thinking of this as I realized that the automated tests on my current system see very few problems. There is no relevant environmental difference, unlike at my previous job. Automation works mostly in the change dimension, unlike at my previous job. 

Going into the moment of making this choice, I find I still go back to my one big heuristic that guides it all: Never be bored. First or Second does not matter as much as the idea that keeping things varied helps keep me away from boredom. Documenting with automation makes sense to avoid that boredom in the long run. 

Saturday, June 13, 2020

Training an Exploratory Tester from the Ground Up

This summer gives me the perfect possibility: a summer intern with experience of work life outside software, and I get to train them into being a proper Exploratory Tester. 

Instead of making a plan of how to do things, I work from a vision, and adapt as I learn about what the product team needs (today) and what comes easy for a trainee entrusted to my guidance. 

Currently my vision is that by end of the summer, the trainee will:
  • Know how to work effectively in scope of a single team as tester inside that team
  • Understand the core a tester would work from and regularly step away from that core to developer and product owner territory 
  • Know how to see versatile issues and prioritize what issues make sense to report, as each report creates a response in the team
  • Know that best bug reports are code but it's ok to learn skills one by one to get to that level of reporting ability - being available is second best thing 
  • Understand how change impacts testing and guide testing by actual change in code bases in combination of constraints communicated for that change
  • Write test automation for WebUI in Jest + Puppeteer and Robot Framework and take part in team choice of going with one or the other
  • Operate APIs for controlling data creation and API-based verifications using Java, Python and JavaScript.
  • Understand how their testing and test automation sits in the context of environments it runs in: Jenkins, Docker and the environment the app runs in: Docker, Kubernetes and CI-Staging-Prod for complex set of integrated pieces
  • Communicate clearly the status of their testing and advocate for important fixes to support 'zero bugs on product backlog' goal in the team
  • Control their own balance of time to learning vs. contributing that matches their personal style to not require task management but leading the testing they do on their own
  • Have connections outside the company in the community to solve problems in testing that are hard to figure out internally
We got started this week, and are one week into the experience. So far they have:
  • Reported multiple issues they recognized, mostly usability and language. I jumped on the problems with functionality and reported those; demoing them reinforced the idea that they are seeing only particular categories now. 
  • Navigated command line, filesystem, Git, and IDE in paired setting and shown they pick things up from examples they experience, repeating similar moves a day later from learning the concepts. 
  • Skipped reporting for a language bug and fixed it with PR instead. 
  • Covered release testing with a provided one-liner checklist for the team's first release. 
  • Provided observations on their mentor's (my) models of how I train them, leading me to an insight: I both work hard to navigate on a higher level (telling them what to get done, and digging into exactly how to do it only if they don't already know) and respond to questions with questions to reinforce that they already know some of the stuff.
  • Taken selective courses from Test Automation University on keywords they pick up as I explain, as well as reading tool-specific examples and guidelines. 
  • Explained to me how they currently model unit - service - UI tests and mixed language set the team has. 
  • Presented a plan of what they will focus on achieving next week with Jest-Puppeteer 1st case with our application. 
After the week, I'm particularly happy to see the idea of self-management and *you leading your own work but radiating intent* catching on. Them recognizing they can't yet see all types of bugs is promising, as is their approach to learning. 

Every step, I prepare them for the world where I won't be there to guide them but they know how to pull in help when they need it - inside the company and outside. 

Saturday, May 23, 2020

Five Years of Mob Testing, Hello to Ensemble Testing

With my love of reflection cycles and writing about them, I come back to a topic I have cared a great deal about over the last five years: Mob Testing.

Mob Testing is this idea that instead of doing our testing solo, or paired, we could bring together a group of people for any testing activity, using a specific mechanism that keeps everyone engaged. That mechanism, strong-style navigation, insists that the work is not driven by the person at the keyboard but by someone off the keyboard using their words, enabling everyone to be part of the activity.

From Mob Programming to Mob Testing

In 2014, I was organizing the Tampere Goes Agile conference and invited a keynote speaker from the USA with this crazy idea of whole-team programming his team called Mob Programming. I remember sitting in the room listening to Woody Zuill speak, and thinking the idea was just too insane and would never work. That reaction forced my usual response: I have to try this, as it clearly was not something I could settle by reasoning alone.

By August 2015, I had tried mob programming with my team where I was the only tester in the whole organization, and was telling myself I did it to experience it, that I did not particularly enjoy it, and that it was all for the others. True to my style, I gave an interview to Valerie Silverthorne, introduced through Lisa Crispin and said: "I'm not certain if I enjoy this style of working in the long term."

September 2015 saw me moving my experimenting with the approach away from my workplace into the community. In September, I ran a session on Mob Testing at CITCON, an open space conference in Helsinki, Finland. A week later, I ran another session on Mob Testing at Testival, an open space conference in Split, Croatia. A week later, in Jyväskylä, Finland. By October 22nd, I had established what I called Mob Testing, as I was using it in my commercial course as part of TinyTestBash in Brighton, UK.

I was hooked on Mob Testing, not necessarily as a way of doing testing, but as a way of seeing how other people do testing, for learning and teaching. With work holding as much implicit knowledge and assumptions as testing does, doing it together gave me an avenue to learn how others thought while they were testing, what tools they were using and what mechanisms they were relying on. As a teacher, it allowed me to see if a model I taught was something the group could apply. But more than teaching, it created groups that learned together, and I learned with them.

I found Mob Testing at a time when I felt alone as a tester in a group of programmers. Later, as I changed jobs and was no longer the only one of my kind, Mob Testing was my way of connecting with the community beyond the chitchat of conceptual talk and definition wars. While I ran some trainings specifically on Mob Testing, I mostly used it to teach other things in testing: exploratory testing (incl. an inkling of documenting as automation), and specific courses on automating tests.

Mob Testing was something I was so excited about that I would travel to talk about it to Philadelphia, USA as well as Sydney, Australia, and a lot of different places in between. In November 2017 I took my Mob Testing course to Potsdam, Germany for Agile Testing Days. I remember this group as a particularly special one, as it had Lisi Hocke as a participant, and from what she learned there, she has taken Mob Testing further than I could have imagined. We both have our day jobs in our organizations, and training, speaking and sharing is a hobby more than work.

A year ago, I learned that Joep Schuurkes and Elizabeth Zagroba were running Mob Testing sessions at their workplace, and I was delighted to listen to them speak of their lessons on how it turned out to be much more about learning than contributing.

We've seen the communities of Mob Programming as well as Mob Testing grow, and I love noticing how many different organizations apply this. Meeting a group I talk to about anything testing, it is more the rule than the exception that they mention their trying out this crazy thing links back to me sharing my experiences. Community is powerful.

Personally, I like to think of Mob Testing as a mechanism to give me two things:
  1. Learning about testing
  2. Gateway to mob programming 
I work to break up separate teams of testers and grow appreciation of true collaboration, where developers and testers work so closely that it becomes easy to rename everyone developers.

Over the years, I wrote a few good pieces on this to get people started:

With a heavy heart, I have listened to parts of the community so often silenced on the idea that mob programming and testing as terms are anxiety inducing, and I agree. They are great terms for specifically finding this particular style of programming or testing, but they need replacing. I was deciding between two options: group programming/testing and ensemble programming/testing. For recognizability, I go for the latter. I can't take out all the material I have already created with the old label, but I will work to have new materials use the new label. Because I care for the people who care about stuff like this.

Feature and Release Testing

Back in the day, we used to talk about system testing. System testing was the work done by testers, with an integrated system where hardware and software were both closer to whatever we would imagine having in production. It usually came with the idea that it was a phase after unit and integration testing. In many projects, integration testing was the same testing as system testing but finding a lot of bugs, system testing was to find few bugs, and acceptance testing ended up being the same tests again, now run by the customer organization, finding more bugs than system testing could find.

I should not say "back in the day", as in the testing field's certification courses these terms are still being taught as if they were the core smartassery testers need. I'm just feeling very past the terms and find them unhelpful, adding to the confusion.

The idea that we can test our software as isolated units of software and in various degrees of integrated units towards a realistic production environment is still valid. And we won't see some of the problems unless things are integrated. We're not integrating only our individual pieces, but 3rd party software and whatever hardware the system runs on. And even if we seek problems in our own software, the environment around matters for what the right working software for us to build is.

With the introduction of agile, continuous integration and continuous delivery, the testing field very much clung to the words we had grown up with, resulting in articles like the ones I wrote back when agile was new to me, showing that we do smaller slices of the same but we still do the same.

I'm calling that unhelpful now.

While unit-integration-system-acceptance is something I grew up with as a tester, it isn't that helpful when you get a lot of builds, one from each merge to master, and are making your way through this new kind of jungle where the world around you won't stop just so that you'd get through testing the feature you are on, on a build that won't even be the one production sees.

We repurposed unit-integration-system-acceptance for test automation, and I wish we hadn't. Giving less loaded names to things that are fast to run or take a little longer to run would have helped us more.

Instead of system testing, I found myself talking about feature/change testing (anything you could test for a change, or a group of changes comprising a feature, that would reach the customer's hands when we were ready) and release testing (anything we still needed to test when making a release, hopefully just a run of test automation but also a check of what is about to go out).

For a few years, I was routinely making a checklist for release testing:

  • Minimize the tests needed now, getting to running only automation and observing the automation run as the form of visual, "manual" testing
  • Split into features in general and features being introduced, shining a special light on features being introduced by writing user-oriented documentation on what we were about to introduce
We went from tens of scenarios the team felt needed to be manually / visually confirmed, to running a matrix test automation run on the final version, visually watching some of it and confirming the results matched expectations. One automated test more at a time. One taken risk at a time, with feedback as its foundation. 

Eventually, release testing turned into the stage where the feature/change testing that was still leaking and not completed was done. It was the moment of stopping just barely enough to see that the new things we are making promises on were there. 

I'm going through these moves again. Separating the two, establishing what belongs in each box and how that maps into the work of "system testers". That's what a new job gives me - appreciation of tricks I've learned so well I took them for granted. 

Thursday, May 21, 2020

Going beyond the Defaults

With a new job comes a new mobile phone. The brand new iPhone X is an upgrade from my previous iPhone 7, except for the color - the pretty rose gold I had come to love is no longer with me. The change experience is fluent: a few clicks and credentials, and all I need is time for my stuff to sync to the new phone.

As I start using the phone, I can't help noticing the differences. Wanting to kill an application that is stuck, I struggle when there is no button to double-click to get to all running applications. I call my 11-year-old daughter to the rescue and she teaches me the right kind of swipes.

Two weeks later, it feels as if there was never a change.

My patterns of phone use did not change as the model changed. If there's more power (features) to the new version, I most likely am not taking advantage of them, as I work on my defaults. Quality-wise, I am happy as long as my defaults work. Only the features I use can make an impact on my perception of quality.

When we approach a system with the intent of testing it, our own defaults are not sufficient. We need a superset of everyone's defaults. We call out to requirements to get someone else's model of all the things we are supposed to discover, but that is only a starting point.

For each claim made in requirements, different users can approach it with different expectations, situations, and use scenarios. Some make mistakes, some do bad things intentionally (especially for purposes of compromising security). There are many ways it can be used right ("positive testing") and many ways it can be used wrong ("negative testing" - hopefully leading to positive testing of the error flows).
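As a sketch of how this plays out in code - the validator and its 1-to-100 rule are made up for illustration, not taken from any real system - positive tests exercise the valid uses and negative tests exercise the error flows:

```python
# Hypothetical input validator, invented for this illustration:
# a quantity field that must be an integer from 1 to 100.
def parse_quantity(text):
    try:
        value = int(text)
    except (TypeError, ValueError):
        raise ValueError(f"not a number: {text!r}")
    if not 1 <= value <= 100:
        raise ValueError(f"out of range: {value}")
    return value

# Positive testing: some of the many ways it can be used right.
assert parse_quantity("1") == 1
assert parse_quantity("100") == 100

# Negative testing: ways it can be used wrong - which becomes
# positive testing of the error flows.
for bad in ["0", "101", "abc", None]:
    try:
        parse_quantity(bad)
        assert False, f"expected ValueError for {bad!r}"
    except ValueError:
        pass
```

The boundary values - 1 and 100 on the positive side, 0 and 101 on the negative side - are where different users' expectations most often diverge.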

Exploratory testing says we approach this reality knowing we, the testers, have defaults. We actively break out of our defaults to see things in scale of all users. We use requirements as the skeleton map, and outline more details through learning as we test. We recognize some of our learnings would greatly benefit us in repeating things, and leave behind our insights documented as test automation. We know we weren't hired to do all testing, but to get all testing done and we actively seek everyone's contributions.

We go beyond the defaults knowing there is always more out there. 

Monday, May 18, 2020

The Foundation Moves - Even in Robot Framework with Selenium

As my work as a tester shifts with a new organization, so do the things I am watching over. One of the things the shift made me watch more carefully is the Robot Framework community. While I watch it with the eyes of a new person joining, I also watch it with the eyes of someone with enough experience to see patterns and draw parallels.

The first challenge I pick on is quite a good one to have around: old versions out there and people not moving forward from them. It's a good one because it shows the tool and its user community have history. It did not emerge out of the blue yesterday. But history comes with its own set of challenges.

Given a tool with old versions out there that the main maintainers do their best to deprecate, the maintainers have little power over the people in the ecosystem:

  • Course providers create their materials with a version, and for free and open courses, may not update them. Google remembers the versions with outdated information.
  • People in companies working with automating tests may have learned the tricks of their trade with a particular version, and might not have realized that we need to keep following the versions and the changes - maintaining tests isn't just keeping them running in our organization, but proactively moving them to the latest versions.
Robot Framework has a great, popular example of this: the Robot Framework Selenium Library. Having had the privilege of working side by side with its maintainer Tatu Aalto at my previous place, I have the utmost respect for the work he does.

The Robot Selenium Library is built on top of Selenium. Selenium has moved a lot in recent years, first to Selenium WebDriver from Selenium RC, and then through versions 3 and 4 of Selenium WebDriver, moving Selenium towards a browser-controlling standard integrated quite impressively with the browsers. As something built on top, the library can abstract a lot of this change from its users, for both good and bad.

The good is, the users can rely on someone else's work (your contributions are welcome, I hear!) to keep looking at how Selenium is moving. As a user, you can get the benefits of changes by updating to the newer version.

The bad is, the abstraction may allow you to not move forward, and it is common to find versions you would expect to be deprecated in code created recently.

Having a routine of following your dependencies is an important one. Patterns revealing that you are not following your dependencies are sometimes easy to spot.

With Robot Framework, let me suggest a rule: 
If your Robot Framework code introduces a library called Selenium2Library, it is time for you to think about removing the 2 in the middle and learning how you can create yourself a cadence of updating to latest regularly. 
Sometimes it is just as easy as allowing for new releases (small steps help with that!). Sometimes it requires you join information channels and follow up on what goes on. 
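One way to spot the pattern in your own code base is a small throwaway script. This is not part of Robot Framework itself, just a sketch that scans your .robot files for the deprecated library name:

```python
from pathlib import Path

def find_deprecated_imports(root):
    """Return the .robot files under root that still mention Selenium2Library."""
    hits = []
    for path in Path(root).rglob("*.robot"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if "Selenium2Library" in text:
            hits.append(path)
    return hits
```

Resource files and Python libraries can import it too, so treat the result as a starting point rather than a complete inventory.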

In the last years, we have said goodbye to Python 2 (I know, not all of us - get to that now, it is already gone) and Selenium has taken leaps forward. Understanding your ecosystem and being connected with the community isn't a good thing to abstract away.

In case you did not know, Robot Framework has a nice slack group - say hi to me in case you're there!  

Monday, April 13, 2020

Blood on the Terrace

On a Finnish summer evening, a group of friends got together at a summer cottage. They enjoyed their time and each other's company, with a few drinks. But as things unfold in unexpected ways, all things coming together, one of them decided it was a good idea to play with a hammer. End result: a hammer hitting a foot, and a lot of blood flowing onto the terrace.

The other person on the terrace quickly assessed the situation, with a feeling of panic summing it up simply: blood on the terrace. And what do you do when you have blood on the terrace? You go and get a bucket and a rag to clean it up. After all, you will have to do that eventually. Blood on the terrace could ruin the terrace! All of this while the friend in need of patching was still in need of patching, bleeding on the terrace. "Hey, go get Anna", coming from the person bleeding on the terrace, corrected the action and the incident became a funny story to recount to people.

I could not stop laughing as my sister told me this story of something she did. It was a misplaced reaction made under a sense of helplessness and panic. Something that was necessary to do, but not necessary to do right there and then, turned into a funny story of how people behave. I'm telling this here with her permission and for a reason.
Misplaced reactions are common when we build software. Some of the worst reactions happen when we feel afraid, panicked and out of control; when we don't know what the right thing to do is, or how to figure it out in the moment. And we choose to do something, because something is better than nothing.
You might recognize this situation: a bug report telling you to hide a wrong text. You could do what the report says and hide the text. Or, you could stop to think why that text is there, why it should not be there and why it being there is a problem in the first place. If you did more than what the immediate ask was, you would probably be better off. Don't patch symptom by symptom, but rather understand the symptoms and fix the cause. 
While a lot of times the metaphor we use is adding bandaids, blood on the terrace describes the problem we face better. It is not that we are just doing something that really isn't addressing the problem in its full scale. It is that we are, under pressure with limited knowledge, making rash judgements out of all the things we could be doing, and timing the right action to the wrong time. 
When you feel rushed, you do what you do when time is of the essence: something RIGHT NOW. What would it take to make your first step, on any report you get, really paying attention and understanding what the problem is, why it is a problem, and whether whoever told you it was a problem is a definitive source - or whether that even matters.
It was great that the terrace did not end up ruined. The foot healed too. All levels of damage controlled. 

Friday, April 10, 2020

Reporting and Notetaking

Exploratory Testing. The wonderful transformation machinery that takes in someone skilled and multidisciplinary, and time; crunches through whatever activities are necessary; and provides as output someone more skilled, whatever documentation and numbers we need, and information about testing.

What happens in the machinery can be building tools and scripts, or operating and observing the system under test, but whatever it is, it is chosen under the principle of opportunity cost and focused on learning.

When the world of exploratory testing meets the world of test automation, we often frame the same activities and outputs differently. When a test automation specialist discusses reporting, it seems they often talk about what an exploratory tester would describe as note taking.

Note taking is when we keep track of the steps we do and results we deliver and explain what we tested to other people who test. Reporting is when we summarize steps and results over a timeframe and interpret our notes to people who were not doing the testing. 

Note taking produces:
  • freeform or structured text
  • screenshots and conceptual images
  • application logs
  • test automation that can be thrown away or kept for later
Reporting produces something that says what quality is, what coverage is, how long what we say now stays true, and what we recommend we would do about that. 

Automation can take notes, count the passes and fails, and summarize the use of time. The application under test takes notes too (we call that logging) and it can be a central source of information on what happened.
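A minimal Python sketch of that kind of note taking - the logger name and messages here are made up - where each entry gets a timestamp and a level, so the log later reads back as a record of what happened during the session:

```python
import io
import logging

# Collect the "notes" into a buffer; a real application would write to a file.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))

log = logging.getLogger("checkout")  # made-up component name
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("applying discount code %s", "SUMMER20")
log.warning("discount rejected, falling back to full price")

notes = buffer.getvalue()  # timestamped entries, ready to read back later
```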

When automation does reporting, that usually summarizes numbers. I find our managers care very little for these numbers. Instead, the reporting they seek is releases and their contents. 

When automation produces a log of test case execution, we can call that a report. But it is a report with a scope very different from what I mean by a report from exploratory testing - including automation.

Thursday, April 9, 2020

It does what it is supposed to do, but is this all we get?

Yesterday, in a testing workshop I ran for the third time in this online format, something interesting happened. I noticed a pattern, and I am indebted to the women who made it so evident I could not escape the insight.

We tested two pieces of software, to have an experience of testing that enabled us to discuss what testing is, why it is important and whether it interests us. On my part, the interest is evident, perhaps even infectious. The 30 women participating came from the Finnish MimmitKoodaa program, which introduces women new to creating software to the skills in that space.

The first software we tested was the infamous Park Calculator. The insight I picked up on came quite late, into a bubbling discussion on how many problems and what kinds of problems we were seeing, when someone framed it as a question: why would the calculator first ask for the type of parking and only then give its cost, when a user would usually start from knowing when they are traveling, and would benefit from a summary of all types of parking for that timeframe? The answer seems to be that both approaches would fit some way of thinking about the requirements, and what we have is the way whoever implemented this decided to frame that requirement. We found a bug that would lead to a redesign of what we had, even if we could reuse many of the components. Such feedback would be more welcome early on, but if not early, discussing it to be aware was still a good option.

The second software we tested was the GildedRose. A piece of code, intertwined with lovely clear requirements, often used as a way of showing how a programmer can get code under test without even reading the requirements. Reading the requirements leads us down a different, interesting path though. One of the requirements states that the quality of an item can never be negative, and another tells us it is at most 50. Given a list of requirements where these two sit somewhere in the middle, the likelihood of a tester picking them up to test first is very high - a fascinating pattern in itself. However, what we learn from them is that there is no error message on an input or output beyond the boundaries, as we might expect. Instead, given inputs outside the boundaries, the software keeps the input's level of wrongness without making it worse, and given inputs right at the boundaries, it blocks the outputs from going wrong. This fits the requirement, but comes across as a confusing design choice.
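That boundary behavior can be sketched in a few lines - this is not the actual GildedRose code, just a simplified model of the two requirements and the clamping-without-errors design the workshop found confusing:

```python
MIN_QUALITY, MAX_QUALITY = 0, 50

def degrade(quality):
    # Only lower quality while it is above the minimum - like the kata,
    # no error is raised for an input that is already out of bounds.
    return quality - 1 if quality > MIN_QUALITY else quality

def improve(quality):
    # Only raise quality while it is below the maximum.
    return quality + 1 if quality < MAX_QUALITY else quality

# At the boundaries, the output is blocked from going wrong...
assert degrade(0) == 0
assert improve(50) == 50
# ...and an input outside the boundaries stays at its level of
# wrongness without getting worse - and without an error message.
assert degrade(-3) == -3
assert improve(60) == 60
```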

The two examples together bring out a common pattern we see in testing: sometimes what we have is what we intended to have, and it fits our requirements. However, we can easily imagine a better way of interpreting that requirement, and would start a discussion on it as a bug - one we know will stretch some folks' ideas of what a bug is.

Tuesday, April 7, 2020

Developer-centric way of working with three flight levels

The way we have been working for the last years can best be described as a developer-centric way of working.

Where other people draw processes as filters from customer idea to the development machinery, the way I illustrate our process is like I have always illustrated exploratory testing - putting the main actor in the center. With the main actor, good things either become reality or they fail doing so.

In the center of software development are two people: 
  • the customer willing to pay for whatever value the software is creating for them
  • the developer turning ideas into code
Without code, the software isn't providing value. Without good ideas of value, we are likely to miss the mark for the customer. 

Even with developers in the center, they are not alone. There's other developers, and other roles: testers, ux, managers, sales, support, just to mention a few. But it is with the developer that an idea either gets turned into code and moved through a release machinery (also code) or doesn't, and only through making changes to what we already have can something new be done with our software.

As I describe our way of working, I often focus on the idea of having no product owner. We have product experts and specialists crunching through data from a versatile customer base, but given those numbers, a product owner isn't deciding what the next feature to implement is. The data drives the team to make those decisions. At the Brewing Agile conference, I realized there was another way of modeling our approach that would benefit people trying to understand what we do and how we get it done: flight levels.

Flight levels is an idea that when describing our (agile) process, we need to address it from three perspectives:
  1. Doing the work - the team level perspective, often a focus in how we describe agile team working and finding the ways to work together
  2. Getting the work coordinated - the cross team level of getting things done in scale larger than a single team
  3. Leading the business - the level where we define why the organization exists and what value it exists to create and turn into positive finances
As I say "no product owner", I have previously explained all of this dynamic in the level of the team doing the work, leaving out the two other levels. But I have come to realize that the two other levels are perhaps more insightful than the first.

For getting work coordinated, we build a network from every single team member to other people in the organization. When I recognize a colleague is often transmitting messages from another development team in test automation, I recognize they fill that hole and I serve my team focusing on another connection. We share what connections give us, and I think of this as "going fishing and bringing the fish home". The fluency and trust in not having to be part of all conversations but tapping into a distributed conversation model is the engine that enables a lot of what we achieve. 

For leading the business, we listen to our company high level management and our business metrics, comparing to telemetry we can create from the product. Even if the mechanism is more of broadcast and verify, than co-create, we seem to think of the management as a group serving an important purpose of guiding the frame in which we all work for a shared goal. This third level is like connecting networks serving different purposes. 

The three levels in place, implicitly, enabled us to be more successful than others around us. Not just the sense of ownership and excellence of skills, but the system that supports it and is quite different from what you might usually expect to see. 

Saturday, March 28, 2020

A Python Koans Learning Experiment

I'm curious by nature. And when I say curious, I mean I have hard time sticking to doing what I was doing because I keep discovering other things.

When I'm curious while I test, I call it exploratory testing. It leads me to discover information other people benefit from, and would be without if I didn't share my insights.

When I'm curious while I learn a programming language, I find myself having trouble completing what I intended, and come off a learning activity with a thousand more things to learn. And having a good plan isn't half the work done, it is not having started the work.

On my list of activities I want to complete on learning Python, I have had Python Koans. Today I want to complete that activity by reporting on its completion and what I learned with it.

Getting Set Up

The Python Koans I wanted to do were ones created by Felienne Hermans. On this round of learning yet-another-programming-language (I survived many with passing grades at Helsinki University of Technology as a Computer Science major), I knew what I wanted to do. I picked Koans as the learning mechanism because:
  • Discovery learning: learning sticks in me much better when instead of handing me theory to read, I get examples illustrating something and I discover the topic myself
  • Small steps: making steady progress through the material over getting stuck on a concept - while Koans grow, they are usually more like a flashlight pointed at topics than a significant step between one and the next
  • Test first: as failing test cases, they motivate a tester like myself to discover puzzles in a familiar context
  • Great activity paired: social learning and learning through another person's eyes in addition to one's own is highly motivating. 
  • Exploratory programming: you do what you need to do, but you can also do what else you learn you need to do. Experimenting away from whatever you have, towards deeper understanding, works for me. 
This time I found a pair and mechanism that worked to get us through it. Searching for another learner with similar background (other languages, tester) on Twitter paired me up with Mesut Durukal, and we worked pretty consistently an hour a day until we completed the whole thing in 15 hours. 

The way we worked together was sharing screen and solving the Koans actively together. After completing each, we would explore around the concept with different values or extending with lessons we had learned earlier in Koans, testing if what we thought was true was true. And we wrote down our lessons after each Koan on a shared document.

The Learning

Being able to look back at doing this, through the document as well as tweets, two months after we completed the exercise is interesting. I picked up some key insights from Twitter.

Looking at our private document, the numbers are fascinating: 382 observations of learning something.

With 15 hours, that gives us an average of 25 things in an hour.

On top of those 15 hours, I had a colleague wanting to discuss our learning activity, and multiple whiteboarding sessions to discuss differences of languages the learning activity inspired.

Next up, I have so many options for learning activities. Better not make promises, because no matter how publicly I promise, the only thing keeping me accountable is activities that we complete together. Thanks for the super-fun learning with you, Mesut!

Users test your code

In a session on introduction to testing (not testers), I simplified my story to having two kinds of testing:

  • Your pair of eyes on seeing problems
  • Someone else's pair of eyes on seeing problems
My own experience, in 99% of what I have ended up doing over my 25-year career, is that I'm providing that second pair of eyes, and working as that has made me a tester by profession.

Sometimes the second pair of eyes spends only a moment on your code as they make their own changes adding features (another developer), and you do what you do for testing yourself. Sometimes it becomes more of a specialty (tester). And while the second pair of eyes is often used to bring in perspectives you may be lacking (domain knowledge), there is nothing preventing that second pair of eyes from having as strong or stronger programming knowledge than you have.

You may not even notice your company has a second pair of eyes, as there's you and then production. Then whatever you did not test gets tested in production, by the users. And it is only a problem if they complain about the quality, with feelings strong enough to act on.

To avoid the complaints, or extensive testing done slowly after making changes, modern developers write tests as code. As any second pair of eyes notices something is missing, while adding that, we also add tests as code. And we run them, all the time. Being able to rely on them is almost less about testing and quality, and more about the peace of mind to move faster.

In the last year or so, my team's developers have gotten to a point where they no longer benefit from having a tester around - assuming a non-programmer tester covering the features like a user would. While one is around, it is easy to not do the work yourself, creating the self-fulfilling prophecy of needing one. 

Over and over again, I go back to thinking of one of my favorite quotes:
"Future is already here, it is just not equally divided"
I believe future is without testers, with programmers co-creating both application software and software that tests it. I believe I live at least one foot in that future. It does not mean that I might not find myself using 80% of my time testing and creating testing systems. It means that the division is more fluid, and we are all encouraged to grow to contribute beyond our abilities of the day.

The past was without testers, but it was also without testing. To see the difference between past and future, you need to see how the customer perceives value and speed. Testing (not testers) is the way to improve both.

Wednesday, March 25, 2020

One Eight Fraction of a Tester

As I was browsing through LinkedIn, I spotted a post with an important message. With appropriate emphasis, the post delivered its intended point: TEST AUTOMATION IS A FULL TIME JOB. I agree. 

The post, however, brought me in touch with a modeling problem I was working through, for work. How would I explain that the four testers we had, were all valuable yet so very different? The difference was not in their seniority - all four are seniors, with years and years of experience. But it is in where we focus. Because, TEST AUTOMATION IS A FULL TIME JOB. But also, because OTHER TESTING IS A FULL TIME JOB. 

As part of me pondering this all, I posted on Twitter: 

The post started a lively discussion on where (manual) testers are moving, naming the two directions: quality coaches teaching others to build and test quality, and product owners confirming the features they commissioned.

The Model of One Eight Fraction of a Tester

Taking the concepts I was using to clarify my thinking about different testers, a discussion with Tatu Aalto, over a lovely refreshing beverage enjoyed remotely together, drew the mental image of a model I could use to explain what we have. With two dimensions of 4x2 boxes, I'm naming the model "One Eight Fraction of a Tester".

1st Data Point

In our team, we have six developers and only one full-time manual tester. I use the word manual very intentionally, to emphasize that they don't read or write code. They are too busy with other work! The other work comes from the 6 super-fast developers (who also test their own things, and do it well!) and 50+ other developers working in the same product ecosystem. Just listing what goes on as changes on a daily basis is a lot of work, let alone seeing those changes in action. Even when you leave all regression testing for automation. 

The concern here is that story and release testing could, in our context, both be intertwined with creating test automation. Level 1 testing, seeing features with human eyes, could also happen while creating automation. 

Yet as the context goes, it is really easy to find oneself in the wheel, chipping away at level 1 story testing - "I saw it work, maybe even a few times" - story after story, and then repeating pieces of it with releases. 

2nd Data Point 

A full time exploratory tester in the team, taking a long hard look at where their time goes, is now confessing that the amount of testing they get done is small and the testing is level 1 in nature. The coverage of stories and releases is far from what a tester focusing there full time could reach. Instead, where the time goes is enabling others in building the right thing incrementally (the product owner perspective) and creating space for great testing to happen (the quality coach perspective). While they read code, they struggle to find time to write it, and they use code for targeted coaching rather than automating or testing.

The concern here is that no testing is getting done by the tester themselves. Even if they could do deeper story testing, they never practically find the time. 

As the context goes, they are in a wheel that they aren't escaping, even if they recognize they are in it.  

3rd Data Point

A most valued professional in the team, the spine of most things testing, is the test automation specialist. They find themselves recognizing tests we don't yet have and turning those ideas into code. While they've found, with the support of the whole team and particularly developers, time to add to coverage and not only keep things functional, maintenance of tests and coordinating it is a significant chunk of their work. While they automate, they test the same thing manually. While they run the automation, they watch it run to spot visual problems that programmatic checks are hard to create for. That is their form of "manual testing" - watch it run and focus on things other than what the script does. 

The concern here is that all testing is level 1. Well, with the number of stories flying around, even with all groups of developers having someone like this writing executable documentation on expectations, they still have a lot of work as is.

As context goes, they too are in a wheel of their own with their idea of priorities that make sense.

4th Data Point

Automation and infrastructure is a significant enabler, and it does not stay around any more than any other software unless it is maintained and further developed. The test automation programmer creates and maintains a script here and there, tests a thing here and there, but finds that creating the new functionality we could all benefit from needs someone to volunteer for it. Be it turning a manually configured Jenkins into code in a repository, or our most beloved test automation telemetry to deal with the scale, there is work to be done. As frameworks are at their best when used by many, they make their way to sharing and enabling others too.

The concern here is that no testing gets done with a framework alone. But without the framework, testing is also slower and more difficult than it should be. There are always at least three main infrastructure contributions they could make when they can fit one into their schedule, like any developer. 

They have a wheel of their own they are spinning and involving everyone in. 

Combining the data points

In a team of 10 people, we have 10 testers, because every single developer is a tester. With the four generalizing specializing testers, we cover quite many of the Eights.
The concern here is that we are not always intentional in how we design this to work; it is more a product of being lucky with very different people.

The question remains for me: is the "Story Testing lvl 10" as necessary and needed as I would like to believe it is? Is the "Story Testing lvl 1" as unnecessary to separate from automation creation as I believe it is? And how do things change when one is pulled out - who will step up to fill the gaps?

How do you model your team's testing?

Monday, February 10, 2020

Business Value Game - What if You Believed Value is Defined by Customer, Delivery-time?

Over the years of watching projects unfold, I've grown uneasy with how difficult it is to grasp that while we can ask the customer what they want in advance, we really know the value they experience only after we have delivered. All too often "agile" has ended up meaning we optimize for being busy and doing something, anything, and find it difficult to focus on learning about the value. To teach this through experience, I've been creating a business value game that moves the focus to learning about value.

We played this game at European Testing Conference, and it reminded me that even I forget how to run the game after some months of not doing it. Better write about it then!

Crystal Mbanefo took a picture of us playing at European Testing Conference. 

The Game Setup

You need:

  • 5 colors of token, 25 tokens each color
  • "Customer secrets" - value rules for the customer's eyes only, where some value is 
    • Positive
    • Negative
    • Changes for the color 
    • Depends on another color
  • Precalculated "project max budget" - the best value the team can achieve by learning the rules of how the customer values things
  • Placeholders for each month of value delivered on the table
  • Timer to run multiple rounds: each round is a 3-minute, 6-month project (30 seconds is a month, reflected by the placeholders on the table), with reflection / redesign time between rounds, 60-90 minutes in total.
More specific setup:
  • Create 5 batches of "work", each batch with 5 tokens of each of the 5 colors
  • Place post-its in front of where customer is sitting so that work can be delivered 
  • Hand "customer secrets" to customer and allow them to clarify with you how their value calculation rules work
  • Post "project max budget" on a whiteboard as reference
  • Explain rules:
    • 6 people will need to do the work
    • The work is flipping a chip around with left hand
    • The work is passed forward in batches, starting batch size is 25
    • After one finishes the work, the other can start
    • Only value at the customer is paid for, and the customer is available at the end of the 6-month project to announce and make the payment. 
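To make the customer-side mechanics concrete, here is one made-up instance of the "customer secrets" as code. The colors, numbers, and rules below are invented for illustration; every run of the game can use different ones:

```python
# Invented value rules covering the four kinds listed in the setup:
# positive, negative, changing mid-project, and depending on another color.
SECRETS = {"red": 3, "blue": -1, "green": 2, "yellow": 1, "white": 2}

def value_of_delivery(chips, month):
    """Value the customer announces for a batch of chips delivered in a month."""
    counts = {}
    for color in chips:
        counts[color] = counts.get(color, 0) + 1
    total = 0
    for color, count in counts.items():
        worth = SECRETS[color]
        if color == "green" and month >= 4:           # value changes for this color
            worth = -2
        if color == "white" and "red" not in counts:  # depends on another color
            worth = 0
        total += worth * count
    return total
```

The team only discovers rules like these by delivering and listening - which is the whole point of the game.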

First round:

Usually on the first round, the focus is on the work and trying to get as much of it done under the given constraints as possible. With a large batch size moving through the system, it takes a long time before the team starts delivering. Usual conclusion: smaller batch size. 

During retrospective, we identify what they can and cannot change:
  • They cannot pre-arrange the chips, the work can only start when project starts. 
  • They can ask the customer to be available for announcing and making payments earlier. Monthly payments are easy to ask for. 
  • They can do the work of 6 flips in any order, but all 6 people doing the work must be involved in each work item before it is acceptable to the customer.
  • They can do smaller batches and order chips in any order they want - after the project has started.
  • They can use only one hand, but do not need to limit themselves to the left hand. 

Second round:

Different groups take the batch-size idea to different extremes on round 2. While a batch size of 1 would seem smart and obvious, a lot of teams bring things down to batch size 5 first. It does not really matter; with either smaller batch size, what usually happens on round 2 is that the team delivers with a lot of energy, and all chips end up on the customer's side. The customer is overwhelmed with working out what anything is worth, so even if the team agreed on monthly payments, the customer can announce the value of a month only towards the end of the project. And with their focus on delivery, even when the customer manages to announce a value, the team does not listen or react.

At the end, the team earns less than the maximum value, regardless of their hard, focused work. We introduce the importance of learning, and how that takes time in the project.

During retrospective, they can identify the ways of working they agree to as a team to change the dynamic. Here I see a lot of variance. Usually batch size goes down, but teams struggle to control how the batches get delivered to the customer, or to listen to the customer's feedback. Often a single point of control gets created, and many of the workers stay idle while one person does the thinking.

Third-Fourth-Fifth rounds:

Depending on the team, we run more projects to allow experimenting with rules that work. We keep the customer secrets the same, so each "project" is not unique but yet another 6 months of doing the work under the same value rules. Many teams fail at creating a learning process, ending up with only a learning outcome for the rules at hand.

Final round:

On the last round, the customer creates new value rules, and the team tests whether their process now lets them learn during the project.

The hard parts: facilitation and the right secrets

I'm still finalizing the game design, and creating better rules for myself to facilitate this. A key part I still struggle with is choosing the right values for the "customer secrets". The negative values need to be large enough for the team to realize they are losing value by delivering things that take value away, and the dependent and changing values can't be so complex that the customer can't do the math.

I've usually used values in multiples of 100 000 euros because large numbers sound great, but fewer zeros would make the customer's life easier.

I play with poker chips because they have a nice, heavy feel for "work", but since carrying around 10 kg of poker chips isn't exactly travel-friendly, I have also created impromptu chips from 5 colors of post-it notes.

I'm also still optimizing how to combine delivering and learning; there is more than one way to set this up.

Let me know if you play this, would love to hear your experiences.