A Seasoned Tester's Crystal Ball: August 2020

Sunday, August 30, 2020

Pondering on Requirements

I remember the day on my career when I understood that Requirements was something special.

I had spent years with software testing where feedback was welcome even if the resulting changes got prioritized to wait, but this project was different.

I could have seen it coming from the advice people were giving on exact requirements traceability, like we were preparing for the war to defend what was rightfully ours.

I had tested a system to find out it had been built on a version of an open source component that was end of life, with problems that would as the project progressed lead us to a dead end with regards to our ability to react. It had been built in a way where the component change was not straightforward, quite the opposite. I reported this, getting the attention of all of my own organizations management team. We scheduled a meeting with the subcontractor's architect and he played the Requirements card. We had not been specifically saying that putting us into this position with a brand new in progress multi-million system should rely on something different.

My years with this company were filled with experiences like this. Continuous fights over contractual clauses. Meetings where we would discuss movement of money for yet another newly discovered Requirement like one saying that a machine intended to calculate monetary benefits should calculate at least roughly correctly. No, all of those were our mistakes in the Requirements.

Years passed, and I learned to choose my work so that instead of focusing on Requirements, we focused on value and features and progress. Requirements stayed in the role they should be: points of communication, aiming for mutual understanding and benefit to our customers. Useful for testing to know as the rough idea of what we were building, but not the focus or the limit of what there is for that system.

With agile, we learned that epics and stories were not requirements, they were a new kind of intermix of a mini project plan and something requirements-like. With continuous delivery, we could do small slices at a time, and running tested features supported by test automation and a caring team were our new normal.

When Requirements card gets played, it is now played to avoid responsibility on one side of the mutual relationship of building something good. It's played to say there needs to be a list and proof of covering all those, because someone expects something they are sure you cannot do without that. The cost of the proof - not just the direct work but the impact on being able to see things and be motivated - was irrelevant.

Friday, August 21, 2020

A Tester Hiring Experiment - Test with Them

For two summers in a row, for two different companies, I have been in the lovely position of being able to offer a temporary trainee position for some person for that summer. As one can imagine, when there is a true beginner position available, there are a lot of applicants.

Unfortunately, it is really hard to make a difference between an applicant and another when what you look for is potential. Our general approximations of potential are off, and the biases we have will impact our choices.

The first trainee that we selected for last summer came through the HR pipeline. The lucky 5 selected out of thousands to be viewed at the final stages were fascinating people to talk with.

First of all, they all had programming experience. I particularly remember a woman with 2 years in a position in another company that my co-interviewer rejected based on "not knowing this piece trivia means she does not know anything" and a 15-year-old boy with 6 years of programming experience. While I'm delighted the promising young man got a chance, that recruitment filled my heart with hopelessness for anyone who starts later in life.

Again, we are recruiting on potential. Obviously we want the person to contribute in the work. But we want also the person to learn, to grow, and become someone they are not yet when they start their work with us. And you can't see that potential from their past achievements when we talk about an entry level position.

Entry level positions are like placing bets. The chances are we will never know we missed an awesome candidate. Or, chances are, we will know later in life when that person we rejected on trivia shows up as our boss.

With my hopeless heart, I needed an experiment to bring out hope. To balance the last line of trainees all be men, I facilitated creation of another position open to everyone but primarily marketed in women's spaces. I did so well with targeting my marketing that men did not apply. Where you post matters on who you get.

Also, I wanted to try a different criteria. Instead of selecting them based on how they write and talk about their aspirations and experiences, I refused to read their CVs to prioritize who I would talk to. I talked to every single one of them, for 15 minutes. Or rather, as I set expectations already in the invite to apply, we wouldn't talk of them. We would pair test an application because the main skill I would look for in a candidate under my supervision is learning under my supervision.

Looks like I tweeted Fri Jun 21 2019 that I went through 28 people.

    "created_at" : "Fri Jun 21 08:29:11 +0000 2019",
      "full_text" : "Gilded Rose brought us ‘junior developer’, our summer intern. I made 
       28 people do it for 15 minutes each. Never read their CV. It shows interesting 
       things about how you approach a problem. https://t.co/iWebQMcJwx",
      "lang" : "en"

I saw the people on the video calls as we worked together on the same problem, over and over again with the candidate changing. I would write notes of how they approaches the problem and how they incorporated my guidance as we were strong-style pairing on test code. And after a lot of calls, I had a small group of people with tester kind of thinking and learning pattern that I would select from. The final one was based on luck, I put the three names in a hat and pulled one out.

Turns out she was a 47-year old career changer. The moment where I felt like I should have given this rare opportunity to a younger woman was a great revelation on my personal built-in ageism. From acknowledging my bias, I set out to help her succeed.

During the summer, she learned to write test code in python and include that into a continuous integration system. She explored and analyzed a feature, and got multiple things in it corrected. The tests she contributed were our choice, and in hindsight, our choice sucked. She was part of starting a larger discussion on what types of tests are worth it, and what are little value. Her coded tests didn't fail because she couldn't code or analyze a feature, but because we pointed her at a feature that we really should have thought twice on.

The lessons I drew of this are invaluable to me:

Choosing a tester by testing with them is a better foundation
Choosing a tester by testing can happen in short sessions and overall time is better used in this activity over deciphering a CV
The work we allocate to someone starting as new does matter, and their success is founded on our choices
Diversity of our work force will never change if we expect our 15-year old summer trainees to come with 6 years of programming experience. The field evidence shows that late start does not hinder later usefulness.

So this summer, my experiment has been around how I teach at work. I throw new people at the versatility of real project and protect their corner less. I work to make myself somewhat available on moving them forward. And I am delighted with the results I watch after two months of attending on how well they do the basic tester job: finding information, driving fixes and doing some themselves, and automating with a selection of multiple programming languages. Obviously they have more work to do on learning, but so do I, and I have been on this for 25 years.

Saturday, August 8, 2020

Recall Heuristics for Test Design

Good exploratory testing balances our choices what to do now so that whenever we are out of time, we've done the best job testing we could in the time we were given, and are capable of having a conversation about our ideas of risks we have not assessed. To balance choices, we need to know there are choices and recently I have observed that the amount of choices some testers make is limited. A lot of what we call test design nowadays is recalling information to make informed selections. Just like they say:

If the only tool you know is a hammer, everything starts to look like a nail.

We could add an exploratory testing disillusionment corollary:

It's not just that everything starts to look like a nail, we are only capable of noticing nails.

The most common nail of testers that I see is the error handling cases of any functionality. This balances the idea that most common nail programmers see is the sunny day scenario of any functionality, and with the two roles working together, we already have a little better coverage over functionality in general.

To avoid the one ingredient recipe, we need awareness of all kinds of ingredients. We need to know a wide selection of options for how to document our testing from writing instructional test cases, to making freeform notes to making structural notes on individual level to making structural notes on group level to documenting tests as automation as we are doing it. We need to know a selection of coverage perspectives. We need to know that while we are creating programs in code, they are made for people and a wide variety of people and societal disciplines from social sciences to economics to legal apply. We need to know relevant ways things have failed before, being well versed in both generally available bug folklore as well as local bug folklore, and to consider both not failing the same way, but also not allowing our past failures to limit our future potential and drive testing by risk, not fear.

This all comes down to the moment you sit in a team meeting, and you do backlog refinement over the new functionality your team is about to work on. What are the tasks you ensure the list includes so that testing gets done?

In that moment, what I find useful being put on the spot is recall heuristics. Something that helps me remember and explain my thoughts in a team setting. We can't make a decision in the moment, without knowing our options.

I find I use three different levels of recall heuristics to explore what I need to recall my options in a moment. Each level explores at a different level of abstraction:

change: starting from a baseline where the code worked, a lot of times what I get to test is on a level of code commit to trunk (or about to head to trunk).
story: starting from a supposingly vertical slice of a feature, a user story. In my experience though people are really bad at story-based development in teams, and this abstraction is available rarely even if it is often presented as the go-to level for agile teams.
feature: starting from value collection in the hands of customers where we all can buy into the idea of enabling new functionality.

For a story level recall heuristic, I really like what Anne-Marie Charrett has offered in her post here. Simultaneously, I am in a position of not seeing much of story-based development but backlogs around me tend to be on value items (features and capabilities) and the story format not considered essential.

Recall on level of change

The trigger for this level of recall is a chance in code. Not a Jira ticket, but seeing lines of code change with a comment that describes the programmer intent for the change.

Sometimes this happens in a situation of pairing, on the programmer's computer, the two of you working together on a change.

Sometimes this happens on a pull request, someone having made a change and asking for approval to merge it to trunk.

Sometimes this happens on seeing a pull request merged and thus available in the test environment.

This moment of recall happens many times a day, and you thinking quickly on your feet under unknown change is a difference in fast feedback and delayed feedback.

How I recall here:

(I) intent: What is supposed to be different?
(S) scope: How much code changed? Focused or dispersed?
(F) fingerprint: Whose change, what track record?
(O) on it: How do I see it work?
(A) around it: How do I see other potentially connected things still work?

Recall on level of feature

The trigger for this level of recall is need of test planning on a scale of feature to facilitate programmers carrying their share of testing but also to make space for testing.

Sometimes this happens in a backlog refinement meeting, the whole team brainstorming how we would test a feature.

Sometimes this happens in a pair, coming up with ideas of what we'd want to see tested.

Sometimes this happens alone, thinking through the work that needs doing for a new feature when the work list is formed by process implying "testing" happens on every story ticket and epic ticket level without agreeing what it specifically would mean.

(L) Learning: Where can we get more information about this: documents, domain understanding, customer contacts.
(A) Architecture: What building it means for us, what changes and what new comes in, what stays.
(F) Functionality: What does it do and where's the value? How do we see value in monitoring?
(P) Parafunctional: Not just that it works, but how: usability, accessibility, security, reliability, performance...
(D) Data: What information gets saved temporarily, retained, and where. How do we create what we need in terms of data?
(E) Environment: What does it rely on? How we get to see it in growing pieces, and where?
(S) Stakeholders: People we hold space for. Not just users/customers but also our support, our documentation, our business management.
(L) Lifecycle: Features connect to processes, in time. Not just once but many times.
(I) Integrations: Other folks things we rely on.

Recalling helps make choices as we are aware of our choices. It helps call in help in making those choices.

Monday, August 3, 2020

An Analysis of Exploratory Testing

Open space conferences like Socrates UK Digital Summer provide a great platform for making a little progress on finding ways to teach about exploratory testing in writing. For purposes of writing, I run an ensemble testing session to compare notes of what I did in prep alone vs. where the group ends up. Putting the two together could provide useful lessons for those who did not get to join.

For these sessions, I picked up a new test target. Eviltester posted some of his testing apps and games a while back, and EPrimer ended up as my choice as it promised

Not heavy on bugs - could actually focus on testing instead of bug reporting
Completely unknown domain: proper English language writing style "eprime" I had never heard of.
WebUI with beautiful IDs

At this point, I encourage you to follow the link to the app and stop reading what I say before you tried it out yourself. If you did not follow my encouragement, I suggest that after reading this, pick up another of the eviltester test targets and apply what you learned here.

Session Charter: Explore EPrimer focusing on two kinds of documentation as output: test automation you can run (using e.g Robot Framework) and a mindmap. Time: 1 hour plus learning time for test automation tool if you have no experience and no expert available answering your questions in the moment.

Two sessions, two results

As expected, the two session provided very different results that complement one another.

Session one produced ~30 tests one can run again, spread over 7 test suites, each named on the type of collection of data it was testing and a mindmap on realization that all tests were on single function while there were multiple but covered the domain description as specification well, identifying multiple problems against specification.

Session two produced 5 tests one can run again, all in 1 test suite where a bit of commenting out is necessary to get the tests to run later. The coverage of functions was significantly better and the session identified 2 bugs. No mindmap was created and better function coverage came from choosing to understand everything a little and not diving systematically into specification. Single created test covered more ground.

Breakdown of activities

Whenever we are doing exploratory testing, we get to make choices of where we use our limited time based on the best information available at the time of testing. We are expected to intertwine various activities, and when learning, it may be easier to learn one activity at a time before intertwining them.

If you think back to learning to drive (while stick gear was a thing), you probably have ended up in an intersection, about to move forward and your car shutting down as intertwining your actions with the gear and pedals were not quite as they should. You slowed down, made space for each activity and got the car moving again. Exploratory testing is like that, you control the pace and those who have practiced long will be intertwining activities in a way that appears magical.

For this testing target, we had multiple activities we needed to intertwine (learn / design / execute):

Quickly acquiring domain knowledge: no one knew what eprime is, and we had our choice of reading about it.
Acquiring functional knowledge: using the application and figuring out what it does.
Creating simple scripts with multiple inputs and outputs: using same test as template for data-driven helps repeat similar cases in groups.
Identifying css selectors: if you wanted test automation scripts, you needed to figure out what to click and verify and how to refer to those from the scripts.
Controlling scope of tests: see it yourself, see it blink with automation, repeat all, repeat only the latest.
Creating an invisible or visible model: Seeing SFDPOT (Structure, Function, Data, Platform, Operations, Time) to understand coverage in selected or multiple dimensions
Cleaning up test automation: Improving naming and structuring to make more sense than what was created in the moment.
Using the application: Making space for chances to see problems beyond the immediate test
Identifying problems: Recognizing problems with the application.
Documenting problems: Writing down problems in either test automation or otherwise.
Working to systematic coverage: Pick a dimension, and systematically cover it learning more on it.
Reading the code: We had the code and we could read it. That could add to our understanding.

Taking another two hours on top of the two hours on cleaning up the results. I summarized final results like this:

After 2 hours of #ExploratoryTesting documented with automation, it took another 2 hours to clean it up. 52 tests total, 38 passed, 14 failed. Quite many bugs as per specification. Time on app has been maybe 30 minutes of this 4 hours.
— Maaret Pyhäjärvi (@maaretp) August 1, 2020

The app isn't completely tested, as the exercise setting biased us towards documentation.

Examples of activities

Quickly acquiring domain knowledge. Reading the specification of eprime. Focusing on examples of eprime. Refreshing knowledge of English grammar around the verb "to be" e.g. 5 basic forms of verbs and 6 different types of verbs, or 6 categories of verbs - all things I googled for as I am writing this. While the specification tells what knowledge was probably used to create the application, there is domain knowledge beyond what people choose to write down in specification.

Acquiring functional knowledge. Using the application. Asking questions about what is visible, particularly the concepts that are no obvious: 'What is Possible Violations?". Seeking data demonstrating it could work. Seeking large data to demonstrate functions through serendipity.

Creating simple scripts with multiple inputs and outputs. Writing test automation that allows for giving multiple input and output values as parameters. Getting into the tool and into using the tool.

Identifying css selectors. Getting to various values with code, understanding what is there in different functions to click and check. Feeling joy systematic ID use making the work easier. Recognizing conflicts in UI language and selector language.

Controlling scope of tests. Moving tests to separate files. Commenting out tests that already work. Running tests one by one.

Creating an invisible or visible model. Ensuring we see all things work once before we dig deeper in any individually. Creating a map of learning in the last minutes of the session. Writing down notes of what we are seeing as either test automation or other types of documents.

Cleaning up test automation. Rename everything named foo at time when we knew the least. Comment out things to focus on the next thing getting done efficiently. Using domain concepts as names of collections.

Using the application. Spending time using the application to allow for serendipity. Observing look and feel. Observing selected terminology.

Identifying problems. Seeing things that don't work. Like visual of it that is very bare-bones. Or line breaks turning valid positives into false negatives.

Documenting problems. Writing these down in test automation. Figuring out it we want to leave it passing (documenting production behavior) or failing (documenting bugs). Remembering issues to mention. Writing them down as notes. Writing a proper bug report.

Working to systematic coverage. Stopping to compare models to how well those are covered. Creating a visual model. Covering everything in the specification. Covering all visible functionality.

Reading the code. Closing the window that has the code as it gets in the way of moving between windows. Reading the code to see how concepts were implemented.

Some Reflections

Every time I teach exploratory testing, I feel I should find ways of teaching each activity separately. There is a lot going on at the same time, and part of its effectiveness is exactly that.

In the group, someone suggested we could split the activity so that we first only document as test automation, not caring about any of the information other than what is true right now in the application. Then we could later review it against specifications and domain knowledge. That could work. It would definitely work as one of the many mixes when exploring - change is the only constant. This split is what approvaltesting is founded on, yet I find that I see different things when I use the application and create documentation intertwined, than receiving documentation that I could review. One night in between the actions is enough for me to turn into a different person.

The Final Deliverables

In the last minutes of one of the sessions, I cooked up a mindmap of what was in my head on the application. I had only covered a small portion, focusing on counting Discouraged words.