Sunday, January 3, 2021

Contemporary Exploratory Testing

We all know what testing looks like: it's when we hunt information, using the application, its interfaces and all existing information as our external imagination to go even deeper in empirical understanding of what is true and what is an illusion. It involves a person or multiple people, using programming to get to places in time, but it is all framed in this quest of someone learning what more we could try knowing.

Yet when organizations set up "testing", they ask for resources, pay limited attention to skills, focus on plans, covering requirements, writing and executing test cases, and cutting down on testing when project schedules fall short.

When we say "all testing is exploratory", we have the individual tester on their good day in mind. I call this exploratory testing, the verb. The act of testing is inherently exploratory when people are involved. No matter how strict a test case you gave, how much you told them not to leave the path described, the human mind wanders and the human fingers make mistakes, revealing more than the test intended.

The organizational frame however makes a huge difference on how that inherently exploratory testing takes place, and how much of its wings are cut. I call this exploratory testing, the noun. Organizations often set up frames for testing that are far from exploratory, and get results that are far from what we would mean by results of exploratory testing. 

Exploratory testing (verb AND noun) is the focus of my learning and teaching. I want to create excellent products with excellent testing, and I feel that the 36-year-old coining of the term needs a revisit from its watered down interpretations. Thus I sometimes add one more word to explain what I am aiming for: contemporary exploratory testing.

Contemporary is about today. I use that word to get away from the ideas of ISTQB and agile testing where exploratory testing was considered a technique, a thing you do for unknown unknowns on top of all the other testing recipes. For me, for the last 25 years of my career, it has been an approach, a foundation of agency, learning and systems thinking that frames my choices of using all the other recipes. And I believe that is what it is at its roots, so calling my approach contemporary is saying I want to take a step away from some of the popular notions of it being the idea of just spending time with an application to find bugs.

With this new year, I welcome you to join my exploration of how to understand and teach contemporary exploratory testing. I have many subprojects on it:

  • Exploratory Testing Podcast. I just published my first episode yesterday, and will continue to do so on a monthly cadence. 
  • Exploratory Testing Academy. I work with Parveen Khan (UK), Angela Riggs (US), Irja Straus (Croatia) and Mirja Pyhäjärvi (Finland) to create free testing video courses and a series of paid facilitated courses with focus on learning by doing testing under various constraints for various systems under test. 
  • Exploratory Testing Book. I will finish the book I started, so that you don't have to read all the individual articles and piece together a timeline of where I am now compared to what I wrote earlier.
  • Exploratory Testing Slack. I want to bring together people that are figuring out contemporary exploratory testing. I take it more as a forum of practitioners than a forum of consultants. 
  • Exploratory Testing Twitter account. I use this for promotions, because the one advice I was given on marketing (collect people's emails) is the one advice I don't want to use. I want pull over push, even if it is against all things marketing. 
I have a full time job at a company as a tester. My job pays well, better than average developers, and I have my company specific goals there. I do this all because I feel the one thing I should do better is scale. I make my work available for free to support scale. I have things I could use extra money on, but I leave the payment part for the community based on value. You can pay (but don't have to) for my book in progress. You can pay by buying me coffee. Since I am Finnish, you can never donate me money - only pay for my services. But I want that to be optional, and work against paywalls in my own little way. 

If out of this I create more great testers, more testing trainers, more love of testing in both professional testers and programmers - my aspirations are fulfilled. 

Tuesday, December 29, 2020

I like numbers

This feels like a dirty secret, but it becomes very clear if you ever work with me. I like numbers. 

I count things, some more relevant than others. I usually know how many times someone called me a 'guy' on a particular day, because it's my calming mechanism of having to deal with that annoyance. I know I delivered 23 external conference sessions in 2020, I know my team is about to do release number 24 before the year ends, I know we've processed a little over 300 Jira tickets (which I consider a few full weeks of wasted time) and I know how many items of value I would say they actually include since I reverse engineered value out of the work we did this year. 

The numbers are information. They are only one type of information. But they are information easily used to impress people. And they are a tool for getting my message across stronger than most people. 

For the past few jobs, I have done end of year numbers. Since anything I work on right now is kind of secret, I'll just go traveling back in time to show what type of numbers I collected in the two previous years, to run an annual comparison.

  • Team size. How many people delivering the work I was analyzing. 
  • Monthly Active Devices (MAD). The first number I was curious about was how many customers the team was serving with the same people. Being a DevOps team meant that the team did development and deployment of new changes, but also provided support for an ever-growing customer base we calculated in impressively large numbers. Telemetry was an invaluable source for this information. It was not a number of money coming in. It was people using the product successfully. 
  • Work done represented in Jira tickets. I was trying hard to use Jira only as an inbox of work coming from elsewhere outside the immediate team, and for the most part I succeeded with that and messed up all my chances of showing all work in Jira ticket numbers (I consider this a success!). About a third of visible ticket work done was maintenance, responding to real customer issues and queries. Two thirds were internally sourced. 
  • Work coordinated represented in Jira tickets. Other teams were much stricter in not accepting work from hallway conversations, and we often found ourselves in a role of caring for the work that others in the overall ecosystem should do. Funny enough, numbers showed that for every 2 tickets we had worked on ourselves, we had created 3 for other teams. The number showed our growing role in ensuring other teams understood what hopes were directed towards them. It was also fascinating to realize that 70% of the work we had identified for others was done within the same year, indicating that it wasn't just empty passing of ideas but a major driving force effort. 
  • Code changes. With the idea that for a DevOps team nothing changes unless the code (including configurations) changes, I looked around for numbers of code going into the product. I counted how many people contributed to the codebases and noted it was growing, and I counted how many separate codebases there were and that that too was growing. I counted the number of changes to the product, and saw it double year over year. I noted that for 4 changes to the product, we had 3 changes to system level test automation. I noted code sharing had increased. Year over year numbers were a delight: from 16% to 41% (people committing to over N components) and from 22% to 43% (more than M people committing on them) on the two perspectives of sharing I sampled. I checked my team was a quarter of the people working on the product line, and yet we had contributed 44% of changes. I compared changes to Jira tickets to learn that for each Jira ticket, we had 6 changes in. Better use the time on changing code than managing Jira, I would say. 
  • Releases. I counted releases, and combinations included in releases. If I wanted to show a smaller number, I just counted how many times we completed the process: 9 - number that is published with the NEXTA article we wrote on our test automation experience. 
  • Features pending on next team. I counted that while we had 16 of them a year before, we had none with the new process of taking full benefit of all code being changeable - including that owned by other teams. Writing code over writing tickets for anything of priority to our customer segment. 
  • Features delivered. I reverse engineered out the features from the ticket and change numbers, and got to yet another (smaller) number. 
  • Daily tests run. I counted how many tests we had now running on a daily basis. Again information that is published - 200 000. 
So you see, numbers are everywhere. They give you glimpses to what might be happening, but at the same time they are totally unreliable. If you have a critical mind and good understanding on their downsides, looking at them may be good. 
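
As a minimal sketch of the arithmetic behind a few of those comparisons: the only inputs taken from the list above are the quarter of the people, the 44% of changes, the 2 to 3 ticket ratio and the 4 to 3 change ratio. The function names are made up for illustration, and a ratio like this is still only a glimpse.

```python
from fractions import Fraction


def share_ratio(share_of_output, share_of_people):
    """How much a group contributes compared to what headcount alone would predict."""
    return share_of_output / share_of_people


def per_unit(count, per):
    """Plain ratio, e.g. tickets created for others per ticket worked on ourselves."""
    return count / per


# A quarter of the people, 44% of the changes.
team_ratio = share_ratio(Fraction(44, 100), Fraction(1, 4))

# For every 2 tickets worked on ourselves, 3 created for other teams.
tickets_for_others_per_own = per_unit(3, 2)

# For every 4 product changes, 3 changes to system level test automation.
test_changes_per_product_change = per_unit(3, 4)

print(float(team_ratio))                # 1.76
print(tickets_for_others_per_own)       # 1.5
print(test_changes_per_product_change)  # 0.75
```

Read this way, the team made changes at 1.76 times the rate its share of people would suggest, which says nothing yet about whether those changes were the valuable ones.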

Going back in time even more, I find my favorite numbers: how I ended up having to log fewer and fewer bugs as the team's tester. Going from impressive numbers showing I found 8.5 bugs for every day of the year to having almost none logged, as I moved to fix-and-forget and pairing with developers on fixes, gives me a nice fuzzy feeling that turning my work invisible was a real improvement.

Ask your question, and numbers may help. Numbers first - or comparable numbers between different teams - usually cause havoc. 

So like numbers, like me. But be careful out there. Even if they say you get what you measure, it is only true sometimes. We can talk about that another time. 

Thursday, December 17, 2020

The box with Christmas Ornaments

There is a fascinating way of coming to the idea that the problem is almost always testing. Here's a little story of something that has happened to me many times in many organizations, and that I was recently inspired to think about. Maybe it is because it is almost Christmas. :)

Speaking in metaphors, the box with Christmas Ornaments inside. 

Once upon a time, there was a product owner who ordered a Box with Christmas Ornaments. As product owners go, they diligently logged into Jira their Epic describing acceptance criteria clearly outlining what the Box with Christmas Ornaments would look like delivered. 

The Developers and the Testers got busy with their respective work. Testers carefully reviewed the acceptance criteria that were co-created, and outlined their details of how testing would happen. Developers outlined the work they needed to do, split the work to pieces, and brilliantly communicated to testers which pieces were made available at each time. Testers cared and pinged on progress, but when things aren't complete, they are not complete.

The test environment for the delivery was a large table. As pieces were ready from the Developers, their CI system delivered an updated version into the middle of a table. The Box with Ornaments was first a pile of cardboard, and everyone could see it was not there yet. But as work progressed, the cardboard turned into a Box, without the Ornaments. As per status, pieces were delivered (and tested), but clear parts of the overall delivery were still undone.

Asking the status and wanting to be positive, Developers would report on each piece completed, and the Box on the table looked like it was there. It was there quite some time. Asking status from testers on testing, they would learn that testing was incomplete, and it was so easy to forget that there are scenarios that required both the Box and the Ornaments to make sense of the final item, even if we could and had tested to learn about each individually. 

The product owner, equipped with their Epic in Jira looking towards the table concluded: 

Things get stuck in the process. They are long in an intermediate stage. It feels like they don't care about delivering me my package, they just leave it lying around for testing. 

It's not like they ordered the Box without Ornaments. Yet they feel it looks ready enough that putting the Ornaments in is extra wait time. 

To achieve flow of ready to the hands of whoever is expecting, optimizing developer time between multiple deliveries really does the negative trick. Yet we still, in so many cases, consider this to be a problem with testing.

I know how to fix it. Let's deliver it as soon as developer says so. No more Testers in the place you imagine them - between implementation and you having that feature at your hands. 

A better fix is to deliver the empty box all the way to the customer as it is ready, and carefully think if the thing they really wanted was the Ornaments, and if another order of delivery would have made more sense. 

Tuesday, December 8, 2020

RED green refactor and system test automation

 In companies, I mostly see two patterns with regards to red in test automation radiators:

  1. Fear of Red. We do whatever we can, including being afraid of change to avoid red. Red does not belong here. Red means failure. Red means I did not test before my changes can be seen by others. 
  2. Ignorance of Red. We analyze red, and let it hang around a little too long, without sufficient care on the idea that one known red hides an unknown red. 
I don't really want either of these. I would like to work in a way where we bring things back to a known (green / blue - pick your color) baseline, but seeing red is invitation to go dig deeper, to explore, to understand why. 
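
A minimal sketch of how one known red hides an unknown red, assuming a radiator that shows a single color per job. The test names and the `known_red` baseline are hypothetical.

```python
def job_color(failing_tests):
    """A radiator typically collapses a whole job into one color."""
    return "red" if failing_tests else "green"


def unknown_reds(known_red, current_red):
    """Failures that are not part of the already-analyzed baseline."""
    return current_red - known_red


# Hypothetical test names, for illustration only.
known_red = {"test_login"}
yesterday = {"test_login"}
today = {"test_login", "test_checkout"}

# The radiator looks the same on both days...
print(job_color(yesterday), job_color(today))

# ...while only the comparison to the baseline reveals the new failure.
print(unknown_reds(known_red, yesterday))
print(unknown_reds(known_red, today))
```

Bringing the job back to a known baseline is what makes the second failure visible without anyone having to do this comparison by hand.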

Red being associated with failure is a problem. With Red-Green-Refactor cycles, we want to start with red. Red is an essential part of the process. Trying to make progress with occasional fails is better than not making progress to avoid all fails. 

The red allergy is allergy to failing. And allergy to failing is an indication that we don't have a mindset of appreciating learning. 

Instead of fearing red, we should fear green/blue. Red calls in for action, green/blue accepts a status quo. We can only trust green/blue if we have, in the process of creating the tests, seen red. Make it fail, make it pass, clean it up.
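
A minimal sketch of seeing red before trusting green, using Python's `unittest`. The `order_total` function and its stub are made-up examples, not anything from a real project.

```python
import unittest


def run(test_case_class):
    """Run a TestCase class and return True when every test passed (green)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


def make_test(order_total):
    """Build the same test against whichever implementation is handed in."""
    class OrderTotalTest(unittest.TestCase):
        def test_sums_item_prices(self):
            self.assertEqual(order_total([5, 7]), 12)
    return OrderTotalTest


def order_total_stub(prices):
    # Make it fail: a deliberately wrong placeholder.
    return 0


def order_total(prices):
    # Make it pass: the simplest thing that works.
    return sum(prices)


seen_red = not run(make_test(order_total_stub))
seen_green = run(make_test(order_total))
print(seen_red, seen_green)  # True True
```

Had the test stayed green against the stub, it would not have been checking anything, and its later green would deserve no trust.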

I think back to the projects I've watched with system test automation, and red is not an infrequent visitor. Red invites work. And with making decisions, we still wish we'd have less red, and faster analysis of the red. 

From appreciating the red, to working to get the right red to right hands sooner - that is my call for action. And with a system of systems being automated, the granularity of feedback is a fascinating problem to solve. 

Tuesday, November 24, 2020

Orchestrating System Test Automation for Visual Thinkers

Working on a sizable system, feature after feature I found us struggling with timeliness of completing testing. Each feature kickoff, testerkind showed up to listen, and start learning. While development was ongoing, more learning through exploring took place. And in the end, after feature was completed, tests were documented in test automation, and just in case time taken for some more final exploring. 

It seemed like a solid recipe, but what I aspired for is finding a way where testing could walk more in sync with the development. The learning time was either half-attention, or elsewhere occupied, and it resembled a mini-waterfall for each feature. 

I formulated an experiment, with the intent of learning if being active with test automation implementation from the start would enable us to finish together. And so far, it is looking much better than before.

The way I approached test automation implementation was something I had not done before. It felt like a thing to try, to move focus from testerkind listening and learning to actively contributing and designing. In preparation for the feature kickoff, I drew an image of points of control and visibility, tailored to the specific feature.

As I don't want to go revealing any specifics of what we build, I drew a general purpose illustration of the idea. Each box on the main line was a touch point where development for the feature would happen (pinks). Each box on the side was a touch point already existing in the system, found by analyzing the system for what was a relevant combination for this feature (greens and oranges). I used two colors to categorize existing system touch points on closeness to the things we were developing (greens) and necessity for end to end perspective (oranges). Finally, for each touch point, I identified the type of handles: visibility and control, existing or missing.

To make the abstract a little more specific, let me give you a few safe examples from my years in testing:
  • ssh to a command line in a remote computer is a touch point of both control and visibility
  • reading a file in filesystem is a touchpoint of visibility
  • reading a system log is a touch point of visibility
  • calling a REST API and verifying responses is a touch point of both visibility and control
  • clicking a button on the UI and entering values is a touch point of control 
Same things can be both control and visibility. Making these choices is how I think about design for testability. And I prefer thinking it visually. 
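
The picture can also be sketched as data. This is a minimal illustration, assuming each touch point carries a control handle and a visibility handle that is existing, missing, or not needed. The "new feature UI" entry and its missing handle are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Handle(Enum):
    EXISTING = "existing"
    MISSING = "missing"
    NOT_NEEDED = "not needed"


@dataclass(frozen=True)
class TouchPoint:
    name: str
    control: Handle
    visibility: Handle


def missing_handles(touch_points):
    """List (touch point, handle kind) pairs that still need to be built."""
    gaps = []
    for tp in touch_points:
        if tp.control is Handle.MISSING:
            gaps.append((tp.name, "control"))
        if tp.visibility is Handle.MISSING:
            gaps.append((tp.name, "visibility"))
    return gaps


touch_points = [
    TouchPoint("ssh command line", Handle.EXISTING, Handle.EXISTING),
    TouchPoint("system log", Handle.NOT_NEEDED, Handle.EXISTING),
    TouchPoint("REST API", Handle.EXISTING, Handle.EXISTING),
    # Hypothetical: a UI under development whose control handle does not exist yet.
    TouchPoint("new feature UI", Handle.MISSING, Handle.NOT_NEEDED),
]

print(missing_handles(touch_points))
```

Touch points with only existing handles can be automated right away, while the ones listed as gaps are the conversation to have with developers about order of development.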

Having agreed on the touch points, we could do test automation a touch point at a time. We understood which touch points were not about to change, so that we could already work on them, and which touch points depended on the changes. We could talk about how order of development enabled testing. And most importantly, we could talk about which touch points I identified did not make sense for the system scale, because we could address risks on unit and component tests.

The current popular thinking in the testing community is to paraphrase testability into something wider than visibility and control. This little exercise reminded me that I can drive better collaboration by going test automation first, without losing any of the exploratory testing aspects - quite the contrary. This seemed to turn the "exploring to learn randomly while waiting" into a little more purposeful activity, where learning was still taking place.

In a few weeks, I will know if the original aspiration of starting together actively to finish together will see a positive indication all the way to the end. But it looks good enough to justify sharing what I tried. 

Monday, November 23, 2020

Stop paying users, start paying testers

At work when I find a bug, I'm lucky. I follow through during my work hours, help with fixing it, address side effects, all the works. That's work.

When I'm off work, I use software. I guess it is hard not to these days. And a lot of the software recently makes my life miserable in the sense that I'm already busy doing interesting things in life, and yet it has the audacity of blocking me from my good intentions of minding my business.

Last Friday, I was enjoying the afterglow of winning a major award, bringing people to my profile and my book, only to learn that my books had vanished from LeanPub. On the very day I was more likely to reach new audiences, LeanPub had taken them down!

After a full excruciating day of thinking what was it that I did wrong to have my account suspended for authorship, LeanPub in Canada woke up to tell me that I had, unfortunately, run into a "rare bug". Next day I had my books back, and a more detailed explanation of the conditions of the bug. 

If I felt like wasting more of my time, I guess I could go about trying to make a case for financial losses.

  • It took time of my busy day to figure out how to report the bug (not making it easy...) 
  • It caused significant emotional distress with the history of one book taken down in a dispute the claimant was not willing to take to court
  • It most likely resulted in lost sales
But overall, I call my losses, and turn it into a sharable story. Issues with software are something to pay attention to, and I can buy into the idea of this being rare in the sense that it was probably hard to catch and they were quick to react on it. 

Next day, I decide it's time to buy a new domain name, and I go to Hover, where all my services are hosted in a nice bundle. I own a dozen domains. That is how I start new projects. Sometimes, the annual process of deciding if I want to still pay for a domain is my best way to manage hobby projects.

I try to log in. And I fail with a 404 error. Like a good user, I refresh, and continue my day without reporting. I don't want to spend time on this, I just want to get things done without all these hiccups. 

These two issues are functional bugs, meaning bugs that inconvenience the user, one more significantly than the other. They have very little impact on the company, and the better companies are like LeanPub, making the bug go away fast when it is actually blocking. 

There is a third functional bug that I stumbled upon, in the very same way as all these other functional bugs. But this functional bug is what they call "exploitable". That basically means that it can hurt the company, not just the user. Sometimes exploitable bugs are about money, like this one was. Sometimes exploitable bugs are about secrets, like Vastaamo problem was. Exploitable bugs are of special interest to folks in security, and they are bugs with a particular priority to folks in software testing. 

Just like with the two other bugs, I was minding my own business, not testing in anyone's production. I wanted to buy food delivered, and I used the Foodora app. 

First time I saw this bug, the symptoms were subtle to me. I received two confirmation emails. I received a phone call from the pizzeria I ordered from asking if I ordered two meals. I had a pleasant conversation with the pizzeria guy about this happening often that orders were duplicated, as pleasant as a conversation I don't want to be in can be. And I received one meal, just what I ordered. Because while I was a user, I'm also a tester and a particularly observant one, I knew what I had done to trigger the condition. I was not planning on doing it again. So I tweeted a wide characterization with no repro steps to vent my frustration on software getting in my way. 

Ordering takeaway food as much as I do, it was only a matter of time before it happened again. This time I had my whole extended family over, and it was a fairly large order. Yet this different restaurant apparently did not know the issue, and delivered two full sets of food at my doorstep. With yet another inconvenient discussion with someone I did not want to meet at my doorstep, the driver took the other set of food away with him. I was inconvenienced, more than the previous time. I tweeted about Foodora needing to fix their bug, and searched their web page to contact with no luck. So I let it be. Someone asked me for steps to repro, alerting me to the idea that I might have to waste my time on this eventually. 

Third time was the charm. I still don't want to reproduce this bug, and despite my best efforts, it happened AGAIN. Already upset at the idea of having to have the conversation of what I ordered and didn't, I just took the food they delivered and complained that they delivered me two sets of my order while billing only one. With the proof of order numbers and payments, I assumed they could track it and get it fixed. Their loss, they should care. I mentioned that if they did see my feedback and needed help, I would be happy to tell them exactly what causes it. I didn't feel like writing long step-wise instructions into the void. I spent more of my energy in establishing credibility in knowing that this is a bug than on anything else. Because that is what testers find they need to do when they report. Dismissing feedback is regular.

Weeks passed, they contacted me. I wrote them the steps. Weeks passed, they contacted me. They said thank you, that they had the issue addressed with their tech folks, and gave me, the user, 10 euros. 

I write a shorter version of the story on my LinkedIn wall, only to learn how the cyber community (some outskirts of it, at least), misrepresents what I say:
  • Someone claimed I was "testing in production" because through my profession, I couldn't be a user. 
  • Someone claimed I was "testing without consent" because I wasn't part of a bug bounty in finding this
  • Someone claimed that I was breaking the law using the software with a vulnerability hitting the bug
  • Someone claimed I was blackmailing Foodora on the bug I had already reported, for free, expressing to them I was not doing this for money in our communication 
  • Someone claimed I was criminally getting financial benefit of the bug
  • Someone claimed the company could sue me, their user, for libel in telling they had a bug
  • Someone claimed I was upset they did not pay me more, not on the fact they didn't pay a competent tester in the first place (I know how to get to the bug, I would have found it working for them - exploratory testers couldn't avoid it, while an automation-only strategy or test cases could)
  • Someone claimed I was eating on the company's expense, when I was reporting on 200 euros of losses (for food they had thrown away) 
You may already sense that I think these claims are poppycock. But these types of claims are very typical for anyone with a larger follower base - I'm a great platform for security marketing efforts, but apparently also for restaurants telling me they still want me as their customer.

Today I listened to the Turvakäräjät podcast, only to learn I made an appearance as a "reputable tester" on this security podcast in Finnish. They really got their alternative facts straight: eating on others' expense (never mind that I had to pay for every order I made, there was no free meal ever, only a duplicate to go to waste) and being upset over a 10 euro bounty.

Let's reiterate this one more time. 

There is no money in the world that I would be willing to sell my free time on to test their software with a zero sum contract that only pays when I find problems. I think bug bounty programs and crowdsourcing are unethical, and would not take part in results-only payment models. I struggle with ethics of companies like Foodora that don't treat their drivers as employees, but I use them because I have nothing better in times I don't want to leave home for food. My ethics are important to me. 

There is no money in the world that would compensate stealing my time on bugs, exploitable or not. The bugs are not the user's responsibility, even if sometimes users' hands are tied, and waiting or leaving are not real options.

There is no ethical obligation for me to report even exploitable bugs as a user. The ethical obligation I take on me is not to share a vulnerability further. Knowing impact does not mean knowing the mechanism. And I write about the impact vaguely to protect the buggy party. I delay big visible posts to times when they won't have extra harm on my account. 

I say don't pay the user. Paying users 10 euros is an insult to the users. Hire a professional, and note that the professionals don't work for peanuts.

Asking Your Users Perception

A colleague was working on a new type of application, one that did not exist before. The team scratched together a pretty but quick prototype, a true MVP (minimum viable product) and started testing that on real users. 

Every user they gave the app to, gave the same feedback on a detail they did not like after first experience of use. 

Fixing the thing would take a while, but since the feedback was so unanimous, the team got to work. A week later, they delivered a new version. 

Every user that had the app before, hated that there was a change, they had grown to like the way it was. 

Every user they added for first time experience, hated how it was different from the usual way things are done. For the first day, until they learn.

I shared this story with my team at work this morning, as my team was wondering if we should immediately put back the "in review" state to our work board in Jira. The developers were so used to calling "in review" their done, but also refusing to call it their done and moving it to the done column. They used to leave tickets in this in review column, and no one other than themselves was finding any value in it.

We agreed to give the change two weeks before going back to reflect on it with the idea that we might want to put it back. 

Users of things will feel differently when they are still adjusting to change, and when they have adjusted to change. Your needs of designing things may be to ease getting started, or ease the continued use. Designing is complicated. Pay attention.