Tuesday, September 24, 2024

Learning programming through osmosis

This article was written in 2016 for Voxxed, a site that is no longer online. Back then I did not know about POSSE, and thus this piece has not been available for a while. 

I identify mostly as a non-programmer. Yet, two weeks into a new job I’m already learning and contributing to Python and C++ code. The method that enables me to do this is ensemble programming: the idea of having a group of people working together on one task at one computer, taking turns on who types for the team while the others instruct. For an idea to get from one’s head to the computer, it flows through someone else’s hands. 


This article shares key insights from my journey of a little over a year of learning programming through osmosis: just being around programmers working on code, without any intention of learning. As a result of learning, I rewrote my history with things I had forgotten and dismissed from my past. I hope it serves as an inspiration for programmers to invite non-programmers to learn to code one layer at a time, immersed in the experience of creating software together to transform the ability to deliver. Lessons specific to skillsets get transferred both ways, and while I learn from others, they learn from me, leaving everyone better off after the experience. 


Finding Ensemble Programming


Many different roles contribute to building software: product owners, business specialists, and testers. Yet knowledge of programming keeps these roles at a distance. I did not come to programming through wanting to program or taking courses on it, but through working with programmers in a style called ensemble programming. 


As a tester within my team of nine developers, it was clear I was different. I wasn’t particularly keen on learning programming, since there was more than enough work in the feedback through empirical evidence and exploration that is the specialty I’ve developed in depth over two decades. I’m an excellent exploratory tester, and my team’s developers have always been my friends with a pickup truck whom I can call in for assistance whenever code needs to be created. Besides being the only non-programmer, I was also the only woman, and part of a team where some people would occasionally spout out things like “Women only write comments in code.” Not exactly an inviting starting position. 


Although I did not like programming, through hobbies that started at the age of twelve and computer science studies that further killed my interest in programming, I had acquired experience in coding in twelve different languages. I started making small changes in how I looked at programming for my daughter’s sake, as I did not want to transfer my dislike of code to a 7-year-old about to enter an elementary learning environment where programming is everywhere, programming now being a mandatory part of the Finnish curriculum. 


The real change, however, started with Woody Zuill’s talk at a conference I organized. Woody is the discoverer of ensemble (mob) programming. The idea of the whole team working on a single task, all together at one computer, just sounded ridiculous. Yet, as ridiculous as it seemed, I thought it could be a way for my team to learn from one another as well as a way of team building. Instead of taking someone else’s word on methods, I prefer experiencing them first-hand. And it wasn’t like we had to commit for a lifetime, just to try it out once or twice.


The First Experience Expands


With some discussion, my team agreed to try it out, but I knew I would be out of my comfort zone, since I would have to be in front of a computer working on code. Our first task was to refactor some of our code with the Extract Method and Rename automated refactorings, and we had an experienced ensemble facilitator lead the session for us. While not on the keyboard, I found myself able to comment on the names from the domain, and while on the keyboard, I noticed with each round that I was picking up things: keyboard shortcuts, ways to navigate, programming concepts, without anyone really explaining them to me while the work was being done. In the retrospective, I could reflect on my learning and realized that not only was I picking up things I did not know before, everyone else was doing that too. 


I felt safe in a group, as I did not need to pay full attention to every detail at all times, and I was always supported by the group. Surprisingly, the expected negative remarks on gender did not come out in a group, whereas they would be a regular thing in a more private pairing setting. 


From that first experience, my team extended this into a weekly learning activity. I took the mechanism of learning further for myself, organizing various ensemble programming sessions with the programming community on different programming techniques and languages, learning e.g. TDD and working with legacy code in a hands-on manner. I introduced my team to ensembling on my work, exploratory testing, and they learned to better identify problems. In our ensemble programming sessions, there were several occasions where my presence in the room fixed an expensive mistake about to happen, triggered by half a sentence of discussion. Finding a problem like this early on led to more efficient and productive work for everyone. Although it seems inefficient to have so many people working on one thing at the same time, the time saved in avoiding context switching and passing feedback back and forth, the increased focus on completing steps together with great quality, as well as the learning, made us develop much faster and with fewer future problems.  


Joining An All Female Hackathon


I took the idea of ensemble programming to a weekend hackathon outside work and convinced my fellow teammates to try it out, though only three people out of four decided to be involved. I avoided setting the expectation of me being a non-programmer and just joined in with whatever programming skills I had, without disclaimers. There was even a woman participating with less coding experience than me, as she had never even looked at code before. 


Out of that weekend, I came out with four major realizations:

  • The best programmer outside the ensemble only contributed graphics. In the ensemble, we were adding one feature at a time and committing regularly, and the senior programmer found it hard not to have modules of her own to work on. There was no long-term plan for incrementally developed software, and the version kept changing under her. We tried summarizing the lessons on the technology used for her, but she kept hitting problems that blocked her. 

  • I passed as a programmer. No one noticed I was not a programmer. And the reason was that I had become one. I realized that programming is like writing: getting started is easy, and getting good takes a lifetime. 

  • The non-programmer felt like an equal contributor. Her experience was that the code created was just as much hers as any of the others and that is a powerful experience. She learned the basics with us through typing for us, and reflecting with us. 

  • We had working software. Not all groups had the same luxury. In the ensemble, we had the discipline to have not just code, but working code to a scope that could vary depending on how much time we had to add more functionality. 


My Main Lessons


Cognitive dissonance is a powerful tool


The experience of working with an ensemble for over six months transformed how I perceived myself. No amount of convincing and rational arguments on how much fun programming is could have done that. When my actions and beliefs are not in sync, my beliefs change. And that is what ensemble programming did to me. It made me a programmer, through osmosis, and got me started on a long journey of always getting better at it. 


Non-programmers have a lot to contribute


I saw that while I was learning a lot, I was also contributing. As a tester, I had information about the intents of the users that seemed mysterious to my programmer colleagues. We would test better while programming, just because I was there. We would avoid mistakes that were about to happen, just because I was there. I could give feedback without egos in play, and we could all learn skills from one another. And even me being slow was a positive thing - it made the other programmers more deliberate and thoughtful in their actions, and they shared the realization that they created better code while working slower. I ended up feeling really proud of how much better my developers learned to test during our shared ensembling time. 


The team got a lot out of it


I wasn’t the only one who learned - everyone in the team picked up different things. It was a pleasure to see how the ability to add unit or Selenium tests expanded from an individual to a team skillset, and how many times we found better libraries because just one of us was aware of them. 


We slowly moved from working on technical debt and cleaning up to a shared standard to having technical assets in the form of libraries that would enable us to do things faster. 


Everyone got their voices into the code better. We worked with the rule that if we had several ideas of how a problem could be approached, we would do both rather than argue while we had the least practical information about how it would turn out. And it was surprising to notice that something someone would have fought for to the bitter end was good enough to accept once the implementation was available, and not just because people would lower their standards. 


We also learned that when one of us did not feel like contributing in an ensemble format at first, it was a good idea to let them opt out. The party-like nature of the sessions and the evidence of the rest of us bonding and learning inevitably drew these non-participants back in on their own initiative later on.   


Ensemble Programming as a Practical Tool of Diversity


Ensemble programming is a great way of introducing new people to programming, or to testing for that matter. It transfers a lot of the tacit knowledge that is otherwise difficult to share. It brings the best of us to the work we do, as opposed to the most of each individual. While working together, we can remove a lot of the rework with fast and timely feedback. We raise our collective competence, allowing individuals to use specialized skills. We used the rule “learning or contributing” as a guideline for thinking about when an ensemble is doing what it is supposed to. 


As software is such a big part of our society’s present and future, we need all hands on deck in creating it. We need to find ways of bridging roles without telling others that everyone just needs to be a programmer. In an ensemble format, I learned that while I picked up my hidden interest in programming, I would have been a valuable contributor even without it. There was a struggle for both me to go do things I thought I wouldn’t enjoy and the team to work in a setting they were not used to. It was worth the struggle to remove the distance I previously felt between myself and the programmers. 


Just adding more women and people of color to the field of software development isn’t enough if those people struggle to get their voices included. We need to do more than make the world of coding look diverse. With ensemble programming we can use that diversity to innovate the world of coding overall. (Props on this thought to Kelly Furness, who was in the audience at my Devoxx UK talk.) 


It’s not just learning programming by osmosis, but the learning is mutual. Give it a chance.


About the author


Maaret Pyhäjärvi is a software professional with testing emphasis. She identifies as an empirical technologist, a tester and a programmer, a catalyst for improvement and a speaker. Her day job is working with a software product development team as a hands-on testing specialist. On the side, she teaches exploratory testing and makes a point of adding new, relevant feedback for test-automation heavy projects through skilled exploratory testing. In addition to being a tester and a teacher, she is a serial volunteer for different non-profits driving forward the state of software development. She blogs regularly at http://visible-quality.blogspot.fi and is the author of Ensemble Programming Guidebook.



Wednesday, September 11, 2024

Do Thee TDD?

Sampling many customer organizations, I can't help but note a customer theme we aren't answering well. The question is whether we are doing test-driven development.

A lot of us know what it is. We usually have learned to recognize it as possibly two different patterns: 

  1. TDD while programming. Super-small loops (inside out, 'Chicago school'), or small loops with mocks at play (outside in, 'London school'). 
  2. ATDD (BDD, SBE - lots of names for a similar idea), where examples characterize the feature before adding it. 
For a lot of the customers, though, I realize these two are more intertwined. And the conversation very often gets derailed into defining whether the test *really happened before*, and how often it made sense for each of the developers to write the test first ('isolating a bug is great test first') or to write it as part of the few-hours-to-few-days feature they are on ('easier to capture intent in the same pull request when I first figured out how to get it done'). At the scale the customer looks at, you can't really tell if it was before or after. At the scale of the developer learning techniques to better control and describe intent, and to not miss relevant bits with short-loop-after, learning the test-driven development techniques of both the Chicago and London styles and mixing them up probably does a whole world of good. 
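As a minimal sketch of the first pattern, here is what one super-small 'Chicago school' loop looks like in Python: the test is written first (red), then the smallest implementation that makes it pass (green). The function name and format are hypothetical examples, not from any customer project.

```python
# Step 1 (red): the test is written first, before parse_price exists.
def test_parse_price_handles_finnish_format():
    assert parse_price("1,50 €") == 1.5

# Step 2 (green): the smallest implementation that makes the test pass.
def parse_price(text: str) -> float:
    return float(text.replace("€", "").replace(",", ".").strip())

test_parse_price_handles_finnish_format()  # runs clean once green
```

The loop then repeats: a new failing test, the smallest change to pass it, and a refactoring step before the next test.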

The customer's concern is not always whether the test came first. But it is whether it came before (ATDD style) and whether it came with the change itself (included in the PR). 

I find myself characterizing the answers to this theme with slightly more granularity: 
  • Level -1. Test after, with tester tests and bug reports. This happens a lot too. The 'nightly run' where analyzing the failures takes a week. We've all been there. Let's hope for a generation of developers who will look puzzled at that statement. 
  • Level 0. No Sign of TDD. When code is merged with pull request, significant effort of testing follows in subsequent pull requests. There could be test changes with the original pull request, but their intent tends to be to get old tests to pass. 
  • Level 1. Short-Loop-After. When code is merged, so are tests. Same pull request, thus in the same repo, going into the pipeline. Little care whether it was a mix of before and after writing the implementation, because the loop is short enough. This is more driven and continuous than anything we used to have, and we should celebrate it. 
  • Level 1b. Disciplined TDD. When code is merged, so are tests. Mixing outside in and inside out, with and without mocks, but the developers consistently write tests first. 
  • Level 2. Acceptance criteria with examples. Examples from customers, illustrating the core things that are different after the change, and introducing the new behavior. Just having the examples around helps developers with a clearer definition of done, and less looping back for new information to learn. Things aren't obvious to everyone in the same way. 
  • Level 3. BDD automation before implementation. Examples passing one by one drive the idea of whether we are done with the change. 

The first three teams I think of are on levels -1, 0 and 1. They all aspire to level 2. 

Smaller steps may make it more manageable as a change. Where are you, and where are you heading? 

Monday, September 9, 2024

Learning to test in Dynamics365 projects

How do you become an expert in something you do not know yet? By learning about it. You have a foundation of knowledge you probably acquired on other products, and if the foundation is large enough, you are bound to see similarities. This is how I feel about being thrown into my first Dynamics 365 project. 

Learning in public - explaining how my learning evolves and how my thinking evolves - gives me a chance of learning from people who know things I did not. And it provides the odd chance that my learning is something of use to someone else. 

What's this about?

Dynamics 365 is one of the (many) platforms. You may have, like me, experienced SAP. Or Salesforce. Or Guidewire. Or Odoo. And you can continue the list. What these essentially are is things enabling reuse. I personally like to call them platform products. There is a lot of functionality common to all their users. Yet there is even more of their own data, configurations, integrations and changes, so that the resulting system looks different, is used differently, and most definitely holds the information and processes of highly different organizations. They are the epitome of modern reuse. If you could buy a product and use the product everyone else uses too, maybe you would not have to build your very own system. Meanwhile, enough tailoring means that the theory of reuse meets practice in this thing we lovingly call testing, where the rubber meets the road and good plans go to meet empirical evidence to ensure our business still runs with all the plans in place. 


This particular one is a product platform in the cloud, made by Microsoft. Organization count: 1. It is usually configured, integrated and extended by an integration partner. Organization count: 2. In integrations, there may be a load of other systems as data sources and data targets. Organization count: 2+N. At the start of the chain is the organization that assigned responsibilities to all the other organizations: the owner of the system/service, the customer with their users. Organization count: 3+N, and the responsibilities of ownership.

Your usual testing vocabulary isn't helping me

Calling some of this testing acceptance testing isn't really helping me. And particularly, calling some of this unit testing isn't helping me; almost the opposite. Surely if we configure functionality (or decide not to configure it, working with defaults), it makes sense to verify that the behavior I get is the behavior I want. Most often, though, that testing needs to happen with at least a partially integrated system, and it may well be just partial. This drives the design I would need testing vocabulary to reflect: testing components/services, integrations, and flows across components, services and integrations. Instead of shifting left, here I need shifting down. I need to understand the smallest scope I can verify a functionality in. And if I succeed in that, the feedback granularity for the organization that is expected to react to the feedback is better. 

Theoretically speaking, it would be great if these platform products shipped with tests for the defaults. They rarely do. If they did, I could test with defaults, adapt and extend those tests to test with my configurations, and build systematic feedback that informs the chain of responsibilities. 

However, I usually end up in these projects from the ownership organization's perspective. For me to know whether our business flows work, I approach this with the idea of testing core business flows with the application, targeting them with knowledge of the changes. Things tend to go well if the chain works, and chaos ensues if the system is significantly broken. 

That Test Automation thing?

This comes along quite naturally. You have rolling updates (where you may not be able to delay the update at all), and you have quarterly updates (where staying without updating is not possible as an approach, for good reasons). But this means you have pretty much continuous responsibility for testing in the organization of ownership. 

Some people rely on staying close to defaults, and approach this by taking the risk that if the product platform does not work with defaults, it gets rolled back and fixed by the product platform organization. The closer to defaults you are, the more likely you can play with timing so that the first wave of installers caught whatever was coming your way. There's risk, but the risk may be manageable close to defaults. 

Yet usually we are not close to defaults. The further away from defaults we shift, the more there is functionality the product platform organization is unaware of, unable to test for, and thus responsibility for it surviving change is allocated later in the chain. 

You would usually invest in test automation for this. It could be component level, for things where you go furthest from the defaults. It could be process level, to catch things on the basic flows. Or it could be an intricate web of both of these. Plus the test automation that tells you when to point blame towards the product platform. 

In the whole chain, assigning the responsibilities to strategically design the necessary automation is on the organization of ownership. This is where the low code tools find their most lucrative points of entry. 

However, the "no code" approaches are just visual programming languages. If it diffs poorly, it is poorly maintainable. It's a balance, and a belief system. I don't think acceptance testers recording automation tests is the way to go. Shifting down to design per-component / per-service feedback is the way to go. Visibility of these tests is the way to go. 

Technologies, architectures - it all maps to common web / cloud

Scratching this just a little deeper, I come to realize these are very basic web / cloud things at scale. 

Web pages can be automated with Selenium, Playwright - well, any of the web driver libraries and related testing frameworks. The "scary" parts, shadow DOMs, dynamic ids and deeply nested components, could perhaps use the help of a tool that hides some of that locator complexity. But if it's complex enough, hiding it also means taking away the power to maintain it. 
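To illustrate what "hiding some of that locator complexity" can mean in practice, here is a small sketch of a helper that turns a generated, numbered element id into a prefix selector that survives the changing suffix. The id format (`PurchTable_4_input`) is an illustrative assumption, not a documented D365 convention.

```python
import re

def stable_locator(raw_id: str) -> str:
    """Turn a generated id like 'PurchTable_4_input' into a CSS
    attribute-prefix selector that survives the numeric suffix.
    The id format here is an assumption for illustration."""
    prefix = re.sub(r"_\d+.*$", "", raw_id)
    return f'[id^="{prefix}"]'

# e.g. page.locator(stable_locator("PurchTable_4_input")) in Playwright
```

The trade-off the paragraph mentions still applies: once helpers like this grow complex, debugging a failing locator means understanding the helper too.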

REST APIs can be automated with any of the language-specific libraries. 
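As a stdlib-only sketch of the shifting-down idea, even a minimal contract check on a JSON payload catches integration drift at the service level. The field names and sample body are assumptions, not a real D365 contract; in a real project the body would come from an HTTP client such as requests.

```python
import json

def check_customer_payload(body: str) -> dict:
    """Minimal contract check for a hypothetical customer API response."""
    data = json.loads(body)
    assert data.get("accountNumber"), "accountNumber missing or empty"
    assert "name" in data, "name field missing"
    return data

# Hypothetical response body, standing in for an integration's output.
sample = '{"accountNumber": "CUST-001", "name": "Contoso"}'
record = check_customer_payload(sample)
```

A check at this granularity tells the responsible organization which integration broke, instead of a broken end-to-end flow telling everyone that something, somewhere, is off.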

Why did I want a commercial tool I would have to learn? Or why would I choose to teach that commercial tool to my fellow testers over teaching them the basics of programming for test automation purposes that I know even business testers are capable of learning? 

Let's say the jury is most out in this space. I'll write more when that makes sense to me.

The First Experience - A User's Experience

My first touches with these projects come from having used systems built on this - without really realizing I had. Connecting that realization to usage examples, I also have examples of missing functionality on Safari, and of being forced to use incognito mode and clean caches to get some of these tools to work. 

The real question is: since the user experience has directed me to not use Safari, would we care to test with all browsers? And what drives the browser differences - I'll learn. 

The Lingo

Finally, in addition to testing vocabulary, there is the product lingo. D365FO, D365CE, feature names, change listings, scope of each project. I find myself classifying: product platform vs. configuration to make sense of it. 

Turns out there is D365RF - a set of common test automation keywords for Robot Framework. 


Is this how you take on new testing assignments too?

With a baseline thinking written down, I'll let you know how much more I know in a few weeks. 


Monday, September 2, 2024

Who Are You Paying to Learn AI with You?

Two years ago, Heini Ahven published a research paper (thesis) on AI in testing, concluding from her interviews that there are two particular hurdles for AI in testing: data, and a customer to pay for it. While in two years we have shifted away from needing data and now primarily discuss the use of generative models, the essential challenge of a customer to pay for it remains. Customers are making choices of who they bet on, who they pay to learn AI with them. 

It would seem to me that these things are good reasons to bet on us:

  • We have a track record of having created customer specific systems with AI in them
  • We have a track record of having new products with AI in them, most notably in modernization of legacy systems, but generally too many to list
  • We publicly say (and can back it up) that we have already invested 1M into AI in the last year, and built quite a platform of knowledge with it
  • We know software development, and we know testing. And we know these in scale. 
That's the high level. Yet I think that reframing the question from do we have solutions to who are you paying to learn AI with you is the way to go. And personally I think that you would do well learning AI with me and my crowd. 
  • We have looked at large numbers of testing tools with AI in them, and can help you sort out positioning of those tools
  • We have used tools with AI in creating test artifacts, and can help you sort out sociotechnical guardrails of use of these tools so that you can steer your learning
  • We're happy to pick up a tool you want to learn with even if we haven't yet, and amplify your learning with success in mind 
There is a lot going on at the scale I get to pull from. I chose six activities and four values that are core to the approach I work with. 

We need to know where we are to make sense of where we are heading. We are expecting improvement, and agreeing on how improvement can be recognized is key.

We have experimented already, and we are scaling to experiment with more customers. Any solution in this space has learning at heart, and keeping learning at heart steers towards benefits. 

We collected sociotechnical guardrails for different kinds of applications of AI in testing. A lot of what we have been learning we can feed into new organizations, and improve with ongoing learning that benefits us all. 

We rely on building new habits that are good habits, to instill and sustain a culture of learning. This usually means we need to work with people in working through the change rather than input material. 

What we learn, we teach on. Sharing is a way of continuously seeking improvement. 

Some of this we package and make available in scale that helps anyone. These are new tools and services emerging. We recognize attribution is also IP, and we recognize scale will have a mix of different kinds of IP. 

In these six activities, we value four things: 
  • Our approach is human-centric and we are learning best ways to have people in the loop
  • We seek enhancing to better
  • Our expectation is incremental, with controlled investments that can expect results
  • By focusing on many customers while carefully hearing each customer's specific challenges, we seek to make a helpful impact on the testing field at scale
This said, the question remains: who are you paying to learn AI with you, and could it be us? 

The author is Director, Consulting Expert at CGI Finland, focusing on AI-driven application testing. She usually writes about the parts of her work that aren't specific to CGI, and felt like making an exception today. She is seeking primarily Finnish customers to join in increasing the use of AI in testing, and believes open calls for collaboration are a preferable approach when seeking early adopters. The expectation for her new position at CGI is that she meets customers 130 times per year, and you scheduling a short conversation on how she could help would be mutually beneficial, if an unusual approach. She can be reached at maaret.pyhajarvi@cgi.com.  

Do testers need to be devops engineers too?

At a point in my testing career, I specialized in understanding test environments. It started off with seeing connections between subsystems and recognizing compatible data. Well, there was no other choice if you wanted to test effectively in the insurance industry, where a new IBM mainframe test environment (I got one of those too!) took 1 million and 1 year. I can't remember if it was a time when we still had the Finnish mark as the unit of money, or if it was already euro time; I just remember the overwhelming sense of responsibility of enabling a project with a million in spending. That time of my career added awareness to any environment I have tested in since, and categorizing workstations and servers and versions became a routine. 

When cloud later landed in my world, that categorization and foundation of test environments was particularly useful. Recognizing locations for storage, compute and specialized services, and setting up connections and dependencies between all these geographically distributed, provisioned-as-needed pieces, with controls we have and controls that we recognize but don't have, helped a lot in figuring out what it was that I was testing. 

It was easy to grasp that the new project I had just started with would have a better-working test environment early in the week, and that we could expect troubles as the week advanced. I could see which symptoms were likely to be about having provisioned a smaller test environment and which were likely to be results of data and memory, and I could design the way I test particular things around the weekly and daily cadence I recognized going on for the test environment. 

I was thinking about this today, as I saw someone asking if testers need to understand CI/CD, and what of it, and what specific skills we are supposed to have in that space. 

Many of my colleagues extending into the test automation space go the CI/CD pipelines route after they realize that programmatic tests run manually on their own machine won't be much in the way of automation. There is significantly higher value if tests run right after a change that could break things is made, and that requires designing the tests into a CI/CD pipeline. Many of those colleagues find barely a day a week for doing testing, once running the tests nightly turns into optimized sets in the pipelines, and environments turn into dockerized, orchestrated platforms where nothing changes in the infra without changing lines of code (Infrastructure as Code). 

I still work with testers who understand environments on the level I used to - recognizing that there is a different address for two different yet same test environments, with heuristics on what to pay attention to in each. They, like me before I integrated a lot of this CI/CD pipeline stuff into my thinking, use environments with specific timing patterns to control the version they are experiencing in testing. They may design environments on the level of not allowing change, because installing often means the environment is unavailable or out of the control we understand. These testers need to understand CI/CD as a mechanism of publishing and scheduling, but go no further. 

Increasingly, I work with testers who design and enhance pipelines. While they don't need to make all the changes themselves, they need to read pipelines to see what goes on. Red has a reason, and it drives their working days to see where the red is coming from. The majority of people in this group configure new jobs within the same realm of examples, and don't really take things further. Only some bring in new tools. But the new tools part, that is something people seem to love doing.

Then there are people who live in pipelines. Tests are placeholders and boxes, but they rarely have time to go and think about their coverage themselves. Working for the pipelines is the work. Making them run on better infra. Adding new tools. Upgrading the existing tools. Building all the machinery that could support the teams. These people, even with testing background, tend to call themselves devops engineers, to emphasize their attention to infrastructure and pipelines. 

When hiring for a tester, you may expect any of these levels of knowledge. A lot of people search for the middle ground. More and more, we expect people to come with the knowledge of what control pipelines give to your test environments and options of testing. 

And more and more, finding a balance where people know enough yet still manage to test not only build pipelines is what we seek.


Saturday, August 31, 2024

Ethical Stress of the Consulting Work

For three months now, I have been adjusting to a new identity. With my new job, I am now a consultant, a contractor and a service provider. I manage a product of testing services, and provide some of those services myself.

It's not my first time on this side of the table, but it's my first time on this side of the table knowing what I know now. 20 years ago when I was a senior consultant, I was far from senior. I was senior in a particular style of testing, driven to teach that style forward and to learn as much as I could. And while I got to work with thirty-something customers opening up a new business of testing services back then, I was blessed with externally provided focus and the bliss of ignorance.

I had a few reasons to stop being a consultant back then: 

  • Public speaking. I wanted to speak in public, and as a consultant your secondary agenda of sales was getting in the way. Not really for me, but in the eyes of others. I got tired of explaining that I would not go to my organization for sponsorship just so that I could speak, and that I was uncomfortable building that link when content should drive the stage. I knew being in customer organizations doing exactly the same work would change the story. And it did. And that mattered to me. 
  • Power structures. With customers and contractors, there is a distribution of power. When a major contractor in a multi-customer environment kicked the testing representatives of two out of three organizations out of the steering group citing "competitive secrets", and I was the only one allowed in the room to block the play that was about to unfold, I learned a lesson: being in the customer organization lent me power others lacked. Back then I had no counters, and I do now, having been in boardrooms both as a testing expert advising board members and as a member of those boards. 
Thus 20 years later, I knew what I was doing when I joined consulting to solve the problem of testing competences in testing services at scale. I knew consultancies are the place where this can be solved at scale. I knew the numbers are not on an individual customer's side, and that scale means I need to serve many customers. I knew I needed to level up the testing services to become the testing services I had so much difficulty purchasing when I was on the customer side. 

What I did not know is that the job of consulting would teach me about ethical stress. Because when you serve many yet invoice some, your life is a daily balancing act with your sense of fairness. And you will feel the pull of working for just one customer, so that the overhead of context switching would not fall on any of them. The ethics training this particular organization offers adds to the stress. Working by the hour is an extra mental load. 

It's not that I can't deal with it. It's just that it is so much more than it was before that it sticks out, and I need to label it: 

Ethical stress is the continuous sense of having to balance the different perspectives. 

If it takes me an hour to report hours, who should pay for that hour? 
If I get interrupted with another customer while working for a different one, who pays for the reorientation time?
If I have to create a plan and actually have to follow that plan even though I know better, do I look worse even though I am better? 

Having recognized this, I now use it to discuss no estimates in agile teams. Ethical stress is the big thing that estimating and tracking detailed hours brings to people in a team. It is an invisible motivation killer, impacting how we shape tasks to make them trackable rather than letting work flow the best way we know how. 

Ethical stress costs us energy. And sometimes we are so focused on teaching the ethical part of things that we forget the stress part of the same. 


Thursday, August 22, 2024

Prepare to shift down

While the world of testing conferences is discussing shift left - a really important movement in how we make our efforts count instead of creating failure demand - we are noticing another shift: shift down. 

For years, we have been discussing the idea that sustainable success in the programmatic testing space requires you to decompose testing differently. Great automation for testing purposes is rarely built by automating what you considered the end-to-end flows a human experiences. Great automation optimizes for speed and accuracy of feedback, and for granularity in both time - immediately available to the developer who made an impacting change - and location. Not having to speculate on the source of a problem saves a whole lot of effort. 

As I talked about these two shifts today, as part of a talk on applying the AI of today in testing - with the foundational idea that we should first shift left and down, and a personal suspicion that we might like what AI is doing for us after the shifts - I also realized that positioning organizations against these two shifts helps make sense of the kinds of tools and workflow ideas they are automating from. 


What is this shift down that we talk about? Shift left is a commonplace term, and I would prefer not having left or right but single-commit delivery making things continuous. One can dream. But shift down, that is not as commonly discussed.


Shift down is the idea that test-driven development is great, yet limited by the intent of the individual developer. A lot of developers are good, and getting better daily, at expressing and capturing that intent, and having that intent is hugely beneficial in routinely accepting or rejecting generated code from modern tools while choosing to stay in control. From a sample set of projects - most certainly not built with TDD, and with an unknown level of unit testing - we have seen a report recounting that 77% of the bugs that escaped to production after all our efforts could, in hindsight, have been reproduced with a unit test. That means there is a lot more potential for doing good testing work at the unit test level. I like to play with the term exploratory unit testing, which is a way of stretching the intent of today with learning in the context of the code, to figure out some of this 77%.
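To make exploratory unit testing a little more concrete, here is a minimal Python sketch. The discount function, its boundary, and the bug are all invented for illustration; the point is the move itself: a bug found by exploring around a boundary gets pinned down as a unit test, so the learning stays in the suite.

```python
# Hypothetical example: a boundary bug of the kind that could, in
# hindsight, be reproduced with a unit test. All names are made up.

def discount(total_cents):
    # Intended rule: orders of 100.00 (10 000 cents) or more get 10% off.
    # The bug: '>' silently excludes the exact boundary value.
    if total_cents > 10_000:
        return int(total_cents * 0.9)
    return total_cents

def test_boundary_order_misses_discount():
    # Found by exploring around the boundary; the test pins down the
    # current (buggy) behavior before we decide on the fix.
    assert discount(10_000) == 10_000   # spec says this should be 9 000
    assert discount(10_001) == 9_000    # one cent over works as intended
```

Writing the reproducing test first turns the exploratory finding into a durable artifact; changing the comparison to `>=` then flips the first assertion and documents the fix.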

For a few crafters I have had the pleasure to work with, test-driven development and exploratory unit testing could be interchangeable. For others, the latter encourages us to take time to figure out the gap, especially with regard to legacy code where those who came before us left a less than complete set of tests to capture their intent. 

Shifting down pushes conversations to unit tests; component tests; subsystem tests; and guides us to design for decomposed feedback. We've been on that shift as long as the other one. 

Tuesday, August 20, 2024

How to make testing fun and enjoyable?

I have believed and experienced that testing is fun and enjoyable for 27 years. I have had that experience enough to talk about my primary heuristic from stage:

Never be bored.


 This confuses people, especially when their idea of testing is repetition growing over time. 

You keep replenishing the test results. Sort of the same results. Except that while the tests may be the same, you don't have to be. You can vary things, and return to a common baseline when the variation takes you to surprising information. Every change, every changer in the moment, is different. And it's like a puzzle, figuring out how to create a spider web of programmatic tests that tells you enough while not everything, and yet looking at each change with the curiosity of 'what might go wrong here'. 

If I feel bored, I introduce variation. I change the user I log in with. I change the colleague I pair with. I change the order in which I test. I write test automation that does not fit our existing patterns of how we automate. I write detailed public blog posts while I test, unlike normally. I experiment with separating programmatic tests that always run into suites where I run each suite every second day, to save up replenishment resources. Well, the list of variations is kind of endless. 
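The first few variations above can be sketched as a tiny seeded session planner; the roles and test count are made up for the sketch. Seeding matters: when a variation surfaces surprising information, the same seed replays the exact session, so you can return to the common baseline without giving up the variation.

```python
import random

# Hypothetical roles; in practice these would be real test accounts.
ROLES = ["admin", "editor", "viewer"]

def plan_session(seed, n_tests=5):
    rng = random.Random(seed)      # seeded so the variation is replayable
    role = rng.choice(ROLES)       # vary the user I log in with
    order = list(range(n_tests))
    rng.shuffle(order)             # vary the order in which I test
    return {"seed": seed, "role": role, "order": order}
```

Logging the seed alongside the results makes a surprising run reproducible later.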

I love testing. And I have been testing today for a new system under test (for me). 

In order to test the way I love testing, I have to have a solid foundation of programmatic tests that we grow gradually as an output of our testing, capturing the pieces of learning worth keeping around. Today, I want to recognize a few things that I need to keep testing fun and enjoyable. 

  1. Agency. You don't give me a test case to automate. You give me a feature to test, and out of that I will automate some test cases. Thinking that you plan and I execute takes the fun out of my testing. Even the more junior folks do better starting with WHY, not HOW. 
  2. Smart constraints. You don't tell me that programmatic tests need to mimic written step-by-step test cases. That makes me spend my time updating two documentation sets for the same purpose, and doing busywork is not fun. 
  3. Test environments. You don't deny me access to exploring an old version while I design and collect ideas for how we should test changes in the new version. External imagination - the product without the change - makes the task more productive, and it's fun to do good work. There need to be enough of these to go around for us all, every day. 
Notice how my fun and enjoyment isn't asking for documentation or answers to all things about the product. Not having those around is sometimes a different kind of fun; even if I prefer us starting with a better agreement, you can be sure I will discover things outside it. The list also does not include great people and friendly developers, because today I choose to believe that people are good and want to do good. Discovering exactly how many jokes going around that requires is part of the variation. 

A colleague inspired this post by wishing that we had common templates and a unified front on what test documentation looks like. Figuring out how I could ever do that, when I do decent plans and strategies but not to a template, should be fun. But while it would be fun and enjoyable, it is less impactful for the good results I want out of my testing. Plans are more often a way for me to think through the big picture than the most relevant deliverable. 

That's my shortlist (today), what's yours? 

 


Saturday, August 17, 2024

Believing things can be different

27 years of testing, and I still think problems with software are fascinating. Running into a problem does not mean it wasn't tested at all; it means I ran into a flow that wasn't covered, or a problem considered not relevant to fix under whatever constraints the organization has. Sometimes the conditions are fairly general throughout the user base, like with CrowdStrike, and impact all users. Sometimes the conditions are fairly specific, and impact some users. 

There's a story I have told a lot, an illustration of serendipity and the role it plays in testing. Joining a new organization as a tester, on my first day I got access to the system under test and credentials to log in, and had barely logged in before I was asked to join whatever inception program the company had in mind. Pressed for time and not trusting my memory, I bookmarked the address. Not the main page though; I had already clicked onwards a bit, not even paying attention. When I got back to testing, I used the link to find the application and was met with a big visible error screen. Further investigation showed that I had been lucky enough to bookmark the single page inside the application with a different implementation pattern that resulted in this error. Quite a way to make a tester's entrance. 

For some years after that, I thought the core of testing seemed to be serendipity and perseverance. I worked with a team where developers tested, and had tested for 15 years before the first tester ever joined the team; claiming there was no testing done would have been misrepresenting their efforts. But something about the way I tested was different. I got lucky running into bugs. A lot. Like, more than you can imagine. And this sentiment is something a lot of testers relate to. And I gave the program chances to show how I was lucky, systematically working through flows and time, and odds, fast-forwarding different users through years of production time, again and again. 

I pulled up two quotes to summarize the insight I had gained: 

"The more I practice, the luckier I get." - Arnold Palmer (golfer)

"It's not that I am so lucky, I just stick with the problems longer." - Albert Einstein (scientist)

This kind of luck favors my kind, testers - or at least we frame it as a positive event of luck, serving as a means to remove such luck from those who would not welcome it. But we aren't always working. Work is when you get paid. Most of your time, you are just you, a user amongst all the others. You'd love for the systems you use to have been built and fixed to work, but things are complicated.

A favorite incident, one really showing perseverance with a problem, comes from a friend. She was out at an event one evening with her development team, having a good time. She returned home late, crashing for the night as soon as she could. The next morning she realized things were a little off at her house. Her scented lamp had fallen and stained her sofa. Going into her kitchen, she discovered a handwritten note. In her absence, the police had been inside her apartment to turn off Alexa, and had left a note. A little later in the day, a worried neighbor reached out and told her about the music blasting at full volume that led to calling the police, this being so unusual. It turns out Alexa had been throwing an empty-house party. 

We were recounting this incident and all the details of interoperability at a conference, only to realize - later confirmed by another colleague working inside AWS - that there was indeed a case where you could have Spotify in a state where, far away and trying to connect Spotify elsewhere, you would turn up the volume at the previous location. A very specific flow. This interoperability bug resulted in an invoice from the apartment company for opening the doors to the police late in the evening in the owner's absence, and a lot of research on whether you could be made to pay for a bug somewhere between Amazon and Spotify. It turns out that if you spent enough time fighting, you did not have to pay. 

I did not turn this one into a talk and research for quotes, but I did draw a major lesson: 

Getting lucky costs you time and money that no one is responsible for. 

It's a lesson that is hard to miss as a tester. You are sometimes painfully aware that outside the paid hours, where you work advocating for users in a very particular style, the same characteristic of serendipity is used as free labor, sometimes by accident and sometimes by design of the companies. 

When I serendipitously ran into a Foodora security bug a few years ago that allowed users to get free food, I did the free labor of following through to a fix before I disclosed my knowledge. They gave me a discount code worth a few euros for hours of investigating with them, which is quite a contrast to my professional rates. 

When I serendipitously ran into another Foodora bug last week, one that now impacts users, I came to realize the immense power difference when the tables are turned. User advocacy takes even more time. I do user advocacy because I believe things can be different. 

Things can be different if people share their experiences, and individual experiences sum up to phenomena that speaks to the power. 

Things can be different if people care enough to pick up feedback outside the usual channels. 

Things can be different if people believe there may be a step in the flow where while it works on your machine, it indeed does not work on mine. 

Believing things can be different guides my choices of what I end up doing with my time and effort. I ended up researching legislation to learn where and how to report, so that I can borrow power a regular user does not have. They would have to investigate when asked in a specific way, and report back to me in writing within 14 days. They would have to correct the state of my banking within 1 day. 

I did not imagine that in reporting this I would serendipitously find one more bug: the messaging app fails with error code 500 ("it's our backend") on a specific part of the message I had crafted, even though it seems like regular text with no special features. Reporting the second bug was possible in the messaging app; reporting the first bug required sending part of the text over as a screenshot. I also tried talking to a developer who worked at the company and was helpful in explaining how things should work at Foodora, because believing things can be different means I believe people can help when their organization's technical channels fail. 

From this one, I can draw again major lessons: 

Reporting channels don't always work even if you find them, reducing your chances of solving your issues.

Testing gives you tools and attitude to work around what blocks you: the text vs. screenshot, the developer inside the organization; the legal references.

A regular user would give up, but I am no regular user, even if I am a volunteer on the task of user advocacy. When I get the written report on this, it still does not conclude the case, even if it hopefully fixes this bug. There's still the work we need to do in the industry to connect regular users with working feedback systems to get things set right, and the work of setting our systems right. I hope the programmers' and testers' community does better on quality, and I know it is not easy. I know we try, and that we are not done. 

I've been a tester for 27 years because I stick with the problems longer. And I stick with the problems of user advocacy. It's as important as ever when we can't even find a channel to communicate our troubles back to the multinational corporations. Just look at the troubles content creators have on social media, kicked off the foundation their businesses run on, without routes to appeal. 

User advocacy is speaking. It's listening. And the latter is needed even more.  

 

 

 

Thursday, August 15, 2024

Which of us did not understand, again?

A few years back, a friend in testing got on a stage and shared a personal story of bug advocacy. She found and reported a bug, and the ticket kept ping-ponging back and forth as "does not reproduce", up to the point when the two of them were put in a spot no one should ever be in: a boss with the power of letting you go, and an ultimatum for the two people stuck in that loop. If the problem could be reproduced in front of the boss, the tester stays. If it could not, the developer stays. The problem was reproduced, and the story illustrates how far things can escalate. 

I think about this story whenever I find myself in an argument with a developer about a bug that I think I am experiencing. When we disagree, it almost always means one of us is missing information. I have had this argument often enough professionally to know that sometimes it is me who is missing information. But a lot of the time it isn't, even when I am told I am. 

Something went wrong

Last Saturday I ordered food with Foodora delivery app. I have done so before. I was already using the app less frequently, choosing another option over it due to previous two experiences of using it, but I gave it a go. 

I ordered food once. I confirmed the purchase with the bank twice. And I confirmed the purchase with mobile pay three times. This is not how the purchase flow is supposed to work. But this is how the purchase flow has gone for me on my last three uses. It's not a temporary glitch. Having seen this three times, I have a good idea of what it is in my flow of use that reveals the problem. Yet I am unwilling to test in production with my own money. After all, the result for the user is that today, 5 days later, the money has been invoiced from my account once (correct), but still remains reserved so that it is unavailable for me to use (incorrect). 


I also know my experience in the Foodora app. The user interface flow never completed with a confirmation. Yet I got my food, I paid for my food, and I forcibly loaned money to the bank in their reservations queue. 

I offered the service desk person to work with them for free to isolate the problem. They told me they don't have that kind of contact within their company; they could only tell me to wait and see if I get invoiced once or twice, and that their system shows only one invoice going out. 

I have experienced reservations of money for a failed transaction before. I just don't experience it every time I use the application. Or in a way where the application never finishes the user's flow yet delivers me the food. So I suspect there is a problem here. 

Speaking about problems

I decided to use the experience as a talking point on LinkedIn. What I wanted to say was muddled by having multiple messages. I wanted to say:
  1. I am still happy to help Foodora figure out the bug because I think I can isolate my actions better than an average person, with 27 years of testing experience. Not asking to be paid.
  2. Paying for people for testing, isolating problems being part of it, would be good. 
  3. Users need advocates and holding double bookings, especially in scale, is unacceptable, even when they get the money back in a week or two. 
  4. When users suffer the consequence, we only care at scale. When the company suffers the consequence, we call it a security vulnerability and care immediately. 
  5. The users' option - my option - is to not use this service. They would not know why my business goes elsewhere. 
Lovely developers jumped in to help. 

The first one wanted to help me with not saying bad things about Foodora in public, enough to go and snitch-tag my employer. I got a lovely Sunday evening of worrying whether a multinational corporation would let me go during my trial period because I shared an experience, until I stopped spiraling and realized I work with smart people who would do no such thing. His point was that, with my post having so many messages and him not having my context, he did not appreciate the tone that resulted from comparing how differently we treat individuals and companies. 

The second one wanted to help me like they help users. He explained the preauthorization process - which I am also aware of, and even did some research on related to this case - and that it is designed to ensure the company gets their money. He insisted, and probably is still insisting, that it is not a bug that I don't have access to 35,69 € of my own money 5 days later. I agree that it would be a worse problem if I ended up with a negative balance due to a missing reservation. And I agree we prioritize the company getting their money, even if the user is forced to live with extra reservations. 

What we don't agree on is my experience in the app that causes the double and triple transactions. I am pretty sure it is an interoperability problem, and a difficult one to test because of the specific conditions in my flow of use, unlikely to be available without setup in an integrated test environment. 

I have been a part of enough financial systems integrations to have learned two things:
  • Production-like test environments don't exist for all financial integrations
  • We can't test the exact user flows if we don't have the users kind of environment
Financial integration providers tend to hold, at least in Finland, heavily dominant market positions, and you can't exactly choose whom you integrate with if you want to move money.

I tried making a final point: 
The company whose app I used to make my purchase is responsible for choices of the contracting chain.
Theoretically, they could choose to use financial service provider that designs the transaction flow differently. Realistically there are no options, and becoming an option would be hard. But if quality issues with double reservations were a problem in scale, they could seek solutions with options other than shouting down the chain. 

The end result of this is that Foodora lost my business until I forgive and forget. Since this is my second rodeo, I know it took me something in the neighborhood of a year last time. In that time, I would choose to pay through a different flow, or someone would catch the glitch they have for now. Not that we would know of it. 

I wish all the best for the development team at Foodora: great production-like environments, feedback reaching them from their call center, and a lovely developer who will notice and fix the problem even without the other two. And thanks for fixing the security bug I reported where I got free food. I wish there was the same sense of urgency when your users are experiencing trouble in whatever sociotechnical system you have going on. It might even include my colleagues; I would not know. 

At least it was not that one of us would be fired at the spot. 


Tuesday, August 13, 2024

Did you make notes of what you learned on the CrowdStrike case?

It was hard to miss a few weeks back: CrowdStrike's off-by-one input bug bluescreened a number of computers relevant enough to impact how businesses of all sorts run globally. I'm still not sure what the top sentiment of this is for me:

  • Sucks to have been the cause of this. Sociotechnical systems fail and you never know when you're part of the system that causes this. 
  • The scale of who was impacted surprised me. Managing to deliver something visibly broken at this scale within an hour feels rushed. 
  • The costs of this across the industry must be everyone's responsibility. Having people manually fix this on business computers globally must have cost a fortune, and the bill will end up being split instead of finding its way to the source. 
Everyone in this tech bubble followed this. Everyone had an opinion. There's a case of failure now, something other than Therac-25, which we still talk about to this day as an example even though it happened in 1987. 

An Outsider's Recount of What Happened

The details of the issue reached us in two waves, and I learned more from the second wave of reporting. 

I learned the problem was an off-by-one in the number of inputs of a specific kind between two components. One component produced 21 input parameters, with the extra parameter containing a specific kind of input. The other component expected 20 input parameters and did not work well with the extra parameter and its specific kind of input. 
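As a toy illustration - not CrowdStrike's actual code, with all names invented - the shape of such a producer/consumer contract mismatch can be sketched in Python:

```python
# Speculative sketch of a parameter-count contract mismatch between two
# components. Names and structure are assumptions for illustration only.

def produce_template():
    # The newer producer emits 21 fields; the extra one carries a wildcard.
    return [f"field{i}" for i in range(20)] + ["*"]

def consume_template(fields):
    expected = 20  # the older consumer's hard-coded contract
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    return dict(enumerate(fields))
```

In this sketch the consumer at least fails loudly at the boundary; in the reported incident, the mismatch stayed latent until the extra field carried incompatible input months later.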

The time-based distribution of why this leaked was particularly interesting. 

The capability to produce templates with 21 inputs was introduced in February and taken to production. The use of that capability with incompatible input happened in July, resulting in the blue screens. 

Process-wise, the bug was missed in February. It may have been introduced by one developer; the developer making the July change would have found the bug only by testing the previous developer's work again. Testing it, as it appears, would have required two things that take effort beyond the development environment: 
  1. A production-like environment. If Windows blue screens, a Windows machine would have been needed. Nothing in the materials discusses whether developers could test this on their development machines, or whether it would require connecting to a virtual or remote environment. 
  2. Looking at the result from the outside in. As the last step of making a change, someone could have looked at the change working on at least one Windows machine. The reporting, however, does not make it completely clear whether there really was no third condition, like having to run the system for more than 5 minutes. These kinds of faults can take time for the computer to do the work before coming crashing down. 
This could have happened in many of the organizations I have been at. Developer teams shortcut production-like environments and looking at the result from the outside in. Even when automated, end-to-end tests like this take longer to run. It is often considered a risk worth taking to make software available for users with zero pairs of eyes on it. I tend to prefer two pairs of eyes on a change, because no one - no one - should be left alone with the responsibility of potentially breaking millions of computers globally. It is always a failure of the sociotechnical system when pushing bugs to production, at scale, happens. 

The changes outlined were interesting: items 1-4 address missing behaviors (features) of the code. Item 5 says they will now test changes in integration instead of in isolation. Item 6 says they will throttle delivery so a bad update would destroy a select group rather than everyone. Items 5 and 6 are generally considered good practices. 

Recounting Personal History
 
I used to work with security software with risks exactly like this: the possibility of taking down millions of Windows machines. This whole incident reminds me that we too, a decade or two ago, had a problem like this that test coverage missed. But we had a few other mechanisms that protected us: 
  1. Delay-based distribution: the group who would get things first was small but reactive, and rendering a board member's computer useless without manual intervention did a lot of good for investments in ensuring the lessons were learned without impacting customers.
  2. Eating our own dog food in the company: we learned to distribute internally, continuously, and to segment distributions. The whole environment for testing was built to provide HUMAN interventions, because test systems fail. 
  3. Throttling, autorevert, secondary update channels: we built a number of features that would enable us to fix things without showing up face to face
  4. Architectural rewrites: isolating risks like this, because not all software causes blue screens. 
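The throttling idea above can be sketched as staged rollout logic. This is a minimal Python sketch under assumptions of my own: a deterministic hash assigns each machine a stable bucket, and each stage admits a growing, made-up fraction of the fleet.

```python
import hashlib

# Fraction of the fleet admitted at each rollout stage (made-up numbers).
WAVES = [0.01, 0.10, 0.50, 1.0]

def in_rollout(machine_id, stage):
    # Hash the machine id to a stable value in [0, 1); a machine's bucket
    # never changes, so once admitted it stays admitted in later stages.
    digest = hashlib.sha256(machine_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < WAVES[stage]
```

Because the waves are increasing fractions, membership is monotone per machine, and a problem discovered in an early wave never has to reach the rest of the fleet.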
And then there was the continuous investment in test coverage. I would remain disappointed if the conclusion in the end is that test coverage is what they missed, when they have a wealth of investments their board members would happily sponsor to never see this happen to them again. 

Moving on

The sociotechnical system designed and put in place did not account for a risk that materialized. And that leaves me thinking of this.
"Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand."

--Norm Kerth, Project Retrospectives: A Handbook for Team Review
This is our chance as an industry to remember: a production-like environment, and looking at the result from the outside in. In some teams developers do that. And in many teams still, it's considered so important that the teams invest in a team member with a testing emphasis. Some teams call that person a developer, but others may use the familiar term 'tester'.

Sunday, August 11, 2024

Explaining Exploratory Testing

This last week, a realization hit me. Exploratory testing, coined by Cem Kaner and the most natural frame in which testers work and do good work, just turned 40 years old this year. It has been around longer than I have, and yet we don't agree on it or understand it fully. 

When it was first observed and labeled 40 years ago, it meant the exceptionally different way of testing that cost- and results-aware companies in Silicon Valley were doing. It was multidisciplinary, and it generally avoided the test cases that the rest, the non-exploratory testing companies, were obsessed with. 

We learned it was founded on agency, the idea that when two things belong together, we don't separate them. And a lot of people do separate them, by having different people do different parts of what is essentially the same task, and by having a separation in time to protect the thinking and learning time of testers. We learned that opportunity cost was essential, because we could choose to do things differently with the same limited time we had available. 

Some people ran with the concept and framed it as testing vs. checking. Checking was their choice of word: for the exact same reason exploratory testing was framed as an observation of Silicon Valley product companies doing something different, they needed a contrasting term for the other thing. Ever since I realized that checking is an output of exploring, I have not cared much for this distinction. And when it became the main tool for correcting people, I stepped away from it more actively.

We can still observe that not all companies do exploratory testing. And looking deeper, we can see that some companies do exploratory testing as a technique, roughly as it is framed in a lot of writing based on how Lisa Crispin and Janet Gregory describe it. Others do it as an approach, and that is how I tend to describe it.

For the sake of illustration, I sketched a typical social agreement of how I work with an agile team.

My work as a tester starts before the team even gets together to look at a feature in a story kickoff. I usually work with product owners on impact analysis: exploring the sufficiency of our test environments, possible dependencies that would need to be in place, the availability of the right skills and competencies for success, the other features that will be impacted, and so on. For impact analysis, I usually use the previous version of the application as my external imagination while thinking of what we'd need to address. That is very much exploratory testing.

When we then prioritize the feature and hold a story kickoff, I join the team in exploring what minimal set of examples would help us understand what is in and out of scope. Even with my best efforts as a seasoned tester, I seem to reach about 70% success in identifying the relevant claims with my teams. We usually write these down as acceptance criteria (updating whatever was already there from impact analysis), and as examples we would like to see in test automation.

While implementing the feature, we also implement the test system: the unit tests, the other tests, the reviews of those tests, and the new capabilities the other tests rely on when a feature touches areas where automation capabilities are still missing. If you wonder what I might mean by an automation capability, a good example is a library providing certain kinds of commonly needed functions, such as simulation.
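As a sketch of what such a capability could look like (all names here are hypothetical, not from any specific project), think of a small shared simulation helper that feature tests reuse instead of each test reimplementing its own fake dependency:

```python
# Hypothetical "automation capability": a tiny shared simulation helper.
# Tests that need a payment service reuse this instead of each building
# their own fake.

class SimulatedPaymentGateway:
    """Simulates an external payment service for tests."""

    def __init__(self):
        # Remember every charge so tests can assert on what happened.
        self.charges = []

    def charge(self, amount_cents, card_token):
        # Record the call and mimic the real service's response shape.
        self.charges.append((amount_cents, card_token))
        return {"status": "approved", "amount": amount_cents}


def test_purchase_flow_uses_simulated_gateway():
    gateway = SimulatedPaymentGateway()
    response = gateway.charge(1500, "tok_test")
    assert response["status"] == "approved"
    assert gateway.charges == [(1500, "tok_test")]
```

Once a capability like this exists, adding a test for a new feature in that area becomes cheap, which is exactly why missing capabilities are worth building alongside the feature.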

Even though I was exploring throughout the implementation, I take a breather moment and just look at the thing we created as my external imagination, trying to think of stakeholders and their feedback. I might even drag them in, strengthening my external imagination from just the application that speaks to me to actual people looking at the application with me.

Then I will still look at things with my team once more, to see whether we can just press a button to release or whether we want to double-check something. I aim to minimize anything I would have to do while releasing, but at the same time I make sure one of us is exploring the experience with the new feature included.

Finally, I follow through. Sometimes I follow up immediately; sometimes in a month, three months, or six months. I deal with the long tail of learning by exploring what use in production looks like. In mature organizations, I do much of this from logs and access to customer-facing issue trackers. In less mature organizations, I drink coffee with the people who meet real users.

Within the team's social agreement, I have an exceptional level of agency: I am allowed to break the social contract at any time when I recognize something important. I discuss what I plan on doing, and sometimes we have a conversation. Some of my activities feed into the feature we are working on now; others feed into the features we will think about after this one. This level of agency allows me to choose what I do and in which order, making agreements on who does what. Then again, I work in a very social context where dailies allow us to redistribute work, rather than pulling from a backlog of one person's tasks.

If at any stage of the process people talk about test cases, it's as an output. Sometimes we need to leave test cases behind for standards that don't quite get exploratory testing. And in most cases, we hear "test case" and simply transform it into a programmatic test. While such a test does only a limited part of the testing, it provides enough evidence as long as it is framed within an exploratory testing mindset.
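To make that transformation concrete, here is a sketch with hypothetical names: a written test case like "logging in with a wrong password shows an error message" becomes a small programmatic test. The `login()` function stands in for the application under test.

```python
# A written test case ("wrong password shows an error message")
# transformed into a programmatic test. login() is a hypothetical
# stand-in for the real application under test.

def login(username, password):
    # Illustration-only implementation of the system's behavior.
    if username == "maaret" and password == "correct-horse":
        return {"ok": True, "message": "Welcome"}
    return {"ok": False, "message": "Invalid username or password"}


def test_wrong_password_shows_error():
    result = login("maaret", "wrong-password")
    assert result["ok"] is False
    assert "Invalid" in result["message"]
```

The programmatic test freezes one claim we learned while exploring; the exploring itself is what found the claim and decides whether this is the right claim to freeze.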

For me, exploratory testing is an approach: I explore a target, with resources, for information, throughout. When it is a technique, it's the "rethink with external imagination" part of the social agreement.

At its core, exploratory testing starts with the idea that while there are knowns (the targeted requirements we think we know we want), there is movement through continuous learning.

The difference in thinking tends to drive opportunity cost and thus prioritization. Having to choose between writing automation and using the application, many people in exploratory testing would choose the latter. When I speak of contemporary exploratory testing including automation, I describe a frame in which we actively change things so that we don't need to choose between the two but can merge them. Modern tools, modern teamwork, and the short cycles of agile development all enable that. Our ideas of what exploratory testing is and isn't still sometimes get in the way.