A Seasoned Tester's Crystal Ball: September 2024

Tuesday, September 24, 2024

Learning programming through osmosis

This article was written in 2016 for Voxxed, which is no longer online. Back then I did not know POSSE, and thus this piece has not been online for a while.

I identify mostly as a non-programmer. Yet, two weeks into a new job I’m already learning and contributing to Python and C++ code. The method that enables me to do this is ensemble programming, the idea of having a group of people working together on one computer on a task, taking turns on who types for the team while others instruct. For an idea to get from one’s head to the computer, it flows through someone else’s hands.

This article shares key insights from my journey of a little over a year on learning programming through osmosis, just being around programmers working on code, without intention of learning. As a result of learning, I rewrote my history with things I had forgotten and dismissed from my past. I hope it serves as an inspiration for programmers to invite non-programmers to learn to code one layer at a time, immersed in the experience of creating software together to transform the ability to deliver. Lessons specific to skillsets get transferred both ways, and while I learn from others, they learn from me, leaving everyone better off after the experience.

Finding Ensemble Programming

Many different roles contribute to building software: product owners, business specialists, and testers. Yet knowledge of programming keeps these roles at a distance. I did not come to programming through wanting to program or taking courses on it but through working with programmers in a style called ensemble programming.

As a tester within my team of nine developers, it was clear I was different. I wasn’t particularly keen on learning programming since there was more than plenty of work in feedback through empirical evidence and exploration, the specialty I’ve developed in depth over two decades. I’m an excellent exploratory tester and my team’s developers have always been my friends with a pickup truck that I can call in for assistance on anything where code needs to be created. Besides being the only non-programmer, I was also the only woman and part of a team where some people would occasionally spout out things like “Women only write comments in code.” Not exactly an inviting starting position.

Although I did not like programming, through my hobbies that started at the age of twelve and my computer science studies, which further killed my interest in programming, I had acquired experience in coding in twelve different languages. For my daughter’s sake, I started making small changes in how I looked at programming, as I did not want to transfer my dislike of code to a 7-year-old about to be embedded in an elementary learning environment where programming is everywhere, now that programming is a mandatory part of the Finnish curriculum.

The real change, however, started with Woody Zuill’s talk at a conference I organized. Woody is the discoverer of ensemble (mob) programming. The idea of the whole team working on a single task, all together on one computer, just sounded ridiculous, yet as ridiculous as it seemed, I thought it could be a way for my team to learn from one another as well as build the team. Instead of taking someone else’s word on methods, I have a preference for experiencing them first hand. And it wasn’t like we had to commit for a lifetime, just to try it out once or twice.

The First Experience Expands

With some discussions, my team agreed to try it out, but I knew I would be out of my comfort zone since I would have to be in front of a computer working on code. Our first task was to refactor some of our code with Extract Method and Rename automatic refactorings and we had an experienced ensemble facilitator lead the session for us. While not on the keyboard, I found myself able to comment on the names from the domain, and while on the keyboard, I noticed with each round that I was picking up things: keyboard shortcuts, ways to navigate, programming concepts without anyone really explaining them to me when the work was being done. In the retrospective, I could reflect on my learning and realized that not only was I picking up things I did not know before, everyone else was doing that too.

I felt safe in a group, as I did not need to be fully paying attention to every detail at any time, and I was always supported by a group. Surprisingly, the expected negative remarks on gender did not come out in a group, whereas they would be a regular thing in a more private pairing setting.

From that first experience, my team extended this to a weekly learning activity. I took the mechanism of learning for myself further, organizing various ensemble programming sessions with the programming community on different programming techniques and languages, learning e.g. TDD and working with legacy code in a hands-on manner. I introduced my team to ensembling on my work, exploratory testing, and they learned to better identify problems. In our ensemble programming sessions, there were several occasions where my existence in the room fixed an expensive mistake about to happen from half a sentence of discussion. Finding a problem like this early on led to more efficient and productive work for everyone. Although it seems inefficient to have so many people working on one thing at the same time, the time saved in avoiding context switching and passing feedback back and forth, the increased focus on steps to complete together with great quality, as well as the learning made us develop much faster and with fewer future problems.

Joining An All Female Hackathon

I took the idea of ensemble programming to a weekend hackathon outside work and convinced my fellow teammates to try it out, but only three people out of four decided to be involved. I avoided setting the expectations of me being a non-programmer and just joined in with whatever programming skills I had, without disclaimers. There was even a woman participating with less coding experience, as she had never even looked at code before.

Out of that weekend, I came out with four major realizations:

The best programmer outside the ensemble only contributed graphics. In the ensemble, we were adding one feature at a time and committing regularly, and the senior programmer found it hard not to have modules of her own to work on. There was no long-term plan for incrementally developed software and the version kept changing under her. We tried summarizing the lessons on the used technology for her, but she kept hitting problems that blocked her.
I passed off as a programmer. No one noticed I was not a programmer. And the reason was that I had become one. I realized that programming is like writing. Getting started is easy, and it takes a lifetime to get good at.
The non-programmer felt like an equal contributor. Her experience was that the code created was just as much hers as any of the others and that is a powerful experience. She learned the basics with us through typing for us, and reflecting with us.
We had working software. Not all groups had the same luxury. In the ensemble, we had the discipline to have not just code, but working code to a scope that could vary depending on how much time we had to add more functionality.

My Main Lessons

Cognitive dissonance is a powerful tool

The experiences of working with an ensemble for over six months transformed how I perceived myself. No amount of convincing and rational arguments on how much fun programming is could have done that. When my actions and beliefs are not in sync, my beliefs change. And that is what ensemble programming did to me. It made me a programmer, through osmosis, and got me started on a long journey of always getting better at it.

Non-programmers have a lot to contribute

I saw that while I was learning a lot, I was also contributing. As a tester, I had information about intents of the users that seemed mysterious to my programmer colleagues. We would test better while programming, just because I was there. We would avoid mistakes that were about to happen, just because I was there. I could give feedback without egos in play, and we could all learn skills from one another. And even me being slow was a positive thing - it made the other programmers more deliberate and thoughtful in their actions, and they shared the realization that they created better code while slower. I ended up feeling really proud of how much better my developers learned to test with our shared ensembling time.

The team got a lot out of it

I wasn’t the only one who learned - everyone in the team picked up different things. It was a pleasure to see how abilities to add unit or Selenium tests expanded from an individual to a team skillset, and how many times we found better libraries because just one of us was aware of them.

We slowly moved from working on technical debt and cleaning up to a shared standard to having technical assets in the form of libraries that would enable us to do things faster.

Everyone got their voices into the code better. We worked with the rule that if we had several ideas of how a problem could be approached, we would try each of them rather than argue while we had the least practical information about how it would turn out. And it was surprising to notice that something that someone would fight to the bitter end against was good enough to accept after the implementation was available, and not just because people would lower their standards.

We also learned that when one of us did not feel like contributing in an ensemble format at first, it was a good idea to let them opt out. The party-like nature of the sessions and the evidence of the rest of us bonding and learning inevitably drew these non-participators back in on their own initiative later on.

Ensemble Programming as a Practical tool of Diversity

Ensemble programming is a great way of introducing new people to programming, or testing for that matter. It transfers a lot of the tacit knowledge otherwise difficult to share. It brings the best of us to the work we do, as opposed to the most of each individual. While working together, we can remove a lot of the rework with fast and timely feedback. We raise our collective competence, allowing individuals to use specialized skills. We used a rule “learning or contributing” as a guideline in thinking of when an ensemble is doing what it is supposed to.

As software is such a big part of our society’s present and future, we need all hands on deck in creating it. We need to find ways of bridging roles without telling others that everyone just needs to be a programmer. In an ensemble format, I learned that while I picked up my hidden interest in programming, I would have been a valuable contributor even without it. There was a struggle for both me to go do things I thought I wouldn’t enjoy and the team to work in a setting they were not used to. It was worth the struggle to remove the distance I previously felt between myself and the programmers.

Just adding more women and people of color to the field of software development isn’t enough if the people struggle to get their voices included. We need to do more than make the world of coding look diverse. With ensemble programming we can use that diversity to innovate the world of coding overall. (Props on this thought to Kelly Furness, who was in the audience at my DevOxxUK talk)

It’s not just learning programming by osmosis, but the learning is mutual. Give it a chance.

About the author

Maaret Pyhäjärvi is a software professional with testing emphasis. She identifies as an empirical technologist, a tester and a programmer, a catalyst for improvement and a speaker. Her day job is working with a software product development team as a hands-on testing specialist. On the side, she teaches exploratory testing and makes a point of adding new, relevant feedback for test-automation heavy projects through skilled exploratory testing. In addition to being a tester and a teacher, she is a serial volunteer for different non-profits driving forward the state of software development. She blogs regularly at http://visible-quality.blogspot.fi and is the author of Ensemble Programming Guidebook.

Wednesday, September 11, 2024

Do Thee TDD?

Sampling many customer organizations, I can't help but note a customer theme we aren't answering well. The question is if we are doing test-driven development.

A lot of us know what it is. We usually have learned to recognize it as possibly two different patterns:

TDD while programming. Super-small loops (inside out, 'Chicago school'). Or small loops with mocks at play (outside in, 'London school').
ATDD (BDD, SBE - lots of names for similar idea) where examples characterize the feature before adding it.
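As a rough illustration of the two programming-level styles, here is a minimal Python sketch. The names price_with_vat, Checkout and the rates collaborator are hypothetical, invented for this example only. The first test is inside-out: a tiny pure function checked directly. The second is outside-in: the collaborator is replaced with a mock, so the test describes the interaction before the real collaborator exists.

```python
from unittest.mock import Mock


def price_with_vat(net, vat_rate):
    """Smallest possible unit: a pure function written to pass the test below."""
    return round(net * (1 + vat_rate), 2)


class Checkout:
    """Outer object that depends on a collaborator supplying the VAT rate."""

    def __init__(self, rates):
        self.rates = rates  # hypothetical collaborator with vat_for(country)

    def total(self, net, country):
        return price_with_vat(net, self.rates.vat_for(country))


def test_inside_out():
    # 'Chicago' style: assert on the result of real code, no mocks.
    assert price_with_vat(100.0, 0.24) == 124.0


def test_outside_in():
    # 'London' style: mock the collaborator and check the interaction.
    rates = Mock()
    rates.vat_for.return_value = 0.24
    assert Checkout(rates).total(100.0, "FI") == 124.0
    rates.vat_for.assert_called_once_with("FI")


if __name__ == "__main__":
    test_inside_out()
    test_outside_in()
    print("both tests pass")
```

In practice the two get mixed: the mock-based test pins down what the outer object asks of its collaborators, and the small direct tests pin down what each piece computes.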

For a lot of the customers though, I realize these two are more intertwined. And the conversation very often gets derailed into defining whether the test *really happened before*, and how often it made sense for each of the developers to write the test first ('isolating a bug is great test first') or write it as part of the few hours-few days feature they are on ('easier to capture intent in the same pull request when I first figured out how to get it done'). At the scale the customer looks at, you can't really tell if it was before or after. At the scale of the developer, learning the test-driven development techniques, both Chicago and London styles to mix them up, probably does a whole world of good for better controlling and describing the intent and not missing relevant bits with short-loop-after.

The customer's concern is not always whether the test came first. But it is if it came before (ATDD style) and if it came with the change itself (included in PR).

I find myself characterizing the answers to this theme with slightly more granularity:

Level -1. Test after with tester tests and bug reports. This happens a lot too. The 'nightly run' where analyzing the failures takes a week. We've all been there. Let's hope for a generation of developers who will look puzzled at that statement.
Level 0. No Sign of TDD. When code is merged with pull request, significant effort of testing follows in subsequent pull requests. There could be test changes with the original pull request, but their intent tends to be to get old tests to pass.
Level 1. Short-Loop-After. When code is merged, so are tests. Same pull request. Thus in the same repo, going into the pipeline. Little care if it was a mix of before and after writing the implementation because the loop is short enough. This is more driven and continuous than we ever used to have and we should celebrate.
Level 1b. Disciplined TDD. When code is merged, so are tests. Mixing outside in and inside out, with and without mocks, but the developers consistently write tests first.
Level 2. Acceptance criteria with examples. Examples from customers, illustrating core things that are different after the change, and introduction of the new behavior. Just having the examples around helps developers with a clearer definition of done, and less looping back to new information to learn. Things aren't obvious to everyone in the same way.
Level 3. BDD automation before implementation. Examples passing one by one drive the idea of whether we are done with the change.
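
Levels 2 and 3 can be pictured in a few lines of plain Python. This is a sketch under assumptions: the shipping rule, its threshold and the example rows are hypothetical. In a real project the examples would come from customers and would usually run through a BDD tool rather than a hand-written loop.

```python
# Examples agreed before implementation: (order total, expected shipping fee).
EXAMPLES = [
    (20.0, 5.0),   # small order pays the flat fee
    (50.0, 0.0),   # order at the threshold ships free
    (80.0, 0.0),   # larger order ships free
]


def shipping_fee(order_total):
    """Hypothetical rule under development: free shipping from 50.0 upwards."""
    return 0.0 if order_total >= 50.0 else 5.0


def run_examples(rule, examples):
    """Return (passed, failed) lists so progress towards 'done' is visible."""
    passed, failed = [], []
    for given, expected in examples:
        (passed if rule(given) == expected else failed).append((given, expected))
    return passed, failed


if __name__ == "__main__":
    passed, failed = run_examples(shipping_fee, EXAMPLES)
    print(f"{len(passed)} of {len(EXAMPLES)} examples pass")
```

With a first, naive implementation only some of the rows pass. The list of failing rows is the remaining work, which is what lets the examples, rather than opinion, say when the change is done.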

The first three teams I think of are on levels -1, 0 and 1. They all aspire to level 2.

Smaller steps may make it more manageable as a change. Where are you, and where are you heading?

Monday, September 9, 2024

Learning to test in Dynamics365 projects

How do you become an expert in something you did not know yet? By learning about it. You have a foundation of knowledge you probably acquired on other products, and if the foundation is large enough, you will be bound to see similarities. This is how I feel about being thrown at my first Dynamics 365 project.

Learning in public - explaining how my learning evolves and how my thinking evolves - gives me a chance of learning from people who know things I did not. And it provides the odd chance that my learning is something of use to someone else.

What's this about?

Dynamics 365 is one of the (many) platforms. You may have, like me, experienced SAP. Or Salesforce. Or Guidewire. Or Odoo. And you can continue the listing. What these essentially are is things enabling reuse. I personally like to call them platform products. There is a lot of common functionality for all their users. Yet there is even more own data, configurations, integrations and changes so that the resulting system looks different, is used differently, and most definitely holds up information and processes of highly different organizations. They are the epitome of modern reuse. If you could buy a product and use the product everyone else uses too, maybe you did not have to build your very own system. Meanwhile, tailoring enough means that the theory of reuse meets practice in this thing we lovingly call testing, where the rubber meets the road and good plans go to meet empirical evidence to ensure our business still runs with all the plans in place.

This particular one is a product platform in the cloud, made by Microsoft. Organization count 1. It is usually configured, integrated and extended by an integration partner. Organization count 2. In integrations, there may be a load of other systems as data sources and data targets. Organization count 2+N. At the start of the chain is the organization that assigned responsibilities for all the other organizations, the owner of the system/service, the customer with their users. Organization count 3+N, and responsibilities of ownership.

Your usual testing vocabulary isn't helping me

Calling some of this testing acceptance testing isn't really helping me. And particularly, calling some of this unit testing isn't helping me, almost the opposite. Surely if we configure functionality (or decide not to configure it, and work with defaults), it makes sense to verify that the behavior I get is the behavior I want. Most often that testing, though, needs to happen with at least a partially integrated system, and it may very well be just partial. This drives the testing vocabulary I need towards testing components/services, integrations, and flows across components, services and integrations. Instead of shifting left, here I need shifting down. I need to understand the smaller scope I can verify a functionality in. And if I succeed in that, the feedback granularity for the organization that is expected to react to the feedback is better.

Theoretically speaking, it would be great if these platform products shipped with tests for the defaults. They rarely do. If they did, I could test with defaults, adapt and extend those tests to test with my configurations, and build a systematic feedback that tells the chain of responsibilities.

However, I usually end up in these projects from the ownership organization perspective. For me to know if our business flows work, I approach this with the idea of testing core business flows with the application, targeting it with the knowledge of changes. It tends to be better if the chain works, and chaos ensues if the system is significantly broken.

That Test Automation thing?

This comes along quite naturally. You have rolling updates (where you may not be able to delay the update at all), and you have quarterly updates (where staying without updating is not possible as an approach, for good reasons). But this means you have pretty much continuous responsibility for testing in the organization of ownership.

Some people rely on staying close to defaults, and approach this by taking the risk that if the product platform does not work with defaults, it gets rolled back and fixed by the product platform organization. The closer to defaults, the more likely you are to be able to play with timings so that the first wave of installers got whatever was in your way. There's risk, but the risk may be manageable close to defaults.

Yet usually we are not close to defaults. The further away from defaults we shift, the more there is functionality the product platform organization is unaware of, unable to test for, and thus responsibility for it surviving change is allocated later in the chain.

You would usually invest in test automation for this. It could be component level, for things where you go furthest from the defaults. It could be process level, to catch things on the basic flows. Or it could be an intricate web of both of these. Plus the test automation that tells you when to point blame towards the product platform.

In the whole chain, assigning the responsibilities to strategically design the necessary automation is on the organization of ownership. This is where the low code tools find their most lucrative points of entry.

However, the "no code" approaches are just a visual programming language. If it diffs poorly, it is poorly maintainable. It's a balance, and a belief system. I don't think acceptance testers recording automation tests is the way to go. Shifting down for designing per component / service feedback is the way to go. Visibility of these tests is the way to go.

Technologies, architectures - it all maps to common web / cloud

Scratching this just a little deeper, I come to realize I have very basic web / cloud things in scale.

Web pages can be automated with Selenium, Playwright - well, any of the web driver libraries and related testing frameworks. The "scary" parts (shadow DOMs, dynamic IDs and deeply nested components) could perhaps use the help of a tool that hides some of that locator complexity. But if it's complex enough, hiding it also means taking away power to maintain it.

REST APIs can be automated with any of the language specific libraries.
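
To make that concrete, here is a minimal sketch using only the Python standard library. The /accounts endpoint, the response shape and the account names are assumptions made up for illustration and do not describe the actual Dynamics 365 API. The network call is injectable, so the example runs offline against a fake response.

```python
import io
import json
import urllib.request


def get_json(url, opener=urllib.request.urlopen):
    """GET a URL and decode the JSON body. `opener` is injectable for tests."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def check_account_exists(base_url, name, opener=urllib.request.urlopen):
    """Hypothetical check: the named account appears in the /accounts listing."""
    accounts = get_json(f"{base_url}/accounts", opener)
    return any(account.get("name") == name for account in accounts)


def fake_opener(request):
    """Stand-in for the network so the sketch runs offline."""
    assert request.full_url.endswith("/accounts")
    return io.BytesIO(b'[{"name": "Contoso"}, {"name": "Fabrikam"}]')


if __name__ == "__main__":
    print(check_account_exists("https://example.invalid/api", "Contoso", fake_opener))
```

Against a real environment the default opener would be used, and authentication would need to be added. The point is only that the building blocks are ordinary programming.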

Why did I want a commercial tool I would have to learn? Or why would I choose to teach that commercial tool to my fellow testers over teaching them the basics of programming for test automation purposes that I know even business testers are capable of learning?

Let's say the jury is mostly out in this space. I'll write more when that makes sense to me.

The First Experience - A User's Experience

My first touches to these projects come from having used systems with this - without really realizing I had. Connecting the realization to use examples, I also have examples of missing functionalities on Safari, being forced to use incognito mode and cleaning caches to be able to get some of these tools to work.

The real question is that since the user's experience has directed me to not use Safari, would we care to use all browsers? And what drives the browser differences - I'll learn.

The Lingo

Finally, in addition to testing vocabulary, there is the product lingo. D365FO, D365CE, feature names, change listings, scope of each project. I find myself classifying: product platform vs. configuration to make sense of it.

Turns out there is D365RF - the common test automation keywords for Robot Framework.

Is this how you take on new testing assignments too?

With a baseline thinking written down, I'll let you know how much more I know in a few weeks.

Monday, September 2, 2024

Who Are You Paying to Learn AI with You?

Two years ago, Heini Ahven published a research paper (thesis) on AI in testing, concluding from her interviews that there are two particular hurdles for AI in testing: Data and Customer to pay for it. While in two years we have shifted away from needing data and primarily discussing use of generative models now, the essential challenge of Customer to pay for it remains. Customers are making choices of who they bet on to pay to learn AI with them.

It would seem to me that these things are good reasons to bet on us:

We have a track record of having created customer specific systems with AI in them
We have a track record of having new products with AI in them, most notably in modernization of legacy systems, but generally too many to list
We publicly say (and can back it up) that we have already invested 1M into AI in the last year, and built quite a platform of knowledge with it
We know software development, and we know testing. And we know these in scale.

That's the high level. Yet I think that reframing the question from do we have solutions to who are you paying to learn AI with you is the way to go. And personally I think that you would do well learning AI with me and my crowd.

We have looked at large numbers of testing tools with AI in them, and can help you sort out positioning of those tools
We have used tools with AI in creating test artifacts, and can help you sort out sociotechnical guardrails of use of these tools so that you can steer your learning
We're happy to pick up a tool you want to learn with even if we haven't yet, and amplify your learning with success in mind

There is a lot going on in the scale I get to pull from. I chose 6 activities, 4 values that are core to the approach I work with.

We need to know where we are to make sense of where we are heading. We are expecting improvement, and agreeing on how improvement can be recognized is key.

We experimented already, and we scale to experiment with more customers. Any solution in this space has learning at heart, and keeping learning at heart steers to benefits.

We collected sociotechnical guardrails for different kinds of applications of AI in testing. A lot of what we have been learning we can feed into new organizations, and improve with ongoing learning that benefits us all.

We rely on building new habits that are good habits, to instill and sustain a culture of learning. This usually means we need to work with people in working through the change rather than input material.

What we learn, we teach on. Sharing is a way of continuously seeking improvement.

Some of this we package and make available in scale that helps anyone. These are new tools and services emerging. We recognize attribution is also IP, and we recognize scale will have a mix of different kinds of IP.

In these six activities, we value four things:

Our approach is human-centric and we are learning best ways to have people in the loop
We seek enhancing to better
Our expectation is incremental, with controlled investments that can expect results
By focusing on many customers while carefully hearing each customer's specific challenges, we seek to make helpful impact in testing field in scale

This said, the question remains: who are you paying to learn AI with you, and could it be us?

The author is Director, Consulting Expert at CGI Finland, focusing on AI-driven application testing. She usually writes about her work that isn't specific to CGI and felt like making an exception today. She is seeking primarily Finnish customers to join in increasing use of AI in testing, and believes open calls for collaboration are a preferable approach when seeking early adopters. The expectation for her new position at CGI is that she meets customers 130 times per year, and scheduling a short conversation on how she could help would be mutually beneficial, while an unusual approach. She can be reached at maaret.pyhajarvi@cgi.com.

Do testers need to be devops engineers too?

At one point of my testing career, I specialized in understanding test environments. It started off with seeing connections between subsystems, and recognizing compatible data. Well, there was no other choice to test effectively in the insurance industry, where a new IBM mainframe test environment (I got one of those too!) took 1 million and 1 year. I can't remember if it was the time when we still had the Finnish mark as the unit of money, or if it was already euro time, I just remember the overwhelming sense of responsibility for enabling a project with a million in spending. That time of my career added awareness to any environment I would test in since, and categorizing workstations and servers and versions became a routine.

When cloud later landed in my world, the categorization and foundation of test environments was particularly useful. Recognizing locations for storage, compute and specialized services, and setting up connections and dependencies between all these, geographically distributed and provisioned as needed, with controls we have and controls that we recognize but don't have, helped a lot in figuring out what it was that I was testing.

It was easy to grasp that the new project that I just started with will have better working test environment early in the week, and we could expect troubles as the week advances. I could see what symptoms are likely to be about having provisioned a smaller test environment, what are likely to be results of data and memory, and I can design the way I test particular things around that weekly and daily cadence that I recognize going on for the test environment.

I was thinking about this today, as I saw someone asking if testers need to understand CI/CD, and what of it, and what specific skills are we supposed to have in that space?

Many of my colleagues extending in the test automation space go the CI/CD pipelines route after they realize that programmatic tests that run on their machine manually won't be much in the way of automation. There is a significantly higher value if tests are run right after a change that could break things is done, and that requires designing the tests into a CI/CD pipeline. Many of those colleagues find barely a day a week for doing testing once running the tests nightly turns into optimized sets in the pipelines, and environments turn into dockerized orchestrated platforms where nothing changes in the infra without changing lines of code (Infrastructure as Code).
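
One way to picture the move from a nightly run to optimized sets is selecting tests by changed files. The sketch below is an assumption about how such a step could look, not a description of any particular pipeline tool. The path prefixes and suite names are made up.

```python
# Hypothetical mapping from source areas to the test suites that cover them.
COVERAGE_MAP = {
    "billing/": ["tests/test_invoices.py", "tests/test_payments.py"],
    "customers/": ["tests/test_customers.py"],
    "infra/": ["tests/test_smoke.py"],
}

# Suites that run on every change regardless of what was touched.
ALWAYS_RUN = ["tests/test_smoke.py"]


def select_tests(changed_files, coverage_map=COVERAGE_MAP, always_run=ALWAYS_RUN):
    """Pick the suites to run for a change, in a stable order without duplicates."""
    selected = list(always_run)
    for path in changed_files:
        for prefix, suites in coverage_map.items():
            if path.startswith(prefix):
                for suite in suites:
                    if suite not in selected:
                        selected.append(suite)
    return selected


if __name__ == "__main__":
    print(select_tests(["billing/vat.py", "README.md"]))
```

A pipeline job could feed the list of files in a change to a function like this and pass the result to the test runner, keeping the full set for a scheduled run.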

I still work with testers who understand environments on the level I used to - recognizing that there is a different address for two different yet same test environments, with heuristics on what to pay attention to in each. They, like me before I integrated a lot of this CI/CD pipelines stuff into my thinking, use environments with specific timing patterns to control the version they are experiencing in testing. They may design environments on the level of not allowing change, because installing often means the environment is unavailable or out of the control we understand. These testers need to understand CI/CD as a mechanism of publishing and scheduling, but go no further.

Increasingly, I work with testers who design and enhance pipelines. While they don't need to do all the changes themselves, they need to read pipelines to see what goes on. Red has a reason, and drives their days of working to see where red is coming from. Majority of people in this group configure new jobs within the same realm of examples, and don't really take things further. Only some bring in new tools. But the new tools part, that is something people seem to love doing.

Then there are people who live in pipelines. Tests are placeholders and boxes, but they rarely have time to go and think about their coverage themselves. Working for the pipelines is the work. Making them run on better infra. Adding new tools. Upgrading the existing tools. Building all the machinery that could support the teams. These people, even with testing background, tend to call themselves devops engineers, to emphasize their attention to infrastructure and pipelines.

When hiring for a tester, you may expect any of these levels of knowledge. A lot of people search for the middle ground. More and more, we expect people to come with the knowledge of what control pipelines give to your test environments and options of testing.

And more and more, finding a balance where people know enough yet still manage to test not only build pipelines is what we seek.