Wednesday, September 11, 2024

Do Thee TDD?

Sampling many customer organizations, I can't help but to note a customer theme we aren't answering well. The question is if we are doing test-driven development.

A lot of us know what it is. We usually have learned to recognize it as possibly two different patterns: 

  1. TDD while programming. Super-small loops (inside out,  'Chicago school'). Or small loops with mocks at play (outside in, 'London school'). 
  2. ATDD (BDD, SBE - lots of names for similar idea) where examples characterize the feature before adding it. 
For a lot of the customers through, I realize these two are more intertwined. And the conversation very often gets derailed to defining if the test *really happened before*, and how often did it make sense for each of the developers to write the test first ('isolating a bug is great test first') or write it as part of the few hours-few days feature they are on ('easier to capture intent in the same pull request when I first figured out how to get it done'). In a scale the customer looks at, you can't really tell if it was before or after. In scale of the developer learning techniques to better control and describe the intent and not miss relevant bits with short-loop-after, learning the test-driven development techniques, both Chicago and London styles to mix them up probably does a whole world of good. 

The customers concern is not always whether the test came first. But it is if it came before (ATDD style) and if it came with the change itself (included in PR). 

I find myself characterizing the answers to this team with slightly more granularity: 
  • Level -1. Test after with tester tests and bug reports. This happens a lot too. The 'nightly run' where analyzing the failures takes a week. We've all been there. Lets hope for a generation of developers who will look puzzled at that statement. 
  • Level 0. No Sign of TDD. When code is merged with pull request, significant effort of testing follows in subsequent pull requests. There could be test changes with the original pull request, but their intent tends to be to get old tests to pass. 
  • Level 1. Short-Loop-After. When code is merged, so are tests. Same pull request. Thus in same repo, going into the pipeline. Little care if it was a mix of before and after writing the implementation because the loop is short enough. This more driven and continuous than we ever used to have and we should celebrate. 
  • Level 1b. Disciplined TDD. When code is merged, so are tests. Mixing outside in and inside out, with and without mocks, but the developers consistently write tests first. 
  • Level 2. Acceptance criteria with examples. Examples from customers, illustrating core things that are different after the change, and introduction of the new behavior.  Just having the examples around help developers with a clearer definition of done, and less looping back to new information to learn. Things aren't obvious to everyone in the same way. 
  • Level 3. BDD automation before implementation. Examples passing one by one drive the idea of are we done with the change. 

The three first teams I think of are on levels -1, 0 and 1. They all aspire to level 2. 

Smaller steps may make it more manageable as a change. Where are you, and where are you heading? 

Monday, September 9, 2024

Learning to test in Dynamics365 projects

How do you become an expert in something you did not know yet? By learning about it. You have a foundation of knowledge you probably acquired on other products, and if the foundation is large enough, you will be bound to see similarities. This is how I feel about being thrown at my first Dynamics 365 project. 

Learning in public - explaining how my learning evolves and how my thinking evolves - gives me a chance of learning from people who know things I did not. And it provides the odd chance that my learning is something of use to someone else. 

What's this about?

Dynamics 365 is one of the (many) platforms. You may have, like me, experienced SAP. Or Salesforce. Or Guidewire. Or Odoo. And you can continue the listing. What these essentially are things enabling reuse. I personally like to call them platform products. There is a lot of common functionality for all their users. Yet there is even more own data, configurations, integrations and changes so that the resulting system looks different, is used different, and most definitely holds up information and processes of highly different organizations. They are the epitome of modern reuse. If you could buy a product and use the product everyone else uses too, maybe you did not have to build your very own system. Meanwhile, tailoring enough means that the theory of reuse meets practice in this thing we lovingly call testing, where the rubber meets the road and good plans go to meet empirical evidence to ensure our business still runs with all the plans in place. 


This particular one is a product platform in cloud done by Microsoft. Organization count 1. It is usually configured, integrated and extended by integration partner. Organization count 2. In integrations, there may be a load of other systems as data sources and data targets. Organization count 2+N. In the start of the chain is the organization that assigned responsibilities for all the other organizations, the owner of the system/service, the customer with their users. Organization count 3+N, and responsibilities of ownership

Your usual testing vocabulary isn't helping me

Calling some of this testing acceptance testing isn't really helping me. And particularly, calling some of this unit testing isn't helping me, almost the opposite. Surely if we configure functionality (or decide to not configure it, and working with defaults), it makes sense to verify that the behavior I get is the behavior I want. Most often that testing through needs to happen with at least a partial integrated system, and it may really well be just partial. This drives the design I would need testing vocabulary to reflect towards testing components/services, integrations, and flows across components, services and integrations. Instead of shifting left, here I need shifting down. I need to understand the smaller scope I can verify a functionality in. And if I succeed in that, the feedback granularity for the organization that is expected to react to the feedback is better. 

Theoretically speaking, it would be great if these platform products shipped with tests for the defaults. They rarely do. If they did, I could test with defaults, adapt and extend those tests to test with my configurations, and build a systematic feedback that tells the chain of responsibilities. 

However, I usually end up in these projects from the ownership organization perspective. For me to know if our business flows work, I approach this with the idea of testing core business flows with the application, targeting it with the knowledge of changes. It tends to be better if the chain works, and a chaos ensues if the system is significantly broken. 

That Test Automation thing?

This comes along quite naturally. You have rolling updates (where you may not be able to delay the update at all), and you have quarterly updates (where staying without updating is not possible as an approach, for good reasons). But this means you have pretty much continuous responsibility for testing in the organization of ownership. 

Some people rely on staying close to defaults, and approach this with taking the risk that if the product platform does not work with defaults, it gets rolled back and fixed by the product platform organization. The closer to defaults, the more likely you are to be able to play with timings so that the first wave of installers got whatever was on your way. There's risk, but the risk may be manageable close to defaults. 

Yet usually we are not close to defaults. The further away from defaults we shift, the more there is functionality the product platform organization is unaware of, unable to test for, and thus responsibility for it surviving change is allocated later in the chain. 

You would usually invest in test automation for this. It could be component level, for things where you go furthest from the defaults. It could be process level, to catch things on the basic flows. Or it could be an intricate web of both of these. Plus the test automation that tells you when to point blame towards the product platform. 

In the whole chain, assigning the responsibilities to strategically design the necessary automation is on the organization of ownership. This is where the low code tools find their most lucrative points of entry. 

However, the "no code" approaches are just a visual programming language. If it diffs poorly, it is poorly maintainable. It's a balance, and a belief system. I don't think acceptance testers recording automation tests is the way to go. Shifting down for designing per component / service feedback is the way to go. Visibility of these tests is the way to go. 

Technologies, architectures - it all maps to common web / cloud

Scratching this just a little deeper, I come to realize I have very basic web / cloud things in scale. 

Web pages can be automated with Selenium, Playwright - well, any of the web driver libraries and related testing frameworks. The "scary" parts shadow DOMs, dynamic id's and deeply nested components could perhaps use help of a tool that hides some of that locator complexity. But if it's complex enough, hiding it also means taking away power to maintain it. 

REST APIs can be automated with any of the language specific libraries. 

Why did I want a commercial tool I would have to learn? Or why would I choose to teach that commercial tool to my fellow testers over teaching them the basics of programming for test automation purposes that I know even business testers are capable of learning? 

Let's say the jury is most out in this space. I'll write more when that makes sense to me.

The First Experience - A Users Experience

My first touches to these projects come from having used systems with this - without really realizing I had. Connecting the realization to use examples, I also have examples of missing functionalities on Safari, being forced to use incognito mode and cleaning caches to be able to get some of these tools to work. 

The real question is that since the users experience has directed me to not use Safari, would we care to use all browsers? And what drives the browser differences - I'll learn. 

The Lingo

Finally, in addition to testing vocabulary, there is the product lingo. D365FO, D365CE, feature names, change listings, scope of each project. I find myself classifying: product platform vs. configuration to make sense of it. 

Turns out there is D365RF - the common test automation keywords for robot framework. 


Is this how you take on new testing assignments too?

With a baseline thinking written down, I'll let you know how much more I know in a few weeks. 









Monday, September 2, 2024

Who Are You Paying to Learn AI with You?

Two years ago, Heini Ahven published a research paper (thesis) on AI in testing, concluding from her interviews that there are two particular hurdles for AI in testing: Data and Customer to pay for it. While in two years we have shifted away from needing data and primarily discussing use of generative models now,  the essential challenge of Customer to pay for it remains. Customers are making choices of who they bet on to pay to learn AI with them. 

It would seem to me that these things are good reasons to bet on us:

  • We have a track record of having created customer specific systems with AI in them
  • We have a track record of having new products with AI in them, most notably in modernization of legacy systems, but generally too many to list
  • We publicly say (and can back it up) that we have already invested 1M into AI in the last year, and built quite a platform of knowledge with it
  • We know software development, and we know testing. And we know these in scale. 
That's the high level. Yet I think that reframing the question from do we have solutions to who are you paying to learn AI with you is the way to go. And personally I think that you would do well learning AI with me and my crowd. 
  • We have looked at large numbers of testing tools with AI in them, and can help you sort out positioning of those tools
  • We have used tools with AI in creating test artifacts, and can help you sort out sociotechnical guardrails of use of these tools so that you can steer your learning
  • We're happy to pick up a tool you want to learn with even if we haven't yet, and amplify your learning with success in mind 
There is a lot going on in the scale I get to pull from. I chose 6 activities, 4 values that are core to the approach I work with. 

We need to know where we are to make sense of where we are heading. We were expecting improvement and agreeing how improvement can be recognized is key.

We experimented already, and we scale to experiment with more customers. Any solution in this space has learning at heart, and keeping learning at heart steers to benefits. 

We collected sociotechnical guardrails for different kinds of applications of AI in testing. A lot of what we have been learning we can feed into new organizations, and improve with ongoing learning that benefits us all. 

We rely on building new habits that are good habits, to instill and sustain a culture of learning. This usually means we need to work with people in working through the change rather than input material. 

What we learn, we teach on. Sharing is a way of continuously seeking improvement. 

Some of this we package and make available in scale that helps anyone. These are new tools and services emerging. We recognize attribution is also IP, and we recognize scale will have a mix of different kinds of IP. 

In these six activities, we value four things: 
  • Our approach is human-centric and we are learning best ways to have people in the loop
  • We seek enhancing to better
  • Our expectation is incremental, with controlled investments that can expect results
  • By focusing on many customers while carefully hearing each customers specific challenges, we seek to make helpful impact in testing field in scale
This said, the question remains: who are you paying to learn AI with you, and could be us? 

The author is Director, Consulting Expert at CGI Finland, focusing on AI-driven application testing. She usually writes about her work that isn't specific to CGI and felt like making an exception today. She is seeking primarily Finnish customers to join in increasing use of AI in testing, and believes open calls for collaboration are preferable to approach when seeking early adopters. Expectation for her new position at CGI is that she meets customers 130 times per year, and you scheduling a short conversation on how she could help would be mutually beneficial while unusual approach. She can be reached at maaret.pyhajarvi@cgi.com.  

Do testers need to be devops engineers too?

At a point of my testing career, I specialized in understanding test environments. It started off with seeing connections between subsystems, and recognizing compatible data. Well, there was no other choice to test effectively in insurance industry, where a new IBM mainframe test environment (I got one of those too!) took 1 million and 1 year. I can't remember if it was time when we still had Finnish mark as unit of money, or if it was already euro time, I just remember the overwhelming sense of responsibility for enabling  project with a million spending. That time of my career added awareness to any environment I would test in since, and categorizing workstations and servers and versions became a routine. 

When cloud later landed on my world, the categorization and foundation of test environments was particularly useful. Recognizing locations for storage, compute and specialized services and setting up connections and dependencies between all these geographically distributed and provisioned as needed with controls we have and controls that we recognize but don't have helped a lot in figuring out what it was that I was testing. 

It was easy to grasp that the new project that I just started with will have better working test environment early in the week, and we could expect troubles as the week advances. I could see what symptoms are likely to be about having provisioned a smaller test environment, what are likely to be results of data and memory, and I can design the way I test particular things around that weekly and daily cadence that I recognize going on for the test environment. 

I was thinking about his today, as I saw someone asking if testers need to understand CI/CD, and what of it, and what specific skills are we supposed to have in that space? 

Many of my colleagues extending in test automation space go CI/CD pipelines route after they realize that programmatic tests that run on their machine manually won't be much in the way of automation. There is a significantly higher value if tests are run right after a change that could break things is done and that requires designing the tests into a CI/CD pipeline. Many of those colleagues find barely a day a week for doing testing, when after running the tests nightly turns into optimized sets in the pipelines, and environments turn into dockerized orchestrated platforms where nothing changes in the infra without changing lines of code (Infrastructure as Code). 

I still work with testers who understand environments on the level I used to - recognizing that there is a different address for two different yet same test environments, with heuristics on what to pay attention to each. They, like me before I integrated a lot of this CI/CD pipelines stuff into my thinking, use environments with specific timing patterns to control the version they are experiencing in testing. They may design environments on the level of not allowing change, because installing means often unavailable or out of control we understand. These testers need to understand CI/CD as mechanism of publishing and scheduling, but go no further. 

Increasingly, I work with testers who design and enhance pipelines. While they don't need to do all the changes themselves, they need to read pipelines to see what goes on. Red has a reason, and drives their days of working to see where red is coming from. Majority of people in this group configure new jobs within the same realm of examples, and don't really take things further. Only some bring in new tools. But the new tools part, that is something people seem to love doing.

Then there are people who live in pipelines. Tests are placeholders and boxes, but they rarely have time to go and think about their coverage themselves. Working for the pipelines is the work. Making them run on better infra. Adding new tools. Upgrading the existing tools. Building all the machinery that could support the teams. These people, even with testing background, tend to call themselves devops engineers, to emphasize their attention to infrastructure and pipelines. 

When hiring for a tester, you may expect any of these levels of knowledge. A lot of people search for the middle ground. More and more, we expect people to come with the knowledge of what control pipelines give to your test environments and options of testing. 

And more and more, finding a balance where people know enough yet still manage to test not only build pipelines is what we seek.