
Thursday, July 28, 2022

Language change in Gherkin Experiment

I find myself a Gherkin (the language often associated with BDD) sceptic. The idea of writing Gherkin scenarios instead of manual tests, which makes other people giddy with joy, fills me with despair, as I was never writing manual tests in the first place. The more I think about it and look at it, the clearer it is that Gherkin examples, when done well, are examples rather than tests, and some of our test case thinking is causing us trouble. 

What we seek with Gherkin and BDD is primarily a conversation for clarity with the customer and the business. When different, business-relevant examples illustrate the scenarios our software needs to work through, the language of the user is essential. 

In our experiment comparing concise-but-code tests with Gherkin-on-top tests, I still find myself heavily favouring concise-but-code. 

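# User and UsersPage are project-specific types (not shown); the arguments are pytest fixtures.
def test_admin_can_delete_users(
    assigned_user: User, users_page_logged_in_as_admin: UsersPage, users_page: UsersPage
) -> None:
    # Create the user as an admin, then delete it through the users page.
    users_page_logged_in_as_admin.create_new_user(assigned_user.name, assigned_user.password)
    users_page.delete_user(assigned_user.name)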

I'm certain that the current state of the fixtures needs some cleaning up, but I can also have a conversation in the user's language with this style. Before implementing, we talk in the Friends format ("the one where admins can delete all users, including themselves"), and after implementing, the format turns into something where only the names of the tests and their main steps occasionally need to be looked at together. 

It is clear, though, that this is not a format in which the users/business would be writing our tests for us, and I am currently in a team with a product owner who would love nothing more than to be able to do that. Being able to share the specs we test against with external customers would also give them a sense of security, which makes considering something like Gherkin worthwhile.
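For contrast with the concise-but-code test, here is a minimal sketch of what "Gherkin on top" means mechanically: plain-language steps that a product owner could read and write, matched by pattern to step functions. It is not code from the project and not the API of any real BDD library; the step registry, the regexes and the in-memory `users` store are hypothetical, written only to show the extra layer.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical in-memory system under test: user name -> has admin rights.
users: dict[str, bool] = {}

# Registry of (pattern, step function) pairs: the layer Gherkin puts on top of code.
STEPS: list[tuple[re.Pattern[str], Callable[..., None]]] = []


def step(pattern: str) -> Callable[[Callable[..., None]], Callable[..., None]]:
    """Register a step function under a regex matching the step text."""
    def register(func: Callable[..., None]) -> Callable[..., None]:
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r'a user "(\w+)" with admin rights')
def given_admin(name: str) -> None:
    users[name] = True


@step(r'"(\w+)" creates a user "(\w+)" without admin rights')
def create_user(creator: str, name: str) -> None:
    assert users[creator], "only users with admin rights can create users"
    users[name] = False


@step(r'"(\w+)" deletes the user "(\w+)"')
def delete_user(actor: str, name: str) -> None:
    assert users[actor], "only users with admin rights can delete users"
    del users[name]


@step(r'"(\w+)" no longer exists')
def user_gone(name: str) -> None:
    assert name not in users


def run_scenario(text: str) -> None:
    """Drop the Given/When/Then/And keyword and dispatch each line to its step."""
    for line in text.strip().splitlines():
        _keyword, _, body = line.strip().partition(" ")
        for pattern, func in STEPS:
            match = pattern.fullmatch(body)
            if match:
                func(*match.groups())
                break
        else:
            raise LookupError(f"no step matches: {body}")


run_scenario("""
    Given a user "alice" with admin rights
    When "alice" creates a user "bob" without admin rights
    And "alice" deletes the user "bob"
    Then "bob" no longer exists
""")
```

The delete and "no longer exists" steps are there only to make the example checkable, which is the kind of testing step that tends to creep into scenarios written this way.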

In this post, I want to compare the Gherkin created in a user/business collaboration for this feature in our project with the result that is running in our pipeline today. Luckily, the feature we are experimenting with is so commonplace that sharing the examples reveals absolutely nothing about our system. 


The examples generated in the user/business collaboration started with two on creating new users: 
From: old

By now, these have turned into three. 
From: new

You can see a difference in language. Earlier we talked about users, some of whom get admin rights and others don't. Now the emphasis is on having an admin and a user. The new language isn't yet consistent on these either, using admin and admin user interchangeably in the steps. It also reveals a concept of a default admin, which just happens to be one user's information in the database when it starts - clearly something we should have a better conversation about than the one I was around to document in the user/business collaboration session. 

The second of the new scenarios also threw me off when I compared them. I first reacted to it as if it were about admins without admin rights not being able to create users - there is no admin without rights, those are the same thing. But then I realized it is trying to say that an admin can create users that are not admins. 

Another thing this sampling makes obvious is that two scenarios turned into three, and the split into creating admins and creating users makes more sense from a testing point of view than from a business point of view. Again, an admin is just a user with admin rights. Any user could be an admin, permanently or temporarily.  

Similarly, it got longer. Extra steps, deleting in particular, were introduced to each scenario. This is clearly a testing step made visible, and it most likely does not add anything essential to the example for the user/business.  

And it also got shorter. Changing the password, the essential functionality originally described, vanished from this scenario and found a new home, which simplified the scenario, since changing the password was also described separately in the original. 
From: new

From: old

With this one, the steps exist for testing purposes only. They now describe the implementation. It's a common pitfall, and one we most definitely fell into: instead of illustrating the requirement as concisely as we could, we illustrated operating the system as test steps. 

Interestingly, after the first example and mapping the original to the new, we have six things that originate with the user/business, and only one that remains on the implemented side. 
From: new

We can find a match for this one in the originals. 
From: new


So what else got lost in translation? These three. 


From: new

The first two are the real reason why such functionality exists in the first place, and it is telling that those are missing. They require testing two applications together, a "system test" so to speak. 

The last missing one is also interesting. We can only test it by manually editing the database once there is no admin access left, again a "system test". 

Having looked at these, my takeaways are: 
  1. Gherkin fooled us into losing business information while it felt "easier" and "more straightforward". The value of reading the end result is not the same as the value of reading the business input. 
  2. Tests will be created on testing terms, and creating an appearance of a connection with the business isn't yet a connection.
  3. The conversation mattered, the automated examples didn't. 


Tuesday, July 26, 2022

Limiting what's held in my head

I'm struggling with the work I have now. For a long time, I could not quite make sense of why. I knew what I was observing: 

  • a tester assigned to this team before me could not find their corner of being useful in a year
  • a tester after me stayed less than a month
  • a tester before us all felt overwhelmed and moved on to something else where the work is clearer

This project repels testers left and right. Yet it has some of the loveliest developers I have met. It has a collaborative and caring product owner. But somehow it repels testers. And I have never before experienced a project that repels testers while having developers who love testing the way these folks do. 

In the six months I have spent with this team, I have managed to figure out some of the testing I want to do and I've:

  • clarified each new feature for what is in scope and what is not in scope, and tested to discover when those boundaries are fuzzy
  • learned how to control inputs, the transformations happening on the way, and how to watch the outputs
  • shortened release cycles from years to months
  • introduced some test automation that helps me track when things change
  • introduced test automation, written by other people, that exercises things that would otherwise be hard to cover, allowing the devs to find bugs while writing tests
  • had lovely conversations with the team resulting in better ways of working
I have plenty of frustrations:
  • I can barely run our dev environment because I hate how complex we've made it, and I can almost always opt out of working with tests the way the rest of the team does - it took me 5 months of avoiding it and keeping all my code on the side, and now that I am putting some of it together, I am losing my ability to debug
  • We use Linux because the devs like it, but customers expect Windows because they like it. Discrepancies like this can bite us later, and they already do if you happen to join the team with a Windows workstation (like all the testers other than myself, as I recently changed to a Mac)
  • Our pipelines fail a lot, and we're spending way too much time on individual branches that live longer than I would like
  • Our organization loves Jira, and our team does not. That means that we don't properly fake using it. The truth is in the commits and conversations (great), but I feel continuously guilty for not living up to an expectation, which then makes it harder for testers who think they can rely on Jira for information.
  • We have so much documentation that I can't get through it in a lifetime.
So I have done my share of testing, I live with my frustrations, and I navigate day to day by juggling too many responsibilities anyway. What am I really struggling with? The adaptations.

I have come to heavily limit which of the things the team discusses I hold in my head. I don't need all of it to do what I have set out to do. I model the space of the users and other stakeholders, and I care about the tech for the interfaces it gives me for visibility. I model what ready looks like and how we can make it smaller. I don't take on all the tools and tech, because there's more going on than I can consume. I focus on what others may not look at in the system. 

I start my days by checking whether there is anything new in master, and design my days around changes for the users. I choose to mention many things over doing them myself. I limit what I hold in my head to have energy for seeing what others may have missed. 

There is no simple "here's your assignment, test it - write test automation code for it" for me; that kind of assignment always goes to someone else (currently a junior I'm training up on programming/testing). Most test automation in this team requires diving deep into the implementation. The work is intertwined and messy, and it often pulls you into the deep end with no focus time unless you create it for yourself. 

So I struggle: finding a tester to do this work seems increasingly difficult. Training newbies feels like a more likely direction. And it makes me think that some of this is the future we see: traditional testers and test automation folks won't find their corner of contribution. 

This world is different and it calls for different focus - intentional limiting of what we hold in our head to collaboratively work while being different. 



 



Saturday, July 23, 2022

Optimising start of your exploratory testing

Whenever I teach people exploratory testing, we start the teaching experience with an application they don't yet know. Let's face it: exploring something you have no knowledge of and exploring something you have baseline knowledge of are different activities. Unless we are on a longer course or in the context of the application you work with, we really do start with something we don't know. 

I currently have two favourite test targets. Both of them are small, so learning them and completing the activity of testing them is possible. In addition, I teach with targets that are too large to finish testing, for the variety they offer. 

The first one is a web application that presents input fields and produces outputs on a user interface. The second one is code that gets written as we are exploring the problem we should have an implementation for. 

With both test targets, I have had the pleasure of observing tens of testers work with me on the testing problem, and I would thus like to mention a few things people stereotypically do that aren't great choices. 

1. Start with input filtering tests

Someone somewhere taught a lot of testers that they should be doing negative testing. This means that when they see an input field, be it at UI or API level, they start with the things they should not be able to input. It is a relevant test, but only if you first:

  • Know what positive cases with the application look like
  • Know that there is an attempt of implementing error handling
  • Specifically want to test input filtering after it exists
With the code-oriented activity, we can see that input filtering exists only when we have expressed the intent of having input filtering and error handling. With both activities, we can't properly understand and appreciate what incorrect input is before we know a baseline of what correct input is. 

A lot of testers skip the baseline of how this might work and why anyone would care. Don't. 
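A minimal sketch of that ordering, assuming a hypothetical `parse_age` function standing in for an input field: the positive baseline is pinned down first, and negative cases only become meaningful once there is visible intent of error handling (here, raising `ValueError`). The function, its 0 to 150 range and the chosen inputs are illustrative assumptions, not taken from either of my test targets.

```python
def parse_age(text: str) -> int:
    """Hypothetical input handler: accepts whole-number ages from 0 to 150."""
    value = int(text.strip())  # int() raises ValueError for non-numeric input
    if not 0 <= value <= 150:
        raise ValueError(f"age out of range: {value}")
    return value


def rejects(text: str) -> bool:
    """True if the handler refuses the input with its intended error."""
    try:
        parse_age(text)
    except ValueError:
        return True
    return False


# Step 1: the baseline. What does correct input look like, and what comes out?
BASELINE = {"0": 0, "42": 42, " 7 ": 7, "150": 150}
for text, expected in BASELINE.items():
    assert parse_age(text) == expected, text

# Step 2: only now, knowing error handling is intended, explore input filtering.
NEGATIVE = ["", "abc", "-1", "151", "4.5"]
print({text: rejects(text) for text in NEGATIVE})
```

Had the baseline failed, the negative results would have told us very little: we would not know whether a rejection came from intended filtering or from something simply being broken.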

2. Only one sunny day

Just as testers were taught about negative tests, they were also taught about sunny day scenarios. In practice, it appears many testers hold the false belief that there is one sunny day scenario, when in fact there are many, with a lot of variation in all of them. We have plenty to explore without trying incorrect inputs. We can vary the order of things. We can vary what we type. We can vary the time between our actions. We can vary how we observe. 

There are plenty of positive variations of our sunny day scenario, and we need to start by searching for them. When problems happen in sunny day scenarios, they are more important to address.

Imagining only one sunny day also leads people to stop exploring prematurely, before the stakeholders who asked for testing have the information, the results they'd expect. 
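A small sketch of how quickly positive variations multiply when we vary only the order of actions and what we type. The `RegistrationForm` class, its fields and the sample values are hypothetical stand-ins for an application under test; timing and observation variations are left out to keep it short.

```python
from itertools import permutations, product


class RegistrationForm:
    """Hypothetical form under test: fields can be filled in any order, then submitted."""

    def __init__(self) -> None:
        self.fields: dict[str, str] = {}

    def fill(self, field: str, value: str) -> None:
        self.fields[field] = value

    def submit(self) -> dict[str, str]:
        return dict(self.fields)


# Several sunny days, not one: valid values of different character...
NAMES = ["Ann", "Åsa", "Li Wei", "O'Brien"]
EMAILS = ["ann@example.com", "a.b+tag@example.org"]
# ...and different orders of filling in the same fields.
FIELD_ORDERS = list(permutations(["name", "email"]))


def run_variation(order: tuple[str, ...], name: str, email: str) -> bool:
    """Fill fields in the given order and check the submitted record matches the input."""
    form = RegistrationForm()
    values = {"name": name, "email": email}
    for field in order:
        form.fill(field, values[field])
    return form.submit() == values


results = [run_variation(*v) for v in product(FIELD_ORDERS, NAMES, EMAILS)]
print(f"{sum(results)}/{len(results)} positive variations behaved as expected")
```

Against this stub every variation passes by construction; the point is the shape of the search, which is worth pointing at the real application before reaching for incorrect inputs.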

3. Start with complex and long

To ground exploring in something relevant, many people come up with a scenario that is complex or long, trying to capture an all-in-one story of the application. As an anchor for learning, it's a great way of exploring, but it turns into two things that aren't that great: 

  • A scenario we must go through, even if that means turning a blind eye to the reality of the application
  • A scenario we think we got through, no matter what we ended up doing
I find that people are better at tracking small things than large things to see variation. Thus the idea of a scenario is great, but making notes on and naming smaller things tends to yield better results. 

Also, setting up a complex thing that takes long to get through delays finding basic information. I've watched people again and again do something really complex and slow, only to learn late about problems they could have shown and had fixed early if they had done small before large, and quick before slow. 

The question with this one, though, is: does speed of feedback matter? Since it all has to be repeated after the fix, it should matter to whoever is testing, and knowing about a problem sooner tends to help with fixing it before we forget what introduced the problem. Yet better late than never. 
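To make "small before large, quick before slow" concrete, a sketch comparing when the first problem surfaces under two orderings of the same test ideas. The ideas, their durations and which ones reveal the problem are invented numbers for illustration only; the long scenario does hit the same problem, but only at its end.

```python
from typing import NamedTuple, Optional


class Idea(NamedTuple):
    name: str
    minutes: int
    reveals_problem: bool


# Invented example session: one long all-in-one scenario and some small checks.
IDEAS = [
    Idea("long end-to-end story", 120, True),
    Idea("save with a typical name", 5, True),
    Idea("reopen saved item", 10, False),
    Idea("edit and save again", 15, False),
]


def minutes_to_first_problem(ideas: list[Idea]) -> Optional[int]:
    """Sum durations in the given order until an idea reveals a problem."""
    elapsed = 0
    for idea in ideas:
        elapsed += idea.minutes
        if idea.reveals_problem:
            return elapsed
    return None


long_first = sorted(IDEAS, key=lambda idea: idea.minutes, reverse=True)
quick_first = sorted(IDEAS, key=lambda idea: idea.minutes)
print(minutes_to_first_problem(long_first), minutes_to_first_problem(quick_first))
```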

4. Focus on the thing that isn't yours

Many people seem to think exploratory testing should not care about team boundaries but about the customer's overall experience. Anything and everything that piques the tester's curiosity is fair game. I've watched people test the JavaScript random function. I've watched people test browser functionalities. I've watched people test the things that aren't theirs so much that they forget to test what is theirs. 

This is usually a symptom of not thinking at all in terms of what is yours, architecture-wise. When you find that something does not work, who will be able to address it? Your team can address how it uses 3rd party services, and can change to a different one. Just because you can test the things you rely on does not mean you always should. 

I find that if we think in terms of feedback we want to react to and can react to, the information we provide for our teams makes more sense. Yes, it all needs to work together, but if we are aware of who provides which functionalities, we can have conversations about reacting to feedback we would otherwise miss.
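One way to act on this is to test your own use of a dependency rather than the dependency itself. A minimal sketch, assuming a hypothetical `pick_winner` function that relies on Python's `random` module (standing in for the JavaScript random function): injecting the random source lets us check our own logic deterministically, without testing `random` itself.

```python
import random
from typing import Sequence


def pick_winner(entries: Sequence[str], rng: random.Random) -> str:
    """Hypothetical team code: deduplicating entries is ours, randomness is the dependency's."""
    unique = sorted(set(entries))
    if not unique:
        raise ValueError("no entries")
    return rng.choice(unique)


class FirstChoice(random.Random):
    """Test double: always picks the first option, so our own logic is observable."""

    def choice(self, seq):
        return seq[0]


# What is ours: duplicates don't add extra chances, and the candidates are sorted.
assert pick_winner(["bob", "ann", "bob"], FirstChoice()) == "ann"

# A seeded generator makes a run reproducible without asserting anything about random itself.
print(pick_winner(["ann", "bob", "cid"], random.Random(7)))
```

If the randomness itself were broken, that would be feedback for whoever provides it; the tests above stay focused on what the team can react to.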

5. Overemphasis on usability

Usability is important, and since you are learning a new domain and application while exploring, you probably have ideas about it. Sometimes we push these ideas to the center so early that we don't get to the other kinds of results expected. 

For me, this is usually a symptom of the "even a broken clock is right twice a day" syndrome, where any results of testing are considered good, instead of looking at the purpose of testing as a whole. It feels great to be able to say that I found 10 bugs, and it sometimes makes us forget to ask whether those 10 bugs included the ones people care about most. 

Delaying the reporting of some types of bugs, particularly usability bugs, is often a good approach. It allows you, the tester, to consider whether with 4 hours of experience you still see the same problems the same way, and why learning the application is changing your feedback.

6. Explicit requirements only

Finally, my pet peeve: giving up control of requirements to an external source. Exploratory testing, in fact, is discovering theories of requirements and having conversations about these discovered requirements. Limiting ourselves to explicit requirements is a loss of results. 

There's a whole category of important problems exploratory testing is able to uncover - bugs of omission. The things that we reasonably should expect to be there and be true, but aren't. While we try to think of these in advance to the best of our abilities, we are extra creative with the application as our external imagination, letting us think in terms of what we may be missing. 

With the code-oriented exploratory testing target, I have two dimensions of things the PO holds true and assumes the others would know, yet the others come with a completely different set of assumptions.

I'll leave you today with a call to action for exploratory testing: 

Go find (some of) what the others have missed! 

The recipe is open, but some ingredients make great results less likely. 


Saturday, July 16, 2022

Exposure does not pay the bills except that it does

We're making the rounds again discussing diverse representation in conference speaking - a topic near and dear to my heart. I very strongly believe that conference speakers, being such a tiny fraction of all possible speakers, should represent the population and the future, and that this requirement adds to the work of organizers and increases the quality of the content. 

What does representing the population and the future mean? It means actively working against the current systemic, structural forces that make all-male lineups and awfully pale lineups likely. It means working against the bias towards finding UK/US/Canada speakers for an appearance of greatness through fluency in the language. 

I learned this week - again - that I personally still don't always manage to think in terms of the future: I defended a lineup in Germany based on today's European minority percentages instead of systematically requiring normalization to global percentages for major events that set the face of what is normal and expected.

Pay to Speak

In terms of concrete action, what conferences - *all conferences* - need to do is work against speakers having to pay to speak. Pay to speak refers to money out of the speaker's pocket for securing a speaking position, and it comes in many forms. 

  • Having to pay your own travel and accommodation to a conference not in your home town
  • Having to take unpaid vacation from work to be at the conference, as not all employers allow you to clock conference time as work
  • Having to pay an entrance fee to attend the whole event and continue the conversations your talk and presence start
  • Having your company become a sponsor so that you can speak even when your content isn't about the company's offering
Getting Paid In Exposure

We would like not only to not have to pay to speak, but to be paid for speaking. The hours we put into rehearsing, living in projects to speak from experience, improving our ability to deliver talks, preparing a talk and delivering it add up to fairly large time investments. Exposure does not pay the bills. Except that it really does, in the long term. 

Long Term Benefits of Exposure 

When you get on a stage, it can be a very powerful calling card you are broadcasting to multiple people. It is particularly brilliant for socially awkward ambiverts like myself. People come and talk to me on topics I speak on. Their questions drive my learning deeper. Their experiences give my experiences counterexamples and diversify my approaches. And I take a better version of me back to work. 

When that platform grows enough, it turns into money *in my pocket*. I get paid more for my work than my peers because of the accelerated learning that conference speaking, and meeting people to learn with, gave me. I also sometimes get paid extra to use the specific skill of speaking in public that people allowed me to rehearse on their stages. They carried the early risks of me flunking my talk on stage and annoying their paying audience.

Exposure is what moves people in underprivileged groups to higher privilege. I should know. 

With my 28 talks currently scheduled for 2022, I can confidently say that I have - for this moment in time - become one of the speakers often requested in the field of software testing. Two of these sessions I am paid for, and one I pay to speak at. Only three of the sessions require travel. 

The Underprivileged

Investing in this exposure isn't that easy, and it is harder for the underprivileged. Finding time today to invest in your future is already a stretch. Finding money, and relevant amounts of it when talking about international travel, is an even bigger stretch. 

So how do we help? 

  • When we can pay fees or travel to only some of our speakers, we pay the underprivileged. And I don't mean the sappy "apply for a scholarship and we pay you for your sad story" demeaning stuff; I just mean that we choose to pay people from underprivileged groups. Trust that once they have acquired privilege, they will use some of their earnings to make the world better for those who come after. To get to an equal future, we need to act against the structure. 
  • When we can't pay fees (fees are small compared to the side costs), pay expenses. No business should rely on other people paying its expenses. 
  • Invite the underprivileged, not just to submit but to guaranteed speaking slots. They have enough things taking up their free time that you could shoulder more of the work, no matter how "equal" you think open CfPs are. Because they are not. Those with assistants have a lot more time for reaching out with their proposals. 
The Organiser Risk

Finally, we need to discuss the relationship between speakers and organizers a little. For speakers, exposure can turn into money, but for organizers, event income generally stops at the event. Some sell access to videos or "information banks" with slides, but that is not yet commonly a great source of income. Organizers don't get the exposure benefit; it goes to speakers only. While I ride on the fact that I have done 500 talks in 28 countries, the organizers who enabled that get at most nice words and occasional promotion, if I choose to give it. 

Organiser risk, on the other hand, is very real. A risk is a possible future event with a negative impact. It's all great when none of them materialise, but when they do, it's all on the organiser. 

Some risks that materialised for me as an organiser are:
  • I paid two speakers' travel and hotels even though they never spoke at the conference
  • I received a death threat from an unhappy ticket holder who was using the conference as visa entry to Europe and when denied visa didn't like the cancellation terms *
  • I spent hundreds of hours of work after the conference to legally share profits with speakers (insurance and taxes are a LOT OF work)
  • I lost a lot of money on two conferences by failing at marketing, which led me to avoid finalising paperwork and thus to more financial losses, given how insurance and taxes work in Finland. 
And a close one:
  • I almost had to opt out of my own conference and contribute from outside the venue because, with my severe animal allergies, I would prioritise someone with a guide dog over my own presence. 
After all of this, I still recommend organising, but I also appreciate that the hours needed for organising, trying to avoid risks that may still turn out to be unavoidable, far exceed any hours I have spent on speaking. I think we speakers give ourselves too much credit for our importance to conferences. 

To better understand how the money is shared, let's compare these:
  • Hours used to speak vs. Hours used to organize
  • Timeframe speaking keeps us busy (temporarily limited) vs. Timeframe organising keeps us busy (continuous, reactionary growing)
  • Costs incurred
  • Continuity that allows for future years of the same


* On the death threat: it didn't seem like I was really in danger, given the distance. They bought a ticket, didn't come, and wanted their money back after the conference because they weren't granted a visa. We had cancellation terms and all that, but I get that losing 500 euros is a lot of money for some people in the world. I was ok then, I am ok now. I couldn't talk about it while the conference existed, so as not to escalate the risk or scare off paying participants. There's a lot that happens in organising that we do not see. 

Thursday, July 14, 2022

Four levels of running test automation

I work with many teams, and find myself facilitating a growth journey in test automation with regard to how we run it. It is fascinating that the conversation around coverage remains unchanged when we turn up the speed of feedback. Let's look at this a little more.

Level 0. Test automation runs on demand

You have an individual or a subteam implementing test automation for the system at hand, and as they are developing it, they run it. When they notice something new is around for testing, they run it. And since their whole work centres on test automation, they run automation many times in a working day. 

Feedback on failures comes to the expert creating the automation, and the majority of failures are what we consider maintenance - adjusting the automation to fit the changed expectations of the product after confirming that the change is in the neighbourhood of what we expected. 

Sometimes we joke that this is not *automation* - there is always, always manual activity for running it. Manual when it fails. Manual when it succeeds. Fast-forwarding some actions in between with automation. 

Level 1. Nightly test automation 

You have a job somewhere that, when called, triggers a run of the automation. Because triggers are complicated for various reasons, you have chosen the simplest trigger possible: time. And since your automation is probably growing from minutes to hours of execution time, it becomes natural that nights are when robots do some of the testing for you, covering whatever batch of changes was introduced during the previous day. 

You usually still have an individual or subteam whose purpose is to start their morning by checking that all is well after the necessary maintenance, and you may even manage to get to green on most days - at least during the day, once the necessary maintenance / corrective action has been taken. 

When no one changes anything, this is automation. It doesn't fail, because there is nothing for failures to note. It runs automatically and succeeds. It baselines your day. It is manual when it fails and calls you to explore what changed during the day. 

We used to think this is what automation is. But the delay between the change and feedback introduces side effects. We may not understand why it is failing, and the work around figuring things out may grow significantly. And when corrective actions are needed, they are always delayed at least by that one day, causing context switching in the development team. 

Level 2. Test automation on merge to master

You graduate from a time-based trigger to activity-based triggers, and the activity you choose is testing whenever a change is made available for the builds you give - eventually - to the customer. This works really well if and when we make continuous small changes to master, and we may even end up making it block pipelines on failure so that we never deploy anything that does not pass all of our tests. 

We may not be ready to wait for all the tests, but with failures reported from each merge, we can pinpoint in time the change that introduced them. This may already help us with speed of feedback, granularity of feedback, and avoiding the need for that specialist in the team who always keeps an eye on automation, since we can all expect that our work as developers isn't done without a green pipeline. 
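A sketch of why per-merge runs pinpoint the change: with one run per merge, the first failing run right after a passing one names a single merge, while a nightly run can only name the whole day's batch. The run history is invented for illustration, and the sketch assumes tests are deterministic (a flaky test would blur the green-to-red transition).

```python
from typing import Optional

# Invented history: (merge id, did the test run on that merge pass?)
PER_MERGE_RUNS = [("m1", True), ("m2", True), ("m3", False), ("m4", False)]


def first_breaking_merge(runs: list[tuple[str, bool]]) -> Optional[str]:
    """Return the first failing merge that directly follows a passing run, else None."""
    for (_, prev_ok), (merge, ok) in zip(runs, runs[1:]):
        if prev_ok and not ok:
            return merge
    return None


# A nightly run sees the same day as one batch: all it can say is "something in here".
NIGHTLY_SUSPECTS = [merge for merge, _ in PER_MERGE_RUNS]

print(first_breaking_merge(PER_MERGE_RUNS))  # per-merge: a single suspect
print(NIGHTLY_SUSPECTS)  # nightly: every merge of the day is a suspect
```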

However, not all teams do so well with small changes to master. With pull requests and branches, some teams make significant changes before bringing them to master, and end up with delayed feedback again. Not worse than with levels 0 or 1, but still relevant. 

Level 3. Test automation at developer's fingertips

The state we are now aiming for is moving all test automation into the position of developer productivity tools. We can't be done without quality in place (for varying degrees of quality), and if there is a subset of relevant feedback, we should have control over running it, with agreements on running it optimised for the developer's use of the feedback. 

Being able to run test automation on demand in the development environment is part of this. But it is also necessary to make sure it runs automatically whenever a scope that allows running it changes. We usually see this as per-pull-request runs of automation, knowing that the coded tests in various scopes pass before we propose to merge. 
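A minimal sketch of "run whenever a scope that allows running it changes": map the paths a pull request touches to the test suites that cover them. The directory layout, suite names and the rule that an unmapped change runs everything are hypothetical choices for illustration; CI systems usually offer their own path-based trigger mechanisms.

```python
from fnmatch import fnmatch

# Hypothetical mapping from source path patterns to the test suites covering them.
SCOPES = {
    "src/users/*": ["tests/unit/users", "tests/api/users"],
    "src/reports/*": ["tests/unit/reports"],
    "frontend/*": ["tests/ui"],
}
ALL_SUITES = sorted({suite for suites in SCOPES.values() for suite in suites})


def suites_for(changed_paths: list[str]) -> list[str]:
    """Select suites whose scope a pull request touched; unmapped changes run everything."""
    selected: set[str] = set()
    for path in changed_paths:
        matches = [suites for pattern, suites in SCOPES.items() if fnmatch(path, pattern)]
        if not matches:
            return ALL_SUITES  # unknown scope: prefer safe over fast
        for suites in matches:
            selected.update(suites)
    return sorted(selected)


print(suites_for(["src/users/models.py"]))
print(suites_for(["src/users/models.py", "README.md"]))
```

Falling back to everything for unmapped paths trades speed for safety; a team could just as well choose the opposite and rely on a later per-merge run.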


Over the last two years, I have worked with multiple teams. One team on level 0 moved to level 1. One on level 2 moved to level 3. Most teams stayed on level 1 or level 2 with no movement. And one team on level -1 (no automation) moved in the last months through level 0 to level 1. 

Tuesday, July 12, 2022

The Five Year Heuristic

As I celebrate my 25th career anniversary this year, I find myself looking back at what I have done, where my career has taken me, and which contributions I may have had a share in. There has always been a day job testing something, and the hobby of learning from it and from everything available, on a scale of hours that doesn't fit a regular workday and would not be prioritised the same way if I handed that power over to my employers. 

The hobby side of things has taken me to deliver close to 500 sessions in 28 countries. It has kept the work side growing in ways I could not foresee, and led to transformations in things I am interested in while still grounding them in testing, or rather in testing both products and organizations, embracing the agency that allows me not only to provide information but to take part in doing something more with that information. 


Over 25 years, I have had themes I am more into, but I have not yet managed to leave a single theme behind me, even if I sometimes feel it has been long enough since I cared about agile transformations or organising acceptance testing in customer/contractor organizations that I would prefer to think of them as things of the past. 

A quote also driving a lot of my thinking is:

“The future is already here – it's just not evenly distributed.” - William Gibson (as quoted in The Economist, December 4, 2003)

For me, the quote does not relinquish my agency, but reminds me that what I see in one place in software development is still the future for another, and moving from organization to organization, project to project, has a sense of time travel. 

Maybe it is the dizziness of what counts as the future in a world of the invisible (software), but this idea of looking at what we do also led me to coin a heuristic that helps me ground my ideas a bit. With 25 years of experience, I keep going back to the five year heuristic: when did the experience I am sharing happen? 

Being a new tester (or programmer) today is an entirely different experience than it was 25 years ago. But so is being a very senior tester recruited into a new organization for a new software project. 

For each story I share from my experiences, it is worthwhile asking when it happened. While all of it made me the professional I am today, I find the five year heuristic especially good for experimenting. Anything I haven't tried in the last 5 years, I could try again for new experiences and results. 

And I really wish some of the consultants who have been in this industry as long as I have, or longer, would also apply this rule and stop assuming that experiences from an industry of a different era remain relevant. 

Our experiences need replenishing, even when the foundations remain, so that we notice the built-in assumptions that no longer have to be true. 

Friday, July 1, 2022

Testing on THEIR production

Many years ago, a candidate was seeking employment as a software tester for a team I was interviewing for. The candidate had done prep work and tested the company's web site, looking for functional, performance and security problems. They had caused relevant load (preventing others from using the site), found functionalities that did not match their expectations and had ideas of possible vulnerabilities. They were, however, completely oblivious to the idea that other organisations' production environments are available for *fair use* for their *intended purposes*, and testing is not an intended purpose of production environments. They had caused multiple denial-of-service attacks on a site that was not built to resist those, and considered it a success. We did not. We considered it unethical, bordering on illegal, and did not hire them.

Ever since, I have been teaching on every single course that we as testers need to be aware not only of what we test, but also of where we test. THEIR production isn't our test environment. 

When I discovered a security bug in Foodora that allowed me to get food without paying, I did my very best to avoid hitting that bug because I did not want to spend time reporting it. THEIR production was not my test environment. I could not avoid it, though. After I had done the work I did not want to do, helping them fix it, I mentioned (without details) that such a problem existed, and some folks in the security community spoke poorly of me for having been unwilling to do that work. They considered that since I knew how to test (and was more aware of how the bug could be reproduced), my responsibilities were greater than a regular user's. I considered it unfair to require free use of my professional skills. 

What should be clear though: 

Other organisations' production is not your test environment. That is just not how we should roll in this industry.

When I teach testing, I teach on other people's software deployed to my own test environment. When I test in production, I do so because my own company asks and consents to it. When I test on other people's production, I do that to provide a service they have asked for and consented to. 

There are some parallels here to web scraping, which isn't illegal. The legal system is still figuring out "good bots" and "bad bots", and it requires us to adhere to fair use and explicitly agreed terms of use that protect data ownership. 

Building your scrapers and testing web sites are yet another use case, different from running scrapers. When building and testing, we have unintentional side effects. When testing in particular, we look for things that are broken and can be made more broken by specific use patterns.

Testing on someone else's production isn't ethically what we should do, even if legally it may be a grey area. We can and should test on environments that exist for that purpose. 

I still regularly come across companies recruiting with a take-home assignment of automating against someone else's production. Asking a newer tester to show their skills by potentially causing denial-of-service impacts, without the consent of the company whose site is being tested, is not advisable. Would these candidates have the standing to say no? Most likely not. 

So today I sent two emails. One went to a testing contractor company using a big, popular web shop as their test target, letting them know that they should have permission before making their candidates test on other people's production. The other went to the big, popular web shop, letting them know which company is risking their production for its test recruiting purposes. 

The more we know, the more we can avoid unintentional side effects, but even then, THEIR production isn't your test environment. Stick to fair use, and do your learning and teaching on sites that have consented to that kind of use.