His test would build the call to the 3rd party app (diff tools, in this particular case), save it into a file, and launch that app. The app would only pop up on failure: if the call had remained the same, he would only verify the call, but on the first run, and later on any failure, he also needed to see that the 3rd party app did what was expected.
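Here is a minimal sketch of that mechanism in Python. It is not his actual implementation. `build_diff_command` is a hypothetical stand-in for the code under test, and `verify_command`, the `.approved.txt`/`.received.txt` file naming (borrowed from approval-testing conventions) and the `meld` tool name are all illustrative assumptions. The call is stored as text and compared with an approved copy. The real app is launched only on the first run or on a mismatch. The launcher is a parameter so the demo can record calls instead of opening a window.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def build_diff_command(tool: str, left: Path, right: Path) -> list[str]:
    # Hypothetical stand-in for the code under test: it builds the command
    # line that would launch a 3rd party diff tool on two files.
    return [tool, str(left), str(right)]


def verify_command(command: list[str], directory: Path, name: str,
                   launch=subprocess.run) -> bool:
    """Approval-style check of a call to a 3rd party app (illustrative sketch).

    The call is saved as text and compared with the previously approved copy.
    If they match, only the call is verified and nothing pops up. On the first
    run (no approved copy yet) or on a mismatch, the received call is written
    to a file and the app is actually launched so a human can check its result.
    """
    approved = directory / f"{name}.approved.txt"
    received = directory / f"{name}.received.txt"
    text = shlex.join(command) + "\n"
    if approved.exists() and approved.read_text() == text:
        received.unlink(missing_ok=True)  # clean up leftovers from earlier failures
        return True
    received.write_text(text)
    launch(command)  # the only place the real 3rd party app gets started
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        calls = []  # fake launcher: record the call instead of opening a window
        cmd = build_diff_command("meld", Path("old.txt"), Path("new.txt"))
        print(verify_command(cmd, folder, "diff_call", launch=calls.append))  # False: first run
        # A human looked at the result and approved it: received becomes approved.
        (folder / "diff_call.received.txt").rename(folder / "diff_call.approved.txt")
        print(verify_command(cmd, folder, "diff_call", launch=calls.append))  # True: call unchanged
        print(len(calls))  # 1: the app was launched only once
```

In this sketch, approving is just renaming the received file to the approved one. From then on, the test only checks the call, and the app pops up again only when the call changes.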
This was a much more advanced version of the semi-automated tests I already found useful a decade ago. It turned a semi-automated check, done once while testing, into an automated one for regression purposes. Surely it does not test "the real thing", but the abstraction it does test feels useful.
With all the tests I've been seeing, why haven't I seen more of this before? Have you?