If you worked in a company that relies heavily on A/B tests, you would create three variations and make releases available with each variation. A percentage of your users would get the red button, and similarly for the two other colors. You'd have a *reason* why the button is there in the first place. Maybe it is supposed to get users to click, and a click places an order. Maybe it is supposed to get users to click, and a click just shows you're still active within the system. Whatever the purpose, there is one. And with A/B tests, you'd see if your users are actually clicking, and if that clicking is actually driving the behaviors you were hoping for.
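To make the mechanics concrete, here is a minimal sketch of how users might be split across the three button colors and how clicks per variation could be tallied. Everything named here is my own assumption for illustration: the `green` color, the `button-color` experiment name, and the functions `assign_variant` and `click_through_rate`. Real experimentation platforms do a lot more than this.

```python
import hashlib

# Hypothetical variations: red and blue come from the example, green is assumed.
VARIANTS = ["red", "green", "blue"]


def assign_variant(user_id: str, experiment: str = "button-color") -> str:
    """Deterministically map a user to one variation.

    Hashing the experiment name together with the user id means the same
    user always sees the same color, and the split is roughly even.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


def click_through_rate(events):
    """Compute clicks / impressions per variation.

    `events` is an iterable of (variant, clicked) pairs, one per impression.
    """
    shown = {}
    clicked = {}
    for variant, was_clicked in events:
        shown[variant] = shown.get(variant, 0) + 1
        if was_clicked:
            clicked[variant] = clicked.get(variant, 0) + 1
    return {variant: clicked.get(variant, 0) / count for variant, count in shown.items()}
```

A click-through rate on its own only tells you that people clicked. Whether the click drove the behavior you were hoping for still needs its own measurement, such as completed orders per variation.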
So with your UX tests, everyone says red, and with your A/B tests, you learn that while they say red, what they actually click is blue. People say one thing and do another. And when asked why, they rationalize. A/B tests exist, to an extent, because asking people is an unreliable source of information.
What fascinates me about A/B tests is the idea that as we introduce variations, and combinations of variations, we explode the space we have to test before delivering a product for a particular user to use. Sometimes I see people trusting that the features aren't intertwined and being OK with learning otherwise in production, which messes up the A/B tests when one of the variation combinations has significant functional bugs. But more often I see people not wanting to invest in variations unless the variations are very simple, like the color of a button in the example.
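The growth of that space is easy to underestimate, so here is a small sketch that counts it. The feature names and their variations are invented for illustration (only the button color comes from the example); the point is that the number of combinations is the product of the number of variations per feature.

```python
from itertools import product
from math import prod

# Hypothetical features under experiment, each with its variations.
FEATURES = {
    "button_color": ["red", "green", "blue"],
    "checkout_flow": ["one_page", "multi_step"],
    "recommendations": ["on", "off"],
}


def combination_count(features):
    """Number of distinct product configurations a user could receive."""
    return prod(len(variations) for variations in features.values())


def all_combinations(features):
    """Enumerate every configuration as a dict of feature -> variation."""
    names = list(features)
    return [dict(zip(names, values)) for values in product(*features.values())]
```

With these three assumed features, `combination_count(FEATURES)` is 3 × 2 × 2 = 12 configurations, each of which is a product some user will actually get. Adding one more two-way experiment doubles that to 24, which is why trusting that features aren't intertwined is such a tempting shortcut.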
A/B testing could tell us so much more about which of our "theories" of what matters to users really matter. But it needs to be preceded by A/B building of feature variations. I'm still on the fence about how much effort, and for what specific purposes, organizations should be willing to invest to really hear what the users want.