Once you have decided to collect only some of the data (that is, to take a sample), the question becomes: which data? How do you choose?
The basic principle is that the sample should be representative of the population, so that what you learn from the sample carries over to the whole. But a representative sample is not always what you want.
Suppose you are an insurance company and you’re investigating insurance claims for fraud. It may make sense to focus on the higher-value claims, since these are the ones that cost the company the most to pay. Or suppose you’re doing a customer satisfaction survey: it may be sensible to focus on the customers who spend the most money, since they’re the ones making you the most profit and so the ones you most want to keep.
Political or ethical considerations may make this kind of deliberate bias unacceptable, however. Racial profiling, for example, is generally regarded as unacceptable.
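To make the idea of a deliberately biased sample concrete, here is a minimal sketch in Python. It assumes made-up claim amounts and a hypothetical helper, `sample_claims`, that draws claims either uniformly (a representative sample) or with probability proportional to claim value. The numbers are illustrative only and not taken from any real insurer.

```python
import random
from statistics import mean

def sample_claims(amounts, k, by_value=False, seed=0):
    """Draw k claims (with replacement), uniformly or weighted by claim value."""
    rng = random.Random(seed)
    # weights=None makes every claim equally likely to be picked.
    weights = amounts if by_value else None
    return rng.choices(amounts, weights=weights, k=k)

# Hypothetical data: 10,000 made-up claim amounts with a long right tail.
data_rng = random.Random(42)
claims = [data_rng.lognormvariate(7, 1) for _ in range(10_000)]

representative = sample_claims(claims, k=500)
value_weighted = sample_claims(claims, k=500, by_value=True)

print(f"mean claim, representative sample: {mean(representative):,.0f}")
print(f"mean claim, value-weighted sample: {mean(value_weighted):,.0f}")
```

The value-weighted sample concentrates on the expensive claims, which is the point if the goal is to catch costly fraud. It also means the sample no longer describes the typical claim, so any population-level estimate taken from it would need to be corrected for the weighting.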
Another bias where careful thinking can lead to positive outcomes is survivorship bias: drawing conclusions only from the cases that made it through some selection process. Perhaps the canonical example of this is the lesson learned from damage sustained by aircraft during World War II.
I’ll let Wikipedia tell the story:
During World War II, the statistician Abraham Wald took survivorship bias into his calculations when considering how to minimize bomber losses to enemy fire. Researchers from the Center for Naval Analyses had conducted a study of the damage done to aircraft that had returned from missions, and had recommended that armor be added to the areas that showed the most damage. Wald noted that the study only considered the aircraft that had survived their missions—the bombers that had been shot down were not present for the damage assessment. The holes in the returning aircraft, then, represented areas where a bomber could take damage and still return home safely. Wald proposed that the Navy reinforce areas where the returning aircraft were unscathed, since those were the areas that, if hit, would cause the plane to be lost
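The effect in the story above can be reproduced with a small simulation. This is only a sketch under invented assumptions: the section names, the loss probabilities in `LOSS_PROB`, and the number of hits per plane are all hypothetical and are not Wald's actual figures or method. Hits land on every section equally often, but a hit to some sections is more likely to bring the plane down.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit to each section brings the plane down.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.1}
SECTIONS = list(LOSS_PROB)

def fly_missions(n_planes, hits_per_plane=3, seed=0):
    """Return (hits on all planes, hits on returning planes) as Counters."""
    rng = random.Random(seed)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() > LOSS_PROB[s] for s in hits)
        if survived:
            survivor_hits.update(hits)
    return all_hits, survivor_hits

all_hits, survivor_hits = fly_missions(20_000)
for section in SECTIONS:
    print(f"{section:9s} all planes: {all_hits[section]:6d}   "
          f"returned: {survivor_hits[section]:6d}")
```

Across all planes the hits are spread roughly evenly, but among the planes that return, the engine and cockpit show far fewer holes than the fuselage and wings. Someone who inspects only the returning planes sees the second column and could easily conclude that the fuselage needs the armor, which is the mistake Wald pointed out.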