Yeah, and a big part of the manipulation you mention can also be a result of sampling bias. Unfortunately there's no perfect way to get this kind of data; we have a finite amount of time and resources (manpower/cash), so we have to extrapolate from a sample. That's why even a large sample can still yield garbage info. I have a friend who's a research chemist and spends a lot of his time ranting about standard deviations, since that's apparently how his company measures the likelihood of a batch (of experiments? medications? who knows!) being bad. I can't recall any specific numbers from him, though.
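I don't know his company's exact method, but the classic version of that kind of check is the 3-sigma rule: flag a batch whose measurement lands more than three standard deviations from the historical mean. A minimal sketch (all the numbers and the threshold are made up):

```python
import statistics

# Historical measurements from past "good" batches (made-up numbers).
history = [98.2, 97.9, 98.5, 98.1, 97.8, 98.3, 98.0, 98.4]

mean = statistics.mean(history)
sd = statistics.stdev(history)  # sample standard deviation

def batch_looks_bad(measurement, k=3.0):
    """Flag a batch more than k standard deviations from the historical mean."""
    return abs(measurement - mean) > k * sd

print(batch_looks_bad(98.1))  # False: well within 3 sigma
print(batch_looks_bad(95.0))  # True: far below the historical mean
```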
For fun, I looked at my notes from a managerial statistics class, which give some basic definitions of sampling methods. Too bad there were no hard numbers in them:
Simple random sample: Every member of the population has an equal chance of being selected.
Advantages: Simple to design and analyze, Difficult to mess up
Disadvantages: Can lead to high variance, Might be costly
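If it helps, here's roughly what that looks like in code (a toy population of made-up IDs):

```python
import random

population = list(range(1000))  # toy population: member IDs 0..999

# Simple random sample: every member has an equal chance of being picked.
sample = random.sample(population, k=50)
```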
Stratified random sample: Divide the population into strata according to a certain trait, then take a simple random sample from each stratum, with sample sizes proportional to stratum size.
Advantages: Representation of key traits proportional to population.
Disadvantages: Misleading results if the strata are chosen badly, Might be more costly
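A rough sketch of the proportional allocation (the strata and sizes are made up, and the rounding means the total can be off by one or two):

```python
import random

# Toy population: (id, age_group) pairs - the age group is the stratum.
population = [(i, "young" if i % 10 < 7 else "old") for i in range(1000)]

def stratified_sample(pop, stratum_of, n):
    """Simple random sample from each stratum, proportional to stratum size."""
    strata = {}
    for member in pop:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))  # proportional allocation
        sample.extend(random.sample(members, k))
    return sample

sample = stratified_sample(population, stratum_of=lambda m: m[1], n=50)
print(len(sample))  # ~50: 35 "young" + 15 "old"
```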
Cluster sample: Divide the population into clusters, each of which is (ideally) representative of the population as a whole, then perform a simple random sample within one (or a few) randomly chosen clusters.
Advantages: Less costly to perform
Disadvantages: Biased results if the chosen clusters aren't actually representative
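And a cluster sample sketch (again toy data; the "clusters are representative" assumption is doing all the work here):

```python
import random

# Toy population split into 20 clusters of 50 (think classrooms or stores).
clusters = {c: [f"person-{c}-{i}" for i in range(50)] for c in range(20)}

def cluster_sample(clusters, n_clusters, n_per_cluster):
    """Randomly pick a few clusters, then simple-random-sample within each."""
    chosen = random.sample(list(clusters), n_clusters)
    sample = []
    for c in chosen:
        sample.extend(random.sample(clusters[c], n_per_cluster))
    return sample

sample = cluster_sample(clusters, n_clusters=2, n_per_cluster=25)
```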
And then of course there's how data (even GOOD data!) is presented - can anyone see what's screwed up about the examples below? Personally, I think this is where a lot of the misinformation about COVID-19 directed at the public will come from, but hopefully it will have less of an impact on the companies actually doing the research.