ZDNet's Ed Bott this morning reveals that Chitika's industry-trend reports are probably flawed. Chitika is the ad-server company that frequently publishes headline-grabbing research reports, such as one claiming that iPad usage fell 7% after Christmas.
Mr Bott investigates the firm and concludes, "Put all those pieces together and you don't get a picture of a company whose data should be trusted on its face." But Mr Bott makes an error of his own in the middle of his piece, "Why you should be skeptical of Chitika's market-share reports," when he writes:
As any statistician could tell you, simply having a large sample size doesn't mean you get valid conclusions. Garbage in, garbage out. Doubling or tripling the amount of data just makes for a bigger pile of garbage.
Any statistician will tell you that the larger the sample size, the more accurate the result. When you hear a poll result on the radio, it should come with the number of people surveyed and the plus-or-minus figure that describes its accuracy. (Those of us who took statistics know this as the margin of error.) The larger the sample size, the smaller the plus-or-minus.
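The plus-or-minus figure can be computed directly. Below is a minimal Python sketch, assuming a simple random sample, a reported proportion p, and the usual normal approximation at roughly 95% confidence (z = 1.96). The function name margin_of_error is illustrative, not taken from any polling firm's method.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion.

    p: observed proportion (e.g. 0.5 for 50%)
    n: number of people surveyed
    z: critical value; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# The same 50% result, reported from polls of different sizes.
for n in (100, 1_000, 10_000):
    print(f"n={n}: +/- {margin_of_error(0.5, n) * 100:.1f} points")
```

For a 50% result this gives about 9.8 points at n = 100, 3.1 points at n = 1,000, and 1.0 point at n = 10,000: the plus-or-minus shrinks with the square root of the sample size, not in direct proportion to it.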
(Statisticians have a formula that tells them how large a sample must be to achieve a specified accuracy. But polling companies have to balance larger sample sizes against the higher cost of interviewing more people.)
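One standard version of that formula, for estimating a proportion, is n = z² · p(1 − p) / E², where E is the desired margin of error. Here is a minimal Python sketch, assuming simple random sampling, roughly 95% confidence (z = 1.96), and the conservative worst case p = 0.5 when the true proportion is unknown. The function name is hypothetical.

```python
import math


def required_sample_size(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n whose margin of error is at most `margin`.

    Solves margin = z * sqrt(p * (1 - p) / n) for n and rounds up.
    p = 0.5 maximizes p * (1 - p), so it is the safe choice when p is unknown.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(required_sample_size(0.03))   # 1068 people for +/- 3 points
print(required_sample_size(0.015))  # 4269 people for +/- 1.5 points
```

Because n grows with 1/E², halving the margin of error roughly quadruples the number of interviews, which is exactly the cost trade-off pollsters face.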
Chitika claims to track a quarter-million sites, but Mr Bott found that half of them are dead or junk sites. This means that Chitika's sample size is very small compared to the billions of Web sites that now exist. It's the small sample size that creates the problem.
When the sample size is small, poll results vary wildly. Any statistician will tell you this. It is why Chitika's results vary wildly, and it is why temperature results varied wildly for climate researchers using core samples from only seven trees in Siberia (later found to be the core of one tree).
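The effect is easy to see in a simulation. The sketch below assumes a hypothetical population in which 30% of users share some trait, draws many independent simple random samples of each size, and reports how much the estimated share moves from one sample to the next (its standard deviation). It models random sampling error only, not biased or junk inputs; the function name and parameters are illustrative.

```python
import random
import statistics


def spread_of_estimates(n: int, true_share: float = 0.3,
                        trials: int = 1_000, seed: int = 42) -> float:
    """Standard deviation of the share estimated from `trials` random samples of size n.

    Each sample draws n independent "users", each having the trait with
    probability `true_share` (a hypothetical value chosen for illustration).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    estimates = []
    for _ in range(trials):
        hits = sum(1 for _ in range(n) if rng.random() < true_share)
        estimates.append(hits / n)
    return statistics.stdev(estimates)


for n in (10, 100, 1_000):
    print(n, round(spread_of_estimates(n), 3))
```

The simulated spreads should land close to the theoretical sqrt(0.3 × 0.7 / n), roughly 0.14 at n = 10, 0.046 at n = 100, and 0.014 at n = 1,000: the smaller the sample, the wilder the swings between otherwise identical surveys.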
Chitika used the big swings in its survey results to market itself, because wide-eyed tech reporters naively believed the company's wild-eyed press releases. Thanks to Mr Bott for turning his keen eye on the company's inconsistencies. But the corrected formula reads as follows:
Small data set = garbage out.