How are statistics applied in computer science to evaluate the accuracy of research claims?

Question Detail: 

I have noticed in my short academic life that many published papers in our area lack statistical rigor. This is not just an assumption; I have heard professors say the same.

For example, in CS disciplines I see papers being published claiming that methodology X has been observed to be effective and that this is proved by ANOVA or ANCOVA, yet I see no evidence that anyone has checked whether the necessary assumptions of those tests were satisfied. It feels as if, as soon as some 'complex function and name' appears, the researcher is presumed to be using a highly credible method, that 'he must know what he is doing', and that it is fine not to describe the assumptions, say, about the underlying distribution or approach, so that the community can evaluate them.
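
For illustration, here is a minimal Python sketch (my own made-up example, not taken from any of the papers in question) of the kind of assumption checking I would expect to see reported alongside a one-way ANOVA: a normality check per group and a test for equal variances before the F-test itself.

```python
# Minimal sketch of assumption checks before an ANOVA (illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical running times (seconds) under three methodologies.
group_a = rng.normal(loc=10.0, scale=1.0, size=30)
group_b = rng.normal(loc=10.5, scale=1.0, size=30)
group_c = rng.normal(loc=12.0, scale=1.0, size=30)
groups = [group_a, group_b, group_c]

# Assumption 1: each group is approximately normally distributed.
for name, g in zip("ABC", groups):
    stat, p = stats.shapiro(g)
    print(f"Shapiro-Wilk, group {name}: p = {p:.3f}")

# Assumption 2: the groups have roughly equal variances.
stat, p = stats.levene(*groups)
print(f"Levene's test for equal variances: p = {p:.3f}")

# Only if both checks look reasonable does the one-way ANOVA carry much weight.
f_stat, p = stats.f_oneway(*groups)
print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p:.4f}")
```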

Sometimes there are also excuses for accepting a hypothesis despite a very small sample size.

My question, posed as a student of CS disciplines who aspires to learn more about statistics, is: how do computer scientists approach statistics?

This question might seem as if I am asking about what I have already described, but that is only my impression. I might be wrong, or I might be focusing on one group of practitioners while other groups of CS researchers follow better practices with respect to statistical rigor.

So, specifically, what I want is an answer of the form "our area does or does not engage with statistics, for the following reasons" (example papers, books, or other discussion articles are fine). @Patrick87's answer is closest to this.

Asked By : Oeufcoque Penteano
Best Answer from Stack Exchange

Question Source : http://cs.stackexchange.com/questions/1100

Answered By : Patrick87

As a graduate student in computer science, who has exposure to research in fields other than computer science, and whose research group works in an area of computer science where statistics can be fruitfully applied, I can offer my experience; your mileage may vary.

In general, even the most well-meaning scientific research can fail to rigorously apply statistical analysis to results, and in my experience this does not always preclude papers containing such poorly analyzed results from being accepted for publication. My group works mainly in distributed computing and high-performance computer architecture. Often, research involves experimental designs whose performance cannot easily be understood analytically in the required detail. As such, empirical results are often used as evidence for claims.

Clearly, experiments should be designed, and results analyzed, in such a way as to provide some confidence that the results are statistically significant. Most of the time, this is not done, even in some of the most important venues. When statistical analysis is applied, it is almost never rigorous in any meaningful sense; the most one typically sees (and one is glad to see it!) is that an experiment was repeated n times, for some arbitrarily chosen n, where typically $1 < n < 5$. The selection of error bars (if any are indicated) seems to be mainly a matter of personal preference or taste.
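
To give a concrete sense of what even modest rigor would look like, here is a minimal Python sketch (my own illustration, with made-up measurements) that turns n repeated runs into a mean with a t-based 95% confidence interval, and compares two configurations with a test that does not assume equal variances, rather than choosing error bars by taste.

```python
# Minimal sketch: report repeated runs as mean + 95% confidence interval,
# and compare two configurations with Welch's t-test (illustrative numbers).
import numpy as np
from scipy import stats

# Hypothetical wall-clock times (seconds) from n = 10 repeated runs.
runs_a = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.0])

n = len(runs_a)
mean = runs_a.mean()
sem = stats.sem(runs_a)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f} s, 95% CI = [{ci_low:.2f}, {ci_high:.2f}] s")

# Comparing against a second configuration with its own repeated runs.
runs_b = np.array([11.5, 11.7, 11.4, 11.9, 11.6, 11.8, 11.5, 11.6, 11.7, 11.4])
t_stat, p = stats.ttest_ind(runs_a, runs_b, equal_var=False)
print(f"Welch's t-test: t = {t_stat:.2f}, p = {p:.4f}")
```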

In summary, no, it's not just you; and it's not just software engineering. In general, based on my experience, several areas of computing research seem to err on the side of not doing enough. Indeed, it might even be detrimental to the viability of a submitted paper to dwell on statistical considerations. That's not to say that I find the situation satisfactory; far from it. But these are my impressions. For instance, you may take a look at this paper, which was presented at Supercomputing 2011, one of the most high-profile conferences in high-performance computing. Specifically, look at the discussion of results in section 5 and see whether you arrive at the same conclusions that I do about the rigor of the statistical analysis of the experimental results.

More generally, this shortcoming may be symptomatic of a tendency in some areas of computing to publish more papers rather than fewer, to target conferences rather than journals, and to emphasize incremental progress rather than significant and fundamental improvements in understanding. You might consult this article, which provides valuable insight along these lines.
