Imperfect Bioinformatics

Accept the imperfections of Bioinformatics

I wonder if you share the feeling that so-called bioinformatics mining, or data mining, is really dancing in shackles. On the one hand, we have to respect basic clinical principles; on the other, we have to respect the basic characteristics of the data and the basic requirements of statistical methods. Bioinformaticians dance within these constraints, trying to do justice to both the data and the methods. Data will be updated, methods will be iterated, and clinical understanding will change, so any conclusion is temporary and imperfect. Therefore, to be a bioinformatician, one must accept imperfection.

The imperfection of the data

First, accept the imperfection of the data. For many disease projects, data collection is difficult, follow-up information is incomplete, or the sample sources are inconsistent. But there is often no better option: for now we can only use the data we have and, while making sure the general direction is not wrong, dig out the best results we can and take one small exploratory step. A little progress is already great; there is no need to agonize over what to do and how to do it.

The imperfections of bioinformatics methods

Next, accept the imperfections of bioinformatics methods. Many bioinformatics algorithms and tools assume particular data characteristics or statistical distributions. If you rashly apply a mismatched algorithm, it will still produce results, but strictly speaking such an analysis is unacceptable. As the saying goes, if the process is faulty, the results are not credible. However, the purpose of biological data mining is in fact to use statistical principles to reduce dimensionality and simplify the model, so that we can better understand the characteristics of a disease or species.
To put it bluntly, a bioinformatics method is a tool. For example, one can either walk or drive to a remote place, but driving is the more efficient choice when time is limited. What we need to do is find a balance, given data, tools, and common understanding that do not match perfectly, that still achieves the purpose of the analysis.

The imperfection of the bioinformatics results

Finally, it is important to accept the imperfection of bioinformatics results. In many recent projects, the prognostic model performs very well but validates poorly, or it validates well but shows a poor association with immunotherapy response. In fact, in scientific research poor results are normal and good results are the exception. When a new idea appears, how could everything be positive? It takes many rounds of discarding the false and keeping the true. So even if the data and the method were perfect, our understanding would still be limited, and the result may still be poor. Establishing a negative result is itself a small push forward, for example showing that ferroptosis has nothing to do with head and neck squamous cell carcinoma, yet people are often unwilling to accept such results. Bioinformatics analysis is not something accomplished overnight; what we can do is similar to what doctors do: occasionally positive, often debugging, always exploring. Bioinformaticians used to wonder why so many technologies, such as third-generation sequencing, are eagerly adopted when they first appear with so many problems, rather than three to five years later once they have matured.

Two related ideas drawn from Sir John Sulston’s book The Common Thread, about the Human Genome Project, give the answer.

  1. All we can do is try to sequence a little better with the present technology rather than wait for the technology to improve, because by then it would be too late.
  2. For sequencing or analysis, we are looking for reliability rather than absolute correctness.

References

  1. The Common Thread, Sir John Sulston & Georgina Ferry, Joseph Henry Press.
  2. International Human Genome Sequencing Consortium. Nature 409, 860–921 (2001).
Yu Zhu
MSc Drug Design student

My research interests include bioinformatics, computational biology, and systems biology, especially protein-related biological problems.