Zero debate “Science”?

There have been many mega debates in science that became landmarks in its progress. The debate on the interpretation of quantum mechanics, the Lamarckian-Darwinian evolution debate, the sociobiology debate and the correlation-versus-causation debate are among the celebrated ones. All of them were crucial in deciding the path towards better science, clearer concepts and a passionate pursuit of truth.

Can you name any ongoing debate on any fundamental scientific issue? At least I can't. But we hear of umpteen fights about research misconduct, fraudulent publications, paper mills, accusations of plagiarism, demands for retraction and court cases based on any of these. Where is academia going?

I have several anecdotes, not about debates, but about what should have been debated and wasn't. The biggest story is that of type 2 diabetes. My group showed with multiple lines of evidence (including experiments with confirmed reproducibility, sound mathematical models, epidemiological data, clinical trial data, and exposure of logical anomalies in the existing theory) that the insulin-resistance-based theory of type 2 diabetes is unsustainable, falsified and utterly wrong. The glucose-normalization focus of treatment has failed to arrest diabetic complications because glucose is not central to diabetic pathology. Many clinical trials say this honestly. Others claim success, but those claims are easily overturned by simply looking at their raw data. We published all this as a series of peer-reviewed research papers, critical reviews, many talks and conference posters, a book, and many PubPeer challenges to misleading clinical trials. I am open to the possibility that I may be wrong. But someone needs to show what is wrong in our stand. What we saw instead was complete silence. No argument, no debate, no challenge, no cross-questioning. Just mum.

This is not the only story. I tried to initiate several debates on different issues, every time being careful to be sound, analytical, evidence-supported, and logically and mathematically rigorous. I have related some of these stories in my earlier blog posts. But nothing happened. In some cases the researchers whom I had directly criticized responded at a personal level, in personal emails. (Some, though not all, were kind enough to make their raw data available to me; on analyzing it I could not support their claims.) Their responses often sounded like "explainawaytions" which did not satisfy me, but that is a minor issue. What intrigues me is that they were not ready for any open debate in the public domain. I had not tried the PubPeer platform until 3 years ago, but of late I have posted on PubPeer serious questions and issues about many publications in leading journals including Science, Nature Medicine, PLOS Medicine and Lancet Diabetes and Endocrinology. The issues raised seriously challenged the validity of their conclusions. Again, if I was wrong, they could have shown where I was wrong. If there were inadvertent errors, they could have corrected them. In one of my letters to the editor I wrote, "If it is an inadvertent mistake, it would be appropriate to correct it to avoid misinterpretation by the reader. But if it is meant for intentional misleading of the reader, then it need not be corrected." AND THEY DID NOT CORRECT!!

I have dozens of such stories, but the top-ranking one is about the editors of Science. Holden Thorp and Meagan Phelan wrote an editorial in Science on 13th Feb 2025 entitled "Breaking the silence". Wow. They were saying just what I had in mind: if anyone raises a serious issue about the science you published anywhere, you may or may not agree with the challenger, but you need to respond. "Silence can be detrimental to public trust", they said further. I could conclude from this editorial that it is not my experience alone that nobody responds. It is a very common phenomenon, and that is detrimental to the spirit of science. Reading the editorial further was somewhat disappointing. I realized that they are not talking about engaging in arguments, debates and challenges; they are focusing mainly on accusations of fraudulent science!! The mind-boggling work of the science sleuths, which I admire, has brought one undesirable change. Now the organizations working for scientific integrity, with good intentions, make every retraction a news headline. The bad effect of this is that any issue or question raised, any discussion, debate or challenge, is viewed as an accusation of misconduct. No, we need debates without smelling misconduct. "I think you are mistaken" or "this analysis could have been done in a better way" is not an accusation. It is to be taken constructively. It should initiate a debate. Difference of opinion is an intrinsic part of science. But the ado about misconduct and retractions has unfortunately changed the culture. Just as the number of papers and the JIF are dumb numerical additions to CVs, a PubPeer comment is taken just as dumbly as a negative "score". If you respond to the comment, there will be more conversation on it and the paper will be flagged "seven PubPeer comments on this paper". Without anyone reading what the comments actually say, this will be taken as detrimental to reputation. This is perhaps one major reason why public debates don't happen now.

But shouldn’t they? Can science exist without debates?

Nobody listens to me; that I can understand. But shouldn’t an editorial in Science make some difference? Over the months that followed, I had many more experiences that made me understand why nobody listens to the editors of Science. The editorial asserted clearly, “Science responds when questions are raised by the scientific community or the public about its published research papers and counsels authors and institutions to do the same, ensuring that legitimate concerns are addressed. This means being straightforward when there are problems while standing up for papers that are correct.”

On 24th June 2025 Thorp wrote another editorial, to which I responded pointing out (in somewhat sugar-coated words) that the editorial was honest but looked at the problem rather superficially; there is a need to go to the root causes of academia's problems. Science made my e-letter public, but there was no response to the comments. On a subsequent editorial of 18th Sept 2025, also addressing research integrity, I decided to be more straightforward and responded very directly that their thinking was truly superficial. The causes behind research integrity problems are behavioural, and the fundamental solution lies in redesigning academia and science publishing to make them behaviourally sound: eliminating bad incentives and ensuring that the cost-benefits of genuine science become more favorable than the cost-benefits of fraudulent science. Of late, many academic groups have been working on behaviour-optimized system design. Science could encourage them to design behaviour-optimized academic systems, and that would be a fundamental and long-term solution. I also said that the Trump administration has created an opportunity to introspect. Whatever their political motives, the opportunity to rethink is real, and if it is missed, academic reforms will become impossible. This time they did not even make my letter public, forget about giving any thoughtful response. Perhaps my language was too honest, I mean too crude, for Science to publish. Let's assume it was rejected for the language, not for the point made.

Then came a report of a study initiated by the Science editorial team itself. Jeffrey Brainard wrote a story on this report, which said that researchers from different countries and institutions have very different acceptance rates in Science. The most straightforward interpretation of this, along with many other well-designed studies, is that it reflects peer review bias. But most editors of Science that Brainard interviewed, including Holden Thorp, kept dodging peer review bias as the cause. I wrote a PubPeer comment, referring individually to what each of them said, pointing out that they were playing ostrich: there is ample evidence of large peer review bias, and they selectively ignore that evidence. A little before this incident there was another news article in Science where again the data indicated peer review bias but nobody even considered the possibility. I wrote a PubPeer comment on that too, exposing the cherry-picking in both the studies and their news coverage in Science. The response to this was also complete silence.

This contrasts with the editorial promise of 13th Feb 2025 that Science would be prompt in engaging with any kind of critical response and would not avoid getting into a debate. In reality it too has failed to break the silence. Now I know why nobody follows the advice of Science's editors. They don't really mean it. It's only lip service.

I have a request for my readers. Can you suggest a sophisticated and sugar-coated word for "hypocrisy"? That word is too honest for the fields of academia and science publishing!!

Yes, it is intentional misleading. The authors and editors seem to agree.

When one finds problems in the statistical analysis of a paper, serious enough to invalidate the conclusions drawn, what should the reader do?

It is in the right spirit of science to assume that this might be due to oversight. To err is human, and researchers are humans to begin with. If a reader notes serious problems in a paper, it is necessary to point them out with the expectation that the authors and/or editors will respond. Science would welcome two types of responses. (i) The authors disagree with you: what you perceive as a serious mistake is not really a mistake, and they have a sound justification for what they did. If this is the case, they need to spell out that justification in sufficient detail. Often the difference of opinion may not resolve; in that case both sides need to be made transparent to the reader. That is the responsibility of the editor. (ii) The authors realize that there is some problem with their analysis or inferential logic, and they correct the analysis or clearly state the limitations of the inference by publishing a correction to the paper. Both these responses are completely in the spirit of science and should be welcome.

There are two more possibilities that are not really in the spirit of science. (iii) The third possibility is that the authors neither have a sound justification nor the readiness to correct themselves, but the editors realize the gravity of the problem and decide to retract the paper. This happens quite often, but not easily. Retraction is treated as serious damage to prestige and reputation by the authors, their institutes, the editors and the journals. Therefore they try to avoid, postpone or cover up the problem. (iv) The fourth possibility is that the conclusions drawn from the flawed analysis were published with a deliberate intention to mislead the readers. If so, the authors will certainly not publish any correction, because that goes against the very purpose of publishing. Nor will they have any justification for what they did, because it was misleading anyway.

In either the third or the fourth case, editors are reluctant to take any action, or they keep delaying it until the paper is old enough that readers have lost interest in reading any correction, even if one is published. By then popular scientific perception has accepted the misleading direction, and it is tough to change. The publication of a correction is a low-key event; the purpose of the deliberate misleading has been served by then. If the editors take no action, the sleuths still have the option of posting their cross-questions on platforms such as PubPeer. If the authors think they are not wrong, they can publish a rebuttal on PubPeer. If they accept the problem, they can publish a correction. If none of this happens, we are left with the conclusion that the authors as well as the editors deliberately intend to mislead the readers.

In certain fields of science, misleading has great benefits. The field of medicine, in particular, is prone to this because of the millions of dollars of potential profits involved. After spending huge amounts on developing a drug, if a clinical trial does not show it to be sufficiently effective, accepting the result means huge losses for the pharma industry. In that case, creating the impression that the drug is effective is necessary for the business. And it is not very difficult to mislead the medical community, because its members either do not understand the scientific method and inferential logic, or simply do not have the time to be careful about what they accept as science. People in academia certainly have the expertise but have no motivation: their career progress depends upon how many high-impact papers they themselves publish, and hardly any credit goes to exposing frauds in the field. The peril of everyone being busy building a successful career is that a lot of perverted science gets published and nobody cares.

What are the most common ways of misleading people in the field of clinical trials, particularly trials for lifestyle-related disorders? A few common tricks seem to be used repeatedly.

  1. Multiple statistical comparisons without correction: Statistical inferences are probabilistic. There is always a small chance that your inference is wrong. When you do hundreds of statistical tests in a single study, at least a few of them will turn out to show "significant" results. But this can be purely the result of having done a large number of tests, each with a small chance of being wrong. This is a well-recognized problem in statistics. Solutions have been suggested, though none is free of problems. But not having a good solution is no justification for hiding or disowning the problem itself. Clinical trials in the field of lifestyle-related disorders typically use two strategies to take advantage of multiple testing and then hide or disown the problem (the simulation after this list shows how easily chance alone produces "significant" results).
    • a. Register a large number of clinical trials. Not every trial gives the results you want. Publish the ones with favorable results; don't publish those with inconvenient results. Of the type 2 diabetes clinical trials registered on https://clinicaltrials.gov/ , only about one third of completed trials have made their results public. Two thirds remain silent about what they found.
    • b. A given trial looks at a large number of outcomes, then categorizes them by sex, age group, BMI group and other possible subgroups. So the total number of tests performed typically runs into hundreds and sometimes even thousands. Then the ones that are significant in the expected direction are reported prominently. It is almost guaranteed that a few will turn out significant by chance alone. This is enough to start beating the drums that the drug is effective.
  2. Selective reporting and differential formats of reporting: Just as some of the tests turn out significant in the expected direction, a few are significant in the other direction. When this happens they do one of two things: they either just don't report the inconvenient ones, or they report them in a way that creates a misleading impression. For example, they use indices of relative risk reduction (such as the odds ratio or hazard ratio) when reporting desirable effects of the drug, but absolute risk reduction when reporting undesirable effects. Absolute risk reduction generally gives smaller-looking numbers than relative risk reduction, so the reader is led to think that the good effects of the drug are large and the bad effects small (the sketch after this list works through the numbers).
  3. Convenient subgroups and subtotaling: The subgroups are made by convenience and their results are selectively reported. For example, they may pool many subcategories when the totals give the expected picture, but report the subcategories separately when that gives a more convenient picture. The adverse events for which they get the expected results are called "serious adverse events"; the ones in which results go the other way are called non-serious adverse events, and this classification is not accompanied by any clear definition of the word "serious". There are no standard guidelines on how to report such statistics, and in effect a systematically misleading path is taken.
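To make tricks 1 and 2 concrete, here is a minimal simulation in Python. It is my own illustration with made-up numbers, not data from any specific trial: the first part shows how many "significant" results appear when a completely ineffective drug is compared across 200 outcomes and subgroups; the second shows how relative and absolute risk reduction frame the same tiny effect very differently.

```python
# A toy sketch (hypothetical numbers throughout) of two common tricks.
import random

random.seed(1)

def simulate_trial(p_event=0.10, n=500):
    # Count events in two arms of a trial where the drug truly does nothing:
    # both arms have the same 10% event rate.
    treated = sum(random.random() < p_event for _ in range(n))
    control = sum(random.random() < p_event for _ in range(n))
    return treated, control

def z_statistic(e1, e2, n):
    # Crude two-proportion z-test; adequate for illustration.
    p1, p2 = e1 / n, e2 / n
    pooled = (e1 + e2) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    return abs(p1 - p2) / se if se > 0 else 0.0

# Trick 1: test 200 outcomes/subgroups of a totally ineffective drug.
significant = sum(z_statistic(*simulate_trial(), n=500) > 1.96  # ~p < 0.05
                  for _ in range(200))
print(f"'Significant' results out of 200 null comparisons: {significant}")
# Roughly 10 (5% of 200) purely by chance -- enough to headline a few.

# Trick 2: the same small effect framed two ways (made-up event rates).
control_risk, drug_risk = 0.020, 0.015
rrr = (control_risk - drug_risk) / control_risk  # relative risk reduction
arr = control_risk - drug_risk                   # absolute risk reduction
print(f"Relative risk reduction: {rrr:.0%}")     # "25% fewer events!"
print(f"Absolute risk reduction: {arr:.1%}")     # 0.5 percentage points
```

Running this typically reports around ten "significant" comparisons out of 200 null ones, and the same half-percentage-point difference can be advertised as a 25% risk reduction for benefits while being quoted as 0.5% for harms.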

These are the common tricks used to paint a picture that the drug is effective. Beyond them, more ways to mislead are concocted in a context-specific manner. By intelligently twisting the data, conclusions that were pre-decided are reached. Sometimes raw data are available or are made available on request. I analyzed the raw data in many such cases, and my analysis did not support the published conclusions at all. Those conclusions could not be reached without cherry-picking or twisting the data in some way or the other, and there are so many ways in which data can be twisted.

Below are the details of three clinical trials published in highly reputed journals in which we detected serious flaws in the published analysis, wrote to the authors and editors, and ultimately made our concerns public on PubPeer. To begin with, we were open to all four possibilities. The authors could have counter-argued to show that they were not wrong; they could have published corrected versions; if the authors did neither, the editors could have retracted the papers. But none of these things happened. Both authors and editors kept mum and did nothing, except for some promises that they would consider the cross-questions seriously. From the responses (or lack of them) of authors and editors we could differentiate between the above four possibilities. After careful scrutiny, in all three cases we are compelled to draw the conclusion that both the authors and the editors clearly intend to mislead people. Perhaps this is quite representative of clinical trials. We have ever more reason to believe that a significant proportion of clinical trials have been systematically fooling people all along.

The first case is that of a paper in Lancet Diabetes and Endocrinology. Our PubPeer comment on it and the prior correspondence with the editors are at these two links.

The PubPeer comments were published in August 2024. The authors had ample time to respond, but they didn't.

The second case is about a paper in PLOS Medicine. This paper was clearly hiding inconvenient results, although the raw data could reveal them. We wrote to the editors in Aug 2024, who initially responded positively and asked the authors for clarifications. We have no idea whether the authors responded, because the editors suddenly stopped all correspondence. Ultimately we published our comments on PubPeer in February 2025, to which the authors have not responded. Links to the correspondence with the PLOS Medicine editors and to the PubPeer comments, respectively, are here.

Recently, with some colleagues, I started looking at the GLP-1RA trials, about which there is much brouhaha. These drugs are being projected as wonder drugs effective against anything you can name. Looking at the raw data shows that these clinical trials suffer from the same set of problems, which make their inferences invalid. On one paper published in Nature Medicine and another in NEJM, both from the SELECT trial, we wrote comments on PubPeer in February 2025, and the authors haven't responded as yet.

Interestingly, in both the letter on the PLOS Medicine paper and the PubPeer comments on the Nature Medicine paper, after requesting a correction we wrote, "However, corrections need not be made if the misleading is intentional since the purpose is served." Neither set of authors made any correction, nor counter-argued in defense of their analysis in any form. This is a clear admission that the misleading was intentional. Since the editors also did nothing, it is clear that even the editors of such prestigious journals are interested in deliberately misleading people.

Perhaps intentional misleading is common across clinical trials. Earlier I published my analysis of multiple clinical trials related to type 2 diabetes. There were many responses to that article, all in the public domain, mostly supporting our arguments. Not a single response was from the authors of the original papers (https://www.qeios.com/read/IH7KEP ).

If anyone has any doubt about intentional misleading, please contact the authors of these papers or editors of the respective journals for confirmation.

April 11, 2025

health, news, politics, research, science

Bad science soaring high

Vulture populations have declined, but bad science has taken their place in soaring high.

Science is supposed to have a set of principles, and researchers are expected to follow them. Further, science is said to be self-correcting, and open-minded debate is important for self-correction to happen. The human mind has multiple inherent biases, and scientists are no exception. Therefore conscious efforts are needed to overcome the biases and follow the principles of science. Often it is difficult for an individual to ensure this alone, and the opinions of others help. But much of the science being published today does not seem to believe in this. On top of the inherent biases there are bad incentives created by publish-or-perish pressure, bibliometric indices, rankings and the like. For several reasons peer reviews fail to give the expected corrective inputs, and often they actually add to the biases. An open peer review system and open debate are the only things I can see that might minimize, if not eliminate, bad science.

There is a vulture-related story of bad science soaring high, but before describing it let me illustrate the background with some anecdotes. Bad statistics, flawed logic, badly designed experiments, inappropriate analysis and cooked-up data are just too common in papers in high-impact journals coming from elite institutions. Because of the high prestige of the journals and institutions, bad science coming from there enjoys impunity. Any cross-questions or challenges are suppressed by all means. Here are a few of my own experiences.

In 2010 a paper appeared in Current Biology from a Harvard group. A little later the corresponding author emailed me asking for my opinion, because the paper contradicted our earlier PNAS paper. I was happy, because adversarial collaboration (if not collaboration, at least a dialogue) is good for science. So a series of email exchanges began. At some stage I said I would like to have a look at their raw data. They had no problem sharing it. It was huge, and they had used a program written specifically to analyze it. We started looking at it manually, although that appeared to be an impossible task. But very soon we discovered that there were too many flaws in the data itself. The program was apparently faulty, picking up wrong signals and therefore reaching wrong conclusions. We raised a number of questions and asked the Harvard group for explanations. At this stage they suddenly stopped all communication. They had taken the initiative in starting a dialogue, but when their own flaws started surfacing, they went silent. We got no explanation for the apparent irregularities in the data. Interested readers will find the original email exchanges here (https://drive.google.com/file/d/164Jo15ydGgmCL4XvAwvivpnYtjoAagMQ/view?usp=drive_link ). At that time PubPeer and other platforms for raising such issues were not established, retraction was very rare, and we did not think of these possibilities. We didn't pursue the case with the journal for retraction. The obviously flawed paper remains unquestioned to this day.

In 2017 a paper appeared in Science claiming that cancer is purely bad luck, implying that nothing can be done to prevent it. This came from a celebrity in cancer research, but it had a very stupid fundamental mistake. The backbone of their argument was a correlation, across different tissues, between the log number of stem cell divisions and the log incidence of cancer. They said a linear relationship between the two indicates that only the probability of mutation matters. The problem with the argument is that a linear regression on a log-log plot implies a linear relationship on the original scale only if the slope is close to 1. Their slope was far from 1, so the data actually showed a non-linear relationship, but they continued to argue as if the relationship were linear. Later, using the same data set, we showed that cancers are not mutation limited but selection limited, an inference diametrically opposite to theirs (https://www.nature.com/articles/s41598-020-61046-7 ). But we had a hard time publishing this because we were directly challenging a giant in the field.
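For readers who want to see the slope point concretely, here is a tiny numerical sketch of my own (synthetic numbers, not their dataset). If log(y) = a + b·log(x), then y = e^a · x^b, a power law that is linear in x only when b = 1.

```python
# Synthetic illustration: a perfect straight line on a log-log plot
# with slope 0.5 is strongly non-linear on the original scale.
import numpy as np

x = np.logspace(0, 6, 50)   # x spanning six orders of magnitude
y = 3.0 * x ** 0.5          # exact power law, exponent b = 0.5

slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(f"log-log slope: {slope:.2f}")  # 0.50, and the log-log fit is perfect

# On the original scale, doubling x multiplies y by only 2**0.5 ~ 1.41.
# Over the whole range, x grows a million-fold but y grows only 1000-fold.
print(f"x ratio: {x[-1]/x[0]:.0f}, y ratio: {y[-1]/y[0]:.0f}")
```

So a tight log-log fit by itself says nothing about linearity; the slope has to be examined.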

A long-standing belief is that in type 2 diabetes, controlling blood sugar arrests diabetic complications. Going by the raw data, we do not find convincing support for this notion in any clinical trial. But many published papers keep on claiming it repeatedly. To do so they need to violate many well-known principles of statistics, which they do coolly, and publish in journals of high repute. We challenged this collectively (https://www.qeios.com/read/IH7KEP) as well as specifically challenging two recent publications that had obviously twisted data. One was published in Lancet Diabetes and Endocrinology and the other in PLOS Medicine. The editors of Lancet D and E refused to publish our comments (at first without giving reasons; on our insisting, they gave very flimsy and illogical ones), and the other case is still pending. We then opened a dialogue on PubPeer, to which the authors haven't responded. The reasons the Lancet D and E reviewer gave for rejecting our letter are so stupid that they cannot offer the same defense on PubPeer, because it is open. In confidential peer review, reviewers don't have to be logical; they can easily get away with illogical statements. This case demonstrates that well. The entire correspondence is available here (https://drive.google.com/file/d/1XNzxif4ybJdgAQ4YmiKg_6mSqZGh2Mn1/view?usp=drive_link ).

The case of whether lockdowns helped arrest the spread of infection during the Covid-19 pandemic is funnier. Just a few months before the pandemic, WHO had published a report, based on multiple independent studies, on the extent to which closing schools or offices, travel bans and the like can arrest transmission of respiratory infections (https://www.who.int/publications/i/item/non-pharmaceutical-public-health-measuresfor-mitigating-the-risk-and-impact-of-epidemic-and-pandemic-influenza ). The report clearly concluded that such measures are ineffective and therefore not recommended. After the pandemic began, within just a few months, many leading journals published papers showing that lockdowns are effective. The entire tide turned in just a few months. All the hurriedly published papers have several flaws, which nobody pointed out for fear of being politically incorrect. Our analysis, on the other hand, indicated that lockdowns were hardly effective in arresting transmission (https://www.currentscience.ac.in/Volumes/122/09/1081.pdf ).

That repeated waves of infectious disease are caused by new viral variants is a common belief that has never been tested by rejecting the null hypothesis that a wave and a variant arise independently and get associated by chance alone (https://milindwatve.in/2024/01/05/covid-19-pandemic-what-they-believe-and-what-data-actually-show/ ). Our ongoing attempts to reject this null hypothesis with respect to Covid-19 data have failed so far. In the absence of the rejection of an appropriate null hypothesis, "repeated surges are caused by newly arising variants" is no more than a religious belief. But it still constitutes the mainstream thinking in the field.
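For concreteness, here is a minimal sketch of the kind of null-model test meant here. It is my own construction with hypothetical dates, not the analysis from the linked post: scatter the same number of variant-emergence dates at random and ask how often every wave happens to be preceded by some variant by chance alone.

```python
# A toy permutation test of "variants precede waves"; all dates hypothetical.
import random

random.seed(0)

PERIOD_DAYS = 1000
wave_starts = [120, 420, 700]          # hypothetical wave onset days
variant_dates = [100, 390, 680, 850]   # hypothetical variant emergence days
WINDOW = 60                            # "variant precedes wave" window (days)

def n_preceded(waves, variants, window):
    # Count waves preceded by at least one variant within the window.
    return sum(any(0 <= w - v <= window for v in variants) for w in waves)

observed = n_preceded(wave_starts, variant_dates, WINDOW)

# Null model: the same number of variant dates scattered uniformly at random.
null_counts = []
for _ in range(10_000):
    fake = [random.uniform(0, PERIOD_DAYS) for _ in variant_dates]
    null_counts.append(n_preceded(wave_starts, fake, WINDOW))

p = sum(c >= observed for c in null_counts) / len(null_counts)
print(f"observed = {observed}, permutation p = {p:.3f}")
# Only if p is small can "variants cause waves" claim statistical support.
```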

I suspect this is happening rampantly in many more areas today. My sample is small and obviously restricted to a few fields of my interest. But within that small sample I keep coming across so many examples of bad science published in leading journals that I haven't yet been able to write them all up. Other fields where statistics is commonly misused include the clinical trials of the new anti-obesity drugs, namely the GLP-1 receptor agonists; all the flaws of the diabetes clinical trials are present in those papers too. Worse still is the debate on what constitutes a good diet for keeping away diabetes, hypertension, CVD and the like. Perhaps the same is happening in many papers on the effects of climate change, the effect of social media on mental health, human-wildlife conflict and many other sentimentally charged issues.

Added to this series is now the paper by Frank and Sudarshan in the American Economic Review (https://www.aeaweb.org/articles?id=10.1257/aer.20230016 ), which claims that the decline of vulture populations in India caused half a million excess human deaths during 2000-2005 alone. The paper was given news coverage by Science (https://www.science.org/content/article/loss-india-s-vultures-may-have-led-deaths-half-million-people ) months before its publication. Because Science covered it, it became a headline in global media. The claim has perfect headline-hunting properties, and even Science fell prey to the temptation. Interestingly, the data on which the claim is based were criticized by Science itself a couple of years ago: when used to estimate Covid-19 deaths in India, the Indian death-record data were dubbed unreliable; now the same data become reliable when the inference can make attractive headlines. The other datasets used by the authors are equally or even more unreliable. Further, the methods of analysis are also questionable. Everything put together makes a good case for intentional misleading, and all the reasons why I say this are available here (https://www.qeios.com/read/K0SBDO ).

What is more interesting is the way it happened. On realizing that the analysis in this paper had multiple questionable elements, we looked at the journal where it was to be published. Most journals have a section called letters or correspondence where readers can cross-question, raise issues or comment in any other form on a published article. The American Economic Review, where this paper was accepted for publication, has no such norms. This is strange and very clearly against the spirit of science. Suppressing debate destroys the chances of science being self-correcting, and in the absence of a self-correcting element science is no different from religion. So logically AER is not a scientific journal but a religious one; let's accept that. Nevertheless, we wrote to the editor that we wanted to raise certain issues. The editor advised us to correspond with the authors directly. We wrote to the authors. Unlike in the Lancet D and E and other examples mentioned above, in this case the authors were responsible enough to reply promptly. There were a few exchanges, at the end of which we agreed on a few things, but the main difference did not get resolved. This is fair; disagreement is a normal part of science. But if a difference of opinion remains, the arguments on both sides should be made available to the reader.

The authors were also kind enough to make links to their raw data available to us. The entire correspondence is here (https://drive.google.com/file/d/1d91UzBCMAY9Q3Nu5Yc5_Iqm7ycWPmTWR/view?usp=sharing ). We analyzed their data using a somewhat different but statistically sound approach and did not reach the same inference. It is normal that a given question can be addressed by more than one equally valid analytical approach. Robust patterns give the same inference by any of them; if a result is significant with one approach but not with another, the inference is inherently fragile. It turned out to be so. We tried again with AER. They asked us to pay 200 dollars as a "submission fee", and that is the most interesting part of the story. The original peer review of the paper was obviously weak, because none of the issues we raised seems to have surfaced in it. I am sure the journal did not pay the reviewers. What we did was itself an independent peer review, and for this we were charged $200!! We paid in the interest of science, although we knew the journal would not publish our comments. But it was necessary to see on what basis it would reject them. Apparently one reviewer commented on our letter. We had raised about 11 issues, of which the reviewer mentions only 3, saying they are not convincing, without giving reasons why. There is no mention of the rest of the issues raised, obviously because they had no answers to them. This is clearly cherry-picking. Where the reviewers themselves are cherry-picking, what else can we expect from published articles? For interested readers, the entire correspondence related to the rejection is available here (https://drive.google.com/file/d/1mq2e8sKiYKUMNtUjQKnleaAPm3a8oDZO/view?usp=sharing ).

The authors seem to have incorporated some corrections in response to our correspondence (without acknowledging us, but that is a minor issue), but the main problem remained unresolved. Now we have independently made our comments public on a platform that is open to responses from anyone. This openness is the primary requirement of science, and if it is blocked, the entire purpose of publishing becomes questionable.

The importance of this incident goes far beyond the vulture argument. It is about the way science is practiced in academia. Throughout the stories above I am talking only about statistical mistakes that seriously challenge the inference. There may be many mistakes that do not change the major conclusions, and I am not talking about those. I am a small man with a very small reach. If in my casual exploration I came across so many cases of misleading science, how large must the real problem be?

Last year over 10,000 papers were retracted, and 2024 is likely to see a much larger number. The most common problem detected is image manipulation, and that is because a method to detect image manipulation is more or less standardized. Detecting and exposing fraud has become one of the most important activities for science. The Einstein Foundation Award going to Elisabeth Bik (https://www.einsteinfoundation.de/en/award ) for her relentless whistle-blowing endorses this fact. Scientists exposing bad science are most valuable to science. But the number of retractions is still the tip of the iceberg, and it covers only certain types of fraud. How many papers should be retracted for misleading statistics? How many for claiming causal inference without actually having causal evidence? Nobody has any idea. A paper appeared just last week showing that as much as 30% of papers published in Nature, Science and PNAS are based on statistical significance that is marginal and fragile (https://www.biorxiv.org/content/10.1101/2024.10.30.621016v1). Many others use unreliable or cherry-picked data, use indices or protocols inappropriate for the context, or ignore violations of the assumptions behind a statistical test. All added together, my wild guess is that two-thirds to three-fourths of the science published in top-ranking journals (perhaps more) will turn out to be useless. I am not the first one to say this. It has been said by influential and well-placed scientists multiple times (for example https://pmc.ncbi.nlm.nih.gov/articles/PMC1182327/ ).
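What "marginal and fragile" means can be seen in a toy example of my own (made-up numbers, not taken from the cited preprint): a result just below p = 0.05 can lose its significance when just two patients out of 200 are reclassified.

```python
# Toy illustration of fragile significance with hypothetical trial counts.
from math import erf, sqrt

def two_prop_p(e1, e2, n):
    """Two-sided p-value for a two-proportion z-test (normal approximation)."""
    p1, p2 = e1 / n, e2 / n
    pooled = (e1 + e2) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

n = 100  # patients per arm
for events_control in (30, 29, 28):
    p = two_prop_p(events_control, 17, n)
    print(f"{events_control} vs 17 events: p = {p:.3f}")
# 30 vs 17 -> p ~ 0.030 ("significant!")
# 28 vs 17 -> p ~ 0.063 (gone) -- a conclusion resting on two patients.
```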

In spite of this dire state of affairs, bad science continues to get published in big journals and to come from elite institutions. Why does this happen? I think the reason is that career and personal success overpower the interest in science. Scientists are more interested in having high-impact publications to their names than in revealing the nature of reality, addressing interesting questions, ensuring robust and reproducible outcomes and seeking really useful solutions. Headline hunting has priority over gaining true insights. This they do at any cost and by any means. Even flagship journals like Science seem more interested in making headlines than in supporting real science.

The other question is why readers get fooled so easily. I think two factors are involved. First, academics themselves have no incentive to check reproducibility, access raw data, do independent analyses and so on. This type of work doesn't yield high-impact, prestigious publications and is therefore a waste of time for them. Whether the quality of science suffers is no longer a concern for academics. What academics will never do can be done by citizen scientists. But academic publishers do anything and everything to deter them from being vigilant. Look at the behaviour of AER, which charged us $200 for a futile attempt to cross-question misleading science. Citizen scientists are by default treated badly, as an inferior or untouchable caste. The individual cost-benefits of initiating a healthy debate are completely discouraging. As a result, science is viewed as a monopoly of academics and others are effectively kept away. The net result is that science is losing trust. I no longer consider science published in big journals and coming from elite institutions reliable without looking at the raw data. But it is not possible to look at raw data every time, so I advise my readers to simply stop believing published science, giving it only as much importance as social media posts. That is the only option left. Academia treats peer-reviewed articles as validated, however biased and irresponsible the peer reviews may be. This is a matter of faith, just as much as believing the water of the Ganges to be holy, however polluted it may be.

I will end this article with a note of gratitude. I am extremely thankful to those who published bad science, especially all the examples above, which I could study in detail. I am primarily a science teacher, and teaching the fundamental principles and methods of science is my primary job. Illustrating them with real-life examples is the most effective way of teaching. All the above-mentioned papers have given me live examples of how not to do science. I share these papers with students to make them realize this. I hope at least the next generation of researchers receives this training at an early stage.

“Nature” on citizen science vs the nature of citizen science:

The 3rd October 2024 issue of Nature has an editorial on citizen science (https://www.nature.com/articles/d41586-024-03182-y). It has some brilliant and successful examples of involving volunteers outside formal academia in doing exciting science. But unwritten in the article are the limits of citizen science as perceived by academia. I, on the other hand, have examples that go much beyond what the Nature editors see. Whether to call them successful or not, readers can decide by the end of this article.

In all the examples the Nature editorial describes, volunteers have been used as free or cheap skilled labor in studies that mainstream academics designed: for the kind of work that needed more manual input, where AI was not yet reliable, where hiring full-time researchers was unaffordable, and where time and money could be saved by involving volunteers.

In contrast, I have examples where citizens' thinking contributed to concept development and to the design and conduct of experiments; where the problem identification itself was done by citizens; where novel solutions were perceived, worked out and experimentally implemented by people formally unqualified for research; and where citizens detected serious errors by academics or even exposed deliberate fraud by scientists. I would certainly say that this is a far superior and the right kind of use of the collective intelligence of people. What citizens can't do is the formalism of articulating, writing and undergoing the rituals needed to publish papers, where academics may need to help. But in several respects citizens are better than academics at pursuing science.

I have described in an earlier blog article the work we did with a group of farmers during 2017-2020 (https://milindwatve.in/2020/05/19/need-to-liberate-science-my-reflections-on-the-scb-award/). It started with a problem faced by the farmer community itself, to which some of us could think of a possible solution. Thereafter the farmers themselves needed to understand the concept, design a working protocol based on it, take it to the stage of experimental implementation, and maintain their own data honestly. Then it was back to trained researchers, who analyzed the data, developed the necessary mathematics and so on. By the time this was done I had decided to quit academia, and the other students involved in the work had also quit, for different reasons. The entire team was outside academia when the major chunk of the work was done, and we could do it better because we were free from institutional rituals. This piece of work ultimately received an international award. Here, right from problem identification, farmers, including illiterate ones, were involved in every step except the formal mathematics, the statistical analysis and the publication rituals.

In January 2024, I posted on social media that anyone interested in working with me on a range of questions (including ones of their own) could contact me. The response was so large that I couldn't handle so many people. I requested that someone from the group take on the responsibility of coordinating it so that maximum use could be made of so many interested minds. This system did not take shape as desired because of many unfortunate problems coincidentally faced by all the volunteering coordinators themselves. But a few volunteers continued to work, and a number of interesting themes progressed, ranging from problems in the philosophy and methods of science to identifying, studying and handling problems faced by people.

One of the major patterns in this model of citizen science involves correcting the mistakes of scientists writing in big journals, some of which we suspect were intentional attempts to mislead. For example, we came across a paper in The Lancet Diabetes and Endocrinology (TLDE) which was a follow-up of an interesting clinical trial that had claimed substantial remission of type 2 diabetes within one year using diet alone. Their definition of diabetes remission was glucose control plus freedom from glucose-lowering medicines. After a 5-year follow-up they claimed that those in the diet-intervention group who achieved remission by the above definition had a significantly lower frequency of diabetic complications. When we looked at their raw data, it certainly did not support this conclusion. They had reached it by twisting the data and cherry-picking the results. Peer reviewers never look at such things if the paper comes from a mainstream university. This is not a baseless accusation; there are published data showing the lopsided behaviour of peer reviewers.

The true peer reviewers need to be the readers. But in academia nobody has time to read beyond the name of the journal, the title and, at most, the abstract. The conclusions written at the end of the abstract are taken as final by everyone, even when they are inconsistent with the data inside. This is quite common in the bigger journals of medicine. The reason academics are not interested in challenging such things is that it takes a long time and painstaking effort, at the end of which they do not get a good original publication. The goal of science has completely changed in academia: the individual value of publishing papers in big journals has replaced the value of developing insights in the field. Since no one in academia will do the job of preventing misleading inferences, citizens have to do it. Citizens can do what academics can't, because paper counts and journal impact factors don't shape their careers anyway. Citizen science should focus on doing things that people in academia cannot or will not do. That is its true strength. Since people in academia seem to be least bothered about the increase in fraudulent science, citizens outside academia will have to do this work.

In this case, after redoing the statistical analysis ourselves, we wrote a letter to the editor of TLDE, who responded after a long time saying that the issues we had raised appeared to be important and that she would send the letter to the authors for a response. Then nothing happened for a long time again. On our sending reminders, the editor replied that our letter had been sent to a reviewer (no mention of what the authors' response was) and, based on the reviewer's views, had been rejected. The strange thing was that the reviewer's comments were not included in the editor's reply. On our insisting, the reviewer's comments were made available. And amazingly (or perhaps not surprisingly), the reviewer had done even more selective cherry-picking on our issues, giving some sort of "explainawaytions" to some of them. For example, we had raised the issue that when you do a large number of statistical tests, some are bound to turn out individually significant by chance alone, so just showing that you got significance in some of them is not enough. This is a well-known problem in statistics, and solutions have been suggested. The reviewer said something to the effect of: the suggested solutions are not satisfactory to us, hence we pretend the problem does not exist!! The reviewer completely ignored the issues for which he/she had no answer. So the reviewer was worse than the authors. Then we published our comments on PubPeer (https://pubpeer.com/publications/BB3FA543038FF3DF3F83B449F8E5AA), to which the authors never responded. The entire correspondence with TLDE can be accessed here (https://drive.google.com/file/d/16zjYPeKcz0JEnlrjSXP4p1QUimdBEPFy/view?usp=sharing). The absence of any author response and the thoroughly entertaining reviewer response make it clear that the illogical statistics was intended to mislead and was not an oversight.

Two more fights are underway, and I will write about them as soon as they land one way or the other. Either the papers need to be retracted or corrected, or our comments need to be published along with them. But this would be detrimental to the journals' as well as the authors' reputations, so it is very unlikely. A more likely response is that they will simply reject our comments or do nothing at all. In either case I will make the entire correspondence public. In recent years a large number of papers have been retracted (over 10,000 in 2023, perhaps many more in 2024), a large share of them for image manipulation. But that is because techniques for detecting image manipulation now exist. I suspect a much greater number need to be retracted for screwing up statistics, whether to mislead intentionally or simply to get the paper accepted. Who will expose this? In my view this is beyond the capacity and motivation of academics, and therefore it should be a major objective of citizen science.

I have no doubt that many people outside academia can acquire the skill set to do so. All that is needed is common sense about numbers; technical knowledge of statistical tools is optional. Most of the problems in these papers were the kind of misuse of statistics that a teacher like me tells first-year students not to commit. In the quality of their data analysis, the scientists publishing in big journals are inferior to our first-year students. I have seen many more examples of this before.

Detecting statistical fraud is not difficult, but the path beyond detection is, and the system of science publishing has systematically made it so. In a recent case, a paper had fudged data and had reached misleading conclusions in very obvious ways. The peer reviewers should have detected this easily, but they failed. When a group of volunteers pointed out the mistakes and reanalyzed the raw data to show that the inferences were wrong, the editors said: submit your comments through the submission process, and the submission process includes a $200 submission fee. I am sure the journal did not pay the earlier reviewers anything. Yet when someone else did a thorough peer review, they were penalized for doing a thorough job!! This is how science publishing works.

In a nutshell, many in academia are corrupt, and citizen scientists are likely to do much better science. But academics know this, and therefore hurdles are purposely being created so that their monopoly can be maintained. The entire system is moving towards a kind of neo-Brahminism in which the common man is tactfully kept away from contributing to knowledge creation. Multiple rituals are created to keep people away effectively, and the rituals of science publishing are increasing for the same purpose. I am sure this is the way Brahminical domination gradually took over in India; now the entire world of science is moving in the same direction. Confidential peer review and author charges are the two tools being used effectively for monopolization. Citizens need to become aware and prevent this right at this stage. I see tomorrow's science as safer and sounder in the hands of citizens than in those of academia. This is the true value and true potential of citizen science. Since academia is actively engaged in suppressing this kind of citizen science, we, the science-loving common people, need to make the effort to keep it alive.

On redesigning academia

I am not only a critic of academia; I have also been working constructively to design an alternative system based on the foundations of human behaviour. Behaviour-based policy and system design is a relatively novel concept, but many academics have started talking about it, and certain behaviour-based system designs have been implemented on a pilot scale, some even in real life. Interestingly, nobody seems to have thought about behaviour-based design of academic systems. I made an attempt in a document that I have opened up for everyone here (https://drive.google.com/file/d/1G7Ugv0Wo4gONBQsoTaX_-ggUgTH4ju6A/view?usp=sharing ). This is for sharing, with or without credit. Plagiarism or any version of it is also welcome; all that matters to me is that it is shared widely and read with interest. It is not necessary to agree with everything. In fact, a wide and open-minded debate on every possible platform is most welcome. I only expect that the debate is not based solely on opinions and anecdotes. I have tried to support my arguments with data wherever possible, and the debate needs to proceed on the same lines. Fortunately, today there are many published studies useful for this purpose. Wherever there are gaps in the data, let those also surface, so that someone may be stimulated to collect data and provide better evidence.

My perspective is mainly from India, so the document mainly addresses the problems of Indian academia. But most of it is applicable to any non-mainstream science country, and much of it applies to the mainstream as well.

The document first describes the serious flaws, malpractices, misconduct, bad incentives, imbalances and unfairness in academia as it is today. Science appears to have been monopolized by a handful of power centers, and its dissemination throughout the world is prevented by the very design of the science support systems. The ideal structure of science support systems should be such that good science can be done and published from any corner of the world. The prevalent structure of academia is far removed from this ideal. You have to be part of the publication mafia (not my words) in order to get published in a prestigious journal. There is published evidence that in academia most decisions are made without reading the contents of scientific papers or proposals. There is published evidence that peer reviews are inherently biased and flawed and favor the imbalance of power. This is taking the field of science rapidly away from diversity towards a more stereotyped system and career path.

The document then tries to go to the behavioural roots of these problems. This is not a conspiracy. It is the effect of having a system in place that drifts easily from the collective goal towards personal selfish goals. There is an underappreciated but clear and direct conflict between what is good for science and what is good for a successful career.

Having diagnosed the causes, the document then suggests an alternative system based on the principles of human behaviour. If a system is designed around some ideology and expects people to mold themselves to that ideology against their nature, the system is bound to fail. A system that eliminates or minimizes the difference between the individual optimum and the collective optimum is a robust system. A system that coerces individuals into accepting ethical norms that conflict with their personal gains is a badly designed system. A system that works smoothly towards the intended goal even when every individual behaves selfishly is a well-designed system. The system I suggest would minimize, if not completely eliminate, the biases, imbalances and defects, and facilitate a good and equitable science culture globally.

Why did I write this, being under no delusion that it will bring about any change? To quote from the document itself: "But I cannot imagine myself not writing this when I can clearly see a flawed system, when I know I can diagnose what is wrong and can also see alternative design that is behaviourally sound and correctly incentivised. I have nothing to achieve by spending time and energy on something that will not even be noticed by the mainstream. But I made a statement earlier that there is a mindset that will study, investigate and innovate without any incentive, without any output, returns or rewards. This effort is a demonstration that yes, such a mindset exists and academia need to take efforts to select such minds rather than select "intelligence" and incentivise it with rewards for proxies of success which is bound to corrupt the entire system." Read and debate on any platform that you like. Feel free to criticize, but only after reading it carefully.

Academics: Mend your house first!!

Behavioral and Brain Sciences is a journal that publishes a theme article along with invited commentaries from multiple individuals in the field. For those who believe in impact factors, the IF of this journal was 29.3 for 2022. Last year an article by John et al. (2023) was accepted by the journal and published online with a call for commentaries. The article was about what they called proxy failure, which is not a new phenomenon, but the authors articulated its different aspects quite well. Often it is necessary to quantify the success of something, and the further path is decided by this measurement. When the goal itself is difficult to measure, some proxy is used to reflect progress towards the goal. This might work initially, but often the proxy becomes more important than the goal itself, and then shortcuts to the proxy evolve that may sideline the goal. The system is then likely to fail or derail because of the proxy. The authors illustrated this with several examples from the biological, social and economic sectors.

What struck me immediately was that the biggest example of proxy failure is research under the universities and institutes that are supposed to support it. The original article had only a passing mention of academia. I wrote a commentary on the article focusing on proxy failure in academia, which was accepted and is now published. Since the commentary had a word limit, I am giving below a slightly more elaborate and non-technical version of it. The original, with cited references, is available at https://doi.org/10.1017/S0140525X23002984.

A very well-known example of a proxy is exam scores. They are supposed to reflect the understanding of a subject, but typically the proxy becomes the goal and all education gets oriented towards scoring higher in the exams. The same happens in research. Research papers are meant to be written whenever something new and exciting is to be shared with others. But today published papers have become a proxy for one's "success" in research. Getting jobs, promotions and everything else depends upon how many papers one publishes and where. So, inevitably, publishing papers in prestigious journals has become the goal of research. In education there is enough awareness, realization and thinking that some individuals and institutions specifically focus on education beyond exam-centered coaching. This level of thinking is absent in research, and hardly anyone focuses on addressing the problem.

I feel it is necessary to deal elaborately with proxy failure in academia for two reasons. One is that proxy failure has reached unprecedented levels in academia, creating perverse incentives, so much so that we can easily identify consequences of proxy failure far beyond what the authors describe in other fields. The authors describe three major consequences of proxy failure, namely proxy treadmill (an arms race between agent and regulator to hack and counter-hack a proxy), proxy cascade (in a nested hierarchical system, a higher-level proxy constrains divergence of lower-level proxies) and proxy appropriation (goals at one level are served by harnessing proxy failure at other levels). At least three more advanced forms are observed in academia that might be difficult to find in other fields.

Proxy complementarity: Here more than one type of actor benefits, in different ways, from the same proxy, and they therefore reinforce each other’s dependence on it, resulting in a rapidly deteriorating vicious cycle. Since the prestige of a journal is decided by a proxy, namely the citations its papers receive, and the impressiveness of a researcher’s CV is decided by the impact factors of the journals they publish in, the two selfish motives complement each other in citation manipulation. Citation manipulation has become common because it is a natural behavioural consequence of a system relying on proxies, not only because some researchers are unethical. It is extremely common, and almost inevitable, that reviewers pressure authors to cite their papers and the authors agree in return for paper acceptance; that this is a common practice is revealed by data in published systematic studies. Institutions and funding agencies also benefit from citation-based proxies, since bibliographic indices allow a pretence of evaluation while saving the cost of in-depth reading of a candidate’s research. Reading has a high cost, but a selection committee can (and mostly does) make a decision without reading a candidate’s work, thanks to the proxies. Such mutually convenient positive feedback cycles can drive rapid deterioration of the goal. This is becoming the norm so rapidly that nobody now even thinks there is anything wrong in evaluating someone without actually reading their work.
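The vicious cycle can also be sketched as a toy feedback loop. Again, this is my own illustration with invented coefficients, not anything from the commentary: prestige feeds on citations, the CV value of prestige invites manipulation, and the two quantities pump each other up while nothing real improves.

```python
# Toy feedback loop for proxy complementarity; purely illustrative,
# all coefficients invented. Journal prestige feeds on citations, the
# CV value of prestige invites citation manipulation, and each quantity
# reinforces the other while real quality stays flat.
prestige, manipulation = 1.0, 0.1

for year in range(10):
    citations = 1.0 + manipulation * prestige  # manipulated citations pay off
    prestige += 0.3 * citations                # journal prestige tracks citations
    manipulation += 0.02 * prestige            # prestige on CVs invites more gaming
    print(f"year {year}: prestige {prestige:.2f}, "
          f"manipulation effort {manipulation:.2f}")
```

The point of the sketch is only that no external villain is needed: mutual convenience alone makes the loop run away.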

Proxy exploitation: This is another inevitable phenomenon in which, apart from the agents in the game optimizing their own cost-benefit balance, a party external to the field achieves its selfish goals by exploiting the prevalent proxies. In academic publishing, profit-making publishers thrive almost entirely on journal prestige as the proxy. Editorial boards appear to strive more for journal prestige than for the soundness and transparency of science. This was evident in the eLife open peer review debate. The members of the editorial board who opposed the change in editorial norms never said that open peer review would be bad for science; they said it would reduce the prestige of the journal, which for them was obviously more valuable than the progress of science itself. More prestigious journals often have higher author charges and thereby make larger profits, with little contribution to the original goals. That is why journal prestige appears to matter more than the progress of science.

Predatory proxy: This might be the most advanced and disastrous form of proxy failure, in which the proxy devours the goal itself. The authors of the original article described proxy appropriation, where a higher-level goal does corrective hacking of lower-level proxies. For example, a marketing team might use the number of customers contacted as a proxy for its effort, and this proxy can be inflated easily; but in business the higher-level player directly monitors the goal of profit making and accordingly keeps the lower-level proxies in check. This does not work in academia, since the higher-level organizations themselves have no objective view of the goal. The goal, progress of science, is not directly measurable. As a result, proxies are not only used to evaluate individual researchers; they are often confused with the progress of science itself. Here the proxy has clearly replaced the goal.

In many fields of science, highly complex work involving huge amounts of data and sophisticated methods of analysis is being published in prestigious journals while adding little real insight to the field. In diseases like type 2 diabetes, for example, in spite of the huge amount of research being published and funds being allocated, there is no success in preventing or curing the disease, reducing mortality, or even addressing the accumulating anomalies in the underlying theory. All that we have are false claims of success for each new drug, which get exposed when anyone looks at the raw data. A number of papers exposing this are already published. Nevertheless, large numbers of papers continue to get published and huge amounts of funding are allotted, which by itself is viewed as “success” in the field. Researchers publishing in high-impact journals get huge respect and funding, although the disease keeps increasing in prevalence and society has not benefited from the research even a bit.

Failure to achieve the goal is not a crime in science, but quite often the failure is disguised as “success” and researchers receive lifetime “achievement” awards. Such awards have been given to diabetes researchers, and no scientist receiving one appears to have admitted that they actually failed to achieve the real goals. The efforts of a researcher who failed in this sense should still be appreciated, but they should not be called “success” or “achievement” just because the papers appeared in prestigious journals. The worst outcome of proxy failure in academia is the failure to identify research failure as failure. Many other fields, including theoretical particle physics and string theory, have received similar criticism: much intellectual luxury gets published in high-prestige journals and is therefore called success, although it adds no useful insight to the field.

In the last few years many papers have demonstrated that the creativity and disruptiveness of research have declined substantially. Interestingly, this decline is evident even when measured by proxies. The outcomes of proxy failure described above are the most likely reason for this decline in real scientific progress. Simultaneously, research misconduct, data fabrication, the reproducibility crisis, paper mills, predatory journals, citation manipulation, peer review biases and paper retractions are alarming and on the rise. The blame for this cannot be thrust on a few individuals indulging in malpractice; this is the path the system is bound to take by the principles of human behaviour. The structure and working of academia pretend that human behaviour does not exist, that there are only ideals and goals. An academic system that ignores human behaviour can never work, because the epistemology engine runs entirely on human fuel.

Interestingly, many researchers today work on aspects of human behaviour, behavioural economics, behaviour-informed system design and behaviour-based policy. This is a thriving field; Nobel Prizes have even been awarded in behavioural economics. All of this is potentially relevant to academia, yet researchers in these fields avoid talking about the design of academic systems, even though the academic system is the nearest, most accessible and most relevant system for them to study. This is the second important reason why studying proxy failure in academia needs to be prioritized. Research addressing the behavioural aspects of academia is scanty and fragmentary, and not yet even close to addressing the haunting questions at a system level. The most academia has done is set up offices for monitoring research ethics, which hardly appear to prevent misconduct. Unless researchers address behaviour-based system design in their own field and come up with sound solutions; unless they redesign their own systems to make them behaviourally sound and less prone to proxy failure; unless they can minimize the flaws and make the system work smoothly towards its goals, why should other fields follow their advice to redesign theirs? When I read anything about behaviour-based policy, the natural first reaction of a citizen like me, working outside mainstream academia, is “Researchers, mend your house first!!”