Reviewer3

    Newsroom

    Reviewer3 Partners with bioRxiv to Bring AI Review to Preprints
    Jun 11, 2026
Today we're announcing a partnership with bioRxiv, the preprint server for biology. Starting now, bioRxiv authors can send their preprints directly to Reviewer3 for comprehensive AI review.

Authors around the globe share work publicly for the first time on bioRxiv. Preprints democratize access to science and help researchers get timely feedback from the scientific community. But the literature is growing faster than the reviewers available to vet it, and AI-assisted writing makes it easier than ever to produce work that looks polished while hiding unsupported claims or fabricated citations. Rigorous and timely feedback is needed to vet an exponentially growing literature. This partnership gives bioRxiv authors access to that rigor from Reviewer3.

What Reviewer3 Offers bioRxiv Authors

When you send your preprint to Reviewer3, you receive a comprehensive technical review within minutes. Reviewer3 is a multi-agent, domain-specific review system built specifically for science. Specialized reviewer agents work in parallel, each augmented with custom tools for citation verification, statistical assessment, and claim and evidence verification. Together they flag unsupported claims, fatal design flaws, invalid references, and the other issues that bear directly on a paper's major conclusions.

Reviewer3 focuses on objective, verifiable technical checks. It does not score manuscripts on novelty or significance. Our benchmark findings show that this kind of objective verification is exactly where AI contributes most to peer review, leaving questions of contribution and importance to human experts.

Each comment is anchored directly to your manuscript and linked to evidence, with justifications for why it matters and actionable suggestions for improvement. In our survey of users, 90.6% of Reviewer3 comments are rated useful. Reviewer3 is benchmarked on 145,000+ human and AI review comments, and 87.9–93.2% of feedback is consequential to the major claims within a paper.
How It Works

Sending your bioRxiv preprint to Reviewer3 takes less than a minute and feedback arrives in your email inbox:

1. After submitting your preprint, go to your Author Area in the bioRxiv submission portal.
2. Click "Submit Preprint to External Author Service".
3. Scroll to your paper and select "Reviewer3" from the dropdown.
4. Click Submit and confirm.

Your manuscript and metadata will be transferred securely. You'll receive an email with access to your review. You can visit the link immediately and watch your results stream to you within minutes.

Strengthening Your Work Before and After Sharing

We recommend running Reviewer3 on your earliest drafts to catch and correct critical issues.

Towards AI Review

This partnership is the beginning of a broader vision. As preprint servers and journals experiment with AI-assisted review, we're building the infrastructure to support responsible adoption, with transparency, measurement, and continuous improvement at the core.

We're grateful to the bioRxiv team for their commitment to innovation and their trust in this collaboration. Together, we're working toward a future where every researcher can share their best work with confidence.

Ready to try it? If you have a preprint on bioRxiv, you can send it to Reviewer3 today through your Author Area. Or upload any manuscript directly to see what AI review looks like.
    Blog
    How One Researcher Is Opening the Door to AI-Assisted Peer Review
    Apr 17, 2026
Chi-Ping Day, a researcher at the National Cancer Institute, NIH, reviews a lot of manuscripts. When he found tools to support his review process, he brought them to his editors. What followed is a story about where peer review is headed.

Starting the Conversation

Dr. Day has spent years reviewing manuscripts. Like many experienced researchers, he's seen the volume of papers outpace the pool of qualified reviewers.

"The explosive increase in the number of new journals and the number of published papers has dramatically changed the landscape of manuscript reviewing. A researcher receives overwhelmingly more requests for review, and editors desperately search for reviewers, making the sense of academic recognition wane and eventually vanish for reviewers. Even a very motivated reviewer with a good spirit of community service can become fatigued quickly."

Rather than quietly using AI tools or waiting for journals to issue formal policies, Dr. Day took a different approach: he asked editors directly whether he could use AI tools to assist his reviews.

"Many researchers actively use AI for writing and reviewing manuscripts, yet admitting it is still taboo. Why not propose the guidelines ourselves and use these tools openly?"

The response has been overwhelmingly positive. He's since received permission to use professional AI review tools, including Reviewer3. Editors have been receptive, recognizing that AI-assisted review can help address the growing backlog while raising the bar for quality.

How He Uses AI in His Review Workflow

When Dr. Day reviews a manuscript, he runs it through professional AI review tools like Reviewer3 alongside his own reading. Reviewer3 performs a technical audit of the paper, identifying methodological gaps, verifying references and statistics, and surfacing logical inconsistencies. Each comment is linked directly to the paper, so he can quickly see whether the data support the major conclusions of the work.
"I use it to check whether I missed any technical issues. It saves me a lot of time determining the quality of a study."

As we've shown in our benchmark, Reviewer3 emphasizes technical verification upstream of human judgment in peer review, while humans focus on contribution. Dr. Day uses this function as a complement to his years of expertise.

"I prefer that AI review tools do not evaluate novelty or significance. It is the researcher's responsibility, based on their expertise and knowledge, to justify those for a study. Tools like Reviewer3 handle the technical verification (whether the data support the claims) so I can focus on the judgment only a human reviewer can bring."

Transparency as a Model

By getting explicit permission from editors, Dr. Day is establishing a transparent model for adopting these tools responsibly. He draws an analogy from Isaac Asimov, the mid-century science-fiction writer who imagined intelligent machines long before they existed and proposed a set of guidelines for their safe development, the famous "three laws of robotics", noting that "only when people understand how a tool will be used can it be trusted, disseminated, and improved."

For Dr. Day, the same logic applies to AI in peer review. He is particularly concerned about reviewers quietly turning to general-purpose models without any shared guidelines:

"In most cases, reviewers simply use general-purpose AI like ChatGPT or Gemini with personally written prompts to review a manuscript. This brings a lot of risks of hallucination, bias, and conceptual errors."

The alternative, in his view, is for researchers to take the lead and adopt professionally trained review tools in the open.

"Reviewers do not have to passively take on tasks with the full burden on their shoulders. We can call for the right tools to improve the peer review process."

Dr. Day's Proposed Guidelines

To help other reviewers follow his approach, Dr. Day has shared the guidelines he uses in AI-assisted peer review.

1. Request permission from the editor. Respond to the review request letting the editor know you intend to use professional AI review tools under the conditions below.
2. Only use tools with appropriate data privacy protection. Manuscripts under review are confidential. Before uploading a manuscript, confirm the tool does not train on your inputs, does not retain the content beyond the review session, and is explicit about where the data is stored and who can access it.
3. Limit use to quality assurance. AI tools should be limited to assisting with copy editing, consistency of statements, statistical power in study design, selection of methods, and related checks.
4. Independently verify every AI-generated claim. Citing statements from an AI-generated review report is fine, provided that you personally approve or disapprove each claim or suggestion before including it.
5. Attach the AI-generated output to your final review report so editors can see exactly what the tool produced.

Looking Ahead

AI is already being used in peer review, often quietly and without disclosure. Dr. Day's worry is that if this continues, the damage to trust will be hard to undo.

"If AI in peer review remains taboo or is treated as black magic, the system will distort in silence until it collapses, as trust erodes among researchers, institutions, funding agencies, and the public. There is still time to act on transparency. Guidelines must be built, and AI use must be disclosed."

Dr. Day's approach offers a concrete protocol for transparency around AI use: ask permission, use professional-grade tools, and make the process visible. The question is no longer whether AI will play a role in peer review, but how the community chooses to integrate it responsibly.

Disclosure

Dr. Chi-Ping Day is an employee of the National Institutes of Health. He used the AI review tools only for his own research on ethics in AI. He declares no conflict of interest with the developers of these tools and does not endorse any commercial product mentioned in this article. His points of view do not necessarily reflect the views of the NIH or the US Department of Health and Human Services.

Are you a peer reviewer? You can apply for free access to Reviewer3 in Journal Mode. Fill out the Peer Reviewer Access Form.
    Blog
    We Benchmarked 145,000+ Human and AI Review Comments Across Three Disciplines. Here's What We Found.
    Apr 8, 2026
AI review tools are emerging as a potential solution to our increasingly strained peer review system, but to date, there's no standardized way to evaluate them. We built ReviewBench, a venue-agnostic, extensible benchmark framework to evaluate human and AI peer reviews. We use it to compare human reviews with Reviewer3 (R3), a multi-agent system, and two leading frontier reasoning models, GPT-5.2 and Gemini 3 Pro, across three disciplines: computer science, social science, and life science. The dataset consists of 145,021 review comments.

AI Reviews Are More Structured

A peer review comment can include a specification of the issue, a justification for why it matters, a remedy for how to address it, and an anchor to a location in the paper. Across all three disciplines, AI reviews are more structured than human reviews, with R3 leading on justification and actionability in every venue.

Figure: Justification rate per paper, by source across three venues.

R3 Comments Are More Frequently Consequential

For each paper, we extract 3–7 major claims and map each comment to this set. Comments can map to the same claim and stance but vary in impact. We label a comment as consequential if, assuming it is correct, it could undermine the mapped claim. R3 achieves the highest consequential rate across all three venues (87.9–93.2%), compared to 73.7–81.4% for GPT-5.2, 65.5–77.5% for Gemini 3 Pro, and 60.0–67.7% for humans.

Figure: Consequential rate by source across three venues.

Agreement with Human Reviewers

We compute stance-matched overlap: the percentage of human-addressed claims that the AI source also addresses with the same dominant stance. R3 and GPT-5.2 achieve similar overlap, with the highest agreement in Nature Human Behaviour. Per-paper rankings show R3 ranks first most frequently across all three venues.

Figure: Stance-matched overlap fraction by source across three venues.
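As a rough illustration of the stance-matched overlap defined above, the sketch below computes it for a single paper. The data layout (claim IDs mapped to lists of stance labels), the stance names, and the helper names `dominant_stance` and `stance_matched_overlap` are assumptions made for this example and are not ReviewBench's actual schema. Ties between stances are resolved in favor of the label seen first.

```python
from collections import Counter


def dominant_stance(stances):
    """Return the most frequent stance label among comments on one claim."""
    return Counter(stances).most_common(1)[0][0]


def stance_matched_overlap(human, ai):
    """Fraction of human-addressed claims that the AI source also addresses
    with the same dominant stance.

    `human` and `ai` map claim IDs to the stance labels of the comments
    mapped to that claim (a hypothetical layout for this sketch).
    """
    human_claims = [claim for claim, stances in human.items() if stances]
    if not human_claims:
        return 0.0
    matched = sum(
        1
        for claim in human_claims
        if ai.get(claim)
        and dominant_stance(ai[claim]) == dominant_stance(human[claim])
    )
    return matched / len(human_claims)


# Made-up example: three claims addressed by humans, three by the AI source.
human = {"C1": ["challenge", "challenge"], "C2": ["support"], "C3": ["challenge"]}
ai = {"C1": ["challenge"], "C2": ["challenge", "challenge", "support"], "C4": ["challenge"]}

overlap = stance_matched_overlap(human, ai)
print(f"stance-matched overlap: {overlap:.2f}")
```

In this toy case only C1 counts as a match: the AI source addresses C2 with a different dominant stance and does not address C3 at all, so the overlap is one of three human-addressed claims.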
Figure: Per-paper ranking by stance-matched overlap fraction across three venues.

Humans Critique Contribution, AI Verifies

Human and AI sources provide different types of critiques. Humans focus on contribution and clarity, while AI sources devote more than half of their attention to validity and sufficiency, with R3 leading in validity across all three venues.

Figure: Critique type distribution by source across three venues.

What This Means

These results point to a complementary model for peer review. AI systems can audit the technical validity of major claims at scale (R3 surfaces consequential issues at rates of 87.9–93.2% across disciplines), while human judgment remains essential for evaluating a work's contribution and broader value. The framework is venue-agnostic and extensible to any AI review system, and the full dataset and code are publicly available.

Ready to see what AI peer review looks like? Try R3 on your own paper.
    Blog
    The Acceleration of Scientific Publishing
    Mar 20, 2026
Science is growing faster than previously documented. Hanson et al. report 47% cumulative growth between 2016 and 2022, from 1.92M to 2.82M articles, and 5.6% year-on-year exponential growth. We repeat this analysis with a fully open database, extend it through 2025 to revisit growth rates, and decompose by country and domain to explore relative contributions to growth.

This analysis uses OpenAlex, an independent open database, filtering by DOI-indexed article types. All code is open source. OpenAlex data validates the Hanson et al. study, finding comparable cumulative growth and a fitted exponential rate of 6.9%/yr.

Growth is accelerating

Extending the analysis through 2025 reveals that growth is accelerating. The mean year-over-year growth rate rose from 6.8% (2016–2022) to 8.3% (2023–2025), with 2025 alone at +11.9%. The 2025 figure is the highest single-year growth rate in the DOI-indexed record since 2002.

Table 1. Year-over-year growth rates since 2022.

| Year | Articles | YoY Growth |
|------|----------|------------|
| 2022 | 5,701,659 | +3.8% |
| 2023 | 6,108,201 | +7.1% |
| 2024 | 6,464,543 | +5.8% |
| 2025 | 7,233,887 | +11.9% |

OpenAlex shows +379.5% cumulative growth from 1.51M in 2000 to 7.23M in 2025.

Figure 1. Total DOI-indexed articles per year from 2000 to 2025, with fitted exponential trend.

Social Sciences grew fastest while Physical Sciences added the most articles

Grouping by domain, all four OpenAlex domains grew from 2015 to 2025, but at different rates. Physical Sciences added the most articles in absolute terms (from 1.41M to 2.78M), but Social Sciences grew at the fastest rate (+139.3%). Domain composition has been relatively stable, with the notable exception of Social Sciences, which gained 4.4 percentage points at the expense of the other three domains.

Table 2. Domain growth rates over the last decade.
| Domain | 2015 | 2025 | Growth | Share 2015 | Share 2025 |
|--------|------|------|--------|------------|------------|
| Physical Sciences | 1.41M | 2.78M | +96.4% | 39.6% | 38.6% |
| Social Sciences | 839K | 2.01M | +139.3% | 23.5% | 27.9% |
| Health Sciences | 837K | 1.58M | +88.5% | 23.5% | 21.9% |
| Life Sciences | 477K | 827K | +73.6% | 13.4% | 11.5% |

Figure 3. Articles by domain, 2000–2025 (absolute counts over time), and domain composition, 2000–2025 (share of total output over time).

China surpassed the US in 2022; the Global South drives the majority of growth

In 2022, China overtook the US in number of articles and global share.

- China's output has increased 43× from 27,173 articles in 2000 to 1,177,692 in 2025.
- The US's output has increased 2.8× from 336,687 in 2000 to 947,239 in 2025.
- The fastest growing countries in the 2015–2025 window are predominantly from the Global South: Indonesia, Pakistan, Saudi Arabia, Nigeria, China, Ukraine, Thailand, Hong Kong, India, Turkey.
- China's growth rate over this period was only the fifth highest, but its increase alone exceeded the total 2025 output of 198 of the 200 countries returned by the API query. Only the US and China itself produced more in total than what China added in the last decade.

Figure 4. Top 10 countries by output, 2000–2025 (absolute article counts), and global share of total articles for the top 5 countries over the same period.
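The growth figures quoted above follow from simple arithmetic on the yearly counts. The sketch below recomputes the year-over-year rates and their mean from the article counts in Table 1, plus the cumulative multiple from the rounded 2000 and 2025 totals. The variable and function names are illustrative and are not taken from the authors' repository.

```python
# Article counts copied from Table 1 (OpenAlex, DOI-indexed articles).
counts = {2022: 5_701_659, 2023: 6_108_201, 2024: 6_464_543, 2025: 7_233_887}


def yoy_growth(counts):
    """Percent change from each year to the next, keyed by the later year."""
    years = sorted(counts)
    return {
        later: (counts[later] / counts[earlier] - 1) * 100
        for earlier, later in zip(years, years[1:])
    }


growth = yoy_growth(counts)
mean_growth = sum(growth.values()) / len(growth)

for year, pct in growth.items():
    print(f"{year}: {pct:+.1f}%")
print(f"mean 2023-2025: {mean_growth:.1f}%")

# Cumulative multiple from the rounded totals quoted in the text (1.51M -> 7.23M).
multiple = 7.23 / 1.51
print(f"2000-2025 multiple: {multiple:.1f}x")
```

Run as written, this reproduces the +7.1%, +5.8%, and +11.9% rows of Table 1, the 8.3% mean for 2023–2025, and the 4.8× multiple. The +379.5% cumulative figure in the text comes from unrounded counts, so the rounded totals used here land slightly below it.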
Data Limitations

Data were obtained from the OpenAlex API, which has a larger corpus size than Scopus or WoS. We limit to type:article, has_doi:true to restrict results to DOI-indexed works registered through providers such as Crossref and DataCite. Despite higher absolute counts than those of Hanson et al., growth rates align, suggesting comparable trends.

Conclusion

Growth in publication volume has accelerated. We report a mean YoY growth rate of 6.8% for 2016–2022, which has increased to 8.3% in the years since. The 2025 growth rate of +11.9% is the highest in over two decades. Over the last 25 years, global article output on OpenAlex has grown 4.8× from 1.51M to 7.23M.

The geography of science has also fundamentally shifted. US share fell from 22.3% to 13.1%, China rose from 1.8% to 16.3% and became the top producer in 2022, and the 10 fastest-growing systems are predominantly from the Global South. China's added output over the last decade exceeded every other country's total 2025 output, with the exception of only the US and China itself.

The scientific community must confront these trends. Peer review, editorial systems, and knowledge synthesis hinge on infrastructure that was designed for a different literature composition and volume. These processes may be under strain as the system continues to accelerate and shift in its geography. Quality dimensions, such as citation impact, retraction rates, and reproducibility, should be measured to understand how quality is impacted as volume grows.

All data and code are fully open and reproducible in this repository.

References

- Culbert, J. H., Hobert, A., Jahn, N., Haupka, N., Schmidt, M., Donner, P., & Mayr, P. Reference coverage analysis of OpenAlex compared to Web of Science and Scopus. Scientometrics, 130, 2475–2492.
- Maddi, A., Maisonobe, M., & Boukacem-Zeghmouri, C. Geographical and disciplinary coverage of open access journals: OpenAlex, Scopus, and WoS. PLOS ONE, 20, e0320347.
- Hanson, M. A., Gómez Barreiro, P., Crosetto, P., & Brockington, D. The strain on scientific publishing. Quantitative Science Studies, 5, 823–843.
- Priem, J., Piwowar, H., & Orr, R. OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv:2205.01833.
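For readers who want to retrieve comparable counts, the filter described under Data Limitations could be expressed as an OpenAlex API request along the following lines. This is a minimal sketch and not the authors' code: the helper name `works_count_url` is hypothetical, and the date-range filters and the `group_by=publication_year` parameter are our assumptions about how per-year counts might be requested. Only the `type:article` and `has_doi:true` filters come from the text.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/works"


def works_count_url(start_year, end_year):
    """Build a request URL for DOI-indexed articles grouped by publication year.

    The type and DOI filters come from the text; the date range and the
    group_by parameter are assumptions made for this sketch.
    """
    filters = ",".join(
        [
            "type:article",
            "has_doi:true",
            f"from_publication_date:{start_year}-01-01",
            f"to_publication_date:{end_year}-12-31",
        ]
    )
    # Keep ':' and ',' unescaped so the filter string stays readable.
    query = urlencode({"filter": filters, "group_by": "publication_year"}, safe=":,")
    return f"{BASE_URL}?{query}"


url = works_count_url(2000, 2025)
print(url)
```

The sketch only builds the URL and makes no network call, so it can be inspected before sending any request to the API.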
    Blog
    90.6% of Reviewer3 Comments Are Rated Useful
    Feb 26, 2026
How do you quantify AI review quality? We started collecting feedback from researchers on Reviewer3 comments. Every comment comes with a thumbs-up or thumbs-down feedback button. In the first 500+ responses, we found that 90.6% of Reviewer3 comments have been rated useful.

Figure: Individual comment ratings.

What About at the Paper Level?

We wanted to understand this data at the paper level. Within a given paper, what percent of the feedback is rated useful? We found similar results when we looked at the ratings within a paper. Across 155 papers, the average upvote rate per paper is 88.4%.

Figure: Average ratings per paper.

The Distribution Tells a Stronger Story

The pie chart only shows us the average. What about the distribution by paper? The histogram is heavily skewed toward the high end: for most papers, 90–100% of comments are rated useful. The median paper has 100% of its comments rated useful.

Figure: Distribution of upvote rate by paper.

What This Means

These numbers help us understand how we are doing and where we can improve. For most papers, nearly every comment is considered useful by the researcher. We're continuing to collect feedback at the comment level and will keep reporting on these metrics as our dataset grows.

If you'd like to see for yourself, upload your paper and rate the feedback.
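The comment-level rate and the per-paper average reported above answer slightly different questions, which is why they can differ (90.6% vs. 88.4%) while the median paper still reaches 100%. The sketch below shows the three calculations on made-up ratings. The data and the names `ratings_by_paper` and `upvote_rate` are hypothetical and do not reflect Reviewer3's actual records.

```python
from statistics import mean, median

# Hypothetical thumbs-up (True) / thumbs-down (False) ratings grouped by paper.
ratings_by_paper = {
    "paper_a": [True, True, True, True],
    "paper_b": [True, True, True, True, True, True],
    "paper_c": [True, False],
}


def upvote_rate(ratings):
    """Share of ratings that are thumbs-up."""
    return sum(ratings) / len(ratings)


# Comment level: pool every rating, so papers with more comments weigh more.
all_ratings = [r for ratings in ratings_by_paper.values() for r in ratings]
comment_level = upvote_rate(all_ratings)

# Paper level: compute one rate per paper, then summarize across papers.
per_paper = [upvote_rate(ratings) for ratings in ratings_by_paper.values()]
paper_mean = mean(per_paper)
paper_median = median(per_paper)

print(f"comment-level rate: {comment_level:.1%}")
print(f"per-paper mean:     {paper_mean:.1%}")
print(f"per-paper median:   {paper_median:.1%}")
```

In this toy data a single paper with mixed ratings pulls the per-paper mean below the pooled rate, while the median stays at 100% because most papers have only positive ratings.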
    Blog
    Peer Review is Under Strain, Here's What We're Doing About It
    Sep 30, 2025
Peer Review is at a Breaking Point

Manuscript submissions are growing rapidly. Over five million research articles are published annually. According to The Economist, the number of academic papers published each year has doubled since 2010. The largest traditional publishers (Elsevier, Taylor & Francis, Springer, Nature and Wiley) increased their output by 61% between 2013 and 2022 alone.

Meanwhile, there aren't enough reviewers to keep pace. Just 20% of scientists handle up to 94% of all peer reviews. Review times have extended to nearly five months. When journals send review invitations, only 49% are accepted. The reviewer pool isn't growing while manuscript submissions continue to rise, creating a fundamental sustainability problem for the peer review system.

Can AI Help Scientific Peer Review?

We were skeptical at first. Peer review requires deep scientific reasoning. It demands expertise across experimental design, statistical analysis, and literature context. Could AI systems provide meaningful support for this process?

Skepticism is what makes you a researcher. But so is the willingness to experiment. Since the alternative was a broken system with no clear path forward, we set out to do our first experiment.

Building Multiple Specialized AI Reviewers

We built Reviewer3 with multiple specialized AI reviewers, each focused on a specific aspect of peer review:

- Study Design Reviewer evaluates scientific logic and experimental design. Does the data support the conclusions? Are critical controls missing?
- Reproducibility Reviewer assesses statistical rigor and reproducibility. Are the statistical tests appropriate? Can other researchers replicate this work?
- Limitations Reviewer reviews clarity, context, and literature. Are there missing citations? Is the work properly situated in existing research?
We designed a multi-agent system with custom tools for each reviewer. Then we did what scientists do best: we started collecting data.

88% Rate Reviewer3 Better Than or Equal to Human

After every review session, we asked users one question: Was this better, worse, or equal to human peer review? In a survey of 100 users, 88% rated Reviewer3 as better than or equal to human peer review.

Figure: User feedback.

Comprehensive Feedback and Integrity Checks

Reviewer3 operates in two configurations:

- Author Mode provides comprehensive feedback before submission, with three reviewers covering scientific logic, statistical rigor, and literature context.
- Journal Mode adds three specialized integrity checks: methodological review to identify fatal design flaws, prior publication screening to assess novelty, and security analysis to detect fraud and manipulation.

| Reviewer | Focus | Author Mode | Journal Mode |
|----------|----------|-------------|--------------|
| Study Design Reviewer | Does the data support the conclusions? | ✓ | ✓ |
| Reproducibility Reviewer | Are the statistics and methodology sound? | ✓ | ✓ |
| Limitations Reviewer | Are limitations properly discussed? | ✓ | ✓ |
| Fatal Flaws Reviewer | Are there fatal design flaws? | | ✓ |
| Novelty Reviewer | Have authors published redundant work? | | ✓ |
| AI Text Reviewer | Is this AI-generated? | | ✓ |

Towards Sustainable Peer Review

Reviewer3 isn't here to replace human reviewers. It's here to support them. To give authors better feedback, faster. To help journals manage the rising flood of submissions. And to make the entire process more sustainable. Unlike the status quo, we're measuring, iterating, and improving with every review.

Ready to try it yourself? Visit our upload page and see what thousands of researchers already know: sometimes the best way to honor scientific skepticism is to run the experiment.
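The two configurations in the table above can be summarized as a mapping from mode to active reviewers. The snippet below is only an illustration built from that table. The names `CORE_REVIEWERS`, `INTEGRITY_REVIEWERS`, `MODES`, and `reviewers_for` are hypothetical and say nothing about how Reviewer3 is implemented internally.

```python
# Reviewer names copied from the table; the data structure itself is hypothetical.
CORE_REVIEWERS = (
    "Study Design Reviewer",
    "Reproducibility Reviewer",
    "Limitations Reviewer",
)
INTEGRITY_REVIEWERS = (
    "Fatal Flaws Reviewer",
    "Novelty Reviewer",
    "AI Text Reviewer",
)

MODES = {
    "author": CORE_REVIEWERS,
    "journal": CORE_REVIEWERS + INTEGRITY_REVIEWERS,
}


def reviewers_for(mode):
    """Return the reviewers marked as active for a mode in the table."""
    try:
        return MODES[mode.lower()]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


for mode in ("author", "journal"):
    print(mode, "->", ", ".join(reviewers_for(mode)))
```

Expressing Journal Mode as the Author Mode set plus the integrity checks mirrors the wording in the text, where Journal Mode "adds" three checks to the three core reviewers.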
    Blog