Tether Expands AI Dataset with QVAC Genesis II, Reaching 148 Billion Tokens
Tether Data’s AI research division, QVAC, has unveiled QVAC Genesis II, significantly expanding its synthetic educational dataset for artificial intelligence pre-training. This latest release adds 107 billion new tokens, bringing the total to 148 billion tokens across 19 educational domains, according to Tether.
Enhancing AI Training
Building on the foundation of QVAC Genesis I, the new release covers 10 additional domains, such as chemistry, computer science, and machine learning. It also updates the college-level physics content using a more advanced methodology. Tether describes the combined Genesis I and II as the most comprehensive synthetic educational dataset available to the public.
Innovative Data Generation
QVAC Genesis II introduces a novel data generation approach called Option-Level Reasoning. This method analyzes every answer option in multiple-choice questions, reinforcing correct reasoning and addressing common misconceptions. The approach aims to enhance clarity, causality, and decision-making in AI training data.
This complements the Failure Analysis method from Genesis I, forming a dual-method pipeline intended to ensure educational value in every generated question. According to Tether, its evaluations show that models trained on Genesis II data achieve higher reasoning accuracy and produce clearer answers than models trained on previous datasets.
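As a rough illustration of the Option-Level Reasoning idea described above, the sketch below builds a prompt that asks for a rationale for every answer option of a multiple-choice question. It is not QVAC's actual pipeline, which the announcement does not detail: the class `MultipleChoiceQuestion`, the function `build_option_level_prompt`, the prompt wording, and the sample question are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MultipleChoiceQuestion:
    # Hypothetical container; the real dataset schema is not described in the article.
    stem: str
    options: Dict[str, str]  # option label -> option text
    answer: str              # label of the correct option


def build_option_level_prompt(question: MultipleChoiceQuestion) -> str:
    """Build a prompt that requests a rationale for every answer option."""
    if question.answer not in question.options:
        raise ValueError("answer label must be one of the option labels")

    lines = [f"Question: {question.stem}"]
    for label, text in question.options.items():
        if label == question.answer:
            # Reinforce the correct reasoning.
            task = "Explain why this option is correct."
        else:
            # Address the misconception behind the distractor.
            task = "Explain why this option is wrong and what misconception it reflects."
        lines.append(f"{label}. {text} -> {task}")
    return "\n".join(lines)


# Illustrative question, not taken from the dataset.
example = MultipleChoiceQuestion(
    stem="Which quantity is conserved in an isolated system?",
    options={"A": "Total energy", "B": "Temperature", "C": "Pressure"},
    answer="A",
)
prompt = build_option_level_prompt(example)
print(prompt)
```

The point of the sketch is only that every option, not just the correct one, gets its own explanation request, which is the property the article attributes to the method.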
Commitment to Open AI Research
Tether aims to shift the focus of AI training from volume to structure and reasoning. Paolo Ardoino, CEO of Tether, emphasized the importance of understanding over fluency in AI development. The dataset is available under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, supporting open, community-driven AI research.
The release aligns with QVAC’s mission to advance decentralized intelligence, allowing AI models to be trained and deployed without reliance on centralized cloud platforms. This approach seeks to lower innovation barriers and ensure accessible, high-quality AI training data for the global research community.
Further Information
The technical details of the dataset, titled “QVAC Genesis II: Expanding the Largest and Highest-Quality Multi-domain Educational Synthetic Dataset for Pre-training,” are available on the QVAC research blog. Additionally, researchers can access the dataset and models on Hugging Face.