A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes

[Submitted on 6 Jun 2025 (v1), last revised 5 Mar 2026 (this version, v3)] Authors: Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Om Chabra, Sivaprasad Sudhir, Anna Zeng, Anton A. Zabreyko, Chenning…

Dataemia

Low-input deep learning platform for citrullinated peptide identification, autoantigen discovery and rheumatoid arthritis treatment stratification

Kacen, A. et al. Post-translational modifications reshape the antigenic landscape of the MHC I immunopeptidome in tumors. Nat. Biotechnol. 41, 239–251 (2023). Kelly, S. D., Allas,…

How everyday foam reveals the secret logic of artificial intelligence

Foams appear in everyday life as soap suds, shaving cream, whipped toppings and food emulsions like mayonnaise. For many years, scientists believed foams behaved much like glass, with their tiny…

Golden Retriever genes linked to anxiety, aggression, and intelligence in humans

Researchers at the University of Cambridge have uncovered new insights into the emotional lives of dogs, helping explain why some golden retrievers are more anxious, energetic, or aggressive than others.…

AI turns x-rays into time machines for arthritis care

A new artificial intelligence system developed by researchers at the University of Surrey can forecast what a patient's knee X-ray might look like one year in the future. This breakthrough…

A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems

Large Language Models (LLMs) have rapidly evolved from text-only assistants into complex agentic systems capable of performing multi-step reasoning, calling external tools, retrieving memory, and executing code. With this evolution…

Accelerating Diffusion Models with an Open, Plug-and-Play Offering

Recent advances in large-scale diffusion models have revolutionized generative AI across multiple domains, from image synthesis to audio generation, 3D asset creation, molecular design, and beyond. These models have demonstrated…

Why You Should Stop Writing Loops in Pandas 

When I first started using Pandas, I wrote loops like this all the time: for i in range(len(df)): if df.loc > 1000: df.loc = "high" else: df.loc = "low"…
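
As a rough illustration of the pattern this teaser describes, the sketch below contrasts a row-by-row loop with a vectorized equivalent. The excerpt does not show which columns the original loop indexed, so the column names `amount` and `label` and the sample values are assumptions made purely for this example, and `np.where` is one common vectorized option rather than necessarily the one the article recommends.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the column names "amount" and "label" are assumptions,
# because the excerpt does not show which columns the original loop used.
df = pd.DataFrame({"amount": [500, 1500, 1000, 2500]})

# Loop version, in the spirit of the excerpt: one row at a time.
looped = df.copy()
looped["label"] = ""  # create the target column before filling it
for i in range(len(looped)):
    if looped.loc[i, "amount"] > 1000:
        looped.loc[i, "label"] = "high"
    else:
        looped.loc[i, "label"] = "low"

# Vectorized version: a single comparison over the whole column.
vectorized = df.copy()
vectorized["label"] = np.where(vectorized["amount"] > 1000, "high", "low")

print(vectorized)
```

Both versions should produce the same labels; the vectorized form hands the comparison to NumPy in one call instead of performing a Python-level lookup and assignment per row.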

Near-term Improvements and Long-term Convergence

[Submitted on 18 Oct 2025 (v1), last revised 5 Mar 2026 (this version, v2)] Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements…

[2602.23008] Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

[Submitted on 26 Feb 2026 (v1), last revised 6 Mar 2026 (this version, v2)] Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy…
