Research
Published work and things I tried that didn't necessarily work out.
Published
Calibration Failures in Retrieval-Augmented Generation Systems
2024-01
We show that RAG systems exhibit systematic overconfidence when retrieved context contradicts the model's parametric knowledge, and propose a lightweight calibration intervention.
Read paper ↗
Experiments
Things I tried. Including failures.
Using LLMs as automated paper reviewers
abandoned
Tried to use GPT-4 to pre-screen papers for a workshop. The reviews were superficially plausible but consistently missed domain-specific errors that any expert would catch. Abandoned after testing on 20 papers.