Model Evaluation

We introduce a novel test for self-recognition in language models (LMs) and show that current frontier LMs do not consistently recognize their own outputs. Instead, models prefer the answers they perceive as best, regardless of which model produced them.

Nov 1, 2024