• 5 Posts
  • 65 Comments
Joined 2 years ago
Cake day: June 11th, 2023

  • vatlark@lemmy.world to Technology@programming.dev: *Permanently Deleted*
    English · 20 upvotes · 3 months ago (edited)

    That’s a clever way to set up the test.

    The LLM got 9% to 38% worse at the task when the correct answer was replaced with “none of the others” (i.e., every answer still listed by name was wrong).
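
    For anyone who wants to try that modification on their own question set, here is a rough sketch of how I read the setup. The `Question` class, the `replace_correct_with_none` helper, and the exact option wording are my own assumptions for illustration, not the study's actual code.

```python
from dataclasses import dataclass

# Hypothetical wording; the study's exact phrasing may differ.
NONE_OPTION = "None of the others"


@dataclass(frozen=True)
class Question:
    """A multiple-choice question (illustrative structure, not from the study)."""
    prompt: str
    options: tuple[str, ...]
    answer_index: int  # position of the correct option


def replace_correct_with_none(q: Question) -> Question:
    """Swap the correct option's text for NONE_OPTION.

    The correct position stays the same, so every option that is still
    listed by name is wrong and NONE_OPTION becomes the right choice.
    """
    options = list(q.options)
    options[q.answer_index] = NONE_OPTION
    return Question(q.prompt, tuple(options), q.answer_index)


original = Question("2 + 2 = ?", ("3", "4", "5"), answer_index=1)
modified = replace_correct_with_none(original)
print(modified.options)
```

    Scoring the same model on `original` and `modified` versions of each question would then give the kind of before/after comparison described above.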

    I’m curious how humans perform on the original questions versus the modified ones, because humans are the benchmark.

    “All models are wrong, but some are useful” is a quote from before LLMs, but it still applies.