• SirGolan@lemmy.sdf.org
    1 year ago

    GPT-4 with Reflexion prompting gets 90% correct on the HumanEval coding benchmark. The paper this is based on is misleading at best.
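
    For anyone who hasn't seen it, here is a rough sketch of the kind of loop Reflexion-style prompting runs: generate code, run tests, have the model write a short note about what went wrong, then retry with that note in the prompt. This is a toy illustration, not the paper's implementation. `toy_model`, `run_tests`, and `reflexion_loop` are hypothetical names, the "model" is a hard-coded stub rather than GPT-4, and the test string is assumed to be supplied by the caller.

```python
from typing import Callable, List, Optional, Tuple


def run_tests(code: str, tests: str) -> Optional[str]:
    """Run candidate code and its tests; return None on success, else an error string."""
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate function
        exec(tests, namespace)  # run the assertions against it
    except Exception as exc:    # AssertionError is included here
        return f"{type(exc).__name__}: {exc}"
    return None


def reflexion_loop(
    task: str,
    tests: str,
    model: Callable[[str], str],
    max_trials: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry a coding task, feeding the model its own notes on earlier failures."""
    reflections: List[str] = []
    for _ in range(max_trials):
        # Append every earlier self-critique to the task prompt.
        prompt = task + "".join(
            f"\nReflection on a failed attempt: {note}" for note in reflections
        )
        code = model(prompt)
        error = run_tests(code, tests)
        if error is None:
            return code, reflections
        # Ask the model to explain the failure in words and remember the answer.
        reflections.append(
            model(f"Code:\n{code}\nfailed with {error}. Explain the mistake briefly.")
        )
    return None, reflections


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: wrong at first, right once a reflection is present."""
    if prompt.startswith("Code:"):
        return "add should return a + b, not a - b."
    if "Reflection" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


solution, notes = reflexion_loop("Write add(a, b).", "assert add(2, 3) == 5", toy_model)
```

    With the stub above, the first attempt fails its test, one reflection gets stored, and the second attempt passes. Swapping `toy_model` for a real model call is where the actual benchmark numbers would come from.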