• FooBarrington@lemmy.world
    3 days ago

    First: that’s wrong. Every big LLM uses some data cleaned or synthesized by previous LLMs. You can’t train solely on such data without degradation, but that’s not the claim.

    Second: AI providers very explicitly use user data for training, both prompts and response feedback. There’s a reason businesses pay extra to NOT have their data used for training.

      • FooBarrington@lemmy.world
        2 days ago

        I mean - yeah, it is? This is a well-researched part of the data pipeline for any big model. Some companies even got into trouble because their models identified as other models whose outputs they were trained on.

        It seems you have a specific bone to pick that you attribute to such training, but denying such broadly understood results is just a weird approach…

          • FooBarrington@lemmy.world
            2 days ago

            No, it doesn’t. Unless you can show me a paper detailing that literally any amount of synthetic data increases hallucinations, I’ll assume you simply don’t understand what you’re talking about.

            • baines@lemmy.cafe
              2 days ago

              what paper? no one in industry is gonna give you this shit, it’s literal gold

              academics are still arguing about it but save this and we can revisit in 6 months for a fat i told you so if you still care

              ai is dead as shit for anything that matters until this issue is fixed

              but at least we can enjoy soulless art while we wait for the acceleration

              • FooBarrington@lemmy.world
                2 days ago

                Yeah, that’s what I guessed. Try looking into the research before making such grandiose claims.

                • baines@lemmy.cafe
                  2 days ago

                  i know the current research, i know it’s going to eat your lunch

                  • FooBarrington@lemmy.world
                    2 days ago

                    Ah yes, and you can’t show us that research because it goes to another school? And all companies that train LLMs are simply too stupid to realize this fact? Their research showing the opposite (which has been replicated dozens of times over) was just a fluke?