Google has accused “commercially motivated” actors of trying to clone its Gemini AI, despite having itself indiscriminately scraped the web to build its own models.

  • WatDabney@sopuli.xyz · 185 points · 6 days ago

    Google has become a colonialist project.

    First they gained access to the communal property of the internet. Then they stole it from the original inhabitants. And now they’re trying to claim a legal right to exclusive control over the property they stole.

      • Big Baby Thor@sopuli.xyz · 6 points · 6 days ago

        AI DRM. It’s coming. All outputs will only be available in browsers that support it. Also, future clipboards will be tied in.

        Ctrl+V

        “I’m sorry, but you don’t have permission to output that into this application. To apply for a licence click here.”

        • tyler@programming.dev · 1 point · 5 days ago

          You will never stop computers from being able to copy what is shown on the screen. Right now you can go in and just disable copy-paste blocking in your browser if you really want to. It’s just JavaScript.

  • Bustedknuckles@lemmy.world · 50 points · 6 days ago

    “My output is valuable, proprietary, and demands remuneration; my inputs are fair use and of negligible value.”

  • hendrik@palaver.p3x.de · 36 points · 6 days ago

    It’s mental. The terms and conditions of some AI music generators make people pay for a “license” to use the output, for example for commercial purposes. They themselves, of course, claim “fair use” and steal all the music out there to train their models. I think some companies don’t claim ownership any more for images and video snippets. And of course AI output isn’t copyrightable in the first place.

    The companies will occasionally use their trademarks, intellectual property or copyright against people. Of course those rules don’t apply the other way around. It’s completely fine that their product draws all the Disney princesses, comic and anime characters, and reproduces half of Harry Potter. But beware if someone puts “Claude” in a product name. And Google follows the same logic here.

    And then my homepage gets hammered with their stupid AI crawlers, but I have to abide by the terms and conditions of their services…

    • queermunist she/her@lemmy.ml · 7 points · 6 days ago

      Supposedly, copyright needs to be defended or it is lost. It would never happen, but it’d be interesting if the companies that let data scrapers and chatbots violate their IP actually destroyed their own claim to copyright protection.

      • Logi@lemmy.world · 3 points · 6 days ago

        Supposedly, copyright needs to be defended or it is lost.

        No, that only applies to trademarks, not patents or copyright.

      • hendrik@palaver.p3x.de · 3 points · 6 days ago

        Yes. I don’t think it’s settled yet. There are still many trials going on. The industry keeps trying to push the limits, including really weird stuff like Elon Musk probing whether it’s okay to allow deep-fakes of random existing women and minors. I think lawmakers are having a difficult time keeping up with the pace. AI companies drown them with their near-unlimited resources. We need to come up with new regulation, fight all the court battles, overhaul copyright and discuss things in society… And then there are pre-existing influential structures, like Disney and the copyright industry… Sometimes they’re on opposing sides, sometimes they dabble in AI as well… I mean, it’s complicated. And a long process. And it’s difficult to defend things. I mean, I also defend my server. But it’s more an open war than anything with rules and terms.

        • queermunist she/her@lemmy.ml · 5 points · 6 days ago

          I think lawmakers are having an easy time accepting bribes from AI companies, actually. The pace is only a problem because they are being paid to slow down.

          The courts are more interesting, because they actually have to make decisions instead of just deliberating forever.

          • hendrik@palaver.p3x.de · 4 points · 6 days ago

            Depends a bit on the country. In the United States, for sure. That’s just open corruption: you scratch my back and I’ll scratch yours. The government funnels $500bn of taxpayer money into some Project Stargate, and God knows how much into really dark stuff with Palantir. Musk even “worked” for the government for a while… And on top of the corruption money, these people are buddies. And they’re all working towards the same goal. Some idea of an apocalypse.

            In China, I don’t think they need to bribe the government. It was the CCP who came up with the idea in the first place. And the AI race between China and the USA is yet another thing.

            For Europe, I’m not so sure. There’s a bit more nuance here. I mean, Ursula von der Leyen is an AI shill as well; she frequently likes to talk about it. I don’t think there’s as much open bribery, though. And I still hope they’re aware of the situation with US companies, how our goals diverge, and that partnering with Palantir or X is likely going to land us in a lot of pain… And the EU loves to regulate. And our own AI companies aren’t as big. So there’s that as well.

            • queermunist she/her@lemmy.ml · 3 points · 6 days ago

              China is an interesting inversion of the US. In the US, the government is invested in the AI race because they’ve been bribed and because the money line go up. In China, the government was invested in the AI race before the bubble started to inflate and is instead pushing its own companies to invest in AI. Basically: in the US markets are in command, in China politics are in command.

              It’ll be really interesting to see how the two countries respond to the bubble bursting.

              As for Europe, there’s been some murmurings about tech sovereignty which are really exciting to me. They need to get out of US tech, whether that means they put a lot more focus on building European AI firms or they just get out of AI entirely.

    • Grimy@lemmy.world · 3 points, 9 downvotes · 6 days ago

      Being pro-copyright means handing the keys to the record companies, though. They would be the only ones with a “legal” model. Udio got bought by Universal not too long ago, but as long as laws aren’t rewritten for the benefit of mega-corps and copyright juggernauts, open source will ruin all the shenanigans they’re trying to pull.

      It’s the same for all the text models. Open source is destroying OpenAI’s business model. They need laws that restrict what you can train on so they can buy themselves a monopoly.

  • sp3ctr4l@lemmy.dbzer0.com · 30 points · 6 days ago

    Wow, they’re seriously saying this with a straight face, huh?

    Oh hi Google, my name is Epic Games, and I see you recently trained a new ‘AI’ of yours on Fortnite.

    Let me introduce you to my friend Fromsoft, who is pretty sure you uh, copied their notes from Dark Souls as well.

    … How are all these people this fucking stupid?

    There’s no possible resolution to the paradigm of ‘I can steal everything but you can’t steal anything’ other than total chaos.

    Total chaos ain’t a good standard for a legal system trying to figure out IP law.

    This is completely ludicrous.

  • OwOarchist@pawb.social · 12 points · 6 days ago

    You see, big tech AI bros? This is why you’re dumb. Even if this all pans out and all your AI dystopia dreams come true, it doesn’t mean you’re going to be rich and powerful and at the top.

    If your AI becomes as good as it’s supposedly going to get … I can just ask it to develop a new AI for me. And then I don’t have to use yours anymore. Why would anybody pay you to use your AI when it becomes trivial to make a new one, tailored to their specific needs? Why would I need your big tech company for anything, if anything you can provide could be readily replaced by just asking an AI for it? If AI becomes good enough to replace everyone’s job, it will replace big tech as well.

    The only people who might be benefiting from all this are the ones who manufacture and sell the hardware that runs it. If AI becomes good enough, all software companies will go bankrupt. Yes including Google, Microslop, etc.

    • wonderingwanderer@sopuli.xyz · 7 points · 6 days ago

      You can already self-host an open source LLM, and fine-tune it on custom datasets. Hugging Face has thousands to choose from.
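
      As a rough sketch of how little code the self-hosting part takes with the transformers library (the model name below is just an example, and it assumes you have enough RAM/VRAM plus the accelerate package installed for device_map):

          # Minimal sketch: load an open-weights model and generate text locally.
          from transformers import AutoModelForCausalLM, AutoTokenizer

          model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # example repo; pick one that fits your hardware
          tokenizer = AutoTokenizer.from_pretrained(model_name)
          model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

          prompt = "Explain retrieval-augmented generation in one sentence."
          inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
          outputs = model.generate(**inputs, max_new_tokens=100)
          print(tokenizer.decode(outputs[0], skip_special_tokens=True))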

      The largest you’ll realistically fit on consumer hardware is probably 32 billion parameters or so, and that’s with quantization. Basically, at 8-bit quantization you need 1GB of RAM for every billion parameters. So a 32 billion parameter model at 8-bit would need 32GB of RAM, plus overhead. At 16-bit it would need 64GB, and so on; a 24 billion parameter model at 16-bit would take up 48GB, etc.
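
      In code terms that rule of thumb is just parameters times bytes per parameter, ignoring overhead like the KV cache and activations (a tiny illustrative calculation, not an exact measurement):

          # Back-of-the-envelope RAM estimate: parameters x bytes per parameter.
          def model_ram_gb(params_billions: float, bits: int) -> float:
              bytes_per_param = bits / 8
              return params_billions * bytes_per_param  # 1e9 params at 1 byte each ~ 1 GB

          print(model_ram_gb(32, 8))   # 32.0 GB for a 32B model at 8-bit
          print(model_ram_gb(32, 16))  # 64.0 GB for the same model at 16-bit
          print(model_ram_gb(24, 16))  # 48.0 GB for a 24B model at 16-bit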

      The commercial LLMs that people pay subscriptions to use an API for tend to have like 130-200 billion parameters with no quantization (32-bit). So it wouldn’t run on consumer hardware. But you honestly don’t need one that big, and I think they actually suffer in quality by trying to overgeneralize.

      For most people’s purposes, a 14 billion parameter model at 16-bit precision is probably fine. You just need 28GB of free RAM. Otherwise, with 14GB of RAM you can do 14B params at 8-bit, or 7B at 16-bit. You might lose some accuracy, but with specialized fine-tuning and especially retrieval-augmented generation, it won’t be severe.

      Anything smaller than 7B might be pushing it, and likewise anything at 4-bit quantization would lose accuracy. 7B at 8-bit would also probably suffer on benchmarks. So realistically you’ll probably need at least 16GB of RAM accounting for overhead. More if you want to run any concurrent processes.
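
      If you want to try the 8-bit route, transformers can quantize at load time via bitsandbytes. This sketch assumes an NVIDIA GPU with the bitsandbytes package installed, and the model name is again just an example:

          # Sketch: load a model in 8-bit to roughly halve memory vs. 16-bit.
          from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

          model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # example repo
          quant_config = BitsAndBytesConfig(load_in_8bit=True)

          tokenizer = AutoTokenizer.from_pretrained(model_name)
          model = AutoModelForCausalLM.from_pretrained(
              model_name,
              quantization_config=quant_config,
              device_map="auto",
          )
          print(f"Footprint: {model.get_memory_footprint() / 1e9:.1f} GB")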

      The thing about making one from scratch, though, is that it’s resource-intensive. You can generate a 1 billion parameter model with blank or randomized weights; the algorithm isn’t a secret. But pre-training it could take weeks or months depending on your hardware, maybe days if you have a high-end GPU. And that’s with it running non-stop, so you can imagine the electric bill, and the task of keeping your system cool.
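
      For example, getting an untrained, randomly initialized model is only a few lines with transformers (the config numbers below are roughly TinyLlama-shaped and purely illustrative):

          # Sketch: instantiate an untrained ~1B parameter model from a config.
          from transformers import LlamaConfig, LlamaForCausalLM

          config = LlamaConfig(
              vocab_size=32000,
              hidden_size=2048,
              intermediate_size=5632,
              num_hidden_layers=22,
              num_attention_heads=32,
              num_key_value_heads=4,
          )
          model = LlamaForCausalLM(config)  # random weights, knows nothing yet
          print(sum(p.numel() for p in model.parameters()) / 1e9, "billion parameters")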

      TL;DR: you can ask an LLM to vibe-code you a new model from scratch, but for pre-training you’re going to be limited by the resources you have available. You can already download pre-trained open source models for self-hosting, though, and fine-tune them yourself if you want.

      • OwOarchist@pawb.social · 1 point · 6 days ago

        (I am kind of making the assumption that their perfect, all-powerful AI, once developed, would also be a bit more efficient than current models, allowing it to more easily run on consumer-grade hardware. Also, in the meantime, consumer-grade hardware is only getting better and more powerful.)

        You can ask an LLM to vibe-code you a new model from scratch, but pre-training it you’re gonna be limited by the resources you have available

        Why would you ask the uber-LLM to code you a new model that hasn’t been trained yet? Just ask it to give you one that already has all the training done and the weights figured out. Ask it to give you one that’s ready to go, right out of the box.

        • wonderingwanderer@sopuli.xyz · 1 point · 6 days ago

          once developed, would also be a bit more efficient than current models

          That’s not how it works, though. They’re not optimizing them for efficiency. The business model they’re following is “just a few billion more parameters this time, and it’ll gain sentience for sure.”

          Which is ridiculous. AGI, even if it’s possible (which is doubtful), isn’t going to emerge from some highly advanced LLM.

          in the meantime, consumer-grade hardware is only getting better and more powerful

          There’s currently a shortage of DDR5 RAM because these AI companies are buying up years’ worth of industrial output capacity…

          Some companies are shifting away from producing consumer-grade GPUs in order to meet demand coming from commercial data centers.

          It’s likely we’re at the peak of conventional computing, at least in terms of consumer hardware.

          Why would you ask the uber-LLM to code you a new model that hasn’t been trained yet? Just ask it to give you one that already has all the training done and the weights figured out. Ask it to give you one that’s ready to go, right out of the box.

          That’s not something they’re capable of. They have a context window, and none of them has one large enough to output billions of generated parameters. It can give you a Python script to generate a Gaussian weight distribution with a given number of parameters, layers, hidden sizes, and attention heads, but it can’t hand you a model that’s already pre-trained.

          Also, their NLP is designed to parse text, even code, but they already struggle with mathematics. There’s no way they could generate a viable weight distribution, even with a 12 billion token context window, because they’re not designed to predict that.

          You’d have to run a script to get an untrained model and then pre-train it yourself. Or you can download a pre-trained model and fine-tune it, or use it as is.
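
          To make the “pre-train it yourself” part concrete: the training loop is the easy bit; the cost is repeating something like this over trillions of tokens. A toy sketch, assuming you’ve built a small untrained model from a config as above (the tokenizer repo is just an example):

              # Sketch: one causal-LM pre-training step on a toy untrained model.
              import torch
              from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM

              tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # example tokenizer
              config = LlamaConfig(vocab_size=tokenizer.vocab_size, hidden_size=512,
                                   intermediate_size=1376, num_hidden_layers=8,
                                   num_attention_heads=8)
              model = LlamaForCausalLM(config)  # random weights, untrained
              optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

              batch = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
              outputs = model(**batch, labels=batch["input_ids"])  # labels are shifted internally
              outputs.loss.backward()
              optimizer.step()
              optimizer.zero_grad()
              print(f"loss after one step: {outputs.loss.item():.2f}")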