Google has accused “commercially motivated” actors of trying to clone its Gemini AI, despite having itself indiscriminately scraped the web to build its own models.

  • WatDabney@sopuli.xyz · 185 points · 6 days ago

    Google has become a colonialist project.

    First they gained access to the communal property of the internet. Then they stole it from the original inhabitants. And now they’re trying to claim a legal right to exclusive control over the property they stole.

      • Big Baby Thor@sopuli.xyz · 6 points · 6 days ago

        AI DRM. It’s coming. All outputs will only be available in browsers that support it. Also, future clipboards will be tied in.

        Ctrl+V

        “I’m sorry, but you don’t have permission to output that into this application. To apply for a licence click here.”

        • tyler@programming.dev · 1 point · 5 days ago

          You will never stop computers from being able to copy what is shown on the screen. Right now you can go in and just disable copy-paste blocking in your browser if you really want to. It’s just JavaScript.

  • Bustedknuckles@lemmy.world · 50 points · 6 days ago

    “My output is valuable, proprietary, and demands remuneration; my inputs are fair use and of negligible value.”

  • hendrik@palaver.p3x.de · 36 points · 6 days ago

    It’s mental. The terms and conditions of some AI music generators make people pay for a “license” to use the output, for example for commercial purposes. They themselves, of course, claim “fair use” and steal all the music out there to train their models. I think some companies don’t claim ownership any more for images and video snippets. And of course AI output isn’t copyrightable in the first place.

    The companies will occasionally use their trademarks, intellectual property or copyright against people. Of course those rules don’t apply the other way around. It’s completely fine that their product draws all the Disney princesses, comic and anime characters, and reproduces half of Harry Potter. But beware if someone puts “Claude” in a product name. And Google follows the same logic here.

    And then my homepage gets hammered with their stupid AI crawlers, but I have to abide by the terms and conditions of their services…

    • queermunist she/her@lemmy.ml · 7 points · 6 days ago

      Supposedly, copyright needs to be defended or it is lost. It would never happen, but it’d be interesting if the companies that let data scrapers and chatbots violate their IP actually destroyed their own claim to copyright protection.

      • Logi@lemmy.world · 3 points · 6 days ago

        Supposedly, copyright needs to be defended or it is lost.

        No, that only applies to trademarks, not patents or copyright.

      • hendrik@palaver.p3x.de · 3 points · 6 days ago

        Yes. I don’t think it’s settled yet. There are still many trials going on. The industry keeps trying to push the limits, including really weird stuff like Elon Musk probing whether it’s okay to allow deep-fakes of random existing women and minors. I think lawmakers are having a difficult time keeping up with the pace. AI companies drown them with their near-unlimited resources. We need to come up with new regulation, fight all the court battles, overhaul copyright and discuss things in society… And then there are pre-existing influential structures, like Disney and the copyright industry… Sometimes they’re on opposing sides, sometimes they dabble in AI as well… I mean, it’s complicated. And a long process. And it’s difficult to defend things. I mean, I also defend my server. But it’s more an open war than anything with rules and terms.

        • queermunist she/her@lemmy.ml · 5 points · 6 days ago

          I think lawmakers are having an easy time accepting bribes from AI companies, actually. The pace is only a problem because they are being paid to slow down.

          The courts are more interesting, because they actually have to make decisions instead of just deliberating forever.

          • hendrik@palaver.p3x.de · 4 points · 6 days ago

            Depends a bit on the country. In the United States, for sure. That’s just open corruption: you scratch my back and I’ll scratch yours. The government funnels $500bn of taxpayer money into some Project Stargate, and God knows how much into really dark stuff with Palantir. Musk even “worked” for the government for a while… And on top of the corruption money, these people are buddies. And they’re all working towards the same goal. Some idea of an apocalypse.

            In China, I don’t think they need to bribe the government. It was the CCP who came up with the idea in the first place. And the AI race between China and the USA is yet another thing.

            For Europe, I’m not so sure. There’s a bit more nuance here. I mean, Ursula von der Leyen is an AI shill as well; she frequently likes to talk about it. I don’t think there’s as much open bribery, though. And I still hope they’re aware of the situation with US companies, how our goals diverge, and that partnering with Palantir or X is likely going to land us in a lot of pain… And the EU loves to regulate. And our own AI companies aren’t as big. So there’s that as well.

            • queermunist she/her@lemmy.ml · 3 points · 6 days ago

              China is an interesting inversion of the US. In the US, the government is invested in the AI race because they’ve been bribed and because the money line go up. In China, the government was invested in the AI race before the bubble started to inflate and is instead pushing its own companies to invest in AI. Basically: in the US markets are in command, in China politics are in command.

              It’ll be really interesting to see how the two countries respond to the bubble bursting.

              As for Europe, there’s been some murmurings about tech sovereignty which are really exciting to me. They need to get out of US tech, whether that means they put a lot more focus on building European AI firms or they just get out of AI entirely.

    • Grimy@lemmy.world · 3 points, 9 downvotes · 6 days ago

      Being pro-copyright means handing the keys to the record companies, though. They would be the only ones with a “legal” model. Udio got bought by Universal not too long ago, but as long as laws aren’t rewritten for the benefit of mega-corps and copyright juggernauts, open source will ruin all the shenanigans they’re trying to pull.

      It’s the same for all the text models. Open source is destroying OpenAI’s business model. They need laws that restrict what you can train on so they can buy themselves a monopoly.

  • sp3ctr4l@lemmy.dbzer0.com · 30 points · 6 days ago

    Wow, they’re seriously saying this with a straight face, huh?

    Oh hi Google, my name is Epic Games, and I see you recently trained a new ‘AI’ of yours on Fortnite.

    Let me introduce you to my friend Fromsoft, who is pretty sure you uh, copied their notes from Dark Souls as well.

    … How are all these people this fucking stupid?

    There’s no possible resolution to the paradigm of ‘I can steal everything but you can’t steal anything’ other than total chaos.

    Total chaos ain’t a good standard for a legal system trying to figure out IP law.

    This is completely ludicrous.

  • OwOarchist@pawb.social · 12 points · 6 days ago

    You see, big tech AI bros? This is why you’re dumb. Even if this all pans out and all your AI dystopia dreams come true, it doesn’t mean you’re going to be rich and powerful and at the top.

    If your AI becomes as good as it’s supposedly going to get … I can just ask it to develop a new AI for me. And then I don’t have to use yours anymore. Why would anybody pay you to use your AI when it becomes trivial to make a new one, tailored to their specific needs? Why would I need your big tech company for anything, if anything you can provide could be readily replaced by just asking an AI for it? If AI becomes good enough to replace everyone’s job, it will replace big tech as well.

    The only people who might be benefiting from all this are the ones who manufacture and sell the hardware that runs it. If AI becomes good enough, all software companies will go bankrupt. Yes including Google, Microslop, etc.

    • wonderingwanderer@sopuli.xyz · 7 points · 6 days ago

      You can already self-host an open source LLM, and fine-tune it on custom datasets. Hugging Face has thousands to choose from.
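
      As a rough sketch of how little code the self-hosting part takes with the transformers library (the model name below is just an example, and it assumes you have enough RAM/VRAM plus the accelerate package installed for device_map):

          # Minimal sketch: load an open-weights model and generate text locally.
          from transformers import AutoModelForCausalLM, AutoTokenizer

          model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # example repo; pick one that fits your hardware
          tokenizer = AutoTokenizer.from_pretrained(model_name)
          model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

          prompt = "Explain retrieval-augmented generation in one sentence."
          inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
          outputs = model.generate(**inputs, max_new_tokens=100)
          print(tokenizer.decode(outputs[0], skip_special_tokens=True))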

      The largest you’ll realistically fit on consumer hardware is probably 32 billion parameters or so, and that’s with quantization. Basically, at 8-bit quantization you need 1GB of RAM for every billion parameters. So a 32 billion parameter model at 8-bit would need 32GB of RAM, plus overhead. At 16-bit it would need 64GB, and so on; a 24 billion parameter model at 16-bit would take up 48GB, etc.
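
      In code terms that rule of thumb is just parameters times bytes per parameter, ignoring overhead like the KV cache and activations (a tiny illustrative calculation, not an exact measurement):

          # Back-of-the-envelope RAM estimate: parameters x bytes per parameter.
          def model_ram_gb(params_billions: float, bits: int) -> float:
              bytes_per_param = bits / 8
              return params_billions * bytes_per_param  # 1e9 params at 1 byte each ~ 1 GB

          print(model_ram_gb(32, 8))   # 32.0 GB for a 32B model at 8-bit
          print(model_ram_gb(32, 16))  # 64.0 GB for the same model at 16-bit
          print(model_ram_gb(24, 16))  # 48.0 GB for a 24B model at 16-bit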

      The commercial LLMs that people pay subscriptions to use an API for tend to have like 130-200 billion parameters with no quantization (32-bit). So it wouldn’t run on consumer hardware. But you honestly don’t need one that big, and I think they actually suffer in quality by trying to overgeneralize.

      For most people’s purposes, a 14 billion parameter model at 16-bit precision is probably fine. You just need 28GB of free RAM. Otherwise, with 14GB of RAM you can do 14B params at 8-bit, or 7B at 16-bit. You might lose some accuracy, but with specialized fine-tuning and especially retrieval-augmented generation, it won’t be severe.

      Anything smaller than 7B might be pushing it, and likewise anything at 4-bit quantization would lose accuracy. 7B at 8-bit would also probably suffer on benchmarks. So realistically you’ll probably need at least 16GB of RAM accounting for overhead. More if you want to run any concurrent processes.
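
      If you want to try the 8-bit route, transformers can quantize at load time via bitsandbytes. This sketch assumes an NVIDIA GPU with the bitsandbytes package installed, and the model name is again just an example:

          # Sketch: load a model in 8-bit to roughly halve memory vs. 16-bit.
          from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

          model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # example repo
          quant_config = BitsAndBytesConfig(load_in_8bit=True)

          tokenizer = AutoTokenizer.from_pretrained(model_name)
          model = AutoModelForCausalLM.from_pretrained(
              model_name,
              quantization_config=quant_config,
              device_map="auto",
          )
          print(f"Footprint: {model.get_memory_footprint() / 1e9:.1f} GB")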

      The thing about making one from scratch, though, is that it’s resource-intensive. You can generate a 1 billion parameter model with blank or randomized weights; the algorithm isn’t a secret. But pre-training it could take weeks or months depending on your hardware, maybe days if you have a high-end GPU. And that’s with it running non-stop, so you can imagine the electric bill, and the task of keeping your system cool.
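
      For example, getting an untrained, randomly initialized model is only a few lines with transformers (the config numbers below are roughly TinyLlama-shaped and purely illustrative):

          # Sketch: instantiate an untrained ~1B parameter model from a config.
          from transformers import LlamaConfig, LlamaForCausalLM

          config = LlamaConfig(
              vocab_size=32000,
              hidden_size=2048,
              intermediate_size=5632,
              num_hidden_layers=22,
              num_attention_heads=32,
              num_key_value_heads=4,
          )
          model = LlamaForCausalLM(config)  # random weights, knows nothing yet
          print(sum(p.numel() for p in model.parameters()) / 1e9, "billion parameters")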

      TL;DR: you can ask an LLM to vibe-code you a new model from scratch, but for pre-training you’re going to be limited by the resources you have available. You can already download pre-trained open source models for self-hosting, though, and fine-tune them yourself if you want.

      • OwOarchist@pawb.social · 1 point · 6 days ago

        (I am kind of making the assumption that their perfect, all-powerful AI, once developed, would also be a bit more efficient than current models, allowing it to more easily run on consumer-grade hardware. Also, in the meantime, consumer-grade hardware is only getting better and more powerful.)

        You can ask an LLM to vibe-code you a new model from scratch, but pre-training it you’re gonna be limited by the resources you have available

        Why would you ask the uber-LLM to code you a new model that hasn’t been trained yet? Just ask it to give you one that already has all the training done and the weights figured out. Ask it to give you one that’s ready to go, right out of the box.

        • wonderingwanderer@sopuli.xyz · 1 point · 6 days ago

          once developed, would also be a bit more efficient than current models

          That’s not how it works, though. They’re not optimizing them for efficiency. The business model they’re following is “just a few billion more parameters this time, and it’ll gain sentience for sure.”

          Which is ridiculous. AGI, even if it’s possible (which is doubtful), isn’t going to emerge from some highly advanced LLM.

          in the meantime, consumer-grade hardware is only getting better and more powerful

          There’s currently a shortage of DDR5 RAM because these AI companies are buying up years’ worth of industrial output capacity…

          Some companies are shifting away from producing consumer-grade GPUs in order to meet demand coming from commercial data centers.

          It’s likely we’re at the peak of conventional computing, at least in terms of consumer hardware.

          Why would you ask the uber-LLM to code you a new model that hasn’t been trained yet? Just ask it to give you one that already has all the training done and the weights figured out. Ask it to give you one that’s ready to go, right out of the box.

          That’s not something they’re capable of. They have a context window, and none of them has one large enough to output billions of generated parameters. It can give you a Python script to generate a Gaussian weight distribution with a given number of parameters, layers, hidden sizes, and attention heads, but it can’t hand you a model that’s already pre-trained.

          Also, their NLP is designed to parse text, even code, but they already struggle with mathematics. There’s no way they could generate a viable weight distribution, even with a 12 billion token context window, because they’re not designed to predict that.

          You’d have to run a script to get an untrained model and then pre-train it yourself. Or you can download a pre-trained model and fine-tune it, or use it as is.
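
          To make the “pre-train it yourself” part concrete: the training loop is the easy bit; the cost is repeating something like this over trillions of tokens. A toy sketch, assuming you’ve built a small untrained model from a config as above (the tokenizer repo is just an example):

              # Sketch: one causal-LM pre-training step on a toy untrained model.
              import torch
              from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM

              tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # example tokenizer
              config = LlamaConfig(vocab_size=tokenizer.vocab_size, hidden_size=512,
                                   intermediate_size=1376, num_hidden_layers=8,
                                   num_attention_heads=8)
              model = LlamaForCausalLM(config)  # random weights, untrained
              optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

              batch = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
              outputs = model(**batch, labels=batch["input_ids"])  # labels are shifted internally
              outputs.loss.backward()
              optimizer.step()
              optimizer.zero_grad()
              print(f"loss after one step: {outputs.loss.item():.2f}")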