Dodecahedron December

I try things on the internet.

rarely, shit just works.

  • 0 Posts
  • 18 Comments
Joined 1 year ago
Cake day: July 5th, 2023






  • Ok, so you are actually new to the internet. I’ll explain, human to human, human.

    A domain name like reddit.com or katcr.co is a registration someone gets for a period of time, at least 1 year but often longer. One year, a user can purchase katcr.co and put up their personal website, because their name is Kat Crosby, and they are a company - katcr.co fits so they buy it and put up a site for a year or two. Life happens and they abandon the site. The domain becomes available again. Someone purchases katcr.co and runs a cookie business on it for a few years, then abandons the site. Someone else buys it later when it’s available and makes a bittorrent site out of it, runs it for a few years. The domain gets seized and they can no longer use it. The katcr.co domain becomes available again. No one buys it.

    Someone said they used to go to katcr.co years ago, someone else chimes in and says “that site doesn’t exist, you’re a liar”, and then someone with more understanding of the internet sends an archive.org link.

    Why archive.org? It’s by far the biggest site that does this thing.

    What is the thing it does? Over the years it has visited websites and saved snapshots of them, and it still does. Archiving them, if you will. You can go to web.archive.org, enter the domain name of any site, and it will take you to the same kind of link you’ve been given a few times. That link is a page that shows all the times archive.org has captured a snapshot of that address. It lets you view the page (usually just text, usually missing a lot of content like images and external files) as it was at that time.

    In this case, the existence of the link immediately disproves your argument.

    In other words, you’re entirely wrong. Both about katcr.co being fake because it’s currently not online, and also about me being a bot.
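
If you would rather check for yourself than trust a link from a stranger, here is a minimal Python sketch of the lookup described above. It assumes the Wayback Machine's public availability endpoint and its usual JSON shape; the function names are mine, and the sample response is hand-written for illustration (with a made-up timestamp), not a real capture. Nothing here touches the network: it only builds the URLs and reads a response you would fetch yourself.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint (returns JSON).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def calendar_url(domain):
    """Page listing every capture archive.org has for a domain."""
    return f"https://web.archive.org/web/*/{domain}"


def availability_query(domain, timestamp=None):
    """URL that asks for the snapshot closest to an optional timestamp.

    The timestamp uses the YYYYMMDDhhmmss form; a prefix such as "2019" also works.
    """
    params = {"url": domain}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text):
    """Return the archived URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hand-written sample response (illustrative only, not a real capture).
sample_response = json.dumps({
    "url": "example.com",
    "archived_snapshots": {
        "closest": {
            "status": "200",
            "available": True,
            "url": "http://web.archive.org/web/20200101000000/http://example.com/",
            "timestamp": "20200101000000",
        }
    },
})

print(calendar_url("katcr.co"))
print(availability_query("katcr.co", "2019"))
print(closest_snapshot(sample_response))
```

An empty `archived_snapshots` object in the response means no capture was found, in which case `closest_snapshot` returns `None`.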








  • Gang, I hate to tell you this, but this is what we mean when we say “you are the product”, especially with free offerings.

    But if you hate that, I have a worse thing to introduce you to: the internet. If you respond to this comment, or any comment on any Lemmy instance or other federated service or website or blog… your words can be consumed, copied and used to train whatever anyone wants. It is trivially easy to create web scrapers with just a bit of coding knowledge. These days it’s pretty easy to then use that data to train AI models. To a computer, it’s just data.

    Grammarly is a product where you give it bad grammar and it gives you good grammar. Grammarly, like many products, gets better over time when it can understand what went wrong so its teams can make it right. This can often include any text entered into the program. I don’t know the specifics but they should be outlined in the privacy policy. A company using data it already has to train AI makes sense, especially if it anonymizes that data. It may not be ethical given that users weren’t aware of AI at the time they accepted the privacy policy, but with American capitalism a company can change a privacy policy and you can opt out if you don’t like it.

    That’s why we all have lawyers on retainer to read and translate all privacy policies for all websites and applications we interact with on a daily basis. Right? That’s normal, right?

    I will say, could this support person have meant that an organization with 500+ employees gets a custom AI model trained on only the organization’s 500+ accounts? Because that would be better, and likely more ethical too.

    If that’s not the case and any content you have put into Grammarly is being used to train AI, then I guess it’s time to stop using Grammarly then, huh? But it’s also time to stop posting anything on the web, too. Oh, and don’t publish anything, ever.

    Or, you could go with the flow. This data is mixed with millions of other accounts… sort of like what happened when ChatGPT trained on anything you’ve already put out there. The only real concern I could see is if you discussed a very specific thing, or invented your own personal coded style of writing, and used it so much that, even among the millions of other users, it dominated the corpus and skewed the trained model. Say there are only 5 Grammarly users and you are number 5… you keep talking about “procorpia” being “mass sledge”, generating hundreds of entries with thousands of tokens (“words”). By contrast, let’s say the other 4 Grammarly users only used it a few times a month to send short emails. Now, after training, the 6th Grammarly user misspells a word as “procorpia” and Grammarly generates “procorpia is totes mass sledge brah”. Suddenly, your secret is out.

    If, on the other hand, you speak the same broken English as the rest of us, you are probably fine.
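
Both halves of that argument fit in one small Python sketch: pulling text out of a page, and watching one heavy user skew a tiny model. Everything in it is an assumption made for illustration. The pages are made-up strings, the class and function names are mine, and the "model" is a toy next-word counter, nothing like what Grammarly or anyone else actually runs. A real scraper would fetch the HTML over the network first (for example with `urllib.request.urlopen`); that step is left out so the example runs offline.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def scrape_text(html):
    """The 'scraper': reduce a page to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def train_bigrams(texts):
    """The 'training': count which word follows which across all texts."""
    following = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def complete(following, word, length=3):
    """Greedily extend `word` with the most common next word."""
    out = [word]
    for _ in range(length):
        options = following.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)


# Made-up data: four light users with one short message each,
# and one heavy user repeating a private phrase 200 times.
light_pages = ["<p>the meeting is on Monday</p>"] * 4
heavy_pages = ["<div><script>track()</script><p>procorpia is mass sledge</p></div>"] * 200

corpus = [scrape_text(page) for page in light_pages + heavy_pages]
model = train_bigrams(corpus)

print(complete(model, "procorpia"))
print(complete(model, "meeting"))
```

With these made-up numbers the heavy user's phrase wins even when someone else starts the sentence: after "meeting is", the most common next word in the corpus is the heavy user's "mass", not the light users' "on". Shrink the 200 to 2 and the light users win instead, which is the "you are probably fine" case.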