Oh, no, that wasn’t excusing Meta in general. I was just giving them a pass because, to my knowledge, they have a history of respecting robots.txt, which makes this piece of software better than outright malware. Starting it secretly and not giving site hosts a chance to make sure they had their privacy configured the way they liked first was a shady as hell move, no argument there.
I think of this as a problem with opt-in only systems. Think of how sites ask you to opt in to allow tracking cookies every goddamn time a page loads. A rule-based system like robots.txt, one that lets you opt in or opt out once, so you could opt out of cookie requests and tell all sites to fuck off, would be great. @aniki@lemm.ee is complaining about malicious crawlers that ignore those rules (assuming they’re right and that the rules are set up correctly), and lumping that malware in with software made by established corporations. However, Meta and other big tech companies haven’t historically been known to ignore configurations like robots.txt. They have had an issue with using the data they scrape in ways that are different from what they claimed they would, or scraping data from a site that does not allow scraping by coming at it via a URL on a page that they legitimately scraped, but that’s not the kind of shenanigans this article is about, as Meta is being pretty upfront about what they’re doing with the data. At least after they announced it existed.
An opt-in only solution would just lead to a world where all hosts were being constantly bombarded with requests to opt in. My major takeaway from how Meta handled this is that you should configure any site you own to disallow any action from bots you don’t recognize. As much as Reddit can fuck off, I don’t disagree with their move to change their configuration to:
User-agent: *
Disallow: /
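If you want to sanity-check what that blanket rule does, here is a rough sketch using Python's standard urllib.robotparser. The helper name is_allowed and the example.com URL are made up for illustration, and this only shows how a well-behaved crawler would read the file; it says nothing about what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-abiding crawler may fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeUnknownBot" is a placeholder for any bot you have never heard of.
    for agent in ("Meta-ExternalAgent", "SomeUnknownBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/post/1"))
```

Both agents come back as not allowed, which is the point of the wildcard: you don't have to know a bot exists to refuse it.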
I know what you’re trying to say, but that phrasing though. Being able to opt out is an important part of consent. No means no, man.
But Meta’s will, and Alta Vista’s. I’m not angry at them when a script kiddie makes a bad crawler.
I guess I don’t really see the problem with that though. There are configuration levers you could be pulling, but the sites you’re hosting aren’t pulling them. There are lots of shady questions about how these models are getting training data, but crawlers have a well-defined opt-out mechanism.
The web would not be what we know it as without them, because it’s how you find sites. Why shouldn’t Alta Vista have one? I don’t object to what Alta Vista does with the data.
Have you used a search engine? Crawlers are not generative AI.
Does that mean this new bot is ignoring sites’ robots.txt files? The Internet works because of web crawlers, and I’m not sure how this one is different.
Edited to add: Apparently one would need to add Meta-ExternalAgent to their robots.txt file unless they had a wildcard rule, so this isn’t as widely blocked, by virtue of being new. Letting it run for a few months before letting anyone know it exists is kinda shady.
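To make the wildcard point concrete, here is a small sketch, again with Python's standard urllib.robotparser. "SomeOlderBot" is a made-up name standing in for whatever bots a site had already listed, and blocks is a hypothetical helper. It compares a file that only names known bots against one that also names Meta-ExternalAgent.

```python
from urllib.robotparser import RobotFileParser

# A file that only blocks a bot the host already knew about (name is invented).
NAMED_ONLY = """\
User-agent: SomeOlderBot
Disallow: /
"""

# The same file with an explicit entry for the new agent.
WITH_META = NAMED_ONLY + """
User-agent: Meta-ExternalAgent
Disallow: /
"""


def blocks(robots_txt: str, user_agent: str) -> bool:
    """Return True if the rules tell user_agent to stay off the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "https://example.com/")


if __name__ == "__main__":
    print("named only:", blocks(NAMED_ONLY, "Meta-ExternalAgent"))
    print("with entry:", blocks(WITH_META, "Meta-ExternalAgent"))
```

Without a wildcard or an explicit entry, the new agent falls through as allowed, which is why a crawler nobody has heard of yet is not blocked by a list of named bots.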
It’s the copy-and-paste thing support posts or says every fucking time you try to talk to them. I guess the downvoters think I’m pro this weird act they do to make it feel like you have control of your data.
For a limited period of time necessary to resolve your customer support issue, you agree to allow Google customer support to access data about and associated with your Google product and account, which may include product information such as IMEI, Serial Number, country in which your product was purchased, account history and limited historical usage data.
The data accessed will be used to improve your customer service experience, to troubleshoot issues with this product, promotion history and for fraud prevention. Google will handle this data as described in Google’s https://policies.google.com/privacy?hl=en-US. Do we have your consent?
Please, be reasonable:
return post.Contains("Twitter") || post.Contains("elon") ? "loser" : "cool";
Either this is implying that Linux users are hiding who they are to fit in, and Windows users are comfortable in their own skin, or this is just transphobic as fuck.
If it’s the former, this may be the wrong sub for you. If it’s the latter, this may be the wrong world for you.
When grooming is life
I know this is whataboutism, but I really wish we cared half as much about Meta having already destabilized the last two presidential elections.
I was just stirring the pot, and I love this response.