Creating a torrent that includes all of humanity's knowledge/art/entertainment?

AnarchistsForDemocracy@lemmy.world · edited 1 year ago

AnarchistsForDemocracy@lemmy.world · 1 year ago

it’s a lot of work

So, per your suggestion, using for example the ZLibrary book/paper repo and OpenAI's training sets as a starting point, one could maybe get around the brunt of the work.

rufus@discuss.tchncs.de · edited 1 year ago

ZLibrary isn’t something that pays attention to licensing. It’s mainly copyrighted and pirated material.

I meant something like the Wikipedia dump, Project Gutenberg, and whatever archive.org has available that is tagged with a favorable license.
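To make the "tagged with some favorable licenses" part concrete, here is a rough Python sketch of that filtering step. It assumes you already have item metadata as a list of dicts; the `licenseurl` field name, the sample identifiers, and the list of licenses counted as "favorable" are all assumptions for illustration, not a definitive policy or any site's actual API.

```python
from urllib.parse import urlparse

# Assumption: which licenses count as "favorable" is a policy choice.
# These Creative Commons path prefixes are only an illustrative allowlist.
ALLOWED_PATH_PREFIXES = (
    "/publicdomain/zero/",  # CC0
    "/publicdomain/mark/",  # Public Domain Mark
    "/licenses/by/",        # CC BY
    "/licenses/by-sa/",     # CC BY-SA
)
CC_HOSTS = ("creativecommons.org", "www.creativecommons.org")


def is_favorably_licensed(license_url):
    """Return True if the license URL matches the allowlist above."""
    if not license_url:
        return False  # untagged items are excluded, not assumed free
    parsed = urlparse(license_url)
    if parsed.netloc.lower() not in CC_HOSTS:
        return False
    return parsed.path.lower().startswith(ALLOWED_PATH_PREFIXES)


def select_items(records):
    """Keep the identifiers of records whose license tag is on the allowlist."""
    return [
        r["identifier"]
        for r in records
        if is_favorably_licensed(r.get("licenseurl"))
    ]


# Hypothetical metadata records, made up for this example.
SAMPLE = [
    {"identifier": "example-public-domain-book",
     "licenseurl": "https://creativecommons.org/publicdomain/mark/1.0/"},
    {"identifier": "example-cc-by-sa-article",
     "licenseurl": "http://creativecommons.org/licenses/by-sa/4.0/"},
    {"identifier": "example-noncommercial-album",
     "licenseurl": "https://creativecommons.org/licenses/by-nc/4.0/"},
    {"identifier": "example-untagged-scan"},
]

if __name__ == "__main__":
    for identifier in select_items(SAMPLE):
        print(identifier)
```

Anything without a license tag gets dropped rather than assumed free, which is the conservative choice if the goal is a torrent that can be shared legally.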

I think there are datasets compiled from sources like those. I’m not an expert on this, but I mean something like RedPajama, just without the random web-scraping.

https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research