Oh, they’re 100% pirated. Sorry the article is paywalled, but the preview should give you enough information. The database is available elsewhere, IIRC. https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/
The database is here. You’ll have to sign up for a free trial if you’re not already a subscriber to The Atlantic. https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/