Common Crawl
Crawl the Web, Absorb the Bias: NLP Models Absorb Biases from Web Training Data
The emerging generation of trillion-parameter models needs datasets of billions of examples, but the most readily available source on that scale, the web, is polluted with bias and antisocial language. A new study examines the issue.