This dataset comes from a large-scale web crawl of the Alexa 10K performed in 2019. For each website, the JavaScript code is analyzed with taint tracking, a dynamic analysis technique, to determine whether DOM XSS (Document Object Model Cross-Site Scripting) vulnerabilities are present.
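To make the idea of taint tracking concrete, the following is a minimal toy sketch in Python, not the instrumentation used for the crawl. It assumes a simplified model in which a value read from an attacker-controllable source (here labeled `location.hash`) carries a taint marker, the marker survives string concatenation, and a sink check reports whether a tainted value arrived. The names `TaintedStr` and `reaches_sink` are hypothetical and exist only for this illustration.

```python
class TaintedStr(str):
    """Toy string type that remembers which (hypothetical) source it came from."""

    def __new__(cls, value, source):
        obj = super().__new__(cls, value)
        obj.source = source  # e.g. "location.hash"
        return obj

    def __add__(self, other):
        # tainted + anything stays tainted
        return TaintedStr(str.__add__(self, other), self.source)

    def __radd__(self, other):
        # plain string + tainted stays tainted
        return TaintedStr(str.__add__(other, self), self.source)


def reaches_sink(value):
    """Return the taint source if a tainted value reaches the sink, else None."""
    return value.source if isinstance(value, TaintedStr) else None


# A value derived from the URL fragment flows into markup (a DOM XSS-style flow).
fragment = TaintedStr("<img src=x onerror=alert(1)>", source="location.hash")
markup = "<div>" + fragment + "</div>"

# Markup built only from constants carries no taint.
static_markup = "<div>" + "hello" + "</div>"

print(reaches_sink(markup))
print(reaches_sink(static_markup))
```

A real taint-tracking engine works inside the JavaScript runtime and covers many more operations than concatenation, but the principle is the same: mark data at sources, propagate the mark through operations, and check it at sinks.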
From the taint tracking results, we generate two datasets: a dataset of "unconfirmed" functions, labeled with the result of the taint analysis, and a dataset of "confirmed" functions, labeled with the result of a proof-of-concept DOM XSS exploit.
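The relationship between the two label sets can be sketched as follows. This is an illustrative assumption about the layout, not the dataset's actual schema: the record type `FunctionRecord`, its field names, the binary 0/1 labels, and the rule that only functions with an attempted exploit enter the "confirmed" set are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FunctionRecord:
    """Hypothetical per-function record; field names are assumptions."""
    source_code: str
    taint_flow_found: bool                     # result of the taint analysis
    exploit_succeeded: Optional[bool] = None   # None: no exploit was attempted


def build_datasets(
    records: List[FunctionRecord],
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Split records into (unconfirmed, confirmed) lists of (code, label) pairs."""
    # "Unconfirmed": every function, labeled by the taint analysis result.
    unconfirmed = [(r.source_code, int(r.taint_flow_found)) for r in records]
    # "Confirmed": only functions with an exploit attempt, labeled by its outcome.
    confirmed = [
        (r.source_code, int(r.exploit_succeeded))
        for r in records
        if r.exploit_succeeded is not None
    ]
    return unconfirmed, confirmed


records = [
    FunctionRecord("function a(){...}", taint_flow_found=True, exploit_succeeded=True),
    FunctionRecord("function b(){...}", taint_flow_found=True, exploit_succeeded=False),
    FunctionRecord("function c(){...}", taint_flow_found=False),
]
unconfirmed, confirmed = build_datasets(records)
```

Under these assumptions, a function such as `b` would be positive in the "unconfirmed" set (a taint flow was reported) but negative in the "confirmed" set (the exploit did not succeed), which is why the two label sets can disagree.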
Both datasets were used to train a variety of machine learning models, and the resulting models are also included for reference.