Scripts to download and convert different datasets to Vowpal Wabbit format.
Usage: bash ./<repo>/get_<dataset-name>.sh
The repository contains scripts for the following datasets:
xml_repo (multilabel):
- Amazon-3M (amazon-3M)
- Amazon-670K (amazon)
- AmazonCat-13K (amazonCat)
- AmazonCat-14K (amazonCat-14K)
- Bibtex (bibtex)
- Delicious (delicious)
- Delicious-200K (deliciousLarge)
- EURLex-4K (eurlex)
- Mediamill (mediamill)
- RCV1-2K (rcv1x)
- Wiki10-31K (wiki10)
- WikiLSHTC-325K (wikiLSHTC)
- Wikipedia-500K (WikipediaLarge-500K)
- aloi.bin
- Dmoz
- imageNet
- LSHTC1
- sector
- Eur-Lex
- rcv1_regions
- bibtex
- LSHTCwiki
On macOS, install GNU sed with Homebrew before running the scripts:
brew install gnu-sed
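
As a rough sketch, the usage pattern above could be filled in for a single dataset as follows. It assumes that the name in parentheses next to each dataset (here `eurlex` for EURLex-4K) is what goes in place of `<dataset-name>`, and that `xml_repo` is the `<repo>` directory; neither is confirmed by the text above, so check the actual script names in the repository first.

```shell
#!/usr/bin/env bash
# Assumption: <repo> is "xml_repo" and <dataset-name> is the name in parentheses.
repo="xml_repo"
dataset="eurlex"

# Build the script path following the pattern ./<repo>/get_<dataset-name>.sh
script="./${repo}/get_${dataset}.sh"

# Run the script only if it exists, so a wrong guess fails with a clear message.
if [ -f "${script}" ]; then
    bash "${script}"
else
    echo "Script not found: ${script}" >&2
fi
```

Changing `dataset` to another name from the list should select a different download script, under the same assumption about naming.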