Lexpanded-PPDB: Lexically-Expanded Paraphrase Database

Introduction

Lexpanded-PPDB is a lexically expanded ("lexpanded") version of an existing paraphrase database, PPDB. The dataset is intended to improve natural language processing systems by making them more robust to variation in how the same meaning is expressed.

Features

News

Download

Language  Package  # of pairs in PPDB  # of pairs in Lexpanded-PPDB (file size)
English   S        670 thousand        132 million (728MB)
English   M        1.32 million        230 million (1.2GB)
English   L        3.12 million        418 million (2.0GB)
English   XL       7.19 million        692 million (3.1GB)
French    S        6.21 million        230 million (1.1GB)
French    M        12.5 million        385 million (1.6GB)
French    L        25.2 million        612 million (2.4GB)
French    XL       50.6 million        942 million (3.6GB)
Spanish   S        5.22 million        138 million (610MB)
Spanish   M        10.2 million        241 million (982MB)
Spanish   L        19.5 million        396 million (1.5GB)
Spanish   XL       37.1 million        628 million (2.3GB)
German    S        461 thousand        154 million (763MB)
German    M        883 thousand        239 million (1.1GB)
German    L        1.68 million        359 million (1.6GB)
German    XL       3.47 million        532 million (2.2GB)
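
To give a sense of how much each package grows under lexical expansion, the pair counts above can be turned into approximate expansion factors. The following is a minimal Python sketch, not part of the dataset's tooling: the names PAIR_COUNTS and expansion_factor are hypothetical, and because the counts in the table are rounded, the resulting factors are only approximate.

```python
# Pair counts copied from the download table above:
# (language, package) -> (pairs in PPDB, pairs in Lexpanded-PPDB).
# The table reports rounded figures, so the factors below are approximate.
PAIR_COUNTS = {
    ("English", "S"): (670_000, 132_000_000),
    ("English", "M"): (1_320_000, 230_000_000),
    ("English", "L"): (3_120_000, 418_000_000),
    ("English", "XL"): (7_190_000, 692_000_000),
    ("French", "S"): (6_210_000, 230_000_000),
    ("French", "M"): (12_500_000, 385_000_000),
    ("French", "L"): (25_200_000, 612_000_000),
    ("French", "XL"): (50_600_000, 942_000_000),
    ("Spanish", "S"): (5_220_000, 138_000_000),
    ("Spanish", "M"): (10_200_000, 241_000_000),
    ("Spanish", "L"): (19_500_000, 396_000_000),
    ("Spanish", "XL"): (37_100_000, 628_000_000),
    ("German", "S"): (461_000, 154_000_000),
    ("German", "M"): (883_000, 239_000_000),
    ("German", "L"): (1_680_000, 359_000_000),
    ("German", "XL"): (3_470_000, 532_000_000),
}


def expansion_factor(language: str, package: str) -> float:
    """Return Lexpanded-PPDB pairs divided by PPDB pairs for one package."""
    ppdb_pairs, lexpanded_pairs = PAIR_COUNTS[(language, package)]
    return lexpanded_pairs / ppdb_pairs


if __name__ == "__main__":
    # Print one line per package, in table order.
    for language, package in PAIR_COUNTS:
        factor = expansion_factor(language, package)
        print(f"{language:<8} {package:<3} {factor:8.1f}x")
```

Within each language, the factor shrinks as the package grows: the smaller packages start from far fewer PPDB pairs, so expansion multiplies them by a larger amount.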

Todo

References

Precautions

License


Use and/or redistribution of Lexpanded-PPDB is permitted under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Acknowledgments

This work was partly supported by the following funding sources.

The dataset was developed as part of work at the Advanced Translation Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, National Institute of Information and Communications Technology.