Resources
We have created several resources, including software and datasets, for computational linguistics and natural language processing. Many of them are no longer maintained; those listed below are still available as of March 2024.
Software
- TexTra-MTQE
[Paper]
- Raphael Rubino (Atsushi Fujita managed the publication process)
- Software for machine translation quality estimation (MTQE)
- MIT License
- QENN
[Paper]
- Lemao Liu and Atsushi Fujita
- A word-level translation quality estimation system based on feedforward neural networks
- GNU Lesser General Public License, Version 3 (LGPL-v3)
Datasets
- MTPEdocs-MQM
- Atsushi Fujita
- Span-based issue annotations from MQM-like manual quality assessment of MT outputs for local government documents
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)
- MultiEnJa
- The Kaken TNTC Project (Atsushi Fujita is a co-investigator)
- Sample English documents in various content domains often handled by translation service providers, their Japanese translations, and markups for machine translation quality estimation
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)
- Staged PE Dataset
[Paper]
- Atsushi Fujita
- Translated documents produced through two-stage post-editing of MT outputs, with annotations of the revisions made in the second stage
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)
- Annotations for the Meta-Evaluation of Machine Translation Research
[Paper]
- Benjamin Marie, Atsushi Fujita, and Raphael Rubino
- A large-scale meta-evaluation of automatic evaluation methodology in *ACL publications on MT
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)
- JaRuNC
[Paper]
- Aizhan Imankulova, Atsushi Fujita, and Kenji Imamura
- Japanese-Russian-English News Commentary Parallel Data
- We claim no rights, following the terms of the original data (NewsCommentary11)
- NICT QE/APE Datasets
[Paper]
- Atsushi Fujita and Eiichiro Sumita
- A multilingual parallel corpus consisting of utterances in Japanese and their MT outputs in several languages, manually associated with their gradings and post-edits
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)
- Lexpanded PPDB
[Paper]
[Paper (in Japanese)]
- Atsushi Fujita
- Lexically-expanded paraphrase database
- Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0)
Documents
- TML-Translation-Difference
[Paper for v1]
[Paper for v2]
- Tomono Honda and Atsushi Fujita
- A metalanguage of translation difference, consisting of three typologies in the form of decision lists, instructional materials, and a list of examples
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)
- TML-Strategy
[Paper for v1]
[Paper for v2]
- Mayuka Yamamoto, Atsushi Fujita, and Kyo Kageura
- A metalanguage of translation strategy, consisting of three typologies in the form of decision lists
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)
- TML-IssueType
[Paper for v1]
[Paper for v2]
- Chiho Toyoshima, Kikuko Tanabe, and Atsushi Fujita
- A metalanguage of translation issues, consisting of an issue typology and a decision tree
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)
- Guidelines for creating detailed outlines (in Japanese)
[Paper]
- Atsushi Fujita
- Instructional documents for creating detailed outlines before writing documents
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)