The Multimodal Attribute Extraction (MAE) dataset is the first benchmark dataset for the task of multimodal attribute extraction. It is composed of mixed-media data for 2.2 million product items. For each item there is a textual description, a set of product images, and an open-schema table of product attributes. For more information, see our paper:
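As a quick illustration of what a single item looks like, the minimal sketch below iterates over a JSON-lines dump of the dataset. The file name (`mae_items.jsonl`) and the field names (`description`, `images`, `attributes`) are assumptions for illustration, not the dataset's documented schema.

```python
import json

# Minimal sketch: walk the items and print each item's attribute table.
# Field names and the file name are illustrative assumptions.
with open("mae_items.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)
        description = item["description"]   # free-text product description
        image_paths = item["images"]        # list of product image files
        attributes = item["attributes"]     # open-schema {attribute: value} table
        for attr, value in attributes.items():
            print(f"{attr}: {value}")
        break  # only inspect the first item
```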
Leaderboard submission instructions are currently under construction.
For support, please reach out to us at the MAE Dataset Google group.
Last updated: 12/02/2017
Rank | Model | Accuracy |
---|---|---|
1 | Most Common Value | 33.99% |
Last updated: 11/12/2017
Rank | Model | Accuracy |
---|---|---|
1 | Multimodal Baseline - Concat | 59.48% |
2 | Text Baseline | 58.41% |
3 | Multimodal Baseline - GMU | 52.92% |
4 | Most Common Value | 38.81% |
5 | Image Baseline | 38.07% |
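For reference, the Most Common Value baseline simply predicts, for every attribute, the value seen most often for that attribute in training, and the reported numbers are exact-match accuracy. The sketch below shows one way to compute this; the `(attribute, value)` pair framing is an assumption for illustration, not the dataset's official evaluation code.

```python
from collections import Counter, defaultdict

def most_common_value_accuracy(train_pairs, test_pairs):
    """Exact-match accuracy of the Most Common Value baseline.

    train_pairs, test_pairs: lists of (attribute, value) tuples.
    This framing is an illustrative assumption, not the official scorer.
    """
    counts = defaultdict(Counter)
    for attr, value in train_pairs:
        counts[attr][value] += 1
    # For each attribute, always predict its most frequent training value.
    predictions = {attr: c.most_common(1)[0][0] for attr, c in counts.items()}
    correct = sum(1 for attr, value in test_pairs
                  if predictions.get(attr) == value)
    return correct / len(test_pairs) if test_pairs else 0.0
```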