MAE Dataset

Multimodal Attribute Extraction

What is MAE?

The Multimodal Attribute Extraction (MAE) dataset is the first benchmark dataset for the task of multimodal attribute extraction. It is composed of mixed-media data for 2.2 million product items. For each item, the dataset provides a textual description, a set of product images, and an open-schema table of product attributes. For more information, read our paper:
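To make the per-item structure concrete, here is a minimal sketch of one MAE item as a Python record. The field names and values below are hypothetical illustrations of the three components described above (description, images, attribute table), not the dataset's actual file schema.

```python
# Hypothetical layout for a single MAE product item; the actual field
# names and identifiers in the released files may differ.
item = {
    "description": "Stainless steel water bottle with vacuum insulation.",
    "images": [                       # a set of product images (paths are illustrative)
        "images/item_0001_a.jpg",
        "images/item_0001_b.jpg",
    ],
    "attributes": {                   # open-schema table: keys vary across items
        "Material": "Stainless Steel",
        "Color": "Silver",
        "Capacity": "750 ml",
    },
}

# The extraction task: given the description and images, predict the
# value of a queried attribute (e.g. "Material" -> "Stainless Steel").
print(item["attributes"]["Material"])
```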

Download MAE (v.0.0)

Evaluation Instructions

Leaderboard submission instructions are currently under construction.


For support, please reach out to us at the MAE Dataset Google group.

Leaderboard - All Attributes

Last updated: 12/02/2017

Rank  Model              Accuracy
1     Most Common Value  33.99%

Leaderboard - Top 100 Attributes

Last updated: 11/12/2017

Rank  Model                          Accuracy
1     Multimodal Baseline - Concat   59.48%
2     Text Baseline                  58.41%
3     Multimodal Baseline - GMU      52.92%
4     Most Common Value              38.81%
5     Image Baseline                 38.07%
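The "Most Common Value" baseline in the leaderboards predicts, for each attribute, the single value that occurs most often for that attribute in the training data. A minimal sketch of that baseline and its accuracy computation, using a hypothetical list-of-(attribute, value)-pairs interface rather than the dataset's actual API:

```python
from collections import Counter, defaultdict

def most_common_value_accuracy(train, test):
    """Most-common-value baseline accuracy.

    `train` and `test` are lists of (attribute, value) pairs; this
    interface is illustrative, not the dataset's actual format.
    """
    # Count value frequencies per attribute on the training split.
    counts = defaultdict(Counter)
    for attr, value in train:
        counts[attr][value] += 1

    # For each attribute, always predict its most frequent training value.
    predictions = {a: c.most_common(1)[0][0] for a, c in counts.items()}

    # Accuracy: fraction of test pairs where the constant prediction matches.
    correct = sum(predictions.get(attr) == value for attr, value in test)
    return correct / len(test)

train = [("Color", "Red"), ("Color", "Red"), ("Color", "Blue")]
test = [("Color", "Red"), ("Color", "Blue")]
print(most_common_value_accuracy(train, test))  # predicts "Red" for every query -> 0.5
```

Despite using no per-item information at all, this baseline reaches 33.99% over all attributes, which is why it serves as the floor for the learned models above.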