Training datasets are essential for developing new AI models and for experimenting with varied data. However, many commonly used public datasets contain labeling errors, which makes it hard to train robust models, particularly for novel tasks. Many researchers apply a variety of data quality control procedures to overcome these shortcomings. However, there is no centralized repository of examples showing how these strategies are used.
Meta AI researchers have recently released Mephisto, a new platform for collecting, sharing, and iterating on the most promising approaches to gathering training datasets for AI models. With Mephisto, researchers can exchange novel collection strategies in a reusable and iterable format. It also lets them swap out components and quickly obtain the exact annotations they need, lowering the barrier to creating custom tasks.
The team has identified many common pathways for taking a complex annotation task from concept to data collection in Mephisto. In addition to improving dataset quality, Mephisto also improves the experience of the researchers and annotators who create the datasets.