A team of researchers from OpenAI recently published a paper describing GPT-3, a deep-learning model for natural-language with 175 billion parameters, 100x more than the previous version, GPT-2. The model is pre-trained on nearly half a trillion words and achieves state-of-the-art performance on several NLP benchmarks without fine-tuning.
In paper published on arXiv, a team of over 30 co-authors described the model and several experiments. The researchers’ goal was to produce an NLP system that performs well on a variety of tasks with little or no fine-tuning, and previous work had indicated that larger models might be the solution. To test that hypothesis, the team increased the size of their previous model, GPT-2, from 1.5 billion parameters to 175 billion. For training, the team collected several datasets, including the Common Crawl dataset and the English-language Wikipedia. The model was evaluated against several NLP benchmarks, matching state-of-the-art performance on “closed-book” question-answering tasks and setting a new record for the LAMBADA language modeling task.
OpenAI made headlines last year with GPT-2 and their decision not to release the 1.5 billion parameter version of the trained model due to “concerns about malicious applications of the technology.” GPT-2 is one of many large-scale NLP models based on the Transformer architecture. These models are pre-trained on large text corpora, such as the contents Wikipedia, using self-supervised learning. In this scenario, instead of using a dataset containing inputs paired with expected outputs, the model is given a sequence of text with words “masked” and it must learn to predict the masked words based on the surrounding context. After this pre-training, the models are then fine-tuned with a labelled benchmark dataset for a particular NLP task, such as question-answering.