GPT

Introduced by Radford et al. in Improving Language Understanding by Generative Pre-Training

GPT is a Transformer-based architecture and training procedure for natural language processing tasks. Training follows a two-stage procedure. First, a language modeling objective is applied to unlabeled text to learn the initial parameters of a neural network model. These parameters are then adapted to a target task using the corresponding supervised objective.
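The two objectives can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the model's output scores (logits) are already available and only shows how each stage turns them into a loss. The function names `log_softmax`, `lm_loss`, and `supervised_loss` are hypothetical and chosen for illustration.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def lm_loss(logits_per_position, tokens):
    """Stage 1 (pre-training): mean next-token negative log-likelihood.

    Assumption: logits_per_position[i] holds the model's scores over the
    vocabulary for the token that follows tokens[i].
    """
    total = 0.0
    count = 0
    for i in range(len(tokens) - 1):
        log_probs = log_softmax(logits_per_position[i])
        total -= log_probs[tokens[i + 1]]
        count += 1
    return total / count


def supervised_loss(class_logits, label):
    """Stage 2 (fine-tuning): negative log-likelihood of the gold label."""
    return -log_softmax(class_logits)[label]


# Toy inputs: a 4-token vocabulary with uniform scores at every position,
# and a 2-class task with uniform class scores.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
stage1 = lm_loss(uniform_logits, tokens=[0, 1, 2])
stage2 = supervised_loss([0.0, 0.0], label=1)
```

With uniform scores, the stage 1 loss equals log(4) and the stage 2 loss equals log(2), which is what a model with no learned preferences would produce. In an actual setup, both losses would be computed from the same Transformer parameters, with the second stage starting from the values learned in the first.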

Source: Improving Language Understanding by Generative Pre-Training

Tasks

Task	Papers	Share
Language Modelling	88	11.24%
Large Language Model	51	6.51%
Question Answering	34	4.34%
Prompt Engineering	25	3.19%
Text Generation	23	2.94%
Retrieval	21	2.68%
Sentence	20	2.55%
Decision Making	20	2.55%
In-Context Learning	20	2.55%

Components

Component	Type
Adam	Stochastic Optimization
Attention Dropout	Regularization
BPE	Subword Segmentation
Dense Connections	Feedforward Networks
Discriminative Fine-Tuning	Fine-Tuning
Dropout	Regularization
GELU	Activation Functions
Layer Normalization	Normalization
Linear Warmup With Cosine Annealing	Learning Rate Schedules
Multi-Head Attention	Attention Modules
Residual Connection	Skip Connections
Scaled Dot-Product Attention	Attention Mechanisms
Softmax	Output Functions
Weight Decay	Regularization

Categories

Transformers

Autoregressive Transformers
