AdaGrad is a gradient descent optimization algorithm that adapts the learning rate individually for each parameter. It accumulates the sum of squared gradients for each parameter and divides the global learning rate by the square root of this accumulator, so each parameter effectively has its own adaptive learning rate. Parameters that have received small or infrequent gradients therefore take larger steps, while parameters with large or frequent gradients take smaller steps, which makes the method well suited to sparse features that would otherwise be updated too rarely.
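The per-parameter update described above can be sketched as follows. This is a minimal NumPy illustration, not a production implementation; the toy quadratic objective, the function name `adagrad_update`, and the hyperparameter values are assumptions chosen for the example.

```python
import numpy as np

def adagrad_update(params, grads, accum, lr=0.01, eps=1e-8):
    """One AdaGrad step: accumulate squared gradients, then scale the update."""
    # Per-parameter running sum of squared gradients
    accum = accum + grads ** 2
    # Effective learning rate shrinks for parameters with a large gradient history;
    # eps guards against division by zero early on
    params = params - lr * grads / (np.sqrt(accum) + eps)
    return params, accum

# Toy example: minimize f(x) = x0^2 + 10*x1^2 (hypothetical objective)
params = np.array([5.0, 5.0])
accum = np.zeros_like(params)
for _ in range(500):
    grads = np.array([2.0 * params[0], 20.0 * params[1]])
    params, accum = adagrad_update(params, grads, accum, lr=0.5)
```

Note that because the accumulator only grows, the effective learning rate decays monotonically over training, which is the main motivation for later variants such as RMSProp and Adam.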