Quadratic discriminant analysis (QDA) and linear discriminant analysis (LDA) have a lot in common. Both assume that the observations from each class are drawn from a Gaussian (or multivariate Gaussian) distribution, and both plug estimates of the parameters into Bayes’ theorem in order to perform prediction. However, unlike LDA, which assumes a class-specific mean vector and a covariance matrix common to all K classes, QDA relaxes the common-covariance restriction. That is, it assumes that each class has its own covariance matrix (i.e. an observation from the kth class is of the form $X \sim N(\mu_k, \Sigma_k)$, where $\Sigma_k$ is the covariance matrix for the kth class). Under this assumption, the Bayes classifier assigns an observation $X = x$ to the class for which

$\displaystyle \delta_k(x) = -\frac{1}{2}(x-\mu_k)^\intercal \Sigma_k^{-1} (x-\mu_k) - \frac{1}{2}\log|\Sigma_k| + \log\pi_k \\=-\frac{1}{2}x^\intercal \Sigma_k^{-1}x + x^\intercal \Sigma_k^{-1}\mu_k - \frac{1}{2}\mu_k^\intercal \Sigma_k^{-1}\mu_k - \frac{1}{2}\log|\Sigma_k| + \log\pi_k$

is largest. In practice, the QDA classifier plugs estimates of $\Sigma_k$, $\mu_k$ and $\pi_k$ into the equation above and assigns an observation $X = x$ to the class for which this quantity is largest. Note that the equation above is a quadratic function of $x$, which is where QDA gets its name.
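
The plug-in rule can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not a reference implementation: the function names (`fit_qda`, `discriminant`, `predict_qda`) and the synthetic two-class data are made up for this example, and at least two predictors are assumed so that `np.cov` returns a matrix. The discriminant includes the $-\frac{1}{2}\log|\Sigma_k|$ term from the Gaussian log-density, which varies between classes once each class has its own covariance matrix.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate a mean vector, covariance matrix and prior for each class."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = {
            "mu": Xk.mean(axis=0),
            "sigma": np.cov(Xk, rowvar=False),  # class-specific covariance
            "prior": len(Xk) / len(X),          # class proportion as pi_k
        }
    return params

def discriminant(x, mu, sigma, prior):
    """-1/2 (x-mu)^T Sigma^{-1} (x-mu) - 1/2 log|Sigma| + log(prior)."""
    diff = x - mu
    maha = diff @ np.linalg.solve(sigma, diff)  # avoids an explicit inverse
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * maha - 0.5 * logdet + np.log(prior)

def predict_qda(params, x):
    """Return the class label with the largest discriminant value."""
    return max(params, key=lambda k: discriminant(x, **params[k]))

# Hypothetical data: two Gaussian classes with different covariance matrices.
rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=200)
X1 = rng.multivariate_normal([3.0, 3.0], [[4.0, 0.0], [0.0, 0.25]], size=200)
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 200)

params = fit_qda(X, y)
print(predict_qda(params, np.array([0.0, 0.0])))
print(predict_qda(params, np.array([3.0, 3.0])))
```

Solving the linear system with `np.linalg.solve` and taking the log-determinant with `np.linalg.slogdet` is a design choice for numerical stability; inverting $\Sigma_k$ directly would give the same values on well-conditioned data.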

Overall, both LDA and QDA approximate the Bayes classifier, assuming a linear or a quadratic decision boundary, respectively. So there is a bias-variance trade-off: LDA is a much less flexible classifier than QDA, so it has potentially higher bias but substantially lower variance, while QDA has the reverse. When the following conditions hold, QDA is preferred and may lead to improved prediction performance.

1. The true decision boundary (the Bayes decision boundary) is non-linear, or the linearity assumption is not reasonable.
2. The assumption of a common covariance matrix for the K classes is clearly untenable.
3. The training set is large, so reducing the variance of the classifier is not crucial. Conversely, if there are relatively few training observations, fitting QDA may cause serious overfitting.
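
One way to see where QDA's extra variance comes from is to count the covariance parameters each method has to estimate. Writing p for the number of predictors (a symbol introduced here for illustration), a symmetric p × p covariance matrix has p(p+1)/2 free entries, so

$\displaystyle \text{LDA: } \frac{p(p+1)}{2} \qquad\qquad \text{QDA: } K \cdot \frac{p(p+1)}{2}$

For example, with p = 50 and K = 3, LDA estimates 1,275 covariance parameters while QDA estimates 3,825. With few training observations, that many additional estimates can easily be noisy, which is the overfitting risk described in the third condition.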

Reference:

An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani