10.2 - Discriminant Analysis Procedure

Discriminant analysis is a 7-step procedure.

Step 1: Collect training data

Training data are data with known group memberships: for each subject, we know which population it belongs to. For example, in the Swiss Bank Notes data, we know which notes are genuine and which are counterfeit.

Step 2: Prior Probabilities

The prior probability \(p_i\) represents the expected proportion of the community that belongs to population \(\pi_i\). There are three common choices:

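Equal priors: \(\hat{p}_i = \frac{1}{g}\). This is useful if we believe that all of the population sizes are equal.
Arbitrary priors: selected according to the investigator's beliefs regarding the relative population sizes.
Estimated priors: \(\hat{p}_i = \frac{n_i}{N}\), where \(n_i\) is the number of training observations from population \(\pi_i\) and \(N = n_1 + n_2 + \dots + n_g\).

Note! We require: \(\hat{p}_1 + \hat{p}_2 + \dots + \hat{p}_g = 1\)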
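
As a minimal sketch of how priors might be computed from training labels, the snippet below builds equal priors and priors estimated from the training-sample proportions \(n_i / N\) (a common data-driven choice). The function names and the label list are hypothetical and are not taken from the Swiss Bank Notes data.

```python
from collections import Counter


def equal_priors(groups):
    """Equal priors: 1/g for each of the g distinct groups."""
    labels = sorted(set(groups))
    return {label: 1.0 / len(labels) for label in labels}


def estimated_priors(groups):
    """Priors estimated as n_i / N, the training-sample proportions."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


# Hypothetical training labels, for illustration only.
labels = ["genuine"] * 6 + ["counterfeit"] * 4

print(equal_priors(labels))
print(estimated_priors(labels))
```

Whichever choice is used, the resulting priors should sum to 1 across the g populations.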

Step 3: Bartlett's test

Use Bartlett’s test to determine whether the variance-covariance matrices are homogeneous across all populations involved. The result of this test determines whether to use linear or quadratic discriminant analysis:

Case 1: Linear

Linear discriminant analysis is for homogeneous variance-covariance matrices:

\(\Sigma_1 = \Sigma_2 = \dots = \Sigma_g = \Sigma\)

In this case, the variance-covariance matrix does not depend on the population.

Case 2: Quadratic

Quadratic discriminant analysis is used for heterogeneous variance-covariance matrices:

\(\Sigma_i \ne \Sigma_j\) for some \(i \ne j\)

This allows the variance-covariance matrices to depend on the population.
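
The following is a rough sketch of one common form of the test statistic for homogeneity of variance-covariance matrices (the chi-square approximation often attributed to Bartlett and Box). It assumes NumPy is available, at least two variables, and more observations than variables in every group; the function name and the simulated data are hypothetical.

```python
import numpy as np


def bartlett_covariance_test(samples):
    """Chi-square approximation for H0: Sigma_1 = ... = Sigma_g.

    samples: list of (n_i x p) arrays, one per group, with p >= 2.
    Returns (statistic, degrees_of_freedom).
    """
    g = len(samples)
    p = samples[0].shape[1]
    n = np.array([s.shape[0] for s in samples])
    N = n.sum()

    # Unbiased sample covariance S_i for each group, then the pooled estimate.
    covs = [np.cov(s, rowvar=False) for s in samples]
    pooled = sum((ni - 1) * S for ni, S in zip(n, covs)) / (N - g)

    # slogdet returns (sign, log|det|); more stable than log(det(...)).
    logdet_pooled = np.linalg.slogdet(pooled)[1]
    logdets = np.array([np.linalg.slogdet(S)[1] for S in covs])

    M = (N - g) * logdet_pooled - np.sum((n - 1) * logdets)
    c = (np.sum(1.0 / (n - 1)) - 1.0 / (N - g)) \
        * (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (g - 1))
    statistic = (1.0 - c) * M
    df = p * (p + 1) * (g - 1) // 2
    return statistic, df


# Simulated groups with clearly different spreads (illustration only).
rng = np.random.default_rng(0)
group_a = rng.normal(size=(50, 2))
group_b = 3.0 * rng.normal(size=(40, 2))

statistic, df = bartlett_covariance_test([group_a, group_b])
print(statistic, df)
```

A large statistic relative to the chi-square distribution with `df` degrees of freedom suggests heterogeneous matrices and therefore the quadratic case. If SciPy is installed, `scipy.stats.chi2.sf(statistic, df)` gives the corresponding p-value.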

Note! We do not discuss testing whether the means of the populations are different. If they are not, there is no case for discriminant analysis.

Step 4: Estimate the parameters of the conditional probability density functions \(f(\mathbf{x} | \pi_i)\).

Here, we shall make the following standard assumptions:

The data from group i have a common mean vector \(\boldsymbol{\mu}_i\).
The data from group i have a common variance-covariance matrix \(\Sigma\).
Independence: The subjects are independently sampled.
Normality: The data are multivariate normally distributed.
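
Under these assumptions, the parameters to estimate are the group mean vectors and the variance-covariance matrices. The sketch below assumes NumPy and uses a small made-up data set; `estimate_parameters` is a hypothetical helper, and the pooled matrix it returns is the usual estimate of \(\Sigma\) for the linear case (divisor \(N - g\)).

```python
import numpy as np


def estimate_parameters(X, y):
    """Return group labels, mean vectors, per-group covariances, pooled covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    N, g = X.shape[0], len(labels)

    means = {k: X[y == k].mean(axis=0) for k in labels}
    covs = {k: np.cov(X[y == k], rowvar=False) for k in labels}
    # Pooled covariance: weighted average of the S_i with weights n_i - 1.
    pooled = sum((np.sum(y == k) - 1) * covs[k] for k in labels) / (N - g)
    return labels, means, covs, pooled


# Made-up training data: two variables, two groups of three subjects.
X = [[1, 2], [2, 3], [3, 5], [6, 5], [7, 8], [8, 8]]
y = [0, 0, 0, 1, 1, 1]

labels, means, covs, pooled = estimate_parameters(X, y)
print(means[0], means[1])
print(pooled)
```

In the quadratic case, the per-group matrices in `covs` would be used in place of the pooled estimate.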

Step 5: Compute discriminant functions.

The discriminant functions provide the rule used to classify a new object into one of the known populations.
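
For the linear case, a standard form of the discriminant function is the linear score \(d_i(\mathbf{x}) = -\frac{1}{2}\boldsymbol{\mu}_i'\Sigma^{-1}\boldsymbol{\mu}_i + \boldsymbol{\mu}_i'\Sigma^{-1}\mathbf{x} + \log p_i\), with the object assigned to the population with the largest score. The sketch below assumes NumPy; the means, covariance matrix, and priors are made-up values standing in for the estimates from the earlier steps.

```python
import numpy as np


def linear_scores(x, means, pooled_cov, priors):
    """Linear score d_i(x) = -0.5*mu_i' S^-1 mu_i + mu_i' S^-1 x + log(p_i)."""
    x = np.asarray(x, dtype=float)
    scores = {}
    for label, mu in means.items():
        # Solve S w = mu rather than forming the inverse explicitly.
        w = np.linalg.solve(pooled_cov, mu)
        scores[label] = -0.5 * mu @ w + w @ x + np.log(priors[label])
    return scores


def classify(x, means, pooled_cov, priors):
    """Assign x to the population with the largest linear score."""
    scores = linear_scores(x, means, pooled_cov, priors)
    return max(scores, key=scores.get)


# Hypothetical estimates for two populations, A and B.
means = {"A": np.array([0.0, 0.0]), "B": np.array([4.0, 4.0])}
pooled_cov = np.array([[1.0, 0.5], [0.5, 1.0]])
priors = {"A": 0.5, "B": 0.5}

print(classify([1.0, 1.0], means, pooled_cov, priors))
print(classify([3.0, 3.0], means, pooled_cov, priors))
```

With equal priors the boundary sits midway between the two means; unequal priors shift it toward the less likely population.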

Step 6: Use cross-validation to estimate misclassification probabilities.

As in all statistical procedures, it is helpful to use diagnostic procedures to assess the efficacy of the discriminant analysis. We use cross-validation to estimate the misclassification probabilities. Typically you will have some prior rule as to what is an acceptable misclassification rate. Such rules might involve questions like, "What is the cost of misclassification?" This could come up in a medical study where the goal is to diagnose cancer. There are really two alternative costs. One is the cost of misclassifying someone as having cancer when they don't, which could cause a certain amount of emotional grief. The other is the cost of misclassifying someone as not having cancer when in fact they do. The latter cost is obviously greater if early diagnosis improves cure rates.
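
One way to carry out this step is leave-one-out cross-validation: each training observation is held out in turn, the rule is refit on the rest, and the held-out observation is classified. The sketch below assumes NumPy, a linear rule with priors estimated from sample proportions, and at least two observations per group; the simulated data and the cost matrix are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Estimate group means, pooled covariance, and sample-proportion priors."""
    labels = np.unique(y)
    N, g = len(y), len(labels)
    means = np.array([X[y == k].mean(axis=0) for k in labels])
    # Center each row on its own group mean, then pool with divisor N - g.
    centered = X - means[np.searchsorted(labels, y)]
    pooled = centered.T @ centered / (N - g)
    priors = np.array([np.mean(y == k) for k in labels])
    return labels, means, pooled, priors


def predict_lda(model, x):
    """Classify x by the largest linear discriminant score."""
    labels, means, pooled, priors = model
    W = np.linalg.solve(pooled, means.T).T  # row i is S^-1 mu_i
    scores = W @ x - 0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return labels[np.argmax(scores)]


def loo_confusion(X, y):
    """Leave-one-out confusion matrix: rows are true groups, columns predicted."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    labels = np.unique(y)
    index = {k: i for i, k in enumerate(labels)}
    confusion = np.zeros((len(labels), len(labels)), dtype=int)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit_lda(X[keep], y[keep])
        confusion[index[y[i]], index[predict_lda(model, X[i])]] += 1
    return labels, confusion


# Simulated, well-separated groups (illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(8.0, 1.0, (20, 2))])
y = np.array(["A"] * 20 + ["B"] * 20)

labels, confusion = loo_confusion(X, y)
error_rate = 1.0 - np.trace(confusion) / confusion.sum()

# Hypothetical costs: cost[i, j] is the cost of classifying group i as group j.
cost = np.array([[0.0, 1.0], [5.0, 0.0]])
expected_cost = (cost * confusion).sum() / confusion.sum()
print(confusion, error_rate, expected_cost)
```

Weighting the confusion matrix by unequal costs, as in the last lines, is one simple way to reflect that the two kinds of misclassification are not equally serious.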

Step 7: Classify observations with unknown group memberships.

The procedure described above assumes that the unit or subject being classified actually belongs to one of the considered populations. If you have a study where you look at two species of insects, A and B, and the insect to classify actually belongs to species C, then it will obviously be misclassified as belonging to either A or B.
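
A small one-variable illustration of this caveat, using only the Python standard library: posterior probabilities always sum to 1 over the considered populations, so an observation far from both A and B is still assigned to one of them with high apparent confidence. The means, standard deviation, and priors are made-up values.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, means, sd, priors):
    """Posterior P(pi_i | x) by Bayes' rule, assuming a common sd.

    Also returns the marginal density of x under the fitted mixture.
    """
    weighted = {k: priors[k] * normal_pdf(x, means[k], sd) for k in means}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}, total


# Hypothetical species A and B; the new insert x = 12 resembles neither.
means = {"A": 0.0, "B": 2.0}
priors = {"A": 0.5, "B": 0.5}

post, density = posteriors(12.0, means, sd=1.0, priors=priors)
print(post)
print(density)
```

Here the posterior for B is close to 1 even though the marginal density at the observation is extremely small. Checking that density, or the distance to the nearest group mean, is one informal way to flag subjects that may belong to none of the considered populations.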