Discriminant analysis is a 7-step procedure.
Training data are data with known group memberships: we know which population contains each subject. For example, in the Swiss bank notes data, we know which notes are genuine and which are counterfeit.
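The prior probability \(p_i\) represents the expected proportion of the community that belongs to population \(\pi_i\). There are three common choices:
1. Equal priors: \(\hat{p}_i = \frac{1}{g}\). This is useful if we believe that all of the population sizes are equal.
2. Arbitrary priors, chosen according to the investigator's beliefs about the relative population sizes. These must satisfy \(\hat{p}_1 + \hat{p}_2 + \dots + \hat{p}_g = 1\).
3. Estimated priors: \(\hat{p}_i = \frac{n_i}{N}\), where \(n_i\) is the number of training observations from population \(\pi_i\) and \(N = n_1 + n_2 + \dots + n_g\).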
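As a rough illustration, the three choices of prior could be computed as in the Python sketch below. The helper names `equal_priors`, `estimated_priors` and `check_priors` are hypothetical, not part of any library; arbitrary priors are simply supplied by the investigator and validated.

```python
from collections import Counter


def equal_priors(g):
    """Equal priors: p_i = 1/g for each of the g populations."""
    return [1.0 / g] * g


def estimated_priors(labels):
    """Estimated priors: p_i = n_i / N, the training-sample proportions."""
    counts = Counter(labels)
    n_total = len(labels)
    return {group: n / n_total for group, n in sorted(counts.items())}


def check_priors(priors):
    """Any choice of priors (including arbitrary ones) must be
    nonnegative and sum to 1, up to floating-point rounding."""
    values = list(priors.values()) if isinstance(priors, dict) else list(priors)
    return all(p >= 0 for p in values) and abs(sum(values) - 1.0) < 1e-12


# Usage with made-up labels: three training subjects from A, one from B.
print(equal_priors(2))                         # [0.5, 0.5]
print(estimated_priors(["A", "A", "A", "B"]))  # {'A': 0.75, 'B': 0.25}
print(check_priors({"A": 0.9, "B": 0.1}))      # arbitrary priors: True
```

Estimated priors are only sensible when the training data were sampled in proportion to the population sizes; otherwise equal or investigator-chosen priors are the safer choice.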
Use Bartlett's test to determine whether the variance-covariance matrices are homogeneous across all populations involved. The result of this test determines whether to use linear or quadratic discriminant analysis:
Case 1: Linear. Linear discriminant analysis is used for homogeneous variance-covariance matrices:
\(\Sigma_1 = \Sigma_2 = \dots = \Sigma_g = \Sigma\)
In this case, the variance-covariance matrix does not depend on the population.
Case 2: Quadratic. Quadratic discriminant analysis is used for heterogeneous variance-covariance matrices:
\(\Sigma_i \ne \Sigma_j\) for some \(i \ne j\)
This allows the variance-covariance matrices to depend on the population.
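A minimal NumPy sketch of one common form of Bartlett's test for homogeneity of variance-covariance matrices (the same statistic is often called Box's M test) is shown below. With \(S_i\) the sample variance-covariance matrix of group \(i\), \(S_p\) the pooled matrix, and \(p\) variables, it computes \(L' = (1-u)\left[(N-g)\ln|S_p| - \sum_i (n_i-1)\ln|S_i|\right]\), which is compared to a chi-square distribution with \(p(p+1)(g-1)/2\) degrees of freedom. The function name and return convention are assumptions for illustration; a p-value could then be obtained with, for example, `scipy.stats.chi2.sf(statistic, df)`.

```python
import numpy as np


def bartlett_covariance_test(groups):
    """Bartlett's test of H0: Sigma_1 = Sigma_2 = ... = Sigma_g.

    groups: list of (n_i x p) arrays, one per population.
    Returns (statistic, df); under H0 the statistic is approximately
    chi-square with df degrees of freedom.
    """
    g = len(groups)
    p = groups[0].shape[1]
    n = np.array([x.shape[0] for x in groups])
    N = n.sum()
    # Sample variance-covariance matrix S_i of each group (divisor n_i - 1);
    # atleast_2d keeps the p = 1 case as a 1 x 1 matrix.
    S = [np.atleast_2d(np.cov(x, rowvar=False)) for x in groups]
    # Pooled variance-covariance matrix S_p.
    S_p = sum((n_i - 1) * S_i for n_i, S_i in zip(n, S)) / (N - g)
    # Log-determinants via slogdet for numerical stability.
    logdet_p = np.linalg.slogdet(S_p)[1]
    M = (N - g) * logdet_p - sum(
        (n_i - 1) * np.linalg.slogdet(S_i)[1] for n_i, S_i in zip(n, S)
    )
    # Small-sample correction factor u.
    u = (np.sum(1.0 / (n - 1)) - 1.0 / (N - g)) * (2 * p**2 + 3 * p - 1) / (
        6.0 * (p + 1) * (g - 1)
    )
    statistic = (1 - u) * M
    df = p * (p + 1) * (g - 1) // 2
    return float(statistic), df
```

A large statistic relative to the chi-square critical value suggests heterogeneous matrices (Case 2, quadratic); otherwise Case 1 (linear) is the usual choice. The chi-square approximation relies on multivariate normality, so the test can be misleading for strongly non-normal data.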
Note! We do not discuss here how to test whether the population means differ. If they do not, there is no case for discriminant analysis.
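Here, we shall make the following standard assumptions:
1. The data from population \(\pi_i\) have mean vector \(\mu_i\) and variance-covariance matrix \(\Sigma_i\).
2. Independence: the subjects are independently sampled.
3. Normality: the data are multivariate normally distributed.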
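Under the usual normal-theory assumptions, the unknown parameters of each population are typically replaced by sample estimates from the training data: the sample mean vector for \(\mu_i\), the sample variance-covariance matrix \(S_i\) for \(\Sigma_i\), and, in the homogeneous (linear) case, the pooled matrix \(S_p\) for the common \(\Sigma\). The sketch below assumes the training data are a NumPy array `X` with a parallel label array `y`; the function name `estimate_parameters` is hypothetical.

```python
import numpy as np


def estimate_parameters(X, y):
    """Plug-in estimates for normal-theory discriminant analysis.

    X: (N x p) array of training observations; y: length-N array of labels.
    Returns the sorted group labels, the sample mean vectors (estimating mu_i),
    the sample covariance matrices S_i (estimating Sigma_i), and the pooled
    covariance matrix S_p (estimating a common Sigma).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    means, covs, sizes = {}, {}, {}
    for label in labels:
        Xi = X[y == label]
        sizes[label] = Xi.shape[0]
        means[label] = Xi.mean(axis=0)
        # Divisor n_i - 1; atleast_2d keeps p = 1 as a 1 x 1 matrix.
        covs[label] = np.atleast_2d(np.cov(Xi, rowvar=False))
    N, g = X.shape[0], len(labels)
    # Pooled estimate: weighted average of the S_i with weights n_i - 1.
    pooled = sum((sizes[k] - 1) * covs[k] for k in labels) / (N - g)
    return labels, means, covs, pooled
```

Linear discriminant analysis uses only `pooled`, while quadratic discriminant analysis uses the per-group `covs`; the means are needed in both cases.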
The discriminant function is the rule used to classify a new object into one of the known populations.
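As a sketch, assuming normal data and equal misclassification costs, the standard textbook discriminant scores are the linear score \(d_i^L(x) = \mu_i' \Sigma^{-1} x - \frac{1}{2}\mu_i' \Sigma^{-1} \mu_i + \log p_i\) and the quadratic score \(d_i^Q(x) = -\frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) + \log p_i\), with sample estimates plugged in for the unknown parameters. A new object is assigned to the population with the largest score. The function names below are hypothetical illustrations, not a library API.

```python
import numpy as np


def linear_score(x, mean, pooled_cov, prior):
    """Linear discriminant score (common covariance matrix)."""
    mean = np.asarray(mean, dtype=float)
    a = np.linalg.solve(pooled_cov, mean)  # Sigma^{-1} mu_i
    return float(a @ x - 0.5 * (a @ mean) + np.log(prior))


def quadratic_score(x, mean, cov, prior):
    """Quadratic discriminant score (population-specific covariance)."""
    diff = x - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    return float(-0.5 * logdet - 0.5 * mahalanobis_sq + np.log(prior))


def classify(x, means, priors, pooled_cov=None, covs=None):
    """Assign x to the population with the largest discriminant score.

    Pass pooled_cov for linear DA, or a dict covs of per-population
    matrices for quadratic DA. Returns (label, scores).
    """
    x = np.asarray(x, dtype=float)
    scores = {}
    for label, mean in means.items():
        if covs is None:
            scores[label] = linear_score(x, mean, pooled_cov, priors[label])
        else:
            scores[label] = quadratic_score(x, mean, covs[label], priors[label])
    return max(scores, key=scores.get), scores


# Usage with made-up parameters: two populations, identity covariance.
means = {"A": np.array([0.0, 0.0]), "B": np.array([4.0, 0.0])}
print(classify([1.0, 0.0], means, {"A": 0.5, "B": 0.5}, pooled_cov=np.eye(2))[0])  # A
```

Raising a population's prior shifts the boundary away from it, so borderline objects are more often assigned to that population.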
As in all statistical procedures, it is helpful to use diagnostic procedures to assess the efficacy of the discriminant analysis. We use cross-validation to estimate the probability of misclassification. Typically, you will have some prior rule about what constitutes an acceptable misclassification rate. Such rules might involve questions like "what is the cost of misclassification?" This could come up in a medical study where you are trying to diagnose cancer. There are two kinds of misclassification, each with its own cost. The first is misclassifying someone as having cancer when they do not, which could cause a certain amount of emotional grief. The second is misclassifying someone as not having cancer when in fact they do. The cost of this second error is obviously greater if early diagnosis improves cure rates.
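One common form of cross-validation is leave-one-out: each training subject is held out in turn, the discriminant rule is refit on the remaining subjects, and the held-out subject is classified. The sketch below assumes linear discriminant analysis with fixed priors (not re-estimated in each fold) and tallies a confusion count keyed by (true group, predicted group). All names are hypothetical. `average_cost` takes a user-supplied cost for each kind of error, such as a higher cost for calling a cancer patient cancer-free; note that it weights errors by their frequency in the training data, not by the priors.

```python
import numpy as np
from collections import Counter


def lda_fit(X, y):
    """Sample means and pooled covariance matrix for linear DA."""
    labels = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in labels}
    N, g = X.shape[0], len(labels)
    pooled = sum(
        (np.sum(y == k) - 1) * np.atleast_2d(np.cov(X[y == k], rowvar=False))
        for k in labels
    ) / (N - g)
    return means, pooled


def lda_predict(x, means, pooled, priors):
    """Return the label with the largest linear discriminant score."""
    best, best_score = None, -np.inf
    for k, m in means.items():
        a = np.linalg.solve(pooled, m)
        score = a @ x - 0.5 * (a @ m) + np.log(priors[k])
        if score > best_score:
            best, best_score = k, score
    return best


def loo_confusion(X, y, priors):
    """Leave-one-out: refit without subject j, then classify subject j."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    confusion = Counter()
    for j in range(len(y)):
        keep = np.arange(len(y)) != j
        means, pooled = lda_fit(X[keep], y[keep])
        confusion[(y[j], lda_predict(X[j], means, pooled, priors))] += 1
    return confusion


def error_rate(confusion):
    """Proportion of held-out subjects that were misclassified."""
    total = sum(confusion.values())
    wrong = sum(c for (true, pred), c in confusion.items() if true != pred)
    return wrong / total


def average_cost(confusion, cost):
    """Average cost per subject; cost[(true, pred)] is the cost of predicting
    pred when the truth is true (missing keys, e.g. correct calls, cost 0)."""
    total = sum(confusion.values())
    return sum(cost.get(key, 0.0) * c for key, c in confusion.items()) / total
```

Leave-one-out is used here because it reuses nearly all of the data for each fit; k-fold cross-validation is a cheaper alternative for large training sets.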
The procedure described above assumes that the unit or subject being classified actually belongs to one of the considered populations. If you have a study where you look at two species of insects, A and B, and the insect to be classified actually belongs to species C, then it will necessarily be misclassified as belonging to either A or B.