This tutorial and the associated technical appendix are adapted from the BMD (BIOMED) statistical package
documentation for the BMD08M factor analysis program.
The documentation and the BMD08M program were developed under a National Science Foundation grant.
OVERVIEW:
Factor analysis is a data reduction technique for identifying the internal structure of a set of variables. Unlike
techniques such as regression analysis or ANOVA, factor analysis does not require that predictor and criterion
variables be defined. Instead, it attempts to identify the relationships among all of the variables included in the
analysis.
Factor analysis is decompositional in nature: it identifies the underlying relationships that
exist within a set of variables. It groups metric variables (interval or ratio scaled) into factors. A factor is an underlying quality found to be characteristic of the original variables. Two types of factors exist. Common factors have
effects shared by more than one observed variable. Unique factors have effects that are unique to a
specific variable.
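The split between common and unique factors can be written as a model in which each standardized variable is a weighted sum of the common factors plus its own unique factor. The sketch below assumes NumPy and uses a small hypothetical loading matrix (illustrative values, not from the BMD08M documentation) to show that the common factors alone generate the correlations between variables, while the unique factors add only to each variable's own variance.

```python
import numpy as np

# Hypothetical loadings of four standardized variables on two common factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])

communality = (loadings ** 2).sum(axis=1)   # variance due to the common factors
uniqueness = 1.0 - communality              # variance due to each unique factor

# Model-implied correlation matrix: off-diagonal terms come only from the
# common factors; the unique factors contribute only to the diagonal.
implied = loadings @ loadings.T + np.diag(uniqueness)

# Simulate observations from the model (uncorrelated factors, unit variances).
rng = np.random.default_rng(0)
n = 100_000
common = rng.standard_normal((n, 2))
unique = rng.standard_normal((n, 4)) * np.sqrt(uniqueness)
x = common @ loadings.T + unique

sample_corr = np.corrcoef(x, rowvar=False)
print(np.round(implied, 3))
print("largest deviation from the model:", np.abs(sample_corr - implied).max())
```

With a large simulated sample, the observed correlations should sit close to the model-implied ones, differing only by sampling error.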
OBJECTIVES OF FACTOR ANALYSIS:
The basic objectives of a Factor Analysis are:
i) To determine how many factors are needed to explain the set of variables.
ii) To find the extent to which each variable is associated with each of a set of common factors.
iii) To interpret the common factors.
iv) To determine the amount of each factor possessed by each observation (given by the factor scores).
In summary, the goal is to explain a portion of the variance in the set of variables input into the analysis by identifying
a smaller number of underlying common dimensions, called factors. Factor analysis helps identify this set of k dimensions
underlying the m variables in a data set (where k < m).
A Factor Analysis Example:
For discussion purposes, consider the following five-variable data set, which is used later as input to the
factor analysis program.
79652 55462 12345 16523 46525 79665 65321 98653 46521 65435
32165 56523 65454 16589 98965 73195 15937 35079 62486 46428
These data are the scores (on a 0 to 9 scale) of 20 students on five final exams (e.g., Math, English, History, Geography,
Science); each five-digit group is one student's record, with one digit per exam. Can we say that the students' exam
grades in the different subjects are related? Whatever relates the grades is not directly measurable; it is latent.
Grades in different courses could be related because of the students' intellectual capabilities, memory capacity, or
simply interest. Although one person's grades need not be perfectly correlated with one another, we can expect the
grades in all subject areas to depend to some degree on general intelligence or other factors common to learning the
subject material. Accordingly, we may identify one or more factors that explain the 'common' portion of the variance
in the original raw scores.
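The data lines above can be read directly into a students-by-exams matrix. The following NumPy sketch (our own illustration, not part of the BMD08M program) splits each five-digit record into five scores and computes the means and standard deviations reported at the start of the sample output, along with the correlation matrix. The standard deviations use the n - 1 divisor, which is the choice that reproduces the printed values.

```python
import numpy as np

# The 20 records listed above; each digit is one exam score (0-9).
records = ("79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
           "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428").split()

# Rows are students (observations); columns are V1..V5 (exams).
scores = np.array([[int(d) for d in rec] for rec in records], dtype=float)

mean = scores.mean(axis=0)
std = scores.std(axis=0, ddof=1)          # sample standard deviation (n - 1)
corr = np.corrcoef(scores, rowvar=False)  # 5 x 5 correlation matrix

for j in range(scores.shape[1]):
    print(f"V{j + 1}  mean={mean[j]:.4f}  sd={std[j]:.5f}  "
          f"min={scores[:, j].min():.0f}  max={scores[:, j].max():.0f}")
print(np.round(corr, 5))
```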
Organizing Your Data for Factor Analysis
Data sets are traditionally in the form of an observations-by-variables matrix. Some researchers may, however, need
to analyze data forms that do not conform to this traditional mode. For example, occasions (repeated measures) may
be included, or the data matrix may be transposed. Each of these data forms can be analyzed with factor analysis, but
the resulting factors may then describe observations or occasions rather than variables. Alternate forms of the factor
analysis data matrix appear below. (The most common forms of factor analysis are R-type, in which the variables load
on the factors and correlations are computed across persons, and Q-type, in which the persons load on the factors and
correlations are computed across variables.)
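In matrix terms, the R-type/Q-type distinction is a choice of which way the data matrix is correlated. The NumPy sketch below uses the first four student records from the example (four persons by five variables, chosen only to keep the output small) to show that a Q-type analysis correlates persons across variables, which is the same as an R-type analysis of the transposed matrix.

```python
import numpy as np

# First four records of the example: 4 persons (rows) x 5 variables (columns).
data = np.array([
    [7.0, 9.0, 6.0, 5.0, 2.0],
    [5.0, 5.0, 4.0, 6.0, 2.0],
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 6.0, 5.0, 2.0, 3.0],
])

# R-type: correlate the columns (variables) across persons.
r_type = np.corrcoef(data, rowvar=False)
# Q-type: correlate the rows (persons) across variables.
q_type = np.corrcoef(data, rowvar=True)
# A Q-type analysis is an R-type analysis of the transposed matrix.
q_from_transpose = np.corrcoef(data.T, rowvar=False)

print(r_type.shape, q_type.shape)  # (5, 5) (4, 4)
```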
Graphical Portrayal of Modes of Factor Analysis
The alternative modes of factor analysis can be portrayed graphically. The original data set is viewed as a three-way variables-persons-occasions array. R-type and Q-type techniques deal with the variables-persons dichotomy. In contrast, P-type and O-type analyses are used for the occasions-variables situation, and S-type and T-type analyses are used when the
occasions-persons relationship is of interest (c).
               VARIABLES                        VARIABLES
            +-------------+                  +-------------+
            | | |         |                  |-------------|
PERSONS     | | |  R-TYPE |      PERSONS     |   Q-TYPE    |
            | | |         |                  |-------------|
            +-------------+                  +-------------+

               VARIABLES                        VARIABLES
            +-------------+                  +-------------+
            | | |         |                  |-------------|
OCCASIONS   | | |  P-TYPE |      OCCASIONS   |   O-TYPE    |
            | | |         |                  |-------------|
            +-------------+                  +-------------+

                PERSONS                          PERSONS
            +-------------+                  +-------------+
            | | |         |                  |-------------|
OCCASIONS   | | |  S-TYPE |      OCCASIONS   |   T-TYPE    |
            | | |         |                  |-------------|
            +-------------+                  +-------------+
DEFINITION OF TERMS COMMONLY USED IN FACTOR ANALYSIS
BIQUARTIMIN: An oblique rotation (the factors are allowed to be correlated) of the factor loadings matrix, obtained with a
gamma of .5 in the oblimin family. It seeks a solution in which only a few variables have large squared loadings on each
factor and the remaining loadings on that factor are close to zero.
COMMON FACTOR ANALYSIS: Factor analysis based upon a correlation matrix with values less than 1.0 on the
diagonal. These diagonal values are communality estimates; they are inserted to represent only the common variance
(excluding specific and error variance), which is what the factor analysis should solve for.
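A common starting point for the diagonal of the reduced correlation matrix is the squared multiple correlation (SMC) of each variable with all the others, which can be read off the inverse of R as 1 - 1/(R^-1)_ii. The NumPy sketch below applies this to the example exam data; using SMCs here is our choice for illustration (the program also offers other communality estimates). It also confirms that the trace of the reduced matrix equals the total estimated common variance.

```python
import numpy as np

records = ("79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
           "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428").split()
X = np.array([[int(d) for d in r] for r in records], dtype=float)
R = np.corrcoef(X, rowvar=False)

# Squared multiple correlation of each variable with the other four.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Reduced correlation matrix: communality estimates replace the 1s on the
# diagonal, so only common variance remains to be factored.
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)

print("SMC communality estimates:", np.round(smc, 4))
print("trace of R:", round(np.trace(R), 4))
print("trace of reduced R (total common variance):", round(np.trace(R_reduced), 4))
```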
COMMUNALITY: The amount of a variable's variance that it shares with all other variables in the analysis, that is, the variance explained by the common factors.
PRINCIPAL COMPONENTS ANALYSIS: One variety of factor analysis, in which the factors are based upon an analysis of
the total variance in the original data. In application, this means that the factor analysis begins with a correlation
matrix that has the value 1 on the diagonal. Computationally, this treats 100% of each variable's variance as common,
or shared between the variables. Other forms of factor analysis begin with other values on the diagonal that reflect
the amount of variance expected to be explained for each variable.
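Because principal components analysis keeps the 1s on the diagonal, its factors come straight from the eigen-decomposition of R: each loading is an eigenvector element scaled by the square root of its eigenvalue. This NumPy sketch applies that to the example data. The sign convention (flipping each column so that it sums to a positive value) is our assumption, since eigenvector signs are arbitrary. Up to rounding, the results should agree with the EIGENVALUES and FACTOR MATRIX BEFORE ROTATION sections of the sample output.

```python
import numpy as np

records = ("79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
           "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428").split()
X = np.array([[int(d) for d in r] for r in records], dtype=float)
R = np.corrcoef(X, rowvar=False)   # unit diagonal: all variance treated as common

# Eigen-decomposition of the symmetric matrix R (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings = eigenvectors scaled by sqrt(eigenvalue); keep k components.
k = 2
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
# Eigenvector signs are arbitrary; flip so each column sums to a positive value.
loadings *= np.sign(loadings.sum(axis=0))

print(np.round(eigvals, 5))
print(np.round(loadings, 5))
```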
CORRELATION MATRIX: A table showing the intercorrelations among all variables analyzed.
EIGENVALUE: The sum of squares of the loadings in a column of the factor matrix. Eigenvalues are also referred to
as latent roots and represent the amount of variance accounted for by a factor.
FACTOR: One of the smaller set of underlying composite dimensions of all variables in the data set. Factors are linear
combinations of the original variables.
FACTOR LOADINGS: The correlation coefficients between the variables and the factors. The variables with
the highest correlations provide the most meaning (in an interpretation sense) to the factor solution. In an unrotated solution, the squared loadings for a given factor sum to the eigenvalue for that factor.
FACTOR MATRIX: This m-variable by k-factor matrix contains the loadings of all variables on each factor.
FACTOR ROTATION: Given a Cartesian coordinate system in which the axes are the factors and the points are the
variables, factor rotation is the process of holding the points constant and moving (rotating) the factor axes. The
axes are rotated so that the points lie close to them (are highly correlated with them), which provides a more
meaningful interpretation of the factor solution.
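Rotation can be expressed as multiplying the loading matrix by an orthogonal matrix whose columns are the new axis directions. The NumPy sketch below rotates the two unrotated factors from the sample output by an arbitrary 30 degrees (the angle is illustrative, not the one the program chose): each variable's communality stays the same, while the variance credited to each individual factor shifts.

```python
import numpy as np

# Unrotated two-factor loadings from the sample output (rows V1..V5).
L = np.array([
    [0.55331, 0.71481],
    [0.82906, 0.15512],
    [0.74201, -0.10205],
    [-0.44063, 0.83096],
    [-0.58819, 0.13983],
])

def rotate(loadings, theta):
    """Rotate two factor axes by theta radians; the variable points stay fixed."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.array([[c, -s],
                  [s,  c]])   # orthogonal: T.T @ T = I
    return loadings @ T

L_rot = rotate(L, np.deg2rad(30))

print("communalities:", (L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1))
print("per-factor variance:", (L ** 2).sum(axis=0), (L_rot ** 2).sum(axis=0))
```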
FACTOR SCORES: The score of each observation on the newly identified factors. A factor score is a linear
combination of all of the original variables that were relevant in making the new factor.
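As a sketch of the computation, the factor score coefficients printed at the end of the sample output can be applied to the standardized exam scores. We assume here that the coefficients apply to z-scores computed with the n - 1 standard deviation; under that assumption, the resulting scores have mean zero and approximately unit variance.

```python
import numpy as np

records = ("79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
           "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428").split()
X = np.array([[int(d) for d in r] for r in records], dtype=float)

# Standardize each exam (assumed to be the scale the coefficients expect).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# FACTOR SCORE COEFFICIENTS from the sample output (rows V1..V5, columns factors 1-2).
B = np.array([
    [0.2377, 0.5815],
    [0.3914, 0.1426],
    [0.3595, -0.0640],
    [-0.2431, 0.6509],
    [-0.2873, 0.0976],
])

# Each student's factor score is a weighted sum of his or her standardized scores.
F = Z @ B   # 20 students x 2 factors

print(np.round(F[:3], 3))                   # scores for the first three students
print(np.round(F.std(axis=0, ddof=1), 3))   # close to 1 for both factors
```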
GAMMA OF ROTATION: A user-input parameter that leads to different rotation schemes. Standard values of gamma
include 0 (for quartimax, quartimin, and direct quartimin), .5 (for biquartimin), and 1.0 (for varimax and covarimin).
KAISER NORMALIZATION: A process by which each row of the initial factor loading matrix is normalized by
dividing it by h_i, the square root of the row's communality h_i². This makes the sum of squares of each row equal 1.0.
After rotation, each row is multiplied by h_i again, so the normalization does not change the communalities.
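A minimal NumPy sketch of Kaiser normalization, using the unrotated loadings from the sample output: rows are scaled to unit length, an orthogonal rotation is applied (an arbitrary 20 degrees here, purely for illustration), and each row is then rescaled by h_i.

```python
import numpy as np

# Unrotated loadings from the sample output (rows V1..V5).
L = np.array([
    [0.55331, 0.71481],
    [0.82906, 0.15512],
    [0.74201, -0.10205],
    [-0.44063, 0.83096],
    [-0.58819, 0.13983],
])

h = np.sqrt((L ** 2).sum(axis=1))   # h_i = square root of each row's communality
L_norm = L / h[:, None]              # each row now has a sum of squares of 1.0

# Rotate the normalized rows, then rescale each row by h_i.
theta = np.deg2rad(20)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_final = (L_norm @ T) * h[:, None]

print(np.round((L_norm ** 2).sum(axis=1), 6))   # all 1.0
print(np.round((L_final ** 2).sum(axis=1), 6))  # original communalities
```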
OBLIMIN: A family of oblique rotation criteria that seek simple structure in the rotated factor loadings matrix. Simple
structure is difficult to define precisely; it refers to the situation where most of the loadings on any specific factor are
small and a few loadings are as large as possible.
OBLIQUE FACTOR SOLUTIONS: A computed factor solution in which the extracted factors are not independent but
are correlated. In many situations there is no a priori (or theoretical) reason why the factors should be independent of
each other. The analysis therefore expresses the relationships among factors that may or may not be orthogonal, rather
than arbitrarily constraining the factor solution so that the factors are independent of each other.
ORTHOGONAL: Refers to the mathematical independence of the factors. Operationally, orthogonal factor axes are at
right angles to each other (90°).
ORTHOGONAL FACTOR SOLUTIONS: The cosine of the angle between two factor axes corresponds to the correlation
between those factors. Orthogonality means no correlation and is synonymous with a 90° angle in a Cartesian
coordinate system. Orthogonal factor solutions therefore extract the factors so that the factor axes are maintained at
right angles. Thus each factor is independent of all other factors, and the correlation between the factors is zero.
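The cosine-correlation link can be checked numerically: when two variables are centered and treated as vectors (one coordinate per observation), the cosine of the angle between them equals their correlation. The simulated data below are arbitrary and used only for illustration.

```python
import numpy as np

def cosine_of_centered(x, y):
    """Cosine of the angle between two variables after centering them."""
    xc, yc = x - x.mean(), y - y.mean()
    return xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

rng = np.random.default_rng(1)
a = rng.standard_normal(50)
b = 0.6 * a + rng.standard_normal(50)

r = np.corrcoef(a, b)[0, 1]
cos_ab = cosine_of_centered(a, b)
print(f"r = {r:.4f}, cosine = {cos_ab:.4f}, "
      f"angle = {np.degrees(np.arccos(cos_ab)):.1f} degrees")

# Two uncorrelated (orthogonal) variables: cosine 0, i.e. a 90-degree angle.
u = np.array([1.0, -1.0, 1.0, -1.0])
v = np.array([1.0, 1.0, -1.0, -1.0])
print(np.degrees(np.arccos(cosine_of_centered(u, v))))
```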
SQUARED FACTOR LOADINGS: Because loadings are the correlations between the variables and the factors, a
squared factor loading can be compared to R-square in a regression analysis: it indicates the proportion of the original variable's variance that is explained by the factor. For a given factor in an unrotated solution, the sum
of the squared factor loadings is the eigenvalue (latent root) associated with that factor.
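Using the unrotated loadings from the sample output, the row and column sums of the squared loadings can be checked directly: the row sums reproduce the FINAL COMMUNALITY column, and the column sums reproduce the first two eigenvalues. A minimal NumPy check:

```python
import numpy as np

# FACTOR MATRIX BEFORE ROTATION from the sample output (rows V1..V5).
L = np.array([
    [0.55331, 0.71481],
    [0.82906, 0.15512],
    [0.74201, -0.10205],
    [-0.44063, 0.83096],
    [-0.58819, 0.13983],
])

sq = L ** 2                          # each entry: like an R-square for one variable/factor
communalities = sq.sum(axis=1)       # row sums: variance of each variable explained
eigenvalues = sq.sum(axis=0)         # column sums: variance explained by each factor
proportion = eigenvalues / L.shape[0]  # share of the total variance of 5 variables

print(np.round(communalities, 6))
print(np.round(eigenvalues, 5), np.round(proportion, 5))
```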
TRACE: The sum of the values on the diagonal of the correlation matrix used in the factor analysis. When the diagonal
contains 1s (as in principal components analysis), the trace equals the number of variables, because each standardized
variable has a variance of 1. With the common (reduced) correlation matrix, the trace equals the sum of the
communalities on the diagonal, which is also the amount of common variance for the variables being analyzed.
VARIMAX ROTATIONS: An orthogonal rotation of factors that redistributes the variance accounted for within the
pattern of factor loadings. Both the communalities and the total variance accounted for are the same before and after
rotation. This procedure is the one most commonly used to re-orient, or clean up, the loadings obtained in a principal
components analysis.
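Varimax belongs to the orthomax family of orthogonal rotations, whose gamma parameter matches the gamma values listed above (1.0 for varimax, 0 for quartimax). The function below is a commonly used SVD-based orthomax iteration with optional Kaiser normalization, written in NumPy for illustration. It is not the BMD08M algorithm, whose simplicity criterion and convergence rules may differ, so its output need not match the ROTATED FACTOR MATRIX exactly; the function name and defaults are hypothetical.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, normalize=True, max_iter=500, tol=1e-10):
    """Orthogonal rotation maximizing the orthomax criterion.

    gamma=1.0 gives varimax; gamma=0.0 gives quartimax.
    normalize=True applies Kaiser normalization (rows scaled to unit length
    before rotating and scaled back afterwards).
    """
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    h = np.sqrt((A ** 2).sum(axis=1)) if normalize else np.ones(p)
    A = A / h[:, None]
    T = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient of the orthomax criterion with respect to the rotation.
        G = A.T @ (B ** 3 - (gamma / p) * B * (B ** 2).sum(axis=0))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                      # nearest orthogonal matrix to the gradient
        crit = s.sum()
        if crit_old > 0 and crit < crit_old * (1 + tol):
            break
        crit_old = crit
    return (A @ T) * h[:, None], T

# Unrotated loadings from the sample output (rows V1..V5).
L = np.array([
    [0.55331, 0.71481],
    [0.82906, 0.15512],
    [0.74201, -0.10205],
    [-0.44063, 0.83096],
    [-0.58819, 0.13983],
])
rotated, T = orthomax(L, gamma=1.0)
print(np.round(rotated, 5))
```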
AN EXAMPLE OF FACTOR ANALYSIS
Factor analysis may be run on either a raw data set or a correlation matrix. Initial communality estimates may be squared
multiple correlations, regression variances, maximum absolute row values, or values specified
by the user. If requested, the program will iterate on the initial communality estimates. Three types
of rotation are available, all based on the oblimin criterion. In the first, the factors are restricted to be
orthogonal, which yields quartimax and varimax rotations (as well as other rotational solutions). In the second, the
criterion is applied to the reference factor structure and the factors are allowed to be oblique, which
yields standard oblimin rotations. In the third, the criterion is applied to the primary factor loadings,
again allowing the factors to be oblique and yielding simple loading rotations.
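Iterating on communality estimates is usually done by principal axis factoring: place the current estimates on the diagonal, extract k factors, recompute the communalities from the loadings, and repeat. The sketch below is our own NumPy illustration, starting from squared multiple correlations and applied to the example data; the function name, tolerance, and iteration limit are hypothetical choices, not BMD08M settings. Because the sample run leaves the diagonal unaltered (principal components), its results will differ from these.

```python
import numpy as np

def principal_axis(R, k, max_iter=50, tol=1e-6):
    """Principal axis factoring with iterated communality estimates."""
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # initial estimates: SMCs
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                  # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        idx = np.argsort(vals)[::-1][:k]          # k largest eigenvalues
        L = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
        new_h2 = (L ** 2).sum(axis=1)             # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return L, h2

records = ("79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
           "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428").split()
X = np.array([[int(d) for d in r] for r in records], dtype=float)
R = np.corrcoef(X, rowvar=False)

L_paf, h2 = principal_axis(R, k=2)
print(np.round(L_paf, 4))
print(np.round(h2, 4))
```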
Typical Results:
Typical factor analysis output includes:
1) Mean and Standard Deviation for the variables.
2) Variance-Covariance Matrix
3) Correlation Matrix
4) N Matrix (number of cases used for each pair of variables)
5) Eigenvalues
6) Cumulative proportion of total variance
7) Proportion of Variance per Eigenvalue
8) Factor Matrix before rotation
9) Rotated Factor Matrix
10) Factor Score Coefficients
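Items 5 to 7 follow from the eigenvalues alone. The NumPy sketch below takes the eigenvalues printed in the sample output, divides them by their sum (the trace, 5 for five standardized variables), and counts how many exceed the EIGENVALUE CUTOFF CONSTANT of 1.0. By that rule three components would qualify, whereas the sample run rotates two, as set by its NUMBER OF FACTORS TO BE ROTATED.

```python
import numpy as np

eigenvalues = np.array([2.08418, 1.25547, 1.04697, 0.36381, 0.24957])  # sample output

proportion = eigenvalues / eigenvalues.sum()   # proportion of variance per eigenvalue
cumulative = np.cumsum(proportion)             # cumulative proportion of total variance

cutoff = 1.0                                   # EIGENVALUE CUTOFF CONSTANT
n_above = int((eigenvalues > cutoff).sum())

print(np.round(proportion, 5))
print(np.round(cumulative, 5))
print("eigenvalues above cutoff:", n_above)
```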
Factor Analysis Sample Output
PC-MDS
FACTOR ANALYSIS
ANALYSIS TITLE BMD08M TEST DATA
INPUT DATA FILE A:FACTOR.DAT
OUTPUT PRINT FILE A:FACTOR.PRN
NO. OF VARIABLES 5
DATA TREATED AS HAVING NO MISSING VALUES
DATA FOR RECORD: 1
.70E+01 .90E+01 .60E+01 .50E+01 .20E+01
DATA FOR RECORD: 20
.40E+01 .60E+01 .40E+01 .20E+01 .80E+01
VARIABLE MEAN STAND. DEV. MINIMUM MAXIMUM
V1 4.7500 2.53138 1.00000 9.00000
V2 5.4500 2.08945 2.00000 9.00000
V3 4.4500 2.28208 .00000 9.00000
V4 4.6500 2.32322 2.00000 9.00000
V5 4.6500 2.36810 1.00000 9.00000
CORRELATION MATRIX
V1 .10000E+01
V2 .42042E+00 .10000E+01
V3 .17538E+00 .61757E+00 .10000E+01
V4 .22597E+00 -.20438E+00 -.27647E+00 .10000E+01
V5 -.37534E+00 -.20051E+00 -.12515E+00 .38792E+00 .10000E+01
1 2 3 4 5
N-MATRIX
V1 20
V2 20 20
V3 20 20 20
V4 20 20 20 20
V5 20 20 20 20 20
1 2 3 4 5
FACTOR ANALYSIS SUMMARY STATISTICS
NUMBER OF CASES 20
NUMBER OF VARIABLES 5
MAX. ITERATIONS FOR COMMUNALITIES 1
MAX. ITERATIONS FOR ROTATION 50
NUMBER OF FACTORS TO BE ROTATED 2
EIGENVALUE CUTOFF CONSTANT 1.000000
UPPER LIMIT ON CORRELATION COEFFICIENT .95000
DIAGONAL ELEMENTS ARE UNALTERED
VARIMAX ROTATION IS PERFORMED
EIGENVALUES
2.08418 1.25547 1.04697 .36381 .24957
CUMULATIVE PROPORTION OF TOTAL VARIANCE
.41684 .66793 .87732 .95009 1.00000
PROPORTION OF VARIANCE PER EIGENVALUE
[Bar chart: proportion of total variance for eigenvalues 1 to 5 (vertical axis: VARIANCE PERCENT, from 0 to .4168). The bar heights correspond to each eigenvalue divided by the number of variables (5): .4168, .2511, .2094, .0728, .0499.]
VARIABLE ESTIMATED FINAL
COMMUNALITY COMMUNALITY
V1 1.000000 .817099
V2 1.000000 .711402
V3 1.000000 .560990
V4 1.000000 .884644
V5 1.000000 .365514
FACTOR MATRIX BEFORE ROTATION
VAR# VARIABLE NAME FACTOR
1 2
1 V1 .55331 .71481
2 V2 .82906 .15512
3 V3 .74201 -.10205
4 V4 -.44063 .83096
5 V5 -.58819 .13983
ORTHOGONAL ROTATION
ITERATION SIMPLICITY
CRITERION
0 -1.095068
1 -1.095877
2 -1.095877
FACTOR - 1 VARIANCE ACCOUNTED FOR: .4168
VARIABLE
2 V2 .82062
3 V3 .74606
5 V5 -.59424
1 V1 .51822
4 V4 -.48016
FACTOR - 2 VARIANCE ACCOUNTED FOR: .2511
VARIABLE
4 V4 .80876
1 V1 .74064
2 V2 .19489
5 V5 .11132
3 V3 -.06618
ROTATED FACTOR MATRIX:
VAR# VARIABLE NAME FACTOR
1 2
1 V1 .51822 .74064
2 V2 .82062 .19489
3 V3 .74606 -.06618
4 V4 -.48016 .80876
5 V5 -.59424 .11132
FACTOR SCORE COEFFICIENTS
VAR# VARIABLE NAME FACTOR
1 2
1 V1 .2377 .5815
2 V2 .3914 .1426
3 V3 .3595 -.0640
4 V4 -.2431 .6509
5 V5 -.2873 .0976
FACTOR ANALYSIS COMPLETE, NORMAL END OF PROGRAM
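The FACTOR SCORE COEFFICIENTS above can be recovered from the ROTATED FACTOR MATRIX. For orthogonally rotated principal components, the regression weights R^-1 A reduce to A(A'A)^-1, which needs only the loadings A. The NumPy sketch below is our reconstruction, not a formula quoted from the BMD08M documentation; it reproduces the printed coefficients to about four decimal places.

```python
import numpy as np

# ROTATED FACTOR MATRIX from the sample output (rows V1..V5).
A = np.array([
    [0.51822, 0.74064],
    [0.82062, 0.19489],
    [0.74606, -0.06618],
    [-0.48016, 0.80876],
    [-0.59424, 0.11132],
])

# Coefficients B = A (A'A)^-1; for principal components this equals R^-1 A.
B = A @ np.linalg.inv(A.T @ A)

# FACTOR SCORE COEFFICIENTS as printed, for comparison.
printed = np.array([
    [0.2377, 0.5815],
    [0.3914, 0.1426],
    [0.3595, -0.0640],
    [-0.2431, 0.6509],
    [-0.2873, 0.0976],
])
print(np.round(B, 4))
print("largest difference from printed:", np.abs(B - printed).max())
```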
FACTOR ANALYSIS Technical Appendix


