Exploratory Factor Analysis (EFA): The Complete Process Explained
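Based on the treatment of EFA in Chapter 3 of Multivariate Data Analysis, 8th edition. Made for personal study and shared with everyone; criticism of any shortcomings is welcome. The book has been a bestseller for many years and is a classic handbook of multivariate data analysis.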
Edited 2021-12-28 19:06:52
EFA
Differences
EFA
take what the data give you
Does not set any a priori constraints on the estimation of components or on the number of components to be extracted
Chapter 3
CFA
The researcher has a preconceived structure of the data based on theory
To assess the degree to which the data meet the expected theoretical structure
Chapters 9 and 10
confirmatory composite analysis
Used in PLS; Chapter 13
Stage 1: objectives of Factor Analysis
specifying the unit of analysis
Variables
Cases
Cluster analysis is commonly used for this type of problem
achieving data summarization and/or data reduction
data summarization
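Stages 1-6
Minimize the number of factors while retaining as much information as possible
The factors serve only to describe the variable set, not to predict anything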
data summarization without interpretation
principal components analysis
data summarization with interpretation
common factor analysis / principal components analysis
Used in scale development
there is a prespecified structure that is hopefully revealed in the analysis
conceptual definitions
reliability
validity
Data reduction
Stage 7
Minimize the number of values (variables) while retaining as much information as possible
summated scales
variable selection
using exploratory factor analysis results with other multivariate techniques
Stage 2: Designing an Exploratory Factor Analysis
Variable selection
Metric variables
Dummy variables
If all variables are dummies, use Boolean factor analysis
Sample
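Absolute sample size: no fewer than 50; preferably 100 or more; 200 or more
Cases-to-variables ratio: 5:1; 10:1; 20:1 (increasingly strict)
Pilot studies may use somewhat smaller samples
Sample size by communality level: communalities of .70 or above: 100; communalities of .40 to .70: 200; communalities below .40: 400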
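The sample-size rules of thumb above can be gathered into one check. This is a minimal sketch: `check_sample_size` is a hypothetical helper, and applying the communality rule to the lowest expected communality is an assumption, not something stated in the notes.

```python
def check_sample_size(n_cases, n_vars, lowest_communality):
    """Hypothetical helper: compare a design with the rules of thumb above.

    Assumption: the communality rule is applied to the lowest expected
    communality among the variables.
    """
    ratio = n_cases / n_vars
    if lowest_communality >= 0.70:
        needed = 100
    elif lowest_communality >= 0.40:
        needed = 200
    else:
        needed = 400
    return {
        "at_least_50": n_cases >= 50,
        "cases_per_variable": ratio,
        "meets_5_to_1": ratio >= 5,
        "meets_10_to_1": ratio >= 10,
        "meets_20_to_1": ratio >= 20,
        "n_needed_for_communalities": needed,
        "meets_communality_rule": n_cases >= needed,
    }


print(check_sample_size(n_cases=180, n_vars=15, lowest_communality=0.55))
```

Here 180 cases for 15 variables passes the 10:1 ratio but falls short of the 200 cases suggested for communalities between .40 and .70.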
R-type vs. Q-type factor analysis
R: correlations among variables
Q: correlations among respondents
Cluster analysis: uses a distance-based similarity measure (see Fig. 3.4)
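A quick way to see the R/Q distinction is to correlate the same data matrix by columns and by rows. A sketch with simulated data, assuming NumPy; `np.corrcoef` treats rows as variables by default, so `rowvar=False` gives the R-type matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # simulated: 6 respondents (rows) x 4 variables (columns)

R = np.corrcoef(X, rowvar=False)  # R-type: correlations among the 4 variables
Q = np.corrcoef(X)                # Q-type: correlations among the 6 respondents
print(R.shape, Q.shape)  # (4, 4) (6, 6)
```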
Stage 3: Assumptions in Exploratory Factor Analysis
This step checks the preconditions. If they are not met, the variables have no underlying structure and there is no need to run EFA.
Conceptual Issues
It is the researcher's responsibility to ensure that the extracted factors are conceptually meaningful
ensure that the sample is homogeneous
Statistical Issues
Assuming the researcher has met the conceptual requirements for the variables included in the analysis, the next step is to ensure that the variables are sufficiently intercorrelated to produce representative factors.
Tests of normality and related assumptions: see Chapter 2
Overall Measures of Intercorrelation
Visual inspection of the correlation matrix
A substantial number of correlations should exceed .30; if only a few do, EFA is probably inappropriate
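The visual-inspection check can be made semi-numeric by counting how many correlations exceed .30. A sketch assuming NumPy; the helper name `share_above` is hypothetical, and the notes give no exact cut-off for what counts as "only a few".

```python
import numpy as np

def share_above(R, threshold=0.30):
    """Fraction of off-diagonal correlations whose absolute value exceeds threshold."""
    iu = np.triu_indices_from(R, k=1)  # each variable pair counted once
    return float(np.mean(np.abs(R[iu]) > threshold))


R = np.array([[1.00, 0.45, 0.10],
              [0.45, 1.00, 0.35],
              [0.10, 0.35, 1.00]])
print(share_above(R))  # 2 of the 3 pairs exceed .30
```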
Partial correlation: the correlation that remains unexplained when the effects of other variables are taken into account (shown in the anti-image correlation matrix)
High partial correlations (> .70): EFA is inappropriate
Small partial correlations: EFA is appropriate
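Partial correlations controlling for all other variables can be read off the inverse of the correlation matrix. A minimal NumPy sketch; SPSS's anti-image correlation matrix reports the off-diagonal values with the sign reversed, which does not affect the size check against .70.

```python
import numpy as np

def partial_correlations(R):
    """Partial correlation of every pair of variables, controlling for all the others.

    Uses P = inv(R): partial_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    return partial


R = np.array([[1.00, 0.45, 0.10],
              [0.45, 1.00, 0.35],
              [0.10, 0.35, 1.00]])
pc = partial_correlations(R)
off_diag = pc[np.triu_indices_from(pc, k=1)]
print(np.round(pc, 3))
print("any partial correlation above .70:", bool(np.any(np.abs(off_diag) > 0.70)))
```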
Bartlett test of sphericity: a significant result shows significant correlations among at least some of the variables; the test becomes more sensitive as sample size increases
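Bartlett's statistic uses the determinant of the correlation matrix: chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom. A sketch assuming NumPy and SciPy; running it at two sample sizes on the same hypothetical matrix shows the sensitivity to n.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 says the correlation matrix is an identity matrix."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)              # ln|R|, computed stably
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)


R = np.array([[1.00, 0.45, 0.10],
              [0.45, 1.00, 0.35],
              [0.10, 0.35, 1.00]])
for n in (30, 200):
    stat, df, p_value = bartlett_sphericity(R, n)
    print(n, round(stat, 2), df, p_value)
```

The same correlations produce a much larger statistic (and smaller p-value) at n = 200 than at n = 30.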
measure of sampling adequacy (MSA)
Must exceed .50; otherwise, delete variables based on their variable-specific MSA values
Variable-Specific Measures of Intercorrelation
individual variable MSA
Delete one variable at a time, removing only the one with the lowest MSA, then recompute; repeat until all values exceed .50
EFA
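The overall MSA (the Kaiser-Meyer-Olkin statistic) compares squared correlations with squared partial correlations, and the one-at-a-time deletion rule above is a simple loop. A sketch with simulated data; `msa` and `drop_low_msa` are hypothetical helper names, and stopping once two variables remain is an added safeguard.

```python
import numpy as np

def msa(R):
    """Overall MSA (Kaiser-Meyer-Olkin) and the variable-specific MSA values."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    Q = -P / np.outer(d, d)             # partial correlations
    r2, q2 = R ** 2, Q ** 2
    np.fill_diagonal(r2, 0.0)           # only off-diagonal terms enter the ratio
    np.fill_diagonal(q2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + q2.sum())
    return overall, per_variable


def drop_low_msa(R, names, cutoff=0.50):
    """Delete the single lowest-MSA variable, recompute, repeat until all exceed cutoff."""
    names = list(names)
    while len(names) > 2:               # safeguard: stop before the matrix becomes trivial
        _, per_variable = msa(R)
        worst = int(np.argmin(per_variable))
        if per_variable[worst] > cutoff:
            break
        R = np.delete(np.delete(R, worst, axis=0), worst, axis=1)
        del names[worst]
    return R, names


rng = np.random.default_rng(1)
f = rng.normal(size=(300, 1))                        # simulated common factor
X = np.hstack([f + 0.6 * rng.normal(size=(300, 4)),  # four indicators of f
               rng.normal(size=(300, 1))])           # one unrelated variable
R = np.corrcoef(X, rowvar=False)
overall, per_variable = msa(R)
R_kept, kept = drop_low_msa(R, ["x1", "x2", "x3", "x4", "noise"])
print(round(overall, 3), np.round(per_variable, 3), kept)
```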
Stage 4: Deriving Factors and Assessing Overall Fit
Selecting The Factor Extraction Method
Partitioning the Variance of a Variable
Common variance: variance shared with the other variables; its total is the communality, which increases as a variable correlates more highly with one or more other variables (also reflected in the MSA)
Unique variance
Specific variance: associated uniquely with a single variable
Error
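When the factors are orthogonal, a variable's communality is the sum of its squared loadings and the remainder is unique variance; loadings alone cannot separate specific variance from error. A sketch with hypothetical standardized loadings.

```python
import numpy as np

# Hypothetical standardized loadings from an orthogonal solution: 4 variables x 2 factors
loadings = np.array([[0.80, 0.10],
                     [0.75, 0.20],
                     [0.15, 0.70],
                     [0.05, 0.60]])

communality = (loadings ** 2).sum(axis=1)   # common variance of each variable
unique = 1.0 - communality                  # specific + error variance, not separable here
for j, (h2, u2) in enumerate(zip(communality, unique), start=1):
    print(f"x{j}: communality={h2:.4f}, unique={u2:.4f}")
```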
Common Factor Analysis Versus Principal Component Analysis
The choice depends on (1) the objectives of the factor analysis and (2) the amount of prior knowledge about the variance in the variables
Principal component analysis
summarize most of the original information (variance) in a minimum number of factors for prediction purposes
Prior knowledge suggests that specific and error variance represent a relatively small proportion of the total variance
results are used as a preliminary step in the scale development process
Common factor analysis
identify the latent dimensions or constructs
as typified in the scale development process
little knowledge about the amount of specific and error variance
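In practice the two approaches differ in what sits on the diagonal of the correlation matrix before factoring. A sketch on a hypothetical matrix: principal components keep the 1s (total variance), while this common factor variant, one-step principal factoring, substitutes squared multiple correlations (SMCs) as initial communality estimates; other common factor methods exist.

```python
import numpy as np

def first_k_loadings(M, k):
    """Loadings = eigenvectors scaled by sqrt(eigenvalue), for the k largest eigenvalues."""
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# Principal components: 1s on the diagonal, so total variance is analyzed.
pca_loadings = first_k_loadings(R, 1)

# Common factor (one-step principal factoring): SMCs on the diagonal,
# so only the variance shared with other variables is analyzed.
smc = 1 - 1 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
cf_loadings = first_k_loadings(R_reduced, 1)

print("PCA communalities:          ", np.round((pca_loadings ** 2).sum(axis=1), 3))
print("common factor communalities:", np.round((cf_loadings ** 2).sum(axis=1), 3))
```

Because the reduced matrix leaves out unique variance, its first eigenvalue (the sum of squared loadings) is smaller than the PCA one.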
Criteria for the number of factors to extract
Conceptual foundation (How many factors should be in the structure?) and empirical evidence (How many factors can be reasonably supported?)
A Priori Criterion: the number of factors is already known
Latent Root Criterion
Retain factors with latent roots (eigenvalues) greater than 1
Most reliable when the number of variables is between 20 and 50 and communalities exceed .40
Because it is so simple, it is usually applied as a first step
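The latent root criterion reduces to counting eigenvalues of the correlation matrix above 1. A minimal NumPy sketch on a constructed matrix with two independent pairs of correlated variables (eigenvalues 1.8, 1.8, 0.2, 0.2).

```python
import numpy as np

def n_factors_latent_root(R):
    """Latent root criterion: count the eigenvalues of R that are greater than 1."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))


block = np.array([[1.0, 0.8], [0.8, 1.0]])
R = np.block([[block, np.zeros((2, 2))],
              [np.zeros((2, 2)), block]])   # two independent pairs of variables
print(n_factors_latent_root(R))  # 2
```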
Percentage of Variance Criterion
natural sciences: at least 95 percent
Social sciences: 60 percent or even less; there is no fixed rule
Variant: select enough factors to achieve a prespecified communality for each of the variables
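The percentage-of-variance criterion keeps adding factors until the cumulative share of total variance (eigenvalues divided by their sum) reaches a target. A sketch; the 0.60 default mirrors the social-science guideline above and is only an assumption.

```python
import numpy as np

def n_factors_for_variance(R, target=0.60):
    """Fewest factors whose cumulative share of total variance reaches `target`."""
    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]      # largest first
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    n = int(np.searchsorted(cumulative, target)) + 1        # first position reaching target
    return min(n, len(eigenvalues)), cumulative


block = np.array([[1.0, 0.8], [0.8, 1.0]])
R = np.block([[block, np.zeros((2, 2))],
              [np.zeros((2, 2)), block]])
n, cumulative = n_factors_for_variance(R)
print(n)  # 2
```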
Scree Test Criterion
Plot the latent roots against the number of factors and look for the elbow; the factor at the elbow may or may not be retained
Parallel Analysis
Factors above the threshold established by parallel analysis
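Parallel analysis compares each observed eigenvalue with eigenvalues from random normal data of the same size. A sketch under stated assumptions: the threshold is the 95th percentile of 100 simulations (the mean is another common choice), and retention stops at the first factor that falls below it.

```python
import numpy as np

def parallel_analysis(X, n_iter=100, quantile=95, seed=0):
    """Count leading factors whose eigenvalues exceed those of random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        Z = rng.normal(size=(n, p))             # random data, same n and p
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.percentile(random_eigs, quantile, axis=0)
    above = observed > threshold
    n_keep = p if above.all() else int(np.argmin(above))  # first position that falls below
    return n_keep, observed, threshold


rng = np.random.default_rng(42)
F = rng.normal(size=(500, 2))                   # two simulated factors
L = np.array([[0.8, 0], [0.8, 0], [0.8, 0], [0, 0.8], [0, 0.8], [0, 0.8]])
X = F @ L.T + 0.6 * rng.normal(size=(500, 6))
n_keep, observed, threshold = parallel_analysis(X)
print(n_keep, np.round(observed, 2), np.round(threshold, 2))
```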
Heterogeneity of the Respondents
Alternatives To Principal Components And Common Factor Analysis
Factor Analysis of Categorical Data
Variable Clustering
Special cases; see pp. 164-166 for details
Stage 5: interpreting the Factors
1. Estimate the Factor Matrix
Factor extraction: estimating the factors and loadings
principal components
common factor
principal axis factor/iterated principal factor analysis
alpha factoring and image factoring
(Not sure what these are.)
Variables whose communality estimates exceed 1.0 should be deleted
Decide using the diagnostic information provided by the MSA
Watch for very high bivariate correlations
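Iterated principal axis factoring repeats the diagonal substitution until the communalities stabilize, which is also where estimates above 1.0 (Heywood cases) show up. A sketch on a hypothetical one-factor correlation matrix; the function name, the SMC starting values, and the tolerance are assumptions.

```python
import numpy as np

def principal_axis(R, k, max_iter=500, tol=1e-6):
    """Iterated principal axis factoring on correlation matrix R, extracting k factors."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))        # initial communalities: SMCs
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # replace 1s with communality estimates
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:k]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    heywood = np.flatnonzero(h2 > 1.0)            # communality > 1.0: candidates for deletion
    return loadings, h2, heywood


lam = np.array([0.9, 0.8, 0.7])                   # hypothetical one-factor loadings
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
loadings, h2, heywood = principal_axis(R, k=1)
print(np.round(h2, 3), heywood)
```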
2. Factor Rotation
Rotation simplifies the factor structure (i.e., each variable loads highly on only one factor)
Orthogonal Versus Oblique Rotation
Orthogonal
Axes are maintained at 90 degrees; factors are uncorrelated
Well developed and available in all major software
factor scores are orthogonal which may be useful for other multivariate data analysis
Oblique
Axes are not constrained to 90 degrees; factors may be correlated
Less well developed; still somewhat controversial
Theoretically more accurate, because the underlying dimensions are seldom uncorrelated
Orthogonal Rotation Methods
Conclusion: use VARIMAX
VARIMAX
simplifying the columns of the factor matrix
give a clearer separation of the factors
more invariant
the most widely used
QUARTIMAX
simplify the rows of a factor matrix
Has not been very successful at simplifying factor structures
EQUIMAX
A compromise between the two; not widely used
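VARIMAX can be computed with Kaiser's original scheme: rotate each pair of factors by the angle that maximizes the criterion for that pair, and sweep until no pair moves. A sketch on raw loadings; statistical packages usually apply Kaiser row normalization first, which this version omits. The example hides a simple structure behind a 30-degree turn.

```python
import numpy as np

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw VARIMAX via Kaiser's pairwise planar rotations (no row normalization)."""
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    for _ in range(max_sweeps):
        largest_turn = 0.0
        for j in range(k - 1):
            for l in range(j + 1, k):
                x, y = L[:, j], L[:, l]
                u, v = x ** 2 - y ** 2, 2 * x * y
                A, B = u.sum(), v.sum()
                C, D = (u ** 2 - v ** 2).sum(), 2 * (u * v).sum()
                # angle that maximizes the varimax criterion for this pair of columns
                phi = np.arctan2(D - 2 * A * B / p, C - (A ** 2 - B ** 2) / p) / 4
                c, s = np.cos(phi), np.sin(phi)
                L[:, j], L[:, l] = c * x + s * y, -s * x + c * y  # orthogonal planar turn
                largest_turn = max(largest_turn, abs(phi))
        if largest_turn < tol:
            break
    return L


simple = np.array([[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]])
angle = np.deg2rad(30)
turn = np.array([[np.cos(angle), -np.sin(angle)],
                 [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ turn                   # simple structure hidden by a 30-degree turn
rotated = varimax(unrotated)
print(np.round(rotated, 3))
```

Each planar turn is orthogonal, so communalities (row sums of squared loadings) are unchanged by the rotation.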
Oblique Rotation Methods
IBM SPSS provides OBLIMIN
SAS has PROMAX and ORTHOBLIQUE
If interpretation is important, performing both rotational methods provides useful information on the underlying structure of the variables and the impact of orthogonality on the interpretation of the factors.
3. Factor Interpretation and Respecification
For four possible reasons, the analysis may need to be redone starting from Stage 4
Judging the Significance of Factor Loadings
Ensuring Practical Significance
Practical rather than statistical significance, i.e., less strict
Factor loadings less than ±.10 can be considered equivalent to zero for purposes of assessing simple structure
Factor loadings in the range of ±.30 to ±.40 are considered to meet the minimal level for interpretation of structure.
Loadings ±.50 or greater are considered practically significant.
Loadings exceeding ±.70 are considered indicative of well-defined structure and are the goal of any factor analysis
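The practical-significance guidelines can be applied loading by loading. A sketch; the notes leave the .40 to .50 range unlabeled, so grouping it with the minimal level, and the wording of each label, are assumptions.

```python
def loading_band(loading):
    """Label a factor loading using the practical-significance guidelines above."""
    a = abs(loading)                 # the guidelines use +/- values
    if a > 0.70:
        return "indicative of well-defined structure"
    if a >= 0.50:
        return "practically significant"
    if a >= 0.30:
        return "minimal level for interpretation"
    if a < 0.10:
        return "equivalent to zero"
    return "below the minimal interpretive level"


for value in (0.82, -0.55, 0.35, 0.20, 0.05):
    print(value, loading_band(value))
```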
Assessing Statistical Significance
Very strict, even stricter than for conventional correlations; the criteria can be relaxed somewhat from this baseline
Interpreting a Factor Matrix
Step 1: Examine the Factor Matrix of Loadings
Rotated & unrotated
Judge significance using the criteria above
factor pattern matrix & factor structure matrix
Oblique rotation only; the former is usually reported
Step 2: Identify the Significant Loading(s) for Each Variable
simple structure solution
cross-loading
See p. 175 for the conditions for deleting problematic variables
Step 3: Assess the Communalities of the Variables
Check that each variable has at least one significant loading
Each variable's communality should exceed .50
Step 4: Respecify the Factor Model if Needed
(a) a variable has no significant loadings
(b) even with a significant loading, a variable’s communality is deemed too low
(c) a variable has a cross-loading
Many remedies exist for these problems (see p. 176)
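Conditions (a) to (c) can be flagged mechanically before choosing a remedy. A sketch assuming an orthogonal solution (so communality equals the sum of squared loadings), a significance cut-off of .50 taken from the practical guidelines, and the .50 communality floor from Step 3; `respecification_flags` is a hypothetical helper.

```python
import numpy as np

def respecification_flags(loadings, cutoff=0.50, min_communality=0.50):
    """Indices of variables meeting conditions (a)-(c) of Step 4."""
    L = np.asarray(loadings, dtype=float)
    n_sig = (np.abs(L) >= cutoff).sum(axis=1)       # significant loadings per variable
    communality = (L ** 2).sum(axis=1)
    return {
        "no_significant_loading": np.where(n_sig == 0)[0].tolist(),           # (a)
        "low_communality": np.where(communality < min_communality)[0].tolist(),  # (b)
        "cross_loading": np.where(n_sig > 1)[0].tolist(),                      # (c)
    }


L = [[0.80, 0.10],
     [0.30, 0.20],
     [0.60, 0.55],
     [0.10, 0.75]]
print(respecification_flags(L))
```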
Step 5: Label the Factors
Stage 6: Validation of Exploratory Factor Analysis
Use of Replication or A Confirmatory Perspective
a split sample in the original dataset or with a separate sample
Not very practical; CFA can be used instead
confirmatory factor analysis (CFA)
Assessing Factor Structure Stability
Increase the cases-to-variables ratio to help ensure a stable factor structure
Split the sample, run the analysis on each part, and compare the results
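One way to compare split-sample results is Tucker's congruence coefficient between the loading vectors from each half. This index is a swapped-in technique, not one named in the notes; the sketch uses simulated one-factor data and first-component loadings.

```python
import numpy as np

def tucker_congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors (1 = identical pattern)."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def first_factor_loadings(X):
    """Loadings on the first principal component of X's correlation matrix."""
    vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    return vecs[:, -1] * np.sqrt(vals[-1])      # eigh sorts eigenvalues ascending


rng = np.random.default_rng(7)
f = rng.normal(size=(400, 1))
X = 0.8 * f + 0.6 * rng.normal(size=(400, 5))  # simulated one-factor data
half_a, half_b = np.array_split(rng.permutation(len(X)), 2)
phi = tucker_congruence(first_factor_loadings(X[half_a]),
                        first_factor_loadings(X[half_b]))
print(round(abs(phi), 3))   # an eigenvector's sign is arbitrary, so compare |phi|
```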
Detecting Influential Observations
Chapter 2 the identification of outliers
Chapter 5 the influential observations in regression
Stage 7: Data Reduction (Additional Uses of Exploratory Factor Analysis Results)
Differences
Stages 1-6, data summarization: the analysis can stop here, or serve as an initial examination of the data preceding a confirmatory factor analysis
Stage 7, data reduction: form new scales
Select the most representative variable of each factor
Form new scales: summated scales; factor scores
Selecting Surrogate Variables For Subsequent Analysis
(Skipped)
Creating Summated Scales
The example on p. 199 is worth reading
Formed by combining several individual variables into a single composite measure
Definition
Reducing Measurement Error
Represent Multiple Aspects of a Concept in a Single Measure
Advantages
scale development
conceptual definition
Content validity / face validity
ratings by expert judges
pretests with multiple subpopulations
dimensionality
assess unidimensionality
Each variable has a high loading on only one factor, i.e., no cross-loadings
EFA
CFA
Reliability
Test-retest: consistency between measurements at different time points
Internal consistency
Single items
Item-to-total correlations should exceed .50, and inter-item correlations should exceed .30
Cronbach’s alpha
.70 in general
.60 in exploratory research
Alpha rises as the number of items grows, so long scales need stricter standards; more than 10 items counts as many
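The internal-consistency statistics above can be computed directly from an item matrix. A sketch with hypothetical scores; "item-to-total" is taken here as the corrected version (each item against the sum of the other items), which is an assumption.

```python
import numpy as np

def cronbach_alpha(items):
    """items: n_respondents x k_items scores for one summated scale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()       # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the total score
    return k / (k - 1) * (1 - item_var / total_var)


def item_total_correlations(items):
    """Corrected item-to-total correlations (item vs. sum of the other items)."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])


scores = np.array([[4, 5, 4, 3],
                   [2, 2, 3, 2],
                   [5, 4, 5, 5],
                   [3, 3, 2, 3],
                   [4, 4, 4, 5]])   # hypothetical: 5 respondents x 4 items
alpha = cronbach_alpha(scores)
itc = item_total_correlations(scores)
inter = np.corrcoef(scores, rowvar=False)[np.triu_indices(4, k=1)]
print(round(alpha, 3), np.round(itc, 2), np.round(inter, 2))
```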
CFA Measures
composite reliability
the average variance extracted
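For standardized loadings λ, composite reliability is (Σλ)² / [(Σλ)² + Σ(1 - λ²)] and the average variance extracted is the mean of λ². A sketch with hypothetical loadings; the notes give no thresholds for either measure.

```python
import numpy as np

def composite_reliability(loadings):
    lam = np.asarray(loadings, dtype=float)
    error = 1 - lam ** 2                    # error variance of each standardized indicator
    return lam.sum() ** 2 / (lam.sum() ** 2 + error.sum())


def average_variance_extracted(loadings):
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))


lam = [0.7, 0.8, 0.9]                       # hypothetical standardized loadings
print(round(composite_reliability(lam), 3), round(average_variance_extracted(lam), 3))
```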
Construct validity
Convergent Validity
Discriminant Validity
Nomological Validity
For hands-on procedures, see the other sources cited on p. 182 and the example on p. 199
Calculating Summated Scales
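A summated scale is usually the sum or the average of its items. A sketch that averages them; reverse-scoring negatively worded items and the 1-7 response range are assumptions added for illustration.

```python
import numpy as np

def summated_scale(items, reverse=(), scale_min=1, scale_max=7):
    """Average item score per respondent, after reverse-scoring the listed item columns."""
    items = np.array(items, dtype=float)            # copy, so the input is not modified
    for j in reverse:
        items[:, j] = scale_min + scale_max - items[:, j]
    return items.mean(axis=1)


responses = [[1, 7], [7, 1]]                        # item 2 is negatively worded (hypothetical)
print(summated_scale(responses, reverse=(1,)))
```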
Computing factor scores
Drawback: difficult to replicate, although some software, such as PROC SCORE in SAS, can address this
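A common way to compute factor scores is the regression method: standardized data times R⁻¹ times the loadings. Keeping the resulting weight matrix is what allows the same scores to be reproduced on new data. A sketch with simulated data and principal-component loadings; this is one method among several, not necessarily the one a given package uses.

```python
import numpy as np

def regression_factor_scores(X, loadings):
    """Regression-method factor scores: Z @ inv(R) @ loadings."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized data
    R = np.corrcoef(X, rowvar=False)
    weights = np.linalg.solve(R, loadings)             # score coefficients; store to reuse
    return Z @ weights, weights


rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4)) + rng.normal(size=(200, 1))   # correlated simulated data
vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
loadings = vecs[:, ::-1][:, :2] * np.sqrt(vals[::-1][:2])   # two principal components
scores, weights = regression_factor_scores(X, loadings)
print(scores.shape, np.round(weights, 3))
```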
Selecting Among the Three Methods (p. 184)