Project Title: The Examination for the Professional Practice of Psychology: An Examination of Convergent and Discriminant Validity

Project Lead: Samantha Saldaña, M.S. (PI/mentor: Callahan) 

UNT IRB#: IRB-21-653

Funding Source: Gift from the Lupe Murchison Foundation (Drs. Callahan and Ruggero, co-PIs)

Study Status: Manuscript under review with a peer-reviewed journal

Background to the Study:

The Examination for Professional Practice in Psychology (EPPP) is maintained by the Examination Committee of the Association of State and Provincial Psychology Boards (ASPPB), a non-profit association composed of psychology licensing boards in the United States and Canada. The EPPP serves to assess breadth and depth of knowledge in psychology across eight primary areas: biological bases; cognitive-affective bases; social and cultural bases; growth and lifespan development; assessment and diagnosis; treatment, intervention, and prevention; research and statistics; and ethical/legal/professional issues (ASPPB, 2020). Critically, the exam is documented as having acceptable reliability by both split-half and Kuder-Richardson estimates (Rosen, 2000) and is regularly recalibrated by way of practice analyses. Surveys for practice analyses are typically administered through licensing boards, training organizations, and professional organizations to gain a variety of responses, with the most recent analysis having taken place in 2016 (ASPPB, 2020). The EPPP is currently being expanded into two parts, with Part 1 being the knowledge-based exam described above. The newly developed Part 2 focuses on knowledge of skills and was developed by ASPPB and Pearson VUE to address the need for competency-based evaluation (Callahan et al., 2020).

Foundationally, the review process for the exam is based on ASPPB Examination Committee (ExC) consensus for each item (ASPPB, 2020, pp. 42-43). Items are written by individuals with expertise in each area and then reviewed by a member of the Item Development Committee (IDC) for accuracy, relevance, contribution to public protection, and freedom from bias (ASPPB, 2020, pp. 42-43; Sharpless, 2019). It should be noted that both committees, the ExC and the IDC, are appointed by ASPPB and composed of fellow professionals; the review process is thus largely based on professionals’ participation. However, while test items are created and reviewed by professional psychologists and deemed essential knowledge for competent psychologists, test takers – future potential professional psychologists – report perceiving many questions as irrelevant (Sharpless & Barber, 2009).

EPPP test developers focus extensively on content validity, but concern has been raised that there may be a lack of attention to the need for broader validation (Callahan et al., 2020; Sharpless & Barber, 2009).[1] For example, women pass at slightly higher rates than men (Schaffer et al., 2012), and examinees with higher grade point averages (GPAs) exhibit higher exam scores (Sharpless & Barber, 2013). EPPP scores are also strongly associated with GRE General Test scores (r = .78; Sharpless & Barber, 2013). Notably, this association is stronger than the association between EPPP scores and GRE Psychology Subject Test scores (r = .24; Sharpless & Barber, 2013). These differing relationships across the two GRE tests, along with the association between EPPP scores and GPA, suggest that factors unrelated to knowledge of psychology may be influencing EPPP scores. Among those influences may be a variety of biases.

The EPPP has a documented history of linguistic bias, favoring native English fluency (Callahan et al., 2020; Callahan et al., 2021). In a highly visible example, ASPPB developed a bilingual Spanish EPPP (S-EPPP) as part of an agreement with Puerto Rico when the territory became a member of ASPPB (Law 281-2012). Preparation of the S-EPPP was rushed, stakeholders were not involved, and the validation process was not adequate (Law 183-2015). These missteps resulted in an exam with an unusually high failure rate that negatively affected the workforce (Law 183-2015). As a result, the S-EPPP has since been discontinued (ASPPB, 2016). Rather than fixing the problems, the solution to date has been to simply not require the exam in locations where other languages predominate, including Puerto Rico and Quebec (ASPPB, 2020; Callahan et al., 2021).

Research utilizing the Freedom of Information Act offers further evidence of bias within the existing EPPP. Drawing data from a densely populated and diverse state, Sharpless (2019) found that minority candidates, particularly Black and Hispanic candidates, score lower on the EPPP than their White majority peers. The failure rate for Black and Hispanic applicants was also found to be 2.5 times higher than the rate for White applicants. In fact, minority applicants fell below an 80% pass rate, the threshold for the four-fifths rule (APA, 2020). The four-fifths rule is a conventional rule of thumb for determining whether discrimination is taking place and whether an employer is engaged (intentionally or unwittingly) in systematic discrimination (APA, 2020).

On the side of unwitting discrimination as explanatory, the EPPP has not historically inquired about demographic information prior to administering the exam (Sharpless & Barber, 2009). According to an officer of ASPPB, “Demographic variables other than gender (e.g., age, race, ethnicity) are not requested because of state and provincial prohibitions against asking such information in an environment where an applicant is required to complete this examination to gain entry to the profession” (DeMers, 2009, pp. 350-351; Sharpless & Barber, 2013). Yet ignorance does not excuse poor outcomes. The Standards for Educational and Psychological Testing, developed by the American Educational Research Association (AERA) together with APA and NCME, explicitly task test developers with minimizing construct-irrelevant variance (AERA, APA, & NCME, 2014). Further, ASPPB appears to have subsequently determined it was not actually prohibited from gathering such information, as it began routinely gathering demographic information in recent years. Unfortunately, the question of whether EPPP score variability is accounted for by construct-relevant variance, construct-irrelevant variance, and/or random variance remains unanswered (Sharpless, 2019).

While content validation has been a primary focus for ASPPB, additional validation efforts are prudent and necessary (Sharpless & Barber, 2009). As described by the American Educational Research Association (2014), “Sound testing practice involves careful monitoring of all aspects of the assessment process and appropriate action when needed to prevent undue disadvantages or advantages for some candidates caused by factors unrelated to the construct being assessed” (p. 169). The current study aimed to examine whether candidates’ scores on the EPPP are systematically influenced by non-competency factors. For example, intelligence and previous academic achievement are both positive indicators for successful competency development, though they are not profession-specific competencies in and of themselves (Humphreys et al., 2017; Kuncel et al., 2001).

The current study aimed to examine the construct validity of the existing EPPP via examination of associations between EPPP scores and retrospective/postdictive, convergent, and discriminant measures. Based on the available literature, we hypothesized that EPPP scores would vary systematically by race/ethnicity, with underrepresented minorities obtaining lower EPPP scores than White, non-Hispanic examinees. We further hypothesized that EPPP scores would not significantly correlate with retrospective/postdictive or convergent measures but would correlate significantly with discriminant measures.

Measures:

In addition to demographic information, data were gathered from the following standardized measures.

Trainee Evaluation. Archival Practicum Evaluation Form (PEF) data were drawn for the purpose of examining retrospective/postdictive validity. The PEF aligns with the competency framework presented in the Benchmarks document (Fouad et al., 2009) and was developed as a 52-item questionnaire to be completed by supervisors of internal and external practicum experiences (Price et al., 2017). The PEF yields very high person reliability (r = .99) and high item reliability (r = .92), with no evidence of ceiling or floor effects in many-facet Rasch measurement (an item response theory approach) analyses (Price et al., 2017). Functional competencies are significantly associated with client attrition and treatment outcome scores on a standardized measure of psychotherapy outcome (Dimmick et al., 2022). In the current sample, coefficient alphas for the foundational areas of competency (professionalism; reflective practice/self-assessment/self-care; scientific knowledge and methods; relationships; individual and cultural diversity; ethical legal standards and policy; and interdisciplinary systems) were .93, .91, .94, .91, .95, .95, and .86, respectively. The functional competencies (assessment, intervention, consultation, supervision, management-administration, and advocacy) yielded alphas of .87, .96, .93, .93, .81, and .82, respectively.

The Intern Evaluation Form (IEF) was utilized to capture supervisors’ perceptions of intern competency. The IEF is the last point at which competency is evaluated by supervisors prior to the licensure examination. Its 55 items capture both the foundational and functional competencies included in A Practical Guidebook for the Competency Benchmarks (2012). Of note, creation of the guidebook was specifically funded by a grant from ASPPB (which is also the EPPP developer). Behavioral anchors for readiness for internship and readiness for entry to practice directly correspond to those provided within the linked rating forms associated with each of those levels. Within the current sample, coefficient alphas for the 13 competency scales were .81, .97, .91, .88, .70, .91, .95, .94, .88, .97, .95, .92, and .90.

Inherent to entry into independent practice, licensed psychologists are expected to engage in reflective practice and self-monitoring of competency. Alumni were therefore asked to self-rate their foundational and functional competencies on the same 55 items found in the IEF. Within the current sample, coefficient alphas for the 13 scales were .51, .98, .94, .93, .90, .82, .94, .97, .96, .98, .98, .96, and .93. Item deletions were explored but did not sufficiently improve the internal consistency of the early career professionalism subscale; it was therefore excluded from analyses.
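
The internal consistency values reported throughout this section follow the standard coefficient (Cronbach's) alpha formula. The sketch below is illustrative only; the `items` matrix is hypothetical stand-in data, not study data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item across respondents
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical subscale data: 20 respondents x 8 items rated 1-5.
rng = np.random.default_rng(0)
items = rng.integers(1, 6, size=(20, 8)).astype(float)
print(f"alpha = {cronbach_alpha(items):.2f}")

# Exploring a one-item deletion (as described for the Imagination, Intellect,
# and Liberalism facets): drop one column and recompute alpha.
print(f"alpha without item 0 = {cronbach_alpha(np.delete(items, 0, axis=1)):.2f}")
```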

Examination for Professional Practice in Psychology (EPPP). According to ASPPB, much of the development process focuses on content validity, and the most recent validation study was completed in 2010 (ASPPB, 2022). The number of correct responses on the exam determines a candidate’s score, with scaled scores ranging from 200 to 800 (ASPPB, 2020). A score of 500 is required for doctoral-level practice in most jurisdictions (ASPPB, 2020). Participants provided score reports with their EPPP scaled score. Within the current sample, the obtained mean scaled score was 640.39 (SD = 67.47).

Cognitive Abilities. Cognitive data were gathered utilizing the TestMyBrain (TMB) Neuropsychology Toolkit, a not-for-profit product developed by Harvard Medical School and McLean Hospital (Inter Organizational Practice Committee, n.d.). All tests in the toolkit were developed to parallel well-established neuropsychological measures. Previous research has found that individuals who perform well on standardized tests tend to have strong working memory, processing speed, and abstract reasoning skills (Finn et al., 2014). Based on those findings, four tests were selected for the current study: Matrix Reasoning (abstract reasoning), Digit Span (span of apprehension/working memory), Trail Making (processing speed/working memory), and Verbal Paired Associates (memory). The Digit Span and Matrix Reasoning tests were adapted from measures of the same name in the Wechsler Adult Intelligence Scale—Fourth Edition (WAIS-IV; 2008). The TMB Trail Making Test was adapted from the widely available Trail Making Test (TMT; Goetz, 2003). The Verbal Paired Associates Memory test was adapted from a subtest of the same name within the Wechsler Memory Scale—Fourth Edition (WMS-IV; 2009).

Emotional Intelligence. The Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) consists of 141 digitally administered items that produce a total score as well as four domain scores: the ability to perceive emotions, the ability to communicate feelings or use them in other processes, the ability to understand emotions, and the ability to manage emotions (Mayer et al., 2002). Content, discriminant, and predictive validity have also been documented for the MSCEIT (Brackett & Mayer, 2003; Mayer et al., 1999; Mayer et al., 2002; Pellitteri, 2002; Salovey et al., 2003). Within the current sample, the mean total score (119.46, SD = 14.69) falls within the skilled (above average) range (Mayer et al., 2002).

Openness to Experience (Openness). Conceptualized as a discriminant construct, Openness to Experience was measured via 34 items drawn from the International Personality Item Pool (IPIP). The IPIP Openness to Experience measure strongly correlates (r = .87) with Costa and McCrae’s (1995) NEO domain of the same name but was preferred because it is likely less familiar to participants, as it is not routinely taught in the coursework (e.g., clinically focused assessment courses) completed by the targeted sample. Coefficient alphas for the six facets were .66, .85, .66, .80, .56, and .60, respectively, in the current sample. Following investigation of individual items in each scale, three items were deleted from the Imagination, Intellect, and Liberalism facets (one item per facet) to increase coefficient alphas (Imagination α = .71; Intellect α = .61; Liberalism α = .67). The improved facets were utilized in hypothesis testing.

Findings: 

A Shapiro-Wilk test confirmed that EPPP scores, the dependent variable, were normally distributed in this sample, W(33) = 0.96, p = .222. Analysis of variance revealed significantly higher exam scores for White, non-Hispanic individuals, Welch’s F(3, 29) = 4.31, p = .012, ω² = 0.23.
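
For readers interested in reproducing this style of analysis, a minimal sketch follows, using scipy for the Shapiro-Wilk test and the pingouin package for Welch's ANOVA. The data frame below is hypothetical stand-in data (not the study sample), seeded only with the sample size and descriptive statistics reported in this summary.

```python
import numpy as np
import pandas as pd
import pingouin as pg
from scipy import stats

# Hypothetical stand-in data; the study's N was 33, M = 640.39, SD = 67.47.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "eppp": rng.normal(640, 67, size=33),
    "race_ethnicity": rng.choice(["White", "Black", "Hispanic", "Asian"], size=33),
})

w, p = stats.shapiro(df["eppp"])              # Shapiro-Wilk normality test
print(f"W = {w:.2f}, p = {p:.3f}")

aov = pg.welch_anova(data=df, dv="eppp", between="race_ethnicity")
print(aov[["ddof1", "ddof2", "F", "p-unc"]])  # Welch's F and its p-value
```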

Kendall’s rank correlation analyses indicated that EPPP scores were significantly lower for individuals who were rated higher during practicum (using the PEF) on competency in professionalism (τb = -0.15, p = .008, τb² = 0.02), reflective practice/self-assessment/self-care (τb = -0.12, p = .030, τb² = 0.01), relationships (τb = -0.14, p = .016, τb² = 0.02), individual and cultural diversity (τb = -0.16, p = .006, τb² = 0.03), and ethical legal standards and policy (τb = -0.14, p = .016, τb² = 0.02). No significant associations between EPPP scores and internship competency ratings on the IEF were identified. Associations between early career self-appraisals of competency and EPPP scores were also non-significant.
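
A minimal sketch of this type of rank correlation, assuming hypothetical paired vectors rather than the actual study data:

```python
from scipy import stats

# Hypothetical paired vectors: supervisor PEF ratings and EPPP scaled scores.
pef_professionalism = [4, 5, 3, 4, 2, 5, 3, 4]
eppp_scores = [610, 575, 660, 620, 700, 590, 655, 615]

# scipy's kendalltau computes the tau-b variant by default (adjusts for ties).
tau, p = stats.kendalltau(pef_professionalism, eppp_scores)
print(f"tau-b = {tau:.2f}, p = {p:.3f}, tau-b^2 = {tau**2:.2f}")
```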

With respect to the discriminant emotional and cognitive abilities, findings were mixed. Associations between EPPP scores and emotional intelligence scores were nonsignificant. In contrast, among the cognitive abilities evaluated, only Digit Span scores did not significantly correlate with EPPP scores. Individuals with higher scores on TMB Matrix Reasoning, Trail Making Tests A and B, and Verbal Paired Associates exhibited higher EPPP scores. All associations between EPPP scores and Openness to Experience facet scores were nonsignificant, with the single exception of the Artistic Interests facet: individuals with higher Artistic Interests scored significantly lower on the EPPP, r(30) = -0.32, p = .035, r² = .10.

Tests of differences between correlations revealed significantly stronger correlations between EPPP scores and discriminant measures (TMB Matrix Reasoning, Trail Making Tests A and B, Digit Span Backward, and Artistic Interests) than between exam scores and convergent measures (PEF Professionalism, Reflective Practice, Scientific Methods, and Relationships). Analyses are summarized in the table below.

 

T-tests of Significant Difference for Discriminant and Convergent Measures (each correlated with EPPP scores)

Discriminant Measure          Convergent Measure        t       p      Finding
TMB Matrix Reasoning          PEF Professionalism       76.60   .008   Divergent measure stronger
TMB Matrix Reasoning          PEF Reflective Practice   75.08   .012   Divergent measure stronger
TMB Matrix Reasoning          PEF Scientific Methods    74.24   .015   Divergent measure stronger
TMB Matrix Reasoning          PEF Relationships         75.97   .009   Divergent measure stronger
TMB Trails A                  PEF Professionalism       71.11   .035   Divergent measure stronger
TMB Trails A                  PEF Reflective Practice   69.61   .049   Divergent measure stronger
TMB Trails A                  PEF Scientific Methods    68.76   .061   Divergent measure stronger
TMB Trails A                  PEF Relationships         70.50   .040   Divergent measure stronger
TMB Trails B                  PEF Professionalism       70.96   .036   Divergent measure stronger
TMB Trails B                  PEF Reflective Practice   69.45   .052   Divergent measure stronger
TMB Trails B                  PEF Scientific Methods    68.61   .063   Divergent measure stronger
TMB Trails B                  PEF Relationships         70.35   .042   Divergent measure stronger
TMB Digit Span Backward       PEF Professionalism       70.81   .037   Divergent measure stronger
TMB Digit Span Backward       PEF Reflective Practice   69.30   .054   Divergent measure stronger
TMB Digit Span Backward       PEF Scientific Methods    68.46   .065   Divergent measure stronger
TMB Digit Span Backward       PEF Relationships         70.20   .043   Divergent measure stronger
Openness Artistic Interests   PEF Professionalism       69.62   .050   Divergent measure stronger
Openness Artistic Interests   PEF Reflective Practice   56.37   .524   Not significant
Openness Artistic Interests   PEF Scientific Methods    57.25   .468   Not significant
Openness Artistic Interests   PEF Relationships         55.54   .579   Not significant
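
The manuscript under review details the specific test of differences between correlations; as one illustrative possibility, a common procedure for comparing two dependent correlations that share a variable (here, EPPP scores) is Williams' t (Steiger, 1980). The sketch below is a hypothetical illustration of that general procedure, not the study's actual computation, and all input values are made up.

```python
import math
from scipy import stats

def williams_t(r12: float, r13: float, r23: float, n: int):
    """Williams' t (Steiger, 1980) for two dependent correlations sharing
    variable 1 (here, EPPP): r12 = EPPP vs. measure A, r13 = EPPP vs.
    measure B, r23 = measure A vs. measure B. Degrees of freedom = n - 3."""
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23  # |R|, determinant of the correlation matrix
    rbar = (r12 + r13) / 2
    t = (r12 - r13) * math.sqrt(
        ((n - 1) * (1 + r23))
        / (2 * det * (n - 1) / (n - 3) + rbar**2 * (1 - r23) ** 3)
    )
    df = n - 3
    p = 2 * stats.t.sf(abs(t), df)  # two-tailed p-value
    return t, df, p

# Hypothetical usage: a discriminant correlation (r12) compared against a
# convergent correlation (r13), with r23 the correlation between the measures.
t, df, p = williams_t(r12=0.45, r13=-0.15, r23=0.05, n=33)
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
```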

Implications:

Findings from the current study strongly suggest that factors other than professional competency influence candidates’ scores on the gatekeeping exam required for licensure as a psychologist. In the current sample, measures of discriminant constructs demonstrated significant positive correlations with obtained scores on the national licensure exam, the EPPP. More specifically, scores on multiple neuropsychological tests gathered during the study (i.e., Matrix Reasoning, Trails A and B, Verbal Paired Associates) significantly positively correlated with EPPP exam scores. The current study offers further support for the premise offered by Sharpless and Barber (2013) that knowledge of psychology may not be as important to EPPP performance as test taking or other abilities.

Based on the extant literature demonstrating a positive association between clinician emotional intelligence and positive client outcomes (Rieck & Callahan, 2013), the current findings were expected to reveal a positive association between professional competencies and emotional intelligence, providing convergent validity support for the EPPP. Emotional intelligence scores in the current sample are similar to, though somewhat higher than, those found in Rieck and Callahan (2013). Yet, in this sample, EPPP scores did not significantly correlate with any of the MSCEIT’s four emotional intelligence domains or with the total score. In contrast, MSCEIT scores were significantly positively correlated with multiple PEF scores. Approximately 32% of the correlations between the two measures (MSCEIT and PEF) were significant, indicating that individuals with higher emotional intelligence were rated higher by supervisors on multiple competency scales. This observation supports the MSCEIT as a convergent measure of competency, so some relationship to EPPP scores should have been evident. Contrary to expectations, the associations between EPPP scores and discriminant measures were typically (in 82% of comparisons) significantly stronger than the associations between EPPP scores and convergent measures.

Importantly, the EPPP is intended to reflect candidates’ readiness for professional practice (ASPPB, 2017). As stated in the background description, there is speculation that the exam is not sufficiently relevant to practice (Sharpless & Barber, 2009). Theoretically, candidates for licensure should possess competency across broad swaths of foundational knowledge and functioning: scientific orientation; assessment and intervention; relational skills; professionalism; ethical practice; and collaboration, consultation, and supervision (ASPPB, 2017). Each of these areas is captured by the competency form (PEF) used in the current study (Price et al., 2017), and the measure was utilized to address whether the exam captures relevance to the practice of psychology. However, results indicate that ratings of competency did not influence exam scores. Individuals rated as more competent by supervisors, and those with higher self-appraisals of competency, did not exhibit better performance on the licensing exam. In other words, participants’ exam scores were not significantly positively related to competency ratings.

Notably, pre-internship competency ratings for professionalism, reflective practice/self-assessment/self-care, relationships, individual and cultural diversity, and ethical legal standards and policy were significantly negatively related to exam scores. In other words, participants rated higher by supervisors on these competency subscales scored lower on the EPPP. Foundational professionalism captures basic integrity, deportment, accountability, concern for others’ welfare, and professional identity (Price et al., 2017; Fouad et al., 2009); reflective practice/self-assessment/self-care captures practice conducted with personal and professional self-awareness and reflection; relationships captures the ability to relate effectively and meaningfully with individuals, groups, and/or communities; individual and cultural diversity measures awareness, sensitivity, and skill in working professionally with diverse individuals, groups, and communities; and ethical legal standards and policy captures application of ethical concepts and awareness of legal issues (Price et al., 2017; Fouad et al., 2009).

Recent research has supported use of the PEF to capture functional competency development across training levels (Price et al., 2017; Yanouri et al.). Additionally, Dimmick and colleagues demonstrated the concurrent, convergent, and external validity of PEF ratings as measures of competency: PEF ratings (i.e., clinician competency) were significant predictors of treatment outcomes and attrition among clients at a large outpatient clinic (Dimmick et al., 2022), seemingly emphasizing the importance of aligning competency determination with external referents.

Additionally, in the current sample, White participants exhibited significantly higher exam scores than participants of other races/ethnicities. Notably, ethnicity was not correlated with measures of pre-internship, internship, or early career appraisals of competency during preliminary exploratory analyses. Thus, while minority students are likely meeting program and professional standards to practice psychology, the EPPP is likely posing an additional, needless barrier to professional practice and facilitating possible systematic discrimination (Sharpless & Barber, 2009; Ortiz, 2021). Because demographic information has not historically been gathered prior to administration of the exam, information regarding demographic variance in EPPP scores is of utmost importance for determining whether a significant barrier is impeding inclusion in the field of professional psychology. Considering the national and worldwide challenges currently affecting mental health, the need for mental health professionals will only grow in the upcoming years.

For More Information:

Additional details are provided and discussed in a manuscript currently under review for consideration of publication in a peer-reviewed journal. 


[1] Where validity is associated with accuracy, validation is concerned with appropriateness (Callahan et al., 2020).