Cite as: Archiv EuroMedica. 2025. 15; 6. DOI 10.35630/2025/15/Iss.6.611
Background: Construct validity is essential for psychiatric measurement tools that inform diagnostic decisions, treatment planning, and forensic evaluations. The four clinician-rated instruments examined in this review (MacCAT-T, PCL-R, PANSS, and CAPS) represent distinct theoretical constructs that require a clear correspondence between their conceptual domains and empirical performance.
Aims: To critically evaluate the construct validity of MacCAT-T, PCL-R, PANSS, and CAPS by examining their theoretical grounding, consistency of empirical findings, stability of factor structures, and contextual limitations relevant to clinical use.
Methods: A structured narrative review was performed. One hundred forty-five publications were identified, and sixty-two met predefined inclusion criteria. Studies were screened for psychometric relevance and appraised qualitatively, with a focus on construct representation, factor structure, reliability, and methodological limitations. Evidence was synthesized within two domains: cognitive and behavioral assessment, and psychiatric symptom rating.
Results: MacCAT-T demonstrates construct validity for decisional capacity, although age-related cognitive variation restricts its applicability across the lifespan. PCL-R shows support for its core construct across genders and ethnic groups, yet factor instability and context-dependent score variation limit its theoretical coherence. PANSS retains core symptom domains and supports shortened forms, although inconsistencies in factor structure weaken its construct definition. CAPS displays robust construct validity across trauma-exposed groups, while symptom overlap with other psychiatric disorders and trauma-specific variability impose interpretive constraints.
Conclusions: All four instruments show empirical support for construct validity, but the validity of each construct is conditional and shaped by population characteristics, contextual influences, and variability in factor structure. Accurate interpretation requires explicit attention to theoretical boundaries, methodological limitations, and the clinical conditions under which each construct remains stable.
Keywords: construct validity, psychiatric assessment, MacCAT-T, PCL-R, PANSS, CAPS
The measurement of psychopathological phenomena in psychiatry relies on rating scales. Max Hamilton proposed a clear division of such scales: intensity scales (which “measure severity of illness and also response to treatment”), prognostic scales, scales for selection of treatment by means of differential indicators, and scales for diagnosis and classification [1]. Other publications support this division, reporting that scales are used to indicate the severity of a disease, the frequency of symptoms, the intensity of effects on mental health, and the effects of treatment [2–4]. The scales are also helpful where “illness activity and response to treatment” are concerned [3]. They can be applied to a variety of conditions, such as eating, mood, anxiety, and substance use disorders, or to the “assessment of symptoms associated with psychoses” [2]. Psychiatric scales are generally classified as self-report or clinician-administered instruments [5].
Moreover, psychiatric assessment must account for age-related differences in both symptom presentation and diagnostic priorities. Children and adolescents, for instance, are often assessed not only for psychopathology but also for indicators of familial dysfunction, such as abuse or neglect, with clinicians emphasizing early detection and prevention. In contrast, adult psychiatric care increasingly follows the principles of precision medicine, aiming to tailor interventions to the individual's unique clinical and psychosocial profile [6,7]. Importantly, psychiatric measurement scales are not limited to individuals with diagnosed mental illness; they are also crucial in evaluating cognitive and emotional functioning in populations such as children and the elderly, who may experience age-related limitations or vulnerabilities that affect decision-making capacity and psychological well-being.
This review examines the construct validity of four influential psychiatric assessment instruments: the MacArthur Competence Assessment Tool for Treatment (MacCAT-T), the Psychopathy Checklist–Revised (PCL-R), the Positive and Negative Syndrome Scale (PANSS), and the Clinician-Administered PTSD Scale (CAPS). These tools were selected based on their broad clinical and forensic relevance, their conceptual diversity, and their prominent role in high-stakes decision-making across psychiatric and legal domains.
Construct validity refers to the degree to which a psychological measure accurately reflects the theoretical construct it is intended to assess. It involves both the quality of the measurement tool and the validity of the conceptual framework it draws from [8,9]. Cronbach and Meehl (1955) introduced the term to address the inadequacies of earlier, exclusively empirical validation methods, emphasizing the need to embed test development within solid theoretical foundations [10]. Contemporary perspectives, following Messick (1995), emphasize that construct validity entails the extent to which empirical evidence and theoretical rationale support the interpretation and use of test scores [11]. Each use of a measure is not only a test of the tool’s accuracy but also a test of the theory it is based on. This makes construct validity an ongoing, iterative process of validation rather than a fixed outcome.
Each instrument captures a distinct dimension of psychopathology or mental capacity.
The MacArthur Competence Assessment Tool for Treatment (MacCAT-T) is employed to assess decisional capacity in treatment contexts and has been described as an authoritative reference and leading benchmark in clinical psychiatry [12]. With the MacCAT-T, clinicians can assess whether a patient is able to give informed consent to treatment on the basis of four abilities. Specialists consider whether patients understand relevant information about their disease and the most appropriate treatment; whether they can weigh the pros and cons of their decision; whether they “appreciate the nature of one’s situation and the consequences of one’s choices”; and, lastly, whether they are able to express a choice [12,13]. According to Breden and Vollmann, answers to questions regarding patients’ competence to provide informed consent are rated on a scale from 0 to 2 points: “2 points for adequate, 1 point for partially sufficient and 0 points for insufficient responses” [12].
The Psychopathy Checklist–Revised (PCL-R) is a psychiatric assessment instrument developed by Robert D. Hare. In 1991 the tool was used to “measure the clinical construct of psychopathy”, but it is now also used to assess the risk of violent behavior [14]. In other words, the scale currently serves to operationalize psychopathy; long-term prediction of violent recidivism, however, cannot yet be obtained [15]. The PCL-R is a 20-item assessment tool. Each item assesses the examinee across various interpersonal and affective traits. The assessment takes case history information into account and is usually conducted as an interview. Each item is rated on a 3-point scale from 0 to 2 (“0 – clearly not present, 1 – may be present, 2 – clearly present”), resulting in a total score ranging from 0 to 40. According to most publications, patients who obtain a score above 30 are classified as psychopathic [14]. The current structure of the PCL-R consists of four factors (Interpersonal, Affective Traits, Lifestyle, Antisocial Behavior) and was introduced by Robert D. Hare in 2003. According to Hare et al., who addressed the topic in 1990, the earlier PCL was likewise a 20-item scale but comprised only two factors: Factor 1, “Emotional Detachment (e.g., superficial charm, manipulativeness, shallow affectivity, absence of guilt or empathy)”, and Factor 2, “Antisocial Behavior (deviance from an early age, aggression, impulsivity, irresponsibility, proneness to boredom)” [16,17].
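The scoring arithmetic described above can be illustrated with a minimal sketch. It assumes only what the text states: 20 items, each rated 0, 1, or 2, summed to a total between 0 and 40, with scores above 30 commonly treated as the classification threshold. The function names and the strict treatment of the boundary are hypothetical choices made for illustration; the sketch is not a substitute for the published scoring manual.

```python
from typing import Sequence

PCL_R_ITEM_COUNT = 20       # number of items, as stated in the text
VALID_RATINGS = (0, 1, 2)   # 0 = clearly not present, 1 = may be present, 2 = clearly present
ILLUSTRATIVE_CUTOFF = 30    # threshold mentioned in the text; boundary handling is an assumption


def pcl_r_total(ratings: Sequence[int]) -> int:
    """Sum 20 item ratings into a total score between 0 and 40."""
    if len(ratings) != PCL_R_ITEM_COUNT:
        raise ValueError(f"expected {PCL_R_ITEM_COUNT} ratings, got {len(ratings)}")
    for rating in ratings:
        if rating not in VALID_RATINGS:
            raise ValueError(f"invalid item rating: {rating!r}")
    return sum(ratings)


def exceeds_cutoff(total: int, cutoff: int = ILLUSTRATIVE_CUTOFF) -> bool:
    """Return True when the total is strictly above the cutoff ("above 30" in the text)."""
    return total > cutoff


# Hypothetical ratings: 12 items rated 2, 5 rated 1, 3 rated 0.
example = [2] * 12 + [1] * 5 + [0] * 3
print(pcl_r_total(example), exceeds_cutoff(pcl_r_total(example)))
```

Keeping the cutoff as a parameter reflects the point made later in the review that score interpretation depends on context and population.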
The Positive and Negative Syndrome Scale (PANSS) is widely used to rate the severity of symptoms of schizophrenia [18]. Shortened versions, PANSS-8 and PANSS-14, are also in use; they rate positive and negative symptoms that are used to define remission of the disease [19]. The PANSS combines eighteen items of the Brief Psychiatric Rating Scale and twelve items of the Psychopathology Rating Schedule [18,20]. It thus consists of 7 items measuring positive symptoms, 7 items measuring negative symptoms, and 16 items measuring general psychopathology. The scale is commonly used to measure the outcome of pharmacological and non-pharmacological treatment [20]. Although it was originally developed for assessing the severity of schizophrenia, it has also been used to examine treatment response in bipolar and schizoaffective disorder [21].
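The 7 + 7 + 16 item structure described above can be sketched as follows. The item labels G1 to G16 and the rating range of 1 to 7 per item are assumptions based on conventional PANSS usage and are not stated in this review; the function name is hypothetical.

```python
from typing import Dict, Mapping

# Item labels: P = positive, N = negative, G = general psychopathology (G labels assumed).
POSITIVE_ITEMS = [f"P{i}" for i in range(1, 8)]    # 7 items
NEGATIVE_ITEMS = [f"N{i}" for i in range(1, 8)]    # 7 items
GENERAL_ITEMS = [f"G{i}" for i in range(1, 17)]    # 16 items

MIN_RATING, MAX_RATING = 1, 7  # assumed conventional rating range per item


def panss_subscale_totals(ratings: Mapping[str, int]) -> Dict[str, int]:
    """Return positive, negative, general, and overall totals for one assessment."""
    all_items = POSITIVE_ITEMS + NEGATIVE_ITEMS + GENERAL_ITEMS
    missing = [item for item in all_items if item not in ratings]
    if missing:
        raise ValueError(f"missing item ratings: {missing}")
    for item in all_items:
        if not MIN_RATING <= ratings[item] <= MAX_RATING:
            raise ValueError(f"rating out of range for {item}: {ratings[item]}")
    totals = {
        "positive": sum(ratings[item] for item in POSITIVE_ITEMS),
        "negative": sum(ratings[item] for item in NEGATIVE_ITEMS),
        "general": sum(ratings[item] for item in GENERAL_ITEMS),
    }
    totals["total"] = totals["positive"] + totals["negative"] + totals["general"]
    return totals


# Hypothetical assessment in which every item receives the lowest rating.
lowest = {item: 1 for item in POSITIVE_ITEMS + NEGATIVE_ITEMS + GENERAL_ITEMS}
print(panss_subscale_totals(lowest))
```

Under these assumptions the overall total ranges from 30 to 210, which is why shortened forms are usually compared with the full scale through agreement statistics rather than raw totals.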
The Clinician-Administered PTSD Scale (CAPS) is regarded as the gold standard for evaluating posttraumatic stress disorder (PTSD). It is a psychological test, administered by a trained clinician, that is used to determine the symptoms and severity of PTSD [22]. Each symptom is assessed on a 5-point scale, and the final score is the sum of the individual symptom scores; the higher the score, the more severe the manifestations of PTSD [23]. The main features of the scale are: a) evaluation of all PTSD criteria along with related features such as dissociation; b) overall ratings of distress, impairment, response validity, symptom severity, and progress since the last assessment; c) both dichotomous (present/absent) and scale-based ratings for specific symptoms and the disorder as a whole; d) independent evaluation of symptom severity and frequency; e) prompts and rating scales with specific behavioral indicators; and f) evaluation of the trauma connection for individual symptoms that are not inherently tied to the trauma (e.g., loss of interest, feelings of estrangement, difficulty concentrating) [22].
All four scales exert considerable influence in real-world applications – shaping diagnostic decisions, competence evaluations, criminal responsibility determinations, and risk assessments. Their practical importance underscores the need for rigorous scrutiny of their construct validity. Taken together, these scales provide a rich and varied foundation for critically evaluating the standards and limitations of construct validity in contemporary psychiatric measurement.
Moreover, each tool has prompted substantial methodological and theoretical debate. Questions persist regarding factorial structure (PANSS, PCL-R), sensitivity to contextual variation (CAPS), and the precise constructs being measured (MacCAT-T: decisional competence vs. acquiescence). Analyzing these instruments together enables a broader discussion of the conceptual and empirical challenges inherent in validating psychiatric constructs.
This review aims to critically assess the construct validity of four clinician-rated psychiatric scales – MacCAT-T, PCL-R, PANSS, and CAPS – by synthesizing current evidence on their theoretical clarity, empirical support, and clinical applicability. In doing so, it highlights the importance of aligning measurement tools with well-defined psychopathological constructs to support accurate diagnosis, ethical practice, and appropriate treatment planning.
Research questions corresponding to the stated aim:
This narrative review was conducted using a structured and predefined approach to identify, select, and analyze peer-reviewed publications concerning the construct validity of four clinician-rated psychiatric scales. The analysis focused on the MacArthur Competence Assessment Tool for Treatment, the Psychopathy Checklist–Revised, the Positive and Negative Syndrome Scale, and the Clinician-Administered PTSD Scale.
The literature search was performed in PubMed and Google Scholar from the inception of each database until August 2025. Both foundational psychometric works and recent validation studies were considered.
The search strategy combined predefined terms using Boolean operators. The following terms were applied in separate and combined searches:
Screening proceeded in three steps.
Sixty-two publications met all criteria and were included in the final analysis.
Inclusion criteria:
Exclusion criteria:
Each included study underwent qualitative appraisal with attention to clarity of psychometric methodology, adequacy of sample size for factor analyses, appropriateness of statistical methods, reporting of reliability coefficients, and transparency of limitations. Elements derived from COSMIN guidance were applied in a narrative form. Studies with insufficient methodological detail were retained only when they contributed essential conceptual information relevant to construct-level interpretation.
For each study, information was extracted regarding study design, population characteristics, diagnostic groups, psychometric indicators, factor structures, reported strengths, reported limitations, and relevance to construct validity.
To ensure conceptual coherence the scales were analyzed within two domains:
Within each domain evidence was synthesized with attention to theoretical grounding, empirical support, factor structure, reliability across populations, applicability in clinical and forensic settings, and contextual limitations relevant to diagnostic or evaluative decisions.
The MacCAT-T is used to determine the ability to give informed consent in patients of different ages with a wide range of conditions, such as schizophrenia, anorexia nervosa, and dementia [13,24–28]. When establishing whether a child is competent and mature enough to consent, several key aspects are taken into consideration, since prescribing drugs and treatments to young people gives rise to ethical debate [29]. The issue originates primarily in insufficient clinical testing, which results in a greater chance of adverse reactions [29]. According to a systematic review by Parmigiani et al., in a child-centered approach the specialist ought to inform patients about key aspects proposed by the Committee on Bioethics of the American Academy of Pediatrics in order to solicit an indication of the patient's consent to the proposed care. These aspects include the nature of the disease, outcomes, and possible adverse effects [29]. Moreover, according to Alderson et al., the assessment of patients’ maturity and ability to decide also depends on the duration of the disease [29]. Thus, deciding whether a minor has the capacity to give informed consent to treatment should be based not only on the MacCAT-T but also on other crucial factors.
No medical procedure can be performed without autonomously given consent [12,30]. According to Breden and Vollmann, informed consent to treatment should be based on PDMC (Patient Decision Making Competence) [12]. Hermann et al. define PDMC as “the gatekeeper for a patient's right to self-determination defining whether the patient him- or herself or a surrogate has decisional authority regarding the medical decision at hand” [31]. When mental health is impaired, the question arises whether the patient can provide reliable and valid consent [12].
According to the National Institute of Mental Health, schizophrenia is a complex mental disorder characterized by profound disturbances in cognition, perception, emotional regulation, and social functioning [32]. While the trajectory of the condition varies among individuals, it is typically chronic and can be both debilitating and profoundly impairing [32]. In the study by Grisso et al. on the capacity to make healthcare choices, the MacCAT-T results of psychiatric inpatients were compared with those of civilians [13]. The scale was divided into three main subscales, in accordance with what the MacCAT-T tests: understanding, reasoning, and appreciation [13]. Only 33% of inpatients obtained the highest score in understanding, which indicates competence to give informed consent, whereas 90% of civilians did so [13]. Reasoning scores, by contrast, were low in both groups: 20% of inpatients and 30% of civilians obtained the highest score [13]. Appreciation was not rated for civilians and is therefore not included in this comparison. The results of Grisso et al. imply that the MacCAT-T is useful for testing understanding in patients with schizophrenia; however, it does not yield the expected outcome for cognitive abilities. A similar result is reported by Raffard et al. for a French population: the MacCAT-T is applicable to assessing the competence of patients with schizophrenia to give informed consent [24]. These results are further supported by Morena et al. [25].
Anorexia nervosa is a serious psychiatric illness, an eating disorder associated with a distorted body image [26,33,34]. Because anorexia nervosa has the highest mortality rate among psychiatric diseases [35], it is crucial to establish whether a patient is capable of deciding about their own treatment. Anorexia nervosa is associated with negative consequences such as malnutrition, depression, anxiety, and obsessive-compulsive disorder (OCD), and it is a major cause of “physical and psychosocial morbidity” [33,36,37]. In 2010 a study was performed in adolescents who either had anorexia nervosa or were drawn from a local community sample. Its purpose was to test differences between the two samples in several components of reasoning: consequential reasoning, comparative reasoning, generating consequences, logical consistency, and overall reasoning. Participants with anorexia nervosa outperformed the community participants only in comparative reasoning, and they performed worse when reasoning about health impacts (scoliosis and depression). The study concluded that patients with anorexia nervosa should be regarded as capable of giving informed consent concerning their treatment [26].
Dementia is a neurological disorder that impairs patients’ normal functioning. Dementia, including Alzheimer’s disease, usually affects elderly people, although younger adults can also be affected [38]. Several studies reached a clear conclusion: for patients with dementia to give informed consent, information must be presented to them in simple language [27,28]. Doctors may provide details regarding the complexity of a procedure; however, they should adjust them to the level of neurological impairment [28]. Moreover, it is advised to limit reliance on attention and memory to the greatest extent feasible [27]. With these limitations in mind, the MacCAT-T may be used to determine whether a patient with dementia is capable of giving informed consent to medical treatment.
In conclusion, the MacArthur Competence Assessment Tool for Treatment is a relevant scale for assessing the ability to give informed consent in people with schizophrenia, anorexia nervosa, or dementia.
Numerous studies have been conducted on the construct validity of the Psychopathy Checklist–Revised (PCL-R), examining various aspects of this scale.
According to Jaber and Mahmound (2015), the PCL-R test is the leading questionnaire in the clinical field – it measures the clinical construct of psychopathy and predicts recidivism, violence, and treatment results [14].
In 2008, Flórez‐Mendoza et al. analyzed the factor structure and validity of the Psychopathy Checklist–Revised in a sample of 124 Brazilian male inmates [39]. All participants were assessed using the PCL-R, the Personality Factorial Inventory to measure normal personality traits, the Standard Progressive Matrices to assess intelligence, and a DSM-IV–based clinical interview. Criminal histories were extracted from prison records. The authors examined and compared multiple structural models of the PCL-R. Total PCL-R scores correlated significantly with the number of criminal offenses, while no meaningful associations emerged with personality traits or intelligence. The PCL-R proved to be a valid and reliable tool for assessing psychopathy in this sample.
The study by Kennealy et al. (2007) analysed the construct validity of the PCL-R in a sample of 226 female inmates who volunteered to participate [40]. In the initial part of the study, participants underwent a diagnostic interview along with thorough reviews of their prison file records. Afterwards, each volunteer was assessed with the PCL-R, Cleckley's criteria, and measures of antisocial personality disorder, and their criminal and social histories were analysed. Additionally, participants completed self-report forms on substance abuse and underwent a normal-range personality assessment and an evaluation of intellectual functioning. The findings confirmed the applicability of the PCL-R two-factor and four-facet models in female populations. However, the study also noted possible gender differences in the expression of psychopathic traits: certain PCL-R items may be expressed differently by women, which calls for caution when assessing female offenders. In summary, this study provides evidence for the construct validity of the PCL-R in female populations, while emphasizing the need for gender-sensitive approaches in psychopathy assessment.
In 2006, Sullivan et al. studied the reliability and construct validity of the PCL-R across different populations [41]. A total of 83 Latino inmates participated in the study; their outcomes were compared with matched samples of African American and European American inmates. The investigation found that the PCL-R is a reliable tool for measuring psychopathy in Latino individuals, with similar patterns observed in external correlates. However, the authors of the study noted that there are some ethnic group differences in the relationships between psychopathy indicators and certain external correlates. This emphasizes the importance of taking a culturally sensitive approach.
The 2007 study by Laurell and Dåderman examined the reliability of retrospective PCL-R assessments conducted without interviews, using detailed psychiatric files and court records from 35 male homicide offenders [42]. Their findings showed that the PCL-R can be used for retrospective evaluations and may be administered without an interview for research purposes; however, it is not recommended for clinical diagnosis.
Despite its widespread use, several authors have raised methodological and practical concerns regarding the PCL-R. Jeandarme et al. (2017) reported a lack of validity of PCL-R scores in a sample of 531 offenders for whom a sufficiently favorable score was required for a positive decision [43]. The score was decisive for these convicted offenders because it determined whether they were transferred from prison to a treatment setting. The PCL-R did not prove useful in this real-world setting because raters adjusted the scores: owing to a shortage of high-security forensic psychiatric beds, raters lowered PCL-R scores to ease the transfer of prisoners to a medium-security psychiatric ward. These findings indicate that the PCL-R’s validity may be compromised by deliberate score manipulation by raters when its outcomes directly influence decisions about the assessed individual, suggesting that the tool may not be reliable in all contexts or settings.
In 2018, Flórez et al. studied the psychometric features and validity of the PCL-R in a sample of 204 Spanish convicts [44]. Their confirmatory factor analysis and correlational data showed that the interpersonal, affective, and lifestyle aspects of the PCL-R were significantly more reliable in measuring psychopathy than the antisocial construct. These findings reinforce concerns, initially raised by Skeem and Cooke, that including antisocial behavior items within the PCL-R may introduce construct-irrelevant variance, potentially weakening its ability to measure fundamental psychopathic traits [45]. By elevating the role of criminal behavior, the test risks labeling individuals mainly on the basis of antisocial actions rather than the central interpersonal and affective characteristics of psychopathy.
In conclusion, when applied with appropriate caution and in light of its known limitations, the Psychopathy Checklist–Revised can be a reliable instrument that effectively measures the psychopathic traits it was designed to assess. Moreover, its consistent construct validity across genders, ethnic groups, and cultural contexts reinforces its generalizability.
The Positive and Negative Syndrome Scale is used to measure the severity of symptoms in patients with schizophrenia [19,46]. Reliability is a crucial factor in evaluating the scale. Emsley et al. report that the negative factor of the PANSS achieved a reliability of 0.89 in early-onset patients [47]. By contrast, the anxiety and depression factor showed low reliability (0.66) according to Rabinowitz and Davidson [48]. A review by Levine et al. concluded that the PANSS is a relatively reliable scale for measuring the severity of schizophrenic symptoms but highlighted a gap regarding the chronic manifestation of the illness.
Østergaard et al. conducted a study of 497 inpatients with a diagnosis of schizophrenia [19]. Because time constraints often prevent administration of the full PANSS in clinical practice, the authors examined whether shorter versions of the 30-item questionnaire (PANSS-30) also serve its purpose [19]. The 14-item version suggested by Levine et al., the 8-item version proposed by Andreasen et al., and a 6-item version were tested, and PANSS-6 was found to be the most useful and efficient scale for assessing patients in hospital [19,49]. The most scalable version, PANSS-6, focused on “P1-Delusions, P2-Conceptual disorganization, P3-Hallucinations, N1-Blunted Affect, N4-Social withdrawal, N6-Lack of spontaneity and flow of conversation” and was found to be the most applicable and appropriate option [19]. It can assess schizophrenia severity over time, help define short-term remission, and potentially aid the development of new drugs for psychosis [19]. Thus, the Positive and Negative Syndrome Scale, and PANSS-6 in particular, measures psychopathological phenomena in psychiatry [19]. According to Østergaard et al., the initial studies of the validity of PANSS-6 were based on acutely ill patients, so it was unclear whether PANSS-6 is valid for patients in the chronic stage of schizophrenia. Østergaard et al. therefore performed another study, a reanalysis of data from the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) study, the largest randomized clinical effectiveness trial in chronic schizophrenia [49]. PANSS-6 proved scalable, whereas PANSS-30 did not. Moreover, “the PANSS-6 item rank order” was consistent over time and across gender, age, and antipsychotic treatments. That a rating scale keeps the same order of symptom severity (item rank order) regardless of factors such as age, gender, or treatment type is crucial: it ensures that the scale can fairly and accurately compare different patients or treatment effects over time, whether in research or in real-world clinical settings [49].
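The idea of a stable item rank order can be illustrated with a deliberately simplified sketch. It uses the six PANSS-6 items named above, sums them into a PANSS-6 total, and compares the ordering of items by mean rating in two subgroups. This is not the scalability analysis used in the cited studies; the ratings are hypothetical, and the function names are introduced here only for illustration.

```python
from statistics import mean
from typing import Dict, List, Sequence

# The six items listed in the text for PANSS-6.
PANSS6_ITEMS = ("P1", "P2", "P3", "N1", "N4", "N6")


def panss6_total(ratings: Dict[str, int]) -> int:
    """Sum the six PANSS-6 item ratings for one patient."""
    return sum(ratings[item] for item in PANSS6_ITEMS)


def item_rank_order(patients: Sequence[Dict[str, int]]) -> List[str]:
    """Order items from highest to lowest mean rating within a group (ties broken by label)."""
    means = {item: mean(p[item] for p in patients) for item in PANSS6_ITEMS}
    return sorted(PANSS6_ITEMS, key=lambda item: (-means[item], item))


# Hypothetical ratings for two subgroups (for example, two age bands).
group_a = [
    {"P1": 5, "P2": 3, "P3": 4, "N1": 2, "N4": 3, "N6": 1},
    {"P1": 6, "P2": 4, "P3": 5, "N1": 3, "N4": 3, "N6": 2},
]
group_b = [
    {"P1": 4, "P2": 2, "P3": 3, "N1": 1, "N4": 2, "N6": 1},
    {"P1": 5, "P2": 3, "P3": 4, "N1": 2, "N4": 2, "N6": 1},
]

print(panss6_total(group_a[0]))
print(item_rank_order(group_a) == item_rank_order(group_b))
```

In this toy data the two groups differ in overall severity but share the same item ordering, which is the property the text describes as necessary for fair comparison across patients and over time.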
PANSS-6 has proved to be a useful tool for linking research with clinical practice, owing to its time efficiency compared with PANSS-30. Moreover, the scale captures the core symptoms of schizophrenia [50]. Because the 30-item and 6-item scales were developed using only adult patient data, further studies were conducted in children and adolescents, and a 10-item scale was developed specifically for patients under 18. Findling et al. conducted an 8-week trial comparing the safety and efficacy of the antipsychotics olanzapine and risperidone with the older drug molindone in younger patients [51]. A group of 116 participants aged 8 to 19 years with schizophrenia was assessed weekly for eight weeks with the 30-item PANSS and other tests. Using these patient data, the researchers assessed different combinations of symptoms in order to find a scale that is consistent with the full-length PANSS and saves time. They did not select specific items beforehand, since they wanted to “let the data speak for itself”. The pediatric PANSS ultimately included 5 factors with 2 symptoms each: delusions and unusual thoughts, emotional withdrawal and apathy, hostility and poor impulse control, inattention and disorganized thinking, and anxiety and feelings of guilt. The 10-item scale performed very well and matched the 30-item PANSS 88% of the time [52].
A recent study from May 2025 concluded that, although PANSS-30 is still considered the standard of assessment in schizophrenia and related disorders, PANSS-6 assesses symptom severity and change sufficiently well to be used [53].
The Clinician-Administered PTSD Scale is used to diagnose posttraumatic stress disorder (PTSD) by means of an interview composed of 30 structured questions [23]. One study of the scale was conducted in 167 military veterans [22]. According to Weathers et al., CAPS is a valid tool for determining the severity of PTSD symptoms and of the disorder, given that the test “reflect relatively little measurement error due to items, raters, or occasions” [22]. On the one hand, this result implies that the CAPS is a sound instrument; on the other hand, several limitations might have skewed the outcome. Weathers et al. acknowledged limitations relating to gender, geographical region, and occupation but concluded that the result remained valid [22]. This suggests that, despite these limitations, the choice of the scale is well justified.
Moreover, evaluating the validity of the CAPS is essential, because its status as a reliable outcome measure in randomized trials influences the selection of medications tested for chronic PTSD [54]. The study by Feder et al. evaluated the efficacy of ketamine infusion (0.5 mg/kg) in reducing the severity of PTSD symptoms. Two weeks after drug administration, the CAPS-5 score was 11.88 points lower than that of the control group. The authors emphasized the need for further research in this area [54]. The scale was also employed to assess the impact of practicing yoga on emotional tolerance and PTSD symptomatology [55]. That three-year study, which used tools such as the CAPS, demonstrated improvements in functioning among individuals with trauma, as well as in their capacity to tolerate affect [55].
Another example covering the issue of PTSD is experiencing such syndrome as a result of being abused in childhood [56]. Rameckers et al. performed a study based on 147 adults (both women and men) who experienced trauma while being kids – their PTSD resulting from the aforementioned trauma was measured by the Childhood Trauma Questionnaire-short form and by, among others, CAPS-5 [57]. The aim was to measure PTSD symptoms and syndrome resulting from five main types of maltreatment in childhood – sexual, physical, and emotional abuse, and physical and emotional neglect [57]. It was found out that sexual, physical and emotional abuse caused the biggest PTSD severity in adulthood [57]. In this case there was no limitation in case of gender (both females and males were included) or geographic region (patients were recruited from Australia, Germany and the Netherlands) [57]. Moreover, it is safe to assume that occupation variety was also met considering the fact that participants originated from three countries and were picked from ten different mental health facilities across each country [57]. Another example is the study by Loos et al., which examined 159 children aged 7 to 16 [58]. Participants were assessed using the clinician-administered CAPS-CA scale. The study found that the most common potential traumatic events were physical violence without the use of a weapon, the death of a close person and sexual abuse [58]. The CAPS scale can also be used to assess the presence of PTSD in parents of preterm infants in the intensive care unit [59]. The study by Chiara et al. revealed that the gestational age of newborns admitted to such a unit significantly influences the occurrence of PTSD [59]. The presence of PTSD in parents of preterm infants is a pathological condition that, if detected early using tools such as the CAPS scale, can be properly treated.
The assessment of construct validity for psychiatric measurement scales requires the analysis of reliable randomized studies. Wojujutari et al. evaluated the reliability of the CAPS-5 across different populations and clinical contexts [60]. Their meta-analysis of 15 studies demonstrated the overall reliability of the test, along with stable test-retest results [60]. Furthermore, Kruger-Gottschalk et al. conducted a confirmatory factor analysis of the CAPS-5 based on the DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, 5th Edition) and ICD-11 (International Classification of Diseases, 11th Revision) criteria [61]. The study of 345 individuals replicated the internal consistency of the CAPS-5 and demonstrated that it fit best among all the DSM-5 scales tested [61]. However, the authors note that the ICD-11 model showed better fit in certain respects. An additional example is the study by Pupo et al. from Brazil, in which the CAPS was used to examine factors related to the prognosis and effectiveness of PTSD interventions in civilian populations [62]. Here as well, the CAPS proved to be both an accurate and reliable research tool for identifying cases of PTSD in the civilian population [62].
Thus, based on the examples reviewed, the CAPS can be regarded as a successful diagnostic method and as a valid measure of the construct it is intended to capture. It can be used with high accuracy and reliability to diagnose PTSD and assess the severity of its symptoms according to DSM-5 criteria, as well as to monitor treatment progress.
A comparative analysis of the reviewed instruments shows that each scale relies on distinct theoretical foundations and captures different domains of psychopathology or clinical decision making. MacCAT-T focuses on decisional capacity and demonstrates sensitivity to impairments in understanding, appreciation, reasoning, and expression of choice in clinical populations, as shown in studies involving schizophrenia [13], anorexia nervosa [26], and dementia [27,28]. PCL-R reflects interpersonal, affective, and behavioral traits associated with psychopathy and presents a multifactor structure that varies across samples and cultural contexts, as demonstrated in male [39], female [40], and ethnically diverse groups [41]. Additional work also highlights the influence of contextual factors during assessment [43]. PANSS provides a broad assessment of positive symptoms, negative symptoms, and general psychopathology, with evidence supporting the use of shortened forms such as PANSS-6 and PANSS-14 [19] and confirmation of symptom structure in large scale research including the CATIE (Clinical Antipsychotic Trials of Intervention Effectiveness) study [49]. CAPS offers a detailed evaluation of posttraumatic stress symptomatology and shows consistent psychometric performance across trauma exposed groups, including veterans [22], individuals with childhood trauma [55], samples undergoing strenuous physical activity [54], and parents of patients in intensive care settings [59].
Differences in construct validity emerge from the reviewed evidence. MacCAT-T shows conceptual specificity but limited applicability in populations with marked cognitive impairment, as indicated by studies on dementia [27,28]. PCL-R demonstrates strong reliability across groups [39,40,41], although variation in factor models and the susceptibility of scores to contextual influences remain documented concerns [43]. PANSS exhibits stable core symptom domains across studies [19,49], yet inconsistency in factor structures is noted in several analyses. CAPS shows robust validity and reliability [22,54,55,59], although its sensitivity to trauma context and the overlap of PTSD symptomatology with other psychiatric syndromes may influence interpretation.
Taken together, these findings allow a transition from individual studies to a broader synthesis. The reviewed literature shows that all four scales possess empirical support for construct validity, although the degree of this support differs across clinical and forensic settings. The evidence indicates that theoretical clarity, population specific characteristics, and contextual influences must be taken into account when interpreting results and when evaluating the practical applicability of each instrument.
Table 1 compares key psychometric instruments by outlining what each scale measures, how it is structured, which aspects were evaluated in published studies, and what methodological limitations were identified for each tool.
Table 1. Key psychometric characteristics of each scale
| Scale name | Measured domains | Main structural features | Elements examined in the cited studies | Identified limitations |
| MacCAT-T | Decisional capacity in treatment contexts | Clinicians assess four domains (understanding, appreciation, reasoning, and expression of choice), with ratings from 0 to 2 points indicating the patient’s capacity to give informed consent | | |
| PCL-R | Psychopathic traits and the associated risk of violent behavior | A 20-item assessment tool in which each item is rated on a 3-point scale (0 to 2), for a maximum of 40 points. A total score of 30 or above classifies the person as a psychopath | | |
| PANSS | The severity of schizophrenia symptoms and the possibility of remission of the disorder | A 30-item scale (seven items measure positive symptoms, seven items measure negative symptoms, and sixteen items measure general psychopathology) | Which variant of the PANSS is both clinically applicable and time efficient. PANSS-6 was judged the best version because it was consistent over time and across antipsychotic treatments, gender, and age | Underage patients should be assessed with a 10-item scale, because in trials the PANSS-10 outcome matched the gold standard (PANSS-30) 88% of the time |
| CAPS | Severity and symptoms of PTSD | The scale consists of 30 structured questions; each PTSD symptom is rated on a 5-point scale. The higher the total score (the sum of the symptom ratings), the more severe the disorder | | |
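The subscale structure listed for PANSS in Table 1 can be illustrated with a short scoring sketch. This is a minimal, illustrative example rather than an official scoring implementation: the function name `panss_totals`, the dictionary layout, and the example ratings are hypothetical, and the 1 to 7 item rating range is an assumption taken from standard PANSS usage, not from Table 1.

```python
# Item counts per subscale as listed in Table 1 (7 + 7 + 16 = 30 items).
SUBSCALE_SIZES = {"positive": 7, "negative": 7, "general": 16}

# Assumption: each item is rated on the standard 1-7 severity scale.
MIN_RATING, MAX_RATING = 1, 7


def panss_totals(ratings):
    """Sum item ratings per subscale and overall, checking counts and ranges.

    `ratings` maps each subscale name to a list of integer item ratings.
    """
    totals = {}
    for name, size in SUBSCALE_SIZES.items():
        items = ratings[name]
        if len(items) != size:
            raise ValueError(f"{name}: expected {size} items, got {len(items)}")
        if any(not (MIN_RATING <= r <= MAX_RATING) for r in items):
            raise ValueError(f"{name}: ratings must lie in {MIN_RATING}-{MAX_RATING}")
        totals[name] = sum(items)
    # Overall score is the sum of the three subscale scores.
    totals["total"] = sum(totals.values())
    return totals


# Hypothetical ratings for a single assessment (not data from any cited study).
example = {
    "positive": [4, 3, 5, 2, 3, 4, 3],
    "negative": [2, 2, 3, 1, 2, 2, 1],
    "general": [2] * 16,
}
print(panss_totals(example))
```

Under these assumptions the total score ranges from 30 to 210. Shortened forms such as PANSS-6 would use a different item set, which this sketch does not model.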
Table 2 contrasts four psychometric instruments by outlining their theoretical basis, evidence of validity, population sensitivity, clinical application and key risks in interpretation.
Table 2. Summary comparison of the construct validity of the four instruments
| Scale name | The theoretical foundation | Evidence of validity according to the referenced studies | Population related sensitivity | Context of application | Noted interpretation risks |
| MacCAT-T | MacCAT-T is a test that measures a person's psychological capacity to give informed consent for treatment. Each question is rated on a scale from 0 to 2 points | The MacCAT-T proved applicable in the assessment of underage patients and of patients with schizophrenia, anorexia nervosa, or dementia | The examined studies included both adolescents and adults | Determining whether a patient is able to give informed consent to a proposed treatment | When patients with these disorders are assessed, an adapted approach is necessary (speaking in a calm, simplified manner and refraining from triggering subjects) |
| PCL-R | A test used to determine whether a person is a psychopath and to estimate the likelihood of violent behavior | The examined studies found a positive correlation between PCL-R score and the criminal activity of inmates. The scale was also shown to be a useful tool for detecting psychopathy in a population | The studies included both genders and were conducted in culturally diverse populations | Detecting a correlation between incarceration and test score; detecting signs of psychopathy in an examined population | It is important to apply a culturally sensitive approach and to keep in mind that some examinees may intentionally lie to skew the results and gain certain benefits |
| PANSS | A 30-item scale used to measure the severity of schizophrenia symptoms and the possibility of remission of the disorder | The original PANSS-30 was time-consuming, so studies sought a more time-efficient version that maintains diagnostic value. The best version was found to be PANSS-6 (a 6-item scale) | The studies included people of different ages and genders. The use or non-use of antipsychotic treatment was also considered | Measuring symptom severity and the possibility of schizophrenia remission | Underage patients should be assessed with a 10-item scale, because PANSS-10 results matched those of the full PANSS-30 in 88% of cases |
| CAPS | A test composed of 30 structured questions. The goal is to establish the severity and symptoms of PTSD | The test proved useful for diagnosing PTSD in military veterans, people who experienced childhood abuse, and parents of preterm infants. It was also used to evaluate the effect of physical activity (yoga) and of drug treatment (ketamine) on PTSD symptoms | The discussed studies drew on populations differing in gender, geographic region, and occupation | Measuring severity and symptoms of PTSD | PTSD in parents of preterm infants needs to be diagnosed early in order to be treated |
This review analyzed four clinical scales used in the diagnosis and assessment of mental states, namely MacCAT-T, PCL-R, PANSS, and CAPS. A comparison of their theoretical foundations and the available empirical data made it possible to evaluate the construct validity of each instrument.
The analysis of MacCAT-T showed that the scale maintains stable sensitivity to impairments in understanding and evaluation of information in patients with various psychiatric disorders, including schizophrenia [13], anorexia nervosa [26], and dementia [27,28]. The applicability of the scale depends on age and the degree of cognitive limitation, which is supported by evidence indicating difficulties in its use in children and adolescents [29]. These observations highlight the need to consider individual differences in cognitive abilities when interpreting results.
The PCL-R scale demonstrates confirmed construct validity across different samples, which is reflected in studies involving men [39], women [40], and multiethnic groups [41]. At the same time, research data indicate variability in factor structures, particularly in cross cultural analyses [41], as well as limitations related to the use of the scale in settings where results may depend on external circumstances of assessment, as shown in the work by Jeandarme and colleagues [43].
The PANSS scale demonstrated stability of its factor structure and the possibility of applying shortened versions. The literature reports reliable data for PANSS-6 and PANSS-14 [19], as well as confirmation of symptom structure reproducibility in the large CATIE (Clinical Antipsychotic Trials of Intervention Effectiveness) study [49]. Shortened forms are useful in time limited settings, although the question of preserving the completeness of assessment requires further investigation.
The analysis of CAPS confirmed its high reliability and diagnostic validity. The study by Weathers and colleagues demonstrated stable psychometric properties of CAPS-5 in veterans [22], and the works of Feder [54], Van der Kolk [55], Rameckers [57], and other authors showed broad applicability of the scale in diverse clinical contexts, including childhood trauma assessment, the impact of physical exertion, and evaluation of the condition of parents of intensive care patients [59]. These findings confirm the universality of CAPS while also emphasizing the need to consider the specific characteristics of individual traumatic experiences.
Overall, the review shows that all four scales have empirical support for construct validity, although the degree of this validity varies depending on the population, the context of use, and the methodological features of the studies. The findings emphasize the importance of carefully considering the theoretical basis of each scale, the limitations of the instrument, and the conditions of its clinical application.
The review demonstrates that all four clinician rated psychiatric instruments show empirical support for construct validity, yet the strength, coherence, and theoretical alignment of this evidence differ substantially across tools. The conclusions directly address the research questions concerning theoretical grounding, empirical consistency, limitations, and contextual applicability.
MacCAT-T shows that its measured domains (understanding, appreciation, reasoning, and choice) correspond to the theoretical construct of decisional capacity. Evidence from studies involving schizophrenia, anorexia nervosa, and dementia confirms that the scale detects clinically relevant deficits. At the same time, construct validity is constrained by age related cognitive variation, which limits theoretical clarity in children, adolescents, and older adults. This indicates that the scale’s construct is valid within a specific cognitive range but does not generalize across all age groups.
PCL-R demonstrates construct validity for interpersonal, affective, lifestyle, and antisocial traits associated with psychopathy. Studies across male, female, and multiethnic groups confirm the presence of the core construct. However, variability of factor models and context dependent shifts in item performance indicate that the theoretical construct is only partially stable across populations. This shows that construct validity is supported but limited by population sensitivity and contextual influences.
PANSS provides evidence of construct validity for positive symptoms, negative symptoms, and general psychopathology. The validity of shortened versions such as PANSS-6 and PANSS-14 confirms that essential symptom dimensions retain their structure even under reduced item sets. Nonetheless, inconsistency of factor structures across studies indicates that the theoretical construct of schizophrenia symptom domains is not uniform. Construct validity is therefore supported but not theoretically consolidated across all populations.
CAPS shows strong construct validity for PTSD symptomatology. Its structure demonstrates consistency across different trauma exposed groups, including veterans, individuals with childhood trauma, and physically stressed populations. Despite this, the overlap of PTSD symptoms with other psychiatric conditions and the variability of trauma related presentations indicate that the construct retains context sensitivity.
Overall, the findings indicate that construct validity is present for all four scales, but in each case it is conditional and bounded by population characteristics, contextual factors, and variability in factor structure. The review therefore emphasizes the need to interpret each instrument within its theoretical framework, to consider empirical limitations, and to apply the scales with attention to the specific conditions under which their constructs remain valid.
Conceptualization: Kamila Roman
Methodology: Magdalena Łuba, Kamila Roman
Investigation and data collection: Kamila Roman, Magdalena Łuba, Alicja Solecka, Kamil Kuc
Formal analysis: Katarzyna Obidzińska, Amelia Sztangierska, Ewa Janina Łaska
Writing – original draft: Kamila Roman, Magdalena Łuba, Alicja Solecka, Kamil Kuc, Katarzyna Obidzińska, Amelia Sztangierska, Ewa Janina Łaska
Writing – review and editing: Anna Roman, Julia Brzostowska, Emilia Grzonka
Supervision: Anna Roman
All authors read and approved the final version of the manuscript and agree to be accountable for all aspects of the work.
The authors declare no conflict of interest.
No funding was received to conduct this study.
Artificial intelligence was used for style and language correction.