Grit originates from positive psychology (Schmidt et al., 2021) and was initially proposed by Duckworth et al. (2007) as a fusion of passion and perseverance in the pursuit of long-term goals. This construct consists of two components: perseverance of effort (PE) and consistency of interests (CI) (Duckworth & Quinn, 2009). Essentially, PE refers to maintaining effort to reach long-term goals, as well as the tendency to continue working (Duckworth et al., 2021). CI refers to the tendency to maintain focus over a long period of time when pursuing goals, capturing the notion of passion. It is important to note that CI reflects a disposition concerning a particular topic that is not limited to a particular situation (Muenks et al., 2017). According to this two-factor model, PE and CI are interrelated but distinct (Kuruveettissery et al., 2023).
In recent years, interest in grit has grown rapidly across various fields (e.g., politics, healthcare, research and media) (Muenks et al., 2017), and a rigorous measurement of grit has been encouraged in fields like academics, careers, and lifestyle (Duckworth & Quinn, 2009). Recent studies indicated that grit plays an important role in predicting success, years of schooling, work performance, failure, health outcomes, sports success, and school performance (Park et al., 2020; Pendyala & Vyas, 2023). Furthermore, grit has been studied in the context of military service, relationships, workplace, and professional skills (Reysen et al., 2019), and it seems to be a predictor of happiness in study situations or subjective well-being (Zhang et al., 2023), particularly through the PE dimension. Grit has been shown to be a malleable and promising personality trait that can be developed through deliberate practice (Schimschal et al., 2022), which suggests that it may be appropriate to invest in the development and improvement of instruments to support interventions.
This model has been widely studied in different educational settings (Duckworth et al., 2021; Jiang et al., 2021; Morell et al., 2021; Muenks et al., 2017). Investigations have found associations between PE and the use of learning strategies (Jiang et al., 2021), self-efficacy levels (Muenks et al., 2017), grades (Morell et al., 2021), and years of education completed by adults (Duckworth et al., 2021). High levels of CI are associated with positive effects on student organization (Jiang et al., 2021). Moreover, grit is a relevant predictor of academic and vocational outcomes, such as fewer career changes and greater educational attainment (Schmidt et al., 2021).
In educational contexts, CI explains why some individuals spend a substantial amount of time, effort, and hard work to reach an achievement, even if the results are not immediate (Neroni et al., 2022). The concept of PE may be understood as the commitment, involvement and effort that drive us towards achieving something (Sigmundosson et al., 2020). Due to this specificity of meaning of the two dimensions, PE, but not CI, has been shown to be a predictor of students’ academic results (Neroni et al., 2022). Moreover, grit is considered a performance character strength by scholars, showing moderate correlations with educational variables like academic performance and positive variables such as hope, positive affect, and family relationships (Hasan et al., 2022).
The environment plays an important role in developing grit, just as personality traits do. In this regard, Teimouri et al. (2021) assert that researchers have sought to measure this capability across a wide range of areas, especially in the fields of sport and education. Given that grit is considered domain-specific, individuals can demonstrate different levels of passion (or consistency of interest) and perseverance across domains (Cormier et al., 2019). Thus, a person may display grit in their professional life, but not in their personal life. Furthermore, the authors point out that, although there may be some stability between domains, certain variables, such as cultural context, social role, gender, and language tend to differentiate levels of competence (Cormier et al., 2019).
As a result, it cannot be denied that culture plays an important role in influencing individual behavior, especially in its relation to passion, and more generally in its relation to perseverance (Kuruveettissery et al., 2023). The manifestation of grit may be affected by the cultural nuances of individuals in a particular cultural context (Hasan et al., 2022). It is crucial to understand the cultural interpretations of what it means to have a passion, sustain it for long periods, and persevere to succeed (Pendyala & Vyas, 2023).
Given its relevance, some grit assessment instruments have been developed. In view of this, Hasan et al. (2022) reviewed the literature on grit scales that presents psychometric studies. After the literature review, they pointed out the existence of several grit scales: the Original Grit Scale (Grit-O), the Short Grit Scale (Grit-S), the Triarchic Model of the Grit Scale (TMGS), the Grit Psychological Resources Scale (GPRS) (Schimschal et al., 2022), the Grit Scale for Children and Adults (GSCA), the Academic Grit Scale (AGS), the L2-Teacher Grit Scale (Sudina et al., 2020), the Oviedo Grit Scale (EGO) (Postigo et al., 2021), the Clinical Nurses Grit Scale (CN-Grit) (Park et al., 2020), the 12-item Multidimensional Grit Scale (Pajestka & Poraj-Weder, 2023), the 3-D Grit Scale (Kuruveettissery et al., 2023), the Physical Education Grit Scale (PE-Grit Scale) (Guelmami et al., 2022), the Domain-specific Grit Scale for College Athletic Students (DGSCAS) (Gao et al., 2024), the English as a Foreign Language Grit Scale (EFL) (Ebadi et al., 2018), the Sports Grit Scale (Fruchart & Rulence-Pâques, 2024), the Grit Scales and the Long-term Grit Scales (L2 Grit Scale) (Li & Yang, 2023), and the Metacognitive Awareness of Grit Scale (MCAGS).
Although there are many instruments on grit, two scales have gained significant popularity in recent years: the Grit Scale (Grit-O) (Duckworth et al., 2007) and the Short Grit Scale (Grit-S) (Duckworth et al., 2021). The first one included 12 items, six related to consistency of interests (such as “New ideas and projects sometimes distract me from previous ones”, a negative item) and six to perseverance of effort (such as “I am a hard worker”). The short version of the scale (the Grit-S) has eight items (four fewer than the Grit-O) and improved psychometric properties (Duckworth et al., 2021).
Considering that grit manifestation may not be universal, varying among settings and cultures (Hasan et al., 2022), it is necessary to create or validate grit scales across diverse cultures (Li et al., 2018). Faced with that, many studies have translated the Grit-O into different languages and contexts (e.g., Areepattamannil & Khine, 2018; Schmidt et al., 2021; Tyumeneva et al., 2014; and others), including cross-cultural studies (e.g., van Zyl et al., 2022). However, there are no versions of this scale that can be applied to the Brazilian context.
Considering instruments in Portuguese (suitable for assessing Brazilians and other speakers of the language, such as the Portuguese), there is only one measure to assess grit, the Oviedo Grit Scale. However, limited research has been conducted on its psychometric properties (only internal consistency, factor structure, and invariance considering gender and curricular year) (Monteiro et al., 2023). Also, the fact that the Oviedo Grit Scale was only tested with people from Portugal (not with Brazilians) limits its use in Brazil.
Considering this lack of measures to evaluate grit in the Brazilian context, Noronha and Almeida (2022) developed the Grit Assessment Scale - International Version in Portuguese Language (EAGrit-LP), based on Brazilian and Portuguese data. Given this sample composition, the measure enables grit evaluation in Brazil and Portugal, making it possible to conduct intercultural evaluations on the topic. In addition to these advantages, the measure was developed considering international results on other grit instruments, which indicated inconsistencies in their internal structure (as can be seen in the results of these studies: Arco-Tirado et al., 2018; Areepattamannil & Khine, 2017; Beri & Sharma, 2019; Marentes-Castillo et al., 2019; Schmidt et al., 2021; Sordia, 2020; Tan et al., 2019).
To demonstrate their psychometric qualities, psychological instruments used in research and intervention must undergo a rigorous construction process. Among these processes are the estimation of reliability and the investigation of validity evidence (American Educational Research Association [AERA] et al., 2014). Identifying whether the test measures a psychological construct and whether it replicates the theoretical framework that underlies its development is one of the most challenging and critical tasks (Nakano et al., 2015). In this sense, during the development of an instrument, item analysis plays a crucial role since it allows each item to be evaluated separately, contributing significantly to the interpretation of its scores (Bock & Gibbons, 2021; Wright & Masters, 1982). To verify evidence of this nature, estimates derived from Item Response Theory (IRT) are relevant. IRT can provide more informative parameters (like item-difficulty, person-ability, and item-discrimination parameters) that are not sample-dependent, reflecting accurate estimates of misfit items (Alhadabi, 2023). Also, differential item functioning analysis can help to understand the variance among different groups (for example, two different cultures).
In this context, several studies have adopted IRT-based approaches to investigate the psychometric properties of grit scales. Analyses of this nature were carried out, for example, for the Grit-S Scale (Alhadabi, 2023; Areepattamannil & Khine, 2017; Farmer et al., 2025; Gonzalez et al., 2020; Midkiff et al., 2017; Morell et al., 2021; Muenks et al., 2017; Tyumeneva et al., 2019; Yupanqui-Lorenzo et al., 2024), the NL-Grit scale (Dijksma et al., 2023), and the Grit scale - GS-12 (Jie et al., 2024). One reason for this interest is offered by Credé et al. (2017), who proposed that the grit literature may benefit significantly from the refinement of its scales based on item response theory. In addition, after placing examinees and items on the same scale, IRT provides useful information about the contribution of the items to the measurement of the latent construct, their quality, and the points of the ability scale at which measurement is most precise (Yiğiter & Boduroğlu, 2024).
Farmer et al. (2025) assert that the IRT provides further information as it calculates the accuracy of a scale within a range of a latent trait, “describe the relationship between an individual’s level on the construct (i.e., theta) and the probability of a particular response occurring, as well as describing the difficulty of each item, and it can incorporate this information in scaling” (p. 2). It has been suggested by Jie et al. (2024) that IRT provides a more accurate assessment of an individual’s ability level, allowing a more precise description of measurement characteristics.
Considering that the EAGrit-LP is the only grit measure developed for Brazilians, it is important to understand its psychometric qualities. Preliminary studies have already verified validity evidence based on its content (Noronha & Almeida, 2022) as well as based on its internal structure (Noronha et al., 2024). Results indicated that the measure presents a multidimensional internal structure, composed of two factors (PE and CI), with excellent reliability (Noronha et al., 2024). However, the quality of its individual items has not yet been evaluated. Thus, the aim of the present study was to analyze the EAGrit-LP items to gain a deeper insight into the relationship between the ability (latent traits) and the test results (Bock & Gibbons, 2021). For that purpose, a Rasch model (Bond & Fox, 2015) was used to assess fit indices (infit and outfit) and item difficulty, as well as differential item functioning (DIF) based on participants’ gender and country of origin. Many studies have applied DIF analysis to grit scales, including comparisons between two cultures by Alhadabi (2023), across racial groups by Gladstone et al. (2022), across genders and grade levels by Areepattamannil and Khine (2017), and between genders by Lin and Shih (2021).
Method
Participants
A total of 1,706 participants from a convenience sample were included in the study: 1,050 Brazilians and 656 Portuguese, aged between 17 and 71 years old (M = 24.1 years; SD = 9.2). The sample consisted mainly of females (72.1%) and of students attending private universities (51.0%). Regarding academic progress, 22.0% of the students were in the initial semesters of their studies, 48.5% in the middle semesters, and 29.5% in the final semesters.
Instrument
Sociodemographic questionnaire. The questionnaire was developed for the present study. It aimed to collect sociodemographic information, including country of residence, gender and age.
Grit Assessment Scale - EAGrit-LP (Noronha & Almeida, 2022; Noronha et al., 2024). A 12-item scale that evaluates two factors: consistency in interests and perseverance of effort, each with six items. The instrument is answered on a Likert scale with four possible responses: 1 = completely disagree, 2 = sometimes disagree, 3 = sometimes agree, and 4 = completely agree. Based on the responses to the items that make up each factor, scores may vary between six and twenty-four, with higher scores indicating higher levels of grit. Developed based on Brazilian and Portuguese data, the EAGrit-LP has been the subject of studies focusing on its construction and evidence of content validity (Noronha & Almeida, 2022), as well as a study investigating validity evidence based on the factorial structure, with adequate results (Noronha et al., 2024).
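The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' scoring routine. The item-to-factor key is inferred from the items named in the Results section (items 6 and 12 are assigned to PE by elimination, which is an assumption), and the sketch assumes that no item is reverse-scored.

```python
# Hypothetical item-to-factor key, inferred from the items named in this article.
# Items 6 and 12 are placed in PE by elimination (an assumption).
FACTOR_ITEMS = {
    "PE": (2, 6, 8, 10, 11, 12),
    "CI": (1, 3, 4, 5, 7, 9),
}


def score_factors(responses):
    """Sum the 1-4 ratings of the six items in each factor.

    `responses` maps item number (1-12) to a rating from 1 to 4.
    Assumes no reverse-scored items.
    """
    for item, rating in responses.items():
        if rating not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: rating {rating} is outside 1-4")
    return {
        factor: sum(responses[item] for item in items)
        for factor, items in FACTOR_ITEMS.items()
    }


# The two extreme response patterns reproduce the 6-24 range per factor.
lowest = score_factors({item: 1 for item in range(1, 13)})
highest = score_factors({item: 4 for item in range(1, 13)})
```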
Procedures
The cross-sectional study was performed between May and November of 2022 via Google Forms, including the Free and Informed Consent Form on the first page, the Sociodemographic Questionnaire and the EAGrit-LP. In Brazil, the questionnaire was disseminated through social networks (Instagram, Facebook, and LinkedIn), as well as via email. In Portugal, the questionnaire was shared in classrooms at Higher Education institutions.
Data analysis
Initially, descriptive statistics were calculated. Unidimensionality is the most common assumption of IRT models: it guarantees that all items of the test measure the same latent trait (Gyamfi & Acquaye, 2023) and only one characteristic of the subject (Hu et al., 2021). To comply with this assumption, item analysis was conducted separately for each of the factors - perseverance of effort (PE) and consistency of interests (CI). To address the polytomous nature of the items, the Rasch-Andrich Rating Scale Model (Andrich, 1978) was fitted in the jamovi software (version 2.6.44). The results in the tables were obtained by marginal maximum likelihood estimation (MMLE), and the R package eRm was used for the person-item map.
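For readers less familiar with the Rating Scale Model, the sketch below computes the category probabilities for one person and one item. It is an illustration of the model's general form, not the estimation routine used in the study. The function name, the 0-based category coding, and all numeric values are assumptions chosen for the example.

```python
import math


def rsm_probabilities(theta, b, taus):
    """Category probabilities under the Andrich Rating Scale Model.

    theta: person ability (logits); b: item difficulty (logits);
    taus: thresholds shared by all items (three for a four-point scale).
    Returns one probability per category, coded 0..len(taus).
    """
    # Cumulative sums of (theta - b - tau_j); the empty sum for category 0 is 0.
    numerators = [0.0]
    for tau in taus:
        numerators.append(numerators[-1] + (theta - b - tau))
    peak = max(numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


# Illustrative values only (not estimates from this study): a person three
# logits above the item difficulty is most likely to pick the top category.
probs = rsm_probabilities(theta=-1.0, b=-4.0, taus=[-1.5, 0.0, 1.5])
```

Because the thresholds are shared across items, only the difficulty b shifts the curves from one item to another, which is what distinguishes this model from a partial credit model.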
To evaluate the items, item difficulty (parameter b) and fit indices (infit and outfit) were estimated. The item difficulty index is related to the probability of endorsing an item and reflects the level of ability required to agree with the item’s content (Gyamfi & Acquaye, 2023). In theory, this value usually ranges between -2.5 and +2.5 (Ahangari & Semiyari, 2022). To calculate the average amount of ability (theta) required for the item to be endorsed, person reliability was assessed, and item/construct maps were constructed. The Item-People Map procedure was employed to determine the relationship between the difficulty of the items and the ability levels presented by the participants (Embretson & Reise, 2000).
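As an illustration of how a person reliability coefficient can be obtained from Rasch estimates, the sketch below uses the common separation-reliability form: the share of observed variance in person measures that is not attributable to measurement error. Whether the software used here applies exactly this formula (for example, population versus sample variance) is an assumption, and the input values are invented.

```python
from statistics import pvariance


def person_separation_reliability(thetas, standard_errors):
    """Share of observed person variance not due to measurement error.

    thetas: estimated person measures (logits).
    standard_errors: standard error of each person measure.
    """
    observed_variance = pvariance(thetas)
    # Mean error variance across persons.
    error_variance = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # "True" variance cannot be negative.
    true_variance = max(observed_variance - error_variance, 0.0)
    return true_variance / observed_variance


# Invented example: observed variance 2.0, error variance 0.25.
reliability = person_separation_reliability(
    thetas=[-2.0, -1.0, 0.0, 1.0, 2.0],
    standard_errors=[0.5, 0.5, 0.5, 0.5, 0.5],
)
```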
The suitability of the items for the model was also evaluated. This analysis was carried out by estimating fit indices, a procedure that allows detecting differences between what was predicted by the model and what was empirically observed in the data (Smith, 2004; Wright & Masters, 1982). These differences are called residuals, and their evaluation includes two main indices: infit (to check for discrepancies in items whose difficulties are close to the persons’ ability levels) and outfit (to check for discrepancies in extreme items).
Assessing item fit is a central point in IRT, considering that a good fit is necessary to draw valid inferences from estimated model parameters (Diaz et al., 2022). This is true because the fit indices can identify examinees that exhibit patterns of unexpected responses. Infit and outfit values can range between zero and infinity (Jones et al., 2023). According to Linacre and Wright (2002), however, values below 0.5 or above 1.5 (especially for infit) indicate a mismatch between the model and observed data.
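The two fit statistics can be sketched from residuals as follows. This is a minimal illustration under the usual mean-square definitions (outfit as the unweighted mean of squared standardized residuals, infit as the information-weighted version). The model-expected scores and variances are taken as inputs, and the function names and example data are hypothetical.

```python
def fit_mean_squares(observed, expected, variances):
    """Infit and outfit mean squares for one item across persons.

    observed: observed responses; expected: model-expected scores;
    variances: model variance of each response.
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: plain average of squared standardized residuals, so it is
    # sensitive to unexpected responses far from a person's ability.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(observed)
    # Infit: residuals weighted by the model variance, so it is dominated by
    # persons whose ability is close to the item difficulty.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def flag_misfit(mean_square, lower=0.5, upper=1.5):
    """Apply the 0.5-1.5 range cited in the text."""
    return mean_square < lower or mean_square > upper


# Hypothetical data in which residuals match the model variance exactly,
# so both statistics equal their expected value of 1.
infit, outfit = fit_mean_squares(
    observed=[1, 0, 1, 1],
    expected=[0.5, 0.5, 0.5, 0.5],
    variances=[0.25, 0.25, 0.25, 0.25],
)
```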
Subsequently, we estimated the differential item functioning (DIF), based on the gender and country of origin of the participants. This type of analysis is used to verify the existence of items with different difficulty between groups, favoring one of them. In this sense, depending on the group to which the person belongs, the item may be more easily endorsed or require a higher level of ability to be endorsed (Yiğiter & Boduroğlu, 2024). Items that present DIF indicate that individuals who have the same level of ability can perform differently on a test, revealing that these items cannot assess the construct fairly, because the assessment would depend on the group to which the person belongs (Ahangari & Semiyari, 2022). In the present study, as criteria for assessing DIF, probability values below 0.05 or DIF contrasts greater than 0.42 were adopted, following literature guidelines (Aguerri et al., 2007).
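The decision rule above can be sketched as follows. The DIF contrast is taken here as the difference between the item difficulties estimated separately in each group, and the probability comes from a two-sided normal approximation to the contrast divided by its joint standard error. Both choices are assumptions for illustration, since the text does not specify how the probability was computed, and all numeric values are invented.

```python
import math


def dif_check(b_group_a, se_a, b_group_b, se_b, contrast_cutoff=0.42, alpha=0.05):
    """Flag an item for DIF from group-specific difficulty estimates.

    Returns (contrast, p_value, flagged). The item is flagged when
    p < alpha OR |contrast| > contrast_cutoff, as stated in the text.
    """
    contrast = b_group_a - b_group_b
    joint_se = math.sqrt(se_a ** 2 + se_b ** 2)
    z = contrast / joint_se
    # Two-sided p-value under a standard normal approximation (assumption).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    flagged = p_value < alpha or abs(contrast) > contrast_cutoff
    return contrast, p_value, flagged


# Invented values: a half-logit gap is flagged, a 0.05-logit gap is not.
large_gap = dif_check(b_group_a=-4.0, se_a=0.1, b_group_b=-4.5, se_b=0.1)
small_gap = dif_check(b_group_a=-4.0, se_a=0.1, b_group_b=-4.05, se_b=0.1)
```

A positive contrast means the item is harder to endorse for group A than for group B at the same ability level.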
Results
Initially, the results are presented for Factor 1 (Perseverance of Effort), consisting of six items. Descriptive statistics indicated that participants’ scores varied between 7 and 24 points (M = 19.0; SD = 3.42). The person reliability was 0.80. Regarding the difficulty of each item, as well as infit and outfit, the results are presented in Table 1.
Table 1 Factor 1 - Perseverance of effort: Item difficulty and fit index

Note. SD = standard deviation; b = item difficulty; SE = standard error; MNSQ = mean square.
The data indicate that the difficulty of the items was quite low, so that a high level of ability (grit) is not necessary for the person to endorse the highest response alternatives. None of the items in this factor showed a mismatch in the residuals, as values for infit and outfit are adequate.
According to the construct map (Figure 1, on the left), most participants have an ability level of -1, with values ranging from -6.5 to +3.0. Regarding the items, the difficulty analysis indicates that item 2 and item 11 are the easiest to endorse, while item 8 and item 10 require more ability to endorse, despite the fact that the difficulty of all items is quite low. When ability level and response options are analyzed for all items, individuals with theta of one already choose alternative 2 of the response scale (Figure 1, on the right). As can be seen in the curve showing the distribution of participants at the top of the figure, participants have a greater tendency to mark alternative 3.
To identify differences due to participants’ gender and country of origin, differential item functioning (DIF) was estimated. Table 2 presents the results for Factor 1 (Perseverance of Effort).
The results indicated that, in relation to the country of origin, only one item (item 8: “I follow my plans, even if I have adversity to execute them”) presented DIF, requiring a lower level of ability to be endorsed by Brazilian students (compared to Portuguese students). In relation to gender, the same item (item 8) presented DIF, favoring female participants (compared to male).
The same analyses were then conducted with the six items that made up Factor 2 (Consistency of Interests). Results regarding item difficulty and fit indices are presented in Table 3. The person reliability for this factor was 0.80.
Table 3 Factor 2 - Consistency of interests: Item difficulty and fit index

Note. SD = standard deviation; b = item difficulty; SE = standard error; MNSQ = mean square.
As with Factor 1 (Perseverance of Effort), the results for CI factor indicate that the difficulty of the items is not particularly high, showing that little ability (grit) is required by the person to endorse the content of the items. Regarding fit indices, both infit and outfit results were adequate (Table 3).
The results of participants’ ability level and item difficulty are presented in Figure 2 (left figure). It is possible to verify that most participants have an ability level close to theta of -1.0. In terms of the items, item 3 is the easiest to endorse (less ability required), while item 9 requires the highest level of ability for its endorsement.
As in Factor 1, the relationship between ability level and response options indicates that, for all items, a person with theta of one already selects alternative 2 of the response scale (as can be seen in Figure 2, on the right). According to the curve that illustrates the distribution of people at the top of the figure, participants tended to choose alternative 3 more frequently.
Additionally, the differential item functioning (DIF) was examined for the second factor, considering the variables gender and country of origin. Results are presented in Table 4.
Among the items relating to the CI factor, there were differences in performance by country of origin for item 3 (“I try to commit to achieving my goals”), favoring Brazilian respondents, and for item 5 (“I have defined what I intend to do in my life”), favoring Portuguese respondents. These results indicate that both items have different probabilities of endorsement among individuals who have the same level of ability but belong to a different group (in this case, country of origin).
Regarding gender, four out of six items presented DIF. Item 1 (“I have my life goals set for the next few years”) and item 4 (“My interests are stable”) were easier to endorse for female participants (compared to male). In turn, item 3 (“I try to commit to achieving my goals”) and item 7 (“I maintain a coherent line of goals over time”) required a lower level of ability to be endorsed by male participants (compared to female).
Discussion
The present study intended to investigate the psychometric properties of the EAGrit-LP items (Noronha & Almeida, 2022; Noronha et al., 2024), particularly assessing the relationship between latent trait levels of the individuals and test results (Bock & Gibbons, 2021), by analyzing fit indices (infit and outfit), item difficulty, and differential item functioning based on participants’ gender and country of origin. IRT-based estimates can be useful for identifying areas with potential for improvement in psychometric instruments, such as poorly performing items that cannot distinguish between response levels, items with low reliability, and items that present increased redundancy (Farmer et al., 2025).
The EAGrit-LP was developed based on Duckworth et al.’s (2007) conceptualization of grit. The results of exploratory and confirmatory factor analyses indicated that the scale had a two-factor structure and that its factor structure was invariant across country of origin (Portugal and Brazil), as well as across respondents’ gender (male and female) (Noronha et al., 2024). Although additional analyses were still needed (such as the assessment of the relationship between the latent trait of individuals and their responses to the instrument), these results already provided a valuable contribution regarding the psychometric properties of the instrument (Nakano et al., 2015) - indicating the potential of the EAGrit-LP to assess grit in the Portuguese language. A rigorous measure of grit is essential for quantifying perseverance of effort and consistency of interest in various areas, as Kuruveettissery et al. (2023) highlight.
In the present study, by using Item Response Theory, we have been able to better understand the functioning of the scale and its items. Since most psychometric studies of grit scales investigate validity (especially based on the internal structure of the measure and on the relationship with external variables), reliability estimates, and the association with success outcomes, this study is distinguished by providing different evidence. For example, when items are analyzed on other scales, only the item-total correlation is considered (Hasan et al., 2022). For this reason, Tynan (2021) recommended the use of Item Response Theory for the development and evaluation of measures. As determined by the analysis of residuals (infit and outfit), all items that constitute the EAGrit-LP presented adequate results. Thus, the model can predict both the results of individuals with ability levels close to the difficulty of the item and those with extreme values.
Regarding item difficulty (i.e., the level of latent trait required for a given response to be given on the scale), results on the EAGrit-LP showed that, in general, the values on the latent trait required to endorse the items are low. These results imply that the items present content (or situations) that can be readily endorsed by individuals who possess low levels of ability (for the present study, grit). This result was shared by both factors of the instrument. Item 11 (“I persist to achieve my goals”; b = -4.44) in PE and item 3 (“I try to commit to achieving my goals”; b = -4.76) in CI were the easiest items. Considering (a) the difficulty results of all items (ranging from -3.99 to -4.44 for the PE and from 3.08 to -4.76 for the CI) and (b) that the average ability level of individuals in the present sample was theta -1, we can verify that this average ability level probably causes all items to be endorsed by the participants, who select the higher response alternatives on the EAGrit-LP’s four-point Likert scale (3 - “I sometimes agree” or 4 - “I totally agree”).
Gonzalez et al. (2020) reported similar findings regarding the Grit-S, showing that the instrument mostly distinguished participants that were at or below the mean trait on the grit construct. As a rule, a range of -2.00 to +2.00 is considered optimal for item difficulty (Alhadabi, 2023). In the current study, the values are lower than this criterion. The results of Areepattamannil and Khine’s (2017) study using the Original Grit Scale were similar, with all items showing good fit indices except for one item, which underfit the Rasch model.
Furthermore, we determined the differential item functioning (DIF) based on the gender and country of origin of the participants. Based on the DIF results, it is possible to identify items that behave differently in different groups. Results revealed that three items presented DIF for country of origin and five items for gender, these items being more frequent for the Consistency of Interests factor. It is understood that the probability of endorsing a given item presenting DIF is different between the groups analyzed (Fidalgo & Scalon, 2012) - in the present case, gender and country of origin. These results indicate that other variables are influencing the results of the test, beyond the respondent’s grit levels.
In studies using IRT with other grit scales, mixed results have been reported. In the study by Tyumeneva et al. (2014), the items of the Grit Scale did not demonstrate significant DIF when comparing different genders. No significant DIF was found in the study by Areepattamannil and Khine (2017) either, when comparing participants’ grade level and gender. In contrast, DIF was found in some studies. Alhadabi (2023), for example, found DIF in two items of the Grit Scale when comparing Omani and American samples. Midkiff et al. (2017) also found DIF in one item when comparing first-generation college students with non-first-generation college students. Finally, Matore et al. (2023) found DIF in one item in their comparison between genders, favoring males over females.
When significant DIF results are found, it can be derived that results on the item can favor or harm the performance of one group over another. This happens because, for these items, individuals with the same level of ability in the construct have different probabilities of endorsing them simply because they belong to groups with distinct characteristics (Nakano et al., 2015). That is, in these cases, differences in results for the item are due to characteristics other than the level of the latent trait being assessed. Thus, according to Nakano et al. (2015), studies that assess DIF can identify influences that a second dimension, related to characteristics of subgroups, exerts on the instrument’s items, altering their difficulty.
Regarding results for the EAGrit-LP, the significant number of items with DIF related to gender stands out. These results indicate the importance of taking measures so that test results can be interpreted fairly for both groups (male and female). Considering that DIF is an analysis that aims to ensure that items do not favor one subgroup when compared to another (Ackerman & Ma, 2024), the results can help to ensure fairness and equity for all examinees (Jafaripour et al., 2024). Because of this, it is important that DIF is verified during the test construction process, helping researchers to support more accurate decisions related to items (Ahangari & Semiyari, 2022), considering possible disparities caused by variables unrelated to the construct being assessed.
In the case of the present study, when scoring and interpreting items that present DIF, there is a chance that the interpretation of grit based on the results of different groups (men or women and Brazilian and Portuguese individuals) is misleading, as certain items do not have the same probability of response (Fitriani & Situmorang, 2023). To avoid misinterpretation, the authors suggested that the instrument should be revised first, particularly in terms of content, to further improve its reliability and trustworthiness.
In this sense, different procedures are proposed in the literature to minimize the influence of items that present DIF on the instrument score - even if there is no consensus on the most appropriate one (Sass, 2011). One possible action is the elimination of items that are subject to bias (Byrne et al., 1989). This option is indicated mainly when there are sufficient items to assess the construct of interest, covering an adequate range of variation of the latent trait (Peixoto et al., 2019). Considering the results of the present study, it is noted that this exclusion could harm the assessment of the latent trait (even more so when considering that of the items with DIF for gender, four of them are from the CI factor, and the exclusion would be unfeasible considering the number of items in the factor).
Therefore, it is suggested that the source of this bias be overcome before establishing normative tables for the interpretation of EAGrit-LP scores. It appears that the results of investigations of gender differences in the grit assessment are contradictory, sometimes suggesting differences in favor of one gender over the other, and sometimes suggesting no differences at all (Postigo et al., 2021). There may be gender differences in the dimensions of grit; however, these differences cannot be influenced by differences in responses to the test items. With this normative action, possible differences would not result in an assessment that would disfavor some of the evaluated groups.
The importance of grit is attributed to its potential for use in human resource management, recruitment, development, and training, as well as its implications for educators and practitioners when designing interventions for noncognitive development. Therefore, it is imperative to develop measures to adequately evaluate grit. In this sense, the results of the present study on the psychometric properties of the EAGrit-LP’s items, added to the previous results on its evidence of validity, attest to the potential use of the instrument. In this context, this Portuguese language grit scale (EAGrit-LP) can contribute to reducing the scarcity of measures of the construct for Brazil and Portugal.
Final considerations
This study provides additional evidence regarding the psychometric characteristics of the Grit Assessment Scale - International version in Portuguese (EAGrit-LP). Based on the analysis of the instrument items, results of the present study revealed the relevance of the measure, as well as some characteristics that could be modified or improved. For example, depending on the interest in using the instrument, items with a greater level of difficulty could be included to adequately assess the different levels of grit in the adult populations of both countries (Brazil and Portugal). Furthermore, the existence of differential item functioning by gender and country of origin indicates the importance of developing separate normative tables for each of the groups, when preparing the standardization of the scale. That would be indicated to avoid possible biases in the correction and interpretation of data derived from the EAGrit-LP use.
Despite these conclusions, this investigation entails certain limitations that must be addressed. Due to the differences in sample sizes between the two countries, with a sample primarily composed of Brazilians, caution should be exercised when interpreting the present results, particularly when considering the findings regarding DIF based on country of origin. For the instrument to be useful in psychological assessment in both countries, future studies should investigate other psychometric properties of the measure, considering a more representative sample of both countries. It would also be interesting to investigate other psychometric characteristics of the EAGrit-LP, especially those usually investigated for other grit scales, like its relationship with success outcomes. In addition, future studies may investigate grit in clinical populations, given that most studies have examined student populations (Datu, 2020).


















