Psychometric software

Psychometric software is used for psychometric analysis of data from tests, questionnaires, or inventories that reflect latent psychoeducational variables. While some psychometric analyses can be performed with standard statistical software like SPSS, most analyses require specialized tools.[citation needed]

Sources

Because only a few commercial businesses (most notably Assessment Systems Corporation and Scientific Software International) develop specialized psychometric tools, many free tools have been developed by researchers and educators. Important websites for free psychometric software include:

Classical test theory

Classical test theory is an approach to psychometric analysis that has weaker assumptions than item response theory and is more applicable to smaller sample sizes.

CITAS

CITAS (Classical Item and Test Analysis Spreadsheet) is a free Excel workbook designed to provide scoring and statistical analysis of classroom tests. Item responses (ABCD) and keys are typed or pasted into the workbook, and the output populates automatically. Unlike other programs, CITAS does not require any "running" or experience in psychometric analysis, making it accessible to school teachers and professors. It is available as a free download.

jMetrik

jMetrik [3] is free and open source software for conducting a comprehensive psychometric analysis. It was developed by J. Patrick Meyer at the University of Virginia. Current methods include classical item analysis, differential item functioning (DIF) analysis, confirmatory factor analysis, item response theory, IRT equating, and nonparametric item response theory. The item analysis includes proportion, point biserial, and biserial statistics for all response options. Reliability coefficients include Cronbach's alpha, Guttman's lambda, the Feldt-Gilmer Coefficient, the Feldt-Brennan coefficient, decision consistency indices, the conditional standard error of measurement, and reliability if item deleted. The DIF analysis is based on nonparametric item characteristic curves and the Mantel-Haenszel procedure. DIF effect sizes and ETS DIF classifications are included in the output. Confirmatory factor analysis is limited to the common factor model for congeneric, tau-equivalent, and parallel measures. Fit statistics are reported along with factor loadings and error variances. IRT methods include the Rasch, partial credit, and rating scale models. IRT equating methods include mean/mean, mean/sigma, Haebara, and Stocking-Lord procedures.

jMetrik also includes basic descriptive statistics and a graphics facility that produces bar charts, pie charts, histograms, kernel density estimates, and line plots.

jMetrik is a pure Java application that runs on 32-bit and 64-bit versions of Windows, Mac, and Linux operating systems. jMetrik requires Java 1.6 on the host computer. jMetrik is available as a free download from www.ItemAnalysis.com.

Iteman

Iteman is a commercial program specifically designed for classical test analysis, producing rich text (RTF) reports with graphics, narratives, and embedded tables. It calculates the proportion and point biserial of each item, as well as high/low subgroup proportions, and produces detailed graphics of item performance. It also calculates typical descriptive statistics, including the mean, standard deviation, reliability, and standard error of measurement, for each domain and the overall test. It is only available from Assessment Systems Corporation [4].
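The classical statistics named in this section (item proportions, point-biserial correlations, reliability, and the standard error of measurement) can be illustrated with a minimal Python sketch. It is not the implementation used by any of the programs listed here; the function names are hypothetical, Cronbach's alpha is assumed as the reliability estimate, and the point biserial is computed against the uncorrected total score (the item itself is included in the total).

```python
from math import sqrt
from statistics import mean, pstdev, pvariance


def score_responses(responses, key):
    """Score raw option choices (e.g. 'A'-'D') against a key: 1 = correct, 0 = not."""
    return [[1 if choice == k else 0 for choice, k in zip(row, key)]
            for row in responses]


def item_proportions(scored):
    """Proportion of examinees answering each item correctly."""
    n_items = len(scored[0])
    return [mean(row[j] for row in scored) for j in range(n_items)]


def point_biserial(scored, j):
    """Pearson correlation between item j (0/1) and the uncorrected total score."""
    item = [row[j] for row in scored]
    totals = [sum(row) for row in scored]
    sx, sy = pstdev(item), pstdev(totals)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when the item or the total has no variance
    mx, my = mean(item), mean(totals)
    cov = mean((x - mx) * (y - my) for x, y in zip(item, totals))
    return cov / (sx * sy)


def cronbach_alpha(scored):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scored[0])
    item_vars = [pvariance([row[j] for row in scored]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scored])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(scored):
    """SEM = SD of total scores * sqrt(1 - reliability), using alpha as reliability."""
    totals = [sum(row) for row in scored]
    return pstdev(totals) * sqrt(1 - cronbach_alpha(scored))
```

Raw responses such as "ABCD" strings would first be passed through score_responses with the answer key; the remaining functions operate on the resulting 0/1 matrix, with one row per examinee.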

Lertap

Lertap (Laboratory of Educational Research Test Analysis Program) is a comprehensive software package for classical test analysis developed for use with Microsoft Excel. It includes test, item, and option statistics, classification consistency and mastery test analysis, procedures for cheating detection, and extensive graphics (e.g., trace lines for item options, conditional standard errors of measurement, scree plots, boxplots of group differences, histograms, scatterplots).

Differential item functioning (DIF) analysis is supported in the Excel 2007 and Excel 2010 versions of Lertap. Mantel-Haenszel methods are used, and graphs of the results are provided.
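As a rough illustration of the Mantel-Haenszel approach to DIF (a sketch under stated assumptions, not Lertap's or jMetrik's actual code), the following Python function stratifies examinees on a matching score, pools the 2x2 tables across strata into a common odds ratio, and converts it to the ETS delta metric using the conventional factor of -2.35. The function name and the "ref"/"focal" group labels are hypothetical.

```python
from collections import defaultdict
from math import log


def mantel_haenszel_dif(item, group, matching):
    """Return (common odds ratio, MH D-DIF) for one dichotomous item.

    item:     0/1 item scores
    group:    "ref" or "focal" for each examinee (labels assumed for this sketch)
    matching: matching score for each examinee, e.g. the total test score
    """
    # One 2x2 table per score level:
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for u, g, s in zip(item, group, matching):
        if g not in ("ref", "focal"):
            raise ValueError("group labels must be 'ref' or 'focal'")
        idx = (0 if g == "ref" else 2) + (0 if u == 1 else 1)
        tables[s][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n  # reference correct x focal incorrect
        den += b * c / n  # reference incorrect x focal correct
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these data")

    alpha_mh = num / den
    # Negative values indicate the item is harder for the focal group.
    return alpha_mh, -2.35 * log(alpha_mh)
```

An odds ratio near 1 (delta near 0) suggests that matched examinees in the two groups perform similarly on the item.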

Lertap will produce ASCII data files ready for input to Xcalibre and Bilog MG.

Several sample datasets for use with Lertap and/or other item and test analysis programs are available [5]; these involve both cognitive tests and affective (or rating) scales. Technical papers related to the application of Lertap are also available [6].

Lertap was developed by Larry Nelson at Curtin University; commercial versions are available from Assessment Systems Corporation [7].

TAP

TAP (the Test Analysis Program) is a free program for basic classical analysis developed by Gordon Brooks at Ohio University. It is available as a free download.

ViSta-CITA

ViSta-CITA (Classical Item and Test Analysis) is a module included in the Visual Statistics System (ViSta) that focuses on graphical-oriented methods applied to psychometric analysis. It is freely available at [8]. It was developed by Ruben Ledesma, J. Gabriel Molina, Pedro M. Valero-Mora, and Forrest W. Young.

Item response theory calibration

Item response theory (IRT) is a psychometric approach which assumes that the probability of a certain response is a direct function of an underlying trait or traits. Various functions have been proposed to model this relationship, and the different calibration packages reflect this. Several software packages have been developed for additional analysis such as equating; they are listed in the next section.
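One commonly used function of this kind is the three-parameter logistic (3PL) model, which reduces to the two-parameter model when the guessing parameter is zero and to a one-parameter form when the discrimination is also held constant. The Python sketch below is only an illustration; the function name and the optional scaling constant D are assumptions of the example, not taken from any package listed here.

```python
from math import exp


def irt_3pl(theta, a=1.0, b=0.0, c=0.0, D=1.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee trait level
    a:     item discrimination
    b:     item difficulty
    c:     lower asymptote ("guessing"); c = 0 gives the 2PL model
    D:     scaling constant (1.0 for the logistic metric, 1.7 is often
           used to approximate the normal ogive)
    """
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))
```

For example, an examinee whose trait level equals the item difficulty has a probability of 0.5 when c = 0, and a probability halfway between c and 1 otherwise.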

BILOG-MG

BILOG-MG is a software program for IRT analysis of dichotomous (correct/incorrect) data, including fit and differential item functioning. It is commercial, and only available from Scientific Software International [9] or Assessment Systems Corporation [10].

ICL

ICL (IRT Command Language) performs IRT calibrations, including the one-, two-, and three-parameter logistic models as well as the partial credit model and generalized partial credit model. It can also generate response data. As the name implies, it is completely command driven, with no graphical user interface. It is available as a free download.

jMetrik

jMetrik [11] is free and open source software for conducting a comprehensive psychometric analysis. Detailed information is listed above. Its IRT methods include the Rasch, partial credit, and rating scale models, and its IRT equating methods include mean/mean, mean/sigma, Haebara, and Stocking-Lord procedures.

MULTILOG

MULTILOG is an extension of BILOG to data with polytomous (multiple) responses. It is commercial, and only available from Scientific Software International [12] or Assessment Systems Corporation [13].

PARSCALE

PARSCALE is a program designed specifically for polytomous IRT analysis. It is commercial, and only available from Scientific Software International [14] or Assessment Systems Corporation [15].

PARAM-3PL

PARAM-3PL [16] is a free program for the calibration of the 3-parameter logistic IRT model. It was developed by Lawrence Rudner at the Education Resources Information Center (ERIC). The latest release was version 0.89 in June 2007. It is available from ERIC.

TESTFact

TESTFact features [17]:

- Marginal maximum likelihood (MML) exploratory factor analysis and classical item analysis of binary data
- Computes tetrachoric correlations, principal factor solution, classical item descriptive statistics, fractile tables and plots
- Handles up to 10 factors using numerical quadrature: up to 5 for non-adaptive and up to 10 for adaptive quadrature
- Handles up to 15 factors using Monte Carlo integration techniques
- Varimax (orthogonal) and PROMAX (oblique) rotation of factor loadings
- Handles an important form of confirmatory factor analysis known as "bifactor" analysis: the factor pattern consists of one main factor plus group factors
- Simulation of responses to items based on user-specified parameters
- Correction for guessing and not-reached items
- Allows imposition of constraints on item parameter estimates
- Handles omitted and not-presented items
- Detailed online HELP documentation includes syntax and annotated examples

WINMIRA 2001

WINMIRA 2001 is a program for analyses with the Rasch model for dichotomous and polytomous ordinal responses, with latent class analysis, and with the mixture distribution Rasch model for dichotomous [1] and polytomous item responses [2]. The software provides conditional maximum likelihood (CML) estimation of item parameters, MLE and WLE estimates of person parameters, person- and item-fit statistics, and information criteria (AIC, BIC, CAIC) for model selection. The software also performs a parametric bootstrap procedure for the selection of the number of mixture components. A free student version is available from Matthias von Davier's webpage at http://www.von-davier.com/ [18]; a commercial version is available through ASSESS.COM [19].

Winsteps

Winsteps is a program designed for analysis with the Rasch model, a one-parameter item response theory model. The Rasch model differs from the 1PL model in that each individual in the person sample is parameterized for item estimation, and in that it is prescriptive and criterion-referenced rather than descriptive and norm-referenced in nature.[3] It is commercially available from Winsteps, Inc. [20]. A previous DOS-based version, BIGSTEPS, is also available.

Xcalibre

Xcalibre is a commercial program that performs marginal maximum likelihood estimation of both dichotomous (1PL-Rasch, 2PL, 3PL) and polytomous IRT models, using text files for both input and output. The interface is point-and-click; no command code is required. It is only available from Assessment Systems Corporation [21].

Additional item response theory software

Because of the complexity of IRT, few software packages are capable of calibration. However, many programs exist for specific ancillary IRT analyses such as equating and scaling. Examples of such software follow.

eqboot

eqboot is an open source, syntax-based Java application, developed by J. Patrick Meyer, for conducting IRT equating and computing the bootstrap standard error of equating. The program runs on any 32- or 64-bit operating system that has the Java Runtime Environment (JRE) version 1.6 or higher installed. At the moment, the program only supports equating with binary items. eqboot will compute equating constants using the mean/mean, mean/sigma, Haebara,[4] and Stocking-Lord[5] procedures. It will also compute the standard error of equating if the user provides a comma-delimited file of bootstrapped item parameter estimates from both forms, a comma-delimited file of bootstrapped ability estimates for Form X examinees, and a comma-delimited file of bootstrapped ability estimates for Form Y examinees. Options allow the user to specify the criterion function for the Haebara and Stocking-Lord methods.[6] In addition, the examinee distribution over which the criterion function is minimized may be set to the observed theta estimates, a histogram of theta estimates, a kernel density estimate of theta estimates, or uniformly spaced values on the theta scale. The software is a free download from www.ItemAnalysis.com.
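The two simplest of these procedures, mean/sigma and mean/mean, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than eqboot's code: the function names are hypothetical, the inputs are parameter estimates for the same common items on Form X and Form Y, and the constants A and B place Form X values on the Form Y scale through theta_Y = A * theta_X + B. The Haebara and Stocking-Lord procedures require numerical minimization and are not shown.

```python
from statistics import mean, pstdev


def mean_sigma(b_x, b_y):
    """Mean/sigma constants from common-item difficulty estimates."""
    A = pstdev(b_y) / pstdev(b_x)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def mean_mean(a_x, a_y, b_x, b_y):
    """Mean/mean constants from common-item discriminations and difficulties."""
    A = mean(a_x) / mean(a_y)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def transform_items(a_x, b_x, A, B):
    """Place Form X item parameters on the Form Y scale."""
    a_on_y = [a / A for a in a_x]      # discriminations are divided by A
    b_on_y = [A * b + B for b in b_x]  # difficulties follow the theta transform
    return a_on_y, b_on_y
```

With error-free parameters both methods recover the same constants; with estimated parameters they generally differ, which is one reason characteristic curve methods are often preferred.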

IRTEQ

IRTEQ [22] is a freeware Windows GUI application for IRT scaling and equating, developed by Kyung (Chris) T. Han. It implements IRT scaling/equating methods that are widely used with the “Non-Equivalent Groups Anchor Test” design: Mean/Mean,[7] Mean/Sigma,[8] Robust Mean/Sigma,[9] and test characteristic curve (TCC) methods.[10][11] For TCC methods, IRTEQ provides the user with the option to choose various score distributions for incorporation into the loss function. IRTEQ supports various popular unidimensional IRT models: logistic models for dichotomous responses (with 1, 2, or 3 parameters), and the Generalized Partial Credit Model (GPCM) (including the Partial Credit Model (PCM), which is a special case of the GPCM) and the Graded Response Model (GRM) for polytomous responses. IRTEQ can also equate test scores on the scale of one test to the scale of another test using IRT true score equating.[12]

ResidPlots-2

ResidPlots-2 [23] is a free program for IRT graphical residual analysis. It was developed by Tie Liang, Kyung (Chris) T. Han, and Ronald K. Hambleton at the University of Massachusetts.

WinGen

WinGen [24] is a free Windows-based program that generates IRT parameters and item responses. It was developed by Kyung (Chris) T. Han at the University of Massachusetts.[13]
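The general idea of generating item responses from IRT parameters can be sketched as follows. This is not WinGen's algorithm; it is a minimal Python example that assumes a three-parameter logistic model, standard normal trait levels, and a hypothetical function name and argument layout.

```python
import random
from math import exp


def simulate_responses(n_examinees, items, seed=None):
    """Simulate 0/1 responses under a 3PL model.

    items: list of (a, b, c) tuples, one per item
           (discrimination, difficulty, lower asymptote)
    Returns (thetas, data), where data has one row of 0/1 scores per examinee.
    """
    rng = random.Random(seed)  # a seed makes the simulated data reproducible
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    data = []
    for theta in thetas:
        row = []
        for a, b, c in items:
            p = c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))
            # Score the item correct with probability p.
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return thetas, data
```

Data simulated in this way are typically used to check whether a calibration program recovers the generating parameters.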

ST

ST [25] conducts item response theory (IRT) scale transformations for dichotomously scored tests.

POLYST

POLYST [26] conducts IRT scale transformations for dichotomously and polytomously scored tests.

STUIRT

STUIRT [27] conducts IRT scale transformations for mixed-format tests (tests that include some multiple choice items and some polytomous items).

Decision consistency

Decision consistency methods are applicable to criterion-referenced tests such as licensure exams and academic mastery testing.

jMetrik

jMetrik [28] is free and open source software for conducting a comprehensive psychometric analysis. Detailed information is listed above. jMetrik includes Huynh's decision consistency estimates if cut-scores are provided in the item analysis.
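Huynh's procedure estimates decision consistency from a single test administration and is not reproduced here. As a simpler illustration of what the indices express, the Python sketch below assumes that each examinee has scores on two parallel forms and computes the proportion of consistent pass/fail classifications together with Cohen's kappa. The function name and the rule that a score at or above the cut passes are assumptions of the example.

```python
def decision_consistency(scores_1, scores_2, cut):
    """Return (p0, kappa) for pass/fail decisions on two parallel forms.

    p0:    proportion of examinees classified the same way on both forms
    kappa: agreement corrected for the agreement expected by chance
    """
    pass_1 = [s >= cut for s in scores_1]
    pass_2 = [s >= cut for s in scores_2]
    n = len(pass_1)

    p0 = sum(x == y for x, y in zip(pass_1, pass_2)) / n

    # Chance agreement from the marginal pass rates on each form.
    r1, r2 = sum(pass_1) / n, sum(pass_2) / n
    pc = r1 * r2 + (1 - r1) * (1 - r2)

    kappa = (p0 - pc) / (1 - pc) if pc < 1 else float("nan")
    return p0, kappa
```

A licensure exam with a high p0 but a low kappa, for instance, may simply have a pass rate so extreme that most of the agreement would occur by chance.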

General statistical analysis software

Software designed for general statistical analysis can often be used for certain types of psychometric analysis. Moreover, code for more advanced types of psychometric analysis is often available.

R

R is a programming environment designed for statistical computing and production of graphics. It is freely available at [29].

SPSS

SPSS, originally called the Statistical Package for the Social Sciences, is a commercial general statistical analysis program where the data is presented in a spreadsheet layout and common analyses are menu driven.

S-Plus

S-Plus is a commercial analysis package based on the programming language S.

SAS

SAS is a commercially available package for statistical analysis and manipulation of data. It is command-based.

References

  1. ^ Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271-282.
  2. ^ von Davier, M., & Rost, J. (1995). Polytomous mixed Rasch models. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models, foundations, recent developments, and applications (pp. 371-382). New York: Springer.
  3. ^ Rasch dichotomous model vs. One-parameter Logistic Model. Rasch Measurement Transactions, 2005, 19:3, p. 1032.
  4. ^ Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149.
  5. ^ Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
  6. ^ Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32, 371-397.
  7. ^ Loyd & Hoover, 1980
  8. ^ Marco, 1977
  9. ^ Linn, Levine, Hastings, & Wardrop, 1981
  10. ^ Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149.
  11. ^ Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
  12. ^ Lord, F.M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
  13. ^ Han, K. T. (2007). WinGen: Windows software that generates IRT parameters and item responses. Applied Psychological Measurement, 31, 457-459.

Wikimedia Foundation. 2010.
