Features

Linear regression

regression  •  censored outcomes  •  endogenous regressors  •  bootstrap, jackknife, and robust and cluster–robust variance  •  instrumental variables  •  three-stage least squares  •  constraints  •  quantile regression  •  GLS  •  more

Panel / Longitudinal

random and fixed effects with robust standard errors  •  linear mixed models  •  random-effects probit  •  GEE  •  random- and fixed-effects Poisson  •  dynamic panel-data models  •  instrumental variables  •  panel unit-root tests  •  more 

Time series

ARIMA  •  ARFIMA  •  ARCH/GARCH  •  VAR  •  VECM  •  multivariate GARCH  •  unobserved-components model  •  dynamic factors  •  state-space models  •  Markov-switching models  •  business calendars  •  tests for structural breaks  •  threshold regression  •  forecasts  •  impulse–response functions  •  unit-root tests  •  filters and smoothers  •  rolling and recursive estimation  •  more

Data management

data transformations  •  match-merge  •  import/export data  •  ODBC  •  SQL  •  Unicode  •  by-group processing  •  append files  •  sort  •  row–column transposition  •  labeling  •  save results  •  more

Survival analysis

Kaplan–Meier and Nelson–Aalen estimators  •  Cox regression (frailty)  •  parametric models (frailty, random effects)  •  competing risks  •  hazards  •  time-varying covariates  •  left-, right-, and interval-censoring  •  Weibull, exponential, and Gompertz models  •  more

Multilevel mixed-effects models

continuous, binary, count, and survival outcomes  •  two-, three-, and higher-level models  •  generalized linear models  •  nonlinear models  •  random intercepts  •  random slopes  •  crossed random effects  •  BLUPs of effects and fitted values  •  hierarchical models  •  residual error structures  •  DDF adjustments  •  support for survey data  •  more

Graphics

lines  •  bars  •  areas  •  ranges  •  contours  •  confidence intervals  •  interaction plots  •  survival plots  •  publication quality  •  customise anything  •  Graph Editor  •  more

Graphical user interface

menus and dialogs for all features  •  Data Editor  •  Variables Manager  •  Graph Editor  •  Project Manager  •  Do-file Editor  •  Clipboard Preview Tool  •  multiple preference sets  •  more

Documentation

31 manuals  •  15,000+ pages  •  seamless navigation  •  thousands of worked examples  •  quick starts  •  methods and formulas  •  references  •  more

Bayesian analysis

thousands of built-in models  •  univariate and multivariate models  •  linear and nonlinear models  •  multilevel models  •  continuous, binary, ordinal, and count outcomes  •  bayes: prefix for 45 estimation commands  •  continuous univariate, multivariate, and discrete priors  •  add your own models  •  convergence diagnostics  •  posterior summaries  •  hypothesis testing  •  model comparison  •  more

Binary, count, and limited outcomes

logistic, probit, tobit  •  Poisson and negative binomial  •  conditional, multinomial, nested, ordered, rank-ordered, and stereotype logistic  •  multinomial probit  •  zero-inflated and left-truncated count models  •  selection models  •  marginal effects  •  more

Power and sample size

power  •  sample size  •  effect size  •  minimum detectable effect  •  means  •  proportions  •  variances  •  correlations  •  ANOVA  •  regression  •  cluster randomized designs  •  case–control studies  •  cohort studies  •  contingency tables  •  survival analysis  •  balanced or unbalanced designs  •  results in tables or graphs  •  more

Basic statistics

summaries  •  cross-tabulations  •  correlations  •  z and t tests  •  equality-of-variance tests  •  tests of proportions  •  confidence intervals  •  factor variables  •  more

Extended regression models

combine endogenous covariates, sample selection, and nonrandom treatment in models for continuous, interval-censored, binary, and ordinal outcomes  •  more

Epidemiology

standardization of rates  •  case–control  •  cohort  •  matched case–control  •  Mantel–Haenszel  •  pharmacokinetics  •  ROC analysis  •  ICD-10  •  more

Nonparametric methods

nonparametric regression  •  Wilcoxon–Mann–Whitney, Wilcoxon signed ranks, and Kruskal–Wallis tests  •  Spearman and Kendall correlations  •  Kolmogorov–Smirnov tests  •  exact binomial CIs  •  survival data  •  ROC analysis  •  smoothing  •  bootstrapping  •  more

Generalised linear models (GLM)

ten link functions  •  user-defined links  •  seven distributions  •  ML and IRLS estimation  •  nine variance estimators  •  seven residuals  •  more

Treatment effects / Causal inference

inverse probability weighting (IPW)  •  doubly robust methods  •  propensity-score matching  •  regression adjustment  •  covariate matching  •  multilevel treatments  •  endogenous treatments  •  average treatment effects (ATEs)  •  ATEs on the treated (ATETs)  •  potential-outcome means (POMs)  •  continuous, binary, count, fractional, and survival outcomes  •  more

Finite mixture models

fmm: prefix for 17 estimators  •  mixtures of a single estimator  •  mixtures combining multiple estimators or distributions  •  continuous, binary, count, ordinal, categorical, censored, truncated, and survival outcomes  •  more

Structural equation modelling (SEM)

graphical path diagram builder  •  standardized and unstandardized estimates  •  modification indices  •  direct and indirect effects  •  continuous, binary, count, ordinal, and survival outcomes  •  multilevel models  •  random slopes and intercepts  •  factor scores, empirical Bayes, and other predictions  •  groups and tests of invariance  •  goodness of fit  •  handles MAR data by FIML  •  correlated data  •  survey data  •  more

Latent class analysis

binary, ordinal, continuous, count, categorical, fractional, and survival items  •  add covariates to model class membership  •  combine with SEM path models  •  expected class proportions  •  goodness of fit  •  predictions of class membership  •  more

Other statistical methods

kappa measure of interrater agreement  •  Cronbach’s alpha  •  stepwise regression  •  tests of normality  •  more

Functions

statistical  •  random-number  •  mathematical  •  string  •  date and time  •  more

ANOVA / MANOVA

balanced and unbalanced designs  •  factorial, nested, and mixed designs  •  repeated measures  •  marginal means  •  contrasts  •  more

Internet enabled

ability to install new commands  •  web updating  •  web file sharing  •  latest Stata news  •  more

Exact statistics

exact logistic and Poisson regression  •  exact case–control statistics  •  binomial tests  •  Fisher’s exact test for r × c tables  •  more 

Linearised DSGE models

specify models algebraically  •  solve models  •  estimate parameters  •  identification diagnostics  •  policy and transition matrices  •  IRFs  •  dynamic forecasts  •  more

Mata – Stata’s serious programming language

interactive sessions  •  large-scale development projects  •  optimization  •  matrix inversions  •  decompositions  •  eigenvalues and eigenvectors  •  LAPACK engine  •  real and complex numbers  •  string matrices  •  interface to Stata datasets and matrices  •  numerical derivatives  •  object-oriented programming  •  more

Multiple imputation

nine univariate imputation methods  •  multivariate normal imputation  •  chained equations  •  explore pattern of missingness  •  manage imputed datasets  •  fit model and pool results  •  transform parameters  •  joint tests of parameter estimates  •  predictions  •  more

Programming features

adding new commands  •  command scripting  •  object-oriented programming  •  menu and dialog-box programming  •  dynamic documents  •  Markdown  •  Project Manager  •  plugins  •  more

Tests, predictions, and effects

Wald tests  •  LR tests  •  linear and nonlinear combinations  •  predictions and generalized predictions  •  marginal means  •  least-squares means  •  adjusted means  •  marginal and partial effects  •  forecast models  •  Hausman tests  •  more

Survey methods

multistage designs  •  bootstrap, BRR, jackknife, linearized, and SDR variance estimation  •  poststratification  •  DEFF  •  predictive margins  •  means, proportions, ratios, totals  •  summary tables  •  almost all estimators supported  •  more

Community-contributed commands

community-contributed commands  •  meta-analysis  •  data management  •  survival  •  econometrics  •  more

Contrasts, pairwise comparisons, margins

compare means, intercepts, or slopes  •  compare with reference category, adjacent category, grand mean, etc.  •  orthogonal polynomials  •  multiple-comparison adjustments  •  graph estimated means and contrasts  •  interaction plots  •  more

Cluster analysis

hierarchical clustering  •  kmeans and kmedian nonhierarchical clustering  •  dendrograms  •  stopping rules  •  user-extensible analyses  •  more

Simple maximum likelihood

specify likelihood using simple expressions  •  no programming required  •  survey data  •  standard, robust, bootstrap, and jackknife SEs  •  matrix estimators  •  more

Item response theory (IRT)

binary (1PL, 2PL, 3PL), ordinal, and categorical response models  •  item characteristic curves  •  test characteristic curves  •  item information functions  •  test information functions  •  differential item functioning (DIF)  •  more

Programmable maximum likelihood

user-specified functions  •  NR, DFP, BFGS, BHHH  •  OIM, OPG, robust, bootstrap, and jackknife SEs  •  Wald tests  •  survey data  •  numeric or analytic derivatives  •  more

Multivariate methods

factor analysis  •  principal components  •  discriminant analysis  •  rotation  •  multidimensional scaling  •  Procrustean analysis  •  correspondence analysis  •  biplots  •  dendrograms  •  user-extensible analyses  •  more

Resampling and simulation methods

bootstrap  •  jackknife  •  Monte Carlo simulation  •  permutation tests  •  more

Installation qualification

IQ report for regulatory agencies such as the FDA  •  installation verification  •  more

Accessibility

Section 508 compliance  •  accessibility for persons with disabilities  •  more

Sample session

A sample session of Stata for Mac, Unix, or Windows.

New in Stata 17: Tables  •  Bayesian econometrics  •  PyStata  •  Jupyter Notebooks  •  Faster Stata  •  DID  •  Interval-censored Cox  •  Multivariate meta-analysis  •  Bayesian VAR, DSGE, and panel-data models  •  Treatment-effects lasso  •  Panel-data multinomial logit  •  Zero-inflated logit  •  Nonparametric trend tests  •  MKL  •  JDBC  •  Java integration  •  H2O integration  •  more