New in Stata release 15

Stata 15 has something for everyone. Below we list the highlights of the release. This release is unique because most of the new features can be used by researchers in every discipline.

Extended regression models (ERMs)

ERMs is our name for regression models that can account for the following:

  1. Endogenous covariates
  2. Nonrandom treatment assignment
  3. Heckman-style endogenous sample selection

While Stata already had commands such as heckman and ivregress that can address these problems individually, ERMs can account for the problems in any combination. And ERMs don’t just address these problems in linear models. There are four ERM commands:

  • eregress fits linear regression models for continuous outcomes.
  • eintreg fits interval regression, including tobit, for interval-measured and censored outcomes.
  • eprobit fits probit regression models for binary outcomes.
  • eoprobit fits ordered probit regression for ordinal outcomes.

You can now fit models that were previously unavailable, even if you need only one of the new features, such as

  • interval regression with endogenous covariates
  • probit regression with a binary endogenous covariate
  • probit regression with endogenous ordinal treatment
  • ordered probit regression with endogenous treatment
  • linear regression with tobit endogenous sample selection

Latent class analysis

Stata's gsem command now supports latent class analysis (LCA).

Latent class models use categorical latent variables. Categorical means group. Latent means unobserved. Categorical latent variables can be used, for instance,

  • in marketing or management to represent consumers with different buying preferences;
  • in health to represent patients in different risk groups; and
  • in education or psychology to represent students with different patterns of behavior.

Unobserved are the buying preferences, risk groups, and behavior patterns. These unobserved categories are the latent classes, and LCA is used to identify and understand them.

If we have observed variables that are indicators of unobserved groups of consumers, we could fit a latent class model and then

  • estimate the proportion of consumers belonging to each class;
  • estimate the probability of a positive response to observed variables in each consumer group;
  • evaluate the goodness of fit; and
  • predict the probability of belonging to each consumer group for individuals with a specific pattern of observed responses.
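
The four steps above might look like the following sketch. It assumes four hypothetical binary indicator variables, item1 through item4, and a two-class model; the class variable name C is our choice.

. * Hypothetical binary indicators: item1-item4
. gsem (item1 item2 item3 item4 <- ), logit lclass(C 2)
. * Proportion of consumers in each class
. estat lcprob
. * Probability of a positive response to each item within each class
. estat lcmean
. * Goodness of fit
. estat lcgof
. * Posterior probability of class membership for each observation
. predict cpost*, classposteriorpr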

Stata’s LCA features also allow you to fit latent profile models (with continuous observed outcomes), path models with latent categorical variables, and finite mixture models (FMMs). For FMMs, however, see the Finite mixture models section below.

bayes prefix

The new bayes: prefix command lets you fit Bayesian regression models more easily and fit more models. You could already fit a Bayesian linear regression using bayesmh. But now you can fit it by typing

. bayes: regress y x1 x2

That is convenient. What you could not previously do was fit a Bayesian survival model. Now you can with bayes: streg. You can also fit multilevel models with, for instance, bayes: mixed and bayes: melogit. The new bayes: prefix can be used with 45 Stata maximum-likelihood commands.

All of Stata's Bayesian features are supported by the new bayes: prefix command. You can select from many prior distributions for model parameters or use default priors. You can use the default adaptive Metropolis-Hastings sampling, or Gibbs sampling, or a combination of the two sampling methods, when available.

After estimation, you can use Stata's standard Bayesian postestimation tools such as bayesgraph to check convergence, bayesstats summary to estimate functions of model parameters, bayesstats ic and bayestest model to compute Bayes factors and compare Bayesian models, and bayestest interval to perform interval hypothesis testing.
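
A short session illustrating these choices might look like the sketch below. The variables y, x1, and x2 are hypothetical, and the prior and seed values are arbitrary choices for illustration only.

. * Hypothetical variables: y, x1, x2
. * Default priors with the default adaptive Metropolis-Hastings sampler
. bayes, rseed(15): regress y x1 x2
. * Gibbs sampling, which is available for linear regression with the default priors
. bayes, gibbs rseed(15): regress y x1 x2
. * A custom normal prior on the two slope coefficients
. bayes, prior({y: x1 x2}, normal(0, 100)) rseed(15): regress y x1 x2
. * Convergence diagnostics for one coefficient, then a posterior summary
. bayesgraph diagnostics {y:x1}
. bayesstats summary
. * Posterior probability that the coefficient on x1 is positive
. bayestest interval {y:x1}, lower(0)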

Produce Word® and PDF documents embedding Stata results and graphs

It is now just as easy to produce Word® and PDF documents in Stata as it is to produce Excel® worksheets. Everybody loved putexcel in Stata 14. They will also love putdocx and putpdf.

The new commands work just like putexcel. That means you can write do-files to create entire Word or PDF reports containing the latest results, tables, and graphs. You can automate reproducible reports.

The new putdocx command writes paragraphs, images, and tables to a Word file or, to be precise about it, to Office Open XML (.docx) files. Just as with putpdf, images include Stata graphs, and you can format the objects.

The new putpdf command writes paragraphs, images, and tables to a PDF file. Images include Stata graphs and other images such as your organization's logo. You can format the objects, too -- boldface, italics, size, custom tables, etc.
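
As a sketch of what such a do-file could contain, the commands below build a small Word report from the auto dataset that ships with Stata. The file names report.docx and mpg.png and the table name results are our own choices.

. sysuse auto, clear
. regress mpg weight foreign
. putdocx begin
. putdocx paragraph, style(Heading1)
. putdocx text ("Fuel economy report")
. * Add the estimation table from the regression above
. putdocx table results = etable
. * Export a graph and place it in its own paragraph
. scatter mpg weight
. graph export mpg.png, replace
. putdocx paragraph
. putdocx image mpg.png
. putdocx save report.docx, replace

putpdf follows the same begin, paragraph, text, table, image, and save pattern, so a PDF version of a report needs few changes beyond the command name and file extension.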

Markdown and dynamic documents

Markdown is a standard markup language that provides text formatting from plain text input. It was designed to be easily converted into HTML, the language of the web. Stata now supports it.

You can create HTML files from your Stata output, including graphs. You start with a plain text file containing Markdown-formatted text and dynamic tags specifying instructions to Stata, such as run this regression or produce that graph. You then use the new dyndoc command to convert the file to HTML.

Want to produce TeX documents? With the new dyntext command, you can produce any text-based document.


Linearized dynamic stochastic general equilibrium (DSGE) models

Stata now fits linearized DSGE models, which are time-series models used in economics and finance. These models are an alternative to traditional forecasting models. Both attempt to explain aggregate economic phenomena, but DSGE models do this on the basis of models derived from microeconomic theory.

Being based on microeconomic theory means lots of equations. The key feature of these equations is that expectations of future variables affect variables today. This is one feature that distinguishes DSGEs from a vector autoregression or a state-space model. The other feature is that, being derived from theory, the parameters can usually be interpreted in terms of that theory.

After fitting a DSGE model, estat policy and estat transition can report the policy and transition matrices. You can produce forecasts using Stata's existing forecast command, and you can graph impulse-response functions using Stata's existing irf command.
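
To give a sense of the syntax, here is a sketch of a small New Keynesian-style model. It assumes hypothetical tsset data with an observed inflation rate p and interest rate r. The output gap x is treated as unobserved, u and g are state variables, and the parameter names in braces are our own. The equations are illustrative and are not a recommended specification.

. * Hypothetical tsset data with observed p and r; x is unobserved
. dsge (p = {beta}*E(F.p) + {kappa}*x) (x = E(F.x) - (r - E(F.p) - g), unobserved) (r = {psi}*p + u) (F.u = {rhou}*u, state) (F.g = {rhog}*g, state)
. estat policy
. estat transition
. * Impulse-response functions for a shock to u
. irf set dsge_irf.irf, replace
. irf create model1
. irf graph irf, impulse(u) response(x p r u)

The E(F.p) and E(F.x) terms are where expectations of future variables enter the equations for today's variables.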

Finite mixture models

The new fmm: prefix command can be used with 17 Stata estimation commands to fit finite mixture models (FMMs). This means that with the fmm: prefix, we can now fit finite mixtures of regression models for continuous, binary, ordinal, count, categorical, and even survival-time outcomes.

The most typical use of fmm: is to fit one model and allow the parameters (coefficients, location, variance, scale, etc.) to vary across unobserved subpopulations. As with LCA, we call these unobserved subpopulations classes. Say we are interested in a linear regression and we believe there are three classes across which the parameters of the model might vary. Even though we have no variable recording the class membership, we can fit

. fmm 3: regress y x1 x2

Reported will be separate regression coefficients and intercepts for each class and a model for predicting membership in those classes.

In the same way, we can use fmm: logit for a binary outcome or fmm: streg for a survival-time outcome.

fmm: can even be used with multiple estimation commands simultaneously when the classes might follow different models. fmm: (regress y x1 x2) (poisson y x1 x2 x3) fits a linear regression in one class and a Poisson regression in another.

Postestimation commands are available to 1) estimate each class's proportion in the overall population; 2) report marginal means of the outcome variables within class; and 3) predict probabilities of class membership and predicted outcomes.
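
With the three-class regression from above, those postestimation steps might be typed as follows. The variables y, x1, and x2 are hypothetical, and the stub name pr is our choice.

. * Hypothetical variables: y, x1, x2
. fmm 3: regress y x1 x2
. * Each class's proportion in the overall population
. estat lcprob
. * Marginal mean of y within each class
. estat lcmean
. * Posterior probability of membership in each class
. predict pr*, classposteriorpr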

Spatial autoregressive (SAR) models

Stata now fits SAR models. SAR may stand for either spatial autoregressive or simultaneous autoregressive. Regardless of terminology, SAR models allow spatial lags of the dependent variable, spatial lags of the independent variables, and spatial autoregressive errors. Spatial lags are the spatial analog of time-series lags. Time-series lags are values of variables from recent times. Spatial lags are values from nearby areas.

SAR models are fit with the new commands spregress, spivregress (for endogenous covariates), and spxtregress (for panel data).

The models are appropriate for area (also known as areal) data. Observations are called spatial units and might be countries, states, districts, counties, cities, postal codes, or city blocks. Or they might not be geographically based at all. They could be nodes of a social network. Spatial models estimate direct effects -- the effects of areas on themselves -- and estimate indirect or spillover effects -- effects from nearby areas.

Stata provides a suite of commands for working with spatial data and a new [SP] manual to accompany them. When spatial units are geographically based, you can download standard-format shapefiles from the web that define the map. With a single command, you can make spillover effects proportional to the inverse distance between areas or restrict them to be just from neighboring areas. And you can create your own custom definitions of proximity.
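
A typical workflow might look like the sketch below. It assumes a hypothetical shapefile named counties in the working directory whose attribute data already contain the variables y, x1, and x2. The matrix names W and M are our own.

. * Hypothetical shapefile: counties.shp and counties.dbf
. spshape2dta counties
. use counties, clear
. * Spillovers restricted to neighboring areas
. spmatrix create contiguity W
. * Spillovers proportional to inverse distance
. spmatrix create idistance M
. * Spatial lags of y and x1, plus spatially autoregressive errors
. spregress y x1 x2, gs2sls dvarlag(W) ivarlag(W: x1) errorlag(M)
. * Direct, indirect (spillover), and total effects
. estat impact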

Interval-censored parametric survival-time models

Stata's new stintreg command joins streg for fitting parametric survival models. stintreg fits models to interval-censored data. In interval-censored data, the time of failure is not exactly known. What is known, subject by subject, is a time when the subject had not yet failed and a later time when the subject already had failed.

stintreg can fit exponential, Weibull, Gompertz, lognormal, loglogistic, and generalized gamma survival-time models. Both proportional-hazards and accelerated failure-time metrics are supported. After fitting a model with stintreg, you can plot survivor, hazard, and cumulative hazard functions, predict mean and median times, obtain Cox-Snell and martingale-like residuals, and more.
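
As a sketch, suppose the hypothetical variables ltime and rtime hold, for each subject, the last time the subject was known not to have failed and the first time the subject was known to have failed, with treat and age as covariates.

. * Hypothetical variables: ltime, rtime, treat, age
. stintreg i.treat age, interval(ltime rtime) distribution(weibull)
. * Plot the fitted survivor function
. stcurve, survival
. * Predicted median failure time for each subject
. predict medtime, median time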

Nonlinear mixed-effects models

Stata’s new menl command fits nonlinear mixed-effects models, also known as nonlinear multilevel models and nonlinear hierarchical models. These models can be thought of in two ways. You can think of them as nonlinear models containing random effects. Or you can think of them as linear mixed-effects models in which some or all fixed and random effects enter nonlinearly. However you think of them, the overall error distribution is assumed to be Gaussian.

These models are popular because, according to the underlying science, some problems are not linear in the parameters. They are widely used in population pharmacokinetics, bioassays, and studies of biological and agricultural growth processes.

For example, nonlinear mixed-effects models have been used to model

  • drug absorption in the body,
  • intensity of earthquakes, and
  • growth of plants.
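
For the plant-growth case, a sketch of a logistic growth curve with a plant-specific random effect on the asymptote could be typed as follows. The variables height, age, and plant are hypothetical, and the parameter names b1, b2, b3, and U1 are our own.

. * Hypothetical variables: height measured at age, repeated within plant
. menl height = ({b1} + {U1[plant]})/(1 + exp(-(age - {b2})/{b3}))

Fixed parameters appear in braces, and {U1[plant]} declares a random effect that varies across plants, so the model is written much as it would be on paper.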

Mixed logit models

Stata fits discrete choice models. Stata 15 will fit them with random coefficients. Discrete choice is another way of saying multinomial or conditional logistic regression. The word "mixed" is used by statisticians whenever some coefficients are random and others are fixed. Therefore, Stata 15 fits mixed logit models. These models are fit with the new asmixlogit command.

Random coefficients arise for many reasons, but there is a special reason researchers analyzing discrete choices might be interested in them. Random coefficients are a way around the independence of irrelevant alternatives (IIA) assumption. Under IIA, if you have a choice among walking, public transportation, or a car and you choose walking, the other two alternatives are irrelevant. Take one of them away, and you would still choose walking. Human beings sometimes violate this assumption, at least judged by their behavior.

Mathematically speaking, IIA makes alternatives independent after conditioning on covariates. If IIA is violated, then the alternatives would be correlated. Random coefficients allow that.
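
For the transportation example, a sketch might look like this. It assumes hypothetical long-format data with one row per person and alternative, where choice marks the chosen alternative, mode identifies the alternative, id identifies the person, and cost and time vary by alternative.

. * Hypothetical variables: choice, cost, time, id, mode
. asmixlogit choice cost, random(time) case(id) alternatives(mode)

Here the coefficient on cost is fixed, while the coefficient on time is random and varies across people.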


Please visit the Stata website for complete descriptions of all Stata features and new Stata 15 features.