## Buy Stata 15

**Contact our sales team** for a customised quote based on:

- Single and volume licenses.
- Academic or commercial use.
- Upgrades from previous versions.

Stata 15 has something for everyone. Below we list the highlights of the release. This release is unique because most of the new features can be used by researchers in every discipline.

ERMs (extended regression models) is our name for regression models that can account for the following:

- Endogenous covariates
- Nonrandom treatment assignment
- Heckman-style endogenous sample selection

While Stata already had commands such as **heckman** and **ivregress** that can address these problems individually, ERMs can account for the problems in any combination. And ERMs don’t just address these problems in linear models. There are four ERM commands:

- **eregress** fits linear regression models for continuous outcomes.
- **eintreg** fits interval regression, including tobit, for interval-measured and censored outcomes.
- **eprobit** fits probit regression models for binary outcomes.
- **eoprobit** fits ordered probit regression for ordinal outcomes.

You can now fit models that were previously unavailable, even if you need only one of the new features, such as

- interval regression with endogenous covariates
- probit regression with a binary endogenous covariate
- probit regression with endogenous ordinal treatment
- ordered probit regression with endogenous treatment
- linear regression with tobit endogenous sample selection
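As a minimal sketch of how these combinations might be specified, the following uses hypothetical variable names throughout (none of them come from a shipped dataset). It shows the same endogenous covariate being handled alone and then together with treatment assignment and sample selection.

```stata
* Hypothetical variables: y, ybin (outcomes), x1 (exogenous covariate),
* x2, x2bin (endogenous covariates), z1 z2 (instruments),
* treat (binary treatment), w1 (treatment predictor),
* insample (selection indicator), s1 (selection predictor)

* Linear regression with a continuous endogenous covariate
eregress y x1, endogenous(x2 = x1 z1 z2)

* The same model, adding nonrandom treatment assignment and
* endogenous sample selection
eregress y x1, endogenous(x2 = x1 z1 z2) ///
    entreat(treat = x1 w1) select(insample = x1 s1)

* Probit regression with a binary endogenous covariate
eprobit ybin x1, endogenous(x2bin = x1 z1, probit)
```

The `///` line continuation assumes the commands are run from a do-file.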

Stata's **gsem** command now supports latent class analysis (LCA).

Latent class models use categorical latent variables. Categorical means group. Latent means unobserved. Categorical latent variables can be used, for instance,

- in marketing or management to represent consumers with different buying preferences;
- in health to represent patients in different risk groups; and
- in education or psychology to represent students with different patterns of behavior.

Unobserved are the buying preferences, risk groups, and behavior patterns. These unobserved categories are the latent classes, and LCA is used to identify and understand them.

If we have observed variables that are indicators of unobserved groups of consumers, we could fit a latent class model and then

- estimate the proportion of consumers belonging to each class;
- estimate the probability of a positive response to observed variables in each consumer group;
- evaluate the goodness of fit; and
- predict the probability of belonging to each consumer group for individuals with a specific pattern of observed responses.
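A minimal sketch of that workflow, assuming four hypothetical binary indicator variables `b1` through `b4` and two latent classes, might look like this:

```stata
* Hypothetical binary indicators b1-b4 of an unobserved consumer class
gsem (b1 b2 b3 b4 <- ), logit lclass(C 2)

estat lcprob                       // proportion of consumers in each class
estat lcmean                       // probability of a positive response, by class
estat lcgof                        // goodness of fit
predict cpost*, classposteriorpr   // posterior probability of class membership
```

The class variable name `C` and the choice of two classes are illustrative; in practice you would compare fits across different numbers of classes.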

Stata’s LCA features also allow you to fit latent profile models (with continuous observed outcomes), path models with latent categorical variables, and finite mixture models (FMMs). But for FMMs, see the **fmm:** prefix described below.

The new **bayes:** prefix command lets you fit Bayesian regression models more easily and fit more models. You could already fit a Bayesian linear regression using **bayesmh**. But now you can fit it by typing

**. bayes: regress y x1 x2**

That is convenient. What you could not previously do was fit a Bayesian survival model. Now you can with **bayes: streg**. You can also fit multilevel models with, for instance, **bayes: mixed** and **bayes: melogit**. The new **bayes:** prefix can be used with 45 Stata maximum-likelihood commands.

All of Stata's Bayesian features are supported by the new **bayes:** prefix command. You can select from many prior distributions for model parameters or use default priors. You can use the default adaptive Metropolis-Hastings sampling, or Gibbs sampling, or a combination of the two sampling methods, when available.

After estimation, you can use Stata's standard Bayesian postestimation tools such as **bayesgraph** to check convergence, **bayesstats summary** to estimate functions of model parameters, **bayesstats ic** and **bayestest model** to compute Bayes factors and compare Bayesian models, and **bayestest interval** to perform interval hypothesis testing.
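To make this concrete, here is a short sketch using the `auto` dataset that ships with Stata. The choice of model, the seed, and the `normal(0, 100)` prior are ours and purely illustrative.

```stata
sysuse auto, clear

* Bayesian linear regression with default priors
bayes, rseed(15): regress mpg weight length

* The same model with a custom prior on one coefficient (illustrative choice)
bayes, prior({mpg:weight}, normal(0, 100)) rseed(15): regress mpg weight length

bayesgraph diagnostics {mpg:weight}         // check convergence
bayesstats summary                          // posterior summaries
bayestest interval {mpg:weight}, upper(0)   // Pr(coefficient on weight < 0)
```

The postestimation commands apply to the most recently fit model, here the one with the custom prior.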

It is now just as easy to produce Word® and PDF documents in Stata as it is to produce Excel® worksheets. Everybody loved **putexcel** in Stata 14. They will also love **putdocx** and **putpdf**.

The new commands work just like **putexcel**. That means you can write do-files to create entire Word or PDF reports containing the latest results, tables, and graphs. You can automate reproducible reports.

The new **putdocx** command writes paragraphs, images, and tables to a Word file or, to be precise about it, to Office Open XML (.docx) files. Just as with **putpdf**, images include Stata graphs, and you can format the objects.

The new **putpdf** command writes paragraphs, images, and tables to a PDF file. Images include Stata graphs and other images such as your organization's logo. You can format the objects, too -- bold face, italics, size, custom tables, etc.
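A do-file along the following lines is one way such a report might be automated. It is a sketch using the shipped `auto` dataset; the file names `report.docx` and `mpg_weight.png` are placeholders.

```stata
sysuse auto, clear
regress mpg weight length

putdocx begin
putdocx paragraph, style(Heading1)
putdocx text ("Fuel economy report")
putdocx paragraph
putdocx text ("Regression of mpg on weight and length:"), bold
putdocx table results = etable        // table of the last estimation results

scatter mpg weight
graph export mpg_weight.png, replace
putdocx paragraph
putdocx image mpg_weight.png          // insert the exported graph

putdocx save report.docx, replace
```

A PDF version follows the same pattern with **putpdf begin**, **putpdf paragraph**, **putpdf text**, **putpdf table**, **putpdf image**, and **putpdf save**.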

Markdown is a standard markup language that provides text formatting from plain text input. It was designed to be easily converted into HTML, the language of the web. Stata now supports it.

You can create HTML files from your Stata output, including graphs. You will start with a plain text file containing Markdown-formatted text and dynamic tags specifying instruction to Stata, such as run this regression or produce that graph. You then use the new **dyndoc** command to convert the file to HTML.
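As a rough illustration, a source file for **dyndoc** might look like the following. The file name `report.txt`, the headings, and the graph file name are placeholders of ours; the `<<dd_...>>` lines are the dynamic tags.

```text
<<dd_version: 1>>

Fuel economy report
===================

We load the data and fit a regression.

~~~~
<<dd_do>>
sysuse auto, clear
regress mpg weight length
<</dd_do>>
~~~~

The average mileage is <<dd_display: %4.1f r(mean)>> mpg.

<<dd_do: quietly>>
summarize mpg
scatter mpg weight
<</dd_do>>

<<dd_graph: saving(mpg_weight.svg) replace alt("mpg versus weight")>>
```

Typing `dyndoc report.txt, replace` would then be expected to produce `report.html`. Note that, as written, the `dd_display` tag appears before `summarize` has run, so in a real file you would move the quiet `dd_do` block above the sentence that displays `r(mean)`.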

Want to produce TeX documents? With the new **dyntext** command, you can produce any text-based document!

Stata now fits linearized dynamic stochastic general equilibrium (DSGE) models, which are time-series models used in economics and finance. These models are an alternative to traditional forecasting models. Both attempt to explain aggregate economic phenomena, but DSGE models do this on the basis of models derived from microeconomic theory.

Being based on microeconomic theory means lots of equations. The key feature of these equations is that expectations of future variables affect variables today. This is one feature that distinguishes DSGEs from a vector autoregression or a state-space model. The other feature is that, being derived from theory, the parameters can usually be interpreted in terms of that theory.

After fitting a DSGE model, **estat policy** and **estat transition** can report the policy and transition matrices. You can produce forecasts using Stata's existing **forecast** command, and you can graph impulse-response functions using Stata's existing **irf** command.
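The sketch below shows what a small model of this kind might look like with the **dsge** command. It assumes a dataset that has already been `tsset` and contains two hypothetical observed variables, `p` (inflation) and `r` (an interest rate); the equations and parameter names are illustrative only. The `E(F.p)` and `E(F.x)` terms are the expectations of future variables mentioned above.

```stata
* Hypothetical observed variables: p and r. The output gap x is unobserved,
* and u and g are unobserved state variables that carry the shocks.
dsge (p = {beta}*E(F.p) + {kappa}*x)              ///
     (x = E(F.x) - (r - E(F.p) - g), unobserved)  ///
     (r = (1/{beta})*p + u)                       ///
     (F.u = {rhou}*u, state)                      ///
     (F.g = {rhog}*g, state)

estat policy          // policy matrix
estat transition      // transition matrix

* Impulse-response functions for a shock to u
irf set dsge_irf.irf, replace
irf create model1
irf graph irf, impulse(u) response(p r)
```

The `///` line continuation assumes the command is run from a do-file.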

The new **fmm:** prefix command can be used with 17 Stata estimation commands to fit finite mixture models (FMMs). This means that with the **fmm:** prefix, we can now fit finite mixtures of regression models for continuous, binary, ordinal, count, categorical, and even survival-time outcomes.

The most typical use of **fmm:** is to fit one model and allow the parameters (coefficients, location, variance, scale, etc.) to vary across unobserved subpopulations. As with LCA, we call these unobserved subpopulations classes. Say we are interested in a linear regression and we believe there are three classes across which the parameters of the model might vary. Even though we have no variable recording the class membership, we can fit

**. fmm 3: regress y x1 x2**

Reported will be separate regression coefficients and intercepts for each class and a model for predicting membership in those classes.

In the same way, we can use **fmm: logit** for a binary outcome or **fmm: streg** for a survival-time outcome.

**fmm:** can even be used with multiple estimation commands simultaneously when the classes might follow different models. **fmm: (regress y x1 x2) (poisson y x1 x2 x3)** fits a linear regression in one class and a Poisson regression in another.

Postestimation commands are available to 1) estimate each class's proportion in the overall population; 2) report marginal means of the outcome variables within class; and 3) predict probabilities of class membership and predicted outcomes.
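A sketch of the three postestimation steps after a two-class mixture of linear regressions, using hypothetical variables `y`, `x1`, and `x2`:

```stata
* Hypothetical variables: y (continuous outcome), x1 x2 (covariates)
fmm 2: regress y x1 x2

estat lcprob                    // 1) each class's proportion in the population
estat lcmean                    // 2) marginal mean of y within each class
predict pr*, classposteriorpr   // 3) posterior probabilities of class membership
predict yhat*                   //    predicted outcome for each class
```

The stub names `pr` and `yhat` are arbitrary; each **predict** call creates one new variable per class.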

Stata now fits SAR models. SAR may stand for either spatial autoregressive or simultaneous autoregressive. Regardless of terminology, SAR models allow spatial lags of the dependent variable, spatial lags of the independent variables, and spatial autoregressive errors. Spatial lags are the spatial analog of time-series lags. Time-series lags are values of variables from recent times. Spatial lags are values from nearby areas.

SAR models are fit with the new commands **spregress**, **spivregress** (for endogenous covariates), and **spxtregress** (for panel data).

The models are appropriate for area (also known as areal) data. Observations are called spatial units and might be countries, states, districts, counties, cities, postal codes, or city blocks. Or they might not be geographically based at all. They could be nodes of a social network. Spatial models estimate direct effects -- the effects of areas on themselves -- and estimate indirect or spillover effects -- effects from nearby areas.

Stata provides a suite of commands for working with spatial data and a new [SP] manual to accompany them. When spatial units are geographically based, you can download standard-format shapefiles from the web that define the map. With a single command, you can make spillover effects proportional to the inverse distance between areas or restrict them to be just from neighboring areas. And you can create your own custom definitions of proximity.
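The workflow might be sketched as follows. The shapefile name `counties` and the variables `y` and `x` are hypothetical; in practice the analysis variables usually have to be merged into the dataset created from the shapefile.

```stata
* Hypothetical input: counties.shp and counties.dbf in the current directory
spshape2dta counties, replace     // creates counties.dta and counties_shp.dta
use counties, clear

spmatrix create contiguity W      // spillovers from neighboring areas only
spmatrix create idistance M       // spillovers proportional to inverse distance

* y and x are hypothetical variables assumed to be in the dataset
spregress y x, gs2sls dvarlag(W) ivarlag(W: x) errorlag(W)
estat impact                      // direct, indirect (spillover), and total effects
```

Swapping `W` for `M` in the model options would use the inverse-distance definition of proximity instead of contiguity.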

Stata's new **stintreg** command joins **streg** for fitting parametric survival models. **stintreg** fits models to interval-censored data. In interval-censored data, the time of failure is not exactly known. What is known, subject by subject, is a time when the subject had not yet failed and a later time when the subject already had failed.

**stintreg** can fit exponential, Weibull, Gompertz, lognormal, loglogistic, and generalized gamma survival-time models. Both proportional-hazards and accelerated failure-time metrics are supported. After fitting a model with **stintreg**, you can plot survivor, hazard, and cumulative hazard functions, predict mean and median times, obtain Cox-Snell and martingale-like residuals, and more.
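As an illustration, suppose the hypothetical variables `ltime` and `rtime` record the last time each subject was known not to have failed and the first time the subject was known to have failed. A Weibull model might then be fit and examined like this:

```stata
* Hypothetical variables: ltime rtime (interval endpoints), x1 x2 (covariates)

* Weibull model in the proportional-hazards metric (the default)
stintreg x1 x2, interval(ltime rtime) distribution(weibull)

* Same model in the accelerated failure-time metric
stintreg x1 x2, interval(ltime rtime) distribution(weibull) time

stcurve, survival                 // plot the fitted survivor function
stcurve, hazard                   // plot the fitted hazard function
predict t50, median time          // predicted median failure time
```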

Stata’s new **menl** command fits nonlinear mixed-effects models, also known as nonlinear multilevel models and nonlinear hierarchical models. These models can be thought of in two ways. You can think of them as nonlinear models containing random effects. Or you can think of them as linear mixed-effects models in which some or all fixed and random effects enter nonlinearly. However you think of them, the overall error distribution is assumed to be Gaussian.

These models are popular because, in some problems, the underlying science implies a model that is not linear in the parameters. They are widely used in population pharmacokinetics, bioassays, and studies of biological and agricultural growth processes.

For example, nonlinear mixed-effects models have been used to model

- drug absorption in the body,
- intensity of earthquakes, and
- growth of plants.
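For the plant-growth case, a sketch might look like the following. It assumes hypothetical variables `height`, `age`, and `plant`, and a logistic growth curve whose asymptote varies randomly across plants.

```stata
* Hypothetical variables: height (outcome), age (time), plant (group id)
* {b1}, {b2}, {b3} are fixed-effects parameters; {U[plant]} is a random
* effect at the plant level that shifts the asymptote
menl height = ({b1} + {U[plant]}) / (1 + exp(-(age - {b2}) / {b3}))
```

The curve is nonlinear in `{b2}` and `{b3}`, which is why a linear mixed model would not do. Depending on the data, starting values may be needed for convergence.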

Stata fits discrete choice models. Stata 15 will fit them with random coefficients. Discrete choice is another way of saying multinomial or conditional logistic regression. The word "mixed" is used by statisticians whenever some coefficients are random and others are fixed. Therefore, Stata 15 fits mixed logit models. These models are fit with the new **asmixlogit** command.

Random coefficients arise for many reasons, but there is a special reason researchers analyzing discrete choices might be interested in them. Random coefficients are a way around the IIA (independence of irrelevant alternatives) assumption. Under IIA, if you have a choice among walking, public transportation, or a car and you choose walking, the other two alternatives are irrelevant. Take one of them away, and you would still choose walking. Human beings sometimes violate this assumption, at least judged by their behavior.

Mathematically speaking, IIA makes alternatives independent after conditioning on covariates. If IIA is violated, then the alternatives would be correlated. Random coefficients allow that.
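Using the transportation example, a mixed logit model might be specified as in the sketch below. All variable names are hypothetical, and the data are assumed to be in long form, with one row per person and alternative.

```stata
* Hypothetical variables:
*   chosen  0/1 indicator for the alternative actually chosen
*   cost    alternative-specific covariate with a fixed coefficient
*   time    alternative-specific covariate with a random coefficient
*   income  case-specific covariate
*   id      case (person) identifier
*   mode    alternative identifier (walking, public transportation, car)
asmixlogit chosen cost, random(time) casevars(income) ///
    case(id) alternatives(mode)
```

Here the coefficient on `time` varies across people, which is what allows the alternatives to be correlated. The `///` line continuation assumes the command is run from a do-file.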

Please visit the Stata website for complete descriptions of all Stata features and new Stata 15 features.