Statistical tools for researchers

Stata is a general-purpose statistical package for researchers in all disciplines.

Stata is easy to use

  • Stata has a full GUI with over 700 dialogs providing point-and-click access to Stata's commands.
  • Stata has a simple, consistent command syntax and may be used either as a point-and-click GUI application or as a command-driven application.
  • Here's how you do an OLS regression:

. regress mvalue gender age income educ1-educ6

  • Most commands share the same syntax, whether the command fits a model, produces descriptive statistics, or performs a data-management task:

. logit outcome gender status exp if age>39
. graph income educ if state=="Texas"
. drop if select > 10

. by gender: tab case exposure
. by agegrp: summarize income yrswk
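
A note on the by prefix used above: it expects the data to be sorted by the grouping variable already. The sketch below reuses the variable names from the examples and shows two common ways to arrange that, either with an explicit sort or with bysort, which sorts and groups in one step.

. sort gender
. by gender: tab case exposure

. bysort agegrp: summarize income yrswk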

Stata is fast

  • An OLS regression (regress command) takes 0.05 seconds with 10,000 observations and 10 covariates. Change the command to logit (maximum-likelihood logistic regression), and it still takes less than 0.09 seconds to estimate 11 coefficients (10 covariates) on 10,000 observations.
  • Increase the number of observations to 100,000. The linear regression takes 0.34 seconds, and the logistic regression takes 0.65 seconds.
  • Now increase the number of observations to 1,000,000. The OLS regression takes only 3.35 seconds, and the logistic regression takes just 6.57 seconds.

Note: All timings were performed on a 2.8 GHz P4 running Intercooled Stata for Windows.
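
To get comparable timings on your own hardware, a rough sketch along the following lines should work. It assumes a recent Stata release that provides the timer command and the rnormal() function. The simulated variables y, d, and x1 through x10 are hypothetical stand-ins for real data, and the results will differ from the figures quoted above.

. clear
. set obs 10000
. set seed 12345
. forvalues i = 1/10 {
  2.     gen x`i' = rnormal()
  3. }
. gen y = rnormal()
. gen byte d = y > 0

. timer clear
. timer on 1
. regress y x1-x10
. timer off 1

. timer on 2
. logit d x1-x10
. timer off 2

. timer list

Timer 1 holds the elapsed time for the linear regression and timer 2 the time for the logistic regression. Changing the value given to set obs repeats the comparison at other sample sizes.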

Data management is one of Stata's strengths

  • With a handful of basic commands, you can perform just about any data-management task or data transformation.
  • Here's how you match-merge two datasets, merging newdata onto the current dataset matching on the variable id:

. merge id using newdata

  • Here's how you create a lagged variable:

. sort year
. gen ylag = y[_n-1]

  • Here's how you create a lagged variable for each subject's data:

. sort subject year
. by subject: gen ylag = y[_n-1]

  • Stata has a spreadsheet editor, which you can use to enter, change, or view data. The editor also has a protected mode to prevent accidental changes to your data.
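
The merge and lag examples above use the syntax of the Stata release this page was written for. As a hedged sketch only, the same steps in later releases (Stata 11 and newer for this merge syntax) might look as follows. It assumes id uniquely identifies observations in both datasets and that year is an integer time variable with no duplicates within a subject. The names ylag_ts and ylag_xt are hypothetical, chosen so the two alternatives do not collide.

. * one-to-one match-merge; no presorting is required
. merge 1:1 id using newdata

. * single time series: declare the time variable, then use the lag operator
. tsset year
. gen ylag_ts = L.y

. * panel data: declare subject and time, then lag within subject
. xtset subject year
. gen ylag_xt = L.y

One design difference is worth knowing: y[_n-1] takes the value from the previous observation in sort order, while L.y takes the value from the previous time period and is missing when that period is absent from the data.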

Stata handles large datasets

  • Stata/SE can perform analyses on datasets with up to 32,767 variables and more than 2,000,000,000 observations.

Stata is designed for researchers who must be able to document their analyses

  • Stata can be used interactively and in batch mode.
  • Log files of interactive sessions can be rerun as batch files. This makes it easy to duplicate analyses and to fully document your data-management steps.
  • Log files even contain a record of changes made interactively in the spreadsheet editor!
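
One way to put this into practice is sketched below. It uses cmdlog to record only the commands typed during a session, then reruns them with do while a regular log captures the output. The file names session.do, rerun.log, and mydata are hypothetical, and the recorded command file may need light editing before it is rerun.

. cmdlog using session.do
. use mydata
. regress mvalue gender age income
. cmdlog close

. log using rerun.log, text
. do session.do
. log close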

Stata is extensible

  • Stata is completely programmable.
  • In addition, Stata has its own matrix-programming language.
  • Users write and exchange new Stata commands that are used just like official commands.
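
As a minimal sketch of what a user-written command can look like, the program below defines a hypothetical command named showrange that reports the range of one numeric variable. It is an illustration, not an official or community-distributed command. Saved in a file named showrange.ado somewhere on Stata's ado-path, or run once from a do-file, it can then be called like any built-in command.

program define showrange
    * parse one numeric variable plus optional if and in qualifiers
    syntax varname(numeric) [if] [in]
    * summarize leaves the minimum and maximum in r(min) and r(max)
    quietly summarize `varlist' `if' `in'
    display as text "Range of `varlist': " as result r(max) - r(min)
end

. showrange income
. showrange income if age>39

The syntax line is what gives the new command the same varlist, if, and in conventions as the official commands shown earlier.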