Statistical tools for researchers
Stata is a general-purpose statistical package for researchers in all disciplines.
Stata is easy to use
- Stata has a full GUI with over 700 dialogs providing point-and-click access to Stata's commands.
- Stata has a simple, consistent command syntax, so you can work either by pointing and clicking or by typing commands.
- Here's how you do an OLS regression:
. regress mvalue gender age income educ1-educ6
- Most commands share the same syntax, whether the command fits a model, produces descriptive statistics, or performs a data-management task:
. logit outcome gender status exp if age>39
. scatter income educ if state=="Texas"
. drop if select > 10
. by gender: tab case exposure
. by agegrp: summarize income yrswk
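The shared pattern in the commands above is roughly: an optional by prefix, the command name, a variable list, and an optional if qualifier. As a minimal sketch, assuming a Stata release that ships the auto example dataset (loaded with sysuse), the same pattern can be tried end to end. The variable names below come from that example dataset, not from the commands above.

```stata
* Load the example dataset that ships with Stata (assumed to be available)
sysuse auto, clear

* Estimation: command, dependent variable, covariates, if qualifier
regress price mpg weight if foreign == 0

* Descriptive statistics with the same varlist and if pattern
summarize price mpg if foreign == 0

* The by prefix repeats a command for each group; bysort sorts the data first
bysort foreign: summarize price mpg
```

Plain by requires the data to be sorted on the grouping variable already, which is why the sketch uses bysort.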
Stata is fast
- An OLS regression (regress command) with 10,000 observations and 10 covariates takes 0.05 seconds. Switch to logit (maximum-likelihood logistic regression), and estimating the 11 coefficients (10 covariates plus a constant) on the same 10,000 observations still takes less than 0.09 seconds.
- Increase the number of observations to 100,000. The linear regression takes 0.34 seconds, and the logistic regression takes 0.65 seconds.
- Now increase the number of observations to 1,000,000. The OLS regression takes only 3.35 seconds, and the logistic regression takes just 6.57 seconds.
Note: All timings were measured on a 2.8 GHz Pentium 4 (P4) running Intercooled Stata for Windows.
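If you want to check timings like these on your own machine, something along the following lines should work. This is only a sketch under stated assumptions: the data are simulated (the variables x1 through x10, y, and d are hypothetical), and it assumes a Stata release that provides the timer command and the rnormal() function. Your results will differ from the figures quoted above, since they depend on hardware and on the Stata version and edition.

```stata
* Simulate 10,000 observations with 10 covariates (hypothetical data)
clear
set seed 12345
set obs 10000
forvalues i = 1/10 {
    generate x`i' = rnormal()
}
generate y = x1 + rnormal()     // continuous outcome for regress
generate d = y > 0              // binary outcome for logit

* Time each estimation command separately
timer clear
timer on 1
quietly regress y x1-x10
timer off 1
timer on 2
quietly logit d x1-x10
timer off 2

* Timer 1 is the OLS regression, timer 2 the logistic regression (seconds)
timer list
```

To repeat the comparison at 100,000 or 1,000,000 observations, change the number given to set obs.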
Data management is one of Stata's strengths
- With a handful of basic commands, you can perform just about any data-management task or data transformation.
- Here's how you match-merge the dataset newdata onto the current dataset, matching on the variable id:
. merge id using newdata
- Here's how you create a lagged variable:
. sort year
. gen ylag = y[_n-1]
- Here's how you create a lagged variable for each subject's data:
. sort subject year
. by subject: gen ylag = y[_n-1]
- Stata has a spreadsheet editor, which you can use to enter, change, or view data. The editor also has a protected mode to prevent accidental changes to your data.
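The merge and lag steps above can be tried on a small made-up example. The sketch below is written as a do-file and invents two tiny datasets, so the values and the variable income are purely illustrative. It also makes one assumption worth flagging: it uses the merge m:1 form found in newer Stata releases rather than the merge id using newdata form shown above, because the illustrative master data have several rows per id.

```stata
* Hypothetical lookup data: one row per id, saved to a temporary file
clear
input id income
1 100
2 200
3 300
end
tempfile newdata
save `newdata'

* Hypothetical panel: two subjects observed in three years each
clear
input id year y
1 2001 10
1 2002 11
1 2003 12
2 2001 20
2 2002 21
2 2003 22
end

* Match-merge on id (newer m:1 syntax, assumed available);
* keep() drops the lookup row for id 3, which has no match in the panel
merge m:1 id using `newdata', keep(master match)

* Lag y within each subject; the data are sorted by id and year first
bysort id (year): generate ylag = y[_n-1]

list id year y income ylag, sepby(id)
```

The first observation for each subject gets a missing ylag, because there is no earlier row within that subject to take the value from.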
Stata handles large datasets
- Stata/SE can perform analyses on datasets with up to 32,767 variables and more than 2,000,000,000 observations.
Stata is designed for researchers who must be able to document their analyses
- Stata can be used interactively and in batch mode.
- Log files of interactive sessions can be rerun as batch files. This makes it easy to duplicate analyses and to fully document your data-management steps.
- Log files even contain a record of changes made interactively in the spreadsheet editor!
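A documented analysis might look something like the sketch below, saved as a do-file. The file name analysis.log and the commands inside it are hypothetical placeholders, and the example dataset is assumed to be available through sysuse.

```stata
* Open a plain-text log; replace overwrites a log left by an earlier run
log using analysis.log, text replace

* Every command and its output from here on is recorded in the log
sysuse auto, clear
summarize price mpg weight
regress price mpg weight

* Close the log so the file is complete on disk
log close
```

If these lines are saved as, say, analysis.do, typing do analysis inside Stata reruns the whole analysis. On Unix-like systems a command such as stata -b do analysis.do runs it in batch mode, though the executable name and batch flag depend on your platform and Stata edition.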
Stata is extensible
- Stata is completely programmable.
- In addition, Stata has its own matrix-programming language.
- Users write and exchange new Stata commands that are used just like official commands.
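As a rough illustration of what a user-written command can look like, here is a minimal sketch. The command name meandiff is hypothetical and is not an official or community-contributed command. It assumes a Stata release that supports the syntax and marksample programming commands, and it uses the auto example dataset only to have something to run on.

```stata
* Define a hypothetical command that reports the difference in two means
capture program drop meandiff
program define meandiff, rclass
    version 10
    * Parse the standard syntax: exactly two numeric variables, optional if/in
    syntax varlist(min=2 max=2 numeric) [if] [in]
    * Mark the estimation sample, dropping rows with missing values
    marksample touse
    tokenize `varlist'
    quietly summarize `1' if `touse', meanonly
    local m1 = r(mean)
    quietly summarize `2' if `touse', meanonly
    local m2 = r(mean)
    display as text "Difference in means: " as result `m1' - `m2'
    * Leave the result behind in r() so other code can use it
    return scalar diff = `m1' - `m2'
end

* Once defined, it is used just like an official command
sysuse auto, clear
meandiff price weight if foreign == 0
```

Because the program parses its arguments with syntax, it accepts if and in qualifiers in the same way the built-in commands do.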
