* Stata introduction, Answers
* Read in the data if stored on C: (you must use your own file address here!)
use "C:\Users\hest\Documents\Courses\Regular courses\Stata course UiO\birth1.dta", clear
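* A possible alternative, shown only as a commented-out sketch: change the working directory once
* and then refer to the file by name. The folder below is a hypothetical placeholder, not a real path.
* cd "C:\my\course\folder"
* use birth1.dta, clear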
* Background
* birth1.dta is a (simulated) dataset of 583 mother-child pairs.
* The outcome variable is birth weight (weight, grams)
* The exposure variables are gestational age (gest, days) and mother's age (mage, years)
* Exercise 1
* Describe the variables and list some of the data
describe /* describe all variables and labels */
list id weight sex mage in 1/10 /* list the first 10 observations of these 4 variables */
* Exercise 2
* Summarize, tabulate, recode and convert to internal date format
summarize mage /* N, Mean, std dev, min and max */
tab sex /* frequency table */
tab sex, nolab
recode sex (2=1) (1=0), gen(sex0) /* recode value 2 to 1 and value 1 to 0, generating the new variable sex0 */
tab sex sex0 /* compare old and new */
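* A quick consistency check of the recode (a sketch, assuming sex only takes the values 1 and 2):
* assert stops with an error if any non-missing observation violates the condition.
assert sex0 == sex - 1 if sex < .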
gen gestW=gest/7 /* gestational age in weeks */
gen birth=mdy(month,day,year) /* generate a new date variable */
format birth %td /* format as date */
list day month year birth in 1/10 /* list */
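* Stata stores a date as the number of days since 01jan1960, so birth is an integer
* that format %td displays as a date. A small illustration with made-up dates (not from birth1.dta):
display mdy(1,1,1960) /* shows 0, the first day of Stata's date scale */
display %td mdy(1,2,1960) /* shows 02jan1960 */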
* Exercise 3
* We want to summarize missing values in three variables: weight sex gest
misstable summarize weight sex gest, all /* number of missing values per variable */
* gest4 is gestational age in 4 groups; we want to see the missing values
tab gest4 sex, miss /* show missing in the table, 46 missing in gest */
* Summarize mother's age for high gestational age and exclude missing cases
sum mage if gest >260 /* also includes subjects with missing gest, because Stata treats missing (.) as larger than any number, N=545 */
sum mage if gest >260 & gest<. /* excludes subjects with missing gestational age (gest), N=499. 499+46=545 */
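* Why the first command includes missing values: Stata treats missing (.) as larger than any number.
display (. > 260) /* shows 1 (true) */
* An equivalent way to write the second condition, using the missing() function:
sum mage if gest > 260 & !missing(gest)
* count reports the number of observations that satisfy a condition
count if missing(gest)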
* Exercise 4
* We are interested in the effect of gestational age on birth weight
* In the scatterplot we look for outliers, linear effect and deviations from linearity
kdensity weight, scale(1.5) /* distribution of birth weight */
twoway (scatter weight gest) /* scatterplot of birth weight versus gestational age */
replace gest=. if gest>400 /* replace the extreme outlier with missing */
twoway (scatter weight gest)(lfit weight gest) ///
if gest>250, legend(off) /* scatter + line */
twoway (scatter weight gest)(fpfitci weight gest) ///
(lfit weight gest) if gest>250, legend(off) /* scatter + fractional polynomial fit with CI + line */
twoway (scatter weight gest)(fpfitci weight gest) ///
(lfit weight gest) if gest>250, title("Scatterplot") ///
ytitle("Birth weight") xtitle("Gestational age") legend(off) /* + titles */
* Exercise 5
* We are interested in the effect of mother's age on birth weight.
* We need mother's age in two categories (magegr2) to compare the weight distributions for young and old mothers.
* We look for differences in means and in variance (shape). First we need the coding of magegr2.
tab magegr2 /* Table with value label */
tab magegr2, nolab /* no value label, only values, to see codings */
label list /* list all value labels, useful to see codings */
tw (kdensity weight if magegr2==0) ///
(kdensity weight if magegr2==1, lcolor(red) ), legend(off) /* twoway density plot */
* The density plot of birth weight shows a long left tail,
* so we may want to include only weights > 2000 grams
ttest weight, by(magegr2) /* t-test comparing birth weight between the 2 groups of mother's age */
ttest weight if weight>2000, by(magegr2) /* weight>2000 to get closer to a normal distribution, 19 subjects excluded */
* Observe the large difference in p-values (due to the reduction in variance in the second test) even though we have 19 fewer subjects
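* The comparison above also concerns the variances. A sketch of two related commands,
* offered as an optional supplement and not part of the original answers:
sdtest weight, by(magegr2) /* variance ratio test of equal standard deviations in the two groups */
ttest weight, by(magegr2) unequal /* t-test that does not assume equal variances */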
gen highW=(weight>4500) if weight<. /* indicator for high birth weight */
tab highW gestgr2, nofreq col chi /* high birth weight by gestational age (2 groups) with column percentages and chi-square test */
* Extra exercise
help tabstat /* click on "statname" to see the statistics options */
tabstat weight, stat(N min p25 p50 p75 max) by(magegr2) /* show N, minimum, quartiles and maximum */
* ------------------------ Keep graphs on screen --------------------------------------------------------------------
* To keep graphs on the screen: 1) set tabs on and 2) give a name to each graph
set autotabgraphs on /* keep graphs on separate tabs (pages), run once */
* Give a name (plot1) to a graph
scatter weight mage, name("plot1", replace)
* Give the next graph a different name (plot2)
kdensity weight, name("plot2", replace)
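* Named graphs stay in memory, so they can be reused later. A sketch, offered as an optional
* supplement (the file name plot1.png is a made-up example):
graph dir /* list the graphs currently available */
graph combine plot1 plot2 /* show both named graphs together in one figure */
graph export "plot1.png", name(plot1) replace /* save plot1 as a PNG file in the working directory */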