** Commands we have used so far
* summarize all the variables in the dataset
sum
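* A possible extension (a sketch, not part of the original session):
* restrict the summary to the two variables used below and add the
* detail option to also see percentiles, skewness and kurtosis.
sum cars traveltime, detail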
* sort the data by cars in ascending (gsort cars) or descending (gsort -cars)
* order, then list the first 3 rows (the 3 smallest or the 3 largest values)
gsort cars
list in 1/3
gsort -cars
list in 1/3
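* A sketch, assuming we also want ties in cars broken by traveltime:
* gsort accepts several variables, each with its own + or - prefix.
* Listing only a few variables keeps the output short.
gsort -cars +traveltime
list highway cars traveltime in 1/3
* the last 3 rows can be listed with a negative range (l means last)
list highway cars traveltime in -3/l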
* generate a new variable
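* toocrowded is 1 if there are more than 400 cars on the highway
* and 0 otherwise; Stata treats missing values as larger than any number,
* so the if qualifier leaves toocrowded missing when cars is missing
gen toocrowded = (cars>400) if !missing(cars)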
* this creates a squared version of cars
gen cars2 = cars^2
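* A quick check of the new variables (a sketch, not part of the original
* session): count how many observations fall in each toocrowded group,
* including missing values, and compare the ranges of cars and cars2.
tab toocrowded, missing
sum cars cars2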
* histogram
hist traveltime
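* A possible variation (the bin count of 20 is an arbitrary choice for
* illustration): set the number of bins and show counts instead of densities.
hist traveltime, bin(20) frequency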
* tabulate frequencies by group, for example by highway; useful for string variables
tab highway
tab highway if highway != "SqHill"
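* A sketch of a two-way table, assuming toocrowded was generated as above:
* cross-tabulate highway against toocrowded and add row percentages to see
* what share of each highway's observations is too crowded.
tab highway toocrowded, row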
* look at the correlation between two variables
pwcorr cars traveltime
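* A possible extension (a sketch): the sig option adds the significance level
* of each correlation and obs adds the number of observations used.
pwcorr cars traveltime, sig obs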
* regress traveltime on cars
* (fit a line through the data and report the intercept and slope)
reg traveltime cars
reg traveltime cars if highway == "SqHill"
reg traveltime cars if highway == "SqHill" & cars > 400
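* A sketch of how the fitted line can be used after a regression.
* predict works off the most recent regression, so we rerun the full-sample
* model first. The names traveltime_hat and traveltime_res are hypothetical,
* chosen here for illustration.
reg traveltime cars
* fitted values: intercept + slope * cars
predict traveltime_hat, xb
* residuals: traveltime minus the fitted value
predict traveltime_res, residuals
sum traveltime traveltime_hat traveltime_res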
* make a scatter plot with traveltime on the y axis and cars on the x axis
scatter traveltime cars
scatter traveltime cars if toocrowded
scatter traveltime cars if !toocrowded
* overlay two scatterplots in one graph to compare them
twoway (scatter traveltime cars if toocrowded) (scatter traveltime cars if !toocrowded)
twoway (scatter traveltime cars if highway=="SqHill") (scatter traveltime cars if highway=="Clarion")
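* A sketch that ties the scatter plot to the regression: lfit draws the
* fitted line from a regression of traveltime on cars, so it can be
* overlaid on the raw points in the same twoway graph.
twoway (scatter traveltime cars) (lfit traveltime cars)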