Computation

Data sets are usually relatively large, and we use computers to do all of our calculations. In this course we examine how the calculations are made because we need to understand what the computer is doing for us; without that understanding we become mindless number crunchers with no idea of what we are doing. Most of the issues in statistics are not about how to compute the numbers, but about how to understand and use the output.
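As a small illustration of what the computer does for us, the sketch below computes a sample mean and a sample variance (dividing by n - 1) directly from their formulas, using Python, one of the programs discussed below. It then checks the results against Python's built-in `statistics` module. The data values are made up purely for illustration.

```python
import statistics

def sample_mean(xs):
    # Sum of the observations divided by how many there are
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Sum of squared deviations from the sample mean, divided by n - 1
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical numbers, not course data

# The hand-written formulas and the library functions should agree
print(sample_mean(data), statistics.mean(data))
print(sample_variance(data), statistics.variance(data))
```

Writing the formula out once makes it clear what a single library call is doing behind the scenes.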

Below I give a shortish rundown of different programs you can use for computation. In Econ 120B and 120C you will use Stata, so if you have access to it, it might be useful to start now. Stata is an expensive program, so do not buy it; you will have access through these courses. Stata is very good at doing statistics with the kinds of datasets common in economics, where you want to use many of the tools available from econometrics. Another expensive program is Matlab. Matlab is better suited to those who develop tools in econometrics. It is extremely powerful, has an excellent programming environment, and is heavily used in engineering.

In addition to these costly programs, there are happily free programs that are extremely powerful. The main trade-off in using the free programs is that you will be relying on packages written by users and released as open source, and these are often not checked for accuracy as carefully as, say, Stata or Matlab. But they are still heavily used, and because they are both widespread and free, you will still have access to them once you graduate.

A free program used heavily in the social sciences is R. I do not use R but will at some point add some details about how to get started.

Another free program that is heavily used in many fields is Python. Python is extremely versatile and can be used for anything from data analysis to web scraping to setting up and running websites, and much more. For basic data analysis you only need three or so packages to do an incredible variety of work with data (described below). In many office environments, using Excel is considered a "must have" skill, but Python can do everything Excel can and more, while still working with Excel worksheets.
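To give a flavor of the Excel comparison, here is a short sketch assuming the pandas package, one widely used choice for this kind of work. It fills a formula column down a small table of made-up sales figures, prints summary statistics, and totals revenue by region, which is the job a SUMIF formula or a pivot table would do in Excel. The column names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical sales records, standing in for the rows of a worksheet
sales = pd.DataFrame({
    "region": ["West", "East", "West", "East", "West"],
    "units": [10, 4, 6, 8, 5],
    "price": [2.5, 3.0, 2.5, 3.0, 2.0],
})

# A new column computed row by row, like filling a formula down a column
sales["revenue"] = sales["units"] * sales["price"]

# Count, mean, standard deviation and quartiles for every numeric column
print(sales.describe())

# Total revenue for each region, the equivalent of SUMIF or a pivot table
by_region = sales.groupby("region")["revenue"].sum()
print(by_region)
```

Unlike a spreadsheet, every step is written down in the script, so the same analysis can be rerun on next month's data without redoing it by hand.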

For full disclosure: I use Matlab and Python for all my work. Also note that this course does not require any computation; there are no computational problem sets and no computational questions on the exams. But I strongly recommend that you learn how to deal with data, since many of you will need to be able to work with data in the future as part of your job. These pages are for those interested in getting going with that now.

Stata

As noted above, if you are taking Econ 120B or 120C you will be required to use Stata. Whilst it is a very good program for the work in those classes, the downside is that it is expensive and many of you will not have access to it after graduation. If you already have access through another course, then it becomes an option. I assume that anyone with access has already started using Stata, so I do not explain it here. All of the methods we see in this course are straightforward applications of Stata commands.

Matlab

Because Matlab is expensive, only use it if you already have access. Matlab is suggested for students heading in a more engineering direction in their careers. It is heavily used in both academia and industry. The really nice thing about it is that the math you write on the page and the code you put in your files look almost exactly the same, so it is quick and easy to program correctly if you are on the math side of things rather than simply programming up what others want.

Since I am assuming anyone using Matlab will already have access, there is no need to explain here how to get going. You really do need to be comfortable with matrix algebra (and linear algebra more generally) to make good use of the program.

Python

For those of you who want to do machine learning, Python is the best place to start. It is an open source programming platform with many packages supporting everything from basic statistics to machine learning with data and text. It is also far more versatile than spreadsheets like Excel, yet it is simple to read Excel-based data and write your results out as Excel spreadsheets without ever opening them yourself.
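A minimal sketch of that Excel workflow, assuming pandas together with the openpyxl package (which pandas uses to read and write `.xlsx` files; both are normally included in the Anaconda distribution). The file names, sheet names and columns are hypothetical. So that the script runs on its own, it first creates a small input workbook in a temporary folder, standing in for a spreadsheet someone has sent you.

```python
import tempfile
from pathlib import Path

import pandas as pd

def make_report(input_path: Path, output_path: Path) -> pd.DataFrame:
    # Read the first worksheet of the input workbook into a DataFrame
    data = pd.read_excel(input_path)
    # Summary statistics (count, mean, std, ...) for the numeric columns
    summary = data.describe()
    # Write the raw data and the summary to separate sheets of a new workbook
    with pd.ExcelWriter(output_path) as writer:
        data.to_excel(writer, sheet_name="data", index=False)
        summary.to_excel(writer, sheet_name="summary")
    return summary

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "returns.xlsx"
    out = Path(tmp) / "report.xlsx"
    # Hypothetical input workbook with made-up monthly returns
    pd.DataFrame({"month": [1, 2, 3], "return_pct": [1.2, -0.4, 0.7]}).to_excel(src, index=False)
    summary = make_report(src, out)
    # sheet_name=None reads every sheet into a dict keyed by sheet name
    sheets = pd.read_excel(out, sheet_name=None)
    print(sorted(sheets))
```

Neither workbook is ever opened by hand; in practice you would point `make_report` at a real file instead of the generated one.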

There are lots of different environments for Python. I suggest using Anaconda, along with Jupyter notebooks, as an easy entry point. Rather than reinvent the wheel, I suggest you look at this page to get going (installation and learning how to work with the notebooks). The other pages on this site are directed at graduate students in economics, but it is well worth poking around there.

Once you have set up your environment, work through the Jupyter notebook covering the things we saw in class. As you will see, all the results for the Fidelity example were computed in Python. The notebook is on Canvas (so it is available only to enrolled students at this point).