HINTS FOR DOING STATISTICS WITH EXCEL

 

This Web page has been constructed by Professor Graham Elliott and Alphonso Dufour.

 

1. Introduction 2. Creating a Histogram 3. Computing Summary Statistics
4. Creating a Scatterplot 5. Covariance and Correlation 6. Regression and Scatterplots
7. Computing Binomial Probabilities

 

Introduction.

There are three main different features which make statistics easy in EXCEL. These can be used in addition to just simply using the formulas from class and doing regular worksheet manipulations.

The three features are:

1. The functions key fx. Click on this (it is on one of the toolbars) and choose statistics. Many simple statistics calculators are available for use (such as mean, median etc.). Click on the one you want and fill in the appropriate information.

2. The chart wizard (it has a picture of a histogram and a wand, and is also on one of the toolbars). This is useful for making all sorts of charts (it does not do boxplots as far as I can tell).

3. The Data Analysis tool in the TOOLS menu. This does much more sophisticated statistics than the functions key, including Histograms, Data Generation from various distributions and useful hypotheses tests (which you will get to in 120B).

Note: The Data Analysis tool is what EXCEL terms as an ADD-IN. This does not show up on the TOOLS menu automatically when the standard EXCEL program is downloaded. If it is not there, go to the tools menu, and click on the line marked ADD-INS. This pops up a window that lists the add-ins available. There should be one called ANALYSIS-TOOLPAK. Click on this and it will be added to the TOOLS menu. If you have problems with this in Econ100 let myself or one of the TA’s know so that we can fix it. I have tried a bunch of machines without problem so far.

Return to top

 

Creating a Histogram.

This is most easily done using the HISTOGRAM option in the Data Analysis Tool . The data analysis tool is an option in the TOOLS menu. See the above discussion if you cannot find ‘Data Analysis’ in the TOOLS menu.

This histogram tool requires you to

a) have the data set up (usually in a column)

b) have a column indicating the bins you want to use (these are the ranges on the x axis).

The easiest way to set up the bins is to take the largest and smallest value for your data (round the numbers so that you still have some observations below the lowest value). Then divide the segment you have left so that it is cut into about 10 or 15 pieces. The more data you have the more divisions you can make and still get a decent looking histogram. You can use the equations line and fill command or simply enter the numbers in a column.

Do the above first.

From the TOOLS menu, choose the Data Analysis Command and then choose the Histogram option in the window that pops up. This pops up another widow that prompts you for information.

INPUT RANGE: In this entry you put the data range, e.g. $A$1:$A$20 if your data is in column A rows 1 to 20. You can use the mouse to click on the data and enter it automatically.

BIN RANGE: This entry asks for the column you put the bins you want the program to use. Enter as you did for the data with the range for the bins.

LABELS: This should be checked if the first row in your included data range has labels.

OUTPUT RANGE: Enter the column and row for the upper left hand point where you want the output to appear (e.g. D1 would start the output from row 1 of column D).

PARETO: Leave this blank for what we are doing.

CUMULATIVE PERCENTAGE: If this is checked you will receive the cumulative frequencies as well as the individual frequencies.

CHART OPTION: If this is checked then EXCEL will automatically draw the histogram as well as give you the frequency distribution table. The alternative (which I do) is to use the frequency table and chart wizard to draw the histogram).

Click OK when all this is done, and the frequency table will be entered at the point where you asked the program to put it in the OUTPUT RANGE option.

Return to top

 

Computing summary statistics.

You have three ways to go here.

A. You can always use the formulas we have given in class, and use basic EXCEL manipulations. An example is computing the mean for a column, simply sum the values in the column using the summation key and then use the formulas bar to divide this by the number of rows in the data. It is not a bad idea to do this to check that you understand how the formula works (there will be no computers in the exam).

B. Use the functions key statistics option. This will allow you one by one to compute means (sample averages), median, mode, standard deviations etc. Just look through the list and see what is available. Actually doing this just involves answering the questions in the window that pops up. e.g. To compute the sample average. First click on the cell in the worksheet where you want the output to appear. Now click the fx button on the toolbar. A window will pop up. Choose ‘statistical’ from the Function category (left hand side) and on the right hand side a list of functions available appears (function name list). Choose average by clicking on it. A window pops up asking for you to input titled ‘Number 1’. Enter the range for the data here. One way to do this is the following. Suppose the data is in a column. Click on the number at the top of the column and drag the mouse to the bottom of the column (without letting go of the mouse button). A broken line will appear around the data from which you want the mean, and the pointers to this range will automatically appear in the box next to ‘Number 1’. When you have chosen the range, click OK and the sample mean will appear in the place you requested earlier. This click and drag method of specifying the range works with all of these pop up windows.

C. There is a summary statistics option in the Data Analysis tool. The method is like the histogram although there is nothing to set up first. Just click on Data Analysis in the TOOLS menu, select the summary statistics line (click on it) and fill in the requested information (as in the histogram). Note that you need to check the ‘descriptive statistics’ box at the bottom of the window. Other options here include either choosing a place to put the output (place the column and row indicator in the output range box) or having it put in a new worksheet page (choose New Worksheet ply). We can ignore the ‘confidence level for the mean’, this has to do with confidence intervals that you will see in Ec120B. The Kth largest and Kth smallest options allow you to find the Kth largest and smallest value in the data.

Return to top

Creating a Scatterplot

This is most simply done using the chart wizard.

1. First, arrange the data in columns so that the data for the x axis is the first column (left hand side) and the data measured on the y axis is in the second column (to the right of the x data).

2. Click on the chart wizard and bring the little chart picture to where you want the chart to appear.

3. A window pops up asking for the data range. You need to input the address of the data here. The easiest way to do this is to click on the cell at the top of the x data, and without letting go of the mouse button drag down to the bottom of the y data. A broken line should surround the data and the address of the data appears in the pop up box. Then click next.

4. A window will pop up, with pictures of different types of charts. Choose the one with the scatterplot.

4. A new window appears with many different versions of scatterplots, choose the one you want.

5. The next window gives a sample scatterplot and allows you to indicate which rows have labels for legends (you might have inlcuded these in the first row for example).

6. The next window asks for labels and a title etc. Choose accordingly. You can type labels directly in the boxes in this window.

7. Click done. The scatterplot will appear. You can click on the table and components of the table to see what other options you have for mdking the picture look a little better (e.g. try clicking on the x axis, it pops up a window that allows you to change the tick marks, upper and lower limits etc).

Return to top

Covariance and Correlation.

The first way to do this is via the functions key. The statistics option allows you to compute the covariance using the option COVAR and the correlation using the option CORREL. The method is the same for both.

1. Click on the cell where you want the output to appear.

2. Click on the fx button on the toolbar. A window will pop up. Choose ‘statistical’ from the Function category (left hand side) and on the right hand side a list of functions available appears (function name list). Choose either COVAR or CORREL by clicking on it.

3. Places for the input of the ranges of two data series appear (array 1 and array 2). Click on the first box so the cursor flashes in this box and use the ‘click and hold’ method on the data (you will see the address of the data appear in the box on the pop up window). Click on the box next to ‘array 2’ enter the data address here (again, you can use the ‘click and hold’ strategy again by clicking on the top of the column where the data is and hold down the mouse button, dragging the broken box that appears around the data to the bottom of the data column. As you do this the address for the data appears in the pop up window).

4. Click ‘finish’ and the covariance or correlation appears in the cell to chose in step 1.

We can also get these from the ‘data analysis’ option in the TOOLS menu. In this case the two columns (or rows of data) must be side by side in the worksheet.

1. Click on the ‘Data Analysis’ option in the TOOLS menu. A window pops up with the options ‘covariance’ and ‘correlation’.

2. Both covariance and correlation work the same way. Click on the one you want. A window pops up.

3. The first part asks for the data, this wants both columns of data (this is why they need to be side by side). Also, tell it if the data is in columns or not.

4. The second part asks where to put the ouput. You can give it a cell reference or let it put the output on another worksheet.

5. Click on OK and the output appears.

The output looks like the following:

For Covariance

Column 1

Column 2

Column 1

7843.536

Column 2

979.3571

137.9592

The covariance here is 979.3571 (the offdiagonal term). The other numbers are the variances of the data in column 1 and column 2 (the column 1 variance is in the cell where the row and column are denoted column 1). If you use labels for the data you will have the names of the data rather than the headings ‘Column 1’ etc.

For Correlation

Column 1

Column 2

Column 1

1

Column 2

0.941477

1

The correlation here is again the offdiagonal term, which here is 0.941477.

Return to top

Regression and Scatterplots with Regression Lines.

This is even simpler, as there is an option in the Data Analysis menu that does regression and automatically computes a scatterplot and fitted line if you ask it to.

1. Click on Data Analysis in the TOOLS menu.

2. Choose the ‘Regression’ option from the window that pops up.

3. Fill in the range for the x and y data. To do this, click on the box next to where it askes for the y variable so that the cursor flashes. Then use the click and hold method for selecting the y data. Do the same for the x data. You can ignore the confidence level question for now.

4. Choose the output option you want. Also, under residuals, you can choose to print out the di’s (click on the box next to the option ‘residuals’), and also a scatterplot with the line through it (click on the box next to the option ‘Line Fit Plots’). You can ignore the other options for now.

5. Click OK and the output will be printed where you wanted it.

We will focus only on the scatterplot and the line through it, you can ignore the other information except for the slope and intercept (b and a from the line), which are presented in a column called ‘COEFFICIENTS’ and a is the intercept and b is reported in the row labelled ‘X variable’ or if you have labels by the label for the X data.

Coefficients

Standard Error

t Stat

Intercept

-41.4304

7.412217

-5.58948

X Variable

0.124862

0.012905

9.675471

The above is part of the output you will see (you will get lots more columns, and learn what they mean in ec120B). The regression line is y = a + bx where a=-41.4304 and b=0.124862 (this data is for the Manatee example and X is the number of power boat registrations in 1000’s and Y is the number of manatee deaths).

The scatterplot usually needs a bit of work to make it look halfway decent. Try clicking and dragging on the corners to make it the size you want. Click on the ‘fitted y’ data in the plot to get options here (like making a smoothed line). Use the help command to learn what you can do.

Return to top

 

Computing Binomial Probabilities

The function wizard key (fx) has an option BINOMDIST which computes the probabilities P(S=s|n) for you. The steps are below, with the entries for the example we did in class,

P(S=0|10) from a Bin(0.001,10) distribution.

1. click on function wizard key

2. click on statistics

3. click on BINOMDIST

4. It asks for four things in order -

number_s = number of successes (here it is 0)

trials = number of trials (here it is 10)

probability = probability of a success, i.e. p (here 0.001)

cumulative = cumulative or not (here write 0 for probability or 1 for cumulative probability).

5. The probability is displayed in the top right hand corner of the dialog box and in a cell after you hit finish. It is also entered into the cell in the worksheet you are currently in.

Return to top