The NTEN Home Page The NTEN Home Page About the National Teachers Enhancement Network Read Answers to Frequently Asked Questions Log In To WebCT Resources for Effective Online Teaching
Testimonials By Our Students Tour A Sample Online Class
Go to the NTEN Science Resources Our Online Course Offerings Kit-Based Learning for Elementary School Teachers
   

Lesson Plan: Dealing With Missing Data
Subject: Technology/Computer Science/Mathematics/Environmental Science
Classtime: 100 minutes
Grade Level: 8-12

Materials/Technology:

  • Students are expected to have basic computer skills and to be able to follow directions.
  • Students will need an Internet accessible computer with Excel spreadsheet software loaded on the hard drive. The REAL version is required (not the "light" version), otherwise, results will not be consistent with this lesson.
  • Students need to have a calculator.
  • It is preferred, but not necessary, that computers have a mouse with both right and left click buttons.

Safety, Handling, Disposal:
Internet access should be monitored closely to prevent viewing of inappropriate sites.

Learner Outcomes:
Students should be able to:

  • Log onto, and collect data from, a specific site on the Internet.
  • Paste collected data in a spreadsheet.
  • Parse collected data in a spreadsheet.
  • Mathematically replace missing raw data.

Problem/Purpose:
Getting data from the Internet, parsing the data in a spreadsheet, and calculating the missing values are important tasks. This lesson covers the correct procedure for collecting data from a specific Internet site, pasting the data in a spreadsheet program, parsing the data collected, and filling in missing data.

Background/Inquiry:
In the scientific community the use of spreadsheets is critical. Scientists use spreadsheets to average, graph, compare, and analyze data to determine relationships not easily recognized by studying a string of numbers.

The Internet has a tremendous amount of government data available for use by anyone with a computer, modem, and Internet access. This data must be downloaded and made useable by the individual interested in the data. "Excel", a Microsoft spreadsheet program, provides an excellent tool for doing this.

Once the data has been downloaded it must be separated into columns so the spreadsheet program can accurately use the data. This is known as "parsing".

Sometimes data is incomplete for a variety of reasons (equipment failure, human error, etc.). This makes it necessary to manipulate the data and "fill in" the missing pieces in a mathematically appropriate manner. After the missing pieces have been filled in, it becomes possible to accurately use the data.

This lesson is site specific but the process is transferable.

The Internet site http://www.ncdc.noaa.gov/ol/climate/online/coop-precip.html will provide the data for this exercise. The following Important Notes come from this site, and describe the arrangement of data within the site:

Important Notes
This system contains historical monthly precipitation data for all U.S. cooperative and National Weather Service stations. The periods of record vary by station, with some stations having data back to about 1900, although most begin around 1948. The data files are complete through 1997. The directory has a data file for each state, along with one file for several stations outside the U.S., such as Guam and Puerto Rico (filename 'others.txt'). There are currently over 8000 NWS and cooperative weather stations active and included in this directory, though most of the entries are for inactive stations.

The files consist entirely of ASCII character data. A header record begins each file with the station number, station name, state abbreviation, year, 12 months (Jan - Dec), and 'annual' appearing from left to right. The monthly and annual precipitation amounts are in inches and hundredths (e.g., 1122 = 11.22 inches, 522 = 5.22 inches, 55 = .55 inches). A value of '9999' indicates missing or incomplete data for that month, with '99999' indicating missing for the yearly total. The monthly total is only presented when all data has been recorded and validated for that month. Likewise, the annual total is only presented when every monthly total for the year is available. This information is taken from NCDC digital database TD3220, which also contains additional elements such as temperature and snowfall. These additional elements are available off line, on tape or diskette. (http://www.ncdc.noaa.gov/ol/climate/online/coop-precip.html#NOTES)

Vocabulary List: Here.

Hypothesis: NA

Procedure:

  1. Get Data ­Access the following website:
    http://www.ncdc.noaa.gov/ol/climate/online/coop-precip.html.
    Now click on the tab labeled: Surface. Scroll down and click US Precipitation Data by State. Select the state you are interested in. Allow the data to collect for a minute or two, then click on Stop at the toolbar at the top of the screen. (There is a tremendous amount of data on these sites and the teacher should be advised that the browser’s memory could be surpassed.) Scroll through the data until you come to a location of interest. Highlight all the data for this location by holding down the left mouse button, and dragging the cursor down the screen. When all the data has been highlighted, click on the right mouse button then click on Copy. If the mouse does not have right/left buttons, highlight the information in the same manner. On the menu bar at the top of the screen select Copy, or use the keystroke <command c>.

  2. Parsing the Data. Open the spreadsheet Excel. Paste the data obtained from the website by clicking on the right mouse button and clicking on Paste. If there are not right/left mouse buttons, at the top of the computer screen select Paste, or use the keystroke <command v>. Highlight the data in column A by clicking and dragging in to the bottom of the column (this might be already done if all of column A is highlighted). Click Data at the top in the menu bar of the spreadsheet. Click Text to Columns. Click Fixed Width. Click Next. Click Next again. Finally, click Finish. The data now should be parsed. (Note: if you are unable to carry out these steps on your spreadsheet, most likely you do not have a full version of the software and will need to purchase it in order to satisfactorily complete this activity).

  3. Cleaning Up the Data: There are items that you do not need in the spreadsheet at this time. It is wise to eliminate them to avoid confusion. Also the data probably won’t have labels at the top of the column. To make labels, click on cell A1. Go to menu bar and select Insert. Choose Row. This gives a place for labels at the top of each column. Next, highlight the entire column containing the station number. Choose Edit from the menu bar, then choose Delete. A box will appear, select Entire Column. This eliminates the station number in each cell for the entire column. Note: if you make a mistake, immediately go to Edit and choose Undo Delete.

    Continue deleting columns until you have these items left: a column of years, a column for each month of the year (starts with Jan.) and a column for the total for the year. Label each cell above the column with the appropriate identifier. Click on the cell, type the label, then press Return. Example: cell A1 is labeled "year". See "Important Note" above for clarification of how data is listed.

  4. Filling in the Pieces- There might be several places displaying the number 9999. This means that the data is missing (solving this problem is one of the focuses of this activity). You will need to fill in the missing pieces of this puzzle using a technique called interpolation. This is done in one of two ways. The first way is to pick three other years that don’t have any data missing for the month in question (9999). Look at these three years and determine if the amounts for each month from all three sites are within 10% of each other. To do this, divide the middle-most number by 10. Then add and subtract this number to the original middle number. If the all three years are within this range then they are within 10%. The missing number can be determined by calculating the average of the three years and plugging it in for the missing piece.

    If the months are not within 10%, use this technique: Use the total precipitation listed on the spreadsheet (last column) for each of the three "good" years. The year we are unsure of will be referred to as year 1. Sum the precipitation for year 1 using a calculator (do NOT use the entries labeled 9999!). Take the sum of year 1 divided by the sum of year 2. Then take the sum of year 1 divided by the sum of year 3. Then take the sum of year 1 divided by the sum of year 4. There are now three ratios. Multiply each ratio by the amount of precipitation for the month. Example: precipitation for the specific month of year 2 times the number you got from: (sum of year 1 divided by the sum of year 2). You should now have three numbers for precipitation. Average these three numbers together and place the amount in the missing month. Do this for all missing pieces of information.

Results/Analysis:
The results for the missing data will be site specific and need to be calculated by the instructor before attempting this lesson.

Printed spreadsheet with missing data accurately corrected.

Conclusions:
Example: 'If an accurate technique is not used in determining the missing data, the data can be less than reliable.'

Assessments:

  • Determine if students correctly located the URL. (Assess this while student is actively online.)
  • Determine if students accurately collected data. (Specific data is linked to specific locations, teacher is expected to collect data before students attempt to do so.)
  • Have students print out the spreadsheet. If completed successfully, all data will be in proper columns with appropriate labels.
  • Determine if students accurately filled in missing pieces of data. (Teacher follows steps outlined above for specific data, and then compares answer to student values.)
  • How many inches of precipitation does the number 447 represent?
  • How are the numbers 9999 and 99999 similar? How are they different?
  • What does the phrase "parsing the data" mean?
  • What is the simplest means of accurately replacing missing data? How do you tell if using this method is appropriate?
  • Give a reason for replacing missing data.
  • Give a reason why data might be missing.

Integration: Computer skills, mathematics (averages, comparisons, percentages).

Reflections: Share your thoughts on this lesson - email the NTEN team.

Extensions:

  • Have students collect the precipitation for pre-determined months and locations. Compare data of similar months to that of other sites of interest to the student. This could be done over the course of a year, enabling the student to get a good indication of how precipitation near their specific town relates to other locations of interest.
  • Students could then compare the amount of precipitation to other factors, such as type of agriculture, economics, or specific social customs that depend on, or are influenced by amounts of precipitation.

References/Resources:
URLs used:

http://www.ncdc.noaa.gov/ol/climate/online/coop-precip.html
ESCI 517, Electronic Hydrology Course Materials

http://www.student.montana.edu/~uessc/esci517/syllabus.html
week one, and week four

Credits
Contributing Teacher: Bryan E. Mastin, Houghton Academy
NTEN Course: ESCI 517 Electronic Hydrology
Instructor: Steve Custer
Assistant Instructor: T.L. Buck Buchanan
Developing Team:
T.L. Buck Buchanan, Patti Harrison, Don Samuelson, John Usher, Don Wilson
Original HTML: Luke Clemens, Ryan Huddleston

Up to Top

   
www.scienceteacher.org