| |
|
Lesson Plan: Dealing
With Missing Data
Subject: Technology/Computer Science/Mathematics/Environmental
Science
Classtime: 100 minutes
Grade Level: 8-12
Materials/Technology:
- Students are expected to have basic computer skills and to be able
to follow directions.
- Students will need an Internet accessible computer with Excel spreadsheet
software loaded on the hard drive. The REAL version is required (not
the "light" version), otherwise, results will not be consistent
with this lesson.
- Students need to have a calculator.
- It is preferred, but not necessary, that computers have a mouse with
both right and left click buttons.
Safety, Handling, Disposal:
Internet access should be monitored closely to prevent viewing of inappropriate
sites.
Learner Outcomes:
Students should be able to:
- Log onto, and collect data from, a specific site on the Internet.
- Paste collected data in a spreadsheet.
- Parse collected data in a spreadsheet.
- Mathematically replace missing raw data.
Problem/Purpose:
Getting data from the Internet, parsing the data in a spreadsheet, and
calculating the missing values are important tasks. This lesson covers
the correct procedure for collecting data from a specific Internet site,
pasting the data in a spreadsheet program, parsing the data collected,
and filling in missing data.
Background/Inquiry:
In the scientific community the use of spreadsheets is critical. Scientists
use spreadsheets to average, graph, compare, and analyze data to determine
relationships not easily recognized by studying a string of numbers.
The Internet has a tremendous amount of government data available for
use by anyone with a computer, modem, and Internet access. This data must
be downloaded and made useable by the individual interested in the data.
"Excel", a Microsoft spreadsheet program, provides an excellent
tool for doing this.
Once the data has been downloaded it must be separated into columns so
the spreadsheet program can accurately use the data. This is known as
"parsing".
Sometimes data is incomplete for a variety of reasons (equipment failure,
human error, etc.). This makes it necessary to manipulate the data and
"fill in" the missing pieces in a mathematically appropriate
manner. After the missing pieces have been filled in, it becomes possible
to accurately use the data.
This lesson is site specific but the process is transferable.
The Internet site http://www.ncdc.noaa.gov/ol/climate/online/coop-precip.html
will provide the data for this exercise. The following Important Notes
come from this site, and describe the arrangement of data within the site:
Important Notes
This system contains historical monthly precipitation
data for all U.S. cooperative and National Weather Service stations.
The periods of record vary by station, with some stations having data
back to about 1900, although most begin around 1948. The data files
are complete through 1997. The directory has a data file for each state,
along with one file for several stations outside the U.S., such as Guam
and Puerto Rico (filename 'others.txt'). There are currently over 8000
NWS and cooperative weather stations active and included in this directory,
though most of the entries are for inactive stations.
The files consist entirely of ASCII character data. A header record
begins each file with the station number, station name, state abbreviation,
year, 12 months (Jan - Dec), and 'annual' appearing from left to right.
The monthly and annual precipitation amounts are in inches and hundredths
(e.g., 1122 = 11.22 inches, 522 = 5.22 inches, 55 = .55 inches). A value
of '9999' indicates missing or incomplete data for that month, with
'99999' indicating missing for the yearly total. The monthly total is
only presented when all data has been recorded and validated for that
month. Likewise, the annual total is only presented when every monthly
total for the year is available. This information is taken from NCDC
digital database TD3220, which also contains additional elements such
as temperature and snowfall. These additional elements are available
off line, on tape or diskette. (http://www.ncdc.noaa.gov/ol/climate/online/coop-precip.html#NOTES)
Vocabulary List: Here.
Hypothesis: NA
Procedure:
-
Get Data Access the following website:
http://www.ncdc.noaa.gov/ol/climate/online/coop-precip.html.
Now click on the tab labeled: Surface. Scroll down and click US Precipitation
Data by State. Select the state you are interested in. Allow the data
to collect for a minute or two, then click on Stop at the toolbar
at the top of the screen. (There is a tremendous amount of data on
these sites and the teacher should be advised that the browser’s memory
could be surpassed.) Scroll through the data until you come to a location
of interest. Highlight all the data for this location by holding down
the left mouse button, and dragging the cursor down the screen. When
all the data has been highlighted, click on the right mouse button
then click on Copy. If the mouse does not have right/left buttons,
highlight the information in the same manner. On the menu bar at the
top of the screen select Copy, or use the keystroke <command c>.
- Parsing the Data. Open the spreadsheet Excel. Paste the data obtained
from the website by clicking on the right mouse button and clicking
on Paste. If there are not right/left mouse buttons, at the top of
the computer screen select Paste, or use the keystroke <command
v>. Highlight the data in column A by clicking and dragging in
to the bottom of the column (this might be already done if all of
column A is highlighted). Click Data at the top in the menu bar of
the spreadsheet. Click Text to Columns. Click Fixed Width. Click Next.
Click Next again. Finally, click Finish. The data now should be parsed.
(Note: if you are unable to carry out these steps on your spreadsheet,
most likely you do not have a full version of the software and will
need to purchase it in order to satisfactorily complete this activity).
- Cleaning Up the Data: There are items that you do not need in the
spreadsheet at this time. It is wise to eliminate them to avoid confusion.
Also the data probably won’t have labels at the top of the column.
To make labels, click on cell A1. Go to menu bar and select Insert.
Choose Row. This gives a place for labels at the top of each column.
Next, highlight the entire column containing the station number. Choose
Edit from the menu bar, then choose Delete. A box will appear, select
Entire Column. This eliminates the station number in each cell for
the entire column. Note: if you make a mistake, immediately go to
Edit and choose Undo Delete.
Continue deleting columns until you have these items left: a column
of years, a column for each month of the year (starts with Jan.) and
a column for the total for the year. Label each cell above the column
with the appropriate identifier. Click on the cell, type the label,
then press Return. Example: cell A1 is labeled "year". See
"Important Note" above for clarification of how data is
listed.
-
Filling in the Pieces- There might be several places
displaying the number 9999. This means that the data is missing
(solving this problem is one of the focuses of this activity). You
will need to fill in the missing pieces of this puzzle using a technique
called interpolation. This is done in one of two ways. The first
way is to pick three other years that don’t have any data missing
for the month in question (9999). Look at these three years and
determine if the amounts for each month from all three sites are
within 10% of each other. To do this, divide the middle-most number
by 10. Then add and subtract this number to the original middle
number. If the all three years are within this range then they are
within 10%. The missing number can be determined by calculating
the average of the three years and plugging it in for the missing
piece.
If the months are not within 10%, use this technique:
Use the total precipitation listed on the spreadsheet (last column)
for each of the three "good" years. The year we are unsure
of will be referred to as year 1. Sum the precipitation for year
1 using a calculator (do NOT use the entries labeled 9999!). Take
the sum of year 1 divided by the sum of year 2. Then take the sum
of year 1 divided by the sum of year 3. Then take the sum of year
1 divided by the sum of year 4. There are now three ratios. Multiply
each ratio by the amount of precipitation for the month. Example:
precipitation for the specific month of year 2 times the number
you got from: (sum of year 1 divided by the sum of year 2). You
should now have three numbers for precipitation. Average these three
numbers together and place the amount in the missing month. Do this
for all missing pieces of information.
Results/Analysis:
The results for the missing data will be site specific and need to be
calculated by the instructor before attempting this lesson.
Printed spreadsheet with missing data accurately corrected.
Conclusions:
Example: 'If an accurate technique is not used in determining the missing
data, the data can be less than reliable.'
Assessments:
- Determine if students correctly located the URL. (Assess this
while student is actively online.)
- Determine if students accurately collected data. (Specific data
is linked to specific locations, teacher is expected to collect
data before students attempt to do so.)
- Have students print out the spreadsheet. If completed successfully,
all data will be in proper columns with appropriate labels.
- Determine if students accurately filled in missing pieces of
data. (Teacher follows steps outlined above for specific data,
and then compares answer to student values.)
- How many inches of precipitation does the number 447 represent?
- How are the numbers 9999 and 99999 similar? How are they different?
- What does the phrase "parsing the data" mean?
- What is the simplest means of accurately replacing missing data?
How do you tell if using this method is appropriate?
- Give a reason for replacing missing data.
- Give a reason why data might be missing.
Integration: Computer skills, mathematics
(averages, comparisons, percentages).
Reflections: Share your thoughts on this
lesson - email the NTEN
team.
Extensions:
- Have students collect the precipitation for pre-determined months
and locations. Compare data of similar months to that of other sites
of interest to the student. This could be done over the course of a
year, enabling the student to get a good indication of how precipitation
near their specific town relates to other locations of interest.
- Students could then compare the amount of precipitation to other
factors, such as type of agriculture, economics, or specific social
customs that depend on, or are influenced by amounts of precipitation.
References/Resources:
URLs used:
http://www.ncdc.noaa.gov/ol/climate/online/coop-precip.html
ESCI 517, Electronic Hydrology Course Materials
http://www.student.montana.edu/~uessc/esci517/syllabus.html
week one, and week four

Credits
Contributing Teacher: Bryan E. Mastin,
Houghton Academy
NTEN Course: ESCI 517 Electronic Hydrology
Instructor: Steve Custer
Assistant Instructor: T.L. Buck Buchanan
Developing Team:
T.L. Buck Buchanan, Patti Harrison, Don Samuelson, John Usher, Don Wilson
Original HTML: Luke Clemens, Ryan Huddleston
Copyright © 1998 - 2002, National Teachers
Enhancement Network
Comments: pattih@montana.edu
Phone: 1-800-282-6062
Last Updated: 30 March 2000
at the Burns Telecommunications
Center
on the campus of Montana State University-Bozeman


Up to Top
|
|
|
 |