Water: Coastal Zone Act Reauthorization Amendments

II. Techniques for Assessing Water Quality and for Estimating Pollution Loads

Water quality monitoring is the most direct and defensible tool available to evaluate water quality and its response to management and other factors (Coffey and Smolen, 1990). This section describes monitoring methods that can be used to measure changes in pollutant loads and water quality. Because monitoring needs and environmental conditions vary widely throughout the coastal zone, it is not possible to specify detailed monitoring plans that apply to all areas within the zone. The information in this section is intended merely to guide the development of monitoring efforts at the State and local levels.

This section begins with a brief discussion of the scope and nature of nonpoint source problems, followed by a discussion of monitoring objectives as they relate to section 6217. A lengthy discussion of monitoring approaches is next, with a focus on understanding the watershed to be studied, appropriate experimental designs, sample size and frequency, site locations, parameter selection, sampling methods, and quality assurance and quality control. The intent of this discussion is to provide the reader with basic information essential to the development of effective, tailored monitoring programs that will provide the necessary data for use in statistical tests that are appropriate for evaluating the success of management measures in reducing pollutant loads and improving water quality.

After a brief discussion of data needs, an overview of statistical considerations is presented. Variability and uncertainty are described first, followed by a lengthy overview of sampling and sampling designs. This discussion is at a greater level of detail than others in the section to emphasize the importance of adequate sampling within the framework of a sound experimental design. Hypothesis testing is described next, including some examples of hypotheses that may be appropriate for section 6217 monitoring efforts. An overview of data analysis techniques is given at the end of the section.

A. Nature and Scope of Nonpoint Source Problems

Nonpoint sources may generate both conventional and toxic pollutants, just as point sources do. Although nonpoint sources may contribute many of the same kinds of pollutants, these pollutants are generated in different volumes, combinations, and concentrations. Pollutants from nonpoint sources are mobilized primarily during storm events or snowmelt, but baseflow contributions can be the major source of nonpoint source contaminants in some systems. Thus, knowledge of the hydrology of a system is critical to the design of successful monitoring programs.

Nonpoint source problems are not just reflected in the chemistry of a water resource. Instead, nonpoint source problems are often more acutely manifested in the biology and habitat of the aquatic system. Such impacts include the destruction of spawning areas, impairments to the habitat for shellfish, changes to aquatic community structure, and fish mortality. Thus, any given nonpoint source monitoring program may have to include a combination of chemical, physical, and biological components to be effective.

B. Monitoring Objectives

Monitoring is usually performed in support of larger efforts such as nonpoint source pollution control programs within coastal watersheds. As such, monitoring objectives are generally established in a way that contributes toward achieving the broader program objectives. For example, program objectives may include restoring an impaired use or protecting or improving the ecological condition of a water resource. Supporting monitoring objectives, then, might include assessing trends in use support or in key biological parameters.

The following discussion identifies the overall monitoring objectives of section 6217 and gives some examples of specific objectives that may be developed at the State or local level in support of those overall objectives. Clearly, due to the prohibitive expense of monitoring the effectiveness of every management measure applied in the coastal zone, States will need to develop a strategy for using limited monitoring information to address the broad questions regarding the effectiveness of section 6217 implementation. A combination of watershed monitoring to track the cumulative benefits of systems of management measures and demonstrations of selected management measures of key importance in the State may be one way in which the overall section 6217 monitoring objectives can be met within the constraints imposed by limited State monitoring budgets.

1. Section 6217 Objectives

The overall management objective of section 6217 is to develop and implement management measures for nonpoint source pollution to restore and protect coastal waters. The principal monitoring objective under section 6217(g) is to assess over time the success of the management measures in reducing pollution loads and improving water quality. A careful reading of this monitoring objective reveals that there are two subobjectives: (1) to assess changes in pollution loads over time and (2) to assess changes in water quality over time.

A pollutant load is determined by multiplying the total runoff volume by the average concentration of the pollutant in the runoff. Loads are typically estimated only for chemical and some physical (e.g., total suspended solids) parameters. Water quality, however, is determined on the basis of the chemical, physical, and biological conditions of the water resource. Section 6217(g), therefore, calls for a description of pollutant load estimation techniques for chemical and physical parameters, plus a description of techniques to assess water quality on the basis of chemical, physical, and biological conditions. This section focuses on those needs.
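The load calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names, the units, and the sample values are all assumptions made for the example, and the flow-weighted mean is only one common way to obtain the "average concentration" for a runoff event.

```python
def flow_weighted_mean(concentrations_mg_per_l, flows_m3_per_s):
    """Flow-weighted mean concentration: sum(c_i * q_i) / sum(q_i)."""
    weighted = sum(c * q for c, q in zip(concentrations_mg_per_l, flows_m3_per_s))
    return weighted / sum(flows_m3_per_s)


def pollutant_load_kg(runoff_volume_m3, mean_concentration_mg_per_l):
    """Load = total runoff volume x average concentration."""
    # 1 mg/L equals 1 g/m^3, so volume (m^3) x concentration (mg/L) is in grams.
    grams = runoff_volume_m3 * mean_concentration_mg_per_l
    return grams / 1000.0  # grams to kilograms


# Hypothetical storm event (illustrative numbers, not measured data).
tss_mg_per_l = [40.0, 120.0, 80.0]  # samples on the rising limb, peak, recession
flow_m3_per_s = [1.0, 3.0, 1.0]     # discharge at each sampling time
runoff_volume_m3 = 12_000.0         # total event runoff volume

mean_tss = flow_weighted_mean(tss_mg_per_l, flow_m3_per_s)
load_kg = pollutant_load_kg(runoff_volume_m3, mean_tss)
print(f"Flow-weighted mean TSS: {mean_tss:.1f} mg/L, load: {load_kg:.1f} kg")
```

Weighting by flow keeps a high-concentration sample taken at peak discharge from being averaged equally with samples taken at low flow, which would otherwise understate the load.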

2. Formulating Monitoring Objectives

A monitoring objective should be narrowly and clearly defined to address a specific problem at an appropriate level of detail (Coffey and Smolen, 1990). Ideally, the monitoring objective specifies the primary parameter(s), location of monitoring (and perhaps the timing), the degree of causality or other relationship, and the anticipated result of the management action. The magnitude of the change may also be expressed in the objective. Example monitoring objectives include:

  • To determine the change in trends in the total nitrogen concentration in Beautiful Sound due to the implementation of nutrient management on cropland in all tributary watersheds.
  • To determine the sediment removal efficiency of an urban detention basin in New City.
  • To evaluate the effects of improved marina management on metals loadings from the repair and maintenance areas of Stellar Marina.
  • To assess the change in weekly mean total suspended solids concentrations due to forestry harvest activities in Clean River.

C. Monitoring Approaches

1. General

a. Types of Monitoring

The monitoring program design is the framework for sampling, data analysis, and the interpretation of results (Coffey and Smolen, 1990). MacDonald (1991) identifies seven types of monitoring:

  1. Trend monitoring;
  2. Baseline monitoring;
  3. Implementation monitoring;
  4. Effectiveness monitoring;
  5. Project monitoring;
  6. Validation monitoring; and
  7. Compliance monitoring.

Trend, baseline, implementation, effectiveness, and project monitoring all relate to the monitoring objectives of section 6217. These types of monitoring, in fact, are not mutually exclusive. The distinction between effectiveness monitoring and project monitoring, for example, is often simply one of scale, with effectiveness monitoring primarily directed at individual practices and project monitoring directed at entire sets of practices or activities implemented over a larger area. Since one cannot evaluate the effectiveness of a project or management measure (i.e., achievement of the desired effect) without knowing the status of implementation, implementation monitoring is an essential element of both project and effectiveness monitoring. In addition, a test for trend is typically included in the evaluation of projects and management measures, and baseline monitoring is performed prior to the implementation of pollution controls.

Meals (1991a) discussed five major points to consider in developing a monitoring system that would provide a suitable data base for watershed trend detection: (1) understand the system you want to monitor, (2) design the monitoring system to meet objectives, (3) pay attention to details at the beginning, (4) monitor source activities, and (5) build in feedback loops. These five points apply equally to both load estimation and water quality assessment monitoring efforts.

b. Section 6217 Monitoring Needs

The basic monitoring objective for section 6217 is to assess over time the success of the measures in reducing pollution loads and improving water quality. This objective would seem to indicate a need for establishing cause-effect relationships between management measure implementation and water quality. Although desirable, monitoring to establish such cause-effect relationships is typically beyond the scope of affordable program monitoring activities.

Mosteller and Tukey (1977) identified four criteria that must be met to show cause and effect: association, consistency, responsiveness, and a mechanism.

  • Association is shown by demonstrating a relationship between two parameters (e.g., a correlation between the extent of management measure implementation and the level of pollutant loading).
  • Consistency can be confirmed by observation only and implies that the association holds in different populations (e.g., management measures were implemented in several areas and pollutant loading was reduced, depending on the effect of treatment, in each case).
  • Responsiveness can be confirmed by an experiment and is shown when the dependent variable (e.g., pollutant loading) changes predictably in response to changes in the independent variable (e.g., extent of management measure implementation).
  • A mechanism is a plausible step-by-step explanation of the statistical relationship. For example, conservation tillage reduced the edge-of-field losses of sediment, thereby removing a known fraction of pollutant source from the stream or lake. The result was decreased suspended sediment concentration in the water column.
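The first criterion, association, is the one most readily checked with routine monitoring data. The sketch below computes a Pearson correlation coefficient between the extent of implementation and pollutant loading. The data values and variable names are hypothetical, and a strong correlation by itself satisfies only the association criterion, not the other three.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical annual values for one watershed (not measured data).
implementation_pct = [0, 10, 25, 40, 60]   # percent of cropland under the measure
annual_load_t = [100, 95, 80, 70, 55]      # annual pollutant load, metric tons

r = pearson_r(implementation_pct, annual_load_t)
print(f"Correlation between implementation and load: {r:.3f}")
```

A value of r near -1 indicates that loads fell as implementation increased. Other factors that changed over the same period (for example, rainfall) could produce the same pattern, which is why consistency, responsiveness, and a mechanism are also required to show cause and effect.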

Clearly, the cost of monitoring needed to establish cause-effect relationships throughout the coastal zone far exceeds available resources. It may be suitable, however, to document associations between management measure implementation and trends in pollutant loads or water quality and then account for such associations with a general description of the primary mechanisms that are believed to come into play.

c. Scale, Local Conditions, and Variability

There are several approaches that can be taken to assess the effectiveness of measures in reducing loads and improving water quality. There are also several levels of scale that could be selected: individual practices, individual measures, field scale, watershed scale, basin scale, regional scale, etc. With any given monitoring objective, the specific monitoring approach to use at any specific site is a function of the local conditions (e.g., geography, climate, water resource type) and the type of management measures implemented.

The detection and estimation of trends is complicated by problems associated with the characteristics of pollution data (Gilbert, 1987). Physical, chemical, and biological parameters in the receiving water may undergo extreme changes without the influence of human activity. Understanding and monitoring the factors responsible for variability in a local system are essential for detecting the improvements expected from the implementation of management measures.

Simple point estimates taken before and after treatment will not confirm an effect if the natural variability is typically greater than the changes due to treatment (Coffey and Smolen, 1990). Therefore, knowledge of the variability and the distribution of the parameter is important for statistical testing. Greater variability requires a larger change to imply that the observed change is not due solely to random events (Spooner et al., 1987b). Examination of a historical data set can help to identify the magnitude of natural variability and possible sources.
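The effect of variability on the ability to confirm a change can be illustrated with a simple two-sample comparison. The sketch below computes a Welch t statistic (the difference in means divided by its standard error) for two hypothetical data sets that show the same 5-unit reduction but differ in natural variability. The data and function name are assumptions for illustration only. Real monitoring data are often skewed or autocorrelated and may call for other tests.

```python
import math
import statistics


def welch_t(before, after):
    """Difference in means divided by its standard error (Welch t statistic)."""
    se = math.sqrt(
        statistics.variance(before) / len(before)
        + statistics.variance(after) / len(after)
    )
    return (statistics.mean(before) - statistics.mean(after)) / se


# Hypothetical concentrations (mg/L); both cases show a mean drop of 5 mg/L.
before_low_var = [50, 52, 48, 51, 49]
after_low_var = [45, 47, 43, 46, 44]

before_high_var = [50, 70, 30, 60, 40]
after_high_var = [45, 65, 25, 55, 35]

t_low = welch_t(before_low_var, after_low_var)
t_high = welch_t(before_high_var, after_high_var)
print(f"Low variability:  t = {t_low:.2f}")
print(f"High variability: t = {t_high:.2f}")
```

The same reduction yields a much smaller test statistic when natural variability is high, so either a larger change or more samples would be needed before the change could be distinguished from random events.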

The impact of management actions may not be detectable as a change in a mean value but rather as a change in variability (Coffey and Smolen, 1990). Platts and Nelson (1988) found that a carefully designed study was required to isolate the large natural fluctuations in trout populations to distinguish the effects of land use management. They assumed that normal fluctuation patterns were similar between the control and the treatment area and that a treatment-induced effect could be distinguished as a deviation from the historical pattern.

Meals (1991a) calls for the collection and evaluation of existing data as the first step in a monitoring effort, recognizing that additional background data may be needed to identify hot spots or fill information gaps. The results of such initial efforts should include established stage-discharge ratings and an understanding of patterns not associated with the pollution control effort.

2. Understanding the System to Be Monitored

a. The Water Resource

Options for tracking water quality vary with the type of water resource. For example, a monitoring program for ephemeral streams can be different from that for perennial streams or large rivers. Lakes, wetlands, riparian zones, estuaries, and near-shore coastal waters all present different monitoring considerations. Whereas upstream-downstream designs work on rivers and streams, they are generally less effective on natural lakes where linear flow is not so prevalent. Likewise, estuaries present difficulties in monitoring loads because of the shifting flows and changing salinity caused by the tides. A successful monitoring program recognizes the unique features of the water resources involved and is structured to either adapt to those features or avoid them.

Streams. Freshwater streams can be classified on the basis of flow attributes as intermittent or perennial streams. Intermittent streams do not flow at all times and serve as conveyance systems for runoff. Perennial streams always flow and usually have significant inputs from ground water or interflow.

For intermittent streams, seasonal variability is a very significant factor in determining pollutant loads and water quality. During some periods sampling may be impossible due to no flow. Seasonal flow variability in perennial streams can be caused by seasonal patterns in precipitation or snowmelt, reservoir discharges, or irrigation practices.

For many streams the greatest concentrations of suspended sediment and other pollutants occur during spring runoff or snowmelt periods. Concentrations of both particulate and soluble chemical parameters have been shown to vary throughout the course of a rainfall event in many studies across the Nation. This short-term variability should be considered in developing monitoring programs for flowing (lotic) waterbodies.

Spatial variability is largely lateral for both intermittent and perennial streams. Vertical variability does exist, however, and can be very important in both stream types (e.g., during runoff events, in tidal waters, and in deep, slow-moving streams). Intake depth is often a key factor in stream sampling. For example, slow-moving, larger streams may show considerable water quality variability with depth, particularly for parameters such as suspended solids, dissolved oxygen, and algal productivity. Suspended sediment samples must be taken with an understanding of the vertical distribution of both sediment concentration and flow velocity (Brakensiek et al., 1979). When sampling bed sediment or monitoring biological parameters, it is important to recognize the potential for significant lateral and vertical variation in the toxicity and contaminant levels of bed sediments (USEPA, 1987).

Lakes. Lakes can be categorized in several ways, but a useful grouping for monitoring guidance is related to the extent of vertical and lateral mixing of the waterbody. Therefore, lakes are considered to be either mixed or stratified for the purpose of this guidance. Mixed lakes are those lakes in which water quality (as determined by measurement of the parameters and attributes of interest) is homogeneous throughout, and stratified lakes are considered to be those lakes that have lateral or vertical water quality differentials in the lake parameters and attributes of interest. Totally mixed lakes, if they exist, are certainly few in number, but it may be useful to perform monitoring in selected homogeneous portions of stratified lakes to simplify data interpretation. Similarly, for lakes that exhibit significant seasonal mixing, it may be beneficial to monitor during a time period in which they are mixed. For some monitoring objectives, however, it may be best to monitor during periods of peak stratification.

Temporal variability concerns are similar for mixed and stratified lakes. Seasonal changes are often obvious, but should not be assumed to be similar for all lakes or even the same for different parts of any individual lake. Due to the importance of factors such as precipitation characteristics, climate, lake basin morphology, and hydraulic retention characteristics, seasonal variability should be at least qualitatively assessed before any lake monitoring program is initiated.

Short-term variability is also an inherent characteristic of most still (lentic) waterbodies. Parameters such as pH, dissolved oxygen, and temperature can vary considerably over the course of a day. Monitoring programs targeted toward biological parameters should be structured to account for this short-term variability. It is often the case that small lakes and reservoirs respond rapidly to runoff events. This factor can be very important in cases where lake water quality will be correlated to land treatment activities or stream water quality.

In stratified lakes spatial variability can be lateral or vertical. The classic stratified lake is one in which there is an epilimnion and a hypolimnion (Wetzel, 1975). Water quality can vary considerably between the two strata, so sampling depth is an important consideration when monitoring vertically stratified lakes.

Lateral variability is probably as common as vertical variability, particularly in lakes and ponds receiving inflow of varying quality. Figure 8-1 illustrates the types of factors that contribute to lateral variability in lake water quality. In reservoir systems, storm plumes can cause significant lateral variability.

Davenport and Kelly (1984) explained the lateral variability in chlorophyll a concentrations in an Illinois lake based on water depth and the time period that phytoplankters spend in the photic zone. A horizontal gradient of sediment, nutrient, and chlorophyll a concentrations in St. Albans Bay, Vermont, was related to mixing between Lake Champlain and the Bay (Clausen, 1985). It is important to note that there frequently exists significant lateral and vertical variation in the toxicity and contaminant levels of bed sediments (USEPA, 1987).

Despite the distinction made between mixed and stratified lakes, there is considerable gray area between these groups. For example, thermally stratified lakes may be assumed to be mixed during periods of overturn, and laterally stratified lakes can sometimes be treated as if the different lateral segments are sublakes. In any case, it is important that the monitoring team knows what parcel of water is being sampled when the program is implemented. It would be inappropriate, for example, to assign the attributes of a surface sample to the hypolimnion of a stratified lake due to the differences in temperature and other parameters between the upper and lower waters.

Estuaries. Estuaries can be very complex systems, particularly large ones such as the Chesapeake Bay. Estuaries exhibit temporal and spatial variability just as streams and lakes do. Physically, the major differences between estuaries and fresh waterbodies are related to the mixing of fresh water with salt water and the influence of tides. These factors increase the complexity of spatial and temporal variability within an estuary.

Short-term variability in estuaries is related directly to the tidal cycles, which can have an effect on both the mixing of the fresh and saline waters and the position of the freshwater-saltwater interface (USEPA, 1982a). The same considerations made for lakes regarding short-term variability of parameters such as temperature, dissolved oxygen, and pH should also be made for estuaries.

Temperature profiles such as those found in stratified lakes can also change with season in estuaries. The resulting circulation dynamics must be considered when developing monitoring programs. The effects of season on the quantity of freshwater runoff to an estuary can be profound. In the Chesapeake Bay, for example, salinity is generally lower in the spring and higher in the fall due to the changes in freshwater runoff from such sources as snowmelt runoff and rainfall (USEPA, 1982a).

Spatial variability in estuaries has both significant vertical and lateral components. The vertical variability is related to both temperature and chemical differentials. In the Chesapeake Bay thermal stratification occurs during the summer, and chemical stratification occurs at all times, but in different areas at different times (USEPA, 1982a). Chemical stratification can be the result of the saltwater wedge flowing into and under the freshwater outflow or the accumulation or channeling of freshwater and saltwater flows to opposite shores of the estuary. The latter situation can be caused by a combination of tributary location, the earth's rotation, and the barometric pressure. In addition, lateral variability in salinity can be caused by different levels of mixing between saltwater and freshwater inputs. As noted for streams and lakes, the lateral and vertical variation in the toxicity and contaminant levels of bed sediments should be considered (USEPA, 1987).

Coastal Waters. Researchers and government agencies have little experience in evaluating the effectiveness of nonpoint source pollution control efforts through the monitoring of near-shore and off-shore coastal waters. Our understanding of the factors to consider when performing such monitoring is therefore very limited.

As for other waterbody types, it is important to understand the hydrology, chemistry, and biology of the system in order to develop an effective monitoring program. Of particular importance is the ability to identify discrete populations to sample from. For trend analysis it is essential that the researcher is able to track over time the conditions of a clearly identifiable segment or unit of coastal water. This may be accomplished by monitoring a semienclosed near-shore embayment or similar system. Knowledge of salinity and circulation patterns should be useful in identifying such areas.

Secondly, monitoring should be focused on those segments or units of coastal water for which there is a reasonable likelihood that changes in water quality will result from the implementation of management measures. Segment size, circulation patterns, and freshwater inflows should be considered when estimating the chances for such water quality improvements.

Near-shore coastal waters may exhibit salinity gradients similar to those of estuaries due to the mixing of fresh water with salt water. Currents and circulation patterns can create temperature gradients as well. Farther from shore, salinity gradients are less likely, but gradients in temperature may occur. In addition, vertical gradients in temperature and light may be significant. These and other biological, chemical, and physical factors should be considered in the development of monitoring programs for coastal waters.

b. The Management Measures to Be Implemented

An integral part of the system to be monitored is the set of management measures to be implemented. Management measures can generally be classified with respect to their modes of control: (1) source reduction, (2) delivery reduction, or (3) the reduction of direct impacts. For example, source-reduction measures may include nutrient management, pesticide management, and marine pump-out facilities. These measures all rely on the prevention of nonpoint source pollution; trapping and treatment mechanisms are not relied upon for control. Delivery-reduction measures include those that rely on detention basins, filter strips, constructed wetlands, and similar practices for trapping or treatment prior to release or discharge to receiving waters. Measures that reduce direct impacts include wetland and riparian area protection, habitat protection, the preservation of natural stream channel characteristics, the provision of fish passage, and the provision of suitable dissolved oxygen levels below dams.

Delivery Reduction. Delivery-reduction measures lend themselves to inflow-outflow, or process, monitoring to estimate the effectiveness in reducing loads. The simple experimental approach is to take samples of inflow and outflow at appropriate time intervals to measure differences in the water quality between the two points. An example is the analysis of total suspended solids (TSS) concentrations at the inflow and outflow of a sediment retention basin to determine the percentage of TSS removed.
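The retention basin example can be sketched as a percent-removal calculation on paired inflow and outflow samples. The concentrations and function name below are hypothetical. The sketch is concentration-based, which assumes that inflow and outflow volumes are roughly equal; where they are not, the same calculation would be applied to loads rather than concentrations.

```python
def percent_removed(inflow, outflow):
    """Percent reduction from inflow to outflow: (in - out) / in x 100."""
    return (inflow - outflow) / inflow * 100.0


# Hypothetical paired TSS concentrations (mg/L) for three storm events.
inflow_tss = [120.0, 200.0, 80.0]
outflow_tss = [30.0, 60.0, 20.0]

# Removal efficiency for each event.
per_event = [percent_removed(i, o) for i, o in zip(inflow_tss, outflow_tss)]

# Overall efficiency based on the summed concentrations across events.
overall = percent_removed(sum(inflow_tss), sum(outflow_tss))

for event, pct in enumerate(per_event, start=1):
    print(f"Event {event}: {pct:.1f}% TSS removed")
print(f"Overall: {overall:.1f}% TSS removed")
```

Reporting both per-event and overall values helps show whether performance is consistent across events or dominated by one or two large storms.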

Source Reduction. Source-reduction measures generally cannot be monitored using a process design because there are usually no discrete inflow and outflow points. The effectiveness of these measures will generally be determined by applying approaches such as paired-watershed studies and upstream-downstream studies.

Reduction of Direct Impacts. The effectiveness of measures intended to prevent direct impacts cannot be determined through the monitoring of loads since pollutant loads are not generated. Instead, monitoring might include reference site approaches where the conditions (e.g., habitat or macroinvertebrates) at the affected (or potentially affected) area are compared over time (as management measures are implemented) versus conditions at a representative unimpacted site or sites nearby (Ohio EPA, 1988). This approach can be taken to the point of being a paired-watershed study if the monitoring timing and protocols are the same at the impacted and reference sites.

Combinations of Management Measures. Management measures are systems of practices, technologies, processes, siting criteria, operating methods, or other alternatives. Pollution control programs generally consist of systems of management measures applied over well-defined geographic areas. Combinations of the three types of measures described above are likely to be found in any given area to be monitored. Monitoring programs, therefore, must often be directed at measuring the cumulative effectiveness of a range of different measures applied in different areas at different times within a specified geographic area. Under these conditions, the monitoring approaches for source-reduction and direct-impact-reduction measures are typically used, while process monitoring is not generally used other than to track the effectiveness of specific delivery-reduction measures implemented in the area.

c. Point Sources and Other Significant Activities

There is often a need to isolate the effects of other activities that occur independently of the planned implementation of management measures but that have an effect on the measured parameters. For example, an upgrade from secondary to tertiary treatment at a wastewater treatment plant in a watershed could have a major effect on the measured nitrogen levels. An effective monitoring program would isolate the effects of changes in the point source contributions by measuring the discharge from these sources over time.

3. Experimental Design

a. Types of Experimental Designs

EPA has prescribed monitoring designs for use in watershed projects funded under section 319 of the Clean Water Act (USEPA, 1991b). The objective in promoting these designs is to document changes in water quality that can be related to the implementation of nonpoint source control measures in selected watersheds. The designs recommended by EPA are paired-watershed designs and upstream-downstream designs. Single downstream station designs are not recommended by EPA for section 319 watershed projects (USEPA, 1991b).

Monitoring before implementation is usually required to detect a trend or show causality (Coffey and Smolen, 1990). Two years of pre-implementation monitoring are typically needed to establish an adequate baseline. Less time may be needed for studies at the management measure or edge-of-field scale, when hydrologic variability is known to be less than that of typical agricultural systems, or when a paired-watershed design is used.

Paired-Watershed Design. In the paired-watershed design there is one watershed where the level of implementation (ideally) does not change (the control watershed) and a second watershed where implementation occurs (the study watershed). This design has been shown in agricultural nonpoint source studies to be the most powerful study design for demonstrating the effectiveness of nonpoint source control practice implementation (Spooner et al., 1985). Paired-watershed designs have a long history of application in forest hydrology studies. The paired-watershed design must be implemented properly, however, to generate useful data sets. Some of the considerations to be made in designing and implementing paired-watershed studies are described below.

In selecting watershed pairs, the watersheds should be as similar as possible in size, shape, aspect, slope, elevation, soil type, climate, and vegetative cover (Striffler, 1965). The general procedure for paired-watershed studies is to monitor the watersheds long enough to establish a statistical relationship between them. A correlation should be found between the values of the monitored parameters for the two watersheds. For example, the total nitrogen values in the control watershed should be correlated with the total nitrogen values in the study watershed. A pair of watersheds may be considered sufficiently calibrated when a parameter for the control watershed can be used to predict the corresponding value for the study watershed (or vice versa) within an acceptable margin of error.
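The calibration step can be sketched as a simple linear regression of study-watershed values on control-watershed values. The data, function name, and the choice of R-squared and residual standard error as calibration diagnostics are assumptions for illustration. What counts as "an acceptable margin of error" is a judgment that depends on the size of the change the project expects to detect.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical paired total nitrogen values (mg/L) from the calibration period.
control_tn = [1.0, 2.0, 3.0, 4.0, 5.0]
study_tn = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(control_tn, study_tn)
predicted = [intercept + slope * c for c in control_tn]

# Diagnostics: how well does the control watershed predict the study watershed?
mean_study = sum(study_tn) / len(study_tn)
sse = sum((obs - pred) ** 2 for obs, pred in zip(study_tn, predicted))
sst = sum((obs - mean_study) ** 2 for obs in study_tn)
r_squared = 1.0 - sse / sst
residual_se = math.sqrt(sse / (len(study_tn) - 2))

print(f"study = {intercept:.2f} + {slope:.2f} * control")
print(f"R^2 = {r_squared:.4f}, residual standard error = {residual_se:.3f} mg/L")
```

A small residual standard error relative to the expected treatment effect suggests that the pair is sufficiently calibrated. A large one suggests that more calibration data are needed or that the watersheds are poorly matched.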

It is important to note that the calibration period should cover all or a significant portion of the range of conditions for each of the major water quality determinants in the two watersheds. For example, the full range of hydrologic conditions should be covered (or nearly covered) during the calibration period. This may be problematic in areas where rainfall and snowmelt are highly variable from year to year or in areas subject to extended wet periods or drought. Calibration during a dry year is unlikely to be adequate for establishing the relationship between the two watersheds, particularly if subsequent years include both wet and dry periods.

Similarly, some agricultural areas of the country use long-term, multiple-crop rotations. The calibration period should cover not only the range of hydrologic conditions but also the range of cropping patterns that can reasonably be expected to have an influence on the measured water quality parameters. This is not to say that the calibration period should take 5 to 10 years, but rather that States should use careful judgment in determining when the calibration period can be safely ended.

After calibration, the study watershed receives implementation of management measures, and monitoring is continued in both watersheds. The effects of the management measures are evaluated by testing for a change in the relationship between the monitored parameters (i.e., a change in the correlation). If treatment is working, then there should be a greater difference over time between the treated study watershed and the untreated (poorly managed) control watershed. Alternatively, the calibration period could be used to establish statistical relationships between a fully treated watershed (control watershed) and an untreated watershed (study watershed). After calibration under this approach, the study watershed would be treated and monitoring continued. The effects of the management measures would be evaluated, however, by testing for a change in the correlation that would indicate that the two watersheds are more similar than before treatment.
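
The calibrate-then-compare logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of a simple least-squares line, and the total nitrogen values are all hypothetical, and a formal significance test of the change is not shown.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def mean_shift(calib_control, calib_study, treat_control, treat_study):
    """Mean of (observed - predicted) study values in the treatment period,
    where predictions come from the calibration-period regression."""
    a, b = fit_line(calib_control, calib_study)
    residuals = [ys - (a + b * xc) for xc, ys in zip(treat_control, treat_study)]
    return mean(residuals)

# Hypothetical total nitrogen concentrations (mg/L), invented for illustration.
calib_control = [1.0, 2.0, 3.0, 4.0]
calib_study = [1.5, 2.5, 3.5, 4.5]   # study runs 0.5 mg/L above control
treat_control = [1.0, 2.0, 3.0, 4.0]
treat_study = [1.0, 2.0, 3.0, 4.0]   # after treatment the offset disappears

shift = mean_shift(calib_control, calib_study, treat_control, treat_study)
print(f"Mean shift relative to calibration: {shift:.2f} mg/L")
```

A negative shift suggests that study-watershed values fell relative to what the control watershed would have predicted. Whether the shift is statistically significant would still need to be tested.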

It is important to use small watersheds when performing paired-watershed studies since they are more easily managed and more likely to be uniform (Striffler, 1965). EPA recommends that paired watersheds be no larger than 5,000 acres (USEPA, 1991b).

Upstream-Downstream Studies. In the upstream-downstream design, there is one station at a point directly upstream from the area where implementation of management measures will occur and a second station directly downstream from that area. Upstream-downstream designs are generally more useful for documenting the magnitude of a nonpoint source than for documenting the effectiveness of nonpoint source control measures (Spooner et al., 1985), but they have been used successfully for the latter. This design provides the opportunity to account for covariates (e.g., an upstream pollutant concentration that is correlated with a downstream concentration of the same pollutant) in statistical analyses and is therefore the design that EPA recommends in cases where paired watersheds cannot be established (USEPA, 1991b).

Upstream-downstream designs are needed in cases where project areas are not located in headwaters or where upstream activities are expected to confound the analysis of downstream data. For example, the effects of upstream point source discharges, uncontrolled nonpoint source discharges, and upstream flow regulation can be isolated with upstream-downstream designs.

Inflow-Outflow Design. Inflow-outflow, or process, designs are very similar to upstream-downstream designs. The major differences are scale and the significance of confounding activities. Process designs are generally applied in studies of individual management measures or practices. For example, sediment loading at the inflow and outflow of a detention basin may be measured to determine the pollutant removal efficiency of the basin. In general, no inputs other than the inflow are present, and the only factor affecting outflow is the management measure. As noted above (see The Management Measures to Be Implemented), process monitoring cannot generally be applied to studies of source-reduction management measures or measures that prevent direct impacts, but it can be applied successfully in the evaluation of delivery-reduction management measures.
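
As a minimal sketch of the detention basin example above, removal efficiency can be computed as the fraction of the inflow load that does not appear at the outflow. The function name and the load values are hypothetical, and the calculation assumes that the inflow is the only input, as described in the text.

```python
def removal_efficiency(load_in, load_out):
    """Fraction of the inflow load removed between inflow and outflow."""
    if load_in <= 0:
        raise ValueError("inflow load must be positive")
    return (load_in - load_out) / load_in

# Hypothetical sediment loads (kg) for a single storm, invented for illustration.
efficiency = removal_efficiency(load_in=200.0, load_out=50.0)
print(f"Removal efficiency: {efficiency:.0%}")
```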

b. Scale

Management Measure. Monitoring the inflow and outflow of a specific management measure should be the most sensitive scale since the effects of uncontrollable discharges and uncertainties in treatment mechanisms are minimized.

Edge of Field. Monitoring pollutant load from a single-field watershed should be the next most sensitive scale since the direct effects of implementation can be detected without pollutant trapping in a field border or stream channel (Coffey and Smolen, 1990).

Subwatershed. Subwatershed monitoring can be used to measure the aggregate effect of implementation on a group of fields or smaller areas by taking samples close to the treatment (Coffey and Smolen, 1990). Subwatershed monitoring networks measure the aggregate effects of treatment and nontreatment runoff as it enters an upgradient tributary or the receiving waterbody. Subwatershed monitoring can also be used for targeting critical areas.

Watershed. Monitoring at the watershed scale is appropriate for assessing total project area pollutant load using a single station (Coffey and Smolen, 1990). Depending on station arrangement, both subwatershed and watershed outlet studies are very useful for water and pollutant budget determinations. Monitoring at the watershed outlet is the least sensitive of the spatial scales for detecting treatment effect. Sensitivity of the monitoring program decreases with increased basin size, decreased treatment extent, or both (Coffey and Smolen, 1990).

c. Reference Systems and Standards

EPA's rapid bioassessment protocols advocate an integrated assessment, comparing habitat and biological measures with empirically defined reference conditions (Plafkin et al., 1989). Reference conditions are established through systematic monitoring of actual sites that represent the natural range of variation in "least disturbed" water chemistry, habitat, and biological condition. Reference sites can be used in monitoring programs to establish reasonable expectations for biological, chemical, and habitat conditions. An example application of this concept is the paired-watershed design (Coffey and Smolen, 1990).

EPA's ecoregional framework can be used to establish a logical basis for characterizing ranges of ecosystem conditions or quality that are realistically attainable (Omernik and Gallant, 1986). Ecoregions are defined by EPA to be regions of relative homogeneity in ecological systems or in relationships between organisms and their environments. Hughes et al. (1986) have used a relatively small number of minimally impacted regional reference sites to assess feasible but protective biological goals for an entire region.

Water quality standards can be used to identify criteria that serve as reference values for biological, chemical, or habitat parameters, depending on the content of the standard. The frequency distribution of observation values can be tracked against either a water quality standard criterion or a reference value as a method for measuring trends in water quality or loads (USEPA, 1991b).
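
One simple way to track a distribution of observations against a criterion is to compute the fraction of observations that exceed it in each period. The sketch below assumes a hypothetical criterion of 10 units and invented observations for two years. The function name is illustrative and is not drawn from the cited guidance.

```python
def exceedance_fraction(values, criterion):
    """Fraction of observations strictly greater than the criterion."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(v > criterion for v in values) / len(values)

CRITERION = 10.0  # hypothetical criterion or reference value

# Hypothetical observations for two monitoring years, invented for illustration.
year_1 = [12, 8, 15, 9, 11, 14, 7, 13]
year_5 = [9, 8, 11, 7, 10, 6, 12, 8]

frac_1 = exceedance_fraction(year_1, CRITERION)
frac_5 = exceedance_fraction(year_5, CRITERION)
print(f"Year 1 exceedance: {frac_1:.1%}; year 5 exceedance: {frac_5:.1%}")
```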

4. Site Locations

Within any given budget, site location is a function of water resource type (see The Water Resource), monitoring objectives (see Monitoring Objectives), experimental design (see Types of Experimental Designs), the parameters to be monitored (see Parameter Selection), sampling techniques (see Sampling Techniques and Samples and Sampling), and data analysis plans (see Data Analysis). Additional considerations in site selection are accessibility and landowner cooperation.

It is recommended that monitoring stations be placed near established gaging stations whenever possible due to the extreme importance of obtaining accurate discharge measurements. Where gaging stations are not available but stream discharge measurements are needed, care should be taken to select a suitable site. Brakensiek et al. (1979) provide excellent guidance regarding runoff measurement, including the following selected recommendations regarding site selection:

  • Field-calibrated gaging stations should be located in straight, uniform reaches of channel having smooth beds and banks of a permanent nature whenever possible.
  • Gaging stations should be located away from sewage outfalls, power stations, or other installations causing flow disturbances.
  • Consider the geology and contributions of ground-water flow.
  • Where ice is a potential problem, locate measuring devices in a protected area that receives sunlight most of the time.
  • Daily current-meter measurements may be necessary where sand shifts occur.

5. Sampling Frequency and Interval

a. Sample Size and Frequency

It is important to estimate early in a monitoring effort the number and frequency of samples required to meet the monitoring objectives. Spooner et al. (1991) report that the sampling frequency required at a given monitoring station is a function of the following:

  • Monitoring goals;
  • Response of the water resource to changes in pollutant sources;
  • Magnitude of the minimum amount of change for which detection with trend analyses is desired (i.e., minimum detectable change);
  • System variability and accuracy of the sample estimate of the reported statistical parameter (e.g., confidence interval width on a mean or trend estimate);
  • Statistical power (i.e., probability of detecting a true trend);
  • Autocorrelation (i.e., the extent to which data points taken over time are correlated);
  • Monitoring record length;
  • Number of monitoring stations; and
  • Statistical methods used to analyze the data.
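
To illustrate why autocorrelation matters in the list above, the sketch below computes a lag-1 autocorrelation coefficient and a commonly used approximation of the effective number of independent samples. The approximation n(1 - r)/(1 + r) assumes a first-order autoregressive series. It is offered as an assumption for illustration and is not taken from Spooner et al. (1991). The data are invented.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation coefficient of an evenly spaced series."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((a - m) ** 2 for a in series)
    return num / den

def effective_sample_size(n, r1):
    """Approximate number of independent samples, assuming an AR(1) series."""
    return n * (1 - r1) / (1 + r1)

# Hypothetical evenly spaced concentrations, invented for illustration.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
r1 = lag1_autocorrelation(series)
n_eff = effective_sample_size(len(series), r1)
print(f"lag-1 autocorrelation = {r1:.2f}, effective n = {n_eff:.2f}")
```

Positive autocorrelation reduces the effective sample size, so closely spaced samples carry less independent information than their count suggests.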

The minimum detectable change (MDC) is the minimum change in a water quality parameter over time that is considered statistically significant. Knowledge of the MDC can be very useful in the planning of an effective monitoring program (Coffey and Smolen, 1990). The MDC can be estimated from historical records to aid in determining the required sampling frequency and to evaluate monitoring feasibility (Spooner et al., 1987a). MacDonald (1991) discusses the same concept, referring to it as the minimum detectable effect.

The larger the MDC, the greater the change in water quality that is needed to ensure that the change was not just a random fluctuation. The MDC may be reduced by accounting for covariates, increasing the number of samples per year, and increasing the number of years of monitoring.
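
The effect of sample count on the MDC can be sketched with a simplified calculation. The version below assumes a step change between a pre-implementation and a post-implementation period, independent samples, a common standard deviation, and a two-sided normal approximation. These simplifications are assumptions made for illustration and do not reproduce the procedures in the cited references.

```python
from math import sqrt
from statistics import NormalDist

def approx_mdc(std_dev, n_pre, n_post, alpha=0.05):
    """Approximate minimum detectable change between two period means.

    Assumes independent samples, a common standard deviation, and a
    two-sided test using the normal distribution (hypothetical simplification).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * std_dev * sqrt(1 / n_pre + 1 / n_post)

# Hypothetical standard deviation of 2.0 mg/L, invented for illustration.
mdc_20 = approx_mdc(std_dev=2.0, n_pre=20, n_post=20)
mdc_80 = approx_mdc(std_dev=2.0, n_pre=80, n_post=80)
print(f"MDC with 20 samples per period: {mdc_20:.2f} mg/L")
print(f"MDC with 80 samples per period: {mdc_80:.2f} mg/L")
```

Under these assumptions, quadrupling the number of samples in each period halves the MDC.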

Sherwani and Moreau (1975) stated that the desired frequency of sampling is a function of several considerations associated with the system to be studied, including:

  • Response time of the system;
  • Expected variability of the parameter;
  • Half-life and response time of constituents;
  • Seasonal fluctuation and random effects;
  • Representativeness under different conditions of flow;
  • Short-term pollution events;
  • Magnitude of response; and
  • Variability of the inputs.

Coastal waters, estuaries, ground water, and lakes will typically have longer response times than streams and rivers. Thus, sampling frequency will usually be greater for streams and rivers than for other water resource types. Some parameters such as total suspended solids and fecal coliform bacteria can be highly variable in stream systems dominated by nonpoint sources, while nitrate levels may be less volatile in systems driven by baseflow from ground water. The highly variable parameters would generally require more frequent sampling, but parameter variability should be evaluated on a site-specific basis rather than by rule of thumb.

In cases where pollution events are relatively brief, sampling periods may also be short. For example, to determine pollutant loads it may be necessary to sample frequently during a few major storm events and infrequently during baseflow conditions. Some parameters vary considerably with season, particularly in watersheds impacted primarily by nonpoint sources. Boating is typically a seasonal activity in northern climates, so intensive seasonal monitoring may be needed to evaluate the effectiveness of management measures for marinas.

The water quality response to implementation of management measures will vary considerably across the coastal zone. Pollutant loads from confined livestock operations may decline significantly in response to major improvements in runoff and nutrient management, while sediment delivery from logging areas may decline only a little if the level of pollution control prior to section 6217 implementation was already fairly good. Fewer samples will usually be needed to document water quality improvement in watersheds that are more responsive to pollution control efforts.

Sherwani and Moreau (1975) state that for a given confidence level and margin of error, the necessary sample size, and hence sampling frequency, is proportional to the variance. Since the variance of water quality parameters may differ considerably over time, the frequency requirements of a monitoring program may vary depending on the time of the year. Sampling frequency will need to be greater during periods of greater variance.
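
The proportionality between sample size and variance can be illustrated with the standard formula for estimating a mean, n = z² × variance / (margin of error)². The sketch below assumes independent, approximately normally distributed observations. The function name and the variance values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_samples(variance, margin, confidence=0.95):
    """Samples needed to estimate a mean within +/- margin at the given
    confidence level, assuming independent, roughly normal observations."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z ** 2 * variance / margin ** 2)

# Hypothetical seasonal variances (mg/L squared), invented for illustration.
n_low_variance = required_samples(variance=4.0, margin=1.0)
n_high_variance = required_samples(variance=16.0, margin=1.0)
print(n_low_variance, n_high_variance)
```

In this example a fourfold increase in variance requires roughly four times as many samples for the same margin of error.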

There are statistical methods for estimating the number of samples required to achieve a desired level of precision in random sampling (Cochran, 1963), stratified random sampling (Reckhow, 1979), cluster sampling (Cochran, 1977), multistage sampling (Gilbert, 1987), double sampling (Gilbert, 1987), and systematic sampling (Gilbert, 1987). For a more detailed discussion of sampling theory and statistics, see Samples and Sampling.

b. Sampling Interval

A method for estimating sampling interval is provided by Sherwani and Moreau (1975). They note that the least favorable sampling interval for parameters that exhibit a periodic structure is equal to the period or an integral multiple of the period. Such sampling would introduce statistical bias. Reckhow (1979) points out that, for both random and stratified random sampling, systematic sampling is acceptable only if "there is no bias introduced by incomplete design, and if there is no periodic variation in the characteristic measured." Gaugush (1986) states that monthly sampling is usually adequate to detect the annual pattern of changes with time.
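
The bias introduced by sampling at an interval equal to the period can be seen in a small simulation. The periodic signal, its 7-day period, and the sampling intervals below are hypothetical choices made only to illustrate the point.

```python
from math import cos, pi
from statistics import mean

PERIOD_DAYS = 7  # assumed period of the hypothetical signal

def concentration(day):
    """Hypothetical periodic concentration (mg/L) with a true mean of 10."""
    return 10 + 5 * cos(2 * pi * day / PERIOD_DAYS)

def sampled_mean(interval_days, n_samples):
    """Mean of n_samples taken every interval_days, starting on day 0."""
    return mean(concentration(k * interval_days) for k in range(n_samples))

mean_at_period = sampled_mean(interval_days=7, n_samples=7)
mean_off_period = sampled_mean(interval_days=5, n_samples=7)
print(f"Interval equal to period: {mean_at_period:.2f} mg/L")
print(f"Interval of 5 days:       {mean_off_period:.2f} mg/L")
```

Sampling every 7 days always lands on the same point of the cycle (here the peak), so the estimated mean is biased high. The 5-day interval visits different phases of the cycle and recovers the true mean in this example.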

c. Some Recommendations

It is generally recommended that the sampling of plankton, fish, and benthic organisms in estuaries should be seasonal, with the same season sampled in multiyear studies (USEPA, 1991a). The areal coverage and bed density for submerged aquatic vegetation (SAV) vary from year to year due to catastrophic storms, exceptionally high precipitation and turbidity, and other poorly understood natural phenomena (USEPA, 1991a). For this reason, short-term SAV monitoring may be more reflective of infrequent impacts and may not be useful for trend assessment. In addition, incremental losses in wetland acreage are now within the margin of error for current detection limits. It is recommended that SAV and wetland sampling be conducted during the period of peak biomass (USEPA, 1991a).

The frequency of sediment sampling in estuaries should be related to the expected rate of change in sediment contaminant concentrations (USEPA, 1991a). Because tidal and seasonal variability in the distribution and magnitude of several water column physical characteristics in estuaries is typically observed, these influences should be accounted for in the development of sampling strategies (USEPA, 1991a).

For monitoring the state of biological variables, the length of the life cycle may determine the sampling interval (Coffey and Smolen, 1990). EPA (1991b) recommends a minimum of 20 evenly spaced (e.g., weekly) samples per year to document trends in chemical constituents in watershed studies lasting 5 to 10 years. The 20 samples should be taken during the time period (e.g., season) when the benefits of implemented pollution control measures are most likely to be observed. For benthic macroinvertebrates and fish, EPA recommends at least one sample per year.

6. Load Versus Water Quality Status Monitoring

The choice between monitoring either (a) the status or condition of the water resource or (b) the pollutant load to the water resource should be made carefully (Coffey and Smolen, 1990). Loading is the rate of pollutant transport to the managed resource via overland, tributary, or ground-water flow. Load monitoring may be used to assess the change in magnitude of major pollutant sources or to assess the change in pollutant export at a fixed station. Monitoring water quality status includes measuring a physical attribute, chemical concentration, or biological condition, and may be used to assess baseline conditions, trends, or the impact of treatment on the managed resource.

Monitoring water quality status may be the most direct route to an answer on the effect of management measure implementation on designated use, but sensitivity may be low (Coffey and Smolen, 1990). When the likelihood of detecting a trend in water quality status is low, load monitoring near the source may be necessary. For example, measuring the effectiveness of nutrient management in one tributary to a large coastal embayment may require monitoring nitrogen load, since bay monitoring is unlikely to measure the change in the mean nitrogen concentration or trophic state measures for the bay.

When the basis for a choice between load or water quality status is less obvious (i.e., it is not clear whether abatement can be detected in the receiving resource), a pollutant budget may help to make the decision (Coffey and Smolen, 1990). The budget should account for mass balance of pollutant input by source, including ground-water and atmospheric deposition, all output, and changes in storage. The budget may show the magnitude and relative importance of controlled and uncontrolled sources (e.g., atmospheric deposition, resuspension from sediments, streambank erosion). Sources of error in the budget should also be evaluated. Where treatment is not likely to produce measurable change in the waterbody, load monitoring may be required.
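
A pollutant budget of the kind described above reduces to a mass balance: inputs minus outputs minus the change in storage, with any remainder treated as unexplained error. The sketch below uses invented annual values and hypothetical source names. It is meant only to show the bookkeeping, not a recommended set of budget terms.

```python
def budget_residual(inputs, outputs, storage_change):
    """Unexplained mass: total inputs - total outputs - change in storage."""
    return sum(inputs.values()) - sum(outputs.values()) - storage_change

def source_shares(inputs):
    """Relative contribution of each input source to the total input."""
    total = sum(inputs.values())
    return {name: load / total for name, load in inputs.items()}

# Hypothetical annual nitrogen budget (kg/yr), invented for illustration.
inputs = {
    "tributary": 500.0,
    "ground water": 100.0,
    "atmospheric deposition": 50.0,
    "streambank erosion": 150.0,
}
outputs = {"outflow": 600.0}
storage_change = 150.0

residual = budget_residual(inputs, outputs, storage_change)
shares = source_shares(inputs)
print(f"Residual (unexplained) mass: {residual:.0f} kg/yr")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of input")
```

A large residual relative to the controlled sources would suggest that budget error could mask any measurable change in the waterbody.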

a. Pollutant Load Monitoring

Load monitoring requires a complex, and typically expensive, sampling protocol to measure water discharge and pollutant concentration (Coffey and Smolen, 1990). Both discharge and concentration data are needed to calculate pollutant loading.
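
As a minimal sketch of how discharge and concentration combine into a load, the example below sums concentration times discharge over equal time steps. The units (mg/L, cubic meters per second, seconds), the function name, and the data are assumptions chosen for illustration. Gaps in the record are not handled.

```python
def load_kg(conc_mg_per_l, flow_m3_per_s, dt_seconds):
    """Pollutant load (kg) from paired concentration and discharge readings.

    1 mg/L x 1 m3 = 1 g, so each step contributes c * q * dt grams.
    """
    if len(conc_mg_per_l) != len(flow_m3_per_s):
        raise ValueError("concentration and flow series must be the same length")
    grams = sum(c * q * dt_seconds for c, q in zip(conc_mg_per_l, flow_m3_per_s))
    return grams / 1000.0

# Hypothetical hourly readings across a storm, invented for illustration.
concentration = [2.0, 10.0, 4.0]   # mg/L
discharge = [0.5, 3.0, 1.0]        # m3/s
total_load = load_kg(concentration, discharge, dt_seconds=3600)
print(f"Storm load: {total_load:.1f} kg")
```

In this example the middle hour, with both high discharge and high concentration, accounts for most of the load.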

Given the variability of discharge and pollutant concentrations in watersheds impacted by nonpoint sources, the consequences of not collecting data from all storm events and baseflow over a range of conditions (e.g., season, land cover) can be major. For example, equipment failure during a single storm event can result in considerable error in estimating annual pollutant load. It is typical that data gaps will occur, requiring the application of mathematical techniques to estimate the discharge and pollutant concentrations for missed events.

Brakensiek et al. (1979) provide a detailed description of methods and equipment needed for discharge monitoring. Techniques are described for both field and watershed studies.

b. Water Quality Status Monitoring

Water quality status can be evaluated in a number of ways, including:

  • Evaluating designated use attainment;
  • Evaluating standards violations;
  • Assessing ecological integrity; or
  • Monitoring an indicator parameter.

Monitoring for designated use attainment should focus on those parameters or criteria specified in State water quality standards. Where such parameters or criteria are not specified, critical variables related to use support should be monitored. If the monitoring objective includes relating water quality improvement to the pollution control activities, then it is important that monitored parameters can be related to the management measures implemented. For example, it may be appropriate to monitor nitrogen concentrations if septic system improvements are implemented.

For violations of standards, the choice of variable is specified by the State water quality standard (Coffey and Smolen, 1990). To assess ecological integrity, the selection of parameters should be based on criteria used to evaluate such status. For trend detection the indicator parameter must be carefully selected to account for changes in treatment and system variability (Coffey and Smolen, 1990). Additional information regarding appropriate parameters to monitor can be found under Parameter Selection below.

7. Parameter Selection

Monitoring parameters should be related directly to the identified problems caused by the nonpoint sources that will be controlled, and to those principal pollutants that will be controlled through the implementation of management measures. For example, if metals are determined to be the primary pollutants of concern from marinas, then appropriate monitoring parameters will include flow and the metals of concern. If the effectiveness of improved management of repair and maintenance areas is to be determined, then implementation should be tracked as well. There should also be a mechanism for relating the management measure to the specific pollutants monitored. For example, it should be clear that improved management of repair and maintenance areas of a marina will have an effect on metals loads if such loads are monitored.

a. Relationship to Sources

MacDonald (1991) evaluates the sensitivity of various monitoring parameters to a range of management activities in forested areas in the Pacific Northwest and Alaska. Table 8-1 provides examples of parameters that could be monitored to determine the effectiveness of management measures. Some of the listed parameters (e.g., benthic macroinvertebrates) can be sampled only in waterbodies, while others (e.g., total suspended solids) can be sampled at the source or in waterbodies. This table is provided for illustrative purposes only.

b. Implementation Tracking

Land treatment and land use monitoring should relate directly to the pollutants or impacts monitored at the water quality station (Coffey and Smolen, 1990). Land use monitoring should also reflect historical impacts as well as activities during the project. Since the impact of management measures on water quality may not be immediate or implementation may not be sustained, information on relevant watershed activities will be essential for the final analysis.

EPA recommends that the reporting units used to track implementation should be reliable indicators of the extent to which the pollutant source will be controlled (USEPA, 1991b). For example, the tons of animal waste managed may be a much more useful parameter to track than the number of confined animal facilities constructed.

c. Explanatory Variables

An effective nonpoint source monitoring program accounts for as many sources of variability as possible to increase the likelihood that the effects of the management measures can be separated from the other sources of variability. Some of this other variability can be accounted for by tracking the parameters (e.g., precipitation, flow, pH, salinity) most likely to affect the values of the principal monitored parameters (Coffey and Smolen, 1990). These explanatory variables are treated as covariates in statistical analyses that isolate the effect of the management measures from the variability, or noise, in the data caused by natural factors. In paired-watershed and upstream-downstream studies, EPA recommends that the complete set of parameters (including explanatory variables) be monitored at each monitoring site, following the same monitoring schedule and protocol (USEPA, 1991b).

8. Sampling Techniques

a. Automated Sampling to Estimate Pollutant Loads

Typical methods for estimating pollutant loads include continuous flow measurements and some form of automated sampling that is either timed or triggered by some feature of the runoff hydrograph. For example, in the Santa Clara watershed of San Francisco Bay, flow was continuously monitored at hourly intervals, wet-weather monitoring included collection of flow-composite samples taken with automatic samplers, and dry-weather monitoring was conducted by obtaining quarterly grab samples (Mumley, 1991). Data were used to estimate annual, wet-weather, and dry-weather copper loads.

In St. Albans Bay, Vermont, continuous flow and composite samples were used to estimate nutrient loads for trend analysis (Vermont RCWP, 1984). In the Nationwide Urban Runoff Program (NURP) project in Bellevue, Washington, catchment area monitoring included continuous gaging and automatic sampling that occurred at a preset time interval (5 to 50 minutes) once the stage exceeded a preset threshold (USEPA, 1982b).
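
The stage-triggered, fixed-interval sampling described for the Bellevue project can be sketched as a simple rule applied to a stage record. The function below is a hypothetical illustration only. The threshold, interval, and stage readings are invented, and the assumption that sampling pauses when the stage falls back to or below the threshold is not taken from the cited report.

```python
def sample_times(stage_series, threshold, interval_min, step_min=5):
    """Return the times (minutes) at which an automatic sampler would fire.

    stage_series holds stage readings taken every step_min minutes. The
    sampler fires when the stage first exceeds the threshold and then at
    each preset interval while the stage stays above it (assumed behavior).
    """
    times = []
    next_sample = None
    for i, stage in enumerate(stage_series):
        t = i * step_min
        if stage <= threshold:
            next_sample = None  # sampler idle at or below the threshold
            continue
        if next_sample is None or t >= next_sample:
            times.append(t)
            next_sample = t + interval_min
    return times

# Hypothetical stage record (m) at 5-minute steps, invented for illustration.
stage = [0.2, 0.3, 0.6, 0.8, 0.9, 0.7, 0.6, 0.4, 0.3]
fired = sample_times(stage, threshold=0.5, interval_min=10)
print(fired)
```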

b. Grab Sampling for Pollutant Loads

Grab sampling with continuous discharge gaging can be used to estimate load in some cases. Grab sampling is usually much less expensive than automated sampling methods and is typically much simpler to manage. These significant factors of cost and ease make grab sampling an attractive alternative to automated sampling and therefore worthy of consideration even for monitoring programs with the objective of estimating pollutant loads.

Grab sampling should be carefully evaluated to determine its applicability for each monitoring situation (Coffey and Smolen, 1990). Nonpoint source pollutant concentrations generally increase with discharge. For a system with potentially lower variability in discharge, such as irrigation, grab sampling may be a suitable sampling method for estimating loads (Coffey and Smolen, 1990). Grab sampling may also be appropriate for systems in which the distribution of annual loading occurs over an extended period of several months, rather than a few events. In addition, grab sampling may be used to monitor low flows and background concentrations.

For systems exhibiting high variability in discharge or where the majority of the pollutant load is transported by a few events (such as snowmelt in some northern temperate regions), however, grab sampling is not recommended.

c. Habitat Sampling

EPA recommends a procedure for assessing habitat quality where all of the habitat parameters are related to overall aquatic life use support and are a potential source of limitation to the aquatic biota (Plafkin et al., 1989). In this procedure, EPA begins with a survey of physical characteristics and water quality at the site. Such physical factors as land use, erosion, potential nonpoint sources, stream width, stream depth, stream velocity, channelization, and canopy cover are addressed. In addition, water quality parameters such as temperature, dissolved oxygen, pH, conductivity, stream type, odors, and turbidity are observed.

Then, EPA follows with the habitat assessment, which includes a range of parameters that are weighted to emphasize the most biologically significant parameters (Plafkin et al., 1989). The procedure includes three levels of habitat parameters. The primary parameters are those that characterize the stream "microscale" habitat and have the greatest direct influence on the structure of the indigenous communities. These parameters include characterization of the bottom substrate and available cover, estimation of embeddedness, and estimation of the flow or velocity and depth regime. Secondary parameters measure the "macroscale" and include such parameters as channel alteration, bottom scouring and deposition, and stream sinuosity. Tertiary parameters include bank stability, bank vegetation, and streamside cover.

MacDonald (1991) discusses a wide range of channel characteristics and riparian parameters that can be monitored to evaluate the effects of forestry activities on streams in the Pacific Northwest and Alaska. MacDonald states that "stream channel characteristics may be advantageous for monitoring because their temporal variability is relatively low, and direct links can be made between observed changes and some key designated uses such as coldwater fisheries." He notes, however, that "general recommendations are difficult because relatively few studies have used channel characteristics as the primary parameters for monitoring management impacts on streams."

On the other hand, MacDonald concludes that the documented effects of management activities on the stability and vegetation of riparian zones, and the established linkages between the riparian zone and various designated uses, provide the rationale for including the width of riparian canopy opening and riparian vegetation as recommended monitoring parameters. Riparian canopy opening is measured and tracked through a historical sequence of aerial photographs (MacDonald, 1991). Riparian vegetation is measured using a range of methods, including qualitative measures of vegetation type, visual estimations of vegetation cover, quantitative estimations of vegetation cover using point- or line-intercept methods, light intensity measurements to estimate forest cover density, stream shading estimates using a spherical densiometer, and estimates of vegetation density based on plot measurements.

Habitat variables to monitor grazing impacts include areas covered with vegetation and bare soil, stream width, stream channel and streambank stability, and width and area of the riparian zone (Platts et al., 1987). Ray and Megahan (1978) developed a procedure for measuring streambank morphology, erosion, and deposition. Detailed streambank inventories may be recorded and mapped to monitor present conditions or changes in morphology through time.

To assess the effect of land use changes on streambank stability, Platts et al. (1987) provide methods for evaluating and rating streambank soil alteration. Their rating system can be used to determine the conditions of streambank stability that could affect fish. Other measurements that could be important for fisheries habitat evaluations include streambank undercut, stream shore water depth, and stream channel bank angle.

d. Benthic Organism Sampling

Benthic communities in estuaries are sampled through field surveys, which are typically time-consuming and expensive (USEPA, 1991a). Sampling devices include trawls, dredges, grabs, and box corers. For more specific benthic sampling guidance, see Klemm et al. (1990).

e. Fish Sampling

For estuaries and coastal waters, a survey vessel manned by an experienced crew and specially equipped with gear to collect organisms is required (USEPA, 1991a). Several types of devices and methods can be used to collect fish samples, including traps and cages, passive nets, trawls (active nets), and photographic surveys. Since many of these devices selectively sample specific types of fish, it is not recommended that comparisons be made among data collected using different devices (USEPA, 1991a).

f. Shellfish Sampling

Pathobiological methods provide information concerning damage to organ systems of fish and shellfish through an evaluation of their altered structure, activity, and function (USEPA, 1991a). A field survey is required to collect target organisms, and numerous tissue samples may be required for pathobiological methods. In general, pathobiological methods are labor-intensive and expensive (USEPA, 1991a).

g. Plankton Sampling

Phytoplankton sampling in coastal waters is frequently accomplished with water bottles placed at a variety of depths throughout the water column, some above and some below the pycnocline (USEPA, 1991a). A minimum of four depths should be sampled. Zooplankton sampling methods vary depending on the size of the organisms. Devices used include water bottles, small mesh nets, and pumps (USEPA, 1991a).

h. Aquatic Vegetation Sampling

Attributes of emergent wetland vegetation can be monitored at regular intervals along a transect (USEPA, 1991a). Measurements include plant and mulch biomass, and foliar and basal cover. Losses of aquatic vegetation can be tracked through aerial photography and mapping.

i. Water Column Sampling

In estuaries and coastal waters, chemical samples are frequently collected using water bottles and should be taken at a minimum of four depths in the vertical profile (USEPA, 1991a). Caged organisms have also been used to monitor the bioaccumulation of toxic chemicals.

Physical sampling of the water column at selected depths in estuaries is done with bottles for temperature, salinity, and turbidity, or with probes for temperature and salinity (USEPA, 1991a). Current meters are used to characterize circulation patterns.

j. Sediment Sampling

Several types of devices can be used to collect sediment samples, including dredges, grabs, and box corers (USEPA, 1991a). Sampling depth may vary depending on the monitoring objective, but it is recommended that penetration be well below the desired sampling depth to prevent sample disturbance as the device closes (USEPA, 1991a). EPA also recommends selecting sediment samplers that can sample benthic organisms as well, both to cut sampling costs and to permit better statistical analyses relating sediment quality to benthic organism parameters.

k. Bacterial and Viral Pathogen Sampling

For estuaries and coastal waters it is recommended that samples be taken of both the underlying waters and the thin microlayer on the surface of the water (USEPA, 1991a). This is recommended, despite the fact that standardized methods for sampling the microlayer have not been established, because research has shown bacterial levels several orders of magnitude greater in the microlayer. In no case should a composite sample be collected for bacteriological examination (USEPA, 1978).

Water samples for bacterial analyses are frequently collected using sterilized plastic bags or screw-cap, wide-mouthed bottles (USEPA, 1991a). Several depths may be sampled during one cast, or replicate samples may be collected at a particular depth by using a Kemmerer or Niskin sampler (USEPA, 1978). Any device that collects water samples in unsterilized tubes should not be used for collecting bacteriological samples without first obtaining data that support its use (USEPA, 1991a). Pumps may be used to sample large volumes of the water column (USEPA, 1978).

9. Quality Assurance and Quality Control

Effective quality assurance and quality control (QA/QC) procedures and a clear delineation of QA/QC responsibilities are essential to ensure the utility of environmental monitoring data (Plafkin et al., 1989). Quality control refers to the routine application of procedures for obtaining prescribed standards of performance in the monitoring and measurement process. Quality assurance includes the quality control functions and involves a totally integrated program for ensuring the reliability of monitoring and measurement data.

EPA's QA/QC program requires that all EPA National Program Offices, EPA Regional Offices, and EPA laboratories participate in a centrally planned, directed, and coordinated Agency-wide QA/QC program (Brossman, 1988). This requirement also applies to efforts carried out by the States and interstate agencies that are supported by EPA through grants, contracts, or other formalized agreements. The EPA QA program is based on EPA order 5360.1, which describes the policy, objectives, and responsibilities of all EPA Program and Regional Offices (USEPA, 1984).

Each office or laboratory that generates data under EPA's QA/QC program must implement, at a minimum, the prescribed procedures to ensure that precision, accuracy, completeness, comparability, and representativeness of data are known and documented. In addition, EPA QA/QC procedures apply throughout the study design, sample collection, sample custody, laboratory analysis, data review (including data editing and storage), and data analysis and reporting phases.

Specific guidance for QA/QC is provided for EPA's rapid bioassessment protocols (Plafkin et al., 1989) and for EPA's Ocean Data Evaluation System (USEPA, 1991a). Standardized procedures for field sampling and laboratory methods are an essential element of any monitoring program.

D. Data Needs

Data needs are a direct function of monitoring goals and objectives. Thus, data needs cannot be established until specific goals and objectives are defined. Furthermore, data analyses should be planned before data types and data collection protocols are agreed upon. In short, the scientific method, defined as "a method of research in which a problem is identified, relevant data gathered, an hypothesis formulated, and the hypothesis empirically tested" (Stein, 1980), should be applied to determine data needs.

Types of data generally needed for nonpoint source monitoring programs will include chemical, physical, and biological water quality data; precipitation data; topographic and morphologic data; soils data; land use data; and land treatment data. The specific parameters should be determined based on site-specific needs and the monitoring objectives that are established.

Under EPA's quality assurance and quality control (QA/QC) program (see Quality Assurance and Quality Control), a full assessment of the data quality needed to meet the intended use must be made prior to specification of QA/QC controls (Brossman, 1988). The determination of data quality is accomplished through the development of data quality objectives (DQOs), which are qualitative and quantitative statements developed by data users to specify the quality of data needed to support specific decisions or regulatory actions. Establishment of DQOs involves interaction of decision makers and the technical staff. EPA has defined a process for developing DQOs (USEPA, 1986).

E. Statistical Considerations

A significant challenge for those performing monitoring under section 6217 is to isolate the changes in loads and water quality caused by the implementation of management measures from those changes caused by the other sources of variability. In short, the task is to separate the effect, or "signal," from the noise.

Successful monitoring programs typically resemble research, complete with focused objectives, hypotheses to test, statistical analyses, thorough data interpretation, and clear reporting. Statistics are an inherent component of nearly all water quality monitoring programs (MacDonald, 1991). The capability to plan for and use statistical analyses, therefore, is essential to the development and implementation of successful monitoring programs. The following discussion provides some basic information regarding statistics that should be understood by monitoring professionals. A qualified statistician should be consulted to review the proposed monitoring design, the plan for statistical analyses, the application of statistical techniques, and the interpretation of the analytic results.

1. Variability and Uncertainty

Gilbert (1987) identifies five general sources of variability and uncertainty in environmental studies:

  1. Environmental variability;
  2. Measurement bias, precision, and accuracy;
  3. Statistical bias;
  4. Random sampling errors; and
  5. Gross errors and mistakes.

The author describes environmental variability as "the variation in true pollution levels from one population unit to the next." There are multiple sources of environmental variability that could affect pollutant loads and water quality conditions. These sources include variability in weather patterns within and across years, natural variability in water resource conditions, variations in biological communities, variability in loadings from point sources and other sources that may not be addressed under section 6217 programs, and variability in land use. Changing land use brings with it changes in the level of pollution control possible under section 6217. For example, a conversion from well-managed agricultural cropland to well-managed suburban development may cause decreases in nutrient and sediment loads while possibly causing increases in metal loads and changes in hydrology. Gilbert (1987) notes that existing information on environmental variability can be used to "design a plan that will estimate population parameters with greater accuracy and less cost than can otherwise be achieved."

Accuracy is a measure of how close the sample value is to the true population value, whereas precision refers to the repeatability of sample values. Measurement bias occurs when estimates are consistently higher or lower than the true population value (Gilbert, 1987). Random sampling errors (e.g., variability in sample means for different random samples from the same population) are due only to the random selection process and arise from the environmental variability of population units (Gilbert, 1987). By definition, random sampling error is zero if all population units are measured.

Statistical bias is "a discrepancy between the expected value of an estimator and the population parameter being estimated" (Gilbert, 1987). Gilbert (1987) provides examples of estimators that are biased for small sample sizes but less biased or unbiased for larger samples.
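The idea of statistical bias can be illustrated with a small enumeration. The sketch below is a standard textbook illustration, not necessarily one of Gilbert's examples: it takes a hypothetical three-unit population, lists every possible sample of size 2 drawn with replacement, and averages two variance estimators over all of those samples. The population values and variable names are assumptions made only for this illustration.

```python
import itertools
import statistics

# Hypothetical three-unit population, chosen only to keep the enumeration small.
population = [1, 2, 3]
true_variance = statistics.pvariance(population)  # population variance (divisor N)

# Every possible ordered sample of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=2))

# The expected value of an estimator is its average over all possible samples.
expected_divisor_n = statistics.fmean(
    [statistics.pvariance(s) for s in samples]       # divides by n
)
expected_divisor_n_minus_1 = statistics.fmean(
    [statistics.variance(s) for s in samples]        # divides by n - 1
)

# Statistical bias = expected value of the estimator minus the true parameter.
bias_divisor_n = expected_divisor_n - true_variance
bias_divisor_n_minus_1 = expected_divisor_n_minus_1 - true_variance

print(f"true variance:            {true_variance:.4f}")
print(f"bias with divisor n:      {bias_divisor_n:.4f}")
print(f"bias with divisor n - 1:  {bias_divisor_n_minus_1:.4f}")
```

Under these assumptions the divisor-n estimator underestimates the variance on average, while the divisor-(n - 1) estimator does not. The relative size of the underestimate shrinks as the sample size grows, which is the small-sample behavior described above.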

Gross mistakes can occur at any point in the process, beginning with sample collection and ending with the reporting of study results (Gilbert, 1987). Adherence to accepted sampling and laboratory protocol, combined with thorough quality control and data screening procedures, will minimize the chances for gross errors.

2. Samples and Sampling

a. Samples

A sample is defined as "a small part of anything or one of a number, intended to show the quality, style, or nature of the whole" (Stein, 1980). Environmental samples are collected for both economic and practical reasons: researchers cannot afford to inspect the whole, and they usually have neither the time and resources nor the capability even to attempt it. Moreover, a sample or collection of samples often provides sufficient information about the whole to allow decisions to be made regarding actions that should or should not be taken.

In a statistical sampling program, the whole is called the population or target population, and it consists of the set of population units about which inferences will be made (Gilbert, 1987). As an example, population units could be defined as macroinvertebrate populations on square-meter sections of river bottom, nitrogen concentrations in 1-liter grab samples, or hourly mean-flow values at a specific gaging station. Gilbert (1987) refers to the sampled population as the set of population units directly available for measurement.

b. Sampling Objectives

Gaugush (1986) states that "the major objective in sampling program design is to obtain as accurate or unbiased an estimate as possible, and at the same time to reduce or explain as much of the variability as possible in order to improve the precision of the estimates." According to Cochran (1977), an estimator is unbiased if its mean value, taken over all possible samples, is equal to the population statistic that it estimates.

In the real world it is necessary to design sampling programs that meet accuracy and precision requirements while not placing unreasonable burdens on sampling personnel or sampling budgets. As stated by Gaugush (1986), budget constraints may force the issue of whether sampling results will produce information sufficient to meet the study objectives.

Gaugush (1986) describes in some detail specific points to consider in defining study objectives. He notes that "sampling is facilitated by specifying the narrowest possible set of objectives which will provide the desired information." First, he recommends that the target population be defined as a key step in limiting the variability encountered in the sampling program. As an example, in a coastal watershed impacted by nonpoint sources, the target population could be defined as storm-event, total nitrogen concentrations at the outlets of all tributaries to the bay, thus eliminating the need to monitor at upstream and in-bay sites and during baseflow conditions. In this example, the definition of the target population also specifies the water quality parameter of interest (i.e., total nitrogen concentration). Note that both spatial and temporal limits should be established when defining the target population. With respect to the example, then, the researcher may more specifically define the population units as the total nitrogen concentrations in half-hour, composite samples taken during all storms (storms as defined by the researcher).

The next step, according to Gaugush (1986), is to decide whether parameter estimation or hypothesis testing is the primary analytic goal. This choice will have an impact on the sampling design. As an example, Gaugush points out that balanced designs are desirable for hypothesis testing (see Estimation and Hypothesis Testing), whereas parameter estimation may require unbalanced sample allocations to account for the spatial variability of parameter levels. Hypothesis testing is likely to be used in program evaluation (e.g., water quality before and after nonpoint source management measures are implemented), whereas parameter estimation can be applied in assessments when determining pollutant loads from various sources.

Finally, Gaugush (1986) recommends that exogenous variables and sampling strata be defined. Exogenous variables are used to explain some of the variability in the measured parameter of interest. As an example, total suspended solids (TSS) is often a covariate of total phosphorus (TP) concentration in watersheds impacted by agricultural runoff. Measurement of TSS may help increase the precision of TP estimates.

c. Sample Type and Sampling Design

The sampling program should provide representative and sufficient data to support planned analyses. Site location and sampling frequency are often considered sufficient to describe the "where" and "when" of sampling programs. While this is certainly true to a large extent, these two factors alone do not describe fully where and when samples are collected. Additional considerations include the depth of sampling and the surface-water or ground-water stratum to which the sampling depth belongs, the origins of the aliquots taken in each sample bottle, and the time frame over which measurements are made (including specific dates). These additional considerations are factors that characterize the type of sample collected. Site location and sampling frequency are components of sampling design.

In order for the data analyst to interpret sampling results appropriately, the sample type, sampling design, and target population must all be clearly described. It should be clear from these descriptions whether the data collected are representative of the target population.

Examples of sample type classifications include instantaneous and continuous; discrete and composite; surface, soil-profile, and bottom; time-integrated, depth-integrated, and flow-integrated; and biological, physical, and chemical. Specific guidance on collecting these various sample types is not presented here because several existing guidance documents address sampling protocols and equipment.

An overview of a range of basic sampling designs is provided below. Users are encouraged to consult basic statistics textbooks (e.g., Cochran, 1977) and books on applied statistics (e.g., Gilbert, 1987) to obtain additional information regarding these designs.

Simple Random Sampling. In simple random sampling, each unit of the target population has an equal chance of being selected. For example, if the target population is the macroinvertebrate population found on 100 square meters of river bottom and the population units are 1-square-meter sections of river bottom, then each unit would have a 1 percent chance of being sampled under a random sampling program.

Gilbert (1987) and Cochran (1977) both address many aspects of simple random sampling. Included in these texts are methods for estimation of the mean and total for sampling with and without replacement, equations for determining the number of samples required for both independent and correlated data, and the impact of measurement errors.
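As a minimal sketch of the river-bottom example above, the following Python fragment draws a simple random sample without replacement from 100 numbered 1-square-meter sections. The function name, the sample size of 10, and the seed are illustrative assumptions, not values taken from the cited texts.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(units, n)    # sampling without replacement

# Hypothetical frame: 100 one-square-meter sections of river bottom.
sections = list(range(1, 101))
selected_sections = simple_random_sample(sections, 10, seed=42)

print(sorted(selected_sections))
```

On any single draw each section has a 1 percent chance of being picked; with 10 sections drawn without replacement, each section has a 10 percent chance of ending up in the sample.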

Stratified Random Sampling. In stratified random sampling, the target population is divided into separate groups called strata for the purpose of obtaining a better estimate of the mean or total for the entire population (Gilbert, 1987). Simple random sampling is then used within each stratum.

Stratified random sampling could be used, for example, to monitor water quality in streams below irrigation return flows. Based on a knowledge of irrigation and precipitation patterns for the watershed, the researcher could divide the year into two or more homogeneous periods. Within each period random samples could be taken to characterize the average concentration of a particular pollutant. These random samples could take the form of daily, flow-weighted composite samples, with the sampling dates randomly determined.
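The irrigation example can be sketched as follows, assuming two strata (an irrigation season and a non-irrigation season) and the usual stratified estimator in which each stratum mean is weighted by that stratum's share of the population. All concentrations, stratum names, and sample sizes below are hypothetical.

```python
import random
import statistics

def stratified_mean(strata, n_per_stratum, seed=None):
    """Estimate the population mean from simple random samples within strata.

    strata: dict mapping stratum name -> list of all unit values in the stratum
            (known here only because this is a simulation).
    n_per_stratum: dict mapping stratum name -> number of units to sample.
    """
    rng = random.Random(seed)
    total_units = sum(len(values) for values in strata.values())
    estimate = 0.0
    for name, values in strata.items():
        sample = rng.sample(values, n_per_stratum[name])   # random within stratum
        weight = len(values) / total_units                 # stratum weight N_h / N
        estimate += weight * statistics.fmean(sample)
    return estimate

# Hypothetical daily composite concentrations (mg/L) for each period.
daily_concentrations = {
    "irrigation_season": [4.2, 3.9, 5.1, 4.7, 4.4, 5.0],
    "non_irrigation_season": [1.1, 0.9, 1.4, 1.2, 1.0, 1.3, 0.8, 1.2, 1.1],
}
sample_sizes = {"irrigation_season": 3, "non_irrigation_season": 3}

estimated_mean = stratified_mean(daily_concentrations, sample_sizes, seed=7)
print(f"stratified estimate of the mean: {estimated_mean:.2f} mg/L")
```

Because sampling is random only within each period, the large difference between the two periods does not contribute to the sampling error of the estimate.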

Cluster Sampling. In cluster sampling, the total population is divided into a number of relatively small subdivisions, or clusters, and then some of these subdivisions are randomly selected for sampling (Freund, 1973). For one-stage cluster sampling these selected clusters are sampled totally, but in two-stage cluster sampling random sampling is then performed within each cluster (Gaugush, 1986).

Cluster sampling is applied in cases where it is more practical to measure randomly selected groups of individual units than to measure randomly selected individual units (Gilbert, 1987). An example of one-stage cluster sampling is the collection of all macroinvertebrates on randomly selected rocks within a specified sampling area. The stream bottom may contain hundreds of rocks with thousands of organisms attached to them, thus making it difficult to sample the organisms as individual units. However, it may be possible to randomly select rocks and then inspect every organism on each selected rock.
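A minimal sketch of the one-stage rock example follows. It assumes the rocks in the sampling area can be listed and that every organism on a selected rock is counted; the total is then estimated as the number of rocks times the mean count per selected rock. The counts and names are hypothetical, and the full list of counts exists here only so the sketch can run.

```python
import random

def one_stage_cluster_total(cluster_counts, n_clusters, seed=None):
    """Estimate the population total from completely counted random clusters."""
    rng = random.Random(seed)
    selected = rng.sample(range(len(cluster_counts)), n_clusters)  # pick rocks
    counted = [cluster_counts[i] for i in selected]                # count every organism
    mean_per_cluster = sum(counted) / n_clusters
    return len(cluster_counts) * mean_per_cluster                  # scale to all rocks

# Hypothetical organism counts on each of 12 rocks in the sampling area.
organisms_per_rock = [14, 9, 22, 17, 11, 30, 8, 19, 25, 13, 16, 21]

estimated_total = one_stage_cluster_total(organisms_per_rock, n_clusters=4, seed=3)
print(f"estimated organisms in the sampling area: {estimated_total:.0f}")
```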

Multi-stage Sampling. Two-stage sampling involves dividing the target population into primary units, randomly selecting a subset of these primary units, and then taking random samples (subunits) within each of the selected subsets (Gilbert, 1987). All of the random samples from the subunits are measured completely. Two-stage cluster sampling, described above, is one form of two-stage sampling. Cochran (1977) describes two-stage sampling in great detail, and both Gilbert (1987) and Cochran (1977) discuss three-stage sampling and compositing.

Double Sampling. Double sampling, or two-phase sampling, involves taking a large preliminary sample to gain information (e.g., population mean or frequency distribution) about an auxiliary variate (xi) in the context of a larger sampling survey to make estimates for some other variate (yi) (Cochran, 1977). This technique can be used for stratification, ratio estimates, and regression estimates (Cochran, 1977).

Double sampling for stratification requires a first sample to estimate the strata weights (the proportion of samples to be taken in each stratum) and a second sample to estimate the strata means (Cochran, 1977). Gilbert (1987) discusses a use of double sampling in which two techniques are used in initial sampling and subsequent sampling is performed using only the cheaper or simpler technique. The initial sampling is used to establish a linear regression between the measurements from the two techniques. This regression is then applied to the subsequent measurements made with the cheaper technique to predict the measurement result that would have been obtained with the better, more expensive technique.
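The regression use of double sampling described above can be sketched as follows. The paired measurements are invented for illustration, and the helper names are hypothetical; the fit is ordinary least squares for a straight line, which is one reasonable reading of "a linear regression between the measurements from the two techniques."

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Initial sampling: both techniques applied to the same units (hypothetical data).
cheap_initial = [1.0, 2.0, 3.0, 4.0, 5.0]
expensive_initial = [2.1, 4.1, 6.1, 8.1, 10.1]

slope, intercept = fit_line(cheap_initial, expensive_initial)

# Subsequent sampling: only the cheaper technique is used.
cheap_later = [6.0, 7.0]
predicted_expensive = [intercept + slope * value for value in cheap_later]

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("predicted expensive-technique values:",
      [round(value, 2) for value in predicted_expensive])
```

In practice the scatter about the fitted line should be examined before relying on the predictions, since a weak relationship between the two techniques would remove most of the benefit.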

Systematic Sampling. A commonly used sampling approach is systematic sampling, which entails taking samples at a preset interval of time or space, using a randomly selected time or location as the first sampling point (Gilbert, 1987). Systematic sampling is used extensively in water quality monitoring programs, usually because it is relatively easy to do from a management perspective.
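A minimal sketch of this procedure is shown below for a hypothetical one-year program sampled every 14 days. The function name, the 365-day frame, and the interval are assumptions chosen for illustration.

```python
import random

def systematic_sample(n_units, interval, seed=None):
    """Return unit indices taken at a fixed interval after a random start."""
    rng = random.Random(seed)
    start = rng.randrange(interval)          # random first sampling point
    return list(range(start, n_units, interval))

# Hypothetical frame: days 0..364 of a monitoring year, sampled every 14 days.
sampling_days = systematic_sample(n_units=365, interval=14, seed=11)
print(sampling_days)
```

Only the starting day is random; every later sampling day is fixed once the start is chosen, which is what makes the schedule easy to manage.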

Cochran (1977) points out that the difference between systematic sampling and stratified random sampling with one unit per stratum is that in systematic sampling the sampled unit occurs in the same relative position within each stratum while in stratified random sampling the relative position is selected randomly. Cochran recommends systematic sampling for the following situations:

  • When the ordering of the population is essentially random or it contains at most a mild stratification;
  • When stratification with numerous strata is employed and an independent systematic sample is drawn from each stratum;
  • When subsampling cluster units; and
  • When sampling populations with variation of a continuous type, provided that an estimate of the sampling error is not regularly required.

Sampling for Regression Analysis. Regression analysis is used to predict variable values based on a mathematical relationship between a dependent variable and one or more independent variables (Gaugush, 1986). Gaugush points out that regression analysis requires that at least one quantitative independent variable be used, whereas parameter estimation and hypothesis testing can be performed for groups or classes (i.e., only the variable tested needs to be quantitative). For example, one could quantify the relationship between sediment levels and flow rates by regressing the log of total suspended solids (TSS) concentrations (dependent) against flow rates (independent), which would require quantitative measurements of both parameters. Alternatively, one could estimate average TSS levels (parameter estimation) for high, medium, and low flow conditions with quantitative measures of TSS concentrations and qualitative measures of flow (e.g., visual observation).

Gaugush (1986) discusses sampling to support regression analyses in terms of relating variables to either a spatial or a temporal gradient, the latter being for trends over time. Some key points made are explained below.

Spatial Gradient Sampling

  • The gradient variable is treated as a covariate to the variable of interest.
  • If the relationship is linear, only two points need to be sampled; the extreme points are preferred.
  • Whenever the relationship is known, relatively few sampling points are needed along the gradient. More samples may then be used as replicates.
  • Whenever the relationship is not known, more sampling points are needed along the gradient. More replicates are also needed to test the proposed model.
  • It is usually acceptable to place sampling points equal distances from each other along the gradient. However, the investigator should be careful not to fall in step with some natural phenomenon, which would bias any data collected.

Time Sampling

  • Time can be used either as a covariate or as a grouping variable (e.g., season). Grouping by time may be desirable when changes in the variable of interest are either small over time or occur only during short periods with long periods of little or no change.
  • Considerations in using time as a covariate are similar to those above for gradients, but (1) time is usually only a surrogate for other variables (e.g., implementation of management measures) that truly affect the variable of interest, and (2) the relationship with time is likely to be complex.
  • If time is to be used as a covariate, relatively frequent sampling will be needed, with some replication within sampling periods. Random sampling within the periods is also recommended.

Comparison of Sampling Designs. Both Gilbert (1987) and Cochran (1977) indicate that systematic sampling is generally superior to stratified random sampling in estimating the mean. Cochran (1977), however, found that stratified random sampling provides a better estimate of the mean for a population with a linear trend, followed in order by systematic sampling and simple random sampling. Freund (1973) notes that estimates of the mean that are based on cluster sampling are generally not as good as those based on simple random samples, but they are better per unit cost. Table 8-2 summarizes the conditions under which each of six probabilistic sampling approaches should be used for estimating means and totals (Gilbert, 1987). Cochran (1977) states that "stratification nearly always results in a smaller variance for the estimated mean or total than is given by a comparable simple random sample." Estimates of variance from systematic samples may differ from those determined from random samples, but Cochran (1977) notes that "on average the two variances are equal." Cochran warns, however, that for any finite population for which the number of sampling units is small the variance from systematic sampling is erratic and may be smaller or larger than the variance from simple random sampling.

d. Preliminary Sampling

Preliminary sampling helps to ensure that the population of interest is being sampled and to evaluate its distribution (Coffey and Smolen, 1990). Preliminary sampling or previous testing helps avoid the problem of collecting large sets of useless data because of ineffective gear, or improper sample preparation or preservation. The target population can be easily missed, especially for biological monitoring.

e. Use of Existing Data

Existing data may be used for problem definition, or for a pre-implementation baseline data set if the collection protocol matches the monitoring objective, design, and quality assurance/quality control (QA/QC) required for the post-implementation data collection (Coffey and Smolen, 1990). Existing data may also be used for assessing parameter variability and estimating the number of samples or the time period for the monitoring survey based on the desired level of significance and error.
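One common way to turn a variability estimate from existing data into a sample count is the normal-approximation formula n = (z s / d)^2, where s is the estimated standard deviation, d is the acceptable error in the estimated mean, and z is the standard normal quantile for the chosen confidence level. The sketch below assumes independent samples and a roughly normal sampling distribution; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def samples_needed(std_dev, margin_of_error, confidence=0.95):
    """Approximate number of independent samples needed to estimate a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided quantile
    return math.ceil((z * std_dev / margin_of_error) ** 2)

# Hypothetical: existing data suggest a standard deviation of 4.0 mg/L, and the
# mean is to be estimated to within 1.0 mg/L with 95 percent confidence.
n_required = samples_needed(std_dev=4.0, margin_of_error=1.0)
print(n_required)
```

Serially correlated data, which are common in water quality records, generally require more samples than this formula suggests.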

3. Estimation and Hypothesis Testing

There are two major types of statistical inference: estimation and hypothesis testing (Remington and Schork, 1970). In estimation it is hoped that sample information can be used to make a reasonable conclusion regarding the value of an unknown parameter. For example, the sample mean and standard deviation are used to estimate a range within which it is likely that the population mean falls. This sort of estimation can be useful in developing baseline information, developing or verifying models, estimating the nonpoint source contributions in a watershed, or determining the nitrogen load from a single runoff event.
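The range mentioned above is usually reported as a confidence interval. The following sketch computes a t-based interval for the mean, assuming independent, approximately normal observations. The critical value is passed in by the caller (taken from a t table for n - 1 degrees of freedom); the data and names are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, t_critical):
    """Return (lower, upper) bounds: mean +/- t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)   # s uses divisor n - 1
    half_width = t_critical * standard_error
    return mean - half_width, mean + half_width

# Hypothetical baseline concentrations (mg/L) from eight sampling dates.
baseline = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# 2.365 is the two-sided 95 percent t value for 7 degrees of freedom.
lower, upper = mean_confidence_interval(baseline, t_critical=2.365)
print(f"95% confidence interval for the mean: {lower:.2f} to {upper:.2f} mg/L")
```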

In hypothesis testing, data are collected for the purpose of accepting or rejecting a statement made about the expected results of a study or effort. Hypothesis testing can be used to help decide whether management measures have reduced pollutant loads or improved water quality. Because of this, hypothesis testing is a recommended element of monitoring programs under section 6217.

The null hypothesis (Ho) is the root of hypothesis testing. Traditionally, null hypotheses are statements of "no change," but Remington and Schork (1970) prefer the term "tested hypothesis" since these hypotheses can take the form of expected changes, effects, or differences. The alternate hypothesis (Ha) is the counter to the null hypothesis, traditionally being a statement of change, effect, or difference. That is, upon rejection of an Ho stating no change one would accept the Ha of change. One could, however, state an Ho of the type "change of at least 10 percent," with an Ha of the type "no change of at least 10 percent." The choice is left to the researcher.

If the monitoring design is sound and statistical testing shows the null hypothesis to be false, then a change can be inferred (Coffey and Smolen, 1990). Otherwise, the monitoring survey should conclude that the objective was not met or that detection of change was overcome by extreme variability. In either case, with a sound objective, well-formulated hypothesis, and careful design, the monitoring survey may be expected to produce valuable information.
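As one illustration of testing a "no change" null hypothesis, the sketch below computes a pooled two-sample Student's t statistic for pre- and post-implementation concentrations and compares it with a critical value supplied by the analyst. It assumes independent, approximately normal samples with equal variances, assumptions that should be checked before the test is used. All data and names are hypothetical.

```python
import math
import statistics

def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for the equal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, dof

# Hypothetical total nitrogen concentrations (mg/L).
pre_implementation = [10.0, 12.0, 14.0]
post_implementation = [4.0, 6.0, 8.0]

t_value, dof = pooled_t_statistic(pre_implementation, post_implementation)

# Ho: no decrease in the mean. 2.132 is the one-sided 5 percent t value for 4
# degrees of freedom, taken from a standard t table.
T_CRITICAL = 2.132
reject_null = t_value > T_CRITICAL
print(f"t = {t_value:.3f} on {dof} degrees of freedom; reject Ho: {reject_null}")
```

With only three samples per period this example is far smaller than a real monitoring data set; it is meant only to show the mechanics of the comparison.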

The following are examples of hypotheses that could be developed for section 6217 monitoring programs.

  • Implementation of nutrient management on cropland in all tributary watersheds will not reduce mean total nitrogen concentrations in Beautiful Sound by at least 20 percent.
  • Urban detention basins in New City will not remove 80 percent of sediment delivered to the basins.
  • Improved marina management will not reduce metals loadings from the repair and maintenance areas of Stellar Marina.
  • Forestry harvest activities have not increased weekly mean total suspended solids concentrations in Clean River.

F. Data Analysis

A detailed preliminary analysis, using scatter plots and statistical tests of assumptions and of data set properties such as distribution, homogeneity of variance, bias, and independence, should precede formal hypothesis testing and statistical analysis (Coffey and Smolen, 1990). From the objective and the properties of the data set, the appropriate statistical test may be chosen to determine a trend, impact, or causality.

Simple scatter plots can often reveal much about the data set. For example, a scatter plot of nitrate concentrations versus depth collected at 106 monitoring wells in South Dakota (Figure 8-2) clearly shows that (Goodman et al., 1992):

  • With few exceptions, nitrate concentrations above 5 parts per million (ppm) were not detected at depths greater than 20 feet below the water table;
  • Nitrate concentrations greater than 0.2 ppm were not observed at depths greater than 30 feet below the water table; and
  • Nitrate concentrations exceeded 50 ppm only twice.

For trend detection some of the appropriate tests include Student's t-test, linear regression, time series, and nonparametric trend tests (Coffey and Smolen, 1990). For an assessment of impact and causality, a careful tracking of treatment is required and the two-sample Student's t-test, linear regression, and intervention time series are appropriate statistical tests (Spooner, 1990). Evidence from experimental plot studies, edge-of-field pollutant runoff monitoring, and modeling studies may be used to support the conclusion of causality (Coffey and Smolen, 1990).

A comparison of regression lines for data collected before best management practices (BMPs) were implemented (pre-BMP) and for data collected after BMPs were implemented (post-BMP) can be used to explore the presence of trends in a paired-watershed study. The example in Figure 8-3 (Meals, 1991b) shows a downward shift of the post-BMP regression line, suggesting a significant decrease in total phosphorus (TP) export from the treated (study) watershed (WS 4). In this study, pre-BMP data were collected for 3 years for calibration (see Types of Experimental Designs) of the two watersheds (control and study), followed by a post-BMP monitoring period of 5 years. Meals (1991b) explains the plot by noting that a 5-pound-per-week (lb/wk) export of TP from the control watershed (WS 3) corresponded to an 8.25-lb/wk export from the study watershed (WS 4) before BMP implementation. After BMP implementation, the same 5-lb/wk export from the control watershed corresponded to a 6-lb/wk export from the study watershed.
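The size of the shift in this example can be worked out directly from the figures quoted above. The short calculation below uses only those numbers; the variable names are illustrative, and the result applies only at a control-watershed export of 5 lb/wk, since the two regression lines need not be parallel.

```python
# Values quoted from the paired-watershed example (Meals, 1991b), in lb/wk.
control_export = 5.0        # control watershed (WS 3), same in both periods
study_export_pre = 8.25     # study watershed (WS 4) before BMP implementation
study_export_post = 6.0     # study watershed (WS 4) after BMP implementation

absolute_reduction = study_export_pre - study_export_post
percent_reduction = 100 * absolute_reduction / study_export_pre

print(f"reduction at a control export of {control_export} lb/wk: "
      f"{absolute_reduction:.2f} lb/wk ({percent_reduction:.1f} percent)")
```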

Lietman (1992) used cluster analysis to establish eight different storm groups based on total storm precipitation, antecedent soil-moisture conditions, precipitation duration, precipitation intensity, and crop cover. The results of analyses performed using the following clusters will be presented:

  • Cluster 1: Summer showers on moist soil with crop cover.
  • Cluster 3: Typical spring and fall all-day storms generally with 0.2 to 0.6 inch of precipitation on soil with little crop coverage.
  • Cluster 6: Thunderstorms occurring predominantly in the summer on soil with cover crop.
  • Cluster 7: Very small storms throughout the year on dry soil; most storms occurring on soil with little crop cover.
  • Cluster 8: Typical spring and fall all-day storms generally with 0.8 to 1.6 inches of precipitation on soil with little crop cover.

These clusters were then used to group data for testing for significant differences between pre-BMP (Period 1, 1983-1984) and post-BMP (Period 3, 1987-1988; after terraces were installed) median runoff volume, mean suspended sediment concentrations, and mean nutrient concentrations at a 22.1-acre field site in Lancaster County, Pennsylvania. Cluster 3 had a very small number of storms producing runoff in Period 3, indicating that terracing increased the threshold at which runoff occurred (Lietman, 1992). Other results, summarized in Figure 8-4 (Lietman, 1992), indicate that terracing caused mean storm suspended sediment concentrations in runoff to decrease for storms in Clusters 6, 7, and 8. Terraces also appeared to increase mean nitrate concentrations (Clusters 1, 6, 7, and 8) and mean total nitrogen concentrations (Clusters 1 and 8).

Failure to observe improvement may mean that the problem is not carefully documented, management action is not directed properly, the strength of the treatment is inadequate, or the monitoring program is not sensitive enough to detect change (Coffey and Smolen, 1990). A mid-course evaluation, if conducted early enough, provides an opportunity for modifications in project goals or monitoring design.

Clear reporting of the results of statistical analyses is essential to effective communication with managers. Graphical techniques and simple narrative interpretations of statistical findings generally help managers obtain the level of detail they need to make decisions regarding subsequent actions. For example, Figure 8-5 illustrates the use of box-and-whisker plots to summarize fecal coliform data at the beach on St. Albans Bay, Vermont (Meals et al., 1991). The graphic clearly shows a general decline in bacteria counts in 1987-1989, as well as the fact that the water quality standard has been met during those same years. A graphic summary of trends is illustrated in Figure 8-6, also taken from the St. Albans Bay project (Meals, 1992). This simple graphic is particularly easy for managers to interpret.
