solution

A storm on July 4, 1999, with wind speeds exceeding 90 miles per hour, hit the Boundary Waters Canoe Area Wilderness (BWCAW) in northeastern Minnesota, causing serious damage to the forest. A study of the storm's effects surveyed the area and recorded, for more than 3600 trees, whether each was dead or alive (data set blowdown in package alr3). One objective of the study was to learn how survival depends on species, tree size, and local severity. The data set includes results from 3666 trees: whether a tree was dead or alive (y=1 or y=0), its diameter (D, in cm), local severity (S, the proportion of trees killed), and species (SPP: BF = balsam fir, BS = black spruce, C = cedar, JP = jack pine, PB = paper birch, RP = red pine, RM = red maple, BA = black ash, A = aspen). Fit a logistic regression model and discuss the dependence of survival on the three potential predictors (size, local severity, and species).
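As a rough illustration of the fitting step only, the following Python sketch fits a logistic regression with a single numeric predictor by Newton-Raphson on synthetic data. It is not the blowdown analysis: the data, the coefficient values, and the function names `inv_logit`, `fit_logistic`, and `simulate` are made up for illustration. The full exercise would add local severity and species (as a categorical predictor), typically with statistical software such as R's `glm`.

```python
import math
import random


def inv_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1 * x (one predictor only)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += yi - p
            g1 += (yi - p) * xi
            # Entries of the (negated) Hessian
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system explicitly
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1


def simulate(n, b0, b1, seed=1):
    """Synthetic stand-in data; x plays the role of a centered log diameter."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [1 if rng.random() < inv_logit(b0 + b1 * xi) else 0 for xi in x]
    return x, y
```

Because y=1 codes a dead tree, a positive slope in such a model means the probability of death increases with the predictor, and exp(b1) is the multiplicative change in the odds of death per unit increase in x.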


solution

Routine water quality data are used by state agencies in the U.S. to assess compliance with environmental standards. Frey et al. [2011] collected water quality and biological monitoring data from wadeable streams in watersheds surrounding the Great Lakes to understand the impact of nutrient enrichment on stream biological communities. Because sample sizes vary greatly among streams, assessment uncertainty also varies. Qian et al. [2015b] recommended that similar sites be partially pooled using multilevel models to improve assessment accuracy. Water quality monitoring data from Frey et al. [2011] are in file greatlakes.csv. The data file includes information on sites (e.g., location), sampling dates, and various nutrient concentrations. Of interest is the total phosphorus (TP) concentration (Tpwu). Detailed site descriptions are in file greatlakessites.csv, including level III ecoregion, drainage area, and other calculated nutrient loading information.

When assessing a water’s compliance with a water quality standard, we compare the estimated concentration distribution to the standard. The U.S. EPA-recommended total phosphorus (TP) standard for this area is 0.02413 mg/L. We can use monitoring data from a site to estimate the log-mean and log-variance, approximate the TP concentration distribution by a log-normal distribution, and calculate the probability that the site exceeds the standard.
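Under the log-normal assumption, the exceedance probability is 1 - Φ((log c - μ) / σ), where c is the standard and μ and σ are the mean and standard deviation of the log concentrations. A minimal Python sketch of that calculation for a single site follows; the function name `exceedance_probability` is hypothetical, and the plug-in estimate ignores uncertainty in μ and σ.

```python
import math
from statistics import NormalDist, mean, stdev

TP_STANDARD = 0.02413  # mg/L, the recommended standard quoted in the text


def exceedance_probability(concentrations, standard=TP_STANDARD):
    """P(TP > standard) for one site, assuming log-normal concentrations.

    Needs at least two positive observations so that stdev() is defined.
    """
    logs = [math.log(c) for c in concentrations]
    mu = mean(logs)      # estimated log-mean
    sigma = stdev(logs)  # estimated log-standard deviation
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(standard))
```

A site whose log-mean equals the log of the standard has an exceedance probability of 0.5, whatever its variance; sites with few samples give noisy estimates of both μ and σ, which is the motivation for pooling information across sites.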

• Use linear regression to estimate site means simultaneously (with site as the only predictor variable) and estimate the probability of each site exceeding the standard assuming a common within-site variance.

• Use the multilevel model to estimate site means and estimate the probability of each site exceeding the standard. Compare the multilevel model results to the linear regression result and discuss the difference.
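To see why the two sets of estimates can differ, the Python sketch below applies the precision-weighted shrinkage formula for a varying-intercept model, with both variance components treated as known. This is a deliberate simplification: an actual multilevel fit (for example with the R package lme4) estimates the variances and the overall mean from the data. The function names and the arguments `sigma_y2` and `sigma_a2` are hypothetical.

```python
import math
from statistics import NormalDist, mean

TP_STANDARD = 0.02413  # mg/L, from the text


def partial_pool(site_logs, sigma_y2, sigma_a2):
    """Shrink each site's mean log concentration toward the grand mean.

    site_logs: dict mapping site label -> list of log concentrations
    sigma_y2:  within-site variance (assumed known in this sketch)
    sigma_a2:  between-site variance (assumed known in this sketch)
    """
    all_logs = [v for logs in site_logs.values() for v in logs]
    grand = mean(all_logs)  # crude stand-in for the estimated overall mean
    pooled = {}
    for site, logs in site_logs.items():
        n = len(logs)
        # Weight on the site's own mean grows with its sample size
        w = (n / sigma_y2) / (n / sigma_y2 + 1.0 / sigma_a2)
        pooled[site] = w * mean(logs) + (1.0 - w) * grand
    return pooled


def exceedance(mu, sigma_y2, standard=TP_STANDARD):
    """P(TP > standard) for a site with log-mean mu and log-variance sigma_y2."""
    return 1.0 - NormalDist(mu, math.sqrt(sigma_y2)).cdf(math.log(standard))
```

Sites with few samples are pulled strongly toward the overall mean, while well-sampled sites stay close to their own means, so the largest differences from the linear regression estimates should appear at the sparsely sampled sites.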


solution

Consider the model you developed in question 8 of Chapter 8 and perform a simulation to see whether it adequately describes the distribution of the response variable. A potential problem with this data set is the limited variability in the response variable. This could be caused by the difficulty of accurately recording the number of mates a frog had: either the duration of observation was too short, or some mates were not observed. The consequence is underreporting of the number of mates, so the resulting model is likely to underestimate the number of mates (and produce too many 0s). Arnold and Wade [1984] discussed other problems with such data.
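One way to set up such a check is sketched below in Python. It assumes, since the model is not given here, that the fitted model is a Poisson regression and that its fitted means are available as a list; the function names are hypothetical, and the sketch ignores uncertainty in the estimated coefficients. It simulates replicate data sets from the fitted means and compares the simulated fraction of zeros with the observed fraction.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) value by Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulated_zero_fractions(fitted_means, n_sim=1000, seed=1):
    """Fraction of zeros in each of n_sim data sets drawn from the fitted means."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_sim):
        sim = [rpois(m, rng) for m in fitted_means]
        fractions.append(sum(1 for v in sim if v == 0) / len(sim))
    return fractions


def tail_probability(fractions, observed_fraction):
    """Share of simulated data sets with at least as many zeros as observed."""
    return sum(1 for f in fractions if f >= observed_fraction) / len(fractions)
```

A tail probability near 0 or 1 suggests that the model does not reproduce the observed number of zeros. The same loop can track other summaries, such as the maximum or the variance of the simulated counts.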


solution

Qian et al. [2003a] proposed a “nonparametric deviance reduction” method for detecting ecological thresholds. The method is based on the CART model, but uses only one predictor representing the environmental gradient. The first split point is used as the threshold. In the paper, the authors suggested that a χ² test can be used to test whether the resulting split point is “statistically significant.” Because the split point is chosen as the point that gives the largest reduction in deviance, such a test is very likely to have a highly inflated type I error probability. Design a simulation to estimate the type I error probability of such a test. In the simulation, we can assume that the response variable is a normal random variable, so that the χ² test reduces to a two-sample t-test. Because the method is used to detect a threshold, the null hypothesis should be that a threshold does not exist, that is, the response variable distribution does not change along the gradient.
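One possible design for the simulation is sketched below in Python, using only the standard library. The sample size, the minimum group size `min_size`, the number of simulations, and all function names are assumptions made for illustration. Under the null hypothesis the response is independent of the gradient, so each simulated data set is simply a sequence of i.i.d. standard normal values taken in gradient order.

```python
import math
import random


def t_two_sided_p(t, df, steps=400):
    """Two-sided Student-t p-value, integrating the density with Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = 2.0 * total * h / 3.0  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central)


def best_split_p(y, min_size=5):
    """p-value of the pooled-variance t-test at the split with the largest |t|.

    y is assumed to be ordered along the gradient. With a pooled variance,
    the split with the largest |t| is also the split with the largest
    reduction in the residual sum of squares.
    """
    n = len(y)
    total = sum(y)
    total_sq = sum(v * v for v in y)
    s1 = sq1 = 0.0
    best_t = 0.0
    for k in range(1, n):
        s1 += y[k - 1]
        sq1 += y[k - 1] ** 2
        n1, n2 = k, n - k
        if n1 < min_size or n2 < min_size:
            continue
        m1, m2 = s1 / n1, (total - s1) / n2
        ssw = (sq1 - n1 * m1 * m1) + (total_sq - sq1 - n2 * m2 * m2)
        t = (m1 - m2) / math.sqrt(ssw / (n - 2) * (1 / n1 + 1 / n2))
        if abs(t) > abs(best_t):
            best_t = t
    return t_two_sided_p(best_t, n - 2)


def type1_error(n=50, n_sim=1000, alpha=0.05, seed=1):
    """Fraction of null data sets in which the selected split is 'significant'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no threshold exists
        if best_split_p(y) < alpha:
            rejections += 1
    return rejections / n_sim
```

Calling `type1_error()` returns the estimated rejection rate, which one would expect to be well above the nominal 0.05 because the test is applied to the most extreme of many candidate splits. Varying `n` and `min_size` shows how the inflation depends on the number of candidate split points.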
