Our Paper on Benthic Ecosystem Health and Local Marine Economies in Florida and the Gulf

Iain and I wrote a brief paper on benthic ecosystem health and the marine economy in Florida and the Gulf, and we won the $2000 EPA NARS Data prize. We used econometric methods for our analysis. The full paper will be entered in January for the main prize.


From the press release:

WASHINGTON – Today the U.S. Environmental Protection Agency (EPA) announced seven undergraduate and graduate student winners for phase 1 of the National Aquatic Resource Surveys (NARS) Campus Challenge, recognizing exemplary research in the area of water quality and ecosystems. Announced in February, the NARS Campus Challenge encourages students to develop proposals for research projects that find innovative ways to use NARS data about the condition of the nation’s rivers, streams, lakes, and coastal areas.

“The National Aquatic Resource Surveys are helping our states and tribes effectively and accurately monitor the ecological condition of our surface waters, which in turn helps EPA better target program efforts to meet our Clean Water Act goals,” said Ken Kopocis, Deputy Assistant Administrator for EPA’s Water Office. “These students are working to protect America’s surface water resources and bring to this challenge energy, innovative perspectives, and cutting-edge knowledge.”

The National Aquatic Resource Surveys are a series of statistically representative surveys conducted by state, tribal and federal partners about the condition of the nation’s waters using core indicators and standardized lab and field methods. In addition to providing national assessments of key water body types such as coastal areas, rivers and streams, lakes, and wetlands, NARS also helps to improve the states’ capacity for water quality monitoring and assessment.

The Phase 1 winners each received an award of $2000 for their proposals. After completing their proposed work, these students may apply for Phase 2 of the NARS Campus Research Challenge. The Phase 2 winners will be awarded $5000 each.


Conserve.io collaboration: Using GIS and mobile technology to plot whales by species and behavior

Recently I was invited by Jake Levenson and the folks over at Conserve.io to help analyze spatial and quantitative data for a project entitled “Conservation in the Cloud: Leveraging mobile technology connecting tourism & resource managers.” The plots that I contributed use data from their application to show real-time whale sightings in two embayments in Iceland, as well as recorded behaviors of these marine mammals. The Conserve.io technology and application are not my baby, so I won’t go into them other than to say they are an amazing high-tech solution and an opportunity for citizen scientists to monitor local marine mammal populations, and that you can read more about their work here.

Instead, I wanted to showcase a few of the plots I made using Spotter App’s data, as they were also featured in Jake’s talk at the IMCC3 Conference in the summer of 2014 (on which I am a co-author). The slides can be viewed on this website under the Conferences Header on the Homepage. Thanks to Jake and Conserve.io for inviting me onto the project. Now onto the whale plots.

I made this first plot to decide whether we should create plots normalized by unit effort. Since effort (which I quantified as the hours at sea recorded by their application) was so tightly clustered, the normalization didn’t really tell us much in the end.

Screen Shot 2014-09-24 at 10.25.17 PM

This is the same data normalized by unit effort, but plotted in R with a LOWESS line to further demonstrate that this normalization would make more sense if trip lengths varied more. Thanks to Iain Dunning for helping me re-format some messy real-time data.

Screen Shot 2014-09-24 at 10.27.00 PM
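
For anyone who wants to try the same kind of plot, here is a minimal R sketch of normalizing sightings by effort and adding a LOWESS line. The data frame, its column names, and every number in it are made up for illustration; this is not the Spotter data or the code behind the plot above.

```r
# Hypothetical trip log: hours at sea and whale sightings per trip.
# These values are invented for illustration only.
trips <- data.frame(
  hours     = c(2.8, 3.0, 3.1, 2.9, 3.2, 3.0, 3.1, 2.7, 3.3, 3.0),
  sightings = c(4, 6, 5, 3, 7, 5, 6, 2, 8, 4)
)

# Normalize by unit effort: sightings per hour at sea.
trips$sightings_per_hour <- trips$sightings / trips$hours

# LOWESS smoother of the normalized rate against effort.
smooth <- lowess(trips$hours, trips$sightings_per_hour)

plot(trips$hours, trips$sightings_per_hour,
     xlab = "Hours at sea", ylab = "Sightings per hour")
lines(smooth, col = "red")

# A narrow spread in effort is what makes the normalization uninformative.
range(trips$hours)
```

When the range of hours is this narrow, dividing by effort rescales every trip by nearly the same factor, so the normalized plot looks much like the raw one.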

The following image is all of the whale sightings broken down by species in this one particular embayment in Northern Iceland.

Screen Shot 2014-09-24 at 10.30.45 PM

The next image is sightings broken down by species off the coast of Reykjavik:

Screen Shot 2014-09-24 at 10.32.08 PM

The next plot shows species, but with data points adjusted by the percentage of the time that species gets sighted across all trips.

Screen Shot 2014-09-24 at 10.33.19 PM

This plot shows the number of different whales sighted in a single trip:

Screen Shot 2014-09-24 at 10.35.13 PM

The following plots show all sightings off the coast of Iceland, with the sightings that included calves in red:

Screen Shot 2014-09-24 at 10.36.31 PM

Screen Shot 2014-09-24 at 10.37.48 PM

These next few plots focus on sightings of particular behaviors, coded by color, but with data points sized according to frequency of that particular observed behavior:

Screen Shot 2014-09-24 at 10.38.30 PM

Screen Shot 2014-09-24 at 10.39.41 PM

Again, if you liked what you saw here, go and check out Conserve.io!

EPA Coastal Data Contest Winners

This week it was announced that Iain and I won the coastal data contest held by the US EPA’s National Aquatic Resource Survey (NARS, link here). We will use the $2000 prize to write up our formal analysis for the grand prize in January.

Our paper linked benthic ecosystem data over time with socio-economic data from coastal communities in Florida and the Gulf region whose economies depend largely on the marine industry. We modeled a categorical measure of ecosystem health as a function of median income, the percent of the local economy dependent on marine resources, and several other socio-economic indicators.

The final part of the contest involves actually running the numbers in our proposed statistical model, and is due in January. Until then we will continue to survey the work already done that links communities in Florida and the Gulf to their blue economy.

Crocodile Lake National Wildlife Refuge Florida Keys



Islamorada Florida



Statistics for fisheries regulations

I wanted to post the question I wrote for the 2014 midterm in quantitative reasoning at MIT. The answers are italicized. It’s a good example of using quantitative reasoning for fisheries regulations planning.

Question 4.

It’s lobster season in Fiji. Villagers are allowed to take lobsters from a protected area for a two-week season every year in the summer. Lobster weights are distributed normally with a mean of 450 grams and a standard deviation of 120 grams. The national fisheries agency will not allow lobsters to be removed if they weigh less than 100 grams, as these are juveniles. If a lobster weighs 600 grams or more, the fisheries agency will not allow it to be removed since it is a “valuable spawner”.

a. What is the probability that a lobster chosen at random by a fisherman is too small to keep (i.e. it is 100 grams or less)? Show your work.

-Find the z score for the juvenile weight: (100-450)/120 = -2.92 (approximately)

-Look up 2.92 in the z table (the normal curve is symmetric, so the sign can be dropped) and see 0.4982, the area between the mean and z

-Subtract this from 0.5 to get the area under the tail = .0018

-The probability of choosing a lobster this small is .0018
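
The same answer can be checked in R with pnorm() instead of a z table. This is just a sketch of the calculation above; the variable names are my own.

```r
# Lobster weight distribution from the question.
mean_g <- 450
sd_g <- 120
juvenile_limit_g <- 100

# z score for the juvenile cutoff (negative: it is below the mean).
z_juvenile <- (juvenile_limit_g - mean_g) / sd_g

# Lower-tail probability of a lobster weighing 100 g or less.
p_too_small <- pnorm(juvenile_limit_g, mean = mean_g, sd = sd_g)

round(z_juvenile, 2)   # -2.92
round(p_too_small, 4)  # 0.0018
```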

b. The chief has called you in to consult on behalf of his village. He says that the juvenile lobsters are too easy to catch, and that most fishermen have to throw most of their catch back into the water. He wants the size classification for juveniles changed so that they can keep the 100g lobsters. Do you think his perception is accurate?

No. The likelihood of catching juveniles is so small that the claim that they are catching so many is either untrue, or it means that the assumption of normally distributed weights may not be correct.
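
To put a rough number on that, here is a small R sketch. It assumes a hypothetical catch of 20 lobsters and that each lobster's weight is an independent draw from the stated normal distribution; neither assumption comes from the exam question.

```r
# Probability that a single lobster is a juvenile (100 g or less).
p_juvenile <- pnorm(100, mean = 450, sd = 120)

# Hypothetical catch size, chosen only for illustration.
catch_size <- 20

# Expected number of juveniles in that catch.
expected_juveniles <- catch_size * p_juvenile

# Probability that more than half of the catch is juvenile,
# treating each lobster as an independent draw (an assumption).
p_majority_juvenile <- pbinom(catch_size / 2, size = catch_size,
                              prob = p_juvenile, lower.tail = FALSE)

expected_juveniles
p_majority_juvenile
```

Under these assumptions a fisherman would expect far fewer than one juvenile per catch, and a mostly-juvenile catch is effectively impossible, which is why the chief's account points to either a mistaken perception or a bad distributional assumption.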

c. What percentage of lobsters caught can the fishermen keep and bring to market? What can you say to the chief about the fairness of these regulations?

The allowable catch is between 100 and 600 grams. This is the area bounded by their two z scores. From part a, the z score for 100 g is -2.92. Now we need to get the z score for 600 g:

(600-450)/120= 1.25

-The area from the mean down to the juvenile cutoff is .4982, and the area from the mean up to the valuable-spawner cutoff is .3944. Add them to get the total area under the curve between the two limits: .4982 + .3944 = .8926.


-A random lobster caught has an 89.26% chance of being kept, since it is within the regulated allowable size. You can keep nearly 90% of everything you catch, so the regulations do not need to be changed.
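
As a quick check in R, the kept fraction is the difference between two cumulative probabilities. Again this is only a sketch with my own variable names.

```r
mean_g <- 450
sd_g <- 120

# Share of lobsters below the juvenile cutoff and above the spawner cutoff.
p_juvenile <- pnorm(100, mean = mean_g, sd = sd_g)
p_spawner <- pnorm(600, mean = mean_g, sd = sd_g, lower.tail = FALSE)

# Share of lobsters between 100 g and 600 g, which can be kept.
p_keep <- pnorm(600, mean = mean_g, sd = sd_g) - p_juvenile

round(p_keep, 4)  # 0.8926
```

Almost all of the lobsters that must be thrown back are the large spawners (p_spawner is roughly 0.11), not the juveniles.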

Sea Turtle Nesting Habits & Socioeconomic Indicators in R (part II)

Last time I looked at correlations between actual nest building and socioeconomic variables, like population and per capita income, in South Florida’s coastal counties. This time I am looking for relationships between those variables and emergence of the same three sea turtle species (loggerhead, green, and leatherback, noted by their Latin names in the code). Emergence means coming onto the beach and returning to the ocean, laying eggs or not. I thought that counties with lower populations would see more emergence because beaches would probably be emptier at night. I ran the correlations and got unremarkable results for population. The correlations with per capita income by county were mild to moderate: 0.33 for loggerhead emergence, 0.24 for greens, and 0.37 for leatherbacks. I ran the linear regressions and the results were disappointing: no significance stars and high p-values. I have attached the .csv and the R code in case you want to play around. Best for people learning R. As I said in my last blog entry, the data could tell us more if it looked at emergence through time and at socioeconomic trends related more closely to direct, shore-front development.

Excel data file here:  sea.turtle.nests

R code below:

#correlation of non-nesting emergence with per capita income, one line per species
cor(sea.turtle.nests$C..carettaNON.NESTING.EMERGENCE , sea.turtle.nests$per.capita.income)
cor(sea.turtle.nests$C..mydasNON.NESTING.EMERGENCE , sea.turtle.nests$per.capita.income)
cor(sea.turtle.nests$D..coriaceaNON.NESTING.EMERGENCE , sea.turtle.nests$per.capita.income)
#run linear regressions of emergence as a function of per capita income
#summary() prints the coefficient tables pasted below as comments
loggerhead.reg <- lm(C..carettaNON.NESTING.EMERGENCE ~ per.capita.income , data=sea.turtle.nests)
summary(loggerhead.reg)
#Estimate Std. Error t value Pr(>|t|)
#(Intercept) -5122.3617 10669.0678 -0.480 0.641
#per.capita.income 0.4263 0.3732 1.142 0.278
#Residual standard error: 9566 on 11 degrees of freedom
#Multiple R-squared: 0.1061, Adjusted R-squared: 0.02479
#F-statistic: 1.305 on 1 and 11 DF, p-value: 0.2776

green.reg <- lm(C..mydasNON.NESTING.EMERGENCE ~ per.capita.income , data=sea.turtle.nests)
summary(green.reg)
#Estimate Std. Error t value Pr(>|t|)
#(Intercept) -475.84094 1672.65952 -0.284 0.781
#per.capita.income 0.04852 0.05851 0.829 0.425
#Residual standard error: 1500 on 11 degrees of freedom
#Multiple R-squared: 0.05885, Adjusted R-squared: -0.02671
#F-statistic: 0.6878 on 1 and 11 DF, p-value: 0.4245

leather.reg <- lm(D..coriaceaNON.NESTING.EMERGENCE ~ per.capita.income , data=sea.turtle.nests)
summary(leather.reg)
#Estimate Std. Error t value Pr(>|t|)
#(Intercept) -47.376772 57.792547 -0.820 0.430
#per.capita.income 0.002672 0.002021 1.322 0.213
#Residual standard error: 51.82 on 11 degrees of freedom
#Multiple R-squared: 0.1371, Adjusted R-squared: 0.05862
#F-statistic: 1.747 on 1 and 11 DF, p-value: 0.2131
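
A side note for people learning R: with a single predictor, the correlation and the regression carry the same information. The sketch below shows this on simulated stand-in data (13 made-up counties, matching the 11 residual degrees of freedom reported above). The data frame and its values are invented; only the three R-squared values at the end come from the output above.

```r
# Simulated stand-in data, not the real sea.turtle.nests file.
set.seed(1)
counties <- data.frame(per.capita.income = runif(13, 20000, 40000))
counties$emergence <- rnorm(13, mean = 5000, sd = 3000)

r <- cor(counties$emergence, counties$per.capita.income)
fit <- summary(lm(emergence ~ per.capita.income, data = counties))

# In a one-predictor regression, R-squared is the squared correlation.
all.equal(r^2, fit$r.squared)

# cor.test() gives the same p-value as the t-test on the slope.
ct <- cor.test(counties$emergence, counties$per.capita.income)
slope_p <- fit$coefficients["per.capita.income", "Pr(>|t|)"]
all.equal(ct$p.value, slope_p)

# Correlations implied by the Multiple R-squared values reported above
# (loggerhead, green, leatherback); all three slopes are positive.
implied_r <- round(sqrt(c(0.1061, 0.05885, 0.1371)), 2)
implied_r
```

So a weak correlation on 13 counties will always come with a high p-value in the matching regression; the two are not independent pieces of evidence.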

Sea Turtle Nesting Habits in R

I pulled together a small and simple data set to have a look at turtle nesting on South Florida’s beaches, and to see if it had any relationships to some easy-to-find socioeconomic data. I looked at nesting habits for greens, loggerheads, and leatherbacks to see if you could create linear regressions with population by county, median household income, number of households by county, and per capita income by county.

The highest correlation I got was between leatherback nests and per capita income, but it bore no statistically significant relationship upon running the regression. I put my R code here with comments for people like me who are learning to play around in R.

R Code

#map the Latin-name columns to common-name variables
loggerhead.nest <- sea.turtle.nests$C..carettaNEST
loggerhead.emergence <- sea.turtle.nests$C..carettaNON.NESTING.EMERGENCE
green.nest <- sea.turtle.nests$C..mydasNEST
green.emergence <- sea.turtle.nests$C..mydasNON.NESTING.EMERGENCE
leather.nest <- sea.turtle.nests$D..coriaceaNEST
leather.emergence <- sea.turtle.nests$D..coriaceaNON.NESTING.EMERGENCE
#plot loggerhead nests against raw population
plot(loggerhead.nest ~ population , data=sea.turtle.nests)
#population alone looked off, so take the log
plot(loggerhead.nest ~ log(population) , data=sea.turtle.nests)
#same plot for the other two species
plot(green.nest ~ log(population) , data=sea.turtle.nests)
plot(leather.nest ~ log(population) , data=sea.turtle.nests)
#regress leatherback nests on per capita income; summary() prints the table below
leather.lm <- lm(leather.nest ~ sea.turtle.nests$per.capita.income , data=sea.turtle.nests)
summary(leather.lm)
#Estimate Std. Error t value Pr(>|t|)
#(Intercept) -2.083e+02 2.397e+02 -0.869 0.403
#sea.turtle.nests$per.capita.income 1.227e-02 8.385e-03 1.463 0.171
#p-value: 0.1714
#Adjusted R-squared: 0.0868

Figure 1: Leatherback nests and log of population (no linear relationship)


Figure 2: Loggerhead nests and log of population


Figure 3: Green nests and log of population
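
If you want to put numbers on the three nest-versus-log(population) comparisons rather than eyeballing the plots, one option is to fit the same regression for each species in a loop. This is a sketch on simulated stand-in data that reuses the column names from the code above; the values are invented and the p-values it prints say nothing about the real counties.

```r
# Simulated stand-in for sea.turtle.nests (made-up values, 13 counties).
set.seed(42)
n <- 13
turtles <- data.frame(
  population      = round(runif(n, 5e4, 2.5e6)),
  C..carettaNEST  = rpois(n, 3000),
  C..mydasNEST    = rpois(n, 400),
  D..coriaceaNEST = rpois(n, 30)
)

# Common names mapped to the Latin-name nest columns.
nest_cols <- c(loggerhead  = "C..carettaNEST",
               green       = "C..mydasNEST",
               leatherback = "D..coriaceaNEST")

# Fit nests ~ log(population) per species and keep the slope's p-value.
slope_p <- sapply(nest_cols, function(col) {
  fit <- lm(reformulate("log(population)", response = col), data = turtles)
  summary(fit)$coefficients["log(population)", "Pr(>|t|)"]
})

slope_p
```

To run it on the real file, replace the simulated data frame with the one read from the .csv.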


Getting better data that reflects development along the ocean front would probably help here, as would looking at nesting habits through time.