Author information:

Date: “9/22/2019”

Author (analysis): Divya Panicker

Study: Cetacean distribution and diversity in Lakshadweep Islands: October 2015 to April 2016

Funding: Rufford Small Grants for Nature Conservation

Institutions: University of Washington, National Centre for Biological Sciences, Centre for Wildlife Studies

Email for correspondence: dpanic@uw.edu

Citation: Panicker, D., Sutaria, D., Kumar, A., Stafford, K. (2019) ‘Cetacean distribution and diversity in Lakshadweep Islands, India using platforms of opportunity: Oct 2015 to Apr 2016’. ResearchWorks. University of Washington.


Corresponding datasets:

Dataset ‘Panicker_et_al_2019_surveyeffortdataset.csv’ contains the effort for all the surveys. Read into the program as ‘effdat’

Dataset ‘Panicker_et_al_2019_cetaceandataset.csv’ is the sightings dataset which include both on survey effort and opportunistic sightings. The sightings data set also include variables such as depth at boat location, slope, aspect and nearest distance to land. Depth was obtained from GeoMapApp (Ryan et al. 2009). Slope, aspect and nearest distance to land was calculated using ArcGIS. The file is read into the program as ‘cetenvdat’.

Dataset ‘Panicker_et_al_2019_beaufortdataset.csv’ is the dataset with corresponding beaufort states. This sheet has the distance covered in each beaufort state per season, per route, per day. Read into the program as ‘effbeaudist’


Names of subsets from data csv files within this notebook:

  1. ‘idcetenvdat’ is the subset for identified sightings including opportunistic sightings.
  2. ‘oncetenvdat’ is the subset for on survey effort sightings.
  3. ‘idcetenvdatstats’ is the subset for identified sightings with the exception of baleen whale and pygmy whale sightings as n=1. This file is used for all statistical analysis.
  4. ‘comidcetenvdatstats’ is the sightings data for identified sightings of three of the most commonly sighted species (spinnner, bottlenose and pilot).
  5. ‘routenames’ is a dataset created for a list of survey routes and corresponding route numbers.

Note: If subsets have inter or ne at the end, it corresponds to inter-monsoon or northeast monsoon seasons


R Setup

Working directory for R studio, Reading Data files and Data preparation

Run line by line to avoid masking

#knitr::opts_chunk$set(echo = FALSE)
rm(list = ls()) # clears workspace
# Working directory on your computer! CHANGE AS YOU NEED
# set the working directory to your home directory where data files are stored
setwd("/Volumes/GoogleDrive/My Drive/Laksh I project/design & data/Data Entry/Laksh R files/Panicker_2019_Cetaceans_Lakshadweep/") 
# Required Packages and libraries
library(lubridate)
library(gdata)
library(plotrix)
library(psych)
library(devtools)
library(reshape2)
library(ggplot2)
# Reading in data files
effdat<-read.csv("Panicker_et_al_2019_surveyeffortdataset.csv", header = T, stringsAsFactors = F) # effort dataset
cetenvdat<-read.csv("Panicker_et_al_2019_cetaceandataset.csv", header = T, stringsAsFactors = F) # cetacean dataset
effbeaudist<-read.csv("Panicker_et_al_2019_beaufortdataset.csv", header = T, stringsAsFactors = F) # beaufort dataset
# Data preparation for analysis
# Creating a dataset of routenames and corresponding route numbers
routenames<-data.frame((c(1,2,3,4,5,6,7,8,9,10,11,12)),(c("AGT-AKC","KVT-AGT","BTR-AKC","KVT-AKC","CHT-AKC","KLT-AKC","KVT-AND","BTR-KVT","KLT-CHT","KVT-KLP","AMN-KDT","KVT-KLT")))
colnames(routenames) <- c("routeno", "route")
# Creating a subset for the identified species
idcetenvdat<-subset(cetenvdat, !cetenvdat$species == "blackfish" & !cetenvdat$species == "uniddolphin" & !cetenvdat$species == "unidcet" & !cetenvdat$species=="unidsmallwhale" & !cetenvdat$species == "unidwhale" & !cetenvdat$species == "unidsmalldolphin" )
# Creating a subset for on survey effort sightings
oncetenvdat<-subset(cetenvdat, !cetenvdat$eff == "off")
# Removing baleen and pgymykiller whale from identified species as sample size is 1 for these species (This step needed for stats analysis)
idcetenvdatstats<-drop.levels(subset(idcetenvdat,!idcetenvdat$species == "baleen" & !idcetenvdat$species == "pygmykiller"))
# Creating a subset for identified commonly sighted species (n>10 namely, spinner dolphin, bottlenose dolphin and pilot whale)
comidcetenvdatstats<-rbind(subset(idcetenvdat, idcetenvdat$species == "spinner"), subset(idcetenvdat, idcetenvdat$species == "bottlenose"), subset(idcetenvdat, idcetenvdat$species == "pilot"))
# Creating a subset for identified species in northeast monsoon season
idcetenvdatstatsne<-subset(idcetenvdatstats, idcetenvdatstats$seas == 'ne')
# Creating a subset for identified species in northeast monsoon season for commonly sighted species (n>10 namely, spinner dolphin, bottlenose dolphin and pilot whale)
comidcetenvdatstatsne<-subset(comidcetenvdatstats, comidcetenvdatstats$seas == 'ne')
# Creating a subset for identified species in inter monsoon season
idcetenvdatstatsinter<-subset(idcetenvdatstats, idcetenvdatstats$seas == 'inter')
# Creating a subset for identified species in inter monsoon season for commonly sighted species (n>10 namely, spinner dolphin, bottlenose dolphin and pilot whale)
comidcetenvdatstatsinter<-subset(comidcetenvdatstats, comidcetenvdatstats$seas == 'inter')
print ("SETUP DONE AND DATA READ INTO R")
[1] "SETUP DONE AND DATA READ INTO R"

Table of Contents

Structure of data

No. of sightings, survey effort and beaufort states

Encounter rates

Species list

Group sizes (Table 1)

Seafloor depths at sighting locations(Table 1)

Cetacean sightings across season (Figure 4)

Cetacean species across distance to nearest landmass

Cetacean species across seafloor slope gradient

Cetacean species across seafloor depth

Plot of cetacean species across nearest distance to land and slope gradient (Figure 5)



Structure of data

Note: Column header details are given in ReadMe file.

Effort dataset (effdat)

# structure of effort
effdat$date<-as.Date(effdat$date, "%m/%d/%y") # setting data format
str(effdat)
'data.frame':   78 obs. of  13 variables:
 $ date            : Date, format: "2015-10-28" "2015-10-31" "2015-10-31" "2015-11-01" ...
 $ seas            : chr  "ne" "ne" "ne" "ne" ...
 $ HSV             : chr  "small" "large" "large" "small" ...
 $ routeno         : int  10 9 5 3 8 4 11 2 6 9 ...
 $ route           : chr  "KVT KLP" "KLT CHT" "CHT AKC" "BTR AKC" ...
 $ tripno          : int  1 1 1 1 1 1 1 1 1 2 ...
 $ dist            : num  69.6 31.3 33.5 69.1 58.6 ...
 $ hr.dec          : num  2.83 1.25 1.03 2.28 2.08 1.92 0.37 1.88 1.15 1.08 ...
 $ hrs             : chr  "2:50" "1:15" "1:02" "2:17" ...
 $ starttimewbreaks: chr  "8:35" "12:55" "14:43" "9:59" ...
 $ endtimewbreaks  : chr  "12:01" "13:53" "15:45" "12:40" ...
 $ sight           : chr  "Y" "Y" "Y" "Y" ...
 $ no.sight        : int  2 2 1 2 1 1 1 4 1 0 ...

Cetacean dataset (cetenvdat)

str(cetenvdat)
'data.frame':   139 obs. of  19 variables:
 $ sno       : int  32 33 76 77 91 92 93 94 95 102 ...
 $ mon       : int  11 11 12 12 3 3 3 3 3 3 ...
 $ date      : int  18 18 11 11 11 11 11 11 11 18 ...
 $ year      : int  2015 2015 2015 2015 2016 2016 2016 2016 2016 2016 ...
 $ seas      : chr  "ne" "ne" "ne" "ne" ...
 $ HSV       : chr  "large" "large" "large" "large" ...
 $ eff       : chr  "on" "on" "on" "on" ...
 $ routeno   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ route     : chr  "AGT AKC" "AGT AKC" "AGT AKC" "AGT AKC" ...
 $ time      : chr  "12:10" "12:20" "12:36" "12:36" ...
 $ species   : chr  "spinner" "unidcet" "uniddolphin" "unidsmalldolphin" ...
 $ class     : chr  "smalldolphin" "cetacean" "dolphin" "smalldolphin" ...
 $ lat       : num  11 11 11 11 10.9 ...
 $ long      : num  72.5 72.4 72.5 72.5 72.3 ...
 $ grpsize   : int  20 2 1 2 6 12 1 4 15 6 ...
 $ depth     : int  -1604 -1525 -1140 -1140 -655 -198 -51 -948 -1561 -966 ...
 $ neardistkm: num  19.13 13.62 14.26 14.26 2.93 ...
 $ slope     : num  3.08 6.07 14.41 14.41 17.24 ...
 $ beaufort  : int  4 4 3 3 1 1 1 1 1 3 ...



No. of sightings, survey effort and beaufort states

Refer to subheading of results section of manuscript

Survey effort

# Total hours spent on survey effort
toteffhours<-sum(effdat$hr.dec)
paste ("Total hours spent on survey effort =", toteffhours)
[1] "Total hours spent on survey effort = 128.75"
# Total distance covered during survey effort
toteffdist<-sum(effdat$dist)
paste("Total distance (km) covered during survey effort = ", toteffdist)
[1] "Total distance (km) covered during survey effort =  3880.33"
# Number of surveys hours each day
paste("Average number of hours on survey effort per day = ", round(mean(effdat$hr.dec),2), "+/-", round(std.error(effdat$hr.dec),1))
[1] "Average number of hours on survey effort per day =  1.65 +/- 0.1"
paste("Min hours on survey effort in a day = ", round(min(effdat$hr.dec),1))
[1] "Min hours on survey effort in a day =  0.2"
paste("Max hours on survey effort in a day = ", round(max(effdat$hr.dec),1))
[1] "Max hours on survey effort in a day =  3.2"

No. of sightings

# note: Off effort and opporunistic is the same!
# Total number of cetacean group sightings (including opportunistic sightings)
paste ("Total no of cetacean group sightings (on and off survey effort) =", nrow(cetenvdat))
[1] "Total no of cetacean group sightings (on and off survey effort) = 139"
# No: of cetacean group sightings (on survey effort)
paste ("No. of cetacean group sightings on survey effort =", length(which(cetenvdat == "on")))
[1] "No. of cetacean group sightings on survey effort = 78"
# No: of cetacean group sightings (off survey effort/opportunistic)
paste ("No. of cetacean group sightings off survey effort =", length(which(cetenvdat == "off")))
[1] "No. of cetacean group sightings off survey effort = 61"
# Total number of identified cetacean groups (on and off survey effort)
paste ("Total no of identified cetacean groups (on and off survey effort) =", nrow(idcetenvdat))
[1] "Total no of identified cetacean groups (on and off survey effort) = 74"
# Number of identified cetacean groups (on survey effort)
paste ("No. of identified cetacean groups on survey effort =", length(which(idcetenvdat == "on")))
[1] "No. of identified cetacean groups on survey effort = 31"

Beaufort states recorded during survey effort

Table of distance (km) for each beaufort state

beau.table<-tapply((effbeaudist$Distance),(effbeaudist$Beaufort),FUN=sum)
beau.table
      0       1       2       3       4       5       6 
  17.00  775.48 1139.90 1206.70  603.42  115.57   34.80 

Percentages of the distances with respect to the total distance traveled in a particular beaufort state (e.g. 80.63% distance traveled in 0-3 state, note that this includes not just 0 but 0-3, 15.5% in beaufort 4, 2.96% in beaufort 5 and 0.89% in beaufort 6)

beau.one.two.three.percent<-round(((beau.table[1]+beau.table[2]+beau.table[3]+beau.table[4])/sum(beau.table))*100,1)
beau.one.two.three.percent
   0 
80.6 
beau.four.percent<-round(((beau.table[5])/sum(beau.table))*100,1)
beau.four.percent
   4 
15.5 
beau.five.percent<-round(((beau.table[6])/sum(beau.table))*100,1)
beau.five.percent
5 
3 
beau.six.percent<-round(((beau.table[7])/sum(beau.table))*100,1)
beau.six.percent
  6 
0.9 
rm(beau.table, beau.one.two.three.percent, beau.four.percent, beau.five.percent, beau.six.percent)



Encounter rates

Overall Encounter rate
totencratedist<-round (((nrow(oncetenvdat)/toteffdist)*100),2)
paste("Overall encounter rate (per 100km) of cetacean groups for the entire study area =", totencratedist)
[1] "Overall encounter rate (per 100km) of cetacean groups for the entire study area = 2.01"
Route-wise encounter rate (per 100 km)
step1<-aggregate(effdat$dist ~ effdat$routeno, FUN=sum) # extracting sum of distances for routes
step2<-aggregate(effdat$no.sight ~ effdat$routeno, FUN=sum) # extracting sum of sightings for routes
step3<-step2$`effdat$no.sight`/step1$`effdat$dist` # encrate/km for each route = sighting/distance for each route, step dataframes can be removed from workspace after running. 
route_encrate<-round(step3*100,2) # enc rate/distance * 100 kms and rounded to two decimal points
rm(step1, step2, step3)
route_encrate <- data.frame(routenames,route_encrate)
route_encrate 


Species list

Refer to the second subheading of the results section

Species list and most commonly sighted

Eight toothed whale and one baleen whale were encountered. Note: Balaenoptera sp identified till genus, Tursiops sp (includes aduncus and truncatus)

species<- c("Baleen whale", "Blackfish", "Bottlenose dolphin", "False killer whale", "Pilot whale", "Pygmy killer whale", "Risso's dolphin", "Spinner dolphin","Spotted dolphin", "Striped dolphin", "Unidentified cetacean", "Unidentified dolphin", "Unidentified small dolphin", "Unidentified small whale", "Unidentified whale")
freq<-table(cetenvdat$species)
freq<-as.vector(freq)
species_counts<-data.frame(species, freq)
species_counts
rm(freq)

Season-wise species lists

Baleen; Baleen whale; Balaenoptera sp, Bottlenose; Tursiops sp, False Killer whale; Pseudorca crassidens, pilot; Globicephala machrorhynchus, pygmykiller; Feresa attenuata, risso; Grampus griseus, spinner; * Stenella longirostris, spotted; Stenella attenuata, striped; Stenella coruleoalba*, unidcet; unidentified cetacean, uniddolphin; unidentified dolphin, unidsmalldolphin; unidentified small dolphin, unidsmallwhale; unidentified small whale, unidwhale; unidentified whale.

Species sighted during the northeast monsoon season

# No: of species in ne monsoon season
table(idcetenvdatstatsne$species)

 bottlenose falsekiller       pilot       risso     spinner     spotted     striped 
         11           3          10           5           8           1           4 

Species sighted during the inter monsoon season

# No: of species in inter monsoon season
table(idcetenvdatstatsinter$species)

 bottlenose falsekiller       pilot       risso     spinner     spotted     striped 
          7           1           3           2          14           2           1 



Group sizes (Table 1)

Average group size, standard errors and minimum and maximum group sizes for each species

meangrpsize<-round(tapply(cetenvdat$grpsize,cetenvdat$species, mean, na.rm = TRUE),2)
meangrpsize<-as.vector(meangrpsize)
SEgrpsize<-round(tapply(cetenvdat$grpsize,cetenvdat$species, std.error, na.rm = TRUE),2)
SEgrpsize<-as.vector(SEgrpsize)
SEgrpsize<-round(SEgrpsize,1)
mingrpsize<-tapply(cetenvdat$grpsize,cetenvdat$species, min, na.rm = TRUE)
mingrpsize<-as.vector(mingrpsize)
maxgrpsize<-tapply(cetenvdat$grpsize,cetenvdat$species, max, na.rm = TRUE)
maxgrpsize<-as.vector(maxgrpsize)
species_grpsize<-data.frame(species, meangrpsize, SEgrpsize, mingrpsize, maxgrpsize)
species_grpsize<-species_grpsize[1:10,]
species_grpsize
rm(meangrpsize, SEgrpsize, mingrpsize, maxgrpsize)


Seafloor depths at sighting locations(Table 1)

Average depth for cetacean sightings in the entire study area (in metres)

paste("Average depth =", round(mean(cetenvdat$depth),2), "+/-" ,round(std.error(cetenvdat$depth),2))
[1] "Average depth = -1179.1 +/- 57.34"
paste("Minimum depth =", round(min(cetenvdat$depth),2))
[1] "Minimum depth = -2259"
paste("Maximum  depth =", round(max(cetenvdat$depth),2))
[1] "Maximum  depth = -5"

Average depths, standard errors and minimum and maximum group sizes for each species

meandepth<-round(tapply(cetenvdat$depth,cetenvdat$species, mean, na.rm = TRUE),2)
meandepth<-as.vector(meandepth)
SEdepth<-round(tapply(cetenvdat$depth,cetenvdat$species, std.error, na.rm = TRUE),2)
SEdepth<-as.vector(SEdepth)
mindepth<-tapply(cetenvdat$depth,cetenvdat$species, min, na.rm = TRUE)
mindepth<-as.vector(mindepth)
maxdepth<-tapply(cetenvdat$depth,cetenvdat$species, max, na.rm = TRUE)
maxdepth<-as.vector(maxdepth)
species_depth<-data.frame(species, meandepth, SEdepth, mindepth, maxdepth)
species_depth<-species_depth[1:10,]
species_depth
rm(maxdepth, meandepth, mindepth, SEdepth)



Cetacean sightings across season (Figure 4)

Cetacean sightings in northeast monsoon and inter monsoon season (only on survey effort)

# on effort sightings
table(oncetenvdat$seas)

inter    ne 
   27    51 

Effort (in km) across season

# Distances (effort) in each season
test<-subset(effdat, effdat$seas == "ne")
test2<-subset(effdat, !effdat$seas == "ne")
paste("Dist (km) in north east monsoon = ", sum(test$dist), "and Dist (km) in inter monsoon = ", sum(test2$dist))
[1] "Dist (km) in north east monsoon =  1720.9 and Dist (km) in inter monsoon =  2159.43"
# Proprotion of effort in each season
paste("Proportion traveled in north east monsoon = ", round((sum(test$dist)/toteffdist)*100,2), "%", "and Proportion traveled in inter monsoon = ", round((sum(test2$dist)/toteffdist)*100,2), "%")
[1] "Proportion traveled in north east monsoon =  44.35 % and Proportion traveled in inter monsoon =  55.65 %"
rm(test2)
rm(test)
Error in names(frame)[names(frame) == "x"] <- name : 
  names() applied to a non-vector
Barplot of observed and expected cetacean sightings across season (on survey effort)

Expected ratio’s correspond to distance traveled in a season

par(mar = c(7,6,4,4)) # sets the outer margin, order = left,bottom,top,right
#par(oma=c(0,0,0,5.8))
obsexpseas<-as.matrix(data.frame(inter=c(27, 43.41), ne=c(51,34.59))) # creating a matrix with observed and expected cetacean sighting values for intermonsoon (27 & 43.41) and ne monsoon (51,34.59)
onseasbarplot<-barplot(obsexpseas, beside=TRUE, 
                       width = 0.5, 
                       xlab = "Season", 
                       ylab="No: of cetacean sightings", 
                       names.arg=c("Inter monsoon (n=27)","Northeast monsoon (n=51)"), 
                       cex.lab = 0.9, 
                       cex.axis = 1, 
                       cex.names = 0.7, 
                       col=c("grey80", "grey20"), 
                       density = c(100,100), 
                       ylim = c(0,80), 
                       xlim=c(0.3,5))
legend("topleft",  c("Cetacean sightings on survey", "Expected cetacean sighting based on effort"), fill = c("grey80", "grey20"), cex=0.7, bty='n')

Non parameteric statistical test - Chi-Square Goodness of Fit

Chi square goodness of fit test between observed and expected values. Expected values are calculated as effort (distance traveled in each season). The test is significant and the null hypothesis is rejected.

Null hypotheis: Sightings are equally distributed across the two seasons as per the distance traveled during survey effort.

Alternate Hypothesis: Sightings are differentially distributed across the two seasons as per the distance traveled during survey effort.

# chi square test for on effort sightings in each season - goodness of fit according to distance traveled
effdatne<-subset(effdat, effdat$seas == 'ne') # effort dataset for only ne monsoon
effdatinter<-subset(effdat, effdat$seas == 'inter') # effort dataset for only inter monsoon
toteffdistne<-sum(effdatne$dist) # total distance traveled for northeast monsoon
toteffdistinter<-sum(effdatinter$dist) # total distance traveled for inter monsoon
# table(oncetenvdat$seas) # sightings per season on effort
# chisquare goodness of fit test for no: of sightings in each season to expected number of sightings based on distance travelled.
table(oncetenvdat$seas)

inter    ne 
   27    51 
seasallchisq<-chisq.test( (table(oncetenvdat$seas)),
             p=c((toteffdistinter/toteffdist),(toteffdistne/toteffdist)) ) 
seasallchisq

    Chi-squared test for given probabilities

data:  (table(oncetenvdat$seas))
X-squared = 13.984, df = 1, p-value = 0.0001844
paste('Expected value for inter monsoon period is', round(seasallchisq$expected[1],2))
[1] "Expected value for inter monsoon period is 43.41"
paste('Expected values for north east monsoon is', round(seasallchisq$expected[2],2))
[1] "Expected values for north east monsoon is 34.59"
# formula chi.sq.test ( no: of seas sightings 27 (inter), 51(ne), probability = 0.5565068 (inter), 0.4434932 (ne) )



Cetacean species across distance to nearest landmass

Boxplot of distance to nearest landmass for all species

Species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin

par(mar = c(5,5,3,3)) # sets the outer margin, order = left,bottom,top,right
neardistboxplot<-boxplot(idcetenvdatstats$neardistkm~idcetenvdatstats$species, xlab = "Species (n)", ylab="Nearest distance (km)", main = "Nearest distance to land for each species", names=c("T (18)","PC(4)","GM(13)","GG(7)","SL(22)","SA(3)","SC(5)"), cex.lab = 1.2, cex.main = 1, col="mediumaquamarine")

# species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin
Non parameteric statistical test - Kruskal-Wallis test

Null hypothesis: All species are uniformly distributed with respect to the nearest landmass on the ferry routes.

Alternate hypothesis: Species are differentially distributed with respect to the nearest landmass on the ferry routes.

Kruskal Wallis test was performed and test was significant. Therefore the Null hypothsis is rejected.

# for all sps (except baleen and pygmy)
kruskal.test(as.factor(idcetenvdatstats$neardistkm)~as.factor(idcetenvdatstats$species), data=idcetenvdatstats)

    Kruskal-Wallis rank sum test

data:  as.factor(idcetenvdatstats$neardistkm) by as.factor(idcetenvdatstats$species)
Kruskal-Wallis chi-squared = 16.962, df = 6, p-value = 0.009425

Medians for nearest and furthest species

paste ("Spinner dolphins median (in km) =", round(neardistboxplot$stats[3,5],2))
[1] "Spinner dolphins median (in km) = 1.82"
paste ("Spotted dolphins median (in km) =", round(neardistboxplot$stats[3,6],2))
[1] "Spotted dolphins median (in km) = 43.28"
paste ("Striped dolphins median (in km) =", round(neardistboxplot$stats[3,7],2))
[1] "Striped dolphins median (in km) = 24.72"


Cetacean species across seafloor slope gradient

Boxplot of slope distribution for all species

Species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin

par(mar = c(5,5,3,3)) # sets the outer margin, order = left,bottom,top,right
slopeboxplot<-boxplot(idcetenvdatstats$slope~idcetenvdatstats$species, xlab = "Species (n)", ylab="Slope", main = "Slope distribution for each species", names=c("T (18)","PC(4)","GM(13)", "GG(7)","SL(22)","SA(3)","SC(5)"), cex.lab = 1.2, cex.main = 1, col="mediumaquamarine")

# species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin
Non parameteric statistical test - Kruskal-Wallis test

Null hypothesis: All species are uniformly distributed across slope gradients along the ferry routes.

Alternate hypothesis: Species are differentially distributed across slope gradients along the ferry routes.

Kruskal Wallis test was performed and test was significant. Therefore the Null hypothsis is rejected.

# for common sps
kruskal.test(as.factor(idcetenvdatstats$slope)~as.factor(idcetenvdatstats$species), data=idcetenvdatstats)

    Kruskal-Wallis rank sum test

data:  as.factor(idcetenvdatstats$slope) by as.factor(idcetenvdatstats$species)
Kruskal-Wallis chi-squared = 15.542, df = 6, p-value = 0.01643

Medians for species in the flatest and steepest slopes

paste ("Spinner dolphins median steepest slope =", round(slopeboxplot$stats[3,5],2))
[1] "Spinner dolphins median steepest slope = 13.54"
paste ("Spotted dolphins median flat slope =", round(slopeboxplot$stats[3,6],2))
[1] "Spotted dolphins median flat slope = 1.51"
paste ("Striped dolphins median flat slope =", round(slopeboxplot$stats[3,7],2))
[1] "Striped dolphins median flat slope = 1.65"


Cetacean species across seafloor depth

Boxplot between cetacean species and seafloor depth

Species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin

# All species and depth
par(mar = c(5,5,3,3)) # sets the outer margin, order = left,bottom,top,right
depthboxplot<-boxplot(idcetenvdatstats$depth~idcetenvdatstats$species, xlab = "Species (n)", ylab="Seafloor depth(m)", main = "Seafloor depth distribution for each species", names=c("T (18)","PC(4)","GM(13)","GG(7)","SL(22)","SA(3)","SC(5)"), cex.lab = 1.2, cex.main = 1, col="mediumaquamarine") # xlab = x label, ylab = y label, cex.lab = label font size, cex.main = title font size, main = title

# species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin
Non parameteric statistical test species and depth - Kruskal-Wallis test

Null hypothesis: All species are uniformly distributed away across seafloor depths.

Alternate hypothesis: Species are differentially distributed across seafloor depths.

Kruskal Wallis test was performed and test was not significant. Therefore the Null hypothsis is not rejected.

# for all sps (except baleen and pygmy)
kruskal.test(as.factor(idcetenvdatstats$depth)~as.factor(idcetenvdatstats$species), data=idcetenvdatstats)

    Kruskal-Wallis rank sum test

data:  as.factor(idcetenvdatstats$depth) by as.factor(idcetenvdatstats$species)
Kruskal-Wallis chi-squared = 10.912, df = 6, p-value = 0.09114


Plot of cetacean species across nearest distance to land and slope gradient (Figure 5)

### Plotting significant results together 
##### Slope and nearest distance to land across species
# collapse the data frame
x<-melt(data = idcetenvdatstats, id.vars = "species", measure.vars = c("slope", "neardistkm"))
require(ggplot2)
slopedistplot<-ggplot(data = x, aes(x=species, y=value)) + 
  # specify boxplot and variable to be filled in this case 'slope' & 'neardist'
  geom_boxplot(aes(fill=variable)) +
  # name x and y axis
  xlab("Species (n)") + ylab("Slope (percent)") +
  # add a second y axis on the right and specify values w.r.t to x axis (*1x)
  scale_y_continuous(sec.axis = sec_axis(~.*1, name = "Distance to nearest land (km)")) +
  
  theme(text = element_text(size=12), axis.text.x = element_text(angle=90)) +
  
  theme(plot.margin=unit(c(1,0.8,0.5,0.5),"cm")) +
  
  # change the legend labels
  scale_fill_discrete(name="", labels=c("Slope", "Distance to nearest land")) +
  # change the legend position (0,0) = bottom left & (1,1) = top right
  theme(legend.position = c(0.16, 0.9)) +
  # size of legend text
  theme(legend.text=element_text(size=rel(0.5))) +
  # remove the legend background
  theme(legend.background = element_blank()) +
  # adding sample sizes to the plot
  scale_x_discrete(labels=c("bottlenose" = "bottlenose dolphin (18)", 
                              "falsekiller" = "false killer whale(4)", 
                                "pilot" = "pilot whale(13)",
                            "risso" = "Risso's dolphin (7)",
                            "spinner" ="spinner dolphin(22)",
                            "spotted" ="spotted dolphin(3)",
                            "striped" = "striped dolphin(5)")) 
#stat_summary(fun.y=mean,shape=1,col='red',geom='point')
slopedistplot

rm(x)



Route Information

step1<-table(effdat$route) # No. of times a route was traversed
step1<-step1[c(1,8,3,9,5,6,10,4,7,11,2,12)] # reorder to match corresponding columns in route_encrate dataframe
step1<-as.vector(step1) # making into a vector
step2<-aggregate(effdat$dist ~ effdat$routeno, FUN=sum) # extracting sum of distances for routes
step3<-aggregate(effdat$no.sight ~ effdat$routeno, FUN=sum) # extracting sum of sightings for routes
route_info<-data.frame(route_encrate, step1, step2, step3) # Making a data frame with route no, route name, no: of times a route was tranversed, route distance, route sightings and route enc rate
route_info<-route_info[c(1,2,4,6,8,3)] # reorder
colnames(route_info)<-c("Route No.", "Route name", "No. of times", "Distance (km)", "No. of sightings", "Encounter rate")
route_info
rm(step1, step2, step3)
#write.csv(route_info, file = "/Volumes/GoogleDrive/My Drive/Laksh I project/design & data/Data Entry/Laksh R files/Panicker_2019_LD_analysis_manuscript_link/Sightings and effort for each route.csv")



THE END

---
title: "Cetacean distribution and diversity in Lakshadweep Islands, India using a platform of opportunity: October 2015 to April 2016 - Data Analysis"
output:
  html_notebook: 
    code_folding: hide
editor_options: 
  chunk_output_type: console
---

--------------------------------------------------------------------------

#### Author information:

Date: "9/22/2019"

Author (analysis): Divya Panicker

Study: Cetacean distribution and diversity in Lakshadweep Islands: October 2015 to April 2016

Funding: Rufford Small Grants for Nature Conservation

Institutions: University of Washington, National Centre for Biological Sciences, Centre for Wildlife Studies

Email for correspondence: dpanic@uw.edu

Citation: Panicker, D., Sutaria, D., Kumar, A., Stafford, K. (2019) 'Cetacean distribution and diversity in Lakshadweep Islands, India using platforms of opportunity: Oct 2015 to Apr 2016'. ResearchWorks. University of Washington.

--------------------------------------------------------------------------

#### Corresponding datasets:

Dataset *'Panicker_et_al_2019_surveyeffortdataset.csv'* contains the effort for all the surveys. Read into the program as *'effdat'*

Dataset *'Panicker_et_al_2019_cetaceandataset.csv'* is the sightings dataset which include both on survey effort and opportunistic sightings. The sightings data set also include variables such as depth at boat location, slope, aspect and nearest distance to land. Depth was obtained from GeoMapApp (Ryan et al. 2009). Slope, aspect and nearest distance to land was calculated using ArcGIS. The file is read into the program as *'cetenvdat'*.

Dataset *'Panicker_et_al_2019_beaufortdataset.csv'* is the dataset with corresponding beaufort states. This sheet has the distance covered in each beaufort state per season, per route, per day. Read into the program as *'effbeaudist'*

--------------------------------------------------------------------------

#### Names of subsets from data csv files within this notebook:

1. *'idcetenvdat'* is the subset for identified sightings including opportunistic sightings.
2. *'oncetenvdat'* is the subset for on survey effort sightings.
3. *'idcetenvdatstats'* is the subset for identified sightings with the exception of baleen whale and pygmy whale sightings as n=1. This file is used for all statistical analysis. 
4. *'comidcetenvdatstats'* is the sightings data for identified sightings of three of the most commonly sighted species (spinnner, bottlenose and pilot). 
5. *'routenames'* is a dataset created for a list of survey routes and corresponding route numbers.

_**Note**_: If subsets have inter or ne at the end, it corresponds to inter-monsoon or northeast monsoon seasons

--------------------------------------------------------------------------

#### R Setup
##### Working directory for R studio, Reading Data files and Data preparation
Run line by line to avoid masking 
```{r setup}
#knitr::opts_chunk$set(echo = FALSE)
rm(list = ls()) # clears workspace

# Working directory on your computer! CHANGE AS YOU NEED
# set the working directory to your home directory where data files are stored
setwd("/Volumes/GoogleDrive/My Drive/Laksh I project/design & data/Data Entry/Laksh R files/Panicker_2019_Cetaceans_Lakshadweep/") 

# Required Packages and libraries
library(lubridate)
library(gdata)
library(plotrix)
library(psych)
library(devtools)
library(reshape2)
library(ggplot2)

# Reading in data files
effdat<-read.csv("Panicker_et_al_2019_surveyeffortdataset.csv", header = T, stringsAsFactors = F) # effort dataset
cetenvdat<-read.csv("Panicker_et_al_2019_cetaceandataset.csv", header = T, stringsAsFactors = F) # cetacean dataset
effbeaudist<-read.csv("Panicker_et_al_2019_beaufortdataset.csv", header = T, stringsAsFactors = F) # beaufort dataset

# Data preparation for analysis

# Creating a dataset of routenames and corresponding route numbers
routenames<-data.frame((c(1,2,3,4,5,6,7,8,9,10,11,12)),(c("AGT-AKC","KVT-AGT","BTR-AKC","KVT-AKC","CHT-AKC","KLT-AKC","KVT-AND","BTR-KVT","KLT-CHT","KVT-KLP","AMN-KDT","KVT-KLT")))
colnames(routenames) <- c("routeno", "route")

# Creating a subset for the identified species
idcetenvdat<-subset(cetenvdat, !cetenvdat$species == "blackfish" & !cetenvdat$species == "uniddolphin" & !cetenvdat$species == "unidcet" & !cetenvdat$species=="unidsmallwhale" & !cetenvdat$species == "unidwhale" & !cetenvdat$species == "unidsmalldolphin" )

# Creating a subset for on survey effort sightings
oncetenvdat<-subset(cetenvdat, !cetenvdat$eff == "off")

# Removing baleen and pgymykiller whale from identified species as sample size is 1 for these species (This step needed for stats analysis)
idcetenvdatstats<-drop.levels(subset(idcetenvdat,!idcetenvdat$species == "baleen" & !idcetenvdat$species == "pygmykiller"))

# Creating a subset for identified commonly sighted species (n>10 namely, spinner dolphin, bottlenose dolphin and pilot whale)
comidcetenvdatstats<-rbind(subset(idcetenvdat, idcetenvdat$species == "spinner"), subset(idcetenvdat, idcetenvdat$species == "bottlenose"), subset(idcetenvdat, idcetenvdat$species == "pilot"))

# Creating a subset for identified species in northeast monsoon season
idcetenvdatstatsne<-subset(idcetenvdatstats, idcetenvdatstats$seas == 'ne')

# Creating a subset for identified species in northeast monsoon season for commonly sighted species (n>10 namely, spinner dolphin, bottlenose dolphin and pilot whale)
comidcetenvdatstatsne<-subset(comidcetenvdatstats, comidcetenvdatstats$seas == 'ne')

# Creating a subset for identified species in inter monsoon season
idcetenvdatstatsinter<-subset(idcetenvdatstats, idcetenvdatstats$seas == 'inter')

# Creating a subset for identified species in inter monsoon season for commonly sighted species (n>10 namely, spinner dolphin, bottlenose dolphin and pilot whale)
comidcetenvdatstatsinter<-subset(comidcetenvdatstats, comidcetenvdatstats$seas == 'inter')

print ("SETUP DONE AND DATA READ INTO R")
```

--------------------------------------------------------------------------

#### Table of Contents

[Structure of data] 

- *[Effort dataset (effdat)]*
- *[Cetacean dataset (cetenvdat)]*

[No. of sightings, survey effort and beaufort states] 

- *[Survey effort]*
- *[No. of sightings]*
- *[Beaufort states recorded during survey effort]*

[Encounter rates] 

- *[Overall Encounter rate]*
- *[Route-wise encounter rate (per 100 km)]*

[Species list] 

- *[Species list and most commonly sighted]*
- *[Season-wise species lists]*

[Group sizes (Table 1)] 

- *[Average group size, standard errors and minimum and maximum group sizes for each species]*

[Seafloor depths at sighting locations(Table 1)] 

- *[Average depths, standard errors and minimum and maximum group sizes for each species]*

[Cetacean sightings across season (Figure 4)]

- *[Barplot of observed and expected cetacean sightings across season (on survey effort)]*
- *[Non parameteric statistical test - Chi-Square Goodness of Fit]*

[Cetacean species across distance to nearest landmass]

- *[Boxplot of distance to nearest landmass for all species]*
- *[Non parameteric statistical test - Kruskal-Wallis test]*

[Cetacean species across seafloor slope gradient] 

- *[Boxplot of slope distribution for all species]*
- *[Non parameteric statistical test - Kruskal-Wallis test]*

[Cetacean species across seafloor depth] 

- *[Boxplot between cetacean species and seafloor depth]*
- *[Non parameteric statistical test species and depth - Kruskal-Wallis test]*

[Plot of cetacean species across nearest distance to land and slope gradient (Figure 5)] 

<br>

--------------------------------------------------------------------------

### Structure of data

_**Note**_: Column header details are given in ReadMe file. 

#### Effort dataset (effdat)
```{r structure effort dataset}
# structure of effort
effdat$date<-as.Date(effdat$date, "%m/%d/%y") # setting data format
str(effdat)
```

#### Cetacean dataset (cetenvdat)
```{r structure sightings dataset}
str(cetenvdat)
```

<br>

--------------------------------------------------------------------------

### No. of sightings, survey effort and beaufort states

Refer to subheading of results section of manuscript

#### Survey effort
```{r Total survey effort details}
# Total hours spent on survey effort
toteffhours<-sum(effdat$hr.dec)
paste ("Total hours spent on survey effort =", toteffhours)

# Total distance covered during survey effort
toteffdist<-sum(effdat$dist)
paste("Total distance (km) covered during survey effort = ", toteffdist)

# Number of surveys hours each day
paste("Average number of hours on survey effort per day = ", round(mean(effdat$hr.dec),2), "+/-", round(std.error(effdat$hr.dec),1))
paste("Min hours on survey effort in a day = ", round(min(effdat$hr.dec),1))
paste("Max hours on survey effort in a day = ", round(max(effdat$hr.dec),1))
```

#### No. of sightings

```{r No. of sightings}
# note: Off effort and opporunistic is the same!
# Total number of cetacean group sightings (including opportunistic sightings)
paste ("Total no of cetacean group sightings (on and off survey effort) =", nrow(cetenvdat))

# No: of cetacean group sightings (on survey effort)
paste ("No. of cetacean group sightings on survey effort =", length(which(cetenvdat == "on")))

# No: of cetacean group sightings (off survey effort/opportunistic)
paste ("No. of cetacean group sightings off survey effort =", length(which(cetenvdat == "off")))

# Total number of identified cetacean groups (on and off survey effort)
paste ("Total no of identified cetacean groups (on and off survey effort) =", nrow(idcetenvdat))

# Number of identified cetacean groups (on survey effort)
paste ("No. of identified cetacean groups on survey effort =", length(which(idcetenvdat == "on")))
```

#### Beaufort states recorded during survey effort

Table of distance (km) for each beaufort state
```{r Beaufort in Km}
beau.table<-tapply((effbeaudist$Distance),(effbeaudist$Beaufort),FUN=sum)
beau.table
```

Percentages of the distances with respect to the total distance traveled in a particular beaufort state (e.g. 80.63% distance traveled in 0-3 state, note that this includes not just 0 but 0-3, 15.5% in beaufort 4, 2.96% in beaufort 5 and 0.89% in beaufort 6)
```{r Beaufort Percentages}
beau.one.two.three.percent<-round(((beau.table[1]+beau.table[2]+beau.table[3]+beau.table[4])/sum(beau.table))*100,1)
beau.one.two.three.percent
beau.four.percent<-round(((beau.table[5])/sum(beau.table))*100,1)
beau.four.percent
beau.five.percent<-round(((beau.table[6])/sum(beau.table))*100,1)
beau.five.percent
beau.six.percent<-round(((beau.table[7])/sum(beau.table))*100,1)
beau.six.percent
rm(beau.table, beau.one.two.three.percent, beau.four.percent, beau.five.percent, beau.six.percent)
```

<br>

--------------------------------------------------------------------------

### Encounter rates

##### Overall Encounter rate
```{r enc rate/dist tot}
totencratedist<-round (((nrow(oncetenvdat)/toteffdist)*100),2)
paste("Overall encounter rate (per 100km) of cetacean groups for the entire study area =", totencratedist)
```

##### Route-wise encounter rate (per 100 km)

```{r enc rate/100km routes}
step1<-aggregate(effdat$dist ~ effdat$routeno, FUN=sum) # extracting sum of distances for routes
step2<-aggregate(effdat$no.sight ~ effdat$routeno, FUN=sum) # extracting sum of sightings for routes
step3<-step2$`effdat$no.sight`/step1$`effdat$dist` # encrate/km for each route = sighting/distance for each route, step dataframes can be removed from workspace after running. 
route_encrate<-round(step3*100,2) # enc rate/distance * 100 kms and rounded to two decimal points
rm(step1, step2, step3)
route_encrate <- data.frame(routenames,route_encrate)
route_encrate 
```

<br>
--------------------------------------------------------------------------

### Species list

Refer to the second subheading of the results section

#### Species list and most commonly sighted

Eight toothed whale and one baleen whale were encountered.
*Note*: *Balaenoptera* sp identified till genus, *Tursiops* sp (includes aduncus and truncatus)
```{r diversity}

species<- c("Baleen whale", "Blackfish", "Bottlenose dolphin", "False killer whale", "Pilot whale", "Pygmy killer whale", "Risso's dolphin", "Spinner dolphin","Spotted dolphin", "Striped dolphin", "Unidentified cetacean", "Unidentified dolphin", "Unidentified small dolphin", "Unidentified small whale", "Unidentified whale")

freq<-table(cetenvdat$species)
freq<-as.vector(freq)
species_counts<-data.frame(species, freq)
species_counts
rm(freq)
```

#### Season-wise species lists

Baleen; Baleen whale; *Balaenoptera sp*, Bottlenose; *Tursiops sp*, False Killer whale; *Pseudorca crassidens*, pilot; *Globicephala machrorhynchus*, pygmykiller; *Feresa attenuata*, risso; *Grampus griseus*, spinner; * Stenella longirostris*, spotted; *Stenella attenuata*, striped; *Stenella coruleoalba*, unidcet; unidentified cetacean, uniddolphin; unidentified dolphin, unidsmalldolphin; unidentified small dolphin, unidsmallwhale; unidentified small whale, unidwhale; unidentified whale. 

Species sighted during the northeast monsoon season
```{r sps diversity ne}
# No: of species in ne monsoon season
table(idcetenvdatstatsne$species)
```

Species sighted during the inter monsoon season
```{r sps diversity inter}
# No: of species in inter monsoon season
table(idcetenvdatstatsinter$species)
```

<br>

--------------------------------------------------------------------------

### Group sizes (Table 1)

#### Average group size, standard errors and minimum and maximum group sizes for each species

```{r mean group size - species wise}
meangrpsize<-round(tapply(cetenvdat$grpsize,cetenvdat$species, mean, na.rm = TRUE),2)
meangrpsize<-as.vector(meangrpsize)
SEgrpsize<-round(tapply(cetenvdat$grpsize,cetenvdat$species, std.error, na.rm = TRUE),2)
SEgrpsize<-as.vector(SEgrpsize)
SEgrpsize<-round(SEgrpsize,1)
mingrpsize<-tapply(cetenvdat$grpsize,cetenvdat$species, min, na.rm = TRUE)
mingrpsize<-as.vector(mingrpsize)
maxgrpsize<-tapply(cetenvdat$grpsize,cetenvdat$species, max, na.rm = TRUE)
maxgrpsize<-as.vector(maxgrpsize)
species_grpsize<-data.frame(species, meangrpsize, SEgrpsize, mingrpsize, maxgrpsize)
species_grpsize<-species_grpsize[1:10,]
species_grpsize
rm(meangrpsize, SEgrpsize, mingrpsize, maxgrpsize)
```


<br>
--------------------------------------------------------------------------

### Seafloor depths at sighting locations(Table 1)

#### Average depth for cetacean sightings in the entire study area (in metres)

```{r depth total}
paste("Average depth =", round(mean(cetenvdat$depth),2), "+/-" ,round(std.error(cetenvdat$depth),2))
paste("Minimum depth =", round(min(cetenvdat$depth),2))
paste("Maximum  depth =", round(max(cetenvdat$depth),2))
```

#### Average depths, standard errors and minimum and maximum group sizes for each species
```{r depth species mean}
meandepth<-round(tapply(cetenvdat$depth,cetenvdat$species, mean, na.rm = TRUE),2)
meandepth<-as.vector(meandepth)
SEdepth<-round(tapply(cetenvdat$depth,cetenvdat$species, std.error, na.rm = TRUE),2)
SEdepth<-as.vector(SEdepth)
mindepth<-tapply(cetenvdat$depth,cetenvdat$species, min, na.rm = TRUE)
mindepth<-as.vector(mindepth)
maxdepth<-tapply(cetenvdat$depth,cetenvdat$species, max, na.rm = TRUE)
maxdepth<-as.vector(maxdepth)
species_depth<-data.frame(species, meandepth, SEdepth, mindepth, maxdepth)
species_depth<-species_depth[1:10,]
species_depth
rm(maxdepth, meandepth, mindepth, SEdepth)
```

<br>

--------------------------------------------------------------------------

### Cetacean sightings across season (Figure 4)

Cetacean sightings in northeast monsoon and inter monsoon season (only on survey effort)

```{r season all sight on effort}
# on effort sightings
table(oncetenvdat$seas)
```

Effort (in km) across season

```{r Effort across season}

# Distances (effort) in each season
test<-subset(effdat, effdat$seas == "ne")
test2<-subset(effdat, !effdat$seas == "ne")
paste("Dist (km) in north east monsoon = ", sum(test$dist), "and Dist (km) in inter monsoon = ", sum(test2$dist))

# Proprotion of effort in each season
paste("Proportion traveled in north east monsoon = ", round((sum(test$dist)/toteffdist)*100,2), "%", "and Proportion traveled in inter monsoon = ", round((sum(test2$dist)/toteffdist)*100,2), "%")

rm(test2)
rm(test)
```


##### Barplot of observed and expected cetacean sightings across season (on survey effort)

Expected ratio's correspond to distance traveled in a season

```{r plot all sight on effort}
par(mar = c(7,6,4,4)) # sets the outer margin, order = left,bottom,top,right
#par(oma=c(0,0,0,5.8))
obsexpseas<-as.matrix(data.frame(inter=c(27, 43.41), ne=c(51,34.59))) # creating a matrix with observed and expected cetacean sighting values for intermonsoon (27 & 43.41) and ne monsoon (51,34.59)
onseasbarplot<-barplot(obsexpseas, beside=TRUE, 
                       width = 0.5, 
                       xlab = "Season", 
                       ylab="No: of cetacean sightings", 
                       names.arg=c("Inter monsoon (n=27)","Northeast monsoon (n=51)"), 
                       cex.lab = 0.9, 
                       cex.axis = 1, 
                       cex.names = 0.7, 
                       col=c("grey80", "grey20"), 
                       density = c(100,100), 
                       ylim = c(0,80), 
                       xlim=c(0.3,5))
legend("topleft",  c("Cetacean sightings on survey", "Expected cetacean sighting based on effort"), fill = c("grey80", "grey20"), cex=0.7, bty='n')
```

##### Non parameteric statistical test - Chi-Square Goodness of Fit

Chi square goodness of fit test between observed and expected values. Expected values are calculated as effort (distance traveled in each season). The test is significant and the null hypothesis is rejected. 

Null hypotheis: Sightings are equally distributed across the two seasons as per the distance traveled during survey effort.

Alternate Hypothesis: Sightings are differentially distributed across the two seasons as per the distance traveled during survey effort.

```{r chi square goodness on effort all sightings}
# chi square test for on effort sightings in each season - goodness of fit according to distance traveled
effdatne<-subset(effdat, effdat$seas == 'ne') # effort dataset for only ne monsoon
effdatinter<-subset(effdat, effdat$seas == 'inter') # effort dataset for only inter monsoon
toteffdistne<-sum(effdatne$dist) # total distance traveled for northeast monsoon
toteffdistinter<-sum(effdatinter$dist) # total distance traveled for inter monsoon
# table(oncetenvdat$seas) # sightings per season on effort
# chisquare goodness of fit test for no: of sightings in each season to expected number of sightings based on distance travelled.
table(oncetenvdat$seas)
seasallchisq<-chisq.test( (table(oncetenvdat$seas)),
             p=c((toteffdistinter/toteffdist),(toteffdistne/toteffdist)) ) 
seasallchisq
paste('Expected value for inter monsoon period is', round(seasallchisq$expected[1],2))
paste('Expected values for north east monsoon is', round(seasallchisq$expected[2],2))
# formula chi.sq.test ( no: of seas sightings 27 (inter), 51(ne), probability = 0.5565068 (inter), 0.4434932 (ne) )

```

<br>

--------------------------------------------------------------------------

### Cetacean species across distance to nearest landmass

##### Boxplot of distance to nearest landmass for all species

Species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin
```{r nearest distance}
par(mar = c(5,5,3,3)) # sets the outer margin, order = left,bottom,top,right
neardistboxplot<-boxplot(idcetenvdatstats$neardistkm~idcetenvdatstats$species, xlab = "Species (n)", ylab="Nearest distance (km)", main = "Nearest distance to land for each species", names=c("T (18)","PC(4)","GM(13)","GG(7)","SL(22)","SA(3)","SC(5)"), cex.lab = 1.2, cex.main = 1, col="mediumaquamarine")
# species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin
```

##### Non parameteric statistical test - Kruskal-Wallis test

Null hypothesis: All species are uniformly distributed with respect to the nearest landmass on the ferry routes.

Alternate hypothesis: Species are differentially distributed with respect to the nearest landmass on the ferry routes. 

Kruskal Wallis test was performed and test was significant. Therefore the Null hypothsis is rejected.

```{r non para neardist all sps}

# for all sps (except baleen and pygmy)
kruskal.test(as.factor(idcetenvdatstats$neardistkm)~as.factor(idcetenvdatstats$species), data=idcetenvdatstats)
```


Medians for nearest and furthest species

```{r Medians for nearest and furthest species}
paste ("Spinner dolphins median (in km) =", round(neardistboxplot$stats[3,5],2))
paste ("Spotted dolphins median (in km) =", round(neardistboxplot$stats[3,6],2))
paste ("Striped dolphins median (in km) =", round(neardistboxplot$stats[3,7],2))
```

<br>
--------------------------------------------------------------------------

### Cetacean species across seafloor slope gradient

##### Boxplot of slope distribution for all species

Species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin

```{r slope plot all species}
par(mar = c(5,5,3,3)) # sets the outer margin, order = left,bottom,top,right
slopeboxplot<-boxplot(idcetenvdatstats$slope~idcetenvdatstats$species, xlab = "Species (n)", ylab="Slope", main = "Slope distribution for each species", names=c("T (18)","PC(4)","GM(13)", "GG(7)","SL(22)","SA(3)","SC(5)"), cex.lab = 1.2, cex.main = 1, col="mediumaquamarine")
# species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin
```

##### Non parameteric statistical test - Kruskal-Wallis test

Null hypothesis: All species are uniformly distributed across slope gradients along the ferry routes.

Alternate hypothesis: Species are differentially distributed across slope gradients along the ferry routes. 

Kruskal Wallis test was performed and test was significant. Therefore the Null hypothsis is rejected.

```{r non para slope all species}
# for common sps
kruskal.test(as.factor(idcetenvdatstats$slope)~as.factor(idcetenvdatstats$species), data=idcetenvdatstats)

```

Medians for species in the flatest and steepest slopes

```{r Medians for flat and steep slope species}
paste ("Spinner dolphins median steepest slope =", round(slopeboxplot$stats[3,5],2))
paste ("Spotted dolphins median flat slope =", round(slopeboxplot$stats[3,6],2))
paste ("Striped dolphins median flat slope =", round(slopeboxplot$stats[3,7],2))
```

<br>
--------------------------------------------------------------------------

### Cetacean species across seafloor depth

##### Boxplot between cetacean species and seafloor depth 

Species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin

```{r depth plot all species}

# All species and depth
par(mar = c(5,5,3,3)) # sets the outer margin, order = left,bottom,top,right
depthboxplot<-boxplot(idcetenvdatstats$depth~idcetenvdatstats$species, xlab = "Species (n)", ylab="Seafloor depth(m)", main = "Seafloor depth distribution for each species", names=c("T (18)","PC(4)","GM(13)","GG(7)","SL(22)","SA(3)","SC(5)"), cex.lab = 1.2, cex.main = 1, col="mediumaquamarine") # xlab = x label, ylab = y label, cex.lab = label font size, cex.main = title font size, main = title
# species codes; bal = Baleen whale, T = Bottlenose dolphin, PC = False killer whale, GM = Short finned pilot whale, FA = Pgymy killer whale, GG = Rissos dolphin, SL = Spinner dolphin, SA = Spotted dolphin, SC = Striped dolphin
```

##### Non parameteric statistical test species and depth - Kruskal-Wallis test

Null hypothesis: All species are uniformly distributed away across seafloor depths.

Alternate hypothesis: Species are differentially distributed across seafloor depths. 

Kruskal Wallis test was performed and test was not significant. Therefore the Null hypothsis is not rejected.

```{r depth non para all species}

# for all sps (except baleen and pygmy)
kruskal.test(as.factor(idcetenvdatstats$depth)~as.factor(idcetenvdatstats$species), data=idcetenvdatstats)
```

<br>
--------------------------------------------------------------------------

### Plot of cetacean species across nearest distance to land and slope gradient (Figure 5)

```{r Slope and nearest distance to land across species}
### Plotting significant results together 
##### Slope and nearest distance to land across species

# collapse the data frame
x<-melt(data = idcetenvdatstats, id.vars = "species", measure.vars = c("slope", "neardistkm"))
require(ggplot2)
slopedistplot<-ggplot(data = x, aes(x=species, y=value)) + 
  # specify boxplot and variable to be filled in this case 'slope' & 'neardist'
  geom_boxplot(aes(fill=variable)) +
  # name x and y axis
  xlab("Species (n)") + ylab("Slope (percent)") +
  # add a second y axis on the right and specify values w.r.t to x axis (*1x)
  scale_y_continuous(sec.axis = sec_axis(~.*1, name = "Distance to nearest land (km)")) +
  
  theme(text = element_text(size=12), axis.text.x = element_text(angle=90)) +
  
  theme(plot.margin=unit(c(1,0.8,0.5,0.5),"cm")) +
  
  # change the legend labels
  scale_fill_discrete(name="", labels=c("Slope", "Distance to nearest land")) +
  # change the legend position (0,0) = bottom left & (1,1) = top right
  theme(legend.position = c(0.16, 0.9)) +
  # size of legend text
  theme(legend.text=element_text(size=rel(0.5))) +
  # remove the legend background
  theme(legend.background = element_blank()) +
  # adding sample sizes to the plot
  scale_x_discrete(labels=c("bottlenose" = "bottlenose dolphin (18)", 
                              "falsekiller" = "false killer whale(4)", 
                                "pilot" = "pilot whale(13)",
                            "risso" = "Risso's dolphin (7)",
                            "spinner" ="spinner dolphin(22)",
                            "spotted" ="spotted dolphin(3)",
                            "striped" = "striped dolphin(5)")) 
#stat_summary(fun.y=mean,shape=1,col='red',geom='point')

slopedistplot
rm(x)


```


<br>

--------------------------------------------------------------------------

### Route Information
```{r Route information}
step1<-table(effdat$route) # No. of times a route was traversed
step1<-step1[c(1,8,3,9,5,6,10,4,7,11,2,12)] # reorder to match corresponding columns in route_encrate dataframe
step1<-as.vector(step1) # making into a vector
step2<-aggregate(effdat$dist ~ effdat$routeno, FUN=sum) # extracting sum of distances for routes
step3<-aggregate(effdat$no.sight ~ effdat$routeno, FUN=sum) # extracting sum of sightings for routes
route_info<-data.frame(route_encrate, step1, step2, step3) # Making a data frame with route no, route name, no: of times a route was tranversed, route distance, route sightings and route enc rate
route_info<-route_info[c(1,2,4,6,8,3)] # reorder
colnames(route_info)<-c("Route No.", "Route name", "No. of times", "Distance (km)", "No. of sightings", "Encounter rate")
route_info
rm(step1, step2, step3)
#write.csv(route_info, file = "/Volumes/GoogleDrive/My Drive/Laksh I project/design & data/Data Entry/Laksh R files/Panicker_2019_LD_analysis_manuscript_link/Sightings and effort for each route.csv")
```

<br>

--------------------------------------------------------------------------

### THE END







