Install or update the latest version of the packages `mgcv`

and `itsadug`

:

`install.packages(c("mgcv", "itsadug"))`

Load libraries:

```
library(mgcv)
library(itsadug)
# indicate that messages of itsadug must be printed to the command line
infoMessages("on")
```

Check R version and package versions:

`R.version.string`

`## [1] "R version 3.2.1 (2015-06-18)"`

`packageVersion("mgcv")`

`## [1] '1.8.6'`

`packageVersion("itsadug")`

`## [1] '1.0.1'`

If these versions do not agree with the ones included in the Lab Assignment file, please let us know.

Download the source file `nonlinear_0.1.tar.gz`

and install the packages by the following commands in R:

```
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/packages/nonlinear_0.1.tar.gz", "nonlinear_0.1.tar.gz")
install.packages("nonlinear_0.1.tar.gz", type="source", repos=NULL)
```

If the package is installed succesfully, you could remove the source file (delete it).

Run the demo about random effects as follows:

```
library(nonlinear)
demoRandom()
```

Download the data `pupilsize_S1.rds`

and load the data in R:

```
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/LSA2015/Data/pupilsize_S1.rds", "pupilsize_S1.rds", mode="wb")
dat <- readRDS('pupilsize_S1.rds')
```

Inspect the data frame:

```
head(dat)
str(dat)
```

The data set is a subset of the data, with only 10 participants included in order to run some more complex models.

The column `Pupil`

contains the raw pupil dilation measure (blinks are removed). The data is sampled with 500 Hz, but downsampled to 50 Hz (i.e., a measurement each 20 ms).

Plot the grandaverage of the data. You can use your own code, or complete the code in the R block below.

```
# First we group the timestamps in bins of 100 ms:
dat$Timebin <- timeBins(dat$Time, 100)
# Calculate the grand averages with tapply or aggregate for the different timebins.
# Complete the code:
# - replace the dots in line A with the column of dat
# that contains the data to be averaged.
# - replace the dots in line B with the column(s) of dat
# thatgroup the data.
avg <- aggregate( ... , # line A
list( ... ), # line B
mean, na.rm=TRUE)
# Plot the averages:
# Complete the code:
# - replace the dots in line A with the values to plot on the x-axis
# - replace the dots in line B with the values to plot on the y-axis
plot( ... , # line A
... , # line B
type='l', main="Grand average",
xlab='Time (ms)', ylab='Pupil size',
ylim=c(1000,1200),
bty='n')
# Note: see help(par) for info about the different parameters
```

What does the

`Timebin`

column represent? In case you want to see the code for calculating the timebins type`timeBins`

in the command line. Why would it sometimes be useful to plot`Timebin`

rather than raw timestamps?- Inspect the plot. Why would there be such a sudden increase in dilation after 3000 ms? Hint: type
`tapply(dat$Time, list(dat$Event), max)`

in the commandline.

Check the numer of rows in the data. With more than 10.000 data points you should use the function `bam()`

rather than `gam()`

. Also when you’re planning to include many random smooths and other complex model terms, `bam()`

is more efficient.

`nrow(dat)`

Run the model with a nonlinear effect for `Time`

. Replace the dots `...`

with a nonlinear smooth for `Time`

.

`m1 <- bam( Pupil ~ ... , data=dat )`

Inspect the summary (`summary(m1)`

).

What does the intercept represent?

What does the summary of smooth terms tell about the wigglyness of the smooth? (See the column

`edf`

)- What does it mean that the smooth term is significant?

Use the following code to visualize the smooth.

```
par(mfrow=c(1,2), cex=1.1)
# Partial effect plot:
plot(m1, select=1, shade=TRUE)
abline(h=0)
# Fitted values:
plot_smooth(m1, view="Time")
```

What is the difference between the two plots? What do the plots represent?

Participants differ considerably in pupil dilation size. Visualize the individual patterns. You could use the code below, or use another plotting method.

```
boxplot(Pupil ~ Subject, data=dat)
# or alternatively:
library(lattice)
xyplot(Pupil ~ Time | Subject, pch=16, cex=.5, data = dat)
```

To take into account differences between participants we could add a random intercept for each subject to the model. Replace the dots `...`

with the random intercepts for `Subject`

and random intercepts for `Item`

.

`m2 <- bam( Pupil ~ s(Time) + ... , data=dat )`

Inspect the summary (`summary(m2)`

).

What does the intercept represent?

- What does a random intercept terms represent?

Use the following code to visualize the smooth.

```
par(mfrow=c(1,3), cex=1.1)
# Partial effect plot:
plot(m2, select=1, shade=TRUE)
abline(h=0)
# Fitted values:
plot_smooth(m2, view="Time")
```

What do the plots represent? (Also pay attention to the summary in the command window for the

`plot_smooth`

function.)- Run the
`plot_smooth`

function again, but now add`rm.ranef=TRUE`

as argument:`plot_smooth(m2, view="Time", rm.ranef=TRUE)`

. Why did the plot change so much?

We now inspect the random effects. Run the following code.

```
par(mfrow=c(1,3), cex=1.1)
# Partial effect plot:
plot(m2, select=2)
# get the values for individual subjects:
rand <- get_random(m2)
# note that get_random returns a list with
# the random adjustments for each random
# intercept or slope:
rand
# similar plot with subj names added:
xy <- qqnorm(rand[[1]], main='s(Subject)',
col='steelblue', pch=16, bty='n') # <- some additional arguments that could be left out
qqline(rand[[1]])
text( xy$x, rand[[1]], labels=names( rand[[1]] ),
pos=3, col="steelblue", xpd=TRUE) # <- some additional arguments that could be left out
# same for items:
xy <- qqnorm(rand[[2]], main='s(Item)',
col='steelblue', pch=16, bty='n') # <- some additional arguments that could be left out
qqline(rand[[2]])
text( xy$x, rand[[2]], labels=names( rand[[2]] ),
pos=3, col="steelblue", xpd=TRUE) # <- some additional arguments that could be left out
```

What does this plot represent? What values are on the x-axis and what values are on the y-axis?

- Describe what these plots tell about the pupil dilation of subjects
`s05`

,`s10`

and`s06`

.

The following plotting function produces plots that compare the *fitted* effects with the data. Run the code. The most important line of the function is the first command `dat$fit <- fitted(model)`

: The function stores the fitted values (model predictions) as a separate column in the data.

```
# This is a simple function for comparing the
# fitted effect with the data:
plot.fitted <- function(model, subject, n=3, column="Pupil"){
dat$fit <- fitted(model)
# extract the items for this subject:
items <- unique( as.character(dat[dat$Subject==subject, "Item"] ))
# if n > length of items, then take length of items:
n <- min(n, length(items))
# select n random items
items <- sample(items,n)
# select these items from the data:
tmp <- droplevels( dat[dat$Subject==subject & dat$Item %in% items,
c("Time", "Item", column, "fit")])
# set up empty plot window
emptyPlot(range(tmp$Time), range(tmp[,column]), main=subject)
# plot the items and their fitted effects:
for(i in items){
lines(tmp[tmp$Item==i,]$Time, tmp[tmp$Item==i,column], col='darkgray', xpd=TRUE)
lines(tmp[tmp$Item==i,]$Time, tmp[tmp$Item==i,]$fit, col='red', xpd=TRUE)
}
}
## end of function
```

You can run the function with a subject as input. The optional argument `n`

tells the function how many items to plot of that particular subject (defaults to 3):

```
par(mfrow=c(1,1))
plot.fitted(m2, 's01')
plot.fitted(m2, 's10', n=15)
```

Why does the fitted effect pattern look different in each new plot?

- What model terms are the same for all subjects and what is different?

Run the following model (ignore the warning) with *random smooths* over `Time`

for each participant. Include similar random smooths for Items.

`m3 <- bam( Pupil ~ s(Time) + ... , data=dat )`

Inspect the summary (`summary(m3)`

).

What can we conclude about the wigglyness of the smooth for

`Time`

? Is it still significant?- Why do the random smooths terms have such high
`edf`

value?

Use the following code to visualize the smooth.

```
par(mfrow=c(1,3), cex=1.1)
# Model term 1
plot(m3, select=1, shade=TRUE, scale=0)
abline(h=0)
# Model term 2
plot(m3, select=2)
abline(h=0)
# Summed effect of time:
plot_smooth(m3, view="Time", plot_all="Subject", rug=FALSE)
```

What do the different plots represent?

How do the random smooths change the effect of pupil dilation over time?

- Why did we not include a random intercept for subjects together with the random smooth for subjects?

Use the same function as before to inspect the fitted effects:

```
par(mfrow=c(1,1))
plot.fitted(m3, 's01', n=10)
```

Although the random smooths capture some of the variation *between* participants, they do not capture all the variation *within* subjects.

In addition to the random smooths over `Time`

, we could include random *intercept* for `Event`

(unique Trial and Subject combination).

Run the following model (may take some time):

```
m4 <- bam( Pupil ~ s(Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re') , data=dat )
```

Use the following code to visualize the smooth terms

```
par(mfrow=c(1,3), cex=1.1)
# Model term 1
plot(m4, select=1, shade=TRUE, scale=0)
abline(h=0)
# Model term 2
plot(m4, select=2)
abline(h=0)
# Model term 3
plot(m4, select=3)
abline(h=0)
# Model term 4
plot_smooth(m4, view="Time", rm.ranef=TRUE)
```

Inspection of the fitted effects:

```
par(mfrow=c(1,1))
plot.fitted(m4, 's01', n=15)
```

Why is it useful to add intercepts for each

`Event`

rather than for each`Item`

? Why would the random effects of`Subject`

and`Item`

in many time series data not be sufficient?- What is your opinion on the model fit? What random effect would improve the model further?

**Assumptions of regression:**

1. Residuals are normally distributed

2. There is no structure in the residuals (independent observations)

To test whether this model meets these assumptions, we inspect the residuals of the model.

Plot the distribution of the residuals in a QQ-norm plot:

```
par(mfrow=c(1,1))
qqnorm(resid(m4))
qqline(resid(m4))
```

Plot and ACF plot of the residuals:

```
par(mfrow=c(1,1))
acf(resid(m4))
```

Shortly describe the QQ plot: what do the axis represent, and how could a QQ plot be interpreted.

Describe the pattern in the QQ plot. What do you conclude with respect to the distribution of the residuals? Is this pattern good or bad?

Shortly describe the ACF: what do the axis represent, and how could an ACF plot be interpreted.

- Describe the pattern in the ACF plot. What do you conclude with respect to the independence of the residuals? Is there structure in the residuals?

To account for auctorrelation in residuals, we could include an AR1 model in the GAMM.

First, we need to generate a ‘start event’ column:

`dat <- start_event(dat, event="Event")`

- Inspect the new column. What does the column indicate? Why does the AR1 model need this information?

Second, we need to have a start value for the parameter `rho`

, which indicates the amount of autocorrelation in the residuals. A good starting point is the Lag 1 value:

`rho <- acf(resid(m4), plot=FALSE)$acf[2]`

Run the model again with AR1 model included:

```
m4b <- bam( Pupil ~ s(Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re')
, data=dat, AR.start=dat$start.event, rho=rho )
```

Inspect the summary and the plots to see what has changed in the model.

Two plots of the new residuals:

```
par(mfrow=c(1,2))
acf(resid(m4b), main='Residuals')
acf_resid(m4b, main='Corrected residuals')
```

What do these plots represent? Did the AR1 model reduce the autocorrelation?

- Use the function
`compareML`

to compare the two models,`m4`

and`m4b`

. Which one is preferred?

With the argument `n`

, the function `acf_resid`

compares the ACF Lag 1 value for different events.

`acf_resid(m4b, split_pred=list(Event=dat$Event), n=9)`

Each of the panels is representive for a quantile of the data (1/9th part). These plots indicate that there are some extreme events with high autocorrelation, for which including the AR1 model did not work.

- The current implementation of AR1 model may be too simple to account for all autocorrelation problems in the data. Given the 9 plots, what would be a useful extansion / adjustment of the AR1 model?

Hand in the R code, R output, the plots, and the answers on the questions in the section ‘d. Model criticism’.

Deadline: before next class.

Format: html document (R Markdown)

An easy way to include everything in a single document is by using R markdown.

Quick instructions for R Markdown in R Studio: www.sfs.uni-tuebingen.de/~jvanrij/Rinfo.html)

When using *NOT* using R Studio, try the following steps:

```
download.file('http://www.let.rug.nl/~wieling/statscourse/lecture4/lab/answers/lab-including-answers.Rmd', 'analysis.Rmd')
library(rmarkdown)
render('analysis.Rmd') # generates html file with results
browseURL(paste('file://', file.path(getwd(),'analysis.html'),
sep='')) # shows result
```

Alternatively you could hand in a pdf that contains all the information.

The assignments can be send by mail to vanrij.jacolien ‘AT’ gmail.com, with “LSA lab 1” in the subject line.**Note:** nonlinear interactions and the use of the `ti()`

smooth will be explained in the next class.

Often we are interested in the change in pupil size after a stimulus. In these case it might be useful to work with the *baselined* pupil size. Instead, we could also include baseline as a fixed-effect predictor to account for pupil size at the start of the analysis window:

```
m5 <- bam( Pupil ~ s(Time)
+ s(baseline) + ti(baseline, Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re') , data=dat )
```

Note that the effects are highly similar to a model of baselined pupil dilation:

```
m5b <- bam( Pupil_base ~ s(Time)
+ s(baseline) + ti(baseline, Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re') , data=dat )
```

The predictor `baseline`

and the interaction between `baseline`

and `Time`

are included to account for differences in the trend of pupil dilation depending on the size of the pupil at the start of the trial.

To account for the effect of *gaze position* on pupil dilation, we include an interaction between X position and Y position of the fixation. As both predictors are on the same scale (pixels on screen), we could use the function `s()`

for modeling the interaction.

```
m6 <- bam( Pupil ~ s(Time)
+ s(baseline) + ti(baseline, Time)
+ s(Xgaze, Ygaze)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re') , data=dat )
```

Note that this is not included as a random effect by subject, but rather as a fixed effect to reduce the processing time for the model.

Inspect the interaction:

```
par(mfrow=c(1,2), cex=1.1)
plot(m6, select=2)
# same plot but now with colors:
pvisgam(m6, select=4, view=c("Xgaze", "Ygaze"))
# make white the areas where no data was observed:
fadeRug(dat$Xgaze, dat$Ygaze)
# indicate the center of the screen:
abline(v=1024/2, lty=3)
abline(h=768/2, lty=3)
```