Install or update the latest version of the packages `mgcv`

and `itsadug`

:

`install.packages(c("mgcv", "itsadug"))`

Load libraries:

```
library(mgcv)
library(itsadug)
# indicate that messages of itsadug must be printed to the command line
infoMessages("off")
```

Check R version and package versions:

`R.version.string`

`## [1] "R version 3.2.1 (2015-06-18)"`

`packageVersion("mgcv")`

`## [1] '1.8.6'`

`packageVersion("itsadug")`

`## [1] '1.0.2'`

If these versions do not agree with the ones included in the Lab Assignment file, please let us know.

Download the source file `nonlinear_0.1.tar.gz`

and install the packages by the following commands in R:

```
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/packages/nonlinear_0.1.tar.gz", "nonlinear_0.1.tar.gz")
install.packages("nonlinear_0.1.tar.gz", type="source", repos=NULL)
```

If the package is installed succesfully, you could remove the source file (delete it).

Run the demo about random effects as follows:

```
library(nonlinear)
demoRandom()
```

Download the data `pupilsize_S1.rds`

and load the data in R:

```
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/LSA2015/Data/pupilsize_S1.rds", "pupilsize_S1.rds", mode="wb")
dat <- readRDS('pupilsize_S1.rds')
```

Inspect the data frame:

```
head(dat)
str(dat)
```

The data set is a subset of the data, with only 10 participants included in order to run some more complex models.

The column `Pupil`

contains the raw pupil dilation measure (blinks are removed). The data is sampled with 500 Hz, but downsampled to 50 Hz (i.e., a measurement each 20 ms).

Plot the grandaverage of the data. You can use your own code, or complete the code in the R block below.

```
# First we group the timestamps in bins of 100 ms:
dat$Timebin <- timeBins(dat$Time, 100)
# Calculate the grand averages with tapply or aggregate for the different timebins.
# Complete the code:
# - replace the dots in line A with the column of dat
# that contains the data to be averaged.
# - replace the dots in line B with the column(s) of dat
# thatgroup the data.
avg <- aggregate( dat$Pupil, # line A
list( Timebin=dat$Timebin ), # line B
mean, na.rm=TRUE)
# Plot the averages:
# Complete the code:
# - replace the dots in line A with the values to plot on the x-axis
# - replace the dots in line B with the values to plot on the y-axis
plot( avg$Timebin , # line A
avg$x , # line B
type='l', main="Grand average",
xlab='Time (ms)', ylab='Pupil size',
ylim=c(1000,1200),
lwd=2,
bty='n')
# Note: see help(par) for info about the different parameters
```

Alternative ways to use `aggregate`

to plot the data:

```
avg <- aggregate( Pupil ~ Timebin, data=dat, mean, na.rm=TRUE)
# or:
avg <- aggregate( Pupil ~ Timebin + Context, data=dat, mean, na.rm=TRUE)
```

Example of plotting two lines, one for each condition:

```
# First we group the timestamps in bins of 100 ms:
dat$Timebin <- timeBins(dat$Time, 100)
# Calculate the grand averages with tapply or aggregate for the different timebins.
# Complete the code:
# - replace the dots in line A with the column of dat
# that contains the data to be averaged.
# - replace the dots in line B with the column(s) of dat
# thatgroup the data.
avg <- aggregate( dat$Pupil, # line A
list( Timebin=dat$Timebin, Context=dat$Context), # line B
mean, na.rm=TRUE)
# Plot the averages:
# Complete the code:
# - replace the dots in line A with the values to plot on the x-axis
# - replace the dots in line B with the values to plot on the y-axis
plot( avg[avg$Context=="AP", ]$Timebin , # line A
avg[avg$Context=="AP", ]$x , # line B
type='l', main="Grand averages",
xlab='Time (ms)', ylab='Pupil size',
ylim=c(1000,1200),
lwd=2,
bty='n')
lines(avg[avg$Context=="PA", ]$Timebin , # line A
avg[avg$Context=="PA", ]$x, lwd=2, lty=2 )
legend("bottomright",
legend=c("AP", "PA"),
lwd=2, lty=c(1,2))
# Note: see help(par) for info about the different parameters
```

- What does the
`Timebin`

column represent? In case you want to see the code for calculating the timebins type`timeBins`

in the command line. Why would it sometimes be useful to plot`Timebin`

rather than raw timestamps?

The timebin column groups the timestamps in larger time bins of 100 ms. This makes the line smoother and allows for inspection of the major effects.

- Inspect the plot. Why would there be such a sudden increase in dilation after 3000 ms? Hint: type
`tapply(dat$Time, list(dat$Event), max)`

in the commandline.

Most time series run until 3000 ms, so the averages after 3000 ms are based on limited data points. That’s why these might go more extreme.

Check the numer of rows in the data. With more than 10.000 data points you should use the function `bam()`

rather than `gam()`

. Also when you’re planning to include many random smooths and other complex model terms, `bam()`

is more efficient.

`nrow(dat)`

`## [1] 44093`

Run the model with a nonlinear effect for `Time`

. Replace the dots `...`

with a nonlinear smooth for `Time`

.

`m1 <- bam( Pupil ~ s(Time), data=dat )`

Inspect the summary (

`summary(m1)`

).
`summary(m1)`

```
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Pupil ~ s(Time)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1090.137 1.963 555.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time) 5.013 6.114 58.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.00808 Deviance explained = 0.819%
## fREML = 3.2806e+05 Scale est. = 1.6979e+05 n = 44093
```

- What does the intercept represent?
The intercept is a contant value to be added to the other modelterms (here only
`s(Time)`

). Because we don’t have any categorical predictors included, it is roughly the mean value of pupil dilation (1089.92).

- What does the summary of smooth terms tell about the wigglyness of the smooth? (See the column
`edf`

)The

`edf`

is 5, which means that the smooth is not a straight line (in that case, the edf would be 1). - What does it mean that the smooth term is significant?
The smooth is at some point significantly different from a flat horizontal zero line. The confidence intervals of the smooth do not contain the zero line.

Use the following code to visualize the smooth.

```
par(mfrow=c(1,2), cex=1.1)
# Partial effect plot:
plot(m1, select=1, shade=TRUE)
abline(h=0)
# Fitted values:
plot_smooth(m1, view="Time")
```

What is the difference between the two plots? What do the plots represent?

The first plot visualizes the *partial* effect of the smooth of `Time`

. So it selectively visualizes the first line of the smooth term summary, but no other terms are included. The second plot visualizes the *summed* or *fitted* effects, which are the model’s predictions. This plot sets every value in the model to a certain value and sums all these effects.

Participants differ considerably in pupil dilation size. Visualize the individual patterns. You could use the code below, or use another plotting method.

`boxplot(Pupil ~ Subject, data=dat)`

```
# or alternatively:
library(lattice)
xyplot(Pupil ~ Time | Subject, pch=16, cex=.5, data = dat)
```

To take into account differences between participants we could add a random intercept for each subject to the model. Replace the dots `...`

with the random intercepts for `Subject`

and random intercepts for `Item`

.

```
m2 <- bam( Pupil ~ s(Time)
+ s(Subject, bs='re')
+ s(Item, bs='re'), data=dat )
```

Inspect the summary (`summary(m2)`

).

What does the intercept represent? The intercept is a contant value to be added to the other modelterms (here only

`s(Time)`

).- What does a random intercept terms represent?
*Adjustments*from the intercept for each subject and item, drawn from a normal distribution. For example, a negative value of a random intercept moves the whole fitted effects for that particular subject downwards. These random effects account for differences between subjects and items in average pupil dilation size.

Use the following code to visualize the smooth.

```
# make sure to have messages on:
infoMessages("on")
# split plo window in two:
par(mfrow=c(1,2), cex=1.1)
# Partial effect plot:
plot(m2, select=1, shade=TRUE)
abline(h=0)
# Fitted values:
plot_smooth(m2, view="Time")
```

```
## Summary:
## * Time : numeric predictor; with 30 values ranging from 0.000000 to 3243.000000.
## * Subject : factor; set to the value(s): s08.
## * Item : factor; set to the value(s): 30.
```

What do the plots represent? (Also pay attention to the summary in the command window for the

`plot_smooth`

function.) The first plot visualizes the*partial*effect of`Time`

, i.e. the first line of the smooth term summary. The second plot visualizes the*summed*effects over time, i.e., the intercept and values for`Subject`

and`Item`

are included. The message in the commandline suggests that the plot is the fitted effect of Time for subject “s08” and item “30”.Run the

`plot_smooth`

function again, but now add`rm.ranef=TRUE`

as argument:`plot_smooth(m2, view="Time", rm.ranef=TRUE)`

. Why did the plot change so much?

`plot_smooth(m2, view="Time", rm.ranef=TRUE)`

```
## Summary:
## * Time : numeric predictor; with 30 values ranging from 0.000000 to 3243.000000.
## * Subject : factor; set to the value(s): s08. (Might be canceled as random effect, check below.)
## * Item : factor; set to the value(s): 30. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(Subject),s(Item)
##
```

Now the effects of subject and item (intercept adjustments) have been canceled out. Now the confidence intervals have increased, because this plot now shows the uncertainty around the interval. In the previous plot the confidence bands were smaller because the model was more certain about the intercept of the specific subject and item combination.

We now inspect the random effects. Run the following code.

```
par(mfrow=c(1,3), cex=1.1)
# Partial effect plot:
plot(m2, select=2)
# get the values for individual subjects:
rand <- get_random(m2)
# note that get_random returns a list with
# the random adjustments for each random
# intercept or slope:
rand
```

```
## $`s(Subject)`
## s01 s02 s03 s04 s05 s07
## -101.35582 -236.97476 39.30879 69.15500 -702.93196 -203.94801
## s08 s09 s10 s06
## 386.88710 -55.72687 849.04654 -43.46002
##
## $`s(Item)`
## 1 2 3 4 5 6
## -22.382871 -45.003660 26.324149 17.031028 14.617727 -39.481939
## 7 8 9 10 11 12
## -30.403224 -23.352711 27.480479 26.427325 24.066908 82.179005
## 13 14 15 16 17 18
## -79.583049 7.162514 53.285312 110.270764 -43.767303 -51.976279
## 19 20 21 22 23 24
## -62.005568 -48.999765 13.031326 18.448216 76.034841 -17.816977
## 25 26 27 28 29 30
## -37.125977 16.777733 -6.531431 -28.830664 23.932075 34.731534
## 31 32
## -42.589029 8.049511
```

```
# similar plot with subj names added:
xy <- qqnorm(rand[[1]], main='s(Subject)',
col='steelblue', pch=16, bty='n') # <- some additional arguments that could be left out
qqline(rand[[1]])
text( xy$x, rand[[1]], labels=names( rand[[1]] ),
pos=3, col="steelblue", xpd=TRUE) # <- some additional arguments that could be left out
# same for items:
xy <- qqnorm(rand[[2]], main='s(Item)',
col='steelblue', pch=16, bty='n') # <- some additional arguments that could be left out
qqline(rand[[2]])
text( xy$x, rand[[2]], labels=names( rand[[2]] ),
pos=3, col="steelblue", xpd=TRUE) # <- some additional arguments that could be left out
```