-
Notifications
You must be signed in to change notification settings - Fork 16
/
Copy pathREADME.Rmd
842 lines (647 loc) · 43.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
---
title: "Readme for DLCAnalyzer"
output:
html_document:
keep_md: true
theme: united
toc: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
fig.path = "README_figs/README-"
)
```
## DLCAnalyzer
DLCAnalyzer is a code collection that allows loading and processing of DeepLabCut (DLC) .csv files (https://github.com/DeepLabCut/DeepLabCut). It can be used for simple analyses such as zone visits, distance moved etc. but can also be integrated with supervised machine learning and unsupervised clustering methods to extract complex behaviors based on point data information. DLCAnalyzer is intended for interactive use in R. The following document will highlight the most common type of analyses that can be performed with this collection of functions. In the function glossary all main functions and auxiliary functions are listed with a short description of their action. This code package has been written for a publication that is currently in the revision process. A pre-print version can be found under at: https://www.biorxiv.org/content/10.1101/2020.01.21.913624v1
## Getting started
This collection of code is not available as package, since certain dependencies rely on installs that are independent of R, so in order to ensure smooth operation please follow the steps described here.
Following libraries are used by this package (with information about tested versions) and should be installed and loaded before executing any commands
```{r results='hide', message=FALSE, warning=FALSE}
library(sp) #tested with v1.3-2
library(imputeTS) #tested with v3.0
library(ggplot2) #tested with v3.1.0
library(ggmap) #tested with v3.0.0
library(data.table) #tested with v1.12.8
library(cowplot) #tested with v0.9.4
library(corrplot) #tested with v0.84
library(keras) #REQUIRES TENSORFLOW INSTALL. tested with v2.2.5.0
```
Additionally, this package requires a working installation of tensorflow for R, which itself requires a working installation of Anaconda. Please follow the installation protocol in the following link for all steps:
https://tensorflow.rstudio.com/installation/
While this package might be a bit harder to install it is only required for functions that are needed for machine learning. For each section of this document it will be indicated when tensorflow is required. Everything else works without the install.
## Loading and processing a single file
Download the contents of this repository and keep the folder structure unchanged. First, set your working directory to this specified folder and then ensure that the file with all the code is sourced
```{r, include=FALSE}
#source('R/DLCAnalyzer_Functions_final.R')
source("R/DLCAnalyzer_Functions_final.R")
```
```{r, eval=FALSE}
setwd("PathToDLCFolder")
source('R/DLCAnalyzer_Functions_final.R')
```
To load a DLC .csv file (here an example file of an open field test (OFT) tracking) insert the path of the file (from the working directory):
```{r}
Tracking <- ReadDLCDataFromCSV(file = "example/OFT/DLC_Data/OFT_3.csv", fps = 25)
```
This command loads the DLCdata and orders it in an object that allows easy access and manipulation. It is crucial to set the correct frames per second (fps), otherwise many downstream metrics will be distorted. If you do not set fps it will be set to 1 frame per second!
Let's inspect the contents of the tracking object:
```{r}
names(Tracking$data)
```
As we can see the object contains a sub-object '$data' that has multiple further sub-objects, one for each point. The points have the same name as they had in the DLC network that was used for point tracking. Since this depends heavily on each user, most code of DLCAnalyzer is compatible with custom point names. Wherever functions were written for our specific network we will indicate this in the document
Let's have a look at the point `bodycentre`:
```{r}
head(Tracking$data$bodycentre)
```
As we can see the information from DLC (frame,x,y, and likelihood) of point `bodycentre` can be accessed easily
We can also plot one or multiple points using:
```{r}
PlotPointData(Tracking, points = c("nose","bodycentre","tailbase","neck"))
```
As we can see the tracking was not perfect, and all points had some tracking problems every now and then. If we want to remove and interpolate the outliers based on low likelihood from DLC we can do this by:
```{r}
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
PlotPointData(Tracking, points = c("nose","bodycentre","tailbase","neck"))
```
This looks already better, but there are still some points that were tracked incorrectly with high likelihood. When inspecting these OFT videos it becomes apparent that the video runs on after the trial is over which leads to mislabeling when the light in the camber turns on and the mouse is picked up. Let's get rid of the last 250 frames (=10 seconds) and see if this solves the issue
```{r}
Tracking <- CutTrackingData(Tracking, end = 250)
PlotPointData(Tracking, points = c("nose","bodycentre","tailbase","neck"))
```
As we can see this removed the artefacts in the data sufficiently
Next, we want to calibrate our Tracking data to transform it from a pixel dimension into a metric dimension. In this case we measured the physical distance of the area in cm (42 x 42) which is span by the points `tl`, `tr`, `br` and `bl`. For this, we use:
```{r}
Tracking <- CalibrateTrackingData(Tracking, method = "area",in.metric = 42*42, points = c("tl","tr","br","bl"))
Tracking$px.to.cm
PlotPointData(Tracking, points = c("nose","bodycentre","tailbase","neck"))
```
As you can see now `the px.to.cm` ratio was calculated and the data was automatically transformed into a metric dimension. The same process is possible with a pre-defined ratio or with a distance measurement between two points
## OFT analysis of a single file
Here we will explore how we can perform an OFT (open field test) analysis on a single file
Let's start by loading and pre-processing the data:
```{r, results='hide'}
Tracking <- ReadDLCDataFromCSV(file = "example/OFT/DLC_Data/OFT_3.csv", fps = 25)
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
Tracking <- CutTrackingData(Tracking,start = 100, end = 250)
Tracking <- CalibrateTrackingData(Tracking, method = "area",in.metric = 42*42, points = c("tl","tr","br","bl"))
```
In an OFT analysis we want to quantify how much / fast an animal moves and how much time it spends in different zones.
Let's start by defining the zones. DLCAnalyzer has a built in function to create a set of OFT zones based on the median tracking data from the 4 corners of the arena.
To create the OFT zones we use:
```{r}
Tracking <- AddOFTZones(Tracking, scale_center = 0.5,scale_periphery = 0.8 ,scale_corners = 0.4, points = c("tl","tr","br","bl"))
PlotZones(Tracking)
```
As we see, the function created the OFT Zones and stored them as part of the `Tracking` object.
Now we can resolve whenever a body point is in a certain zone:
```{r}
PlotZoneVisits(Tracking,point = c("bodycentre","nose","tailbase"))
```
However, we might be interested in adding new zones that are independent of any method. If we would want to add 2 new triangular zones and only get a readout for them can do this with:
```{r}
Tracking <- AddZones(Tracking,z = data.frame(my.zone.1 = c("tl","tr","centre"),
my.zone.2 = c("bl","br","centre")))
PlotZones(Tracking, zones=c("my.zone.1","my.zone.2", "arena"))
PlotZoneVisits(Tracking,point = c("bodycentre","nose","tailbase"),zones = c("my.zone.1","my.zone.2"))
```
In order to get metrics such as speed or movement we use:
```{r}
Tracking <- CalculateMovement(Tracking, movement_cutoff = 5, integration_period = 5)
head(Tracking$data$bodycentre)
```
The `movement_cutoff` defines at which cutoff (units / s, here cm) we are considering a point to be moving. The `integration_period` is important for detecting transitions. It will define over how many frames (+- `integration_period`) transitions between zones are analyzed and for how long an animal has to be moving to be considered doing so. This can help to remove noisy interpretations, i.e. where a point jumps over the zone line multiple times in short succession.
Now that we have calculated all metrics which are important for our analysis, we can create a density plot that encapsulates speed, time spent and position for selected points (here we only want to plot `bodycentre`, `nose` and `tailbase`):
```{r}
plots <- PlotDensityPaths(Tracking,points = c("bodycentre","nose","tailbase"))
plots$bodycentre
plots$nose
```
We can add our zones to the plot:
```{r}
plots <- AddZonesToPlots(plots,Tracking$zones)
plots$bodycentre
```
We are interested what happens in the center over the whole recording. For thi,s we can generate a zone report:
```{r}
Report <- ZoneReport(Tracking, point = "bodycentre", zones = "center")
t(data.frame(Report))
```
we can also generate a report based on a combination of zones. In the following example we create a report for whenever the `bodycentre` is neither in `my.zone.1` or in `my.zone.2` (by setting `invert = TRUE` we look at the inversion of the combined zone)
```{r}
Report <- ZoneReport(Tracking, point = "bodycentre", zones = c("my.zone.1","my.zone.2"), zone.name = "OutsideMyZones", invert = TRUE)
t(data.frame(Report))
```
Now we get a report for what happens when the animal is not in `my.zone.1` or `my.zone.2`. If we want to check if our zone was set correctly we can quickly do this with:
```{r}
PlotZoneSelection(Tracking, point = "bodycentre", zones = c("my.zone.1","my.zone.2"), invert = TRUE)
```
For an easy, preset OFT analysis of one or multiple points, you can also use the command:
```{r, warning=FALSE}
Tracking <- OFTAnalysis(Tracking, points = "bodycentre" ,movement_cutoff = 5, integration_period = 5)
t(data.frame(Tracking$Report))
```
## OFT analysis of multiple files
In order to run multiple files the best practice is to first define a pipeline of commands, and then execute it for all files in a folder.
First we define the path to our folder (`input_folder`) of interest and find all files:
```{r, warning=FALSE}
input_folder <- "example/OFT/DLC_Data/"
files <- list.files(input_folder)
files
```
Now, we define our processing pipeline. In order to check if it is running appropriately it is advisable to first run it for a single file interactively
```{r, warning=FALSE}
pipeline <- function(path){
Tracking <- ReadDLCDataFromCSV(path, fps = 25)
Tracking <- CutTrackingData(Tracking,start = 100, end = 300)
Tracking <- CalibrateTrackingData(Tracking, "area",in.metric = 42*42, points = c("tr","tl","bl","br"))
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
Tracking <- AddOFTZones(Tracking, scale_center = 0.5,scale_periphery = 0.8 ,scale_corners = 0.4)
Tracking <- OFTAnalysis(Tracking, movement_cutoff = 5, integration_period = 5, points = "bodycentre")
return(Tracking)
}
```
Now that the pipeline is defined, we can execute it for all files and combine them into a list of Tracking objects:
```{r, results='hide', message=FALSE, warning=FALSE}
TrackingAll <- RunPipeline(files,input_folder,FUN = pipeline)
```
From the list we can access individual results with the `$` operator. for example if we want to plot the zone visits for the first file we can simply use:
```{r}
Tracking <- TrackingAll$`OFT_1.csv`
PlotZoneVisits(Tracking, point = "bodycentre")
```
To get a combined report of all files we can use the following command (here, we only display the first 6 columns):
```{r}
Report <- MultiFileReport(TrackingAll)
Report[,1:6]
```
Additionally, we can create PDF files with multiplots for all analyses. They will appear in the working directory:
```{r, eval=FALSE}
PlotDensityPaths.Multi.PDF(TrackingAll,points = c("bodycentre"), add_zones = TRUE)
PlotZoneVisits.Multi.PDF(TrackingAll,points = c("bodycentre","nose","tailbase"))
```
## EPM analysis of a single file
Here we will describe how a EPM (elevated plus maze) analysis for a single file can be performed using DLCAnalyzer
Similar to previous sections, we start by loading and cleaning up our data.
```{r}
Tracking <- ReadDLCDataFromCSV(file = "example/EPM/DLC_Data/EPM_1.csv", fps = 25)
names(Tracking$data)
```
As you we see, the EPM network contains additional/different points than the OFT network. Many of these additional points are used to track the maze, which allows automatic reconstruction of the zones for each file.
```{r}
Tracking <- CutTrackingData(Tracking,start = 100, end = 250)
Tracking <- CalibrateTrackingData(Tracking, method = "distance",in.metric = 65.5, points = c("tl","br"))
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
PlotPointData(Tracking,points = c("nose","bodycentre","tailbase","neck"))
```
As we can see, the data looks messy even after the likelihood cutoff. We will get back to that with a further trick. However, for the EPM analysis we also need a number of zones that describe the maze (open arms, closed arms etc.). Rather than constructing them in all independently there is the option to use a template file that describes the zones and then adds them to the object automatically.
First, we load the template .csv file:
```{r}
zoneinfo <- read.table("example/EPM/EPM_zoneinfo.csv", sep = ";", header = T)
zoneinfo
```
As we can see this file contains a number of columns that each contain a list or point names from our DLCnetwork. Each column of points describes a zone which they span. We can add this info to the Tracking data:
```{r}
Tracking <- AddZones(Tracking,zoneinfo)
PlotZones(Tracking)
```
Our zone file also contains one zone `arena` that describes the whole area:
```{r}
Tracking$zones$arena
```
We can use this to further clean up our data. we scale it by a factor 1.8 to define an inclusion zone:
```{r}
inclusion.zone <- ScalePolygon(Tracking$zones$arena, factor = 1.8)
```
Now we can use this zone to further clean up our data. every point that falls outside of it will be removed and interpolated
```{r}
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95,existence.pol = inclusion.zone)
PlotPointData(Tracking,points = c("nose","bodycentre","tailbase","neck"))
```
We can now perform an EPM analysis. This will record time in zones and other metrics. if `nosedips` is enabled it will automatically detect nose dips. This will only works if the correctly named points and zones are present in the DLCnetwork and zoneinfo file, otherwise it will omit the nose dip analysis and report a warning.
```{r}
Tracking <- EPMAnalysis(Tracking, movement_cutoff = 5,integration_period = 5,points = "bodycentre", nosedips = TRUE)
t(data.frame(Tracking$Report[1:6]))
```
We can create a time resolved plot of all nose dips. for this, we use:
```{r}
PlotLabels(Tracking)
```
## EPM analysis of mutliple files
Here we will cover an EPM (elevated plus maze) analysis for multiple files.
Similar to multiple OFT files, we first define a pipeline and then execute it for all EPM files
```{r, warning=FALSE}
input_folder <- "example/EPM/DLC_Data/"
files <- list.files(input_folder)
files
```
```{r, warning=FALSE,results='hide'}
pipeline <- function(path){
Tracking <- ReadDLCDataFromCSV(file = path, fps = 25)
Tracking <- CutTrackingData(Tracking,start = 100, end = 250)
Tracking <- CalibrateTrackingData(Tracking, method = "distance",in.metric = 65.5, points = c("tl","br"))
zoneinfo <- read.table("example/EPM/EPM_zoneinfo.csv", sep = ";", header = T)
Tracking <- AddZones(Tracking,zoneinfo)
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95,existence.pol = ScalePolygon(Tracking$zones$arena, 1.8))
Tracking <- EPMAnalysis(Tracking, movement_cutoff = 5,integration_period = 5,points = "bodycentre", nosedips = TRUE)
return(Tracking)
}
TrackingAll <- RunPipeline(files,input_folder,FUN = pipeline)
```
Again, we can create a report for all files using the following command (here we only show the first 6 columns):
```{r, warning=FALSE,}
Report <- MultiFileReport(TrackingAll)
Report[,1:6]
```
We can see that animal 3 has very few nose dips, whereas animal 4 seems to have a lot. We can quickly compare both with the following commands to create overview plots for them:
```{r, fig.asp = 1.5, figures-side, fig.show="hold", out.width="50%"}
OverviewPlot(TrackingAll$EPM_4.csv,"bodycentre")
OverviewPlot(TrackingAll$EPM_3.csv,"bodycentre")
```
Indeed, it becomes apparent that animal 3 was hiding in the bottom closed arm for most of the test, whereas animal 4 showed a rich explorative behavior.
Additionally, we can create PDF files with multiplots for all analyses. They will appear in your working directory.
```{r, eval=FALSE}
PlotDensityPaths.Multi.PDF(TrackingAll,points = c("bodycentre"), add_zones = TRUE)
PlotZoneVisits.Multi.PDF(TrackingAll,points = c("bodycentre","nose","tailbase"))
PlotLabels.Multi.PDF(TrackingAll)
```
## FST analysis of a single file
Here we will describe how a FST (forced swim test) analysis for a single file can be performed
Similar to previous sections, we start by loading and cleaning up our data:
```{r,results='hide'}
Tracking <- ReadDLCDataFromCSV(file = "example/FST/DLC_Data/FST_1.csv", fps = 25)
Tracking <- CutTrackingData(Tracking,start = 300, end = 100)
Tracking <- CalibrateTrackingData(Tracking, "distance",in.metric = 20, points = c("t","b"))
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
```
And we inspect our data to control the integrity:
```{r}
PlotPointData(Tracking,points = c("nose","bodycentre","tailbase","neck"))
```
As we can see, the data looks good already. We can now proceed to run the analysis. But first, for an FST analysis we need to specify further information. For this type of analysis, we have to measure the movement of all points that are part of the mouse and analyze them in order to detect floating behavior.
If we inspect the `point.info` of our object
```{r}
Tracking$point.info
```
We can see that there is not yet any info that describes which point belongs to the mouse. Here, we will load this info from a .csv file and add it to the object: (however, this could also be done manually by directly changing `Tracking$point.info`)
```{r}
pointinfo <- read.table("example/FST/FST_pointinfo.csv", sep = ";", header = T)
Tracking <- AddPointInfo(Tracking, pointinfo)
Tracking$point.info
```
Now that the `Tracking` object has the additional information about point type we can run an FST analysis. We specify a `cutoff_floating` of 0.03, for which we found the algorithm to track floating behavior optimally. We also specify that the `Object` we want to track is of type `"Mouse"`.
```{r}
Tracking <- FSTAnalysis(Tracking,cutoff_floating = 0.03,integration_period = 10, Object = "Mouse", points = "bodycentre")
Tracking$Report
PlotLabels(Tracking)
```
As we can see, in this case the floating behavior of the mouse increased in the later part of the swim test.
## FST analysis of multiple files
Here we will describe how multiple FST (forced swim test) files can be analyzed together.
Similar to multiple OFT files, we first set the path, define a pipeline and then execute it for all FST files
```{r, warning=FALSE}
input_folder <- "example/FST/DLC_Data/"
files <- list.files(input_folder)
files[1:5]
```
We define the processing pipeline
```{r, warning=FALSE}
pipeline <- function(path){
Tracking <- ReadDLCDataFromCSV(file = path, fps = 25)
Tracking <- CutTrackingData(Tracking,start = 300, end = 100)
Tracking <- CalibrateTrackingData(Tracking, "distance",in.metric = 20, points = c("t","b"))
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
pointinfo <- read.table("example/FST/FST_pointinfo.csv", sep = ";", header = T)
Tracking <- AddPointInfo(Tracking, pointinfo)
Tracking <- FSTAnalysis(Tracking,cutoff_floating = 0.03,integration_period = 10, Object = "Mouse", points = "bodycentre")
return(Tracking)
}
```
We execute the pipeline for all files and combine them into a list of Tracking objects. We can then get a final report:
```{r, warning=FALSE,results='hide'}
TrackingAll <- RunPipeline(files,input_folder,FUN = pipeline)
Report <- MultiFileReport(TrackingAll)
```
```{r, warning=FALSE}
Report[1:5,1:5]
```
Additionally, we can create a PDF with all the label plots using the following command. By default it will be placed in the working directory
```{r, eval=FALSE}
PlotLabels.Multi.PDF(TrackingAll)
```
## Runing a bin analysis
Often, in behavioral research readouts are required in time bins. DLCAnalyzer has a fully integrated approach that allows analyses within bins. Here, we will do a bin analysis for floating behavior to see if it increases in animals at later time-points.
```{r, warning=FALSE, results='hide'}
Tracking <- ReadDLCDataFromCSV(file = "example/FST/DLC_Data/FST_1.csv", fps = 25)
Tracking <- CutTrackingData(Tracking,start = 300, end = 100)
Tracking <- CalibrateTrackingData(Tracking, "distance",in.metric = 20, points = c("t","b"))
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
pointinfo <- read.table("example/FST/FST_pointinfo.csv", sep = ";", header = T)
Tracking <- AddPointInfo(Tracking, pointinfo)
```
We did not yet run any analysis, since first we need to add the information about the bins. Here we add 1 minute long bins.
```{r, warning=FALSE}
Tracking <- AddBinData(Tracking,unit = "minute", binlength = 1)
Tracking$bins
```
Optionally, we can also load bins from a `data.frame`. this is especially interesting if we have unequally sized bins. In this example we have a first and last bin of 1.5 minutes and a longer intermediate bin of 3 minutes
```{r, warning=FALSE}
my.bins <- read.table("example/FST/FST_BinData.csv", sep = ";", header = T)
my.bins
Tracking <- AddBinData(Tracking,bindat = my.bins, unit = "minute")
Tracking$bins
```
Now, we can perform a bin analysis using:
```{r, warning=FALSE}
BinReport <- BinAnalysis(Tracking, FUN = FSTAnalysis ,cutoff_floating = 0.03,integration_period = 10, Object = "Mouse", points = "bodycentre")
BinReport[,1:5]
```
It is important to note that the function `BinAnalysis()` takes another function as argument (`FSTAnalysis`). Therefore it is crucial to pass all the arguments the function `FSTAnalysis()` requires along whenever using `BinAnalysis()`, In this case these were: `cutoff_floating`, `integration_period`, `Object` and `points`. The same applies to other functions such as `OFTAnalysis()` or `EPMAnalysis()` for the respective tests, which are also compatible with the `BinAnalysis()` function.
We can do the same for multiple files (here again in our custom bins):
```{r, warning=FALSE, results='hide'}
input_folder <- "example/FST/DLC_Data/"
files <- list.files(input_folder)
pipeline <- function(path){
Tracking <- ReadDLCDataFromCSV(file = path, fps = 25)
Tracking <- CutTrackingData(Tracking,start = 300, end = 100)
Tracking <- CalibrateTrackingData(Tracking, "distance",in.metric = 20, points = c("t","b"))
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
pointinfo <- read.table("example/FST/FST_pointinfo.csv", sep = ";", header = T)
Tracking <- AddPointInfo(Tracking, pointinfo)
my.bins <- read.table("example/FST/FST_BinData.csv", sep = ";", header = T)
Tracking <- AddBinData(Tracking,bindat = my.bins, unit = "minute")
return(Tracking)
}
TrackingAll <- RunPipeline(files,input_folder,FUN = pipeline)
BinReportAll <- MultiFileBinanalysis(TrackingAll, FUN = FSTAnalysis ,cutoff_floating = 0.03,integration_period = 10, Object = "Mouse", points = "bodycentre")
```
And we can see that the bins for each file are now included in one `data.frame`
```{r, warning=FALSE}
BinReportAll[1:6,1:5]
```
We can now answer our question if floating increases in the later bins of the tests:
```{r, warning=FALSE}
ggplot(data = BinReportAll, aes(bin,percentage.floating, color = bin)) + geom_boxplot() + geom_point(position = position_jitterdodge())
```
As we can see here, the time floating seems to increase in the later bins compared to the earlier bin.
## Training a machine learning classifier on a single file
This section requires a working install of the `keras` library which itself requires a working anaconda and tensorflow install!
Here we will explore how a neural network can be trained to recognize complex behaviors. We will work with the forced swim test data (FST).
First, we will see how we can generate features from one single file, and add the labeling data from a human experimenter to it. We start with our standard pipeline to load, clean and calibrate the data:
```{r, warning=FALSE}
file <- "FST_3.csv"
path <- "example/FST/DLC_Data/"
Tracking <- ReadDLCDataFromCSV(paste(path,file,sep = ""), fps = 25)
Tracking <- CutTrackingData(Tracking,start = 300,end = 300)
Tracking <- CalibrateTrackingData(Tracking, "distance",in.metric = 20, c("t","b"))
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
PlotPointData(Tracking,points = c("nose","bodycentre","tailbase","neck"))
```
As we can see, the data looks good. Now, let's add the labeling data. For this, we first load it from a file which containes labeling data of only one experimenter:
```{r, warning=FALSE}
labeling.data <- read.table("example/FST/Lables/FSTLables_Oliver.csv",sep = ";", header = T)
head(labeling.data)
```
As you can see, the data is prepared in a specific way that links onset and offset of behaviors to a specific file. To get the data from this one file and add it to our object we use:
```{r, warning=FALSE}
labeling.data <- labeling.data[labeling.data$CSVname == Tracking$file,]
Tracking <- AddLabelingData(Tracking, labeling.data)
PlotLabels(Tracking)
```
Now that we have our labeling data, we have to think about how to create our feature data from our tracking data. Here we will use the information about point acceleration to train a floating classifier. For this, we first have to calculate accelerations for each of our point using:
```{r, warning=FALSE}
Tracking <- CalculateAccelerations(Tracking)
```
Now we use a preset function that extracts our features from the point data and stitches them into a single feature `data.frame`
```{r, warning=FALSE}
Tracking <- CreateAccelerationFeatures(Tracking)
dim(Tracking$features)
head(Tracking$features)
```
As we see, our feature data now contains acceleration data of 11 points, in this case all points except the tail points. It is important to note here, that the function `CreateAccelerationFeatures()` is very specific for a certain DLC network, so any network that produces different points, or points with different names will not work properly!
Additionally, we also want to incorporate temporal information. For this, we use the following command. In the process data from previous and following frames will be added to our features at each frame, depending on a specified integration period, here set to +- 20 frames.
```{r, warning=FALSE}
Tracking <- CreateTrainingSet(Tracking, integration_period = 20)
dim(Tracking$train_x)
head(Tracking$train_y)
```
As we can see, now the `train_x` data is much bigger, having 451 features (= 41 x 11) that represent acceleration of 11 points over a window of +- 20 frames. We can now use the following command to prepare the data so it can directly be used with keras. We set `shuffle` to `TRUE` so the data get's randomly shuffled before training a network with it:
```{r, warning=FALSE, eval = FALSE}
MLData <- CombineTrainingsData(Tracking, shuffle = TRUE)
```
Now, we define the architecture of the network that we want to train. We will not go into the details of the model used here, it is advisable to read the documentation on tensorflow.org for specific details.
```{r, warning=FALSE, eval = FALSE}
model <- keras_model_sequential()
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(MLData$parameters$N_input),kernel_regularizer = regularizer_l2(l = 0)) %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 128, activation = 'relu',kernel_regularizer = regularizer_l2(l = 0)) %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = MLData$parameters$N_features, activation = 'softmax')
model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
```
We train the model with our data using the following command:
```{r, warning=FALSE, eval = FALSE}
history <- model %>% fit(
MLData$train_x, MLData$train_y,
epochs = 5, batch_size = 32,
validation_split = 0
)
```
We can evaluate the model with the same data we trained on using the following function:
```{r, include = FALSE}
#Here we are reading in pre-processed labeling data. set eval to TRUE for the above chunks and disable this command to interactively run ML training
Tracking$labels <- readRDS("example/knitdat.ML1.Rds")
```
```{r, warning=FALSE, eval = FALSE}
Tracking <- ClassifyBehaviors(Tracking,model,MLData$parameters)
```
```{r}
PlotLabels(Tracking)
```
As we can see, the classifier performs closely to the original manual tracking. However, we have to be very careful here. We just trained with the same data that we tested on, and we used data from one single video. The results are therefore not solid. In the next section, we will see how we can use a one vs. all approach to create multiple training and testing sets, that can be used to gain a fairly accurate read-out of our models performance.
## Training a classifier and cross validating it
In order to train more robust classifiers we need more trainings data. Here we again use the FST example dataset, for which we have 5 annotated videos. In general, this amount of trainings data is not sufficient to reach a good performance, but it will serve as a practical example that can be run in a short enough time here. We start by defining our `pipeline()`.
```{r, warning=FALSE}
labeling.data <- read.table("example/FST/Lables/FSTLables_Oliver.csv",sep = ";", header = T)
pointinfo <- read.table("example/FST/FST_pointinfo.csv", sep = ";", header = T)
files <- unique(labeling.data$CSVname)
path <- "example/FST/DLC_Data/"
pipeline <- function(path){
Tracking <- ReadDLCDataFromCSV(path, fps = 25)
Tracking <- CutTrackingData(Tracking,start = 300,end = 300)
Tracking <- CalibrateTrackingData(Tracking, "distance",in.metric = 20, c("t","b"))
Tracking <- CleanTrackingData(Tracking, likelihoodcutoff = 0.95)
Tracking <- AddLabelingData(Tracking, labeling.data[labeling.data$CSVname == Tracking$filename,])
Tracking <- CalculateAccelerations(Tracking)
Tracking <- CreateAccelerationFeatures(Tracking)
Tracking <- CreateTrainingSet(Tracking, integration_period = 20)
Tracking <- AddPointInfo(Tracking,pointinfo)
Tracking <- FSTAnalysis(Tracking, cutoff_floating = 0.03, integration_period = 5, points = "bodycentre", Object = "Mouse")
return(Tracking)
}
```
And now, we run this pipeline for all files and save the results in a `list()` of tracking objects
```{r, warning=FALSE, results = 'hide'}
TrackingAll <- RunPipeline(files,path,FUN = pipeline)
```
For the one vs. all evaluation, we write a loop that trains a model for each file (using all the other files as training data) and then evaluates this model with the one single file. It will repeat this process for each file and add the classification results to each file.
```{r, warning=FALSE, results = 'hide', eval = FALSE}
evaluation <- list()
for(i in names(TrackingAll)){
#combine training data from all files except the one we want to evaluate
MLData <- CombineTrainingsData(TrackingAll[names(TrackingAll) != i],shuffle = TRUE)
#initialize model
model <- keras_model_sequential()
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(MLData$parameters$N_input),kernel_regularizer = regularizer_l2(l = 0)) %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 128, activation = 'relu',kernel_regularizer = regularizer_l2(l = 0)) %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = MLData$parameters$N_features, activation = 'softmax')
#define optimizer
model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
#train model
history <- model %>% fit(
MLData$train_x, MLData$train_y,
epochs = 5, batch_size = 32,
validation_split = 0
)
#we now use this model to classify the behavior in the cross validation file
TrackingAll[[i]] <- ClassifyBehaviors(TrackingAll[[i]],model,MLData$parameters)
}
```
We quickly inspect how it performed for the first 2 files:
```{r, include = FALSE}
#Here we are reading in pre-processed labeling data. set eval to TRUE for the above chunks and disable this command to interactively run ML training
labs <- readRDS(file = "example/LabsFST.Rds")
for(i in names(TrackingAll)){
TrackingAll[[i]]$labels$classifications <- labs[[i]]$classifications
}
```
```{r}
PlotLabels(TrackingAll$FST_1.csv)
PlotLabels(TrackingAll$FST_2.csv)
```
As we can see, on the first glance the classifications look fairly accurate. To gain a more descriptiv readout we can evaluate our results within files and for the overall experiment. The second command will create a PDF document with all LabelPlots in the working directory.
```{r}
EvaluateClassification(TrackingAll)
PlotLabels.Multi.PDF(TrackingAll)
```
As we can see, for floating we achieve an overall accuracy of ~80% and for None ~94%
If we want to investigate how well the classifier imitates the human experimenter across multiple files we can create a correlation plot of the final readouts. Here we are interested in the floating time for the different labeling approaches:
```{r}
CorrelationPlotLabels(TrackingAll, include = c("manual.Floating.time","classifications.Floating.time","cutoff.floating.Floating.time"))
```
As we can see, training with 4 files only leads to a classifier that correlates worse to the manual scoring than the pre-set cutoff. To increase the correlation we would want to increase the size of trainings data substantially.
DLCAnalyzer can also run unsupervised clusterin methods. Here we use kmeans clustering on our machine learning data to see if this approach can resolve any behavioral syllables that are floating like.
```{r, results = 'hide', width=10, fig.height=8}
TrackingAll <- UnsupervisedClusteringKmeans(TrackingAll,N_clusters = 10,Z_score_Normalize = TRUE)
PlotLabels(TrackingAll$FST_2.csv)
```
As we see there is one cluster that seems to be very similar to the classifier, manual and floating cutoff. Let's plot a correlation matrix to see if this holds over all files.
```{r, fig.width=10, fig.height=8}
CorrelationPlotLabels(TrackingAll, include = c("manual.Floating.time",
"classifications.Floating.time",
"cutoff.floating.Floating.time",
"unsupervised.1.time",
"unsupervised.2.time",
"unsupervised.3.time",
"unsupervised.4.time",
"unsupervised.5.time",
"unsupervised.6.time",
"unsupervised.7.time",
"unsupervised.8.time",
"unsupervised.9.time",
"unsupervised.10.time"))
```
## Functions Glossary
Main functions, intended to be used by user:
Function | Description
------------- | -------------
`AddBinData()` | Adds bin data to an object of type TrackingData
`AddLabelingData()` | Adds labels to an object of type TrackingData
`AddOFTZones()`| Adds OFT zones to an object of type TackingData
`AddPointInfo()` | Adds point info to an object of type Trackingdata
`AddZones()` | Adds zones to an object of type TrackingData
`AddZonesToPlots()` | Adds zones to a density path plot. Requires existing zones
`BinAnalysis()` | Runs a analysis on an object of type TrackingData in bins. Requires existing bin data
`CalculateAccelerations()` | Adds accelerations to an object of type TrackingData
`CalculateMovement()` | Adds movement metrics to an object of type TrackingData
`CalibrateTrackingData()` | Calibrates an object of Type TrackingData
`ClassifyBehaviors()` | Classifies behaviors in an object of type TrackingData using a trained keras model. Requires existing trainings/testing data. REQUIRES KERAS LIBRARY!
`CleanTrackingData()` | Cleanes up an object of type TrackingData
`CombineLabels()` | Combines labels in an object of type TrackingData. Requires existing labels
`CombineTrainingsData()` | Combines and prepares trainings data of multiple objects of type Trackingdata. Requires existing trainings data. REQUIRES KERAS LIBRARY!
`CorrelationPlotLabels()` | Creates a correlation plot of different labels in a list of TrackingData objects. Requires existing labels
`CreateAccelerationFeatures()` | Creates acceleration based features for an object of type TrackingData. Requires any DLC network of original publication. Requires existing acceleration data
`CreateSkeletonData()` | Creates features based on skeleton data for an object of type TrackingData. Requires any DLC network of original publication
`CreateSkeletonData_FST_v2()` | Creates features based on skeleton data for an object of type TrackingData. Requires FST DLC network from original publication. Requires existing acceleration data
`CreateSkeletonData_OFT()` | Creates features based on skeleton data for an object of type TrackingData. Requires OFT DLC network from original publication. Requires existing OFT zones
`CreateSkeletonData_OFT_v2()` | Creates features based on skeleton data for an object of type TrackingData. Requires OFT DLC network from original publication. Requires existing OFT zones. requires existing acceleration data
`CreateSkeletonData_OFT_v3()` | Creates features based on skeleton data for an object of type TrackingData. Requires OFT DLC network from original publication. Requires existing OFT zones. Requires existing acceleration data
`CreateTestSet()` | Adds machine learning test data for an object of type TrackingData. requires existing features
`CreateTrainingSet()`| Adds machine learning training data for an object of type Trackingdata. Requires existing features and labeling data.
`CutTrackingData()`| Cuts an object of type TrackingData
`EPMAnalysis()`| Performs a EPM analysis on an object of type TrackingData. requires EPM zones.
`EvaluateClassification()`| Evaluates classification performance/accuracies of and object or a list of objects of type TrackingData
`ExtractLabels()`| Extract labels from a long format data frame by multiple selection criterias
`FSTAnalysis()`| Runs a FST analysis on an object of type TrackingData. Requires
`GetAngleClockwise()`| Gets the clock wise angle between a vector pair from an object of type TrackingData at each frame
`GetAngleTotal()`| Gets the total angle between two vectors from an object of type TrackingData at each frame
`GetDistances()`| Gets the distance between two points from an object of type TrackingData at each frame
`GetDistanceToZoneBorder()`| Gets the distance between a point an the border of a zone from an object of type TrackingData at each frame
`GetPolygonAreas()`| Gets area of a polygon from an object of type TrackingData at each frame
`HeadAngleAnalysis()`| Performs a head angle analysis for an object of type TrackingData
`IsInZone()`| Checks if a point is in a zone for an object of type TrackingData at each frame
`IsTrackingData()`| Checks if object is of type TrackingData
`LabelReport()`| Creates a Label report for an object of type TrackingData. requires labels.
`MedianMouseArea()`| Calculates the median mouse area for an object of type TrackingData
`MedianMouseLength()`| Calculates the median mouse length for an object of type TrackingData
`MultiFileBinanalysis()`| Performs a bin analysis for a list() of files of type TrackingData. requires bin data.
`MultiFileReport()`| Produces a combined report for a list() of files of type TrackingData.
`NormalizeZscore()`| Performs a Z score normalization on a matrix by columns
`NormalizeZscore_median()`| Performs a median Z score normalization on a matrix by columns
`OFTAnalysis()`| Performs a OFT analysis on an object of type TrackingData. requires OFT zones.
`OverviewPlot()`| Creates an overview plot for an object of type TrackingData
`PlotDensityPaths()`| For an object of type TrackingData, plots the density path of a point
`PlotDensityPaths.Multi.PDF()`| For a list() of objects of type TrackingData. Prints density path plots to a pdf.
`PlotLabels()`| For an object of type TrackingData, plots label data
`PlotLabels.Multi.PDF()`| For a list() of objects of type TrackingData. Prints label plots to a pdf.
`PlotPointData()`| Plots point data for an object of type TrackingData.
`PlotZones()`| Plots zones of an object of type TrackingData. requires zones
`PlotZoneSelection()`| Plots the zone selection for an object of type TrackingData. requires zones
`PlotZoneVisits()`| Plots zone visits for an object of type TrackingData. requires zones
`PlotZoneVisits.Multi.PDF()`| For a list() of objects of type TrackingData. Prints zone visit plots to a pdf. requires zones
`PrepareMLData()`| Prepares x_train and y_train data for a keras network. REQUIRES KERAS LIBRARY!
`ReadDLCDataFromCSV()`| Creates an object of type TrackingData from a DLC .csv output file
`RotateTrackingData()`| Rotates an object of type TrackingData
`RunPipeline()`| Runs a specified pipeline that loads and processes multiple objects of type TrackingData
`ScaleFeatures()`| Linerly scales feature data of an object of type TrackingData. requries feature data.
`SmoothLabels()`| Smooths all labeling data of an object of type TrackingData. requires labels
`UnsupervisedClusteringKmeans()`| Performs a k means clustering on an object (or list() of objects) of type TrackingData
`ZoneReport()`| Creates a zone report for an object of type TrackingData
`ZscoreNormalizeFeatures()`| Performs Z score normalization on features of an object of type TrackingData. Requires Features
Auxiliary functions:
Function | Description
------------- | -------------
`AreaPolygon2d()` | Calculates the area of a polygon
`AreaPolygon3d()` | Calculates the area of a polygon at each frame
`avgbool()` | Logical rolling mean of a boolean vector
`avgmean()` | Numeric rolling mean of a numeric vector
`CalculateTransitions()` | Calculates transitions for a boolean vector
`Distance2d()`| Distance between two points
`DistanceToPolygon()`| Distance from a point to the closest polygon edge
`EqualizeTrainingSet()`| Equalizes a training set
`integratevector()`| integration of a numeric vector
`periodsum()`| For a numeric vector, sums up total values within a time period at each element
`RecenterPolygon()`| Recenters a polygon
`ScalePolygon()`| Linearly scales a polygon
`SmoothLabel()`| Smooths a single character vector with label data