The aim of this project is to develop a high-quality video annotation tool for computer vision and machine learning applications with the following desiderata:
- Simple and efficient to use for a non-expert.
- Supports multiple annotation types including temporal segments, object bounding boxes, semantic and instance regions, tracklets, and human pose (skeleton).
- Runs in a browser without external libraries or the need for server-side processing, but makes it easy to plug in a back-end for heavy "in-the-loop" processing (e.g., segments from bounding boxes or frame completion from partial labels).
- Integrates easily with crowd-sourced annotation services (e.g., Amazon Mechanical Turk).
- Compatible with all (most) modern browsers and operating systems including tablets.
- Secure. Data does not need to leave the local machine (since there is no server-side processing).
- Open-source.
To annotate a local video, just open one of the host servers, open the video, and you are good to go!
To annotate a video hosted elsewhere, you need to deploy Vidat first and then use URL parameters to load the video into Vidat. Please note that Vidat does not support online YouTube videos due to Cross-Origin Resource Sharing (CORS) restrictions.
To use Vidat with a crowd-sourced annotation service (e.g., MTurk):
- Prepare tasks
  - Deploy Vidat on a server that can access the videos and annotation (config) files.
  - Generate a URL for each task (see the sketch after this list), e.g.
    https://example.com?annotation=task1.json&submitURL=http%3A%2F%2Fexample2.com%3Ftoken%3D123456
- Dispatch tasks on MTurk
  - Create a new MTurk task from the survey template and replace the survey link with the task link.
  - Create a batch with the generated URLs.
- Collect submissions
  - Build an independent API backend that handles submissions (see /tools/backend/ for a simple implementation).
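Because the submit URL is itself passed as a query parameter, it must be URL-encoded, as in the example above. The following is a minimal sketch of generating one task URL per annotation file; the host names, task file names, and token scheme are placeholders, not part of Vidat.

// Sketch only: generate one Vidat task URL per annotation file.
// VIDAT, COLLECT, the task list and the token scheme are all placeholders.
const VIDAT = 'https://example.com'        // where Vidat is deployed
const COLLECT = 'http://example2.com'      // your submission endpoint

const tasks = ['task1.json', 'task2.json', 'task3.json']
const urls = tasks.map((task, token) => {
  const submitURL = `${COLLECT}?token=${token}`
  // submitURL contains special characters, so it must be URL-encoded
  return `${VIDAT}?annotation=${task}&submitURL=${encodeURIComponent(submitURL)}`
})
console.log(urls.join('\n'))               // paste these into the MTurk batch CSV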
Submission API:
Request
POST <submitURL>
content-type: application/json

<annotation>

Response
content-type: application/json

{
  type: '',      // color: "primary" (default) | "secondary" | "accent" | "dark" | "positive" | "negative" | "info" | "warning"
  message: '',   // notify the user (required)
  clipboard: ''  // copy to user's clipboard (optional)
}
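For reference, here is a minimal sketch of a backend that implements this API using Node's built-in http module. The port, file naming scheme, and permissive CORS headers are illustrative assumptions; for a real deployment adapt them or start from /tools/backend/.

// Sketch only: a minimal submission endpoint matching the API above.
// Assumes Node.js >= 18; the port, file names and open CORS policy are
// illustrative, not part of Vidat.
import { createServer } from 'http'
import { writeFile } from 'fs/promises'

const CORS = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': 'Content-Type'
}

createServer(async (req, res) => {
  if (req.method === 'OPTIONS') {           // CORS preflight from the browser
    res.writeHead(204, CORS)
    return res.end()
  }
  if (req.method !== 'POST') {
    res.writeHead(405, CORS)
    return res.end()
  }
  let body = ''
  for await (const chunk of req) body += chunk
  try {
    const annotation = JSON.parse(body)
    const filename = `submission-${Date.now()}.json`
    await writeFile(filename, JSON.stringify(annotation, null, 2))
    res.writeHead(200, { 'Content-Type': 'application/json', ...CORS })
    res.end(JSON.stringify({
      type: 'positive',                     // notification colour shown by Vidat
      message: `Annotation saved as ${filename}`,
      clipboard: filename                   // copied to the annotator's clipboard
    }))
  } catch {
    res.writeHead(400, { 'Content-Type': 'application/json', ...CORS })
    res.end(JSON.stringify({ type: 'negative', message: 'Invalid annotation' }))
  }
}).listen(8080)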
Note that this is only necessary if you want to do development or host your own version of the tool. If you just want to label videos then you can use one of the host servers linked to above (data will remain on your local machine; it will not be sent to the host server).
- Download our latest release. Note that the pre-release is automatically generated and should not be used in production.
- Unzip all files and put them behind a web server (Nginx, Apache, etc.). Note that opening index.html directly in your file explorer does not work; the files must be served over HTTP.
- Open the served URL in your favourite browser.
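If you only want to try a build locally without setting up Nginx or Apache, any static file server will do; the sketch below is one throwaway option (written for this document, not shipped with Vidat) that serves the unzipped files from the current directory.

// Sketch only: a throwaway static server for trying a build locally.
// Run it from the unzipped release directory (Node.js >= 18 assumed);
// it is not a substitute for a proper web server in production.
import { createServer } from 'http'
import { readFile } from 'fs/promises'
import { extname, join, normalize } from 'path'

const MIME: Record<string, string> = {
  '.html': 'text/html', '.js': 'text/javascript', '.css': 'text/css',
  '.json': 'application/json', '.mp4': 'video/mp4', '.svg': 'image/svg+xml'
}

createServer(async (req, res) => {
  const urlPath = decodeURIComponent((req.url ?? '/').split('?')[0])
  const file = normalize(join('.', urlPath === '/' ? 'index.html' : urlPath))
  try {
    const data = await readFile(file)
    res.writeHead(200, { 'Content-Type': MIME[extname(file)] ?? 'application/octet-stream' })
    res.end(data)
  } catch {
    res.writeHead(404)
    res.end('Not found')
  }
}).listen(8080, () => console.log('Serving on http://localhost:8080'))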
All keys and values are case-insensitive.
Note: if you are using an external URL for annotation, video, config or submitURL, please make sure it satisfies Cross-Origin Resource Sharing (CORS) requirements, and URL-encode it if it contains any special characters.
annotation
Default: null
Example: /annotation/example.json, http://example/static/annotation/example.json
Path to the annotation file. Vidat will load the video, annotation and configuration from this file. This parameter has higher priority than video, config, defaultFPS and defaultFPK. Please refer to File Formats - Annotation for format details.
video
Default: null
Example: /video/example.mp4, http://example/static/video/example.mp4
Path to the video file. Please refer to decoder for more information.
config
Default: null
Example: /config/example.json, http://example/static/config/example.json
Path to the config file. Please refer to File Formats - Config for format details.
mode
Default: null
Example: object | region | skeleton
Specifies the current annotation mode for Vidat.
zoom
Default: false
Example: true | false
Whether to turn zoom on.
sensitivity
Default: hasTouch ? 10 : 5
Example: Integer >= 1
The number of pixels of tolerance between your mouse and an annotation when detecting points / edges.
defaultFPS
Default: 10
Example: 1 <= Integer <= 60
The default number of frames per second used when extracting frames from the given video.
defaultFPK
Default: 50
Example: Integer >= 1
The default number of frames per keyframe used when generating keyframes.
decoder
Default: auto
Example: auto | v1 | v2
The video decoder used for frame extraction.
v1 uses a <canvas> as the video decoder, following a pause - draw - play - wait-for-timeupdate strategy. It is the most reliable and compatible method for most cases, but it is slow and computationally inefficient.
v2 uses WebCodecs.VideoDecoder, which takes advantage of the native video decoder built into the browser. It is much faster than v1 but lacks support in older browsers.
auto lets Vidat determine which one to use for you.
See the VideoLoader Wiki for details; a sketch of the v1 strategy is given after the examples below.
showObjects
Default: true
Example: true | false
Whether to show object mode related components.
showRegions
Default: true
Example: true | false
Whether to show region mode related components.
showSkeletons
Default: true
Example: true | false
Whether to show skeleton mode related components.
showActions
Default: true
Example: true | false
Whether to show action annotation related components.
muted
Default: true
Example: true | false
Whether to mute the video when playing.
grayscale
Default: false
Example: true | false
Whether to display the video in grayscale.
showPopup
Default: true
Example: true | false
Whether to show the quick popup when finishing annotating an object/region/skeleton.
submitURL
Default: null
Example: submitURL=http%3A%2F%2Fexample.com%3Ftoken%3D123456
URL used for submitting the annotation file (must be URL-encoded).
Examples
http://example.com?showObjects=false&showRegions=false&showSkeletons=false
This will show actions only.
http://example.com?mode=skeleton&showPopup=false
This will set the current mode to skeleton and disable the popup window.
http://example.com/index.html?submitURL=http%3A%2F%2Fexample.com%3Ftoken%3D123456
This will show a button in the side menu which will POST the annotation file to http://example.com?token=123456.
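To make the v1 strategy concrete, the sketch below is a heavily simplified illustration written for this document (it is not Vidat's actual VideoLoader code): playback is paused on each timeupdate event, the paused frame is drawn onto a canvas, and playback then resumes until the next event.

// Sketch only: the pause - draw - play - wait-for-timeupdate idea behind the
// v1 decoder. The real VideoLoader also handles frame timing, FPS and cleanup.
function extractFramesV1(video: HTMLVideoElement): Promise<ImageBitmap[]> {
  const canvas = document.createElement('canvas')
  canvas.width = video.videoWidth
  canvas.height = video.videoHeight
  const ctx = canvas.getContext('2d')!
  const frames: ImageBitmap[] = []
  return new Promise(resolve => {
    video.addEventListener('timeupdate', async () => {
      video.pause()                        // pause ...
      ctx.drawImage(video, 0, 0)           // ... draw the current frame ...
      frames.push(await createImageBitmap(canvas))
      if (video.ended || video.currentTime >= video.duration) {
        resolve(frames)
      } else {
        video.play()                       // ... and play until the next timeupdate
      }
    })
    video.muted = true
    video.play()
  })
}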
Config
{
  "objectLabelData": [
    {
      "id": 0,
      "name": "default",
      "color": "<color>"
    }
  ],
  "actionLabelData": [
    {
      "id": 0,
      "name": "default",
      "color": "<color>",
      "objects": [0]
    }
  ],
  "skeletonTypeData": [
    {
      "id": 0,
      "name": "default",
      "description": "",
      "color": "<color>",
      "pointList": [
        {
          "id": 0,
          "name": "point 1",
          "x": -10,
          "y": 0
        },
        {
          "id": 1,
          "name": "point 2",
          "x": 10,
          "y": 0
        }
      ],
      "edgeList": [
        {
          "id": 0,
          "from": 0,
          "to": 1
        }
      ]
    }
  ]
}
See public/config/example.json for an example.
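For reference, the documented fields can also be described with the following TypeScript interfaces. These are inferred from the example above for use when generating config files programmatically; they are not types shipped with Vidat.

// Sketch only: types inferred from the documented config format above.
interface ObjectLabel { id: number; name: string; color: string }
interface ActionLabel { id: number; name: string; color: string; objects: number[] }  // objects: applicable object label ids
interface SkeletonPoint { id: number; name: string; x: number; y: number }
interface SkeletonEdge { id: number; from: number; to: number }  // from/to: point ids in pointList
interface SkeletonType {
  id: number
  name: string
  description: string
  color: string
  pointList: SkeletonPoint[]
  edgeList: SkeletonEdge[]
}
interface VidatConfig {
  objectLabelData: ObjectLabel[]
  actionLabelData: ActionLabel[]
  skeletonTypeData: SkeletonType[]
}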
Annotation
{
  "version": "2.0.0",
  "annotation": {
    "video": {
      "src": "<path to video>",
      "fps": "fps",
      "frames": 0,
      "duration": 0,
      "height": 0,
      "width": 0
    },
    "keyframeList": [0],
    "objectAnnotationListMap": {
      "0": [
        {
          "instance": 0,
          "score": 0,
          "labelId": 0,
          "color": "<color>",
          "x": 0,
          "y": 0,
          "width": 0,
          "height": 0
        }
      ]
    },
    "regionAnnotationListMap": {
      "0": [
        {
          "instance": 0,
          "score": 0,
          "labelId": 0,
          "color": "<color>",
          "pointList": [
            {
              "x": 0,
              "y": 0
            },
            {
              "x": 0,
              "y": 0
            },
            {
              "x": 0,
              "y": 0
            }
          ]
        }
      ]
    },
    "skeletonAnnotationListMap": {
      "0": [
        {
          "instance": 0,
          "score": 0,
          "centerX": 0,
          "centerY": 0,
          "typeId": 0,
          "color": "<color>",
          "_ratio": 1,
          "pointList": [
            {
              "id": -1,
              "name": "center",
              "x": 0,
              "y": 0
            },
            {
              "id": 0,
              "name": "point 1",
              "x": -10,
              "y": 0
            },
            {
              "id": 1,
              "name": "point 2",
              "x": 10,
              "y": 0
            }
          ]
        }
      ]
    },
    "actionAnnotationList": [
      {
        "start": 0,
        "end": 0,
        "action": 0,
        "object": 0,
        "color": "<color>",
        "description": ""
      }
    ]
  },
  "config": "<config>"
}
See public/annotation/example.json for an example.
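Because the annotation file is plain JSON, downstream training or analysis code can consume it directly. The sketch below is one assumed usage pattern (not a tool shipped with Vidat): it flattens objectAnnotationListMap into one record per bounding box; the input path and output shape are placeholders.

// Sketch only: flatten the object (bounding box) annotations from an
// annotation file into one record per box. Field names follow the format
// documented above; the file path and output shape are just examples.
import { readFileSync } from 'fs'

const annotationFile = JSON.parse(readFileSync('annotation.json', 'utf8'))
const { objectAnnotationListMap } = annotationFile.annotation

const boxes = Object.entries(objectAnnotationListMap).flatMap(
  ([frame, list]: [string, any[]]) =>
    list.map(obj => ({
      frame: Number(frame),        // key frame index the box belongs to
      labelId: obj.labelId,
      instance: obj.instance,
      x: obj.x,
      y: obj.y,
      width: obj.width,
      height: obj.height
    }))
)
console.log(`${boxes.length} bounding boxes`)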
See Wiki for details.
See Design Wiki for details.
If you use Vidat for your research and wish to reference it, please use the following BibTex entry:
@misc{zhang2020vidat,
  author = {Jiahao Zhang and Stephen Gould and Itzik Ben-Shabat},
  title = {Vidat---{ANU} {CVML} Video Annotation Tool},
  howpublished = {\url{https://github.com/anucvml/vidat}},
  year = {2020}
}