Commit

update data docs
phython96 committed Dec 1, 2024
1 parent 1cb7d6c commit 0c11ac2
Showing 19 changed files with 276 additions and 42 deletions.
16 changes: 15 additions & 1 deletion docs/source/_static/custom.css
@@ -13,4 +13,18 @@
/* Adjust the width of the right-side toolbar */
.bd-toc {
width: 15rem; /* right-side toolbar width */
}
}

div.admonition.admonition-youtube {
border-color: hsl(0deg 100% 50%); /* YouTube red */
}

div.admonition.admonition-youtube > .admonition-title {
background-color: hsl(0deg 99% 18%);
color: white;
}

div.admonition.admonition-youtube > .admonition-title::after {
color: hsl(0deg 100% 50%);
content: "\f26c"; /* fa-solid fa-tv */
}
File renamed without changes.
8 changes: 8 additions & 0 deletions docs/source/data/dataset-event.md
@@ -0,0 +1,8 @@
<!--
* @Date: 2024-12-01 08:37:03
* @LastEditors: caishaofei caishaofei@stu.pku.edu.cn
* @LastEditTime: 2024-12-01 08:39:05
* @FilePath: /MineStudio/docs/source/data/dataset-event.md
-->

# Event Dataset
8 changes: 8 additions & 0 deletions docs/source/data/dataset-raw.md
@@ -0,0 +1,8 @@
<!--
* @Date: 2024-12-01 08:37:10
* @LastEditors: caishaofei caishaofei@stu.pku.edu.cn
* @LastEditTime: 2024-12-01 08:39:13
* @FilePath: /MineStudio/docs/source/data/dataset-raw.md
-->

# Raw Dataset
118 changes: 116 additions & 2 deletions docs/source/data/index.md
@@ -1,7 +1,121 @@
<!--
* @Date: 2024-11-29 08:08:34
* @LastEditors: caishaofei caishaofei@stu.pku.edu.cn
* @LastEditTime: 2024-11-29 08:10:36
* @LastEditTime: 2024-12-01 08:38:35
* @FilePath: /MineStudio/docs/source/data/index.md
-->
# Data

We design a trajectory data structure for storing Minecraft data. With it, users can efficiently store and retrieve arbitrary trajectory segments.

```{toctree}
:caption: MineStudio Data
dataset-raw
dataset-event
```

## Quick Start
````{include} quick-data.md
````

## Data Structure

We group and save the data by modality. Each modality's data is a sequence over time, and sequences from different modalities can be aligned by time step. For example, the "action" modality stores the mouse and keyboard actions taken at each time step of the trajectory, and the "video" modality stores the observation returned by the environment at each time step.

```{note}
The data of different modalities is stored independently. This has two benefits: (1) users can selectively read only the modalities they need; (2) users can easily add new modalities to the dataset without affecting the existing data.
```
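A toy, in-memory model of this layout may make the two benefits concrete. It is only an illustration, not MineStudio's storage code: `ModalityStore`, `add_modality`, and `read` are hypothetical names, and the action dictionaries are placeholder values.

```python
from typing import Any, Dict, List


class ModalityStore:
    """Toy model (hypothetical): each modality is an independent, time-ordered sequence."""

    def __init__(self) -> None:
        # modality name -> per-frame values; index i is time step i in every modality
        self.modalities: Dict[str, List[Any]] = {}

    def add_modality(self, name: str, values: List[Any]) -> None:
        # Adding a modality never rewrites existing ones; we only check time alignment.
        for other in self.modalities.values():
            if len(other) != len(values):
                raise ValueError("modalities must have the same number of frames")
        self.modalities[name] = values

    def read(self, names: List[str], start: int, end: int) -> Dict[str, List[Any]]:
        # Only the requested modalities are touched.
        return {name: self.modalities[name][start:end] for name in names}


store = ModalityStore()
store.add_modality("video", ["frame0", "frame1", "frame2"])
store.add_modality("action", [{"forward": 1}, {"attack": 1}, {"jump": 1}])  # placeholder actions
print(store.read(["action"], 1, 3))  # {'action': [{'attack': 1}, {'jump': 1}]}
```

Because `read` only looks up the requested keys, asking for actions never touches video, and `add_modality` leaves existing sequences unchanged.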

Each modality's sequence is stored in segments of a fixed length (e.g., 32), which simplifies reading and storing the data.

```{note}
For video data, random access is usually slow because frames must be decoded when they are read. At one extreme, saving every frame as an individual image gives fast reads but takes up a large amount of storage space.
We adopt a compromise: video data is saved as short video segments, which keeps reads relatively fast without occupying too much storage space. When a user wants to read a sequence of consecutive frames, we only need to retrieve the corresponding segments and decode them.
```
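Because all segments have the same length, finding the segments that cover a window of frames is plain integer arithmetic. A minimal sketch, assuming frames are numbered from 0 within an episode and a segment length of 32; `chunks_for_window` and `locate_frame` are illustrative helpers, not part of MineStudio:

```python
from typing import List, Tuple


def chunks_for_window(start_frame: int, win_len: int, chunk_size: int = 32) -> List[int]:
    """Ids of the fixed-length segments covering frames [start_frame, start_frame + win_len)."""
    if win_len <= 0:
        return []
    first = start_frame // chunk_size
    last = (start_frame + win_len - 1) // chunk_size  # segment holding the last frame
    return list(range(first, last + 1))


def locate_frame(frame_idx: int, chunk_size: int = 32) -> Tuple[int, int]:
    """Map a frame index to (segment id, offset inside that segment)."""
    return divmod(frame_idx, chunk_size)


print(chunks_for_window(11, 33))  # [0, 1]
print(locate_frame(43))           # (1, 11)
```

In this model, reading 33 frames starting at frame 11 (the values used in the `read_frames` call later on this page) only requires decoding two segments, not the whole episode.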

![](./read_video_fig.png)

````{dropdown} <i class="fa-solid fa-lightbulb" height="35px" width="20px"></i> Learn more about the details
Segmented sequence data is stored in individual [lmdb](https://lmdb.readthedocs.io/en/release/) files, each of which contains the following metadata:
```python
{
"__num_episodes__": int, # the total number of episodes in this lmdb file
"__num_total_frames__": int, # the total number of frames in this lmdb file
"__chunk_size__": int, # the length of each segment (e.g. 32)
"__chunk_infos__": dict # information about the episodes stored in this lmdb file, e.g. their start and end indices and names
}
```
Once you know the episode index and the segment (chunk) id you want to read, you can fetch the corresponding segment bytes from the lmdb file by key and decode them to get the data.
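```python
import lmdb

lmdb_handler = lmdb.open("/path/to/segments.lmdb", readonly=True, lock=False)  # hypothetical path
with lmdb_handler.begin() as txn:
    key = str((episode_idx, chunk_id)).encode()
    chunk_bytes = txn.get(key)  # None if the key is missing
```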
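The key in the snippet above is just the string form of the `(episode_idx, chunk_id)` tuple, encoded to bytes. The sketch below shows that encoding and its inverse; a plain dict stands in for the lmdb transaction so that it runs without a database, and `chunk_key`, `parse_chunk_key`, and `fake_db` are illustrative names. Real segment values are modality-specific encoded bytes, represented here by a placeholder.

```python
import ast
from typing import Dict, Tuple


def chunk_key(episode_idx: int, chunk_id: int) -> bytes:
    # Same encoding as the lookup snippet: the tuple's string form, UTF-8 encoded.
    return str((episode_idx, chunk_id)).encode()


def parse_chunk_key(key: bytes) -> Tuple[int, int]:
    # Inverse of chunk_key; literal_eval safely turns "(3, 7)" back into a tuple.
    episode_idx, chunk_id = ast.literal_eval(key.decode())
    return episode_idx, chunk_id


# A plain dict stands in for the lmdb key-value store.
fake_db: Dict[bytes, bytes] = {chunk_key(3, 7): b"<segment bytes>"}
print(chunk_key(3, 7))               # b'(3, 7)'
print(parse_chunk_key(b"(3, 7)"))    # (3, 7)
print(fake_db.get(chunk_key(3, 7)))  # b'<segment bytes>'
```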
```{hint}
In fact, you don't need to worry about any of this: we have packaged these operations for you, and you only need to call the corresponding API. The class responsible for managing these details is `minestudio.data.minecraft.core.LMDBDriver`.
```
With ``LMDBDriver``, you can perform the following operations on an lmdb file:
- Get the trajectory list:
```python
trajectory_list = lmdb_driver.get_trajectory_list()
```
- Get the total number of frames in several trajectories:
```python
lmdb_driver.get_total_frames([
"trajectory_1",
"trajectory_2",
"trajectory_3"
])
```
- Read a sequence of frames from a trajectory:
```python
frames, mask = lmdb_driver.read_frames(
eps="trajectory_1",
start_frame=11,
win_len=33,
merge_fn=merge_fn,
extract_fn=extract_fn,
padding_fn=padding_fn,
)
```
```{note}
``merge_fn``, ``extract_fn``, and ``padding_fn`` are functions that are used to process the data and are specific to the data modality.
```
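The signatures of these functions are not spelled out here, so the following is only a sketch of how such a three-stage read could fit together, under these assumptions: the segments covering the window are already decoded into Python lists, the window may run past the end of the episode, and the returned mask marks real frames with 1 and padding with 0. `read_window`, `merge_list`, `extract_list`, and `pad_list` are hypothetical names; their signatures are not those of `LMDBDriver.read_frames`.

```python
from typing import Any, Callable, List, Tuple


def read_window(
    chunks: List[List[Any]],
    first_chunk_id: int,
    start_frame: int,
    win_len: int,
    chunk_size: int,
    merge_fn: Callable[[List[List[Any]]], List[Any]],
    extract_fn: Callable[[List[Any], int, int], List[Any]],
    padding_fn: Callable[[List[Any], int], List[Any]],
) -> Tuple[List[Any], List[int]]:
    """Toy windowed read over already-decoded segments (hypothetical)."""
    merged = merge_fn(chunks)                             # join segments into one sequence
    offset = start_frame - first_chunk_id * chunk_size    # window start inside `merged`
    frames = extract_fn(merged, offset, win_len)          # may be short at the episode end
    mask = [1] * len(frames) + [0] * (win_len - len(frames))  # 1 = real frame, 0 = padding
    return padding_fn(frames, win_len), mask


# Hypothetical modality-specific helpers for a list-valued modality.
def merge_list(chunks: List[List[Any]]) -> List[Any]:
    return [item for chunk in chunks for item in chunk]


def extract_list(seq: List[Any], offset: int, n: int) -> List[Any]:
    return seq[offset:offset + n]


def pad_list(seq: List[Any], n: int) -> List[Any]:
    return seq + [None] * (n - len(seq))


# An episode of 40 frames is stored as two segments: 32 frames and 8 frames.
episode = list(range(40))
chunks = [episode[0:32], episode[32:40]]
frames, mask = read_window(chunks, 0, 30, 16, 32, merge_list, extract_list, pad_list)
print(frames)  # frames 30..39 followed by six None placeholders
print(mask)    # ten 1s followed by six 0s
```

For a video modality, a merge step would concatenate frame arrays and a padding step would append blank frames; plain lists keep this sketch dependency-free.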
````

### Built-in Modalities

We provide the following built-in modalities for users to store data:

| Modality | Description | Data Format |
| --- | --- | --- |
| video | Observations returned by the environment | np.ndarray |
| action | Mouse and keyboard actions | Dict |
| contractor info | Information about the contractor | Dict |
| segment info | Information about the segment | Dict |


````{admonition} Video and Segmentation Visualization
:class: dropdown admonition-youtube
<!--
A video example generated by our tool, showing a video and its corresponding segmentation sequence. -->
```{youtube} QYBUxus3esI
```
````


### Build Dataset from Your Collected Trajectories


80 changes: 80 additions & 0 deletions docs/source/data/quick-data.md
@@ -0,0 +1,80 @@
<!--
* @Date: 2024-12-01 08:30:33
* @LastEditors: caishaofei caishaofei@stu.pku.edu.cn
* @LastEditTime: 2024-12-01 08:41:00
* @FilePath: /MineStudio/docs/source/data/quick-data.md
-->
Here is a minimal example showing how to load a trajectory from the dataset.

```python
from minestudio.data import RawDataset

dataset = RawDataset(
dataset_dirs=['/nfs/data/contractors/dataset_7xx'],
enable_video=True,
enable_action=True,
frame_width=224,
frame_height=224,
win_len=128,
split='train',
split_ratio=0.9,
verbose=True
)
item = dataset[0]
print(item.keys())
```

You should see output similar to this:
```
[08:14:15] [Kernel] Driver video load 4617 episodes.
[08:14:15] [Kernel] Driver action load 4681 episodes.
[08:14:15] [Kernel] episodes: 4568, frames: 65291168.
dict_keys(['text', 'timestamp', 'episode', 'progress', 'env_action', 'agent_action', 'env_prev_action', 'agent_prev_action', 'image', 'mask'])
```

```{button-ref} ./dataset-raw
:color: primary
:outline:
:expand:
Learn more about Raw Dataset
```

Alternatively, you can load only the trajectories that contain specific events, for example, all trajectories that contain the ``kill entity`` event.

```python
from minestudio.data import EventDataset

dataset = EventDataset(
dataset_dirs=['/nfs/data/contractors/dataset_7xx'],
enable_video=True,
enable_action=True,
frame_width=224,
frame_height=224,
win_len=128,
split='train',
split_ratio=0.9,
verbose=True,
event_regex='minecraft.kill_entity:.*'
)
item = dataset[0]
print(item.keys())
```

You should see output similar to this:
```
[08:19:14] [Kernel] Driver video load 4617 episodes.
[08:19:14] [Kernel] Driver action load 4681 episodes.
[08:19:14] [Kernel] episodes: 4568, frames: 65291168.
[08:19:14] [Event Kernel] Number of loaded events: 58.
[08:19:14] [Event Dataset] Regex: minecraft.kill_entity:.*, Number of events: 58, number of items: 19652
dict_keys(['text', 'env_action', 'agent_action', 'env_prev_action', 'agent_prev_action', 'image', 'mask'])
```
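`event_regex` is a regular expression applied to event names. Assuming event names look like `minecraft.kill_entity:<entity>` (inferred from the pattern above, not documented here) and that matching behaves like Python's `re.match`, the filtering step can be sketched as follows. `matching_events` and the event names in the list are illustrative, not taken from the dataset.

```python
import re
from typing import List


def matching_events(event_names: List[str], event_regex: str) -> List[str]:
    # Keep the event names that the pattern matches from the start of the string.
    pattern = re.compile(event_regex)
    return [name for name in event_names if pattern.match(name)]


# Made-up event names for illustration only.
events = [
    "minecraft.kill_entity:zombie",
    "minecraft.kill_entity:cow",
    "minecraft.mine_block:oak_log",
]
print(matching_events(events, r"minecraft.kill_entity:.*"))
# ['minecraft.kill_entity:zombie', 'minecraft.kill_entity:cow']
```

The unescaped `.` in `minecraft.kill_entity` matches any character; `r"minecraft\.kill_entity:.*"` is the stricter form. Whether `EventDataset` uses `match`, `search`, or `fullmatch` is not stated here.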

```{button-ref} ./dataset-event
:color: primary
:outline:
:expand:
Learn more about Event Dataset
```
Binary file added docs/source/data/read_video_fig.png
File renamed without changes.
File renamed without changes.
35 changes: 26 additions & 9 deletions docs/source/overview/getting-started.md
@@ -1,7 +1,7 @@
<!--
* @Date: 2024-11-29 08:08:13
* @LastEditors: caishaofei caishaofei@stu.pku.edu.cn
* @LastEditTime: 2024-11-30 13:33:38
* @LastEditTime: 2024-12-01 08:33:39
* @FilePath: /MineStudio/docs/source/overview/getting-started.md
-->
# Getting Started
@@ -34,24 +34,41 @@ Before you start, make sure you have installed [MineStudio](https://github.com/p
## MineStudio Libraries Quickstart

Click on the dropdowns for your desired library to get started:
```{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Simulator: Customizable Minecraft Environment
```{include} quick-simulator.md
````{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Simulator: Customizable Minecraft Environment
```{include} ../simulator/quick-simulator.md
```
```{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Data: Flexible Data Structures and Efficient Data Processing
```{include} quick-data.md
```{button-ref} ../simulator/index
:color: primary
:outline:
:expand:
Learn more about MineStudio Simulator
```
```{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Models: Policy Template and Baseline Models
````

````{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Data: Flexible Data Structures and Fast Dataloaders
```{include} ../data/quick-data.md
```
````

````{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Models: Policy Template and Baselines
```{include} quick-models.md
```
```{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Train: Training Policy with Offline Data
````

````{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Train: Training Policy with Offline Data
```{include} quick-train.md
```
```{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Inference: Parallel Inference and Record Trajectories
````

````{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Inference: Parallel Inference and Record Demonstrations
```{include} quick-inference.md
```
```{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Benchmark: Benchmarking and Evaluation
````

````{dropdown} <img src="../_static/logo-no-text-gray.svg" alt="minestudio" width="35px"> Benchmark: Benchmarking and Evaluation
```{include} quick-benchmark.md
```
````

## Papers

Empty file removed docs/source/overview/quick-data.md
Empty file.
16 changes: 3 additions & 13 deletions docs/source/simulator/index.md
@@ -1,7 +1,7 @@
<!--
* @Date: 2024-11-29 08:09:07
* @LastEditors: caishaofei caishaofei@stu.pku.edu.cn
* @LastEditTime: 2024-11-30 05:12:53
* @LastEditTime: 2024-12-01 08:27:24
* @FilePath: /MineStudio/docs/source/simulator/index.md
-->

@@ -16,19 +16,9 @@ general-information
design-principles
```

## Hello World
## Quick Start

Here is a minimal example of how to use the simulator:

```python
from minestudio.simulator import MinecraftSim

sim = MinecraftSim(action_type="env")
obs, info = sim.reset()
for _ in range(100):
action = sim.action_space.sample()
obs, reward, terminated, truncated, info = sim.step(action)
sim.close()
```{include} quick-simulator.md
```

## Basic Arguments
@@ -1,8 +1,8 @@
<!--
* @Date: 2024-11-30 05:44:44
* @LastEditors: caishaofei caishaofei@stu.pku.edu.cn
* @LastEditTime: 2024-11-30 05:55:49
* @FilePath: /MineStudio/docs/source/overview/quick-simulator.md
* @LastEditTime: 2024-12-01 08:28:50
* @FilePath: /MineStudio/docs/source/simulator/quick-simulator.md
-->

Here is a minimal example of how to use the simulator:
@@ -63,11 +63,3 @@ for i in range(100):
obs, reward, terminated, truncated, info = sim.step(action)
sim.close()
```

```{button-ref} ../simulator/index
:color: primary
:outline:
:expand:
Learn more about MineStudio Simulator
```
File renamed without changes.
7 changes: 4 additions & 3 deletions minestudio/data/__init__.py
@@ -1,7 +1,8 @@
'''
Date: 2024-11-11 15:59:37
LastEditors: caishaofei-mus1 1744260356@qq.com
LastEditTime: 2024-11-12 14:02:41
LastEditors: caishaofei caishaofei@stu.pku.edu.cn
LastEditTime: 2024-12-01 08:11:22
FilePath: /MineStudio/minestudio/data/__init__.py
'''
from minestudio.data.datamodule import MineDataModule
from minestudio.data.minecraft import RawDataset, EventDataset
4 changes: 2 additions & 2 deletions minestudio/data/datamodule.py
@@ -1,7 +1,7 @@
'''
Date: 2024-11-10 12:31:33
LastEditors: caishaofei caishaofei@stu.pku.edu.cn
LastEditTime: 2024-11-28 16:18:25
LastEditTime: 2024-12-01 08:06:07
FilePath: /MineStudio/minestudio/data/datamodule.py
'''

@@ -120,7 +120,7 @@ def val_dataloader(self):
),
batch_size=8,
num_workers=8,
train_shuffle=True,
shuffle_episodes=True,
prefetch_factor=4,
)
data_module.setup()