Comparing changes

base repository: allenai/bi-att-flow (base: master)
head repository: carlosscastro/bi-att-flow (compare: master)
Able to merge. These branches can be automatically merged.

Commits on Mar 25, 2018

  1. 9b5dee4
  2. 7ed0cf0
  3. d0bbb83: add samples and results from almost state of the art results on squad dataset (carlosscastro)

Commits on Mar 28, 2018

  1. 5bef09d: adding report draft (carlosscastro)
  2. f6185db: preprocess and marco skeleton. Not working yet, need to concatenate marco entries and tweak the schema to make it look like squad in terms of output data structure (carlosscastro)
  3. 4e5014f: improving report (carlosscastro)
  4. b9a03f7
  5. 8242a85: improving report (carlosscastro)
  6. 7eb667f: improving report (carlosscastro)

Commits on Apr 15, 2018

  1. c8e4544
  2. 67d8516
  3. 6545b5b
  4. 22dee54
  5. 5ce4dd3
  6. a62f14d
  7. bf75e78
  8. ab70f11

Commits on Apr 16, 2018

  1. 6690fa4
  2. d4d9c3f
  3. 2c64a4e

Commits on Apr 21, 2018

  1. fca5f54: add marco convert tool which converts the marco dataset format to squad format and also add presentation slides (carlosscastro)
  2. 2da2aba
  3. 36c1b72: update report (carlosscastro)
  4. de408c0

Commits on Apr 22, 2018

  1. 7d839f7

Showing with 6,772 additions and 27 deletions.
  1. +19 −26 README.md
  2. BIN docs/Transfer learning for Machine reading comprehension.pdf
  3. BIN docs/Transfer learning for Machine reading comprehension.pptx
  4. +74 −0 docs/acl2018.bib
  5. +539 −0 docs/acl2018.sty
  6. +1,976 −0 docs/acl_natbib.bst
  7. BIN docs/bidaf_architecture.JPG
  8. BIN docs/ccastro_w266_bidaf_transferlearning.pdf
  9. BIN docs/ccastro_w266_bidaf_transferlearning.synctex.gz
  10. +262 −0 docs/ccastro_w266_bidaf_transferlearning.tex
  11. +29 −0 download-marco.sh
  12. BIN marco-data/marco-data-v2.zip
  13. BIN marco-data/marco-data.zip
  14. 0 marco-preprocess/__init__.py
  15. +157 −0 marco-preprocess/aug_squad.py
  16. +271 −0 marco-preprocess/eda_aug_dev.ipynb
  17. +314 −0 marco-preprocess/eda_aug_train.ipynb
  18. +94 −0 marco-preprocess/evaluate-v1.1.py
  19. +94 −0 marco-preprocess/evaluate.py
  20. +242 −0 marco-preprocess/prepro.py
  21. +183 −0 marco-preprocess/prepro_aug.py
  22. +112 −0 marco-preprocess/utils.py
  23. 0 marco/__init__.py
  24. +109 −0 marco/cli.py
  25. +106 −0 marco/ensemble.py
  26. +39 −0 marco/ensemble_fast.py
  27. +422 −0 marco/evaluator.py
  28. +79 −0 marco/graph_handler.py
  29. +230 −0 marco/main.py
  30. +425 −0 marco/model.py
  31. +316 −0 marco/read_data.py
  32. +29 −0 marco/run_ensemble.sh
  33. +27 −0 marco/run_single.sh
  34. +76 −0 marco/templates/visualizer.html
  35. +73 −0 marco/trainer.py
  36. +138 −0 marco/visualizer.py
  37. +1 −0 samples/evaluate_squad.txt
  38. +1 −0 samples/marco_dev_sample.txt
  39. +1 −0 samples/marco_train_sample.txt
  40. +1 −0 samples/squad_train_sample.txt
  41. +3 −1 squad/utils.py
  42. +25 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter.sln
  43. +6 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/App.config
  44. +63 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/MarcoToSquadConverter.csproj
  45. +14 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/MsMarcoPassage.cs
  46. +18 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/MsMarcoPassageQuestionAndAnswer.cs
  47. +104 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/Program.cs
  48. +36 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/Properties/AssemblyInfo.cs
  49. +12 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/SquadAnswer.cs
  50. +10 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/SquadDataset.cs
  51. +12 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/SquadEntry.cs
  52. +12 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/SquadParagraph.cs
  53. +14 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/SquadQuestionAndAnswers.cs
  54. +4 −0 tools/MarcoToSquadConverter/MarcoToSquadConverter/packages.config
45 changes: 19 additions & 26 deletions README.md
@@ -29,6 +29,7 @@ python -m squad.prepro
```

## 2. Training

The model has ~2.5M parameters.
The model was trained with NVidia Titan X (Pascal Architecture, 2016).
The model requires at least 12GB of GPU RAM.
@@ -56,8 +57,8 @@ Note that during the training, the EM and F1 scores from the occasional evaluati
The printed scores are not official (our scoring scheme is a bit harsher).
To obtain the official number, use the official evaluator (copied in the `squad` folder, `squad/evaluate-v1.1.py`). For more information, see 3. Test.
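For example, a usage sketch of that evaluator (both paths below are placeholders: the first is the official SQuAD dev JSON, the second is the prediction JSON written by your test run):

```
python squad/evaluate-v1.1.py path/to/dev-v1.1.json path/to/predictions.json
```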

## 3. Test on the SQuAD dataset

## 3. Test
To test, run:
```
python -m basic.cli
```
@@ -99,43 +100,35 @@ If you are unfamiliar with CodaLab, follow these simple steps (given that you me
```
If you want to run on GPU, you should run the script sequentially by removing '&' in the for loop, or you will need to specify different GPUs for each run of the for loop.
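As a sketch of those two options (the loop below is a placeholder mirroring the test command used elsewhere in this README, not the script this paragraph refers to; the step values are made up, and `CUDA_VISIBLE_DEVICES` is one common way to pin a run to a single GPU):

```
# Option 1: sequential runs on one GPU (no '&'; each run finishes before the next starts)
for step in 10000 20000 30000; do
    python -m basic.cli --mode test --batch_size 8 --eval_num_batches 0 --load_step $step
done

# Option 2: parallel runs, each pinned to its own GPU
gpu=0
for step in 10000 20000 30000; do
    CUDA_VISIBLE_DEVICES=$gpu python -m basic.cli --mode test --batch_size 8 --eval_num_batches 0 --load_step $step &
    gpu=$((gpu + 1))
done
wait
```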

## Results

### Dev Data
## 4. Train over MS-MARCO

Note these scores are from the official evaluator (copied in the `squad` folder, `squad/evaluate-v1.1.py`). For more information, see 3. Test.
The scores that appear during training could be lower than the scores from the official evaluator.
To train over MS-MARCO, copy the documents in `marco-data` within the repository to `$HOME/data/marco`. These documents were created by running the MarcoToSquadConverter tool under `tools` on the original downloaded MS-MARCO dataset. The tool filters the questions we want to study (those whose answer is a subspan of the passage) and converts the format to match the SQuAD dataset, so that it fits the input of the bi-directional attention flow implementation.
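For reference, here is a minimal Python sketch of that conversion idea (it is not the C# tool under `tools`): keep only entries whose answer occurs verbatim as a subspan of a passage, and emit them in SQuAD's `data`/`paragraphs`/`qas` JSON layout. The MS-MARCO field names used below (`query`, `query_id`, `answers`, `passages`, `passage_text`) follow the public MS-MARCO schema; everything else is a placeholder.

```
# Illustrative sketch of converting MS-MARCO records to SQuAD-style JSON,
# keeping only questions whose answer is a verbatim subspan of a passage.
import json

def marco_to_squad(marco_records):
    squad = {"version": "1.1", "data": []}
    for rec in marco_records:
        question = rec["query"]
        answers = rec.get("answers", [])
        for passage in rec.get("passages", []):
            context = passage["passage_text"]
            # Keep only answers that occur verbatim inside this passage.
            spans = [(context.find(a), a) for a in answers if a and a in context]
            if not spans:
                continue
            qas = [{"id": str(rec["query_id"]),
                    "question": question,
                    "answers": [{"text": a, "answer_start": start} for start, a in spans]}]
            squad["data"].append({"title": str(rec["query_id"]),
                                  "paragraphs": [{"context": context, "qas": qas}]})
            break  # one matching passage per question is enough
    return squad

if __name__ == "__main__":
    # Assumes a JSON-lines MS-MARCO dump; adjust the path and loading to your download.
    with open("train_v1.1.json") as f:
        records = [json.loads(line) for line in f]
    with open("marco_as_squad.json", "w") as f:
        json.dump(marco_to_squad(records), f)
```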

| | EM (%) | F1 (%) |
| -------- |:------:|:------:|
| single | 67.7 | 77.3 |
| ensemble | 72.6 | 80.7 |
Preprocess the MS-MARCO data:

### Test Data
```
python -m marco.prepro
```

| | EM (%) | F1 (%) |
| -------- |:------:|:------:|
| single | 68.0 | 77.3 |
| ensemble | 73.3 | 81.1 |
Before training, it is recommended to first run the following to verify that everything is okay and that memory is sufficient:
```
python -m marco.cli --mode train --debug
```

Refer to [our paper][paper] for more details.
See [SQuAD Leaderboard][squad] to compare with other models.
Then, train the existing model. It is important not to specify `--noload`, so that the network we trained on the SQuAD dataset is loaded:

```
python -m marco.cli --mode train
```

To test on MS-MARCO, you can run:

```
python -m marco.cli --len_opt --cluster
```

<!--
## Using Pre-trained Model

If you would like to use pre-trained model, it's very easy!
You can download the model weights [here][save] (make sure that its commit id matches the source code's).
Extract them and put them in `$PWD/out/basic/00/save` directory, with names unchanged.
Then do the testing again, but you need to specify the step # that you are loading from:
```
python -m basic.cli --mode test --batch_size 8 --eval_num_batches 0 --load_step ####
```
-->

## 5. Multi-GPU Training & Testing

## Multi-GPU Training & Testing
Our model supports multi-GPU training.
We follow the parallelization paradigm described in [TensorFlow Tutorial][multi-gpu].
In short, if you want to use batch size of 60 (default) but if you have 3 GPUs with 4GB of RAM,
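For orientation, a minimal sketch of the tower-style data parallelism described in the [TensorFlow Tutorial][multi-gpu]; the toy model, shapes, and optimizer settings below are placeholders, not this repository's trainer:

```
# Tower-style data parallelism: each GPU processes batch_size / num_gpus examples,
# and the per-tower gradients are averaged before a single parameter update.
import tensorflow as tf  # assumes TensorFlow 1.x, as used by this code base

NUM_GPUS, PER_GPU_BATCH, DIM = 3, 20, 100   # e.g. total batch 60 split across 3 GPUs

def tower_loss(x, y):
    # Toy linear classifier standing in for the real model.
    w = tf.get_variable("w", [DIM, 2])
    b = tf.get_variable("b", [2], initializer=tf.zeros_initializer())
    logits = tf.matmul(x, w) + b
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

def average_gradients(tower_grads):
    # tower_grads holds one [(grad, var), ...] list per GPU; average variable-wise.
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = tf.stack([g for g, _ in grads_and_vars])
        averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
    return averaged

x = tf.placeholder(tf.float32, [NUM_GPUS * PER_GPU_BATCH, DIM])
y = tf.placeholder(tf.int32, [NUM_GPUS * PER_GPU_BATCH])
opt = tf.train.AdadeltaOptimizer(0.5)

tower_grads = []
for i in range(NUM_GPUS):
    with tf.device("/gpu:%d" % i), tf.variable_scope("model", reuse=(i > 0)):
        sl = slice(i * PER_GPU_BATCH, (i + 1) * PER_GPU_BATCH)
        tower_grads.append(opt.compute_gradients(tower_loss(x[sl], y[sl])))

with tf.device("/cpu:0"):                      # combine the gradients on the CPU
    train_op = opt.apply_gradients(average_gradients(tower_grads))
```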
Binary file not shown.
Binary file not shown.
74 changes: 74 additions & 0 deletions docs/acl2018.bib
@@ -0,0 +1,74 @@
@article{conneau:2017,
author = {A. Conneau and others},
year = "2017",
title = {Supervised Learning of Universal Sentence Representations from Natural Language Inference Data}
}

@article{bidaf:2017,
author = {Minjoon Seo and others},
year = "2017",
title = {Bi-directional attention flow for machine comprehension}
}

@article{imagenet,
author = {Jia Deng and others},
title = {ImageNet: A Large-Scale Hierarchical Image Database}
}



@article{transfertextclassification,
author = {Chuong Do and Andrew Ng},
title = {Transfer learning for text classification}
}


@article{msmarco:2016,
author = {Tri Nguyen and others},
year = "2016",
title = {MS MARCO: A Human Generated MAchine Reading COmprehension Dataset}
}


@article{domainadaption,
author = {John Blitzer and others},
title = {Domain Adaptation with Structural Correspondence Learning}
}


@article{deepcontextualizedwr,
author = {Matthew Peters and others},
title = {Deep contextualized word representations}
}


@article{squad:2016,
author = {Pranav Rajpurkar and others},
year = "2016",
title = {SQuAD: 100,000+ Questions for Machine Comprehension of Text}
}


@article{nliwithlstm,
author = {Shuohang Wang and others},
year = "2016",
title = {Learning Natural Language Inference with LSTM}
}


@article{rnet,
author = {Natural Language Computing Group, Microsoft Research Asia},
title = {R-NET: Machine reading comprehension with self-matching networks}
}


@article{matchlstm,
author = {Wang and others},
year = "",
title = {Machine comprehension using match-LSTM and answer pointer}
}

@article{surveytransferlearning,
author = {Sinno Jialin Pan and Qiang Yang},
title = {A Survey on Transfer Learning}
}