TODO.md

Todo

  • Find datasets for multiple languages

    • English (Zahra)
    • Italian (Paolo)
    • Whatever is on Kaggle (Ettore)
  • Fix the VM (possibly a memory configuration problem)

  • Parse the data into raw .txt files containing only text

  • Write a working version of the program, remembering to add:

    • Combiner and In-Mapper Combiner
    • Setup and Cleanup methods
    • Experiments with different numbers of reducers
  • Carry out experiments

    • Letter frequency per language
    • Statistics on execution
      • Execution time
      • Memory usage
      • Number of InputSplits
      • Impact of In-Mapper Combiner
  • Fix bugs

  • Write a short report (LaTeX)
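
As a starting point for the In-Mapper Combiner and Setup/Cleanup items above, here is a minimal plain-Python sketch of the pattern applied to letter counting. It does not use any MapReduce framework API: the class `LetterCountMapper` and the helpers `run_mapper` and `reduce_counts` are hypothetical names invented for illustration, and the choice to lowercase the text and count accented letters as separate letters is an assumption, not a project decision.

```python
from collections import Counter


class LetterCountMapper:
    """Plain-Python stand-in for a mapper with setup/map/cleanup hooks (hypothetical)."""

    def setup(self):
        # Runs once before the first record: create the in-memory buffer.
        self.counts = Counter()

    def map(self, line):
        # In-mapper combining: aggregate locally instead of emitting
        # one (letter, 1) pair per character.
        for ch in line.lower():
            if ch.isalpha():
                self.counts[ch] += 1

    def cleanup(self):
        # Runs once after the last record: emit one pair per distinct letter.
        return sorted(self.counts.items())


def run_mapper(lines):
    """Drive one mapper over the lines of a single input split."""
    mapper = LetterCountMapper()
    mapper.setup()
    for line in lines:
        mapper.map(line)
    return mapper.cleanup()


def reduce_counts(mapper_outputs):
    """Sum the per-mapper counts and convert them to relative frequencies."""
    total = Counter()
    for pairs in mapper_outputs:
        for letter, count in pairs:
            total[letter] += count
    n = sum(total.values())
    return {letter: count / n for letter, count in total.items()}


if __name__ == "__main__":
    splits = [["Hello world"], ["Ciao mondo"]]
    outputs = [run_mapper(split) for split in splits]
    print(reduce_counts(outputs))
```

Each simulated mapper emits at most one pair per distinct letter, however long its split is. Comparing this against a version that emits a pair per character is one way to measure the impact of the In-Mapper Combiner listed under the experiments.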