
PySpark and term-frequencies support for large datasets

thammegowda released this on 14 Jun 22:29
3c732d3
  • Option to accept precomputed term frequencies as input
  • PySpark backend to compute word and character frequencies
  • --min-co-ev for BPE is now a CLI argument
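The frequency computation that a Spark backend distributes follows a map-reduce pattern. Each data partition is counted independently, and the partial counts are then summed. The sketch below is plain Python and is not this release's code. It assumes whitespace tokenization and counts characters only inside tokens. The function names `partition_counts`, `merge` and `term_frequencies` are hypothetical. In PySpark, the same pattern would roughly correspond to `RDD.mapPartitions` followed by `RDD.reduce`.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List, Tuple

Counts = Tuple[Counter, Counter]  # (word counts, character counts)


def partition_counts(lines: Iterable[str]) -> Counts:
    """'Map' step: count words and characters within one partition of lines."""
    words, chars = Counter(), Counter()
    for line in lines:
        tokens = line.split()  # assumption: whitespace tokenization
        words.update(tokens)
        for tok in tokens:
            chars.update(tok)  # assumption: whitespace is not counted as a char
    return words, chars


def merge(a: Counts, b: Counts) -> Counts:
    """'Reduce' step: add up the counters produced by two partitions."""
    return a[0] + b[0], a[1] + b[1]


def term_frequencies(partitions: Iterable[List[str]]) -> Counts:
    """Count each partition independently, then merge all partial counts."""
    return reduce(merge, (partition_counts(p) for p in partitions), (Counter(), Counter()))


# Hypothetical usage: two partitions of a tiny corpus.
parts = [["the cat", "the dog"], ["a cat"]]
words, chars = term_frequencies(parts)
print(words.most_common(2))
```

Because each partition is counted independently, partitions can be processed in parallel. Merging counters is associative, so the order of the reduce step does not change the result. The resulting frequency tables are the kind of input that the first item above allows to be supplied directly.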