srtgen

Generate subtitles for a video file using the paid Google Cloud Speech-to-Text API.

This program requires a Google account and an API key: create a project on Google Cloud.

usage

$ ./srtgen.py 
usage
  srtgen.py --apikey path/to/keyfile.json path/to/input-video.mp4

environment variables
  GOOGLE_APPLICATION_CREDENTIALS=path/to/keyfile.json srtgen.py path/to/input-video.mp4

keyfile
  This program requires a Google account and an API key
  https://console.cloud.google.com/projectcreate

subtitle is written to stdout and output/xxxxxx-input-video.mp4/output_file.srt
where xxxxxx is the sha1 hash of the input video file

temporary files are stored in output/xxxxxx-input-video.mp4/
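The two outputs described in the usage text are a per-input directory named after the file's SHA-1 hash and an SRT subtitle. Both can be sketched in Python as follows. This is a minimal sketch under assumptions, not srtgen's actual code:

- The helper names (`sha1_of_file`, `output_dir_for`, `srt_timestamp`, `to_srt`) are hypothetical.
- Recognized segments are assumed to arrive as `(start_seconds, end_seconds, text)` tuples.
- The full hex digest is used for the directory name. Whether srtgen shortens it is not stated.

The subtitle layout follows the standard SubRip format: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, block_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in 1 MiB blocks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def output_dir_for(video_path):
    """Hypothetical helper: output/<sha1>-<file name>/ relative to the cwd."""
    video_path = Path(video_path)
    return Path("output") / f"{sha1_of_file(video_path)}-{video_path.name}"


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Joining with "\n" leaves one blank line between subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello"), (2.5, 61.25, "world")]), end="")
```

Writing the same string both to stdout and to `output_file.srt` inside the directory returned by `output_dir_for` would match the behaviour the usage text describes.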

features

  • works around the size limits of the Google API for inline audio
    • no need for Google Cloud Storage (the gs:// protocol)
    • duration per request is limited to 60 seconds
    • file size per request is limited to 10485760 bytes (10 MiB)
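Staying under both limits without Google Cloud Storage comes down to cutting the audio into short enough pieces. This is a minimal sketch with these assumptions:

- The audio is sent as uncompressed 16-bit PCM.
- The audio is cut into fixed-length windows. srtgen's own splitting strategy may differ. For example, cutting at silences with `pydub.silence` avoids splitting words.
- `max_chunk_ms` and `chunk_bounds` are hypothetical names.

```python
# Per-request limits for inline audio, as listed in the features above.
MAX_DURATION_MS = 60_000
MAX_BYTES = 10_485_760  # 10 MiB


def max_chunk_ms(sample_rate, channels=1, sample_width=2, header_bytes=44):
    """Longest chunk in milliseconds that respects both limits.

    Assumes uncompressed PCM (e.g. a WAV export) with a 44-byte header;
    sample_width is bytes per sample (2 for 16-bit audio).
    """
    bytes_per_second = sample_rate * channels * sample_width
    size_limited_ms = (MAX_BYTES - header_bytes) * 1000 // bytes_per_second
    return min(MAX_DURATION_MS, size_limited_ms)


def chunk_bounds(total_ms, chunk_ms):
    """Cut [0, total_ms) into consecutive (start_ms, end_ms) windows."""
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]


if __name__ == "__main__":
    print(max_chunk_ms(16_000))              # 60000: duration is the binding limit
    print(max_chunk_ms(48_000, channels=2))  # 54613: file size is the binding limit
    print(chunk_bounds(150_000, max_chunk_ms(16_000)))
```

In pydub, `AudioSegment.from_file(path)` exposes the audio parameters as `frame_rate`, `channels` and `sample_width`. `len(audio)` is the length in milliseconds, and `audio[start:end].export(name, format="wav")` writes one chunk. For 16 kHz mono speech, the 60-second limit is reached long before the 10 MiB limit.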

dependencies

  • ffmpeg
  • python
    • pydub
    • google.cloud.speech
      • API key
      • pricing
        • speech recognition needs a lot of storage and compute time, so there is no free lunch
        • https://cloud.google.com/speech-to-text/pricing#pricing_table
          • first hour is free
            • TODO one hour per month or one hour per google account?
          • Speech Recognition without Data Logging: $0.006 / 15 seconds = $0.024 / minute = $1.44 / hour (about $1.50)
          • Speech Recognition with Data Logging: $0.004 / 15 seconds = $0.016 / minute = $0.96 / hour (about $1.00)
          • Data Logging = feedback of manually corrected text to improve quality of service
            • TODO implement upload of corrected text
      • TODO Automatic punctuation
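The per-15-second prices above can be turned into a rough cost estimate. This is a minimal sketch with these assumptions:

- Each request is billed in 15-second increments, rounded up. This README does not state that, so check the pricing page for the current rules.
- Because every chunk of at most 60 seconds is a separate request, any rounding applies per chunk.
- The free first hour is not subtracted.
- `billed_increments` and `estimate_cost` are hypothetical names.

```python
from decimal import Decimal

# USD per 15-second increment, as quoted in the pricing list above.
PRICE_PER_15S = {
    False: Decimal("0.006"),  # without data logging
    True: Decimal("0.004"),   # with data logging
}


def billed_increments(chunk_ms, increment_ms=15_000):
    """15 s increments billed for one request; rounding up is an assumption."""
    return -(-chunk_ms // increment_ms)  # integer ceiling division


def estimate_cost(chunk_lengths_ms, data_logging=False):
    """Estimated USD cost for a list of request lengths in milliseconds."""
    increments = sum(billed_increments(ms) for ms in chunk_lengths_ms)
    return increments * PRICE_PER_15S[data_logging]


if __name__ == "__main__":
    one_hour = [60_000] * 60  # one hour cut into 60-second requests
    print(estimate_cost(one_hour))                     # 1.440
    print(estimate_cost(one_hour, data_logging=True))  # 0.960
```

`Decimal` avoids binary floating-point rounding in the money arithmetic. With full 60-second chunks, the estimate reproduces the hourly figures in the list above.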

related

based on

postprocessing tools

similar tools

todo

  • use the speech_recognition module so srtgen can use multiple backend services
  • hybrid of offline and online speech recognition
    • deepspeech for offline speech recognition
    • google for online speech recognition
    • can deepspeech return confidence values?
    • run deepspeech with different models? (and manually select the best result?)
  • automatic postprocessing
    • reduce manual work
    • split long sentences
    • merge short sentences
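Both post-processing items could start from simple text-length rules. This is a minimal sketch, not an existing srtgen feature. It makes these assumptions:

- Segments are `(start_seconds, end_seconds, text)` tuples.
- `merge_short` and `split_long` are hypothetical names.
- The 42-character limit and the 0.5-second gap are assumed defaults.

Splitting shares the original time span among the pieces in proportion to their length. That only approximates when each piece is actually spoken; word-level timestamps from the recognizer would be more accurate.

```python
def merge_short(segments, max_chars=42, max_gap=0.5):
    """Greedily join neighbouring segments while the joined text stays short."""
    merged = []
    for start, end, text in segments:
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            joined = f"{prev_text} {text}"
            if start - prev_end <= max_gap and len(joined) <= max_chars:
                merged[-1] = (prev_start, end, joined)
                continue
        merged.append((start, end, text))
    return merged


def split_long(segment, max_chars=42):
    """Split one segment at word boundaries; time is shared by character count."""
    start, end, text = segment
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > max_chars:
            pieces.append(current)
            current = word  # a single over-long word is kept whole
        else:
            current = candidate
    if current:
        pieces.append(current)
    total = sum(len(p) for p in pieces)
    result, t = [], start
    for i, piece in enumerate(pieces):
        # The last piece ends exactly at the original end time.
        last = i == len(pieces) - 1
        t_next = end if last else t + (end - start) * len(piece) / total
        result.append((t, t_next, piece))
        t = t_next
    return result
```

For example, `merge_short([(0.0, 1.0, "Hi"), (1.2, 2.0, "there")])` yields a single segment `(0.0, 2.0, "Hi there")`, because the gap and the joined length are both within the limits.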