optim

jianchang512 · Oct 29, 2023 · 6b38324
1 parent 2a8dd4e
commit 6b38324

Showing 6 changed files with 97 additions and 65 deletions.
diff --git a/README.md b/README.md
@@ -12,6 +12,8 @@
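 4. Original video language: select the language spoken in the video to be translated
 5. Target translation language: select the language to translate into   
 6. Select dubbing: after choosing the target translation language, pick a dubbing role from the dubbing options;
+
+   Embed subtitles: whether to embed subtitles in the video after translation (at least one of this option and "Select dubbing" must be set; that is, you cannot both skip embedding subtitles and skip choosing a dubbing role)
 7. Text recognition model: choose base/small/medium/large. Recognition quality improves with model size, but recognition gets slower; the model must be downloaded on first use. Default: base   
 8. Dubbing speed: enter a number between -10 and +90. The same sentence takes a different amount of time to speak in different languages, so after dubbing the audio, picture, and subtitles may fall out of sync. Adjust the speed here: negative numbers slow playback down, positive numbers speed it up.
 9. Auto acceleration: choose Yes or No. If the translated speech is longer than the original and "Yes" is selected here, the segment is forcibly sped up to shorten it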
@@ -71,7 +73,9 @@
 
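 **--proxy**: HTTP proxy address, default None. Required if Google is not reachable from your region, for example: `http://127.0.0.1:10809`
 
-**--voice_replace**: the role name for the selected target language code. The first 2 letters of the role name must match the first 2 letters of the target language code. If you are unsure what to enter, run `python cli.py show_vioce` to list the role names available for each language
+**--insert_subtitle**: whether to embed subtitles in the video after translation (at least one of this parameter and --voice_role must be set; that is, you cannot both skip embedding subtitles and skip choosing a dubbing role)
+
+**--voice_role**: the role name for the selected target language code. The first 2 letters of the role name must match the first 2 letters of the target language code. If you are unsure what to enter, run `python cli.py show_vioce` to list the role names available for each language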
 
     af: af-ZA-AdriNeural, af-ZA-WillemNeural
     sq: sq-AL-AnilaNeural, sq-AL-IlirNeural
@@ -168,7 +172,7 @@
     cy: cy-GB-AledNeural, cy-GB-NiaNeural
     zu: zu-ZA-ThandoNeural, zu-ZA-ThembaNeural
 
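-**--voice_rate**: negative numbers slow the dubbing down, positive numbers speed it up; default `10`, i.e. faster
+**--voice_rate**: negative numbers slow the dubbing down, positive numbers speed it up; default `0`, i.e. no speed change
 
 **--voice_silence**: a number between 100 and 2000, the minimum length of a silent section in milliseconds; default 300.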
 
@@ -178,6 +182,8 @@
 
 **--remove_background**: whether to remove the background sound; passing this parameter removes it
 
+
+
 **CLI example**
 
 `python cli.py --source_mp4 "D:/video/ex.mp4" --source_language en --target_language zh-cn --proxy "http://127.0.0.1:10809" --voice_replace zh-CN-XiaoxiaoNeural`

diff --git a/README_ENG.md b/README_ENG.md
@@ -14,6 +14,7 @@ The speech recognition is based on the offline model 'openai-whisper', the text
 5. Original video language: Select the language of the video to be translated.
 6. Target translation language: Select the desired language for translation.
 7. Select dubbing: After selecting the target translation language, you can choose a dubbing role from the dubbing options.
+   Embed subtitles: whether to embed subtitles in the video after translation. At least one of this option and "Select dubbing" must be set; leaving both unset is not allowed.
 8. Text recognition model: Choose base/small/medium/large. The recognition quality improves as the model size increases, but recognition slows down. The base model is the default and must be downloaded on first use.
 9. Dubbing speed: Enter a number between -10 and +90. The length of the same sentence varies under different language synthesizations. Therefore, the dubbing may not be synchronized with the subtitles. Adjust the speed here, where negative numbers indicate slowing down and positive numbers indicate speeding up.
 10. Auto acceleration: Select Yes or No. If the duration of the translated speech is longer than the original duration and you select "Yes" here, the segment will be forced to be accelerated to reduce the length.
@@ -68,7 +69,9 @@ The speech recognition is based on the offline model 'openai-whisper', the text
 
 **--proxy**: Specify an HTTP proxy address. Default is None. If you are unable to access Google from your location, you need to provide a proxy address. For example: `http://127.0.0.1:10809`
 
-**--voice_replace**: Provide the corresponding character name based on the target language code. Make sure the first two letters of the character name match the first two letters of the target language code. If you are unsure how to fill in this parameter, run `python cli.py show_voice` to display the available character names for each language.
+**--insert_subtitle**: Whether to embed subtitles in the video after translation. At least one of this parameter and --voice_role must be set; leaving both unset is not allowed.
+
+**--voice_role**: Provide the corresponding character name based on the target language code. Make sure the first two letters of the character name match the first two letters of the target language code. If you are unsure how to fill in this parameter, run `python cli.py show_voice` to display the available character names for each language.
 
 
     af: af-ZA-AdriNeural, af-ZA-WillemNeural
@@ -167,7 +170,7 @@ The speech recognition is based on the offline model 'openai-whisper', the text
     zu: zu-ZA-ThandoNeural, zu-ZA-ThembaNeural
 
 
-**--voice_rate**: Adjust the speed of the voice dubbing. Use negative numbers to decrease the speed and positive numbers to increase it. The default value is `10`, which represents an increase in speed.
+**--voice_rate**: Adjust the speed of the voice dubbing. Use negative numbers to decrease the speed and positive numbers to increase it. The default value is `0`, which leaves the speed unchanged.
 
 **--remove_background**: Specify this parameter to remove the background music.
 
@@ -180,12 +183,12 @@ The speech recognition is based on the offline model 'openai-whisper', the text
 
 **CLI Example**
 
-`python cli.py --source_mp4 "D:/video/ex.mp4" --source_language en --target_language zh-cn --proxy "http://127.0.0.1:10809" --voice_replace zh-CN-XiaoxiaoNeural`
+`python cli.py --source_mp4 "D:/video/ex.mp4" --source_language en --target_language zh-cn --proxy "http://127.0.0.1:10809" --voice_role zh-CN-XiaoxiaoNeural`
 
 In the above example, it translates the video located at "D:/video/ex.mp4" from English to Chinese, sets the proxy to "http://127.0.0.1:10809", and uses the voice replacement of "zh-CN-XiaoxiaoNeural".
 
 `python cli.py --source_mp4 "D:/video/ex.mp4" --source_language zh-cn --target_language en  --proxy "http://127.0.0.1:1080
-9"  --voice_replace en-US-AriaNeural --voice_autorate  --whisper_model small`
+9"  --voice_role en-US-AriaNeural --voice_autorate  --whisper_model small`
 
 The above means to translate the video D:/video/ex.mp4 with the source language as Chinese to the target language as English. Set the proxy as http://127.0.0.1:10809 and use the voiceover role en-US-AriaNeural. If the translated audio duration is longer than the original audio, it will automatically be accelerated. The text recognition model for speech recognition is set to use the small model.
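
The `--voice_role` rule described in this README (the first two letters of the role name must match the first two letters of the target language code, and lookups are case-insensitive in `cli.py`) can be illustrated with a minimal sketch. The function names `role_matches_language` and `resolve_role` are hypothetical and are not part of the repository; the role list passed in stands in for whatever `show_voice` prints.

```python
from typing import List, Optional


def role_matches_language(role: str, target_language: str) -> bool:
    """Return True when role and language code share their first two letters."""
    return role[:2].lower() == target_language[:2].lower()


def resolve_role(requested: str, available: List[str]) -> Optional[str]:
    """Case-insensitive lookup that returns the canonical role spelling, or None."""
    lowered = [name.lower() for name in available]
    try:
        return available[lowered.index(requested.lower())]
    except ValueError:
        return None  # the requested role is not in the list
```

For instance, `role_matches_language("en-US-AriaNeural", "zh-cn")` is false, which is why the second CLI example pairs `--target_language en` with `en-US-AriaNeural`.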
 

diff --git a/cli.py b/cli.py
@@ -59,36 +59,38 @@ def error(text):
 
 
 def init_args():
-    print(sys.argv)
     parser = argparse.ArgumentParser(prog='video_translate',
                                      description='Seamlessly translate mangas into a chosen language')
 
-    parser.add_argument('--source_mp4', required=False, default=None, type=str, help='The path of the MP4 video to '
+    parser.add_argument('-mp4','--source_mp4', required=False, default=None, type=str, help='The path of the MP4 video to '
                                                                                      'be translated.')
-    parser.add_argument('--target_dir', default='', type=lower, help='Translated Video Save Directory')
+    parser.add_argument('-td','--target_dir', default='', type=lower, help='Translated Video Save Directory')
 
-    parser.add_argument('--source_language', default='en', type=lower, help='Original Language of the Video')
-    parser.add_argument('--target_language', default='zh-cn', type=lower,
+    parser.add_argument('-sl','--source_language', default='en', type=lower, help='Original Language of the Video')
+    parser.add_argument('-tl','--target_language', default='zh-cn', type=lower,
                         help='Target Language of the Video Translation')
 
-    parser.add_argument('--proxy', type=lower, default=None, help='Internet Proxy Address like http://127.0.0.1:10809')
+    parser.add_argument('-p','--proxy', type=lower, default=None, help='Internet Proxy Address like http://127.0.0.1:10809')
 
-    parser.add_argument('--voice_silence', default='300', type=int, help='the minimum length for any silent section')
-    parser.add_argument('--voice_autorate', action='store_true', help='If the translated audio is longer, can it be '
+    parser.add_argument('-vs','--voice_silence', default='300', type=int, help='the minimum length for any silent section')
+    parser.add_argument('-va','--voice_autorate', default=False, action='store_true', help='If the translated audio is longer, can it be '
                                                                       'automatically accelerated to align with the '
                                                                       'original duration?')
-    parser.add_argument('--whisper_model', default='base', help='From base to large, the effect gets better and the '
+    parser.add_argument('-wm','--whisper_model', default='base', help='From base to large, the effect gets better and the '
                                                                 'speed slows down.')
 
-    parser.add_argument('--voice_replace', default='No', type=str, help='Select Voiceover Character Name')
+    parser.add_argument('-vro','--voice_role', default='No', type=str, help='Select Voiceover Character Name')
 
-    parser.add_argument('--voice_rate', default='0', type=str,
+    parser.add_argument('-vr','--voice_rate', default='0', type=str,
                         help='Specify Voiceover Speed, positive number for acceleration, negative number for '
                              'deceleration')
 
-    parser.add_argument('--remove_background', action='store_true', help='Remove Background Music')
+    parser.add_argument('-rb','--remove_background', action='store_true', help='Remove Background Music')
+    parser.add_argument('-is','--insert_subtitle', action='store_true', help='Insert subtitle to video')
 
     args = vars(parser.parse_args())
+    # print(args)
+    # exit()
 
     if not args['source_mp4'] or not os.path.exists(args['source_mp4']) or not args['source_mp4'].lower().endswith(
             ".mp4"):
@@ -99,12 +101,16 @@ def init_args():
             f"The original language and target language for the video must be selected from the following options: {','.join(lang_code.keys())}")
 
     voice_role = set_default_voice(args['target_language'])
-    if args['voice_replace'] != 'No' and (args['voice_replace'].lower() not in voice_role_lower):
+    if args['voice_role'] != 'No' and (args['voice_role'].lower() not in voice_role_lower):
         rolestr = "\n".join(voice_role[1:])
         error(
             f"The voice role does not exist..\nList of available voice roles\n{rolestr}")
-    elif args['voice_replace'].lower() in voice_role_lower:
-        args['voice_replace'] = voice_role[voice_role_lower.index(args['voice_replace'].lower())]
+    elif args['voice_role'].lower() in voice_role_lower:
+        args['voice_role'] = voice_role[voice_role_lower.index(args['voice_role'].lower())]
+
+    if not args['insert_subtitle'] and args['voice_role']=='No':
+        error("At least one of --insert_subtitle and --voice_role must be set.\nChoose to embed subtitles, select a voiceover character, or both.")
+
     rate = int(args['voice_rate'])
     if rate >= 0:
         args['voice_rate'] = f"+{args['voice_rate']}%"
@@ -158,7 +164,7 @@ def running(p):
     if not os.path.exists(a_name):
         os.system(f"ffmpeg -i {dirname}/{mp4name} -acodec pcm_s16le -f s16le -ac 1  -f wav {a_name}")
     # If background removal was selected, regenerate the audio as a_name{voial}.wav
-    if config.video_config['voice_replace'] != 'No' and config.video_config['remove_background'] == 'Yes':
+    if config.video_config['voice_role'] != 'No' and config.video_config['remove_background']:
         import warnings
         warnings.filterwarnings('ignore')
         from spleeter.separator import Separator
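
The argument handling changed in `cli.py` above (short aliases, the new `--insert_subtitle` flag, the "at least one of the two" check, and the signed percentage for `--voice_rate`) can be reproduced in isolation with the sketch below. It covers only three of the options, `parse_cli` is a hypothetical name, and the formatting of negative rates is an assumption, since the diff shows only the `rate >= 0` branch.

```python
import argparse


def parse_cli(argv):
    """Parse a reduced set of the options shown in the diff and validate them."""
    parser = argparse.ArgumentParser(prog="video_translate")
    parser.add_argument("-vro", "--voice_role", default="No", type=str)
    parser.add_argument("-vr", "--voice_rate", default="0", type=str)
    parser.add_argument("-is", "--insert_subtitle", action="store_true")
    args = vars(parser.parse_args(argv))

    # Mirror the new check: embedding subtitles and dubbing cannot both be off.
    if not args["insert_subtitle"] and args["voice_role"] == "No":
        parser.error("at least one of --insert_subtitle and --voice_role must be set")

    # Turn the numeric rate into a signed percentage string.
    # Assumption: negative values are formatted as "-N%".
    rate = int(args["voice_rate"])
    args["voice_rate"] = f"+{rate}%" if rate >= 0 else f"{rate}%"
    return args
```

Calling `parse_cli(["-is"])` succeeds with the default rate, while `parse_cli([])` exits with a usage error because neither output option is enabled.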

diff --git a/config.py b/config.py
@@ -22,9 +22,11 @@
         "running": "执行中",
         "exit": "退出",
         "end": "已结束",
-        "stop": "已停止"
+        "stop": "已停止",
+        "subtitleandvoice_role":"不能既不嵌入字幕又不选择配音角色，二者至少选一"
     },
     "en": {
+        "subtitleandvoice_role":"At least one of embedding subtitles and selecting a voiceover character must be set; leaving both unset is not allowed.",
         "proxyerrortitle": "Proxy Error",
         "proxyerrorbody": "Failed to access Google services. Please set up the proxy correctly.",
         "softname": "Video Subtitle Translation and Dubbing",
@@ -83,7 +85,7 @@
                              enable_events=True
                              ),
                     sg.Text('选择配音', background_color="#e3f2fd", text_color='#212121'),
-                    sg.Combo(['None'], default_value="None", readonly=True, key="voice_replace", size=(18, None)),
+                    sg.Combo(['No'], default_value="No", readonly=True, key="voice_role", size=(18, None)),
                 ],
                 [
                     sg.Text('文字识别模型', background_color="#e3f2fd", text_color='#212121', tooltip="越大效果越好，识别速度越慢"),
@@ -98,13 +100,12 @@
                 ],
 
                 [
-                    sg.Text('自动加速?', background_color="#e3f2fd", text_color='#212121'),
-                    sg.Combo(['No', 'Yes'], tooltip="如果翻译后语音播放时长大于原时长，是否自动加速播放强制时间对齐",
-                             default_value=sg.user_settings_get_entry('voice_autorate', 'No'),
-                             readonly=True, key="voice_autorate", size=(18, None)),
-                    sg.Text('去除背景音?', background_color="#e3f2fd", text_color='#212121'),
-                    sg.Combo(['No', 'Yes'], default_value=sg.user_settings_get_entry('remove_background', 'No'),
-                             readonly=True, key="remove_background", size=(18, None)),
+                    # sg.Text('', background_color="#e3f2fd", text_color='#212121'),
+                    sg.Checkbox('自动加速?', background_color="#e3f2fd",text_color='#212121',tooltip="如果翻译后语音播放时长大于原时长，是否自动加速播放强制时间对齐",
+                             default=False, key="voice_autorate", size=(18, None)),
+                    # sg.Text('去除背景音?', background_color="#e3f2fd", text_color='#212121'),
+                    sg.Checkbox('去除背景音?', background_color="#e3f2fd",text_color='#212121',default=False, key="remove_background", size=(18, None)),
+                    sg.Checkbox('嵌入字幕到视频?', text_color='#212121',background_color="#e3f2fd",default=True, key="insert_subtitle", size=(18, None)),
                 ],
                 [
                     sg.Text('静音片段', tooltip="用于分割语音的静音片段时长，单位ms", background_color="#e3f2fd",
@@ -215,7 +216,7 @@
                                  enable_events=True
                                  ),
                         sg.Text('Select Voice Replacement', background_color="#e3f2fd", text_color='#212121'),
-                        sg.Combo(['No'], default_value="No", readonly=True, key="voice_replace", size=(18, None)),
+                        sg.Combo(['No'], default_value="No", readonly=True, key="voice_role", size=(18, None)),
                     ],
                     [
                         sg.Text('Whisper Model', background_color="#e3f2fd", text_color='#212121',
@@ -233,14 +234,19 @@
                     ],
 
                     [
-                        sg.Text('Automatic acceleration?', background_color="#e3f2fd", text_color='#212121'),
-                        sg.Combo(['No', 'Yes'],
-                                 tooltip="If the translated audio is longer, can it be automatically accelerated to align with the original duration?",
-                                 default_value=sg.user_settings_get_entry('voice_autorate', 'No'),
-                                 readonly=True, key="voice_autorate", size=(18, None)),
-                        sg.Text('Remove background sound?', background_color="#e3f2fd", text_color='#212121'),
-                        sg.Combo(['No', 'Yes'], default_value=sg.user_settings_get_entry('remove_background', 'No'),
-                                 readonly=True, key="remove_background", size=(18, None)),
+                        # sg.Text('Automatic acceleration?', background_color="#e3f2fd", text_color='#212121'),
+                        # sg.Combo(['No', 'Yes'],
+                        #          tooltip="If the translated audio is longer, can it be automatically accelerated to align with the original duration?",
+                        #          default_value=sg.user_settings_get_entry('voice_autorate', 'No'),
+                        #          readonly=True, key="voice_autorate", size=(18, None)),
+                        # sg.Text('Remove background sound?', background_color="#e3f2fd", text_color='#212121'),
+                        # sg.Combo(['No', 'Yes'], default_value=sg.user_settings_get_entry('remove_background', 'No'),
+                        #          readonly=True, key="remove_background", size=(18, None)),
+
+                        sg.Checkbox('Automatic acceleration?', text_color='#212121',background_color="#e3f2fd",tooltip="If the translated audio is longer, can it be automatically accelerated to align with the original duration",
+                             default=False, key="voice_autorate", size=(18, None)),
+                        sg.Checkbox('Remove background sound?', text_color='#212121',background_color="#e3f2fd",default=False, key="remove_background", size=(18, None)),
+                        sg.Checkbox('Embedding Subtitle?', text_color='#212121',background_color="#e3f2fd",default=True, key="insert_subtitle", size=(18, None)),
                     ],
                     [
                         sg.Text('minimum silent section', tooltip="split audio by this value /ms",
@@ -322,12 +328,12 @@
     "target_language": "zh-cn",
     "subtitle_language": "chi",
 
-    "voice_replace": "No",
+    "voice_role": "No",
     "voice_rate": "0",
 
-    "voice_autorate": "No",
     "voice_silence": "300",
     "whisper_model": "base",
-
-    "remove_background": "No"
+    "insert_subtitle":True,
+    "voice_autorate": False,
+    "remove_background": False
 }
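
Because this commit switches `voice_autorate` and `remove_background` from the strings `"Yes"`/`"No"` to booleans, any value saved under the old convention would need normalizing before a truthiness test such as the one now used in `cli.py`. The helper below is a hypothetical sketch of that normalization, not code from the repository; note that `voice_role` keeps `"No"` as a string sentinel and should not be passed through it.

```python
def to_bool(value):
    """Map legacy 'Yes'/'No' strings and real booleans onto a bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Assumption: anything other than these spellings counts as False.
        return value.strip().lower() in {"yes", "true", "1"}
    return bool(value)


# Example: a mixed dictionary of old-style and new-style settings.
legacy = {"voice_autorate": "Yes", "remove_background": "No", "insert_subtitle": True}
normalized = {key: to_bool(val) for key, val in legacy.items()}
```

Without such a step, the old string `"No"` would be truthy and would wrongly enable the option.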