[Feat][Experimental] Support parallel computing (embarassingly parallel) #3685
base: develop
Thanks for your contribution!
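On the question of parallelism: I have recently been considering whether the base class should support a predict method with non-generator behavior (tentatively assumed to be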
    return _executor


def maybe_parallelize(func, /, *iterables, executor=None):
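Depending on the actual configuration, the work may or may not run in parallel. The only guarantee is that it "may be parallelized", so this name was chosen to prevent misunderstanding.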
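The naming rationale above can be made concrete with a minimal sketch. It mirrors the signature and fallback order visible in the diff, but it is not the PR's actual implementation: the body of `_get_default_num_jobs`, the `SKETCH_NUM_JOBS` environment variable, and the serial fallback when joblib is not installed are assumptions added for illustration.

```python
import os

try:
    import joblib
except ImportError:  # assumption: without joblib the sketch simply runs serially
    joblib = None

_executor = None  # module-level override, named as in the diff


def _get_default_num_jobs():
    # Hypothetical policy: read an environment variable, default to 1 job.
    return int(os.environ.get("SKETCH_NUM_JOBS", "1"))


def maybe_parallelize(func, /, *iterables, executor=None):
    """Apply func to the zipped iterables; run in parallel only if configured."""
    if executor is None:
        executor = _executor
    if executor is None and joblib is not None:
        executor = joblib.Parallel(n_jobs=_get_default_num_jobs(), prefer="threads")
    if executor is None:
        # No executor available: plain serial evaluation.
        return [func(*args) for args in zip(*iterables)]
    # joblib.Parallel consumes an iterable of delayed calls and keeps input order.
    return list(executor(joblib.delayed(func)(*args) for args in zip(*iterables)))
```

The caller gets the same ordered list as the plain comprehension whether or not any work ran in parallel, which is what the "maybe" in the name promises: `maybe_parallelize(lambda x: x * 2, [1, 2, 3])` returns `[2, 4, 6]`.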
    if executor is None:
        executor = _executor
    if executor is None:
        executor = joblib.Parallel(n_jobs=_get_default_num_jobs(), prefer="threads")
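Testing in practice shows that when this is used mainly for low-level operations that mostly involve NumPy array computation (many NumPy functions release the GIL), defaulting to multithreading performs better than multiprocessing. The data-transfer overhead can make multiprocessing slower than even single-threaded serial processing.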
@@ -57,7 +57,8 @@ def resize_norm_img(self, img, max_wh_ratio):
         resized_w = int(math.ceil(imgH * ratio))
         resized_image = cv2.resize(img, (resized_w, imgH))
         resized_image = resized_image.astype("float32")
-        resized_image = resized_image.transpose((2, 0, 1)) / 255
+        resized_image = resized_image.transpose((2, 0, 1))
+        resized_image /= 255
         resized_image -= 0.5
         resized_image /= 0.5
         padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
OCRReisizeNormImg was not parallelized because measurements showed that doing so is actually less efficient.
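The hunk above instead replaces `transpose(...) / 255` with an in-place division, which skips one temporary array. A small sketch, assuming a random `uint8` image in place of real OCR input and hypothetical function names, suggests the two forms produce the same values:

```python
import numpy as np


def normalize_original(image):
    # Pre-change form: the division allocates a second float32 array.
    out = image.astype("float32")
    out = out.transpose((2, 0, 1)) / 255
    out -= 0.5
    out /= 0.5
    return out


def normalize_in_place(image):
    # Post-change form: astype makes the only copy; transpose returns a view.
    out = image.astype("float32")
    out = out.transpose((2, 0, 1))
    out /= 255
    out -= 0.5
    out /= 0.5
    return out


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(48, 160, 3), dtype=np.uint8)  # H x W x C
before = normalize_original(img)
after = normalize_in_place(img)
```

Since `astype` already copies the data, the in-place operations never modify the caller's image.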
…ovic/PaddleX into feat/parallel_computing
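This PR adds an experimental parallel-acceleration feature to PaddleX. It targets embarrassingly parallel logic of the following form, which is widespread in the codebase:
[func(item) for item in data]
PR contents
joblib's parallel acceleration functionality.
The maybe_parallelize function, which can be used to mark embarrassingly parallel logic in the codebase and parallelizes it when the configuration meets the requirements.
joblib.parallel_config, to customize PaddleX's default parallel-computing behavior from outside the library.
scipy.ndimage.rotate is too slow. An efficient replacement was written with OpenCV. Preliminary measurements show a speedup of nearly 100x on a large 1024*2048 image, but the result is not fully aligned with the original implementation (the differences appear to be mainly in output size and align-corner handling), so it needs to be confirmed whether this replacement is acceptable. In experiments on my machine, the new implementation reduces the time the OCR pipeline takes to process a 6-page sample PDF from 13 s to 6 s. => 2025.3.26: the accuracy of the replacement was assessed as acceptable by the relevant colleagues. To keep this PR focused, the changes involved were moved to the more relevant opt processors #3714.
Limitations and TODO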