add SIMD #87

caxieyou · 2022-05-23T01:54:54Z

caxieyou
May 23, 2022
Collaborator

I'm starting from scratch, so I'm reviewing the math crate first.

Most of the calculations are in f32, and most platforms are x64 now.

Could we consider adding SIMD to speed things up?

I'm not quite sure whether multicore could be applied here; I didn't see many for loops...

mikialex · 2022-05-23T07:07:40Z

mikialex
May 23, 2022
Maintainer

There are several questions here.

Why do we use f32 as the default scalar type parameter instead of f64?

On a 64-bit machine, scalar f32 and f64 arithmetic cost roughly the same. However, using f32 reduces the size of most math and geometry types (typically by half), which improves cache locality and, in theory, means better performance from a bandwidth perspective. Another consideration is that I want the numeric behaviour and stability of my code to match the GPU side when error tolerance really matters (for example, when handling self-intersection in a raytracer), because you can count on having f32 on the GPU but not f64. This will be helpful when you migrate your code to the GPU later. Of course, we should probably be able to change the default scalar type through a cargo feature in the future.
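A minimal sketch of the size argument, not the crate's actual definitions: `Vec3`, `Scalar`, `packed_bytes`, and the `f64` feature name are all hypothetical names chosen for illustration.

```rust
use std::mem::size_of;

// Hypothetical cargo feature that would switch the default scalar type.
// The feature name "f64" is an assumption, not an existing feature.
#[cfg(feature = "f64")]
pub type Scalar = f64;
#[cfg(not(feature = "f64"))]
pub type Scalar = f32;

/// Hypothetical generic vector type; the real crate's definition may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct Vec3<T = Scalar> {
    pub x: T,
    pub y: T,
    pub z: T,
}

/// Bytes occupied by `count` tightly packed `Vec3<T>` values.
pub fn packed_bytes<T>(count: usize) -> usize {
    count * size_of::<Vec3<T>>()
}
```

With this layout, `Vec3<f32>` takes 12 bytes and `Vec3<f64>` takes 24, so the same cache line holds twice as many f32 vectors.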

SIMD plan?

Yes, I do have a SIMD optimization plan. The current plan is to create custom "wide" types using std::simd and implement a separate set of computation methods on them. Each "wide" (vector) version is paired with a scalar version. A good reference is https://github.com/fu5ha/ultraviolet.
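As a rough illustration of that pairing, here is a sketch that runs on stable Rust. It uses plain arrays as lanes instead of `std::simd` (which requires a nightly toolchain), and the names `Vec3`, `Vec3x4`, `LANES`, and `from_scalars` are hypothetical, not the crate's API.

```rust
/// Scalar 3D vector (hypothetical stand-in for the crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Vec3 {
    /// Scalar dot product.
    pub fn dot(self, rhs: Vec3) -> f32 {
        self.x * rhs.x + self.y * rhs.y + self.z * rhs.z
    }
}

/// Number of vectors processed together by the wide type.
pub const LANES: usize = 4;

/// Wide counterpart of `Vec3`: four vectors stored component by component,
/// aligned to 16 bytes so each component array could map to a 128-bit register.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct Vec3x4 {
    pub x: [f32; LANES],
    pub y: [f32; LANES],
    pub z: [f32; LANES],
}

impl Vec3x4 {
    /// Transpose four scalar vectors into the wide layout.
    pub fn from_scalars(v: [Vec3; LANES]) -> Self {
        let mut out = Vec3x4 { x: [0.0; LANES], y: [0.0; LANES], z: [0.0; LANES] };
        for i in 0..LANES {
            out.x[i] = v[i].x;
            out.y[i] = v[i].y;
            out.z[i] = v[i].z;
        }
        out
    }

    /// Wide dot product: one result per lane, same formula as `Vec3::dot`.
    pub fn dot(self, rhs: Vec3x4) -> [f32; LANES] {
        let mut out = [0.0; LANES];
        for i in 0..LANES {
            out[i] = self.x[i] * rhs.x[i] + self.y[i] * rhs.y[i] + self.z[i] * rhs.z[i];
        }
        out
    }
}
```

The per-lane loops are the part a real implementation would replace with SIMD vector operations; the layout and the scalar/wide method pairing stay the same.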

Also, the current scalar versions of the math/geometry types could get some limited SIMD optimization. I believe many of them are already covered by the compiler's auto-vectorization, so this is low priority, I think.

Multi threading?

No. At the primitive level, it's impossible and meaningless.

0 replies

caxieyou · 2022-05-30T02:03:15Z

caxieyou
May 30, 2022
Collaborator Author

Cool~

For the SIMD part, I think exposing the types and usage the way ultraviolet does is kind of weird... I mean, having to spell out Vec3x8 every time and fill data into it like an array, just so ultraviolet can optimize it with SIMD, is... not friendly.

In my mind, the user should not have to care about any of this: they just define a Vec3 or Mat4, and when the scalar type is not wide (such as float32 or int32), the library figures it out internally and uses SIMD to speed it up. But ultraviolet could be a good reference, of course~

Finally, for multithreading, I'm just wondering whether we can use Rayon (https://docs.rs/rayon/latest/rayon/) to speed up the for loops~ It seems pretty handy.

1 reply

mikialex May 30, 2022
Maintainer

If you want to use SIMD, you have to organize your data in array-like contiguous memory with correct alignment, which means the best practice is simply to use a fixed-width wide type container... To unlock the full performance potential of SIMD, you have to implement separate vectorized versions of your methods instead of reusing the scalar versions, and a wide type container seems like a good place for these implementations.

I don't know whether there is a better way to use SIMD, but I'm open to other solutions.
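To make the data-layout point concrete, here is a small sketch of a contiguous, component-wise container processed in fixed-width chunks with a scalar fallback for the leftover elements. `Vec3Soa`, `LANES`, and `length_squared` are hypothetical names. The sketch relies on the compiler to vectorize the inner loop and does not enforce any alignment beyond what `Vec<f32>` provides.

```rust
/// Number of elements handled per chunk (assumed lane count).
pub const LANES: usize = 4;

/// Hypothetical structure-of-arrays container: each component is stored
/// contiguously, so a chunk of LANES values can be loaded as one unit.
#[derive(Default, Debug)]
pub struct Vec3Soa {
    pub x: Vec<f32>,
    pub y: Vec<f32>,
    pub z: Vec<f32>,
}

impl Vec3Soa {
    /// Append one vector; keeps the three component arrays the same length.
    pub fn push(&mut self, x: f32, y: f32, z: f32) {
        self.x.push(x);
        self.y.push(y);
        self.z.push(z);
    }

    /// Squared length of every stored vector.
    pub fn length_squared(&self) -> Vec<f32> {
        let n = self.x.len();
        let mut out = vec![0.0; n];

        // Wide path: full chunks of LANES elements.
        let xs = self.x.chunks_exact(LANES);
        let ys = self.y.chunks_exact(LANES);
        let zs = self.z.chunks_exact(LANES);
        let os = out.chunks_exact_mut(LANES);
        for (((x, y), z), o) in xs.zip(ys).zip(zs).zip(os) {
            for lane in 0..LANES {
                o[lane] = x[lane] * x[lane] + y[lane] * y[lane] + z[lane] * z[lane];
            }
        }

        // Scalar path: the tail that does not fill a whole chunk.
        let full = n - n % LANES;
        for i in full..n {
            out[i] = self.x[i] * self.x[i] + self.y[i] * self.y[i] + self.z[i] * self.z[i];
        }
        out
    }
}
```

A container like this keeps the wide layout internal, while callers only push individual vectors and ask for results.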

Leveraging Rayon for multithreading is really an application-side consideration. For example, you can see how we use Rayon to parallelize CPU ray tracing in rainray/src/renderer.rs.
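For illustration only, here is a sketch of application-level parallelism over image rows. It is not the code in rainray, and it uses the standard library's `std::thread::scope` instead of Rayon so that it needs no external crate. `shade` and `render_parallel` are hypothetical names; `shade` stands in for tracing one ray.

```rust
use std::thread;

/// Hypothetical per-pixel function standing in for tracing one ray.
fn shade(x: usize, y: usize) -> f32 {
    (x + y) as f32
}

/// Render a `width` x `height` image, splitting the rows into bands that are
/// processed by separate threads. The math primitives stay single-threaded;
/// only the outer loop over pixels is parallelized.
pub fn render_parallel(width: usize, height: usize, threads: usize) -> Vec<f32> {
    let mut image = vec![0.0f32; width * height];
    if width == 0 || height == 0 {
        return image;
    }
    let threads = threads.max(1);
    // Rows per band, rounded up so every row is covered.
    let rows_per_band = (height + threads - 1) / threads;

    thread::scope(|s| {
        // Each band is a disjoint mutable slice, so no locking is needed.
        for (band, chunk) in image.chunks_mut(rows_per_band * width).enumerate() {
            s.spawn(move || {
                let first_row = band * rows_per_band;
                for (i, px) in chunk.iter_mut().enumerate() {
                    let x = i % width;
                    let y = first_row + i / width;
                    *px = shade(x, y);
                }
            });
        }
    });
    image
}
```

With Rayon, the same outer loop would typically be expressed with a parallel iterator such as `par_chunks_mut`, which also takes care of work distribution across a thread pool.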
