Task04 Evgeny Svirin ITMO #298

Closed
wants to merge 4 commits into from


EvgenySvirin

Local output

matrix_transpose:

OpenCL devices:
Device #0: GPU. NVIDIA GeForce RTX 4060 Laptop GPU. Total memory: 7940 Mb
Using device #0: GPU. NVIDIA GeForce RTX 4060 Laptop GPU. Total memory: 7940 Mb
Data generated for M=4096, K=4096
[matrix_transpose_naive] GPU: 0.000922783+-1.25289e-06 s GPU: 18181.1 millions/s
[matrix_transpose_local_bad_banks] GPU: 0.00070685+-1.91988e-05 s GPU: 23735.2 millions/s
[matrix_transpose_local_good_banks] GPU: 0.000704517+-2.50583e-05 s GPU: 23813.8 millions/s

matrix_multiplication

OpenCL devices:
Device #0: GPU. NVIDIA GeForce RTX 4060 Laptop GPU. Total memory: 7940 Mb
Using device #0: GPU. NVIDIA GeForce RTX 4060 Laptop GPU. Total memory: 7940 Mb
Data generated for M=1024, K=1024, N=1024
CPU: 2.31521+-0 s CPU: 0.863853 GFlops
[naive, ts=4] GPU: 0.00959667+-7.45356e-07 s GPU: 208.406 GFlops Average difference: 0.000149043%
[naive, ts=8] GPU: 0.0047985+-2.87228e-06 s GPU: 416.797 GFlops Average difference: 0.000149043%
[naive, ts=16] GPU: 0.00407633+-4.71405e-07 s GPU: 490.637 GFlops Average difference: 0.000149043%
[local, ts=4] GPU: 0.0106832+-0.00147418 s GPU: 187.21 GFlops Average difference: 0.000149043%
[local, ts=8] GPU: 0.002946+-2.73435e-05 s GPU: 678.887 GFlops Average difference: 0.000149043%
[local, ts=16] GPU: 0.0022445+-8.52834e-05 s GPU: 891.067 GFlops Average difference: 0.000149043%
[local wpt, ts=4, wpt=2] GPU: 0.0123123+-4.89069e-05 s GPU: 162.439 GFlops Average difference: 0.000149043%
[local wpt, ts=4, wpt=4] GPU: 0.0201325+-1.60728e-06 s GPU: 99.3419 GFlops Average difference: 0.000149043%
[local wpt, ts=8, wpt=2] GPU: 0.002337+-5.7735e-07 s GPU: 855.798 GFlops Average difference: 0.000149043%
[local wpt, ts=8, wpt=4] GPU: 0.00327633+-4.1696e-05 s GPU: 610.438 GFlops Average difference: 0.000149043%
[local wpt, ts=8, wpt=8] GPU: 0.00561233+-0.000115237 s GPU: 356.358 GFlops Average difference: 0.000149043%
[local wpt, ts=16, wpt=2] GPU: 0.00149383+-3.72678e-07 s GPU: 1338.84 GFlops Average difference: 0.000149043%
[local wpt, ts=16, wpt=4] GPU: 0.00115217+-6.87184e-07 s GPU: 1735.86 GFlops Average difference: 0.000149043%
[local wpt, ts=16, wpt=8] GPU: 0.000981667+-4.71405e-07 s GPU: 2037.35 GFlops Average difference: 0.000149043%
[local wpt, ts=16, wpt=16] GPU: 0.00173333+-4.71405e-07 s GPU: 1153.85 GFlops Average difference: 0.000149043%

GitHub CI output

matrix_transpose:

OpenCL devices:
Device #0: CPU. AMD EPYC 7763 64-Core Processor . Intel(R) Corporation. Total memory: 15991 Mb
Using device #0: CPU. AMD EPYC 7763 64-Core Processor . Intel(R) Corporation. Total memory: 15991 Mb
Data generated for M=4096, K=4096
[matrix_transpose_naive] GPU: 0.0150901+-8.66783e-05 s GPU: 1111.8 millions/s
[matrix_transpose_local_bad_banks] GPU: 0.0278532+-0.00045152 s GPU: 602.345 millions/s
[matrix_transpose_local_good_banks] GPU: 0.0277682+-0.000530473 s GPU: 604.188 millions/s

matrix_multiplication

OpenCL devices:
Device #0: CPU. AMD EPYC 7763 64-Core Processor . Intel(R) Corporation. Total memory: 15991 Mb
Using device #0: CPU. AMD EPYC 7763 64-Core Processor . Intel(R) Corporation. Total memory: 15991 Mb
Data generated for M=1024, K=1024, N=1024
CPU: 6.43314+-0 s CPU: 0.31089 GFlops
[naive, ts=4] GPU: 0.256356+-0.00128988 s GPU: 7.80166 GFlops Average difference: 0.000149043%
[naive, ts=8] GPU: 0.262318+-0.00330994 s GPU: 7.62432 GFlops Average difference: 0.000149043%
[naive, ts=16] GPU: 0.271294+-0.00518451 s GPU: 7.37207 GFlops Average difference: 0.000149043%
[local, ts=4] GPU: 0.551742+-0.00170809 s GPU: 3.62488 GFlops Average difference: 0.000149043%
[local, ts=8] GPU: 0.145986+-0.000258419 s GPU: 13.7 GFlops Average difference: 0.000149043%
[local, ts=16] GPU: 0.0910955+-0.000269942 s GPU: 21.955 GFlops Average difference: 0.000149043%
[local wpt, ts=4, wpt=2] GPU: 0.529485+-0.00160939 s GPU: 3.77725 GFlops Average difference: 0.000149043%
[local wpt, ts=4, wpt=4] GPU: 0.460945+-0.000407938 s GPU: 4.33891 GFlops Average difference: 0.000149043%
[local wpt, ts=8, wpt=2] GPU: 0.134053+-0.000816249 s GPU: 14.9195 GFlops Average difference: 0.000149043%
[local wpt, ts=8, wpt=4] GPU: 0.148363+-0.000794369 s GPU: 13.4805 GFlops Average difference: 0.000149043%
[local wpt, ts=8, wpt=8] GPU: 0.153269+-0.000191614 s GPU: 13.0489 GFlops Average difference: 0.000149043%
[local wpt, ts=16, wpt=2] GPU: 0.078937+-0.000759654 s GPU: 25.3367 GFlops Average difference: 0.000149043%
[local wpt, ts=16, wpt=4] GPU: 0.089023+-0.000379301 s GPU: 22.4661 GFlops Average difference: 0.000149043%
[local wpt, ts=16, wpt=8] GPU: 0.0806555+-0.00109715 s GPU: 24.7968 GFlops Average difference: 0.000149043%
[local wpt, ts=16, wpt=16] GPU: 0.0780773+-0.000503123 s GPU: 25.6156 GFlops Average difference: 0.000149043%


barrier(CLK_LOCAL_MEM_FENCE);

as_t[i * M + j] = tile[local_j][local_i];
Collaborator

There is no point in using local memory this way: each work-item reads back exactly what it wrote.

Collaborator

As a result, the write to global memory ends up non-coalesced.
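
A minimal sketch of the pattern these two comments point toward, written as a plain-C emulation of one work-group per tile rather than as the PR's actual kernel. The tile is read back transposed and the group offsets are swapped, so work-items with consecutive `local_i` touch consecutive global addresses in both the load and the store. The name `transpose_tiled`, the tile size of 16, the row-major layouts, and the requirement that M and K are multiples of the tile size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_SIZE 16 /* assumed tile edge; the value used in the PR is not shown */

/*
 * CPU emulation of a tiled transpose (hypothetical helper, not the PR's code).
 * a   : M rows x K columns, row-major (input)
 * a_t : K rows x M columns, row-major (output)
 * M and K are assumed to be multiples of TILE_SIZE.
 */
void transpose_tiled(const float *a, float *a_t, size_t M, size_t K)
{
    assert(M % TILE_SIZE == 0 && K % TILE_SIZE == 0);

    for (size_t group_j = 0; group_j < M / TILE_SIZE; ++group_j) {
        for (size_t group_i = 0; group_i < K / TILE_SIZE; ++group_i) {
            /* Stands in for the __local tile; the extra column is padding. */
            float tile[TILE_SIZE][TILE_SIZE + 1];

            /* Phase 1: consecutive local_i read consecutive addresses of a. */
            for (size_t local_j = 0; local_j < TILE_SIZE; ++local_j) {
                for (size_t local_i = 0; local_i < TILE_SIZE; ++local_i) {
                    size_t i = group_i * TILE_SIZE + local_i; /* column in a */
                    size_t j = group_j * TILE_SIZE + local_j; /* row in a */
                    tile[local_j][local_i] = a[j * K + i];
                }
            }

            /* In a kernel, barrier(CLK_LOCAL_MEM_FENCE) separates the phases. */

            /* Phase 2: group offsets are swapped and the tile is read with
             * swapped indices, so consecutive local_i write consecutive
             * addresses of a_t. */
            for (size_t local_j = 0; local_j < TILE_SIZE; ++local_j) {
                for (size_t local_i = 0; local_i < TILE_SIZE; ++local_i) {
                    size_t i_t = group_j * TILE_SIZE + local_i; /* column in a_t */
                    size_t j_t = group_i * TILE_SIZE + local_j; /* row in a_t */
                    a_t[j_t * M + i_t] = tile[local_i][local_j];
                }
            }
        }
    }
}
```

The contrast with `as_t[i * M + j] = tile[local_j][local_i]` is that here the element a work-item stores is not the one it loaded, which is what makes the trip through local memory worthwhile.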

Author

@EvgenySvirin EvgenySvirin Dec 14, 2024

Yes, I mixed up the row and the column there. The number of columns should be TYLE_SIZE + 1.

I have now done a large refactor, though (an attempt to make both the read and the write coalesced).
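
To illustrate why the extra column matters, here is a small sketch that counts how many work-items land in the same local-memory bank when each one reads a different row of the same tile column. It assumes 32 banks of 4-byte words, which is typical for NVIDIA GPUs but is not stated in this PR; the helper name `max_bank_conflict` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BANKS 32 /* assumption: 32 banks, one 4-byte word wide each */

/*
 * Worst-case number of lanes that hit the same bank when lane l reads the
 * word tile[l][col] and consecutive rows are row_stride words apart.
 * A result of 1 means the access is conflict-free.
 */
unsigned max_bank_conflict(size_t row_stride, size_t col, unsigned lanes)
{
    unsigned hits[NUM_BANKS] = {0};
    unsigned worst = 0;

    for (unsigned lane = 0; lane < lanes; ++lane) {
        size_t word = (size_t)lane * row_stride + col; /* word offset in the tile */
        unsigned n = ++hits[word % NUM_BANKS];         /* bank this word maps to */
        if (n > worst)
            worst = n;
    }
    return worst;
}
```

Under these assumptions, `max_bank_conflict(32, 0, 32)` reports that all 32 lanes share one bank, while `max_bank_conflict(33, 0, 32)` reports one lane per bank, which is the effect the padded column is meant to achieve.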

Local output

matrix_transpose:

OpenCL devices:
Device #0: GPU. NVIDIA GeForce RTX 4060 Laptop GPU. Total memory: 7940 Mb
Using device #0: GPU. NVIDIA GeForce RTX 4060 Laptop GPU. Total memory: 7940 Mb
Data generated for M=4096, K=4096
[matrix_transpose_naive] GPU: 0.00080885+-1.58981e-06 s GPU: 20742.1 millions/s
[matrix_transpose_local_bad_banks] GPU: 0.000732367+-0.00014281 s GPU: 22908.2 millions/s
[matrix_transpose_local_good_banks] GPU: 0.00063305+-7.89478e-05 s GPU: 26502.2 millions/s

GitHub CI output

matrix_transpose:

OpenCL devices:
Device #0: CPU. AMD EPYC 7763 64-Core Processor . Intel(R) Corporation. Total memory: 15991 Mb
Using device #0: CPU. AMD EPYC 7763 64-Core Processor . Intel(R) Corporation. Total memory: 15991 Mb
Data generated for M=4096, K=4096
[matrix_transpose_naive] GPU: 0.0175284+-0.00137051 s GPU: 957.144 millions/s
[matrix_transpose_local_bad_banks] GPU: 0.0135164+-0.000128746 s GPU: 1241.25 millions/s
[matrix_transpose_local_good_banks] GPU: 0.0137078+-5.88634e-05 s GPU: 1223.92 millions/s

@simiyutin
Collaborator

All good, the task is accepted: 8/10 points 👍

@simiyutin simiyutin closed this Dec 17, 2024