Task04 Евгений Свирин ITMO #298
src/cl/matrix_transpose.cl
Outdated
barrier(CLK_LOCAL_MEM_FENCE);
as_t[i * M + j] = tile[local_j][local_i];
There is no point in using local memory this way: each work-item reads back exactly the element it wrote.
which in the end leads to a non-coalesced write
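To make the two review comments above concrete, here is a minimal CPU-side sketch in plain C of the indexing a tiled transpose usually needs. It is not the code from this PR: `transpose_tiled`, `TILE`, `gy`, `gx`, `lj` and `li` are hypothetical names, the loops over `lj`/`li` stand in for the work-items of one work-group, and the matrix sizes are assumed to be multiples of `TILE`. The point is that the tile is read with swapped local indices and the work-group origin is swapped in the output, so the fast index `li` still walks consecutive addresses on the write.

```c
#include <stddef.h>

#define TILE 4 /* hypothetical tile size, kept small so the example is easy to trace */

/* CPU emulation of a tiled transpose.
   a  is M x K, row-major; at is K x M, row-major.
   Assumes M and K are multiples of TILE. */
void transpose_tiled(const float *a, float *at, size_t M, size_t K) {
    float tile[TILE][TILE + 1]; /* one padding column, as discussed in the thread */

    for (size_t gy = 0; gy < M / TILE; ++gy) {     /* work-group row */
        for (size_t gx = 0; gx < K / TILE; ++gx) { /* work-group column */
            /* Phase 1: "work-item" (lj, li) loads one element.
               li is the fast index, so neighbours read consecutive addresses. */
            for (size_t lj = 0; lj < TILE; ++lj)
                for (size_t li = 0; li < TILE; ++li)
                    tile[lj][li] = a[(gy * TILE + lj) * K + (gx * TILE + li)];

            /* On a GPU, barrier(CLK_LOCAL_MEM_FENCE) would go here. */

            /* Phase 2: the group origin is swapped (gx selects the output row block),
               and the tile is read as tile[li][lj], not tile[lj][li].
               li is still the fast index of the write, so it stays consecutive. */
            for (size_t lj = 0; lj < TILE; ++lj)
                for (size_t li = 0; li < TILE; ++li)
                    at[(gx * TILE + lj) * M + (gy * TILE + li)] = tile[li][lj];
        }
    }
}
```

If phase 2 read `tile[lj][li]` instead, every work-item would get back its own element and the transposition would have to happen in the global write, which is the non-coalesced pattern the review points out.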
Yes, I mixed up the row and the column there. The columns should be TYLE_SIZE + 1.
But I have now done a big refactor (an attempt to make both the read and the write coalesced).
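As a rough illustration of why the padding belongs on the column count, the sketch below counts how many distinct local-memory banks are hit when the work-items of one group read a single tile column. It assumes 32 banks of 4-byte words, which is typical for NVIDIA GPUs but is an assumption here, not something stated in this thread; `NUM_BANKS` and `distinct_banks_for_column` are hypothetical names.

```c
#define NUM_BANKS 32 /* assumption: 32 banks, one 4-byte word each */

/* Number of distinct banks touched when NUM_BANKS work-items read one column
   of a float tile whose consecutive rows are `stride` words apart. */
int distinct_banks_for_column(int stride, int column) {
    int used[NUM_BANKS] = {0};
    int distinct = 0;
    for (int row = 0; row < NUM_BANKS; ++row) {
        /* Word index of tile[row][column] in a flat row-major layout. */
        int bank = (row * stride + column) % NUM_BANKS;
        if (!used[bank]) {
            used[bank] = 1;
            ++distinct;
        }
    }
    return distinct;
}
```

With a row stride of 32 words every element of a column falls into the same bank, so the reads are serialized. With a stride of 33 the bank index becomes `(row + column) % 32`, which is different for every row. Padding the row count instead leaves the stride at 32 and changes nothing.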
Local output
matrix_transpose:
OpenCL devices:
  Device #0: GPU. NVIDIA GeForce RTX 4060 Laptop GPU. Total memory: 7940 Mb
Using device #0: GPU. NVIDIA GeForce RTX 4060 Laptop GPU. Total memory: 7940 Mb
Data generated for M=4096, K=4096
[matrix_transpose_naive]
    GPU: 0.00080885+-1.58981e-06 s
    GPU: 20742.1 millions/s
[matrix_transpose_local_bad_banks]
    GPU: 0.000732367+-0.00014281 s
    GPU: 22908.2 millions/s
[matrix_transpose_local_good_banks]
    GPU: 0.00063305+-7.89478e-05 s
    GPU: 26502.2 millions/s
GitHub CI output
matrix_transpose:
OpenCL devices:
  Device #0: CPU. AMD EPYC 7763 64-Core Processor . Intel(R) Corporation. Total memory: 15991 Mb
Using device #0: CPU. AMD EPYC 7763 64-Core Processor . Intel(R) Corporation. Total memory: 15991 Mb
Data generated for M=4096, K=4096
[matrix_transpose_naive]
    GPU: 0.0175284+-0.00137051 s
    GPU: 957.144 millions/s
[matrix_transpose_local_bad_banks]
    GPU: 0.0135164+-0.000128746 s
    GPU: 1241.25 millions/s
[matrix_transpose_local_good_banks]
    GPU: 0.0137078+-5.88634e-05 s
    GPU: 1223.92 millions/s
… 1 instead of tyle rows
All good, the task is accepted, 8/10 points 👍