#753: Syncing device host times for tracy profiler #8101

Merged: 2 commits, Jun 7, 2024



@mo-tenstorrent mo-tenstorrent commented May 3, 2024

This is the PR for syncing device and host time for tracy.

The data is only used by the tracy GUI. A tt_metal sync program is created that loads the sync kernel onto the device.

With the kernel running and waiting, the host writes its time to an L1 location, and the device reads that value and tags it with its own wall clock time. This repeats for 249 iterations, a count that is set by the profiler L1 buffer size.
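To make the handshake concrete, here is a rough host-side simulation of how the timestamp pairs could be collected. It is only a sketch: the function name, the 1.0 GHz device frequency, the counter offset, and the jitter model are all assumptions for illustration and are not taken from the tt_metal code.

```python
import random


def simulate_sync_pairs(iterations=249, sleep_ns=4_000_000,
                        device_freq_ghz=1.0, device_offset_cycles=5_000,
                        jitter_ns=200, seed=0):
    """Return (host_ns, device_cycles) pairs like a sync run might record.

    All default values except iterations and sleep_ns are made up.
    """
    rng = random.Random(seed)
    pairs = []
    host_ns = 0
    for _ in range(iterations):
        # Host writes its current time to the shared L1 location.
        mailbox = host_ns
        # Device sees the write slightly later (hypothetical latency)
        # and tags the value with its own free-running counter.
        latency_ns = rng.uniform(0, jitter_ns)
        device_cycles = device_offset_cycles + round(
            (host_ns + latency_ns) * device_freq_ghz
        )
        pairs.append((mailbox, device_cycles))
        # Host sleeps before sending the next timestamp.
        host_ns += sleep_ns
    return pairs
```

Each returned pair holds the host time that was written and the device counter value that tagged it, which is the raw material for the post-processing step.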

Each sync program takes ~1s, i.e. 249 x 4ms (the sleep time between host timestamps) ~= 1s.

Multiple sync programs can run per device. By default, at least 2 run per device: one at init_device and one at dump.

The host then post-processes all the paired host-device timestamps and calculates the delay and frequency of the device.
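The arithmetic behind the ~1s figure, written out as a small sketch. The constants come from the description above; the helper names are hypothetical, and the estimate ignores any setup or teardown time.

```python
SYNC_ITERATIONS = 249  # iterations per sync program, from the description
HOST_SLEEP_MS = 4      # sleep between host timestamps, from the description


def sync_program_duration_s(iterations=SYNC_ITERATIONS, sleep_ms=HOST_SLEEP_MS):
    """Approximate wall time of one sync program, counting only the sleeps."""
    return iterations * sleep_ms / 1000.0


def total_sync_overhead_s(programs_per_device, devices=1):
    """Approximate added wall time when several sync programs run."""
    return programs_per_device * devices * sync_program_duration_s()
```

With two sync programs on a single device, this estimate comes to just under 2 seconds of added run time.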
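One way to picture the post-processing is an ordinary least-squares line fit through the pairs, where the slope is the device frequency and the intercept is the offset between the two clocks. This is a sketch under that assumption; the description does not say which fitting method the PR actually uses, and the function names here are hypothetical.

```python
def fit_device_clock(pairs):
    """Fit device_cycles = freq * host_ns + offset by least squares.

    pairs: iterable of (host_ns, device_cycles).
    Returns (freq in cycles per ns, i.e. GHz, and offset in cycles).
    """
    pairs = list(pairs)
    n = len(pairs)
    mean_h = sum(h for h, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    sxx = sum((h - mean_h) ** 2 for h, _ in pairs)
    sxy = sum((h - mean_h) * (d - mean_d) for h, d in pairs)
    freq = sxy / sxx
    offset = mean_d - freq * mean_h
    return freq, offset


def device_to_host_ns(device_cycles, freq, offset):
    """Map a device counter value back onto the host timeline."""
    return (device_cycles - offset) / freq
```

With the fitted values, any device timestamp can be placed on the host timeline, which is what a GUI needs in order to line up device and host zones.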

Syncing is off by default and can be turned on by setting TT_METAL_PROFILER_SYNC=1.
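For example, a launcher script could enable syncing for a single run by passing a modified environment to the child process. The helper below is a hypothetical convenience and not part of tt_metal; only the variable name TT_METAL_PROFILER_SYNC comes from this PR.

```python
import os


def profiler_sync_env(base_env=None, enable=True):
    """Return a copy of the environment with profiler syncing on or off."""
    env = dict(os.environ if base_env is None else base_env)
    if enable:
        env["TT_METAL_PROFILER_SYNC"] = "1"
    else:
        # Syncing is off by default, so removing the variable disables it.
        env.pop("TT_METAL_PROFILER_SYNC", None)
    return env


# Usage (my_workload.py is a placeholder script name):
#   import subprocess
#   subprocess.run(["python", "my_workload.py"], env=profiler_sync_env())
```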

The best way to evaluate the precision is to note that I am getting roughly 5, if not 6, significant digits on my frequency calculation. Separate runs produce frequencies that agree to 6 significant digits. That can be seen as microsecond precision on the sync, and certainly sub-10us.

The screenshot below shows the end of the FD1 dispatch core zone relative to the end of the host finish call. The difference is 2.46us. Part of this delay is real: it is the time the message takes to travel. This indicates ~1us accuracy.
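One way to read the step from significant digits to microseconds is as a drift bound: a relative frequency error of 10^-N, accumulated over the ~1s sync window, shifts a timestamp by about 10^-N seconds. The sketch below encodes that reading. The interpretation and the function name are assumptions of this example, not something the PR states.

```python
def drift_bound_us(significant_digits, window_s=1.0):
    """Approximate timing drift in microseconds over window_s seconds
    when the frequency is known to the given number of significant digits."""
    relative_error = 10.0 ** (-significant_digits)
    return relative_error * window_s * 1e6
```

Under this reading, 6 matching digits correspond to roughly 1us of drift over a 1s window and 5 digits to roughly 10us.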

[Screenshot 2024-05-03 at 1:10:27 PM]

Green CI 🟢

Post commit: https://github.com/tenstorrent/tt-metal/actions/runs/9388576940
Profiler with latest rebase: https://github.com/tenstorrent/tt-metal/actions/runs/9421179535
Device perf: https://github.com/tenstorrent/tt-metal/actions/runs/9389438009
T3K profiler: https://github.com/tenstorrent/tt-metal/actions/runs/9421174522
uBenchmark: https://github.com/tenstorrent/tt-metal/actions/runs/9401574470

@TT-billteng

> Syncing is off by default and can be turned on by TT_METAL_PROFILER_SYNC=1

Are you planning to turn it on by default in the future?

@TT-billteng

It may be useful to show everyone the accuracy of your new approach with the data you showed me.

@mo-tenstorrent

> Syncing is off by default and can be turned on by TT_METAL_PROFILER_SYNC=1
>
> Are you planning to turn it on by default in the future?

I was hoping to get some mileage on it before turning it on by default.

@mo-tenstorrent mo-tenstorrent removed the request for review from vtangTT June 7, 2024 14:54
@mo-tenstorrent mo-tenstorrent force-pushed the mo/753_host_device_sync_2 branch from 8b2d430 to 63d8846 Compare June 7, 2024 15:21
@mo-tenstorrent mo-tenstorrent force-pushed the mo/753_host_device_sync_2 branch 2 times, most recently from a35c838 to aeef077 Compare June 7, 2024 17:55
mo-tenstorrent and others added 2 commits June 7, 2024 17:56
This data is only used by tracy GUI to align device and host zones.

It is disabled by default. It is enabled by setting TT_METAL_PROFILER_SYNC=1.
@mo-tenstorrent mo-tenstorrent force-pushed the mo/753_host_device_sync_2 branch from aeef077 to 46c7f2f Compare June 7, 2024 17:56
@mo-tenstorrent mo-tenstorrent merged commit 5c801ba into main Jun 7, 2024
9 checks passed
@github-actions github-actions bot deleted the mo/753_host_device_sync_2 branch December 13, 2024 00:29