[GraphBolt] Fix gpu NegativeSampler for seeds. #7068
Conversation
To trigger regression tests:
@yxy235 If you ask for my review as well, it will be easier for me to keep track of what is changing when it comes to GPU GraphBolt support.
LGTM overall, suggested a minor improvement.
I was experimenting to see what the best way to create such tensors is; below you can see what I did. The output of the code below on Colab is as follows:
```python
import torch
import torch.utils.benchmark as benchmark

def f(pos_num, neg_num, dtype=torch.bool, device="cuda:0"):
    # Concatenate a ones block and a zeros block into one labels tensor.
    return torch.cat(
        (
            torch.ones(pos_num, dtype=dtype, device=device),
            torch.zeros(neg_num, dtype=dtype, device=device),
        ),
    )

def g(pos_num, neg_num, dtype=torch.bool, device="cuda:0"):
    # Allocate once, then fill the two halves in place, avoiding the
    # two temporaries and the extra copy that torch.cat performs.
    labels = torch.empty(pos_num + neg_num, dtype=dtype, device=device)
    labels[:pos_num] = 1
    labels[pos_num:] = 0
    return labels

assert torch.equal(f(10, 20), g(10, 20))

N = 10000000
neg_factor = 2
stmt = f'f({N}, {N * neg_factor})'
f_timer = benchmark.Timer(stmt=stmt, setup='import torch', globals={'f': f})
# Reuse the same statement for g by binding g to the name `f`.
g_timer = benchmark.Timer(stmt=stmt, setup='import torch', globals={'f': g})
f_timer.timeit(1000), g_timer.timeit(1000)
```
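For completeness, `Timer.timeit` returns a `Measurement` whose `mean` attribute gives the average seconds per run; a minimal way to read it (a sketch using the in-place variant on CPU so it runs anywhere, not the GPU setup above) is:

```python
import torch
import torch.utils.benchmark as benchmark

def g(pos_num, neg_num, dtype=torch.bool, device="cpu"):
    # In-place variant: allocate once, fill the two halves.
    labels = torch.empty(pos_num + neg_num, dtype=dtype, device=device)
    labels[:pos_num] = 1
    labels[pos_num:] = 0
    return labels

timer = benchmark.Timer(stmt="g(1000, 2000)", globals={"g": g})
m = timer.timeit(100)
print(m.mean)  # average seconds per run
```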
My experiment and suggestion above are a nit; I just wanted to see what the best way to do it is.
I see. I will change it later for better performance.
LGTM with minor nit comments that don't need to be addressed for this PR. However, we might want to scan the whole code base and make similar improvements. I think such small improvements, when applied to the whole codebase, will make a meaningful difference in performance.
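Since the fix amounts to keeping the sampler's auxiliary tensors on the same device as the seeds, the core idea can be sketched as follows (a minimal illustration; `move_auxiliary_to_seeds_device` is a hypothetical helper for this write-up, not GraphBolt API):

```python
import torch

def move_auxiliary_to_seeds_device(seeds, indexes, labels):
    # When the seeds tensor lives on the GPU, the per-seed indexes and
    # labels produced by the negative sampler must be created on (or
    # moved to) the same device; otherwise downstream kernels receive
    # tensors on mixed devices and fail.
    device = seeds.device
    return seeds, indexes.to(device), labels.to(device)
```

On a CPU-only machine the `.to` calls are no-ops; with CUDA seeds they move the auxiliary tensors next to the seeds.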
Description
- Move `seeds`, `indexes`, `labels` to GPU when sampling on GPU.
- Fix `NegativeSampler` on GPU.

Checklist
Please feel free to remove inapplicable items for your PR.
Changes