expand on driver&CUDA version influence on the results
mjurbanski-reef committed Aug 23, 2024
1 parent dea88fb commit e44d014
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -17,11 +17,10 @@ Through [Integration testing](docs/MANUAL_INTEGRATION_TESTING.md) we can see tha

 Here is the summary of the results for vLLM running llama3 model:
 * each card GPU model (combined with its vRAM configuration) has a different output, but is consistent across runs
-* the output is consistent across different CUDA versions (more testing is needed here, only small range was tested)
 * GPU interface (SXM4, PCIe) does not affect the output
 * A100 80GB and A100X 80GB produce the same output
 * 2x A100 40GB do not produce the same output as 1x A100 80GB
-* driver versions 535.129.03 and 555.58.02 produce the same output
+* driver & CUDA versions may influence results, especially on cards with a higher "Compute Capability" feature set, e.g. H100, whereas the A100 seems to produce the same results across a wider range of versions. More in-depth investigation is required here. This will likely depend on the exact ML model or, more precisely, on the features used to execute that model.
 
 To learn more about this particular example, please refer to the [Integration testing](docs/MANUAL_INTEGRATION_TESTING.md) documentation and the [tests/integration/experiments/vllm_llama_3_70b_instruct_awq](tests/integration/experiments/vllm_llama_3_70b_instruct_awq) experiment code.
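
The findings above imply that each experiment's output is only meaningful alongside a record of the exact execution environment. Here is a minimal sketch of what capturing that record could look like; it is not part of this repository or the experiment code, and it assumes PyTorch and the `nvidia-smi` CLI are available (`environment_fingerprint` and `output_digest` are hypothetical helper names):

```python
# Hypothetical helper (not from this repo): capture the environment details
# the results above identify as relevant -- GPU model, vRAM, compute
# capability, and driver/CUDA versions -- plus a digest of the model output,
# so runs on different setups can be compared for determinism.
import hashlib
import subprocess

import torch


def environment_fingerprint(device: int = 0) -> dict:
    props = torch.cuda.get_device_properties(device)
    # The driver version is host-wide, so the first nvidia-smi line suffices.
    driver_version = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        text=True,
    ).strip().splitlines()[0]
    return {
        "gpu_name": props.name,                                # e.g. "NVIDIA A100 80GB PCIe"
        "vram_gib": round(props.total_memory / 2**30),
        "compute_capability": f"{props.major}.{props.minor}",  # 8.0 on A100, 9.0 on H100
        "cuda_version": torch.version.cuda,                    # CUDA version PyTorch was built against
        "driver_version": driver_version,
    }


def output_digest(completion: str) -> str:
    # Hash the generated text so full completions need not be stored to
    # detect whether two environments produced identical output.
    return hashlib.sha256(completion.encode("utf-8")).hexdigest()
```

Under this scheme, two runs would be treated as comparable only when their fingerprints match exactly.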

