Sharding specs of line_all_gather for Llama3-TG #11172

Closed
kpaigwar opened this issue Aug 7, 2024 · 2 comments
Assignee: SeanNijjar
Labels: bug (Something isn't working), op_cat: ccl, P1, perf (for issues tracking performance problems/improvements)

kpaigwar (Contributor) commented Aug 7, 2024

import ttnn

########################################################################################
# Spec 1: gather fused QKV across 4 devices along cluster_axis=1, dim=0
########################################################################################
# Input: [1, 1, 32, 1280], width-sharded into (32, 32) shards (40 cores * 32 = 1280 columns).
fused_query_key_value = {'shape': [1, 1, 32, 1280],
                         'shard_shape': (32, 32)}
# Expected output: 4-way gather along dim 0, width-sharded into (32*4, 32) shards per core.
all_gather_output = {'shape': [4, 1, 32, 1280],
                     'shard_shape': (32*4, 32)}
output_mem_config = ttnn.create_sharded_memory_config(
    shape=(32*4, 32),
    core_grid=ttnn.CoreGrid(y=5, x=8),
    strategy=ttnn.ShardStrategy.WIDTH,
    orientation=ttnn.ShardOrientation.ROW_MAJOR,
    use_height_and_width_as_shard_shape=True,
)
gathered_tensor = ttnn.line_all_gather(fused_query_key_value, dim=0, num_links=2,
                                       cluster_axis=1, device_mesh=self.device_mesh,
                                       memory_config=output_mem_config)
########################################################################################
# Spec 2: gather attention output across 8 devices along cluster_axis=0, dim=0
########################################################################################
# Input: [1, 1, 32, 2048], width-sharded into [1, 1, 32, 64] shards per core,
# i.e. (32, 64) shards (32 cores * 64 = 2048 columns).
attn_output_tensor = {'shape': [1, 1, 32, 2048],
                      'shard_shape': [1, 1, 32, 64]}
# Expected output: 8-way gather along dim 0, width-sharded into (32*8, 64) shards per core.
all_gather_output = {'shape': [8, 1, 32, 2048],
                     'shard_shape': (32*8, 64)}
output_mem_config = ttnn.create_sharded_memory_config(
    shape=(32*8, 64),
    core_grid=ttnn.CoreGrid(y=4, x=8),
    strategy=ttnn.ShardStrategy.WIDTH,
    orientation=ttnn.ShardOrientation.ROW_MAJOR,
    use_height_and_width_as_shard_shape=True,
)
gathered_tensor = ttnn.line_all_gather(attn_output_tensor, dim=0, num_links=2,
                                       cluster_axis=0, device_mesh=self.device_mesh,
                                       memory_config=output_mem_config)

kpaigwar (Contributor, Author) commented Aug 7, 2024

fyi @SeanNijjar @cglagovich

SeanNijjar self-assigned this on Oct 21, 2024
SeanNijjar added the bug (Something isn't working), P1, op_cat: ccl, and perf labels on Oct 21, 2024
SeanNijjar (Contributor) commented

Closing. @kpaigwar confirmed functional correctness on TG.
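
As a quick illustration of that functional result, a hypothetical shape check (not the actual TG test) against the Spec 1 expectation, assuming gathered_tensor is the output of the Spec 1 call above:

# Hypothetical check: Spec 1 should yield a [4, 1, 32, 1280] gathered tensor.
expected_shape = [4, 1, 32, 1280]  # Spec 1 all_gather_output['shape']
actual_shape = [gathered_tensor.shape[i] for i in range(4)]
assert actual_shape == expected_shape, f"unexpected gathered shape: {actual_shape}"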
