Add Blockwise Op #757

Closed

Conversation

@brandonwillard (Member) commented on Jan 17, 2022

This PR implements #695.

It's currently just an outline.

Comment on lines +180 to +181
def transform(var: "TensorVariable", client_node: Optional[Apply]) -> Variable:
"""Walk a graph and expand single gradient \"block\"s into their block-wise equivalents."""

Hi @brandonwillard
Can you explain what the transform function is and how it is used in computing the L_op?

@brandonwillard (Member, Author) commented on Aug 29, 2022

Just like its Elemwise counterpart, transform is supposed to use a "template" gradient graph for each input to construct broadcasted gradient graphs in which all the relevant Ops are Elemwise/Blockwise Ops applied to the original inputs.

Let's take a look at what's happening in Blockwise.L_op in the first test_Blockwise_grad test.

First, the graph for which we want the L-op/gradient:

aesara.dprint(outputs)
# Blockwise{op=<tests.tensor.test_blockwise.DotBW object at 0x7f5e8236fd90>, signature=((('m', 'n'), ('n', 'p')), (('m', 'p'),))} [id A] <TensorType(float64, (None, None, None))>
#  |input 0 [id B] <TensorType(float64, (None, None, None))>
#  |input 1 [id C] <TensorType(float64, (None, None, None))>

It's a Blockwise dot product node with two 3D inputs named input 0 and input 1.
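
For reference, here's a NumPy sketch of the semantics that node is meant to have, i.e. the generalized-ufunc behavior of a dot with signature (m,n),(n,p)->(m,p) applied over a leading "block" dimension. This is only an analogue for illustration (shapes and names are made up), not this PR's API:

import numpy as np

# NumPy analogue of a block-wise dot with signature (m, n), (n, p) -> (m, p):
# the leading dimension indexes the blocks and a plain dot is applied per block.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2, 3))  # stands in for `input 0`
y = rng.normal(size=(5, 3, 4))  # stands in for `input 1`

blockwise_dot = np.stack([np.dot(x[i], y[i]) for i in range(x.shape[0])])

# `np.matmul` already provides this gufunc-style broadcasting over the block dimension
assert np.allclose(blockwise_dot, np.matmul(x, y))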

A "template" graph of the gradient is produced for each input and stored in core_inp_grads. Each element of core_inp_grads corresponds to the generic form of a single block's gradient wrt. the corresponding input.

aesara.dprint(core_inp_grads, print_type=True)
# dot [id A]
#  |<TensorType(float64, (None, None))> [id B]
#  |InplaceDimShuffle{1,0} [id C]
#    |<TensorType(float64, (None, None))> [id D]
# dot [id E]
#  |InplaceDimShuffle{1,0} [id F]
#  | |<TensorType(float64, (None, None))> [id G]
#  |<TensorType(float64, (None, None))> [id B]

We can see that the gradient of a dot in a single block is just another dot, and that the original inputs aren't present; instead, some stand-in variables are used, and they're 2D (i.e. TensorTypes with (None, None) static shapes).
In other words, we've used the core dimensions specified by the Blockwise and its Op to remove the broadcasted dimensions (i.e. the dimensions that determine each block) and produce the generic form of a single "block"'s L-op from an existing Op.grad/Op.L_op implementation.
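
As a quick sanity check of those templates (using only released Aesara and NumPy, not this PR's code path), the L-op of an ordinary 2D dot really is just another dot applied to a transposed operand. Shapes and variable names here are made up for illustration:

import numpy as np
import aesara
import aesara.tensor as at

# Single-block version of the graphs in `core_inp_grads`: the gradient of a 2D dot
# is another dot applied to a transposed (DimShuffled) operand.
A = at.matrix("A")
B = at.matrix("B")
cost = at.dot(A, B).sum()

dA, dB = aesara.grad(cost, [A, B])
f = aesara.function([A, B], [dA, dB])

rng = np.random.default_rng(1)
a = rng.normal(size=(2, 3))
b = rng.normal(size=(3, 4))
g = np.ones((2, 4))  # the output gradient implied by the `sum` cost

res_dA, res_dB = f(a, b)
assert np.allclose(res_dA, g @ b.T)  # dot(G, B.T)
assert np.allclose(res_dB, a.T @ g)  # dot(A.T, G)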

Now, we can't simply replace those stand-in inputs with input 0 and/or input 1, because the dots in the gradient graphs don't work block-wise and, as a result, can't take the original 3D inputs. Also, the InplaceDimShuffle applied to one of the inputs in each graph wouldn't work on an input with an extra third dimension.

The idea is that we need to convert the templates' dots into Blockwise(dot)s and do something about the InplaceDimShuffles. My guess is that the first input's gradient graph would end up looking like the following after applying transform:

# Blockwise{op=<tests.tensor.test_blockwise.DotBW object at 0x7f5e8236fd90>, signature=((('m', 'n'), ('n', 'p')), (('m', 'p'),))} [id A]
#  |input 0 [id B]
#  |InplaceDimShuffle{1,0,2} [id C]
#    |input 1 [id D]

The DimShuffled dimensions will probably require a little bit of calculation involving Blockwise.signature (i.e. to transpose the correct core dimensions), but most other Ops should be amenable to Blockwise conversion, at least after we formalize and attach the relevant signature information to our Ops. DimShuffle is perhaps a special case in which we don't want to create a Blockwise Op, mostly because there's no point in literally applying a DimShuffle block-wise when a new, equivalent DimShuffle can be produced that accomplishes the same thing more succinctly.
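
As a rough NumPy check of that idea (assuming the leading dimension is the block dimension; names and shapes are made up), the per-block template gradient dot(G_i, B_i.T) is exactly a block-wise dot of the output gradient with the second input transposed only in its core dimensions, which is what the Blockwise(dot)-plus-adjusted-DimShuffle combination should express:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 2, 3))  # `input 0`
y = rng.normal(size=(5, 3, 4))  # `input 1`
g = rng.normal(size=(5, 2, 4))  # gradient of the cost wrt. the Blockwise output

# a block-wise dot of `g` with a transpose of `y` that only swaps the core dimensions
dx = np.matmul(g, np.transpose(y, (0, 2, 1)))

# ...matches the single-block template gradient dot(G_i, B_i.T) applied block by block
expected = np.stack([np.dot(g[i], y[i].T) for i in range(x.shape[0])])
assert np.allclose(dx, expected)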

Any Ops that can't be converted to a Blockwise form (e.g. because they don't provide signature information in some way or another) should result in a no-gradient error.

@brandonwillard (Member, Author) commented:

Closing in favor of #1215.

@brandonwillard deleted the add-blockwise-op branch on January 28, 2023, 20:29.
Labels: enhancement (New feature or request), important, Op implementation (Involves the implementation of an Op)
Linked issues: Create an Op for NumPy's generalized ufuncs