Update Documentation for env Parameter Usage* #6360

Open
wants to merge 6 commits into
base: master
6 changes: 5 additions & 1 deletion man/data.table.Rd
@@ -177,7 +177,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac
See examples as well as \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}}.
}

\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). Use \code{verbose} to preview constructed expressions. For more details see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}. }
\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). For function names, you can use them as strings (e.g., \code{"sum"}) or pass function objects directly (e.g., \code{sum}). Use \code{verbose} to preview constructed expressions. For more details, see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}.}
Nj221102 marked this conversation as resolved.

Member

please change to:
For substitution of functions, you should use their names as strings (for example, \code{"sum"}) instead of passing function objects directly (for example, \code{sum})
(according to my reading of Jan's comments, we should discourage use of function objects, and instead encourage use of their names).

On second thought, instead of adding a sentence talking about function names specifically, we could just change the docs to say "List or environment, where names are symbols to find, and values are strings (names of objects), passed to substitute2..."

what do you think @jangorecki ?

Member

I think this explanation is overly verbose. This applies not only to functions: whatever symbol is passed there will be substituted with what it refers to. I think just adding an example is enough. This is exactly how base substitute works, no? So if we refer to that, we are covered.

Member

@jangorecki jangorecki Aug 14, 2024

> where names are symbols to find

Unless someone really wants to substitute the body of a function instead of its name... Then that would not apply.
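
The distinction discussed in this thread can be checked directly with `substitute2`. The following is a minimal sketch, assuming data.table 1.15.0 or later (where `env` and the exported `substitute2` are available); the names `by_name` and `by_object` are illustrative only.

```r
library(data.table)

# A string in env is converted to a symbol, so the call refers to sum by name.
by_name <- substitute2(f(x), env = list(f = "sum"))
print(by_name)
print(is.name(by_name[[1L]]))        # first element is the symbol `sum`

# A function object is inlined as-is, so the call carries the function itself
# rather than its name.
by_object <- substitute2(f(x), env = list(f = sum))
print(is.function(by_object[[1L]]))  # first element is the function object
```

Both calls evaluate to the same result when `x` exists, but only the first one still contains the name `sum`, which is what the review suggests documenting.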


\item{showProgress}{ \code{TRUE} shows progress indicator with estimated time to completion for lengthy "by" operations. }
}
@@ -403,6 +403,7 @@ print(DT["b", v2:=84L, on="x"]) # subassign to new column by reference (NA

DT[, m:=mean(v), by=x][] # add new column by reference by group
# NB: postfix [] is shortcut to print()

Member

please undo addition of empty lines

# advanced usage
DT = data.table(x=rep(c("b","a","c"),each=3), v=c(1,1,1,2,2,1,1,2,2), y=c(1,3,6), a=1:9, b=9:1)

@@ -418,6 +419,9 @@ DT[, list(MySum=sum(v),
MyMax=max(v)),
by=.(x, y\%\%2)] # by 2 expressions

DT[, .(f=f(a)), by=b, verbose=TRUE,
env=list(f="sum", b="x")] # substitution via env arg

DT[, .(a = .(a), b = .(b)), by=x] # list columns
DT[, .(seq = min(a):max(b)), by=x] # j is not limited to just aggregations
DT[, sum(v), by=x][V1<20] # compound query
23 changes: 23 additions & 0 deletions vignettes/datatable-programming.Rmd
@@ -373,6 +373,29 @@ print(j)
DT[, j, env = list(j = j)]
```

### Injecting functions using strings in `env`

Member

Where does the term "inject" come from?
I think the more typical R term would be "substitute" (like the function of that name)


In `data.table`, you can inject functions into your expressions by passing their names as strings in the `env` parameter. `data.table` converts each string to a symbol and substitutes it into the expression, which makes it easy to choose functions dynamically.

Member

please add a sentence that says you should use env=list(f="mean") and not env=list(f=mean)


Suppose you want to calculate the total of `Sepal.Length` in the `iris` dataset using the `sum` function. You can inject the `sum` function by passing its name as a string in the `env` parameter.

```{r sum_example}
result <- DT[, f(Sepal.Length), env = list(f = "sum")]

Member

Why do we need to change the vignette? Other examples in the vignette already cover this use case. What was discussed in the issue was extending the manual, not the vignette.

Member

Michael commented in #6360 (comment) that he wanted to add this to the vignette.

Member

IMO other pieces of code in this vignette are sufficiently covering parametrizing function names

Contributor Author

@Nj221102 Nj221102 Aug 14, 2024

So do we need any changes to the vignette or not? To be fair, while adding it I also felt that everything was already covered, so I added a section for the specific case mentioned in the issue.

Inconsistency is when someone tries to actually inject a function body by passing a function to the env argument. I don't have a reasonable use case for that, but I don't think we can rule out such a situation. For env it is actually sufficient to provide "sum" because it is automatically turned into a name, so there is no need for quote() or as.name(). What seems very reasonable is to add this use case to the examples so that using "sum" rather than sum is documented.

Hi @MichaelChirico, can you please help me out here: what type of section are we thinking about adding to the programming vignette?

print(result)
```

In this example, the symbol `f` in the expression is replaced by the name given as the string `"sum"`, so `data.table` evaluates `sum(Sepal.Length)` and returns the total of `Sepal.Length`.

Member

this is ok for a first example, but I believe the second example below could be better, by showing a more typical use case, such as

my_summarization <- function(DT, f, variable, by){
  DT[, f(variable), by=by, env=list(f=f, variable=variable, by=by)]
}

Member

Toby's example without its usage is not enough. Try it with a `by` of length 2+.
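
One way to exercise the suggested helper with a `by` of length 2 is sketched below. It assumes data.table 1.15.0 or later and deviates from the snippet above in one place: `by` is wrapped in `as.list()`, on the assumption that `env` only converts length-1 strings to symbols and turns list elements into a `list(...)` call. The toy table and its column names are made up for illustration.

```r
library(data.table)

my_summarization <- function(DT, f, variable, by) {
  # f and variable are single strings, converted to symbols by env.
  # as.list(by) becomes the call list(x, y), so several grouping columns work.
  DT[, f(variable), by = by,
     env = list(f = f, variable = variable, by = as.list(by))]
}

# Hypothetical data: two grouping columns and one value column.
DT <- data.table(x = c("a", "a", "b", "b"),
                 y = c(1L, 1L, 1L, 2L),
                 v = c(1, 2, 3, 4))

res <- my_summarization(DT, f = "sum", variable = "v", by = c("x", "y"))
print(res)
```

The same helper also accepts a single grouping column, since `as.list("x")` yields a one-element `list(x)` call.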


You can also inject multiple functions at once by specifying their names as strings in the `env` list. This is useful for performing various calculations within a single expression.

```{r multi_example}
result <- DT[, .(total = f(Sepal.Length), average = d(Sepal.Length)),
env = list(f = "sum", d = "mean")]
print(result)
```

In this example, both `"sum"` and `"mean"` are passed as strings. The `env` parameter maps these strings to their respective functions, allowing you to calculate both the total and the average of `Sepal.Length` in one go.
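
When the set of functions is itself held in a variable, the `j` expression can be built first and then passed through `env`, in the same way as the `DT[, j, env = list(j = j)]` pattern earlier in the vignette. This is a sketch under assumptions: data.table 1.15.0 or later, and the names `funs` and `j` are chosen here for illustration.

```r
library(data.table)

DT <- as.data.table(iris)

# Output column name -> name of the function to apply (illustrative choice).
funs <- c(total = "sum", average = "mean")

# Build list(total = sum(Sepal.Length), average = mean(Sepal.Length)) as a call.
j <- as.call(c(quote(list),
               lapply(funs, function(fn) call(fn, quote(Sepal.Length)))))
print(j)

# A call passed through env is substituted as-is for the symbol j.
res <- DT[, j, env = list(j = j)]
print(res)
```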

## Retired interfaces

In `[.data.table`, it is also possible to use other mechanisms for variable substitution or for passing quoted expressions. These include `get` and `mget` for inline injection of variables by providing their names as strings, and `eval` that tells `[.data.table` that the expression we passed into an argument is a quoted expression and that it should be handled differently. Those interfaces should now be considered retired and we recommend using the new `env` argument, instead.
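
As a rough comparison of the two styles, the sketch below computes the same total once with `get` and once with `env`. It assumes data.table 1.15.0 or later; the variable names `col`, `old_way` and `new_way` are illustrative only.

```r
library(data.table)

DT <- as.data.table(iris)
col <- "Sepal.Length"  # column name held in a variable

# Retired style: look the column up by name while j is evaluated.
old_way <- DT[, sum(get(col))]

# env style: the symbol `col` in j is replaced by Sepal.Length before evaluation.
new_way <- DT[, sum(col), env = list(col = col)]

stopifnot(isTRUE(all.equal(old_way, new_way)))
```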