Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues in translatable messages in R code #6532

Closed
13 tasks done
aitap opened this issue Sep 24, 2024 · 10 comments
Closed
13 tasks done

Issues in translatable messages in R code #6532

aitap opened this issue Sep 24, 2024 · 10 comments
Labels
translation issues/PRs related to message translation projects

Comments

@aitap
Copy link
Contributor

aitap commented Sep 24, 2024

Fragmented sentences

  • %s is elements of fun.list or ...arguments to measure or maybe measured columns, untranslated, coming from the group.desc argument, which is a documented ("for internal use") interface. Another %s is number of capture groups in pattern or max number of items after splitting column names, also untranslated.
    stopf("in measurev, %s must be named, problems: %s", group.desc, brackify(prob.i))

    stopf("%s should be uniquely named, problems: %s", err.what, brackify(names(bad.counts)))

    stopf("number of %s =%d must be same as %s =%d", group.desc, length(fun.list), type, N)
  • %1$s is keys or indices, untranslated.

    data.table/R/setops.R

    Lines 160 to 177 in 745160d

    return(gettextf(
    "Datasets have different %s. 'target': %s. 'current': %s.",
    "keys",
    if(length(k1)) brackify(k1) else gettextf("has no key"),
    if(length(k2)) brackify(k2) else gettextf("has no key")
    ))
    }
    # check index
    i1 = indices(target)
    i2 = indices(current)
    if (!identical(i1, i2)) {
    return(gettextf(
    "Datasets have different %s. 'target': %s. 'current': %s.",
    "indices",
    if(length(i1)) brackify(i1) else gettextf("has no index"),
    if(length(i2)) brackify(i2) else gettextf("has no index")
    ))
    }
  • %3$s is either error or warning, untranslated. That's relatively minor and could be worked around in translation (condition of class %s):
    warningf("Column '%s' was requested to be '%s' but fread encountered the following %s:\n\t%s\nso the column has been left as type '%s'", names(ans)[j], new_class, if (inherits(e, "error")) "error" else "warning", e$message, typeof(v))

Minor issues with correctness

  • This should have said POSIXct, not just POSIX (which could be read as POSIXt or POSIXlt, and we don't like POSIXlt):
    "'between' function the 'x' argument is a POSIX class while '%s' was not, coercion to POSIX failed with: %s", 'lower', e$message))

    "'between' function the 'x' argument is a POSIX class while '%s' was not, coercion to POSIX failed with: %s", 'upper', e$message))
  • Missing ] at the end:
    stopf("You have wrapped := with {} which is ok but then := must be the only thing inside {}. You have something else inside {} as well. Consider placing the {} on the RHS of := instead; e.g. DT[,someCol:={tmpVar1<-...;tmpVar2<-...;tmpVar1*tmpVar2}")
  • The last } needs to be replaced with ]
    "'%s' is not found in calling scope, but it is a column of type %s. If you wish to select rows where that column contains TRUE, or perhaps that column contains row numbers of itself to select, try DT[(col)], DT[DT$col], or DT[col==TRUE} is particularly clear and is optimized",
  • PR17478 is now FIXED CLOSED:
    stopf("The data_table.%s version (%s) does not match the package (%s). Please close all R sessions to release the old %s and reinstall data.table in a fresh R session. The root cause is that R's package installer can in some unconfirmed circumstances leave a package in a state that is apparently functional but where new R code is calling old C code silently: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478. Once a package is in this mismatch state it may produce wrong results silently until you next upgrade the package. Please help by adding precise circumstances to 17478 to move the status to confirmed. This mismatch between R and C code can happen with any package not just data.table. It is just that data.table has added this check.", dll, dllV, RV, toupper(dll))

Nitpicking

  • In R ≥ 4.2.0 there's the Sys.setLanguage function which (tries to) set the gettext language during runtime, maybe recommend it here?
    packageStartupMessagef("**********\nRunning data.table in English; package support is available in English only. When searching for online help, be sure to also check for the English error message. This can be obtained by looking at the po/R-<locale>.po and po/<locale>.po files in the package source, where the native language and English error messages can be found side-by-side\n**********")
  • Looks like the code accepts a single data.table/data.frame, maybe remove the s?
    stopf("x must be an atomic vector or data.frames/data.tables")
  • s/both libraries/both packages/
    warningf("The %1$s generic in data.table has been passed a %2$s and will attempt to redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no longer actively developed, and this redirection is now deprecated. To continue using melt methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the namespace, i.e. reshape2::%1$s(%3$s). In the next version, this warning will become an error.", "melt", class1(data), data_name)
  • Maybe shouldn't be translated
    if (length(by.x) != length(by.y)) stopf("length(by.x) != length(by.y)")
  • Needs # nontranslate because it's from a different domain
    foreign = gettext("object '%s' not found", domain="R") != "object '%s' not found"
  • Perhaps ...vector **of** length %d?
    if (length(.SDcols)!=length(x)) stopf(".SDcols is a logical vector length %d but there are %d columns", length(.SDcols), length(x))
@aitap aitap added the translation issues/PRs related to message translation projects label Sep 24, 2024
@rikivillalba
Copy link
Contributor

rikivillalba commented Sep 24, 2024

err.args.groups = function(type, N){
  if (N != length(fun.list)) {
    stopf("number of %s =%d must be same as %s =%d", group.desc, length(fun.list), type, N)
  }
}

type is an argument that is not translated, i.e. in line 120
err.args.groups("number of capture groups in pattern", ncol(start))

err.args.groups("number of capture groups in pattern", ncol(start))

@MichaelChirico
Copy link
Member

@tdhock can you help me understand measurev() here? It has an argument group.desc= with a default value "elements of fun.list". But I only ever see measurev() called in one place, with a different value:

list(group.desc="... arguments to measure"))

Can we just drop the group.desc argument, since measurev() is unexported? Also, remind me why it is documented given that it's unexported?

@tdhock
Copy link
Member

tdhock commented Sep 24, 2024

since measure is exported, we should export measurev too. both are documented because they are meant to be used as the measure.vars argument to melt.
when I wrote measure/measurev, I was not aware of the issues created by breaking strings apart when translating messages.
so we should probably get rid of the group.desc argument, I will look at #6534

@MichaelChirico
Copy link
Member

OK, filed #6535.

@MichaelChirico
Copy link
Member

I see now in the tests that measurev() is called with the default group.desc=, I guess some NSE applies there.

@tdhock
Copy link
Member

tdhock commented Sep 24, 2024

the user could call either measure() or measurev() inside measure.vars.
measure() calls measurev() but we want specific error messages to help the user:

> melt(data.table(iris)[1], measure.vars=measure(part, part))
Error in err.names.unique(group.desc, names(fun.list)) : 
  ... arguments to measure should be uniquely named, problems: [part]
> melt(data.table(iris)[1], measure.vars=measurev(list(part=NULL, part=NULL)))
Error in err.names.unique(group.desc, names(fun.list)) : 
  elements of fun.list should be uniquely named, problems: [part]

@MichaelChirico
Copy link
Member

Yea, I agree it's not perfect, but it certainly looks actionable to me because part is clearly duplicated.

@rikivillalba
Copy link
Contributor

rikivillalba commented Sep 26, 2024

In deve.R I think we can merge
https://github.com/Rdatatable/data.table/blob/master/R/devel.R#L40:L44

    cat(sprintf("R %s package %s %s (%s)\n",
                pkg,
                c("is up-to-date at","has been updated to")[upg+1L],
                unname(read.dcf(system.file("DESCRIPTION", package=pkg, lib.loc=lib, mustWork=TRUE), fields=field)[, field]),
                utils::packageVersion(pkg, lib.loc=lib)))

into

    pkg_rev = unname(read.dcf(system.file("DESCRIPTION", package=pkg, lib.loc=lib, mustWork=TRUE), fields=field)[, field])
    if(upg) {
      catf("R %s package %s has been updated to (%s)\n",  pkg, pkg_rev, utils::packageVersion(pkg, lib.loc=lib))
    } else {
      catf("R %s package %s is up-to-date at (%s)\n",  pkg, pkg_rev, utils::packageVersion(pkg, lib.loc=lib))
    }

I think we can use catf despite the previous line unloads the package.

@aitap
Copy link
Contributor Author

aitap commented Sep 27, 2024

This is tricky code. Objects that live in package namespaces are "lazy-loaded": the namespace environment initially contains "promise" objects that, when first accessed, quickly seek to pre-remembered offsets in R/data.table.rdb and uncompress whatever they find there. This function unloads the namespace and installs a new package, replacing the *.rdb file with a different one, where the offsets have changed. If the catf function is still a promise at this point, it will at least cause a warning (internal error -3 in R_decompress1), maybe fail to evaluate (lazy-load DB is corrupt):

Downloaded package sources are in
        ‘/tmp/Rtmp3GzX83/downloaded_packages’
R data.table package 0c957c70d019ebf36c44c988d5b2a4def9c05c92 has been updated to (1.16.99)
Warning:
In get(method, envir = home) : internal error -3 in R_decompress1

It may be better to limit the "last on the way out" code to base-R functions only:

    pkg_rev = unname(read.dcf(system.file("DESCRIPTION", package=pkg, lib.loc=lib, mustWork=TRUE), fields=field)[, field])
    cat(sprintf(
      if (upg) gettext(
        "R %s package %s has been updated to (%s)\n", domain = "R-data.table"
      ) else gettext(
        "R %s package %s is up-to-date at (%s)\n", domain = "R-data.table"
      ),
      pkg, pkg_rev, utils::packageVersion(pkg, lib.loc=lib)
    ))

@MichaelChirico
Copy link
Member

Closed between #6534 and #6536

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
translation issues/PRs related to message translation projects
Projects
None yet
Development

No branches or pull requests

4 participants