Column functions

Jolan Rensen edited this page Aug 2, 2022 · 3 revisions

Operator functions

As in the Scala API for Columns, many of the operator functions have been ported over to Kotlin. For example:

dataset.select( col("colA") + 5 )
// a Dataset can also be invoked with a column name to get that column
dataset.select( dataset("colA") / dataset("colB") )

dataset.where( col("colA") `===` 6 )
// or alternatively
dataset.where( col("colA") eq 6 )

In short, all supported operators are:

  • ==
    • same as equals(), so it returns a Boolean rather than a Column
  • !=
    • same as !equals()
  • eq / `===`
    • in Scala: ===
    • in Java: equalTo()
  • neq / `=!=`
    • in Scala: =!=
    • in Java: notEqual()
  • -col(...)
    • same in Scala
    • in Java: negate(col())
  • !col(...)
    • same in Scala
    • in Java: not(col())
  • gt
    • in Scala: >
    • in Java: gt(); in Kotlin it is also infix
  • lt
    • in Scala: <
    • in Java: lt(); in Kotlin it is also infix
  • geq
    • in Scala: >=
    • in Java: geq(); in Kotlin it is also infix
  • leq
    • in Scala: <=
    • in Java: leq(); in Kotlin it is also infix
  • or
    • in Scala: ||
    • in Java: or(); in Kotlin it is also infix
    • `||` is unfortunately an illegal function name on Windows
  • and / `&&`
    • in Scala: &&
    • in Java: and()
  • +
    • same in Scala
    • in Java: plus()
  • -
    • same in Scala
    • in Java: minus()
  • *
    • same in Scala
    • in Java: multiply()
  • /
    • same in Scala
    • in Java: divide()
  • %
    • same in Scala
    • in Java: mod()
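
To illustrate how Kotlin can host operators like these, here is a minimal, Spark-free sketch. Expr and this col helper are hypothetical stand-ins, not the library's Column or its actual implementation; each function only records the Java-style call that the list above pairs it with, so the mapping can be inspected without a Spark session.

```kotlin
// Hypothetical stand-in for Spark's Column: it stores the Java-style
// expression text instead of building a real query plan.
class Expr(val text: String) {
    override fun toString(): String = text
}

// Hypothetical helper mirroring col("name").
fun col(name: String): Expr = Expr(name)

// Arithmetic and unary operators use Kotlin's operator conventions.
operator fun Expr.plus(other: Any): Expr = Expr("$text.plus($other)")
operator fun Expr.div(other: Any): Expr = Expr("$text.divide($other)")
operator fun Expr.unaryMinus(): Expr = Expr("negate($text)")
operator fun Expr.not(): Expr = Expr("not($text)")

// Comparisons cannot overload ==, because equals() must return a Boolean,
// so they are plain infix functions instead.
infix fun Expr.eq(other: Any): Expr = Expr("$text.equalTo($other)")
infix fun Expr.`===`(other: Any): Expr = this eq other
infix fun Expr.geq(other: Any): Expr = Expr("$text.geq($other)")
infix fun Expr.and(other: Expr): Expr = Expr("$text.and($other)")
```

With these definitions, col("colA") + 5 is recorded as colA.plus(5), and (col("colA") geq 5) and (col("colB") eq 6) as colA.geq(5).and(colB.equalTo(6)). Named infix functions all share one precedence level in Kotlin, so parentheses are worth adding when several are combined.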

QOL additions

There are also some quality-of-life additions.

In Kotlin, ranges are the usual way to express a pair of bounds. So, instead of between(a, b), you can now write:

dataset.where( col("colA") inRangeOf 0..2 )

Also, for columns containing map- or array-like types, the indexing operator can be used instead of getItem():

dataset.where( col("colB")[0] geq 5 )
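
As a rough sketch of how these two additions can be expressed in Kotlin, the following Spark-free example uses a hypothetical Expr class and col helper in place of the real Column. It assumes that inRangeOf unpacks the range into between(start, endInclusive) and that indexing forwards to getItem(key); the functions only record those calls as text.

```kotlin
// Hypothetical stand-in for Spark's Column that records calls as text.
class Expr(val text: String)

// Hypothetical helper mirroring col("name").
fun col(name: String): Expr = Expr(name)

// A closed range carries both bounds, so it can be unpacked into
// between(lower, upper). Both ends of a ClosedRange are inclusive.
infix fun Expr.inRangeOf(range: ClosedRange<*>): Expr =
    Expr("$text.between(${range.start}, ${range.endInclusive})")

// The indexing operator column[key] forwards to getItem(key).
operator fun Expr.get(key: Any): Expr = Expr("$text.getItem($key)")

// Infix comparison, included so the indexing example reads like the one above.
infix fun Expr.geq(other: Any): Expr = Expr("$text.geq($other)")
```

Because the range operator binds more tightly than a named infix function, col("colA") inRangeOf 0..2 needs no parentheses and is recorded as colA.between(0, 2). Likewise, col("colB")[0] geq 5 is recorded as colB.getItem(0).geq(5).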

Using reflection for Column names

Finally, thanks to Kotlin reflection, there is a type-safe and refactor-safe way to create TypedColumns and, with those, to build a new Dataset from pieces of another using the select() function:

val dataset: Dataset<YourClass> = ...
val newDataset: Dataset<Tuple2<TypeA, TypeB>> = dataset.select(col(YourClass::colA), col(YourClass::colB))

// Alternatively, for instance when working with a Dataset<Row>
val typedDataset: Dataset<Tuple2<String, Int>> = otherDataset.select(col<_, String>("a"), col<_, Int>("b"))
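
The idea behind the property-reference form can be sketched without Spark. In the example below, TypedCol, this col function, and the fields of YourClass are hypothetical and only illustrate the mechanism: a Kotlin property reference carries both the property name and its types, so the column name never has to be typed as a string.

```kotlin
import kotlin.reflect.KProperty1

// Hypothetical stand-in for Spark's TypedColumn<T, U>.
class TypedCol<T, U>(val name: String)

// Takes the column name from the property reference; T and U are inferred
// from the property's owner and value types.
fun <T, U> col(property: KProperty1<T, U>): TypedCol<T, U> = TypedCol(property.name)

// Hypothetical data class; the field types are assumptions for illustration.
data class YourClass(val colA: String, val colB: Int)

// The declared types are checked by the compiler against the property.
val columnA: TypedCol<YourClass, String> = col(YourClass::colA)
val columnB: TypedCol<YourClass, Int> = col(YourClass::colB)
```

Renaming colA with an IDE refactoring updates YourClass::colA everywhere, and declaring columnA as TypedCol<YourClass, Int> would fail to compile, which is what makes this form safer than a string name.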