Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect formatting of optional values #265

Open
Ravikanth143143 opened this issue Feb 24, 2023 · 2 comments
Open

Incorrect formatting of optional values #265

Ravikanth143143 opened this issue Feb 24, 2023 · 2 comments

Comments

@Ravikanth143143
Copy link

Ravikanth143143 commented Feb 24, 2023

My Sample code to convert the JSON String(value) to Avro Binary is as follows:

native, _, err := schema.Codec().NativeFromTextual([]byte(value))
if err != nil {
	return nil, err
}

valueBytes, err := schema.Codec().BinaryFromNative(nil, native)
if err != nil {
	return nil, err
}

I have the following JSON Schema:
Schema-1:
This schema as per Avro specification, needs a mandatory username but optional colour & mass fields which defaults to null.

{
  "type": "record",
  "name": "user",
  "fields": [
    {
      "name": "username",
      "type": "string"
    },
    {
      "name": "colour",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "mass",
      "type": [
        "null",
        "int"
      ],
      "default": null
    }
  ]
}

My incoming JSON String is:
Ex-1:

{"username":"temp"}

This will be converted to Avro just fine. No issues. And the downstream systems like databases considers mass & colour as null for this record which is all good.

Ex-2:

{"username":"temp","colour":"red","mass":10}

This fails with the following error:

cannot decode textual record \"user\": cannot decode textual union: expected: '{'; actual: '\"' for key: \"colour\"", "message": "{\"username\":\"temp\",\"colour\":\"red\",\"mass\":10}"

I couldn't find a way to get around this. However, reading through multiple issues in this repo and one among them is: #114, I made the schema change to be as follows:
Schema-2:
This schema specifies that username is mandatory. However, colour & mass are optional fields with default values to be ""(empty string with size 0) & 0(integer 0) respectively.

{
  "type": "record",
  "name": "user",
  "fields": [
    {
      "name": "username",
      "type": "string"
    },
    {
      "name": "colour",
      "type": "string",
      "default": ""  # A null value here fails `NativeFromTextual` with a `panic: runtime error: invalid memory address or nil pointer dereference` error
    },
    {
      "name": "mass",
      "type": "int",
      "default": 0     # A null value here fails `NativeFromTextual` with a `panic: runtime error: invalid memory address or nil pointer dereference` error
    }
  ]
}

My incoming JSON String is:
Ex-1:

{"username":"temp"}

This will be converted to Avro just fine. However, the meaning of the record has now changed due to the defaults not being accepted as null but some values. When the downstream big data systems consume these messages and perform the aggregates the meanings will changed as the int null is different from int 0 and similar with any other data types.

Ex-2:

{"username":"temp","colour":"red","mass":10}

This works just fine. No issues

What I wanted is:

  1. A schema that works for both the below JSON messages with out changing the meaning of the message i.e the schema should accept null values when the incoming messages doesn't have the associated fields defined.
{"username":"temp"}
{"username":"temp","colour":"red","mass":10}

Please note that, I do not intend to change the JSON message structure.

Any inputs are greatly appreciated.

@amwill04
Copy link

amwill04 commented Mar 14, 2023

To give further context to this:

package main

import (
	"fmt"

	"github.com/linkedin/goavro/v2"
)

var schema = `{
  "type": "record",
  "name": "Example",
  "fields": [
    {
      "name": "ProductType",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "Address",
      "type": {
        "name": "AddressRecord",
        "type": "record",
        "fields": [
          {
            "name": "street",
            "type": ["null", "string"],
	    "default": null
          }
        ]
      }
    }
  ]
}
`

func main() {
	schemaCodec, err := goavro.NewCodec(schema)
	if err != nil {
		fmt.Println(err)
		return
	}

	// This will work returning
	// map[Address:map[street:<nil>] ProductType:<nil>]
	n, _, err := schemaCodec.NativeFromTextual([]byte(`{"Address":{}}`))
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Println(n)

	// This will fail with the following:
	// cannot decode textual record "Example": cannot decode textual record "Address": cannot decode textual union: expected: '{'; actual: '"' for key: "street" for key: "Address"
	n, _, err = schemaCodec.NativeFromTextual([]byte(`{"Address":{"street": "foo"}}`))
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Println(n)

}

So it is working as expected at a top level of the schema but not on any nested records

@mihaitodor
Copy link
Collaborator

Hey @amwill04, I believe you need to use goavro.NewCodecForStandardJSON instead of goavro.NewCodec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants