Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to System.Text.Json #100

Merged
merged 30 commits into from
Nov 9, 2021
Merged

Conversation

Turnerj
Copy link
Collaborator

@Turnerj Turnerj commented Nov 20, 2019

Attempts to resolve #96 , switching from JSON.Net to System.Text.Json.

One problem with the migration is it requires dropping .NET Standard 1.1 support. I've done that in this PR already however if you're wanting to maintain that, it might be possible with falling back to JSON.Net but likely will require a ton of #if declarations in every class.

Current status: Down from 27000 errors to 0 - just testing is left.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 20, 2019

It doesn't look like some of the functionality that JSON.Net has exists in System.Text.Json, specifically the handling of default values. Currently JSON.Net is configured to ignore default values however you can't do that with System.Text.Json.

Assuming I can get everything else to work, I think this still might be a deal breaker.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 20, 2019

A few more opinionated changes in the Thing class too with the ToString methods.

The short of it is:

  • HTML Escaping is enabled by default in JsonSerializer so I've obsoleted the ToHtmlEscapedString method. It can be disabled by using a custom encoder however with it being enabled by default, might be a good time to drop this method - up to you!
  • The static JsonSerializerOptions need to use a static constructor due to the access level of the Converters property (its read only).

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 20, 2019

Currently through getting the tests working, there is a big blocker with: dotnet/corefx#38841

I could manually change every DataContract and DataMember reference to use the new JsonPropertyName property but not sure if that is a good idea. Doing it would mean the property order would no longer be observed and while that is fine for JSON, it would require updating all the tests that do an exact text match.

@RehanSaeed
Copy link
Owner

This is good stuff. I wondered if there would be road blocks like null value handling problems.

With regards to HTML escaping by default. There are times when you might not want to do that (you're not working in a web context), so we need to provide both methods.

Regarding https://github.com/dotnet/corefx/issues/38841, it may be that we need to wait for .NET Core 3.1 before this PR can be accepted.

Can you not set the order with System.Text.Json at all? I wouldn't mind a complete switch to the new attributes if required. We don't support any other serialization like XML, so there is no benefit to using DataMember.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 21, 2019

As far as I've seen in the source code, no it doesn't seem possible to set the order with the new property. I have a feeling they probably won't have that functionality though as according to the JSON spec, an object is an unordered list of name/value pairs.

As far as I can tell, the serializer will put inherited properties at the end so it is at least still predictable, just that all the tests that do a string comparison will need to have their properties moved around.

I've updated the Thing.ToString() methods so the HTML escaping and non-escaping paths are working as intended again.

I'm happy to wait for 3.1 to see if DataMember change has landed though can move to JsonPropertyName if you want. 🙂

@RehanSaeed
Copy link
Owner

Happy to move to JsonPropertyName, it's just a find and replace to get there.

The ordering is a bigger problem as you have to mess with all of the tests which is a bigger job and much harder to review the PR. If there are no plans to implement ordering in 3.1, then perhaps we should go ahead with it, otherwise wait for 3.1. Thoughts?

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 28, 2019

I really don't see them bringing across ordering to JsonPropertyName simply because JSON has no dependency on ordering of object keys normally. At a quick count, there is maybe 36 JSON strings in the tests so it isn't too much to modify.

It does suck needing to change the tests, less because of the work to do it, more because it is far nicer having the code meet the tests rather than updating both (code and tests) to meet each other.

I'll look at getting this done soon, even if it is just so we can see if there are any other deal breakers with switching to System.Text.Json.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 28, 2019

Note for self: Need to update Schema.NET.Tool - it has a number of string references to DataMember and Newtonsoft that would need updating.

@RehanSaeed
Copy link
Owner

Instead of comparing exact string representations in our tests, perhaps we should compare JSON object types from System.Text.Json?

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 29, 2019

That could work! So with Newtonsoft that was JObject and in System.Text.Json that is JsonDocument - it would likely make the code less fragile. It also would have a built in comparer so besides writing up the model, it should fit easily with the test assertions.

The only exception to this is still specifically checking how the ToString extensions on Thing work. They specifically replace the inner @context properties from the JSON and this wouldn't match the normal deserialization process.

@RehanSaeed
Copy link
Owner

Yes, I hadn't thought of that. I guess string comparisons are still important. The System.Text.Json API must still output a deterministic order for all it's properties, so I guess it might be best to bite the bullet and fix the tests to match.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 30, 2019

Another note to self: There looks to be a more efficient way to cast raw JSON to value types than using int.TryParse etc. This would be used in ValuesJsonConverter where many of the direct TryParse methods are used directly.

See: https://stackoverflow.com/a/57334833/1676444

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 30, 2019

Had to move to a preview version of System.Text.Json as the code was hitting an unavoidable error. It seems like System.Text.Json 4.6.0 doesn't understand OneOrMany properly and while it uses the CanConvert on ValuesJsonConverter, it never actually used the converter itself. Anyway, updating to the latest preview avoids that issue.

A few tests are now passing with no change (eg. MusicEventTest) though I'm not sure some of these tests are valid. MusicEventTest for example deserializes and serializes again a string which wouldn't actually test whether the string JSON is an exact match. It might be worth looking through some of the tests to make sure they are testing the right things. The vast majority seem to be doing that (in at least, most of them are still failing) but some passed where they likely shouldn't have.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 30, 2019

Two curly issues I've needed to work around due to how System.Text.Json works though they are a result of the same problem. The serializer doesn't seem to recognize object types properly, instead falling back to whatever interface on a collection or specific generic parameter provided is.

Case in point, the ToString method on Thing passed this which made the serializer think it was only wanting to serialize the properties of Thing (rather than the actual type that inherits it). That was a brain buster to solve but simply passing the type directly resolves it.

Similarly when serializing a OneOrMany property, they normally point to the interface (eg. OneOrMany<ICreativeWork>) which made the serializer only try to serialize that interface and the problems that arise from that (not using the right JSON property names). The work around for this was looping the enumerable directly rather than passing it off.

Ultimately though we are still stuck with default values being serialized. The "IgnoreNullValues" requires classes but because the property types are structs, that never works so the serialized JSON is full of null values. This sets back actually updating the tests - we might have to wait for dotnet/corefx#38878 before continuing much more on this PR.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 1, 2019

I threw together a basic "Thing" converter which mitigates the need for System.Text.Json to support ignoring default values. I've put it in a Gist as I'm not sure we want to go that direction or just wait for real support: https://gist.github.com/Turnerj/11bb3832e4d4a40464768311de3d46e8

@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 8, 2019

I've fixed issue #104 for System.Text.Json though the current master with JSON.Net still has the issue described.

@RehanSaeed
Copy link
Owner

Do you think we're close with this or should I start to think about fixing it with JSON.NET?

@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 8, 2019

For the most part, yeah we are close. The biggest issue is still the lack of ignoring default values - I have a work around which basically takes over all serialization of all schema entities, just needs the JsonConverter attribute on every class (not difficult). I'm not sure if we want to go this route or wait till real support for ignoring default values exists - it could be a while waiting.

Until we know the path for ignoring default values (whether to use my workaround or wait till real support), there isn't much point altering the tests. Doing so currently will still lead to test failures because of a lot of null properties being serialized and that would take a while to write out in all the JSON string assertions.

If you're happy with me using my workaround, I can continue but otherwise, it is a waiting game for dotnet/corefx#38878 to be done.

@nickevansuk
Copy link
Contributor

nickevansuk commented Dec 8, 2019

Just throwing my 2cents in here:

Really exciting to see this working, and great how much progress we’ve been able to make and contribute back to corefx issues at the same time to help steer System.Text.Json.

I help maintain a library (OpenActive.NET) on top of Schema.NET - expecting this change - so great to see it’s progressing.

My quick thought was given that Schema.NET is in use in production across a large number of systems, I’m not sure we need to push ourselves to be white-knuckle-bleeding-edge here? Perhaps sticking with the tried and tested JSON.NET while System.Text.Json settles into mainstream use would be best - especially now we’ve done the hard work to identify which bits of System.Text.Json we need to see improvements in before migrating?

Although we gain performance in theory using System.Text.Json (though perhaps not so much with the workaround still in place?) it doesn’t seem like there’s any obvious functional benefit here and we’re increasing complexity by being an early adopter?

As I say just my 2cents, is really great to be able to give the MS team feedback on where we’ve found issues with the new functionality, just not sure if we need to go the whole way just yet?

@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 9, 2019

Those are fair points you've raised @nickevansuk - I too am working on a pretty substantial project where Schema.NET plays a vital part so I definitely want to make sure it is stable, fast and memory efficient.

I don't imagine the workaround for default values performing any slower than having that functionality built-in (both do caching of certain reflected property lookups) but it would increase the overall code maintenance required for the library. On the other hand, once working support for ignoring default values lands, it would make it redundant so it likely wouldn't be around long-term.

Keeping with JSON.Net for now also does allow the library to keep .NET Standard 1.1 support (that was one of the first things that had to go for System.Text.Json support). Personally I'm indifferent about supporting .NET Standard 1.1, I am myself using .NET Core 3 but even then, there still is support for newer versions of .NET Framework etc.

What might be good in the meantime is actually beefing up the tests (I found some while doing this PR that don't seem to be adequately testing what I think they should be), adding some basic benchmarking (both to optimise the current code and to measure the performance improvement when going to System.Text.Json) and any other tweaking (a few little quality-of-life improvements when de-serializing certain values that don't specify a type, potentially auto-generated IEquatable<T> for each type, other useful things like merging schema objects).

If @RehanSaeed is happy for those types of things, I'm happy to focus on those things and let this PR sit till System.Text.Json has "ignore default value" support.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 9, 2019

If we still want to go ahead with System.Text.Json with my workaround, I would definitely be doing significant testing and validation to make sure it all plays nice - no point switching to a better dependency if everything is a hot mess of buggy code!

@RehanSaeed
Copy link
Owner

Happy to wait for .NET 5 if it makes it easier. In that case we need to fix #104 but I don't have much time in the short term.

Also as an aside if we want to add System.Text.Json, we should consider adding an option to serialize directly to UTF8 which is a bit faster. I've done this with some other code I've been working on but not given it much thought with Schema.NET.

@RehanSaeed
Copy link
Owner

See docs about serializing to and from UTF8

https://docs.microsoft.com/en-gb/dotnet/standard/serialization/system-text-json-how-to

@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 15, 2019

I've done a preliminary benchmark run on System.Text.Json and while it isn't stable (one of the benchmarks failed - might be an internal System.Text.Json thing) and includes default values (so more strings it has to write), it performs about as good as my overhaul of the ValuesJsonConverter in #109.

Method Runtime Mean Error StdDev Min Max Gen 0 Gen 1 Gen 2 Allocated
Serialize .NET 4.7.2 297.3 us 2.93 us 2.74 us 288.4 us 299.3 us 37.5977 - - 119669 B
Deserialize .NET 4.7.2 NA NA NA NA NA - - - -
Serialize .NET Core 3.0 153.6 us 2.58 us 2.41 us 149.1 us 157.9 us 15.1367 - - 48024 B
Deserialize .NET Core 3.0 319.1 us 4.36 us 3.87 us 307.5 us 324.7 us 33.6914 - - 106906 B

I did have another thought regarding improving serialization/deserialization performance by having Schema.NET.Tool generate custom converters for each type. Technically we have all the pieces we need in the tool with each property and the ancestor properties etc plus we know all the types ahead of time.

Unless System.Text.Json is emitting a custom converter for each type, it has to be using some form of reflection for properties. Even if those are cached, I can't see it being faster than a custom converter for writing.

I think this is something worth pursuing and could actually make the need for ignoring default values redundant. Also, we would have control of the property order so the tests could stay as they are.

@Turnerj
Copy link
Collaborator Author

Turnerj commented May 15, 2020

With dotnet/runtime#36322 - it looks like we will be closer to moving this forward and not needing to worry about default values. Will keep my eye out if there are any beta versions of the System.Text.Json library out with this new change so we can keep progressing on this.

@Turnerj Turnerj force-pushed the system-text-json-update branch from a4dc45d to df836cb Compare November 9, 2021 10:47
@Turnerj
Copy link
Collaborator Author

Turnerj commented Nov 9, 2021

Took a bit longer to get sorted than I thought but it seems to all be good! Any final things you want to check before we merge @RehanSaeed ?

Copy link
Owner

@RehanSaeed RehanSaeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM minus a couple of very minor suggestions. Feel free to complete.

Source/Common/ContextJsonConverter.cs Outdated Show resolved Hide resolved
Source/Common/EnumHelper.cs Outdated Show resolved Hide resolved
@Turnerj Turnerj merged commit 2f222a6 into RehanSaeed:main Nov 9, 2021
@Turnerj Turnerj deleted the system-text-json-update branch November 9, 2021 13:05
@AraHaan
Copy link

AraHaan commented Nov 13, 2021

Yay, finally after a few years it was done. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Issues describing an enhancement or pull requests adding an enhancement. major Pull requests requiring a major version update according to semantic versioning.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch from Newtonsoft.Json to System.Text.Json
7 participants