Validating individual definitions #432

Open
sztomi opened this issue Jul 2, 2023 · 10 comments

Comments

@sztomi

sztomi commented Jul 2, 2023

I'm trying to validate JSON snippets against the DAP JSON schema: https://microsoft.github.io/debug-adapter-protocol/debugAdapterProtocol.json

Each type of message is a separate definition. I can extract and compile the relevant definition, but references are not resolved. I think I need to use the with_document function, but I'm not sure what the ID is supposed to be.

use jsonschema::JSONSchema;
use serde_json::{json, Value};

fn main() {
    let schema = include_str!("../debugAdapterProtocol.json");
    let schema: Value = serde_json::from_str(schema).unwrap();

    // Get the specific definition from the schema
    let next_request_schema = schema.get("definitions").unwrap().get("NextRequest").unwrap().clone();
    
    // Compile this specific schema
    let compiled = JSONSchema::options()
        .should_validate_formats(true)
        .should_ignore_unknown_formats(false)
        .with_document("#".to_string(), schema)
        .compile(&next_request_schema)
        .unwrap();

    let instance = json!({
        "seq": 153,
        "type": "request",
        "command": "next",
        "arguments": {
            "threadId": 3
        }
    });
    let result = compiled.validate(&instance);
    if let Err(errors) = result {
        for error in errors {
            println!("Validation error: {}", error);
            println!("Instance path: {}", error.instance_path);
        }
    }
}

This results in

Validation error: Invalid reference: json-schema:///#/definitions/Request
Instance path:
Validation error: Invalid reference: json-schema:///#/definitions/NextArguments
Instance path:

I tried an empty string as well, but that didn't work either. Any clues?

@VoltaireNoir

I am facing the same problem with internal references. Were you able to figure it out?

@sztomi
Author

sztomi commented Feb 11, 2024

@VoltaireNoir with a very ugly hack: https://github.com/sztomi/dap-rs/blob/main/integration_tests/src/lib.rs#L27

@VoltaireNoir

Ah, I arrived at the same solution: replacing the references with the appropriate schema. I guess I'll stick with it until jsonschema is able to resolve local/internal references correctly.
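
The workaround both commenters describe, replacing each local reference with the schema it points to, could look roughly like the sketch below. It is not the code behind the link above. To stay free of external crates it uses a toy `Json` enum (an assumption) where real code would more likely walk a `serde_json::Value`. The names `inline_refs` and `obj` and the `depth` limit are hypothetical; the limit exists because recursive definitions would otherwise expand forever.

```rust
use std::collections::BTreeMap;

/// Toy JSON value, just enough to hold a schema (a stand-in for `serde_json::Value`).
#[derive(Clone, Debug, PartialEq)]
enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Convenience constructor for objects.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Return a copy of `node` in which every `{"$ref": "#/definitions/X"}` object
/// is replaced by definition `X` (itself inlined, up to `depth` levels).
/// Keywords sitting next to `$ref` are dropped; unknown targets are left untouched.
fn inline_refs(node: &Json, definitions: &BTreeMap<String, Json>, depth: usize) -> Json {
    match node {
        Json::Obj(map) => {
            if let Some(Json::Str(target)) = map.get("$ref") {
                if let Some(name) = target.strip_prefix("#/definitions/") {
                    if depth > 0 {
                        if let Some(def) = definitions.get(name) {
                            return inline_refs(def, definitions, depth - 1);
                        }
                    }
                }
            }
            // Not a resolvable reference: recurse into every member.
            Json::Obj(
                map.iter()
                    .map(|(k, v)| (k.clone(), inline_refs(v, definitions, depth)))
                    .collect(),
            )
        }
        Json::Arr(items) => Json::Arr(
            items.iter().map(|v| inline_refs(v, definitions, depth)).collect(),
        ),
        other => other.clone(),
    }
}
```

The inlined result no longer contains local references, so it can be compiled on its own, at the cost of duplicating every shared definition.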

@eirnym

eirnym commented Feb 11, 2024

I have a complex schema with dependencies between files and defined objects. I can't extract an object the way @sztomi has shown in his example. Is it still possible to validate against such a subschema?

@VoltaireNoir

> I have a complex schema with dependencies between files and defined objects. I can't extract an object the way @sztomi has shown in his example. Is it still possible to validate against such a subschema?

I believe this library supports resolving web- and file-based references. Try enabling the resolve-http and resolve-file feature flags.

@eirnym

eirnym commented Feb 19, 2024

> Try enabling the resolve-http and resolve-file feature flags.

This solves a small portion of my goal.

I have several files: common.yml, schema1.yml and schema2.yml (the JSON Schema is saved as safe YAML without references). All of these files contain all of their objects under $defs. The main schema contains only the schema version, its $id and a minimal description. My goal is to create a validator object that validates against the definition of an object defined as above. Obviously, the loader is restricted to the folder where the schemas are located, and other locations are deliberately disabled to avoid security issues.

In my current setup I use another validator (as it's a Python project) where I am able to load the given files into a registry and then, during application startup, add referencing schemas to create an actual validator. Referencing schemas are anonymous schemas with references to an actual object defined in the registry, e.g. containing only a $ref pointing to schema1.yml:#/$defs/obj1 or urn:schema:2:id:#/$defs/obj2 (where urn:schema:2:id is the $id value from the schema2.yml file).

Could you please help me do a similar thing using this library?

@thomas-burke

This is a very useful crate. Nicely done.
I've worked around the limitation in a similar use case by iterating through the definitions and appending "allOf": [{"$ref":"#/definitions/<schema-needed-later>"}] to the end of the full schema, then compiling that full schema once for each definition I need later.

example:

  • Compile {<all-definitions-here>, "allOf": [{"$ref":"#/definitions/<schema-needed-later1>"}]}
  • Compile {<all-definitions-here>, "allOf": [{"$ref":"#/definitions/<schema-needed-later2>"}]}

This is inefficient from a compile time & memory perspective, so I'd like to +1 this request.

Options I can think of would be:

  1. Add a way to simulate the "allOf" at validate-time (e.g. a new signature validator.validate(&json, "#/definitions/Event") or similar).
    • Overall this seems to be the most efficient way because it is compile-once, validate-many
  2. Add a way to compile the targeted schema without the need to do JSON manipulation (e.g. a new signature validator = validation_options.build(&schema3, "#/definitions/Event") or similar)
    • This would save the need for transient memory allocation that I do prior to build, but the compile-time and final memory costs would be similar to my current work-around.

I tried to implement a facsimile of (1) by making a version of validate that also accepts a LazyLocation, but I wasn't familiar enough with the code to make it work.

I think (1) would also help with this issue: #452
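
Option (1) would have to locate a subschema from a string such as "#/definitions/Event". As a rough illustration of only that lookup step, the sketch below splits such a fragment into JSON Pointer tokens following RFC 6901. The function name `pointer_tokens` is hypothetical, percent-decoding of the fragment is left out, and nothing here reflects how jsonschema or its LazyLocation type work internally.

```rust
/// Split a fragment such as `#/definitions/Event` into unescaped
/// JSON Pointer tokens. Returns `None` for a malformed pointer.
fn pointer_tokens(fragment: &str) -> Option<Vec<String>> {
    // Accept both `#/a/b` (URI fragment form) and `/a/b` (plain pointer).
    let pointer = fragment.strip_prefix('#').unwrap_or(fragment);
    if pointer.is_empty() {
        // The empty pointer refers to the whole document.
        return Some(Vec::new());
    }
    // Any non-empty pointer must start with `/`.
    let rest = pointer.strip_prefix('/')?;
    Some(
        rest.split('/')
            // Decode `~1` before `~0` so that `~01` becomes `~1`, not `/`.
            .map(|token| token.replace("~1", "/").replace("~0", "~"))
            .collect(),
    )
}
```

A validate-time API along the lines of option (1) could walk the compiled schema with these tokens instead of compiling one wrapper schema per definition.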

Stranger6667 mentioned this issue Dec 22, 2024

@Stranger6667
Owner

Stranger6667 commented Dec 29, 2024

Hello folks!

Sorry for the delay, but I hope that as jsonschema now has 100% support for all the necessary JSON Schema features across all drafts, the issue has a solution now.

As suggested in the JSON Schema Slack channel by Greg Dennis, the easiest way would be to create a new schema that directly references the one you need. For example, this test passes as of version 0.28.0:

    use serde_json::json;

    #[test]
    fn test_subschema() {
        // Your root schema
        let root = json!({
            "$id": "https://example.com/root-schema",
            "definitions": {
                "NextRequest": {
                    "type": "object",
                    "properties": {
                        "seq": { "type": "integer" },
                        "type": { "type": "string" },
                        "command": { "type": "string" },
                        "arguments": { "$ref": "#/definitions/NextArguments" }
                    },
                    "required": ["seq", "type", "command"]
                },
                "NextArguments": {
                    "type": "object",
                    "properties": {
                        "threadId": { "type": "integer" }
                    },
                    "required": ["threadId"]
                }
            }
        });
        // Schema for `NextRequest`
        let schema = json!({
            "$id": "new-schema",
            "$ref": "https://example.com/root-schema#/definitions/NextRequest"
        });
        let validator = jsonschema::options()
            .with_resource(
                // Make sure you add your root schema
                "https://example.com/root-schema",
                jsonschema::Resource::from_contents(root).expect("Invalid specification"),
            )
            .build(&schema)
            .expect("Invalid schema");
        let valid = json!({
            "seq": 153,
            "type": "request",
            "command": "next",
            "arguments": {
                "threadId": 3
            }
        });
        let invalid = json!({
            "seq": 123,
            "type": "request",
            "command": "next",
            "arguments": {
                // This property is invalid
                "threadId": "bug"
            }
        });
        assert!(validator.is_valid(&valid));
        assert!(!validator.is_valid(&invalid));
    }

I think it should be enough to solve the original issue for @sztomi. In other words, the solution is already there but I think it is completely unclear how to approach this use case effectively. At least, I didn't have a clue for quite some time. Therefore I plan to do the following in the next couple of releases:

  • Properly document the use case with examples
  • Add test cases specifically for this
  • Maybe add a shortcut for creating such schemas. Like Validator::validator_at("json path") on validator instances to avoid the boilerplate of defining a new schema / repeating the same URIs
  • It would also be nice to have a better way to reuse registries across multiple validators, or at least a cheap clone.
  • Expose it in Python API (I've completely missed that)
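
Until a shortcut like that exists, the boilerplate of writing one referencing schema per definition can be reduced with a small helper. The sketch below only builds the JSON text of such a schema; `ref_schema_for` and `escape_token` are hypothetical names, not part of jsonschema. It assumes the definitions live under "definitions" as in the test above and that neither argument contains characters needing JSON string escaping or URI percent-encoding. It emits only the "$ref" member, so add an "$id" as in the test above if your setup needs one.

```rust
/// Escape one JSON Pointer token (RFC 6901): `~` becomes `~0`, `/` becomes `~1`.
fn escape_token(name: &str) -> String {
    // `~` must be escaped first, otherwise the `~` in `~1` would be escaped again.
    name.replace('~', "~0").replace('/', "~1")
}

/// Build the text of a schema that only points at `definitions/<name>`
/// inside the root schema identified by `root_id`.
fn ref_schema_for(root_id: &str, name: &str) -> String {
    format!(
        r#"{{"$ref": "{}#/definitions/{}"}}"#,
        root_id,
        escape_token(name)
    )
}
```

The resulting text can be parsed with serde_json and passed to build in place of the hand-written schema from the test above, with the root schema registered under the same URI.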

It also seems to me that #452 and #365 are achievable with exactly the same solution above. Tagging the authors for visibility. @jeromegn @VoltaireNoir

Please, let me know if I've missed anything or if the proposed solution is not enough.

@cschramm

cschramm commented Dec 30, 2024

Thank you for that explanation, @Stranger6667. That just blew my mind even though it is so simple.

Applying it to a real-life case, there is one issue left that is at least slightly related to this: there are $refs like schema1.yml#/$defs/obj1, as mentioned in #432 (comment). For some reason, the registry does not resolve them against the resource's base. Instead, it applies its default root URI json-schema:///, which is useless with the DefaultRetriever.

It's not a big deal, as I have to use a custom retriever for multiple reasons anyway (YAML, preprocessing that adds unevaluatedProperties / additionalProperties for stricter validation, etc.), but applying the default root URI does not seem logical, and it would be nice if the DefaultRetriever worked for more basic cases than mine.

EDIT: That is pretty much #640
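
To make the difference concrete, here is a deliberately simplified sketch of reference resolution, covering only a small subset of RFC 3986 section 5.2: absolute references, fragment-only references, and sibling-file references. It is not the crate's resolver. The function names and the file:///schemas/main.yml base used in the comments are assumptions for illustration.

```rust
/// True if `s` starts with a URI scheme such as `https:` or `urn:`.
fn has_scheme(s: &str) -> bool {
    match s.find(|c: char| matches!(c, ':' | '/' | '?' | '#')) {
        Some(idx) => idx > 0 && s.as_bytes()[idx] == b':',
        None => false,
    }
}

/// Resolve `reference` against `base`. Does not handle `..` segments,
/// queries, or references that start with `/`.
fn resolve(base: &str, reference: &str) -> String {
    // A reference with its own scheme is already absolute.
    if has_scheme(reference) {
        return reference.to_string();
    }
    // The base's own fragment never takes part in resolution.
    let base = base.split('#').next().unwrap_or(base);
    // Fragment-only (or empty) references stay within the base document.
    if reference.is_empty() || reference.starts_with('#') {
        return format!("{}{}", base, reference);
    }
    // Otherwise replace everything after the last `/` of the base.
    match base.rfind('/') {
        Some(idx) => format!("{}{}", &base[..=idx], reference),
        None => reference.to_string(),
    }
}

// With the resource's own base (hypothetical file:///schemas/main.yml),
// schema1.yml#/$defs/obj1 resolves to file:///schemas/schema1.yml#/$defs/obj1,
// which a file-based retriever can load. With the default root json-schema:///
// it resolves to json-schema:///schema1.yml#/$defs/obj1, which it cannot.
```

The same rule reproduces the URI in the original error message: resolving #/definitions/Request against json-schema:/// yields json-schema:///#/definitions/Request.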

@Stranger6667
Owner

Thanks for sharing @cschramm !

#640 is fixed in 0.28.1 (I hope that I got that fix right).
