Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature][WIP] YAML Schema validation for syntax compatibility check #348

Open
martinrieder opened this issue May 17, 2024 · 7 comments
Open

Comments

@martinrieder
Copy link

martinrieder commented May 17, 2024

With respect to the latest backwards-incompatible changes, input files may require some compatibility check. It is quite common that drawings are revised after many years. May I suggest adding an (optional) "syntax version" tag? How else could it be assured that future versions of WireViz support the syntax of an older one?

Note that following issue was raised by @nanangp for adding versioning checks on the CLI syntax as well:
nanangp/vscode-wireviz-preview#6

Going a bit further, some of the settings given as a command line argument, could also be listed in a header of the YAML file. This could either be a comment or incorporated in the WireViz syntax, like the technical drawing changes require adding some general info for the title block as well.

This suggestion might also be important for #223 YAML parser replacement.

@martinrieder
Copy link
Author

martinrieder commented May 21, 2024

I changed the title because I believe that schema validation provides just what we need here. It allows to check your YAML even before running WireViz.

Example from https://github.com/redhat-developer/yaml-language-server :

Associating a schema in the YAML file
It is possible to specify a yaml schema using a modeline. Schema url can be a relative path. If a relative path is specified, it is calculated from yaml file path, not from workspace root path

# yaml-language-server: $schema=<urlToTheSchema>

@martinrieder
Copy link
Author

martinrieder commented May 28, 2024

Found this file by @cooked as a starting point: wireviz-yaml-schema.json

TODO

  • Implement the WireViz syntax according to the current spec in syntax.md
  • Recognize cable and wire designator through pattern matching
  • Validate that designators in connections exist
  • Recognize special designators and auto-generation syntax in connections
  • Assure the correct order of cables and wires in the connections list

@martinrieder
Copy link
Author

martinrieder commented May 29, 2024

Additional to JSON Schema based validation that is widely supported, there is also 23andMe/Yamale which relies on its own schema defined in YAML itself. It can read data through either PyYAML or Ruamel, so choosing this would depend on the parser choice taken in #223.

I would like to mention that checks like in 9b2b22d will not be needed if schema validation was done programmatically.

Note that the validation should also take effect with the GH action Create Examples once implemented.

@martinrieder
Copy link
Author

martinrieder commented Jun 9, 2024

Just providing a quick update:

  • I have been fiddling with JSON Schema. It is a powerful tool, but things quickly get complex.
  • I am planning to evaluate Yamale and compare both solutions. It seems easier to define the schema in YAML.

Any comments on this are welcome. The main question that arises: Do we also need to provide live Schema validation for editing i.e. in VS Code?
See also nanangp/vscode-wireviz-preview#7 on this topic.

@martinrieder martinrieder changed the title [feature] YAML Schema validation for syntax compatibility check [feature][WIP] YAML Schema validation for syntax compatibility check Jun 10, 2024
@martinrieder
Copy link
Author

martinrieder commented Jun 10, 2024

I have attached the YAML Schema for Yamale. GH would not accept the file extension, so I had to rename it to .TXT.

schema.yaml.txt

schema.yaml

# WireViz YAML Schema for syntax validation using Yamale
connectors: map(include('CONNECTOR')) # dictionary of all used connectors
cables: map(include('CABLE')) # dictionary of all used cables and wires
connections: list(include('CONNECTION')) # list of all connections to be  made
additional_bom_items: list(include('additional_bom-item'), required=False) # custom items to add to BOM
metadata: include('metadata', required=False) # dictionary of meta-information describing the harness
options: include('option', required=False) # dictionary of common attributes for the whole harness
tweak: include('tweak', required=False) # optional tweaking of .gv output
---
DESIGNATOR: regex('^[a-zA-Z\200-\377_-][0-9a-zA-Z\200-\377_\\.]*$', name='designator', multiline=False)
---
CONNECTOR: # unique connector designator/name
  # pinout information
  # at least one of the following must be specified
  pincount: int(min=1, required=False) # if omitted, is set to length of specified list(s)
  pins: list(required=False) # if omitted, is autofilled with [1, 2, ..., pincount]
  pinlabels: list(required=False) # if omitted, is autofilled with blanks
  # pin color marks (optional)
  pincolors: list(required=False) # list of colors to be assigned to the respective pins;
  # general information about a connector (all optional)
  type: str(required=False)
  subtype: str(required=False)
  color: include('colorname', required=False) # see below
  image: include('image', required=False)
  notes: str(required=False)
  # product information (all optional)
  ignore_in_bom: bool(required=False) # if set to true the connector is not added to the BOM
  pn: subset(int(),str(),allow_empty=True) # [internal] part number
  manufacturer: str(required=False) # manufacturer name
  mpn: subset(int(),str(),allow_empty=True) # manufacturer part number
  supplier: str(required=False) # supplier name
  spn: subset(int(),str(),allow_empty=True) # supplier part number
  additional_components: list(include('additional_component'), required=False) # additional components

  # rendering information (all optional)
  bgcolor: include('colorname', required=False) # Background color of diagram connector box
  bgcolor_title: include('colorname', required=False) # Background color of title in diagram connector box
  style: str(equals='simple', required=False) # may be set to simple for single pin connectors
  show_name: bool(required=False) # defaults to true for regular connectors, false for simple connectors
  show_pincount: bool(required=False) # defaults to true for regular connectors false for simple connectors
  hide_disconnected_pins: bool(required=False) # defaults to false

  loops: list(required=False) # every list item is itself a list of exactly two pins on the connector that are to be shorted
---
CABLE: # unique cable designator/name
  # conductor information
  # the following combinations are permitted:
  # wirecount only          no color information is specified
  # colors only             wirecount is inferred from list length
  # wirecount + color_code  colors are auto-generated based on the specified
  #                         color code (see below) to match the wirecount
  # wirecount + colors      colors list is trimmed or repeated to match the wirecount
  wirecount: int(min=1, required=False)
  colors: list(required=False) # list of colors (see below)
  color_code: include('colorcode', required=False) # one of the supported cable color codes (see below)

  wirelabels: list(required=False) # optional; one label for each wire
  # general information about a connector (all optional)
  category: str(equals='bundle', required=False) # may be set to bundle;
  type: str(required=False)
  gauge: subset(int(), num(), str(min=1), allow_empty=True) # allowed formats:
  show_equiv: bool(required=False) # defaults to false; can auto-convert between mm2 and AWG and display the result when set to true
  length: subset(num(), str(min=1), allow_empty=True) # num() is assumed to be in meters unless <unit> is specified
  shield: subset(bool(), include('colorname'), allow_empty=True) # defaults to false
  color: include('colorname', required=False) # see below
  image: include('image', required=False)
  notes: str(required=False)

  # product information (all optional)
  ignore_in_bom: bool(required=False) # if set to true the cable or wires are not added to the BOM
  pn: subset(int(),str(),allow_empty=True) # [internal] part number
  manufacturer: subset(str(required=False),allow_empty=True) # manufacturer name
  mpn: subset(int(),str(),allow_empty=True) # manufacturer part number
  supplier: subset(str(),allow_empty=True) # supplier name
  spn: subset(int(),str(),allow_empty=True) # supplier part number
  additional_components: list(include('additional_component'), required=False) # additional components

  # rendering information (all optional)
  bgcolor: include('colorname', required=False) # Background color of diagram cable box
  bgcolor_title: include('colorname', required=False) # Background color of title in diagram cable box
  show_name: bool(required=False) # defaults to true
  show_wirecount: bool(required=False) # defaults to true
  show_wirenumbers: bool(required=False) # defaults to true for cables; false for bundles
---
CONNECTION: list(any(include('DESIGNATOR'),list(include('DESIGNATOR')),map(),enum('--','<--','<-->','-->','==','<==','<==>','==>')),required=False) # list of all connections to be made
---
additional_bom-item: # custom items to add to BOM
  description: str()
  # all the following are optional:
  qty: num(required=False) # qty to add to the bom (defaults to 1)
  unit: str(required=False)
  designators: list(include('DESIGNATOR'), required=False)
  pn: subset(int(),str(),allow_empty=True) # [internal] part number
  manufacturer: str(required=False) # manufacturer name
  mpn: subset(int(),str(),allow_empty=True) # manufacturer part number
  supplier: str(required=False) # supplier name
  spn: subset(int(),str(),allow_empty=True) # supplier part number
---
additional_component:
  type: str() # type of additional component
  # all the following are optional:
  subtype: str(required=False) # additional description (only shown in bom)
  qty: num(required=False) # qty to add to the bom (defaults to 1)
  qty_multiplier: str(required=False) # multiplies qty by a feature of the parent component
  unit: str(required=False)
  pn: subset(int(),str(),allow_empty=True) # [internal] part number
  manufacturer: str(required=False) # manufacturer name
  mpn: subset(int(),str(),allow_empty=True) # manufacturer part number
  supplier: str(required=False) # supplier name
  spn: subset(int(),str(),allow_empty=True) # supplier part number
  bgcolor: include('colorname', required=False) # Background color of entry in diagram component box
---
metadata: map(key=str(), required=False) # dictionary of meta-information describing the harness
---
option: # dictionary of common attributes for the whole harness
  bgcolor: include('colorname', required=False) # Default = 'WH'
  bgcolor_node: include('colorname', required=False) # Default = 'WH'
  bgcolor_connector: include('colorname', required=False) # Default = bgcolor_node
  bgcolor_cable: include('colorname', required=False) # Default = bgcolor_node
  bgcolor_bundle: include('colorname', required=False) # Default = bgcolor_cable
  color_mode: enum('full','FULL','hex','HEX','short','SHORT','ger','GER', required=False)
  fontname: str(required=False) # Default = 'arial'
  mini_bom_mode: bool(required=False) # Default = True
---
tweak: # optional tweaking of .gv output
  override: subset(map(required=False),allow_empty=True) # dict of .gv entries to override
  append: any(str(),list(),required=False) # string or list of strings to append to the .gv output
---
colorname: any(enum('BK','WH','GY','PK','RD','OG','YE','OL','GN','TQ','LB','BU','VT','BN','BG','IV','SL','CU','SN','SR','GD'),regex('#[0-9A-Fa-f]{6}',name='html_color'))
colorcode: enum('DIN', 'IEC', 'TEL', 'TELALT', 'T568A', 'T568B', 'BW')
---
image:
  src: str() # path to the image file
  # optional parameters:
  caption: str(required=False) # text to display below the image
  bgcolor: include('colorname', required=False) # Background color of entry in diagram component box
  width: int(required=False) # range: 1~65535; unit: points
  height: int(required=False) # range: 1~65535; unit: points

Running Yamale with schema.yaml in the WireViz folder will validate all YAML files from there. It even found that the GitHub Actions would not match and that there is an issue with ex14.yml...

C:\Git\martinrieder\WireViz>yamale --no-strict -n 1 -s schema.yaml
Finding yaml files...
Found 26 yaml files.
Validating...
Validation failed!
Error validating data 'C:\Git\martinrieder\WireViz\.github\workflows\main.yml' with schema 'schema.yaml'
        connectors: Required field missing
        cables: Required field missing
        connections: Required field missing
----
Error validating data 'C:\Git\martinrieder\WireViz\examples\ex14.yml' with schema 'schema.yaml'
        connections.0.7: '<=>' is not a designator.
        connections.0.7: '<=>' is not a list.
        connections.0.7: '<=>' is not a map.
        connections.0.7: '<=>' not in ('--', '<--', '<-->', '-->', '==', '<==', '<==>', '==>')

@kvid
Copy link
Collaborator

kvid commented Jun 10, 2024

@martinrieder wrote:

[...]

C:\Git\martinrieder\WireViz>yamale --no-strict -n 1 -s schema.yaml
Finding yaml files...
Found 26 yaml files.
Validating...
Validation failed!
Error validating data 'C:\Git\martinrieder\WireViz\.github\workflows\main.yml' with schema 'schema.yaml'

.github\workflows\main.yml is not a WireViz input file, and uses another syntax.

    connectors: Required field missing
    cables: Required field missing
    connections: Required field missing

None of these are required to be valid WireViz input. All documented top-level keys are optional, and any other top-level keys are just ignored. A random non-documented top-level key is often used to make templates to inherit later in the YAML input, e.g. in demo02.yml, ex05.yml, and ex06.yml. The syntax documentation is currently not very clear about this.

WireViz will happily process YAML input including only a single key, and it doesn't have to be one of the documented ones. Any non-dict as top-level will fail because yaml_data is assumed to be a dict.

Do you want to validate what WireViz really accept, or a stricter, more limited syntax interpreted from a syntax description not written precisely enough for validation?

With simple use cases it makes sense to drop top-level sections without contents. With v0.4, connections is required to obtain a non-empty diagram, but e.g. #267 is a use case where only additional_bom_items is needed.


Error validating data 'C:\Git\martinrieder\WireViz\examples\ex14.yml' with schema 'schema.yaml'
connections.0.7: '<=>' is not a designator.
connections.0.7: '<=>' is not a list.
connections.0.7: '<=>' is not a map.
connections.0.7: '<=>' not in ('--', '<--', '<-->', '-->', '==', '<==', '<==>', '==>')

Your validation rules are stricter than what WireViz accepts. The syntax documentation doesn't list all legal variations. See https://github.com/wireviz/WireViz/blob/v0.4/src/wireviz/wv_helper.py#L128-L140

@martinrieder
Copy link
Author

martinrieder commented Jun 10, 2024

@kvid wrote:

Do you want to validate what WireViz really accept, or a stricter, more limited syntax interpreted from a syntax description not written precisely enough for validation?

Neither of both. This is just an investigation about finding a tool that would suit the following purpose:

  • Validation at runtime: simple compatibility check for old YAML files
  • Validation while editing: syntax highlighting and code completion

The latter is closely related to nanangp/vscode-wireviz-preview#7

.github\workflows\main.yml is not a WireViz input file, and uses another syntax.
[...]
Your validation rules are stricter than what WireViz accepts. The syntax documentation doesn't list all legal variations.

There is no sense in proving that it passes the tests. I need to put up some strict rules to show that it will detect some issues, even by feeding it the wrong files.

Thanks for the hint to the code. Copying the syntax description into a schema format that Yamale expects was quite easy. You proved that this manual process is not ideal though. Yet, the resulting file is still human-readable, kind of self-explaining...

The point that @liambeguin is making in #220 (comment) is absolutely valid. If the schema could be generated from code, that would make it much easier to maintain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants