[RFC] Asset Management and Pipeline #11

kabergstrom · 2018-08-13T23:43:01Z

I believe we can all agree that good tooling is essential for making users feel productive. Amethyst rests on a solid foundation of core tech but to really make a data-driven engine shine, solid editing and introspection tools are essential. I'd like to take a step closer to the Amethyst tooling vision and address the issue of assets, a common factor in all game editing tools.

If this seems like a good direction I'll be working on an RFC that will discuss how these tools may interact with assets once there is consensus on the problems to solve. This issue will initially contain some of my thoughts around problems and features I'd like to see in Amethyst with a suggested technical design coming in the RFC. Looking forward to your thoughts!

Background & Problem Statements

Asset lifecycle

Production-ready game engines generally have multi-stage asset pipelines. This means that an asset goes through multiple steps of processing and conversion before being loadable in the engine runtime. Usually there are three stages for an asset.

Input -> Edit -> Runtime

The input format is usually some form of common data interchange format like fbx, png, tga. The edit format is engine-specific and generally abstracts the input format as well as provides the possibility to add metadata to the asset. The runtime format is optimized for quick loading and can be adapted per platform or based on other build settings. There are multiple benefits to this separation.

By separating the specifics of an input format from the data it provides the engine becomes more extensible. PNG, TGA, JPG provide textures which are generally collections of two-dimensional color arrays. FBX, OBJ, GLTF provide 3D scene data. Support for more formats that provide similar data can be added more easily.
How assets are loaded at runtime can be configured during edit time which simplifies loading APIs significantly. Decisions like which compression format to use for a texture or whether mipmaps should be generated can be made with tools instead of cluttering game code.
Asset preparation passes such as mesh simplification or texture compression can be configured and performed at build time instead of during runtime.
Assets can be built with different configurations for different purposes. Textures can be compressed differently for phones or consoles to ensure a smaller artifact and shaders can be precompiled for specific platforms to save on startup time.
Custom processing steps can be implemented by users. This can be useful to automatically configure something per platform or fix up some quirk in a third-party exporting tool.

Build scalability through "pure functional asset pipelines"

It's nice when you don't have to wait for your computer. Even if you have 80GB of source data like some people. Frostbite may have spent a ton of time to make their build pipelines fast and Amethyst doesn't really need to do that yet, but the key take-away and the enabling feature of their fully parallel and cacheable build pipeline is a deterministic mapping from source data to build artifact. This is what enables a bunch of caching tricks and studio-wide networked caching systems that can, combined with a few 40G switches, make your build times quite acceptable.

To clearly state the requirement, this means being able to deterministically hash a source asset and all variables that become an input to the build process and also have the build artifact be deterministic. This usually means hashing the asset's build config, target platform, compiler version, build code version, importer version, asset dependency hashes. Once you have calculated the hash, you can request the artifact off the network or a local cache.
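
To make the requirement concrete, here is a minimal sketch of deriving a cache key from the build inputs. The struct and field names are hypothetical, and FNV-1a is only a small deterministic stand-in; a real pipeline would more likely use a cryptographic hash.

```rust
/// 64-bit FNV-1a step. Chosen only because it is tiny and deterministic;
/// it is a stand-in for whatever hash the pipeline would really use.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    let mut hash = state;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Everything that can influence the build output (hypothetical names).
pub struct BuildInputs<'a> {
    pub source_bytes: &'a [u8],
    pub build_config: &'a str,
    pub target_platform: &'a str,
    pub importer_version: u32,
    pub builder_version: u32,
    /// Hashes of the assets this asset depends on, in a fixed order.
    pub dependency_hashes: &'a [u64],
}

/// Derive the cache key. Variable-length fields are length-prefixed so
/// that ("ab", "c") and ("a", "bc") do not hash to the same byte stream.
pub fn artifact_key(inputs: &BuildInputs) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    let fields: [&[u8]; 3] = [
        inputs.source_bytes,
        inputs.build_config.as_bytes(),
        inputs.target_platform.as_bytes(),
    ];
    for field in fields.iter() {
        h = fnv1a(h, &(field.len() as u64).to_le_bytes());
        h = fnv1a(h, field);
    }
    h = fnv1a(h, &inputs.importer_version.to_le_bytes());
    h = fnv1a(h, &inputs.builder_version.to_le_bytes());
    h = fnv1a(h, &(inputs.dependency_hashes.len() as u64).to_le_bytes());
    for dep in inputs.dependency_hashes {
        h = fnv1a(h, &dep.to_le_bytes());
    }
    h
}
```

The same inputs always give the same key, so the key can be used to look up an artifact in a local or networked cache before building anything.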

NVMe m.2 drives are becoming cheaper and cheaper with multiple GBps in sequential read & write speeds. I'd be really glad if Amethyst could scale to the limitations of the hardware in its asset pipeline.

Concurrent Modifications

While I enjoy the Unix philosophy and admire the vision for Amethyst tools, there is a large difference between Unix command-line tools and game development tools. Game development tools are usually interactive and persistent in their display of information while Unix tools run once over a set of data, output a result and terminate. This difference results in one of the greatest challenges of computer science: cache invalidation!

Let's take a particle system editor as an example. Perhaps it edits an entity (prefab) asset. These assets are files on disk and presumably are not parsed and loaded each frame, thus there is a cached in-memory representation of the disk contents. If another tool, say a general component inspector of some kind, was to edit the same file concurrently there is a chance of inconsistent or lost data unless the tools exercised some form of cache coherence protocol.

Hot reload

Quick iteration times are key to staying competitive in the current game development market and hot reloading of as many asset types as possible is a large leap in the right direction. A running game should be able to pick up on asset changes from tooling over the network to enable hot reloading on non-development machines.

Search and query

Presumably many tools will want to search for specific files or attributes in files. This is useful when finding which assets reference a specific asset for example. Being able to find what you are looking for is amazing and if this can be provided as a common service to all tooling that'd presumably save a lot of time for both tool developers to avoid duplicated code and for users to find what they need. Attribute indexing would presumably require asset reflection of some sort.

Asset identifiers and Renaming or Moving

Users want to be able to rename or move assets without compromising their data and therefore references between assets cannot be based on paths, and preferably loading assets is not path-based either. Bitsquid's blog discusses this issue in detail.

The productivity gained from being able to describe your entire game as a graph where each edge is an asset reference and each node is an asset is incredible in many cases. It enables a better understanding of resources usage through visualization and to automatically optimize asset runtime layouts based on dependencies.

Persistent asset IDs can also enable "serialization of handles" where they are represented on disk as asset IDs but the in-memory representation is a handle that is materialized as the asset is loaded.
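
As a rough illustration of "serialization of handles", the sketch below keeps a table that turns on-disk IDs into in-memory handles and back. All type and method names are made up for this example and are not existing Amethyst APIs.

```rust
use std::collections::HashMap;

/// 16-byte persistent identifier, as it would be stored on disk.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct AssetId(pub [u8; 16]);

/// Cheap in-memory handle: an index into the table below.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(pub usize);

/// Maps persistent IDs to handles as assets are referenced.
#[derive(Default)]
pub struct HandleTable {
    by_id: HashMap<AssetId, Handle>,
    ids: Vec<AssetId>,
}

impl HandleTable {
    /// Deserialization side: turn an on-disk ID into a handle,
    /// allocating a new slot the first time the ID is seen.
    pub fn materialize(&mut self, id: AssetId) -> Handle {
        if let Some(&handle) = self.by_id.get(&id) {
            return handle;
        }
        let handle = Handle(self.ids.len());
        self.ids.push(id);
        self.by_id.insert(id, handle);
        handle
    }

    /// Serialization side: turn a handle back into its persistent ID.
    pub fn persistent_id(&self, handle: Handle) -> Option<AssetId> {
        self.ids.get(handle.0).copied()
    }
}
```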

Asset Versioning or Version Upgrade

I'd argue that the #1 reason Linux has seen such success is the dedication Linus Torvalds has for maintaining compatibility between versions. When updating from one version of the Linux kernel to another, you never need to update any other applications and this is due to the strict policy of "no user space regressions".

It'd be nice if there was a way for Amethyst to ensure that assets created in older versions are still compatible when updating, or at least that there is an upgrade path. Otherwise Amethyst may end up with people staying on older versions and splitting the community at each major update. I'm not saying that this promise of not breaking people's projects needs to exist right now, but there should be a technical plan for how this can be handled in the future to ensure both a smooth upgrade process for users and preferably a low maintenance cost for the Amethyst developers.

An important note is that it's easier to automatically upgrade people's data than their code. As a data-driven engine that's probably something to embrace.

AnneKitsune · 2018-08-14T11:52:40Z

Extra constraints:

Load from the network at runtime.
Runtime asset overrides and additions. See this file just after the imports https://github.com/jojolepro/amethyst-extra/blob/master/src/lib.rs
How to avoid getting into the same mess that unity is in with loading levels and assets bundles at runtime.
Keep in mind that there is no reflection in rust. I wanted to make a prefab creator tool, but we don't have a way to easily list all component types for example.

kabergstrom · 2018-08-14T20:03:09Z

Load from the network at runtime.

Runtime asset overrides and additions. See this file just after the imports
https://github.com/jojolepro/amethyst-extra/blob/master/src/lib.rs

I agree that these should be possible.

How to avoid getting into the same mess that unity is in with loading levels and assets bundles at runtime.

Could you elaborate on this?

Keep in mind that there is no reflection in rust. I wanted to make a prefab creator tool, but we don't have a way to easily list all component types for example.

Yeah I've been wondering about that as well. To be able to create tooling that interacts with ECS data I believe there needs to be support for introspection of fields. But a number of questions arise regarding how to handle custom data structures in assets.

Should I have to recompile all tools when changing ECS fields?
Should I need to restart a tool I'm using?
Is there a difference with scripting?

From my POV neither restart nor recompilation of a tool should be required to observe updates. To give an ideal example, say I am editing a prefab with a custom component in a tool. I add a field to the custom component. Maybe some sort of build process runs. The tool then shows the new field and I can start inputting values.

AnneKitsune · 2018-08-14T20:13:42Z

I agree ^

Could you elaborate on this?

Keep in mind that there is no reflection in rust. I wanted to make a prefab creator tool, but we don't have a way to easily list all component types for example.

Unity builds their assets, like your proposed solution. The issue we often get is that if you want to dynamically load scenes, you need to build at least 3 of them (Windows, Mac, Linux).

The other issue with Unity is that loading an asset at runtime is very challenging (often impossible).
To do so, you need to build a mod containing your assets as an AssetBundle. In the built game, import the asset bundle, resolve the dependencies, and finally add or replace the modded assets.

I want to keep the possibility of loading assets at runtime easily, and also be able to compile the ones that are going to be static.

kabergstrom · 2018-08-14T20:37:59Z

OK gotcha. I've built an extensive system for over-the-network hot reloading of arbitrary assets in Unity based on assetbundles so I think I know what you mean. The technical design will definitely try to address this.

I want to keep the possibility of loading assets at runtime easily, and also be able to compile the ones that are going to be static.

Just to clarify, when you say "loading assets at runtime easily" do you mean from a user API perspective or like, being able to load any asset source format at runtime (i.e. including importers in a build)? The distinction between a developer workflow and players running a finished build is important IMO.

AnneKitsune · 2018-08-14T21:13:24Z

-> being able to load any asset source format at runtime (i.e. including importers in a build)

Keep in mind that people will want to create entities at runtime from scripts and to import their assets from there.

From the game developer perspective, I don't think it matters too much if you have to compile your assets, provided you have hot reloading working and it isn't tedious to load non-compiled assets in case someone has a particular use case.

kabergstrom · 2018-08-17T22:48:58Z

Thanks for the comments. Now onto a more concrete solution for feedback.

Terminology

An asset is an engine-loadable object such as Mesh, Texture, Prefab, Animation etc.
An asset file is a file on disk that can contain multiple assets.

Project asset directory

To be able to provide search and listing functionality it's essential that there is some convention for organizing asset files. I propose an assets directory where all asset files live. Asset management can list and watch this directory recursively to become aware of changes.

Asset identification and rename/move handling

How assets are identified is key to enabling good tooling and to keep the engine code simple. I propose using a UUID for identifying an asset, which is a 16-byte identifier that can be generated independently on different machines with a negligible chance of collision. This will be the AssetID.

It's important to separate the concept of asset and file. An asset may not come from a file, and a file may not contain assets. Asset files are a source of assets.

Asset files need to be enriched with metadata to be able to provide a stable UUID for each asset within the file. The metadata for an asset file will be persisted as a file on disk in the same directory as the asset file with .meta appended to the asset filename, i.e. "ASSET_FILENAME.meta". The exact data persisted is format-specific and will have to be handled by the file format importer. The implementation will need to guarantee that the identifier is stable across the following events:

Assets are added to a file. New identifiers are generated for new assets, identifiers for existing assets are kept.
Assets are removed from a file.
Contents of assets in the file are changed, but the assets remain. No identifiers change.
The user moves the file and its .meta file to a new location. No identifiers change.
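
The stability rules above can be sketched as a reconciliation step that an importer runs on each import. This assumes assets inside a file can be keyed by a name, which is an assumption of the sketch (the actual key would be format-specific), and it uses a u128 as a stand-in for the UUID.

```rust
use std::collections::BTreeMap;

/// Stand-in for a 16-byte UUID.
pub type AssetId = u128;

/// Hypothetical contents of a `.meta` file: asset name within the file -> ID.
pub type MetaIds = BTreeMap<String, AssetId>;

/// Compute the IDs for the assets currently found in a file.
/// Existing assets keep their ID, new assets get a fresh one,
/// and assets that are no longer in the file are dropped.
pub fn reconcile_ids(
    previous: &MetaIds,
    names_in_file: &[&str],
    mut new_id: impl FnMut() -> AssetId,
) -> MetaIds {
    let mut next = MetaIds::new();
    for &name in names_in_file {
        let id = match previous.get(name) {
            Some(&existing) => existing,
            None => new_id(),
        };
        next.insert(name.to_string(), id);
    }
    next
}
```

Because the IDs live in the .meta file and not in the path, moving the file together with its .meta file changes nothing here.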

Bonus points if the implementation can restore identifiers in the case where a file temporarily disappears from the assets directory and is later restored. Presumably this can be implemented by having a long-running service maintaining a mapping between asset file hash and its metadata.

The only way to load an asset in the engine runtime is with an AssetID.

Asset stages

Assets will go through multiple stages before being loaded into the Amethyst runtime engine.
Brackets mean a persisted data format. Parentheses mean an optional stage.
[Asset file] -> Importer -> [Tool asset format] -> Build -> [Engine asset format] -> (Packaging ->) Engine load

Import

Importing an asset file does a few things.

Generates or updates asset identifiers.
Generates metadata for indexing/searching.
Converts the asset into a tool-friendly asset format that is independent of the asset file's format. See the Tool asset format section for more info.

Importing should support a pre-processing step and a post-processing step.
The pre-processing step can configure the importer based on the environment or file structure. This is useful if a format has multiple possible mappings to the internal asset format for example, or if some transformation like coordinate system adaptation is required.
The post-processing step can fix up or change the asset in the tool asset format. This can mean setting build settings based on the directory of the asset - perhaps you have a "sprites" directory where you want to automatically configure a build parameter for each sprite.
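
One possible shape for the import step with its two hooks is sketched below. Every name here (ToolAsset, ImportOptions, run_import, the "filter" setting) is hypothetical and only meant to show where pre- and post-processors would plug in.

```rust
/// Stand-in for the tool asset format: imported data plus build settings.
#[derive(Debug, Default, PartialEq)]
pub struct ToolAsset {
    pub data: Vec<u8>,
    pub build_settings: Vec<(String, String)>,
}

/// Options a pre-processor may adjust before the importer runs.
#[derive(Debug, Default)]
pub struct ImportOptions {
    pub flip_y: bool,
}

pub trait Importer {
    fn import(&self, source: &[u8], options: &ImportOptions) -> ToolAsset;
}

/// Hooks receive the asset file path so they can act on directory layout.
pub type PreProcessor = fn(&str, &mut ImportOptions);
pub type PostProcessor = fn(&str, &mut ToolAsset);

/// Run pre-processors, then the importer, then post-processors.
pub fn run_import(
    importer: &dyn Importer,
    path: &str,
    source: &[u8],
    pre: &[PreProcessor],
    post: &[PostProcessor],
) -> ToolAsset {
    let mut options = ImportOptions::default();
    for hook in pre {
        hook(path, &mut options);
    }
    let mut asset = importer.import(source, &options);
    for hook in post {
        hook(path, &mut asset);
    }
    asset
}

/// Trivial importer used for illustration: copies the bytes unchanged.
pub struct RawBytesImporter;

impl Importer for RawBytesImporter {
    fn import(&self, source: &[u8], _options: &ImportOptions) -> ToolAsset {
        ToolAsset { data: source.to_vec(), build_settings: Vec::new() }
    }
}

/// Example post-processor: everything under a "sprites" directory
/// gets a build parameter set automatically.
pub fn sprite_defaults(path: &str, asset: &mut ToolAsset) {
    if path.starts_with("assets/sprites/") {
        asset
            .build_settings
            .push(("filter".to_string(), "nearest".to_string()));
    }
}
```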

Tool asset format

There are many reasons for an edit-time asset format.

Enables tools to include a large amount of data in the format to improve the editing experience without compromising runtime loading performance or memory usage.
A common format for asset types abstracts the source file formats which provides a stable API for asset pre/post-processors, tools and asset builders.
Enables platform-specific build formats for the engine.
No need to import asset files again when building for a target platform, so should result in faster builds.

The format should use a serialization system that supports data evolution to allow for backward and forward compatibility with tooling when the format gets updated.

There are two categories of assets that differ in how they are treated by tools.

Externally created assets. Meshes, textures etc. These asset files are not modified by tools, but tools may add metadata to the asset file's .meta companion.
Amethyst assets. Prefabs primarily. These asset files are created and edited entirely by Amethyst tools so both the asset file's contents and the .meta file may be modified.

From this it's clear that there must be a system for applying these changes back to asset files and their metadata files. For both types of assets, it will be essential that the tool asset format has a clear division between metadata and data imported from asset files. For Amethyst assets the content of the asset file can be the serialized version of the data section.
For both metadata and asset files, the serialized format should be human-readable and, where possible, merge-friendly for version control tools like git.

Tools may edit their in-memory representation of an asset file, but those changes will not be made visible to other tools before the asset goes through the import process again.

Asset builder & engine asset format

Asset builders take the tool asset format and produce a build artifact for consumption by the engine runtime that is optimized for loading speed. The build artifact produced by the asset builder is identified with a hash of the AssetID, the asset contents and the build parameters (including target platform). The build artifact must be deterministic.

Asset builders should provide a pre-processing step to enable platform-specific asset modifications where required.

Build artifacts have no requirement for backward and forward compatibility as speed should be prioritized in the runtime. This means the serialization format of the asset data can be arbitrary, even just a binary dump of the in-memory representation if it makes sense since a build of an asset is specific to a target platform. The built asset formats should include a common header so that the runtime can know which asset type to load it as for the case where the engine only knows the AssetID.
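
To illustrate the common header idea, here is a minimal sketch of a fixed-size header in front of an opaque payload. The magic bytes, the field layout and the use of little-endian integers are all assumptions made for this example.

```rust
/// Hypothetical magic bytes marking a built Amethyst artifact.
pub const MAGIC: [u8; 4] = *b"AMAF";
/// 4 bytes magic + 4 bytes asset type + 8 bytes payload length.
pub const HEADER_LEN: usize = 16;

#[derive(Debug, PartialEq)]
pub struct ArtifactHeader {
    /// Numeric tag telling the runtime which asset type to load this as.
    pub asset_type: u32,
    pub payload_len: u64,
}

/// Prefix an arbitrary, platform-specific payload with the common header.
pub fn write_artifact(asset_type: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&asset_type.to_le_bytes());
    out.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Parse the header and return it with the untouched payload bytes.
/// Returns None if the magic or the recorded length does not match.
pub fn read_header(bytes: &[u8]) -> Option<(ArtifactHeader, &[u8])> {
    if bytes.len() < HEADER_LEN || bytes[..4] != MAGIC[..] {
        return None;
    }
    let mut type_bytes = [0u8; 4];
    type_bytes.copy_from_slice(&bytes[4..8]);
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&bytes[8..16]);
    let header = ArtifactHeader {
        asset_type: u32::from_le_bytes(type_bytes),
        payload_len: u64::from_le_bytes(len_bytes),
    };
    let payload = &bytes[HEADER_LEN..];
    if payload.len() as u64 != header.payload_len {
        return None;
    }
    Some((header, payload))
}
```

The payload itself stays arbitrary, so each asset type is still free to use a raw memory dump or any other layout behind the header.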

Asset references & loading dependencies

The unified AssetID makes it possible to load any asset without knowing its path or type. This also makes it possible to create a dependency graph from only the metadata without having to parse or load any assets. Loading a subgraph of assets becomes easier since we can load dependencies before the assets that depend on them. Very powerful static analysis also becomes possible, such as estimating how much memory a certain scene or object and its dependencies will consume once loaded.

Assets can contain weak and strong AssetID references. Strong references are effectively handles in the runtime and are materialized when loading the asset by loading required asset dependencies. Weak references are just the AssetID bytes and can be used to trigger a load within the runtime by game code, but will not automatically trigger a load.
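
A small sketch of how a loader could derive a load order from metadata alone: strong references are followed depth-first so dependencies come before their dependents, and weak references are ignored. The types are illustrative, with a u32 standing in for the AssetID.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for the 16-byte AssetID.
pub type AssetId = u32;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AssetRef {
    /// Loaded automatically together with the referencing asset.
    Strong(AssetId),
    /// Just the ID; game code decides if and when to load it.
    Weak(AssetId),
}

/// References per asset, as they could be read from metadata.
pub type Metadata = HashMap<AssetId, Vec<AssetRef>>;

/// Returns the assets to load for `root`, dependencies first.
pub fn load_order(root: AssetId, metadata: &Metadata) -> Vec<AssetId> {
    let mut order = Vec::new();
    let mut visited = HashSet::new();
    visit(root, metadata, &mut visited, &mut order);
    order
}

fn visit(
    id: AssetId,
    metadata: &Metadata,
    visited: &mut HashSet<AssetId>,
    order: &mut Vec<AssetId>,
) {
    // Each asset is visited once, which also stops reference cycles
    // from recursing forever.
    if !visited.insert(id) {
        return;
    }
    if let Some(refs) = metadata.get(&id) {
        for reference in refs {
            if let AssetRef::Strong(dep) = *reference {
                visit(dep, metadata, visited, order);
            }
        }
    }
    order.push(id);
}
```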

A key point is that the AssetID only identifies the asset in the source, but when loading an AssetID in the runtime we need to resolve it to a built artifact for the target platform. This process will be specific to the engine runtime environment and will differ between editor and packaged game. The implementation should be pluggable as this is the primary way of supporting asset overrides and similar features.
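
The pluggable resolution step might look something like the following. The trait and both resolvers are hypothetical; the point is only that an override layer can wrap whatever resolver the environment provides.

```rust
use std::collections::HashMap;

/// Stand-in for the 16-byte AssetID.
pub type AssetId = u32;

/// Resolves an AssetID to the location of a built artifact.
pub trait ArtifactResolver {
    fn resolve(&self, id: AssetId, platform: &str) -> Option<String>;
}

/// Resolver for a packaged game: a fixed table shipped with the build.
pub struct PackagedResolver {
    pub table: HashMap<(AssetId, String), String>,
}

impl ArtifactResolver for PackagedResolver {
    fn resolve(&self, id: AssetId, platform: &str) -> Option<String> {
        self.table.get(&(id, platform.to_string())).cloned()
    }
}

/// Wraps another resolver and lets overrides (for example from a mod)
/// take priority over it.
pub struct OverrideResolver<R> {
    pub overrides: HashMap<AssetId, String>,
    pub fallback: R,
}

impl<R: ArtifactResolver> ArtifactResolver for OverrideResolver<R> {
    fn resolve(&self, id: AssetId, platform: &str) -> Option<String> {
        self.overrides
            .get(&id)
            .cloned()
            .or_else(|| self.fallback.resolve(id, platform))
    }
}
```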

Asset management

An asset management component will watch the asset directory for external changes to files. It will expose an API to other tools with support for the following operations:

List all assets
Get metadata for an asset
Get tool asset format data for an asset
Listen for changes to assets
Update an asset's metadata or content

The system will need to persist information about the latest version observed for an asset and maintain an index for efficiently fetching metadata for assets.

When an import is required, the asset management system will invoke the appropriate importer, update its internal storage and propagate change events through the API.

Search

The Search component will call Asset Management to listen for changes to assets and maintain indices for efficiently searching through metadata. Suggestions for API specifics are welcome. Presumably some query language will be appropriate.

Build artifact management

Build artifact management is specific to the game runtime's environment, but I will here describe the likely implementation for development environments.

The build artifact management component provides an API for resolving an AssetID to the latest build artifact provided a set of build parameter constraints. If the build artifact is not cached, a build will be triggered to fulfill the request. It will listen for asset changes from the asset management API to internally delete cached build artifacts that are no longer up-to-date.
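
A minimal sketch of that behaviour, assuming a plain function as the builder and an in-memory map as the cache. The names and the build counter are illustrative only.

```rust
use std::collections::HashMap;

/// Stand-in for the 16-byte AssetID.
pub type AssetId = u32;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct BuildParams {
    pub platform: String,
}

pub struct ArtifactCache {
    build: fn(AssetId, &BuildParams) -> Vec<u8>,
    cached: HashMap<(AssetId, BuildParams), Vec<u8>>,
    /// Number of builds triggered so far, kept only for illustration.
    pub build_count: usize,
}

impl ArtifactCache {
    pub fn new(build: fn(AssetId, &BuildParams) -> Vec<u8>) -> Self {
        ArtifactCache { build, cached: HashMap::new(), build_count: 0 }
    }

    /// Return the cached artifact, building it first on a cache miss.
    pub fn resolve(&mut self, id: AssetId, params: &BuildParams) -> &[u8] {
        let key = (id, params.clone());
        if !self.cached.contains_key(&key) {
            let artifact = (self.build)(id, params);
            self.build_count += 1;
            self.cached.insert(key.clone(), artifact);
        }
        &self.cached[&key]
    }

    /// Called when asset management reports a change: drop every
    /// cached artifact for that asset, for all build parameters.
    pub fn on_asset_changed(&mut self, id: AssetId) {
        self.cached.retain(|key, _| key.0 != id);
    }
}

/// Fake builder used for illustration.
pub fn fake_build(id: AssetId, params: &BuildParams) -> Vec<u8> {
    format!("{}@{}", id, params.platform).into_bytes()
}
```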

Hot reloading

Hot reloading will involve listening to asset changes from the asset management API and requesting the latest build artifact from the Build Artifact system. As long as all inter-asset references within the engine are done with handles, hot reloading of any asset seems possible.
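
The reason handles make this possible can be shown with a tiny storage sketch: game code keeps the handle, and a reload swaps the data behind it. This is an illustration, not the existing Amethyst storage API.

```rust
/// Stable reference to a slot; game code holds these instead of asset data.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(usize);

pub struct AssetStorage<T> {
    slots: Vec<T>,
}

impl<T> AssetStorage<T> {
    pub fn new() -> Self {
        AssetStorage { slots: Vec::new() }
    }

    pub fn insert(&mut self, asset: T) -> Handle {
        self.slots.push(asset);
        Handle(self.slots.len() - 1)
    }

    pub fn get(&self, handle: Handle) -> Option<&T> {
        self.slots.get(handle.0)
    }

    /// Hot reload: replace the data behind an existing handle.
    /// Returns false if the handle does not belong to this storage.
    pub fn reload(&mut self, handle: Handle, rebuilt: T) -> bool {
        match self.slots.get_mut(handle.0) {
            Some(slot) => {
                *slot = rebuilt;
                true
            }
            None => false,
        }
    }
}
```

Everything that stored the handle sees the new data on its next lookup, without any reference having to be patched.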

Moxinilian · 2018-08-18T17:48:20Z

I've spent some time chatting about details of this proposal with @kabergstrom and I really dig it. This would fit very well with the modular tooling and data-driven aspect of the engine. A consequence of this will be that we will have to make tooling a priority as the only way this integrates well is with tools designed for it. But it's a great opportunity to actually get traction on that (critical) aspect of the engine, and that on top of a solid architectural design.
I think you should elaborate a bit on your vision regarding tooling in Amethyst (vision that I personally agree with).

dotellie · 2018-08-25T20:52:41Z

First of all, apologies if I have misunderstood something here. In general, I am in favor of this but I have a few concerns. The first is the .meta files. Those usually end up cluttering the asset directories a lot and are generally a hassle to keep track of. Most of all, I would prefer another solution (maybe something similar to the .git folder?), but if .meta files are necessary, we need to make sure they all start with a dot so they are hidden by default. I'm also very concerned regarding this line:

The only way to load an asset in the engine runtime is with an AssetID.

For me personally at least, I only plan on using Amethyst as a framework, using the tooling as basically a prefab generator and as such, I think this is a workflow that we should, if nothing else, at least support. Only being able to load assets through AssetIDs doesn't sound like it will lead to very readable code and it would also require the use of the tooling in order to get the AssetID. Another concern of similar character is loading assets dynamically from the internet (from an HTTP address or something) where an AssetID is a no-go.

Again, I probably misunderstood something and you have probably already thought of most of this, but I wanted to at least put it out there anyway. 😛

AnneKitsune · 2018-08-25T20:59:25Z

Replying from phone. Going over elements from top to bottom.

An asset file is a file on disk that can contain multiple assets.

Call that AssetSource

Don't force the asset folder name. Should be configurable.

Uuid are good
Meta files are evil

We should avoid processes as much as possible.

The only way to load an asset in the engine runtime is with an AssetID.

We should still be able to do the decoding at runtime from the data bytes (modding and image manipulation)

My idea to replace .meta files.
Have a converter from a raw file format .png .jpg to a rawasset file .ron

Example: id: uuid, type: PngFormat, data: [u8]

That would be your edit time format.
It would then be converted by another amethyst tool "asset convert" or something that would convert the raw data into a general format. In this case that would be "InternalImageFormat". Finally, you use an "asset packet" to optionally make something equivalent to unity assetbundle.

Prefab refactor proposal / Asset loading additions

Raw assets vs prefabs

RawAsset: all .png .jpg .mp3 .blend .obj files

Prefabs are simply containers of data. They are classed in different conceptual categories, but are effectively the same implementation wise. Only what happens to the loaded data differs between those.

AssetPrefab: A prefab containing the runtime representation of the asset's data. (Image, Audio, Mesh, Scene)
EntityPrefab: A prefab containing entities or a hierarchy of entities. (ScenePrefab, UiPrefab)
DataPrefab: Prefabs used to store configuration data. (Tilemaps, Spritesheets)

Meta data

Meta data will be stored inside of the prefab files, instead of in separated .meta files.

Raw asset to AssetPrefab

Using the amethyst cli:

$ amethyst convert assets/raw/my_image.png assets/prefabs/my_image.ron

Example of the resulting file:

#![enable(implicit_some)]
Prefab (
    id: "e79a6d1c1e5341ea8872fb1ad975e15b",
    asset: Image(
        size: (128,128),
        format: RGBA,
        data: [
            RGBA,
            RGBA,
            RGBA,
            RGBA,
            RGBA,
        ],
        // Optional options
        compression: None
    ),
    // You can have edit and runtime options here
    some_option: Something,
)

Doing it this way means there is no conversion to do at runtime to load the asset into the engine, and no need for complex build pipelines.
You can still have edit time options that are reflected in the runtime product by changing the properties of the prefab and you can also still modify the original asset to change the raw data.

The code to convert the raw asset to the prefab is simple: it is the same code we already use for the amethyst Formats. We convert the raw asset files into a format that is exactly the one used inside of amethyst, without the need for any conversions. We only need something like ron::de::from_bytes(include_bytes!("blabla"));

Also, we can still directly load the raw assets from the engine at runtime (.png .jpg) because the Format code would still live there.

Raw Asset -> AssetPrefab paths?

You can use either a built-in tool, or a custom one to do your caching/versioning/hot reloading.

An example of a built-in tool would be something that can map a file path from a raw asset to a prefab uuid.
A background service (or the amethyst editor) can check for changes in the raw asset and apply those changes in the prefab file's "asset" field.

HashMap<String, Uuid> // raw asset path -> prefab uuid
HashMap<Uuid, String> // prefab uuid -> prefab file path

Hot reload steps:

Detect changes in assets stored in raw asset paths
If a change is detected, run "amethyst convert" to get the converted data
Find the path to the associated uuid prefab. (if not found, search through the project to find the new path)

With those steps, the only place it can fail is where you move the raw asset file or rename it. In this case, the editor (or whatever tool you use for hot-reloading) would ask you to which prefab you want to save the converted data. In this case, you can either create a new prefab, or locate the one that was used before.
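
The lookup through the two maps, including the two ways it can fail, could be sketched as below. The Uuid is kept as a plain string here and the type names are invented for the example.

```rust
use std::collections::HashMap;

/// Kept as text for this sketch; a real tool would use a UUID type.
pub type Uuid = String;

pub struct PrefabIndex {
    /// raw asset path -> prefab uuid
    pub raw_to_uuid: HashMap<String, Uuid>,
    /// prefab uuid -> prefab file path
    pub uuid_to_prefab: HashMap<Uuid, String>,
}

#[derive(Debug, PartialEq)]
pub enum ReloadTarget {
    /// Write the converted data into this prefab file.
    Prefab(String),
    /// The uuid is known but its file moved: search the project for it.
    PrefabMissing(Uuid),
    /// The raw asset was moved or renamed: ask the user what to do.
    UnknownRawAsset,
}

impl PrefabIndex {
    /// Decide where the converted data of a changed raw asset should go.
    pub fn target_for(&self, raw_path: &str) -> ReloadTarget {
        match self.raw_to_uuid.get(raw_path) {
            None => ReloadTarget::UnknownRawAsset,
            Some(uuid) => match self.uuid_to_prefab.get(uuid) {
                Some(path) => ReloadTarget::Prefab(path.clone()),
                None => ReloadTarget::PrefabMissing(uuid.clone()),
            },
        }
    }
}
```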

Asset versioning

With this kind of structure, it would be really easy for anyone to make an asset versioning system that would look like what cargo does.
You have a server containing all the versions of your assets, and a .toml file with the list and version of each asset you want to pull.

When running your tool, it goes on the server and fetches the assets that need to be updated. It then runs "amethyst convert" to generate the prefabs and updates the ones that need to be.

Complex scenes

This is an example of a scene which contains a player.

Scene1.ron

#![enable(implicit_some)]
Prefab (
    id: "c022403bb2124babbb92cc29e6a043b7",

    // Scene is an Entity Prefab
    entities: [
        {
            named: "ui_hi_text",
            ui_text: UiText("Hi"),
            ui_transform: UiTransform,
        },
        {
            entity_prefab: EntityPrefab("e3bb4e44292246dab2f853cd3c3f6618")
        }
    ]
)

Player.ron

#![enable(implicit_some)]
Prefab (
    id: "e3bb4e44292246dab2f853cd3c3f6618",
    // Entity Prefab
    entities: [
        {
            named: "player",
            transform: Transform,
            spritesheet: SpritesheetPrefab("67b3f53b11b247c4a44a6fcb07df0bc2")
        },
    ]
)

PlayerSpritesheet.ron

#![enable(implicit_some)]
Prefab (
    id: "67b3f53b11b247c4a44a6fcb07df0bc2",
    asset: Spritesheet(
        size: (128,128),
        texture: ImagePrefab("e79a6d1c1e5341ea8872fb1ad975e15b"),
        sprites: [
            {
                id: 0,
                offsets: (0.0,0.5),
                uv: (0.0,1.0),
            }
        ],
    ),
    // Engine built-in modding support of assets.
    overrides: [
        "8468c805b51c47baa624b7141e5bced9", // Game's basic player spritesheet
        "2635a167227d4a04b857a4c7b9706ca8", // MyFancyPlayer's modded player spritesheet
    ],
)

PlayerTexture.ron

#![enable(implicit_some)]
Prefab (
    id: "e79a6d1c1e5341ea8872fb1ad975e15b",
    asset: Image(
        size: (128,128),
        format: RGBA,
        data: [
            RGBA,
            RGBA,
            RGBA,
            RGBA,
            RGBA,
        ],
        // Optional options
        compression: None
    ),
    // You can have edit and runtime options here
    some_option: Something,
)

Clean separation of prefabs like this allows loading at any level, as well as runtime edits of any property. Here are some examples.

Load the scene (loads the player too).
Load only the player.
Load only the player's spritesheet.
Load only the player's spritesheet picture.
Apply a mask over the player's texture

Does that sound like something you usually do when using an engine editor? That's what I thought ;)

[Scene Editor/Prefab editor] Load the scene (loads the player too).
[Prefab Editor] Load only the player.
[Spritesheet editor] Load only the player's spritesheet.
[Image viewer/ Image editor] Load only the player's spritesheet picture.
[Custom made Image Prefab Editor] Apply a mask over the player's texture

Caching

As moxi said an annoying amount of times, a prefab is an asset.
Runtime caching is trivial: HashMap<Uuid,Handle>

Then, if you use the api to load the prefab from file, you can just make a simple HashMap<String, Uuid> and check if it is already cached or needs to be loaded.
Also with this you can easily preload prefabs during the game initialization, and then instantiate entity prefabs or access your asset prefabs at runtime without any loading delay.
(and if you are loading multiple giant maps, you'll want to manually remove them from the cache when changing map to save some ram)

Actually loading a prefab is really easy to do from code and it is already something we do. Also, instead of just loading from file, we should add an api to load it from its string representation (allowing mods to be loaded from the network). Since it is just data that is getting loaded, in theory it shouldn't cause remote code execution attacks (but it is the responsibility of the user to not write systems vulnerable to fabricated data).

If your PrefabType happens to be PrefabHandle (generated by ScenePrefab, UiPrefab, etc...), then you can attach that onto an entity, and the prefab's components will get attached to the entity (cloned, not owned).

Modding

Asset modding is as simple as adding an "override" field in the prefab struct.

SEE PlayerSpritesheet.ron IN THE COMPLETE EXAMPLE.

Assets override made this way would be resolved at runtime.
Most of them can be resolved when the game loads and the rest of them can be resolved when a mod gets loaded.

If you override a modded spritesheet, like in this example, the engine should be able to calculate that this ^ one spritesheet has priority.
If there is a circular override (mod1 overwrites mod2, and mod2 overwrites the same file in mod1), the mods are simply incompatible. This should cause an error at the moment the second mod is added (the one closing the circular override loop).
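
Detecting the circular case at the moment an override is added could be done roughly like this. The graph structure is an assumption of the sketch, and prefab IDs are plain string slices for brevity.

```rust
use std::collections::{HashMap, HashSet};

/// Prefab IDs kept as static strings for brevity.
pub type Uuid = &'static str;

#[derive(Default)]
pub struct OverrideGraph {
    /// prefab -> the prefabs it overrides
    overrides: HashMap<Uuid, Vec<Uuid>>,
}

impl OverrideGraph {
    /// Record that `prefab` overrides `target`.
    /// Fails if this edge would close a circular override.
    pub fn add_override(&mut self, prefab: Uuid, target: Uuid) -> Result<(), String> {
        if prefab == target || self.reaches(target, prefab) {
            return Err(format!(
                "circular override between {} and {}",
                prefab, target
            ));
        }
        self.overrides.entry(prefab).or_default().push(target);
        Ok(())
    }

    /// True if `from` overrides `to`, directly or through other prefabs.
    fn reaches(&self, from: Uuid, to: Uuid) -> bool {
        let mut stack = vec![from];
        let mut seen = HashSet::new();
        while let Some(current) = stack.pop() {
            if current == to {
                return true;
            }
            if seen.insert(current) {
                if let Some(next) = self.overrides.get(&current) {
                    stack.extend(next.iter().copied());
                }
            }
        }
        false
    }
}
```

The error surfaces when the second mod is added, since that is the edge that closes the loop.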

End

If any of those concepts seems unclear, let me know and I will update it.

The prefab file examples are not accurate. The final data structure should contain the same data, but maybe in a different layout that is more usable/extensible.

Moxinilian · 2018-08-26T17:58:14Z

The data field would be extraordinarily slow. Like seriously super slow. This can't possibly be acceptable at runtime.
I don't see what issue your approach solves over @kabergstrom's proposal.
Also, your definition of a prefab is very unorthodox. I think there's an issue here, at least for clarity.

AnneKitsune · 2018-08-26T18:47:29Z

How would the data field be slower than the current solution? Both solutions propose having the runtime representation as bytes. My proposal solves the compatibility problem for users not using the editor, and reduces the number of issues we'll be getting from the type system when we do make the editor.

Moxinilian · 2018-08-26T19:08:42Z

Regarding the slowness of the representation, what you propose implies copying assets all the time, which is not acceptable, and forces us to give up on fast OS-optimized loading of single files. Representing assets "as bytes" does not mean anything here, I am not sure if you fully understood the concept of a 3-step pipeline.

People not interested in advanced tooling seem to be content with the model currently in use, assuming it keeps being maintained and tweaked. Why do you want to replace it with something halfway between manual framework-like usage and complete tooling suite?

I don't understand what you are talking about regarding type system issues?

dotellie · 2018-08-26T21:40:52Z

@Jojolepro I have to say, at first glance, your proposal looked pretty nice in my eyes, but the more I think about it, the more issues I see. First you have what Moxi mentioned where you're effectively making a useless (from the user POV) copy of each file that takes up space and isn't viewable or editable. Then there's also the issue of version control which becomes essentially a nightmare because of all the bytes in the files.

Instead, provided we continue with the initially proposed solution, I would like to propose a change to the way the meta files are handled. I mentioned earlier that I think something similar to .git would be nice and this is essentially what this proposal is. In the root of an amethyst project, we put a .amethyst folder which is there for the sole purpose of keeping track of metadata. This could be essentially anything configured through tools but in this case, the important part is keeping track of assets. Instead of attempting to make the user aware of these meta files, we keep them hidden. This is because most of the time, you'll either be editing them through the editor in which case everything is taken care of for you or you'll have a script or something similar that can easily perform a lookup for the correct meta file. The main benefit here is that users simply don't have to see the clutter that meta files generate.

So what happens when a user tries to modify, rename or move a file then? Well, one important thing to keep in mind is that these actions almost never happen at the same time. This of course all assumes you have the editor open keeping track of files, but we could potentially get by even if the editor is closed. First, if you modify a file, we would still recognize it by the file path so that's pretty easy. Where it gets interesting is moving and renaming, where I suggest keeping track of the files through md5sums. Most files are only a couple of MB, so the performance hit is most likely gonna be minimal. For example, running time md5sum 120MBfile gives a total of only about 0.25s on a Ryzen 1700. Keep in mind that most project files are added gradually, not all at once. There is also the potential that the user moves a file but keeps the filename, in which case figuring out what happened is also quite easy. The problem is when we do actually lose track of files. This is probably because the user did something weird and I think the easiest way to resolve it is probably just to prompt them asking where a file went or what this new file is.
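
A rough sketch of the path-plus-checksum tracking described above. The comment suggests md5; `DefaultHasher` is used here only because it ships with the standard library (its output is not guaranteed stable across Rust releases, so a real tool would pick a fixed algorithm). `FileTracker`, `Change` and `classify` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Content fingerprint (stand-in for md5, see the note above).
pub fn fingerprint(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
pub enum Change {
    /// Same path, same contents.
    Unchanged,
    /// Same path, different contents.
    Modified,
    /// Known contents showing up at an unknown path.
    MovedFrom(String),
    /// Neither path nor contents are known: prompt the user.
    Unknown,
}

/// Remembers the last seen fingerprint for every tracked path.
#[derive(Default)]
pub struct FileTracker {
    by_path: HashMap<String, u64>,
}

impl FileTracker {
    pub fn track(&mut self, path: &str, contents: &[u8]) {
        self.by_path.insert(path.to_owned(), fingerprint(contents));
    }

    /// Classifies a file found on disk against the tracked state.
    /// Simplification: if several tracked files share the same contents,
    /// an arbitrary one is reported as the move source.
    pub fn classify(&self, path: &str, contents: &[u8]) -> Change {
        let fp = fingerprint(contents);
        match self.by_path.get(path) {
            Some(known) if *known == fp => Change::Unchanged,
            Some(_) => Change::Modified,
            None => self
                .by_path
                .iter()
                .find(|(_, known)| **known == fp)
                .map(|(old_path, _)| Change::MovedFrom(old_path.clone()))
                .unwrap_or(Change::Unknown),
        }
    }
}
```

The `Unknown` case corresponds to the situation where track of a file is lost and the user has to be asked what happened.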

I'm tired right now and this comment is of shit quality so apologies for that. Feel free to flame me now. 😜

dotellie · 2018-08-27T11:46:30Z

(This is basically a slightly modified version of a message I wrote to Moxi)

I've done a little looking around on the internet and I've found that the meta file solution is very similar to Unity's and my .amethyst solution is very similar to UE's. Looking around at pain points, I see that most people complain about what we have already discussed, that being either "meta files are annoying as all hell" or "references can get lost". I've also seen quite a few instances of "I have no idea what these .meta files are, so I just ignore them". Obviously Unity has quite a larger user base than UE so this might not be a completely fair comparison, but I would say that more people are very annoyed at meta files than people are complaining about lost references. What should also be noted here is that UE is doing an absolutely horrible job at trying to keep track of the files, but they still don't have too many that complain about it.

Also of note that's quite interesting, cryengine and lumberyard are both using aliased file paths for their assets, seemingly without meta files.

Some references (incomplete):
https://www.reddit.com/r/godot/comments/8m4cug/moving_from_unity_to_godot/dzl1e5q
https://www.quora.com/What-are-the-main-pros-and-cons-of-Unity-3D-and-Unreal-Engine#eiYZZl
https://forums.unrealengine.com/unreal-engine/feedback-for-epic/1438613-asset-management

AnneKitsune · 2018-08-27T12:36:29Z

Before being able to use the Asset Browser on an existing project you'll have to generate *.cryasset files

https://docs.aws.amazon.com/lumberyard/latest/userguide/asset-pipeline-intro.html

kabergstrom · 2018-08-28T16:11:41Z

@Jojolepro Thanks for the proposal!

Addressing Jojolepro's proposal

Meta data will be stored inside of the prefab files, instead of in separated .meta files.

If metadata is stored in the same file as an intermediate representation of a source asset, users will need to version control both the source asset and the intermediate representation. It also means that the intermediate representation is no longer a "pure function" of the metadata and source asset. The intermediate representation will need to be "patched" when source asset is rebuilt, as opposed to rebuilt from scratch, and the definition of input data (and its hash) will become blurred.

I also think it's very inadvisable to store inherently binary asset formats in plain text. The space amplification factor is high.

To give some context, medium-large sized 3D games usually are in the range of 30-300GB of source assets, the vast majority being non-human-readable assets like textures and meshes. Doing the math for how much redundant data will have to be version controlled and shared between team members for large projects quickly pushes us towards other approaches.

The intermediate representation can be thought of as .o files - an intermediate representation for the compilation of an asset. They are to be hidden from the view of users and presumably live in some sort of database/directory like the target directory for Rust.

..[Loading asset prefab .ron files at runtime]
... Doing it this way allows us to not have any conversion to do at runtime to load the asset into the engine, without the need for complex build pipelines.

When I mention "runtime format" in my proposal, I mean a format that is as optimized as possible for loading into the engine for the target platform. In the best case this is a memcpy - the data is already laid out the same as structs in memory. I also mean that the build artifact can be adapted for the platform - perhaps a certain texture compression is really fast on a specific console, or you want to reduce texture resolution on phones. You don't want to be compressing loaded assets at runtime, this is a build-time thing and should be converted for the build artifact.

To use a compiler analogue, the build artifact is the executable binary, or shared library. Individual build artifacts can be considered binary blobs, and will need some sort of container or other way of maintaining metadata for actual loading, so they will generally not be handled by users individually. One implementation can be for all build artifacts to be packed into a file with a header containing metadata for all build artifacts.

Asset versioning and complex scenes

Indeed, this is why AssetIDs are so powerful. When I say "edit-time representation" I mean essentially what you have in your .ron files, but in a serialization format that is optimized for loading and not human-readable. For assets like prefabs, there still will be a human-readable representation since persisting (saving) an asset will turn it into a source asset file - serializing it into a format that is appropriate for version control merges etc.

Caching

I think it's implied in my proposal that loaded assets are re-used if referenced multiple times. I didn't write much about the concrete implementation of asset lifetimes, but basically my preference for API is reference counting per AssetID, then dependencies to the referenced AssetID are loaded/unloaded automatically.
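
To make the reference-counting idea concrete, here is a small sketch, assuming an acyclic dependency graph and a plain integer as the id. `AssetRefCounts`, `acquire` and `release` are hypothetical names, and "loading" is reduced to having a non-zero count.

```rust
use std::collections::HashMap;

/// Hypothetical id type; the proposal uses UUIDs.
pub type AssetId = u32;

#[derive(Default)]
pub struct AssetRefCounts {
    /// Direct dependencies of each asset.
    deps: HashMap<AssetId, Vec<AssetId>>,
    /// Live reference count per loaded asset.
    counts: HashMap<AssetId, usize>,
}

impl AssetRefCounts {
    pub fn set_dependencies(&mut self, id: AssetId, deps: Vec<AssetId>) {
        self.deps.insert(id, deps);
    }

    /// Adds a reference. The first reference "loads" the asset and
    /// acquires each of its dependencies in turn.
    pub fn acquire(&mut self, id: AssetId) {
        let first = {
            let count = self.counts.entry(id).or_insert(0);
            *count += 1;
            *count == 1
        };
        if first {
            for dep in self.deps.get(&id).cloned().unwrap_or_default() {
                self.acquire(dep);
            }
        }
    }

    /// Drops a reference. The last reference "unloads" the asset and
    /// releases each of its dependencies in turn.
    pub fn release(&mut self, id: AssetId) {
        let remaining = match self.counts.get_mut(&id) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return, // not loaded, nothing to do
        };
        if remaining == 0 {
            self.counts.remove(&id);
            for dep in self.deps.get(&id).cloned().unwrap_or_default() {
                self.release(dep);
            }
        }
    }

    pub fn is_loaded(&self, id: AssetId) -> bool {
        self.counts.contains_key(&id)
    }
}
```

With this shape, a texture that is referenced both directly and through a prefab stays loaded until both references are released.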

There can be a multitude of ways of looking up an AssetID with pluggable implementations, as I mentioned in the proposal. Also pluggable is the mapping from AssetID to build artifact ([u8] blob) for loading the actual asset - as you mention, from file or network both work, as does shared memory or any RPC framework.

Modding

Modding can be implemented by loading a different build artifact for an AssetID. By making the mapping from AssetID to build artifact pluggable, you can write your own implementation. For example, checking a list of directories in some order of priority for a specific filename, loading the first one found. This way, you don't have to specify overrides in the original asset, this is done through some logic in the AssetID->Build artifact resolver code, allowing you to override any assets you want to.
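
The directory-priority example could look roughly like the following. This is only a sketch: `DirectoryResolver` is a hypothetical name, and how a file name is derived from an AssetID is left to the caller.

```rust
use std::path::PathBuf;

/// Resolves a build artifact by checking directories in priority order.
pub struct DirectoryResolver {
    /// Highest priority first, e.g. mod directories before the base game.
    pub search_dirs: Vec<PathBuf>,
}

impl DirectoryResolver {
    /// Returns the first existing `<dir>/<file_name>`, or `None` if no directory has it.
    pub fn resolve(&self, file_name: &str) -> Option<PathBuf> {
        self.search_dirs
            .iter()
            .map(|dir| dir.join(file_name))
            .find(|candidate| candidate.is_file())
    }
}
```

Putting a mod directory ahead of the base directory in `search_dirs` is enough to override any artifact the mod ships, without touching the original asset.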

Addressing .meta file user experience concerns

Thanks to @magnonellie for the research and persistence regarding this issue!

I do agree that the user experience is essential, and especially adoption is #1 for a free, open-source project.

I propose that file metadata is hidden from the user by default, and that users need to actively activate it to become compatible with version control software. This is based on the observation that users usually start a project on their own and then later start working in groups once they have more experience. Thus the initial experience should cause as little confusion as possible.

So how do we achieve hidden metadata? I have two options:

1. Hidden .meta files.
Works the same as the original proposal, but use OS-specific methods for hiding files. On Windows, this is the hidden attribute. On Linux, this means using a dot prefix for the filename. The asset management system will use file system events, file signatures and timestamps to maintain metadata for files when users are not using built-in tools for moving assets.
2. Centralized metadata storage.
Metadata is stored in a database in some project directory that is not to be used by users (the directory that will be used by a number of other systems for maintaining metadata indexes, asset management database etc).

I think #1 is better than #2 because a user might delete the "hidden project folder" containing metadata databases, build artifacts etc, but we can still reconstruct the user's entire project from only the asset directory and its files + meta files. The implementation will also be the same for hidden and visible, only difference is the hidden attribute.

Note that for all these three options, the serialized data and API will be identical. The only difference is the user experience and representation on disk.

norman784 · 2018-09-03T13:21:04Z

I agree that .meta files are annoying, but they are only visible in the OS file manager and not in the editor. To me the Unity approach is the easiest way to do it: just keep the .meta files next to the files, and have a process that cleans up orphaned .meta files (I'm not sure if Unity has one). If, as you pointed out @kabergstrom, they are an important piece of the engine because of the AssetID, then it's important to remove unused ones.

I have another approach that can be used in parallel with this (because it will not work with dynamic content, i.e. mods): use something like the class/struct R from Android, where all resources are hardcoded into a class. If you want to access an image called player, you access it as R.Image.Player. I like this approach because errors show up at compile time (if something is missing or broken) rather than at runtime.

Of course we need to ensure that you can still load assets just from strings (paths or AssetID), so you will have 2 methods to load resources, one of them the safe way:

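// Autogenerated module (Rust has no nested structs or `static let`, so modules and consts are used)
mod r {
    pub mod image {
        pub const PLAYER: u32 = 0x0001232; // this can be obtained from the .meta file
    }
}

// `load_resource` is a placeholder for whatever loading function the engine exposes
let safe_resource = load_resource(r::image::PLAYER);
let unsafe_resource = load_resource("f3394fasdxcgdjijpsacsacsadsa90r3");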

Strings (or slices of strings) are evil, I think, and must be avoided whenever possible; at least that's what I've heard (because strings are not as performant as int, byte, etc).

PS: Hit me with a club if the pseudo code is not entirely Rust. :)

kabergstrom · 2018-09-03T13:56:14Z

@norman784 Thanks for the suggestion!

I think it'll be pretty easy to generate a Rust file with AssetIDs as constants for usage with loading functions once there's a central database of all assets. Good idea!
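
As a rough illustration of what such a generator might look like, assuming the central database can be iterated as (name, id) pairs. `generate_asset_constants` and the output layout are hypothetical.

```rust
/// Renders Rust source with one `pub const` per asset.
/// Assumption: names are unique and do not start with a digit once sanitized.
pub fn generate_asset_constants(assets: &[(&str, u128)]) -> String {
    let mut out = String::from("// Autogenerated: do not edit.\n");
    for (name, id) in assets {
        // Turn e.g. "image/player" into the identifier IMAGE_PLAYER.
        let ident: String = name
            .chars()
            .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
            .collect();
        out.push_str(&format!("pub const {}: u128 = 0x{:032x};\n", ident, id));
    }
    out
}
```

The generated file could be written out by a build script or by the asset tooling, so that a missing asset shows up as a compile error at the use site.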

AnneKitsune · 2018-09-17T18:30:40Z

Okay so I didn't get any ideas to not have meta files and not copy the whole asset into a prefab. 👍 if you want to do it using meta files. The only caveat is that

The user can choose to not use the asset pipeline at all and continue to reference files directly
The user can decide to use the asset pipeline for some files and not for others.
Meta files are only created for assets that are in the asset pipeline.

Does that sound reasonable?

kabergstrom · 2018-09-17T21:24:30Z

Sounds good!

This will be possible by using the importer, builder and loader within the runtime, loading contents from file. I suppose references to other assets can be defined as either an assetID or a path, or whatever else. The reference is then resolved during loading using the appropriate resolver.
With "asset pipeline" I suppose you mean running the .meta file generator/maintainer, importer and builder when the game is not running. This is fine - it just means the asset will not have a stable assetID and the source file parsing will need to be done at runtime.
In the implementation I have in mind, you'll be able to define a number of asset directories where .meta files will be generated and maintained.

randomPoison · 2018-09-21T17:51:00Z

Overall I think this proposal is really great, thanks for writing all of it up! I'm really a fan of the deterministic build steps and multiple stages of asset processing. My primary concern is the meta files and how we'll manage them (seems like I'm not alone in this, though). It may be the case that storing a meta file next to the source asset is the best approach, however years of working with Unity have made me pretty wary of the problems it can cause. The alternate approach that I'm personally leaning towards is having all the meta files grouped in a separate folder, but as noted that approach has its own issues.

I was originally going to write up a whole big thing comparing the two, but then I noticed that @kabergstrom suggested that we could make this an option in the project settings. Long term I don't think it makes sense for this to be user-configurable, but I do think it's a good idea to implement both these approaches so we can try them out in practice. Most of the asset management logic is going to be the same regardless of where we stick the meta files, so I don't think we're risking a ton of duplicate work to try out both approaches.

So what I'd propose for now is:

We use meta files as proposed by @kabergstrom.
We implement support for both keeping meta files adjacent to source files, as well as sticking all meta files in one folder in the project workspace.
Make the approach used be user-configurable (ideally including tooling to switch between the two approaches at any time).
Eat that dog food until we know which one tastes better 🐶

Once we get a clear winner, we can deprecate/remove the other approach. Seem reasonable?

AnneKitsune · 2018-09-22T02:35:44Z

Waiting for the PR 👌 :D

torkleyy · 2018-10-02T13:30:38Z

Writing here what I already wrote on Discord: I approve of this PR, but I'd like to do the transition in small steps as much as possible. Therefore, it would be nice if we could split up all the great improvements listed in this issue so that we can implement them with separate PRs.

Moxinilian · 2018-10-02T13:31:59Z

Splitting is easy by design because this RFC is more about tools than actual changes to the engine.

torkleyy · 2018-10-02T13:33:57Z

Yep, just saying :) If we leave this as one big RFC it's a bit unfriendly to potential contributors.

zakarumych · 2018-10-26T12:03:16Z

I like the idea overall.

But I don't really like using UUID. Especially since it will be used in config/prefab files.
Seeing in log that prefab e3bb4e44292246dab2f853cd3c3f6618 can't be instantiated because mesh f3394fasdxcgdjijpsacsacsadsa90r3 is not found I will be like (╯°□°）╯彡 ┻━┻

I propose to use URI for addressing assets.
A URN will serve a very similar purpose as a UUID, but instead of f3394fasdxcgdjijpsacsacsadsa90r3 it can look like urn:amethyst-asset:mesh.weapon.bazooka.

No. mesh.weapon.bazooka is not the path. Resolver for amethyst-asset namespace will produce URL that can be used to access the asset.
While approaches are similar I see a few advantages:

Human readable asset identifier. Identifier can have meaningful semantics.
Possibility to have multiple namespaces with different rules. Another namespace may contain sha256 of the asset data to be persistent (changed asset is another asset)
If the asset manager takes a URI, the user can provide a URL to access the asset data.

kabergstrom · 2018-10-26T14:54:46Z

UUIDs will need metadata and tooling to be usable for humans, I think this is unavoidable. I do believe the final user experience will be superior. It's possible for metadata to be used in the impl Display to resolve UUID to a human-readable form when printing.

I see your point though, human-readable identifiers can have some value. It's not impossible for us to have a human-readable, unique alias for each assetID that is generated based on the asset contents when the assetID is generated. I'm not sure I'd make it a first-class option though, as an alias would require dynamic allocations whereas a 16-byte UUID can impl Copy without allocation. Especially when considering FFI.

Another good point is that a URI would need some type of UUID proxy to be resolved if asset loading only allowed UUIDs, which may be undesirable from a UX point of view. It would be nice if URIs could be resolved to asset blobs directly.

Anyway, the concept of an AssetID is only there to resolve it into an actual loadable binary blob. So I propose a collection of types: AssetUUID (16-byte UUID, Copy) and AssetID (enum, Clone + !Copy: AssetUUID, File path, URL, URN..). Resolver implementations can be made for each type as necessary.
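
The proposed collection of types might be sketched like this. The variant names, the `Resolver` trait and the in-memory resolver are illustrative assumptions, not a settled design.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// 16-byte identifier. It is `Copy`, so passing it around never allocates.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct AssetUuid(pub [u8; 16]);

/// Any way of naming an asset. Most variants own heap data,
/// so the enum is `Clone` but not `Copy`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum AssetId {
    Uuid(AssetUuid),
    FilePath(PathBuf),
    Url(String),
    Urn(String),
}

/// A resolver turns an id it understands into a loadable binary blob.
pub trait Resolver {
    fn resolve(&self, id: &AssetId) -> Option<Vec<u8>>;
}

/// Example resolver that only handles the `Uuid` variant.
pub struct InMemoryUuidResolver(pub HashMap<AssetUuid, Vec<u8>>);

impl Resolver for InMemoryUuidResolver {
    fn resolve(&self, id: &AssetId) -> Option<Vec<u8>> {
        match id {
            AssetId::Uuid(uuid) => self.0.get(uuid).cloned(),
            _ => None, // other id kinds are left to other resolvers
        }
    }
}

/// Asks each registered resolver in order and returns the first blob found.
pub fn resolve_with(resolvers: &[Box<dyn Resolver>], id: &AssetId) -> Option<Vec<u8>> {
    resolvers.iter().find_map(|r| r.resolve(id))
}
```

A URN or file-path resolver would be another `Resolver` implementation matching on its own variant, so URIs can map to blobs directly without a UUID proxy.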

berkus · 2018-10-27T16:04:32Z

You could use a composite name like uuid:human-readable-part

E.g. e3bb4e44292246dab2f853cd3c3f6618:mesh.weapon.bazooka.
Tooling will help generate these, a bit of parsing magic will help have meaningful error messages like
prefab bazooka.bill (uuid e3bb4e44292246dab2f853cd3c3f6618) can't be instantiated because mesh mesh.weapon.bazooka (uuid f3394fasdxcgdjijpsacsacsadsa90r3) is not found - this can be done easily from the above composite name.

Obviously, you don't need any resolvers, because lookup is still done via uuid. Text part is just a human-readable comment.
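
The "parsing magic" for such composite names can be quite small. A sketch, assuming the UUID half is 32 hexadecimal characters and a single colon separates it from the label; `parse_composite` and `describe` are hypothetical helpers.

```rust
/// Splits `uuid:human-readable-part` into `(uuid, label)`.
/// Only the uuid half is meant for lookup; the label is a comment for humans.
pub fn parse_composite(name: &str) -> Option<(&str, &str)> {
    let (uuid, label) = name.split_once(':')?;
    let looks_like_uuid = uuid.len() == 32 && uuid.chars().all(|c| c.is_ascii_hexdigit());
    if looks_like_uuid && !label.is_empty() {
        Some((uuid, label))
    } else {
        None
    }
}

/// Formats a name for error messages, falling back to the raw input
/// when it is not a composite name.
pub fn describe(name: &str) -> String {
    match parse_composite(name) {
        Some((uuid, label)) => format!("{} (uuid {})", label, uuid),
        None => name.to_owned(),
    }
}
```

An error message can then be built from `describe(prefab)` and `describe(mesh)` while the lookup itself keeps using only the UUID half.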

fhaynes · 2019-01-08T02:49:17Z

Transferring this to the RFC repo.

LucioFranco changed the title ~~[Discussion] Asset Management and Pipeline~~ [RFC] Asset Management and Pipeline Aug 23, 2018

fhaynes transferred this issue from amethyst/amethyst Jan 8, 2019
