Cache Overhaul for OpenSeadragon (reviewed). #2407

Open: wants to merge 28 commits into master

@Aiosa (Contributor) commented Sep 8, 2023

Second attempt to redesign the OSD cache. Follows up on #2329.

Areas to discuss are marked with FIXME comments; please, let's discuss these issues.

Why do we need this?

Extending the abilities of OpenSeadragon requires a more flexible system for working with data. This overhaul fixes many issues, bugs, and API inconsistencies.

What is solved

  • inconsistent TileSource data management: tiles can now own multiple caches, and multiple plugins can affect tile data; tile sources should only govern how the data is retrieved, not how it is managed. The tile cache life-cycle handling methods are deprecated
  • automatic support for data type conversions: drawers will in the future require complex interactions between the available sources and the system. Out of the box, the system 'just works' (though it might not be as performant as one would wish, there is an API to make it as performant as possible, depending on the developer's effort)
  • deprecated getContext2D and destroy for TiledImage: a hacky way for ImageTileSource to work in older versions of OSD; the tile source has been rewritten
  • deprecated tile.context2D: a hacky way of rendering that stored an ever-growing amount of data in browser memory
  • fixed docs, syntax bugs and test behavior along the way

What needs to be solved (apart from FIXME)

  • add tests (once the design is approved)
  • missing support for shared cache between viewers
    • does not work with major changes, maybe different PR, maybe do not support
  • asynchronous data type conversion
  • it is now impossible for a tile not to use the cache; do we want this feature?
    • IMHO, most developers incorrectly went with context2D, which was a performant but memory-expensive choice; one should increase the cache size instead of hardcoding data
  • decide how much data is kept inside a cache object and whether to also cache intermediate conversion results
  • thoroughly profile on a demo that the cache does not leak (and that all references are properly removed)
  • finalize data-tile rendering pipeline (works, but with design flaws)
  • wait for the WebGL PR and integrate drawers
    • drawers must respect tile cache API as well - they must declare what type they can render, they must delay tile rendering when the cache is in a loading state
  • profile OSD rendering with the old cache and compare to the new implementation
  • finalize the design
    • the design must naturally lead developers to a correct usage
    • todo: should cache own its data similarly to how C++ uses unique / shared pointers?
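To make the shared-pointer question concrete, here is a minimal reference-counting sketch. All names (`SharedCacheRecord`, `addTile`, `removeTile`, the `onDestroy` hook) are hypothetical and not the API of this PR; it only illustrates shared_ptr-like semantics, where the data is released when the last tile lets go of it.

```javascript
// Hypothetical sketch: a cache record that behaves like a C++ shared_ptr.
// Tiles register themselves; the payload is released when the last one leaves.
class SharedCacheRecord {
  constructor(data, type, onDestroy) {
    this.data = data;
    this.type = type;
    this._tiles = new Set();     // tiles currently sharing this record
    this._onDestroy = onDestroy; // optional cleanup hook (e.g. free GPU textures)
  }

  get refCount() {
    return this._tiles.size;
  }

  addTile(tile) {
    this._tiles.add(tile);
  }

  // Returns true only when this call released the data.
  removeTile(tile) {
    if (!this._tiles.delete(tile)) {
      return false; // tile was never registered here
    }
    if (this._tiles.size > 0) {
      return false; // other tiles still share the data
    }
    if (this._onDestroy) {
      this._onDestroy(this.data, this.type);
    }
    this.data = null; // drop the reference so the GC can reclaim it
    return true;
  }
}
```

A unique_ptr-like variant would simply reject a second `addTile`; the shared variant is what allows several tiles, or a revived zombie cache, to reuse one record.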

@Aiosa (Contributor Author) commented Sep 8, 2023

Follow up on #2329 (comment):

  1. Simple path: nothing to write - works out of the box

  2. The 10% red overlay.

this.viewer.addHandler("tile-loaded", (event) => {
    //simple :)
    const context2D = event.tile.getData("context2d");
    const canvas = event.tile.getData("canvas"); //we could use context2D.canvas directly, but let's flex our API ;)
    //cover the whole tile with 10% opaque red
    context2D.fillStyle = "rgba(255, 0, 0, 0.1)";
    context2D.fillRect(0, 0, canvas.width, canvas.height);
    event.tile.save(); //now just sets _needsDraw = true;
});
  3. Each tile's "ultimate data" is a 50% merge of two separate JPGs loaded from the server (suppose equal size).
    Then we can do a):
downloadTileStart: function(imageJob) {
    let context = imageJob.userData;
    context.images = [new Image(), new Image()];

    let count = 0;
    let failed = false;
    for (let image of context.images) {
        image.onerror = image.onabort = function() {
            if (failed) return; //report the failure only once
            failed = true; //also prevents the other image's onload from finishing the job
            imageJob.fail("Failed to parse tile data as an Image", context.promise);
        };
        image.onload = function() {
            URL.revokeObjectURL(image.src); //the blob URL is no longer needed once decoded
            if (failed) return;
            if (count++ === 0) {
                //first image: create the target canvas and draw it fully opaque
                const canvas = document.createElement('canvas');
                canvas.width = image.width;
                canvas.height = image.height;
                context.canvas = canvas.getContext('2d');
                context.canvas.drawImage(image, 0, 0);
                return;
            }
            //second image: blend it over the first one at 50% opacity
            const result = context.canvas;
            result.globalAlpha = 0.5;
            result.drawImage(image, 0, 0);
            result.globalAlpha = 1;

            delete context.canvas;
            imageJob.finish(result, context.promise, 'context2d'); //recognized automatically, we could leave the type out
        };
    }

    function getImage(img, url) {
        //bit ugly since we keep just one...
        context.promise = fetch(url, {
            method: "GET",
            mode: 'cors',
            cache: 'no-cache',
            credentials: 'same-origin',
            headers: imageJob.ajaxHeaders || {},
            body: null
        }).then(response => {
            if (!response.ok) throw new Error(`HTTP ${response.status}`);
            return response.blob(); //fetch resolves with a Response, not a Blob
        }).then(blob => {
            img.src = URL.createObjectURL(blob);
        }).catch(e => {
            img.onerror(e);
        });
    }

    getImage(context.images[0], imageJob.firstUrl);
    getImage(context.images[1], imageJob.secondUrl);
}

or b), if we return the raw data in a custom format that the system does not know yet:

... (see above)

        imageJob.fail("Failed to parse tile data as an Image", context.promise);
    };
    image.onload = function() {
        if (count++ === 0) return;

        //if we do not set type, the resulting type is: OpenSeadragon.DataTypeConvertor.guessType(context.images)
        //which will result in something like 'Array [image]' which is a bit too generic, since our type is 
        // exactly two images,  returned by exactly this tilesource
        imageJob.finish(context.images, context.promise, "aiosa two images");
    };
}

...

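OpenSeadragon.convertor.learn("aiosa two images", "context2d", data => {
    const canvas = document.createElement( 'canvas' ),
        image1 = data[0], image2 = data[1], context = canvas.getContext('2d');
    //blend the two images 50:50 into a canvas of the first image's size
    canvas.width = image1.width;
    canvas.height = image1.height;
    context.drawImage(image1, 0, 0);
    context.globalAlpha = 0.5;
    context.drawImage(image2, 0, 0);
    context.globalAlpha = 1;
    return context;
});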

or c):

this.viewer.addHandler("tile-loaded", (event) => {
    //incompatible data type within cache still valid - until the tile-loaded handler is over
    const data = event.tile.getData(); //"aiosa two images" is default, _unless_ some other plugin modified the data before us
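    const canvas = document.createElement( 'canvas' ),
        image1 = data[0], image2 = data[1], context = canvas.getContext('2d');
    //blend the two images 50:50, as in a)
    canvas.width = image1.width;
    canvas.height = image1.height;
    context.drawImage(image1, 0, 0);
    context.globalAlpha = 0.5;
    context.drawImage(image2, 0, 0);
    context.globalAlpha = 1;

    //we cannot just call save(), since the cache itself must be updated: the system does not know the "aiosa two images" type
    event.tile.setData(context, 'context2d'); //or leave the type out, it can be detected by the system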
});
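Option b) relies on the convertor being able to chain learned conversions. One way to do that, assuming conversions are registered as directed edges between type names, is a shortest-path search (a later commit in this PR mentions Dijkstra). The sketch below is hypothetical: `TypeConvertor`, `learn(from, to, fn, cost)` and `convert(data, from, to)` are illustrative names, and plain strings stand in for images and canvases so it runs outside a browser.

```javascript
// Hypothetical sketch of a type convertor that chains learned conversions.
// Types are graph nodes, converters are weighted edges; the cheapest chain
// is found with Dijkstra's algorithm (simple O(V^2) version).
class TypeConvertor {
  constructor() {
    this.edges = new Map(); // from -> Map(to -> { fn, cost })
  }

  learn(from, to, fn, cost = 1) {
    if (!this.edges.has(from)) this.edges.set(from, new Map());
    this.edges.get(from).set(to, { fn, cost });
  }

  // Returns the list of types from `from` to `to`, or null if unreachable.
  findPath(from, to) {
    const dist = new Map([[from, 0]]);
    const prev = new Map();
    const visited = new Set();
    for (;;) {
      // pick the unvisited node with the smallest known distance
      let current = null;
      for (const [node, d] of dist) {
        if (!visited.has(node) && (current === null || d < dist.get(current))) {
          current = node;
        }
      }
      if (current === null) return null; // nothing left to explore
      if (current === to) break;
      visited.add(current);
      const out = this.edges.get(current) || new Map();
      for (const [next, { cost }] of out) {
        const candidate = dist.get(current) + cost;
        if (!dist.has(next) || candidate < dist.get(next)) {
          dist.set(next, candidate);
          prev.set(next, current);
        }
      }
    }
    const path = [to];
    while (path[0] !== from) path.unshift(prev.get(path[0]));
    return path;
  }

  // Runs every converter along the cheapest path.
  convert(data, from, to) {
    const path = this.findPath(from, to);
    if (!path) throw new Error(`No conversion from ${from} to ${to}`);
    let result = data;
    for (let i = 0; i + 1 < path.length; i++) {
      result = this.edges.get(path[i]).get(path[i + 1]).fn(result);
    }
    return result;
  }
}
```

For example, after learning `"aiosa two images" -> "image"` and `"image" -> "context2d"`, a request for context2d data runs both converters in sequence; registering a cheaper direct edge later makes the search prefer it.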

  4. Generative data, e.g. Mandelbrot... How do the new changes affect your earlier example?

Simplified! Almost no changes, except:
createTileCache* functions are deprecated and need no implementation, since context2d data is returned; for example, the function from the docs below gets implemented for us implicitly:

getTileCacheDataAsImage: function() {
     // not implementing all the features brings limitations to the
     // system, namely tile.getImage() will not work and also
     // html-based drawing approach will not work
     throw "Lazy to implement";
 },

downloadTileStart now distinguishes finish and fail (fail does not happen with generated data, so no change)

  5. Tiles that change over time, like fading from original JPG to black and back again.
this.viewer.addHandler("tile-loaded", (event) => {
    //touching event.data directly is now not really a good idea, since we lose all the API.
    //we could set event.tile.myData = event.data and have 'dirty' access anytime; let's use the cache instead (otherwise we would
    // repeat stuff covered in earlier examples)

    //let's work more closely with the cache api
    const tile = event.tile;
    //get the original data
    const cache = tile.getCache(tile.cacheKey);
    //we can now set the data directly to the cache, but it might not get freed by GC -> cache.destroy() is hardcoded
    // cache.myData = event.data;

    //instead, third option: use multiple caches per tile! note that we are 100% type safe (though not as optimal as 
    // possible in corner cases, unless we allow getData to specify a range of possible types we accept - optimization, 
    // see FIXME comment in the convertor)

    //switch to a new key: allow the original data to be used by somebody else...
    tile.originalCacheKey = tile.cacheKey;
    tile.cacheKey = `${tile.cacheKey}-opacity-change`;
    tile.setCache(tile.cacheKey, cache.getData("context2d")); //we will use this to render

    //if someone after us would use the cache, they can use the tile regardless of what we do here - they just get some
    //data to be rendered...
});

const tiledImage = ...; //our reference
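setInterval(() => {
    //fade factor oscillating between 1 (original) and 0 (black) with a 10 s period
    const alpha = 0.5 + 0.5 * Math.cos(Date.now() / 10000 * 2 * Math.PI);
    tiledImage.lastDrawn.forEach(tile => {
        //we should be guaranteed to have cache loaded, as the cache is removed when a tile gets unloaded

        //internally, *-opacity-change is used as the default data
        const data = tile.getData("context2d");
        //we have two choices: we access the data as-is and have optimal time, or we require "image"
        // -> we could force the system into conversion, but we are guaranteed to get an image object
        const origData = tile.getCache(tile.originalCacheKey).getData("image");
        //paint black, then the original on top with the current opacity
        data.globalAlpha = 1;
        data.fillStyle = "black";
        data.fillRect(0, 0, data.canvas.width, data.canvas.height);
        data.globalAlpha = alpha;
        data.drawImage(origData, 0, 0);
        data.globalAlpha = 1;
        tile.save();
    });
}, 500);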
  6. Two plugins; one colors the tile 10% red, and the other inverts it. Make sure they work together and that they happen in the right order (how do we control that?).

I think the previous example makes it clear that we can modify the data regardless of what other plugins do, as long as we use
the API correctly. To control the order, OSD already supports event priority (see https://openseadragon.github.io/docs/OpenSeadragon.EventSource.html#addHandler), added in another PR of mine.
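Why the order matters, and how a priority pins it down, can be shown without OpenSeadragon at all. This standalone sketch assumes a tiny event source where a higher priority runs first (a choice made for the example; see the linked `addHandler` docs for OSD's actual semantics and signature) and two toy plugins acting on a single RGB pixel.

```javascript
// Standalone sketch, not OpenSeadragon's EventSource: handlers are kept sorted
// by priority (higher runs first here), so two independent plugins always
// modify the tile in the same order, whatever order they were registered in.
class PriorityEventSource {
  constructor() {
    this.handlers = {};
  }

  addHandler(eventName, handler, priority = 0) {
    const list = this.handlers[eventName] || (this.handlers[eventName] = []);
    list.push({ handler, priority });
    // Array.prototype.sort is stable, so equal priorities keep registration order
    list.sort((a, b) => b.priority - a.priority);
  }

  raiseEvent(eventName, args) {
    for (const { handler } of this.handlers[eventName] || []) {
      handler(args);
    }
  }
}

// Toy plugin 1: blend 10% red into an RGB pixel (25.5 = 10% of 255).
function tintRed(event) {
  event.tile.pixel = event.tile.pixel.map(
    (c, i) => Math.round(0.9 * c + (i === 0 ? 25.5 : 0)));
}

// Toy plugin 2: invert the pixel.
function invert(event) {
  event.tile.pixel = event.tile.pixel.map((c) => 255 - c);
}
```

With tint at priority 10 and invert at 5, a [100, 100, 100] pixel ends up as [139, 165, 165]; swapping the priorities yields [165, 140, 140]. The result depends on the declared priority, not on which plugin happened to register first.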

@Aiosa (Contributor Author) commented Sep 9, 2023

I was thinking about asynchronous data processing, and IMHO we would transitively make everything async. Instead, we could adopt a Maybe data type and allow users to process data synchronously, unless the data is unavailable.

Tile data access could opt for a callback when the data is ready. Tile drawing would inspect whether the tile data (wrapped as Maybe) is ready and

  • draw tile if data available
  • skip drawing and set _needsDraw=true (~ call tile.save(), hmm... rename to .invalidate()...)

Example:

//possibly provide helper methods to do the checks automatically, provide something like `Promise.all()`
viewer.addHandler('tile-loaded', (e) => {
  //here cache is used - raw API example, we could simplify this by wrapping the result of tile.getData(...)
  const cache = e.tile.getCache();
  if (cache.ready) {
    //we can work with the data
  } else {
    cache.onload = () => {
      //...
    };
  }
});

Advantage: flexible. Disadvantage: more complicated, more overhead. For example, we would need to provide helper functions to make sure multiple data sources are ready at once, etc. Possibly, we could wrap everything in Promises behind the scenes and execute everything in the right order when the data is available. E.g.:

tile.getDataset((item1, item2, item3) => {
     //data is ready
}, tile.cacheKey, 'context2d', 'customKey2', 'image', 'customKey3', 'some other type'); 
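The Promise-based `getDataset` idea could be sketched as follows. It is written as a free function that takes the tile first, and it assumes that every cache returned by `tile.getCache(key)` exposes `getData(type)` returning a Promise; both are assumptions for illustration, not the API in this PR.

```javascript
// Hypothetical sketch of the proposed getDataset(callback, key1, type1, ...),
// written as a free function that takes the tile first. Assumes every cache
// returned by tile.getCache(key) has getData(type) returning a Promise.
function getDataset(tile, callback, ...keyTypePairs) {
  if (keyTypePairs.length % 2 !== 0) {
    throw new Error("getDataset expects (cacheKey, type) pairs");
  }
  const requests = [];
  for (let i = 0; i < keyTypePairs.length; i += 2) {
    const key = keyTypePairs[i];
    const type = keyTypePairs[i + 1];
    const cache = tile.getCache(key);
    requests.push(cache
      ? cache.getData(type)
      : Promise.reject(new Error(`Missing cache: ${key}`)));
  }
  // Waits for all items; the callback receives them in the requested order.
  return Promise.all(requests).then((items) => callback(...items));
}
```

If any requested cache is missing or fails to convert, the returned Promise rejects and the callback is not called, which keeps the "all data is ready" guarantee simple.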

Another approach is to split data operations into two phases: before and during tile-loaded, and afterwards. Data use within the tile-loaded event could be processed asynchronously; we would extend raiseEvent to await promise handlers during execution (I already made a PR containing the implementation), but afterwards, only synchronous conversions would be allowed. Advantage: cleaner API, clean distinction. Disadvantage: less flexibility.


Tests passed on my localhost but failed here. I noticed that the navigator sometimes did not refresh and I had to call viewer.navigator.world.draw(); other than that, this implementation should be more or less functional. Performance is another question: I would also like to rewrite the tile deletion process using a binary heap, which would not loop through all cached tiles but simply call heap.remove() in O(log n). Moreover, cache removal should follow the order of tiles, not caches: when a tile gets unloaded, its access time should be considered, since otherwise a tile that owns multiple caches would be removed more frequently.
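The heap-based expiration could look roughly like this: each entry stores its own heap index, so removing an arbitrary entry or refreshing its access time costs O(log n). `AccessHeap`, `lastAccess` and `heapIndex` are hypothetical names; whether the entries are tiles or caches is exactly the open question above, so the sketch treats them as opaque objects.

```javascript
// Hypothetical sketch: min-heap of cache entries ordered by lastAccess.
// Each entry stores its heap index, so arbitrary removal and re-prioritizing
// are O(log n) instead of a scan over the whole cache array.
class AccessHeap {
  constructor() {
    this.items = [];
  }

  get size() {
    return this.items.length;
  }

  push(entry) {
    entry.heapIndex = this.items.length;
    this.items.push(entry);
    this._up(entry.heapIndex);
  }

  // Removes and returns the least recently accessed entry (eviction candidate).
  pop() {
    const top = this.items[0];
    if (top !== undefined) this.remove(top);
    return top;
  }

  // Removes any entry; returns false if it is not in the heap.
  remove(entry) {
    const i = entry.heapIndex;
    if (i === undefined || i < 0 || this.items[i] !== entry) return false;
    const last = this.items.pop();
    if (last !== entry) {
      // move the former last element into the hole and restore heap order
      this.items[i] = last;
      last.heapIndex = i;
      this._up(i);
      this._down(last.heapIndex);
    }
    entry.heapIndex = -1;
    return true;
  }

  // Call after changing entry.lastAccess (e.g. on a cache hit).
  update(entry) {
    if (entry.heapIndex === undefined || entry.heapIndex < 0) return;
    this._up(entry.heapIndex);
    this._down(entry.heapIndex);
  }

  _swap(a, b) {
    const items = this.items;
    [items[a], items[b]] = [items[b], items[a]];
    items[a].heapIndex = a;
    items[b].heapIndex = b;
  }

  _up(i) {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.items[parent].lastAccess <= this.items[i].lastAccess) break;
      this._swap(i, parent);
      i = parent;
    }
  }

  _down(i) {
    const n = this.items.length;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < n && this.items[left].lastAccess < this.items[smallest].lastAccess) smallest = left;
      if (right < n && this.items[right].lastAccess < this.items[smallest].lastAccess) smallest = right;
      if (smallest === i) break;
      this._swap(i, smallest);
      i = smallest;
    }
  }
}
```

Evicting the least recently used entry is then `heap.pop()`, and a cache hit becomes `entry.lastAccess = now; heap.update(entry)`.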

@iangilman (Member):

This is awesome, and a lot to absorb! I am going to be busy for a bit, so I won't be able to get to this right away, but I will! Meanwhile, I'm hoping @pearcetm can help as a sounding board, since he's got the other major patch in the works, and there are probably some touch points between them.

Thank you, @Aiosa, for tackling this!

@Aiosa (Contributor Author) commented Sep 24, 2023

I moved forward with the async decision after thinking this through several times. The tile-loaded event is now asynchronous: instead of the ugly getCompletionCallback (deprecated), you can simply perform async work or resolve a promise once you are done. The old approach had a race condition when multiple plugins touched the tile; now, with strict execution order and event priority, it is possible to do these things in a stable way.
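The strict execution order can be illustrated with a standalone helper that chains handlers through Promises, so that an async handler delays the next one. This is a sketch in the spirit of the awaiting event used here, not the PR's code; `runHandlersInOrder` is a hypothetical name, and the handlers are assumed to be already sorted by priority.

```javascript
// Hypothetical sketch (not the PR's raiseEventAwaiting): run handlers strictly
// in order, waiting for any Promise a handler returns before starting the next.
// If a handler throws or rejects, later handlers are skipped and the returned
// Promise rejects with that error.
function runHandlersInOrder(handlers, eventArgs) {
  return handlers.reduce(
    (chain, handler) => chain.then(() => handler(eventArgs)),
    Promise.resolve()
  );
}
```

With a slow async plugin registered before a fast synchronous one, a tile whose data starts as `raw` ends up as `raw-slow-fast`; calling both handlers without chaining would let the fast one finish first and produce `raw-fast-slow`.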

The cache object follows the 'maybe' type: the cache is either loaded (cache.data) or not (cache.loaded=false), in which case something like cache.getData().then(...) runs code once the data is ready. This supports both sync and async approaches. Drawers can access the data immediately and, when it is not ready, delay the rendering.
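A minimal sketch of such a 'maybe' cache, together with a drawer-side check that draws only when the data is present. `MaybeCache`, `setData` and `tryDraw` are hypothetical names; only `loaded`, `data` and `getData()` mirror the description above.

```javascript
// Hypothetical sketch of a "maybe" cache record: data is either loaded
// (synchronous access through .data) or pending (use getData().then(...)).
class MaybeCache {
  constructor() {
    this.loaded = false;
    this.data = undefined;
    this._waiting = []; // resolvers of pending getData() calls
  }

  // Called by the loader once the payload is available.
  setData(data) {
    this.data = data;
    this.loaded = true;
    for (const resolve of this._waiting) resolve(data);
    this._waiting = [];
  }

  // Async access: resolves immediately if loaded, otherwise when setData runs.
  getData() {
    if (this.loaded) return Promise.resolve(this.data);
    return new Promise((resolve) => this._waiting.push(resolve));
  }
}

// Drawer-style consumer: draw if ready, otherwise report false so the caller
// can retry on a later frame (it never waits).
function tryDraw(cache, draw) {
  if (!cache.loaded) return false;
  draw(cache.data);
  return true;
}
```

Plugins use the async path, while a drawer only ever uses the synchronous check, so rendering is never blocked by a pending conversion.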

Furthermore, tiles have a new originalCacheKey property that keeps the original cache data in a system-defined way in case you modify your data:

  • plugins can uniformly access original data
  • system can create tiles without server access relying on the original data presence
    if (!tile.loaded && !tile.loading) {
  • originalCacheKey != cacheKey only when the data contents are modified, not necessarily when the data type changes
    • no overhead: a single cache in most cases, and if you modify tile data, you usually want to keep the original data anyway
  • this convention will have to be stressed in the docs and guidelines, but it is the most straightforward, simplest way of keeping everything compatible

The downside is the missing support for async and await, which turns the implementation into a little bit of a promise hell. It would be more readable otherwise.

@Aiosa (Contributor Author) commented Sep 24, 2023

I also added a default QUnit timeout of 60 s. Debugging is a drag if a test does not time out: you cannot rerun it in a separate window, since the rerun link is shown only when a test fails.
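For reference, QUnit exposes a global default through `QUnit.config.testTimeout` (in milliseconds). Where exactly this PR sets it is not shown here, so the placement below is an assumption.

```javascript
// Default timeout (ms) for every QUnit test: a hanging async test fails after
// 60 s instead of blocking the run. Guarded so it is a no-op outside QUnit.
if (typeof QUnit !== "undefined") {
  QUnit.config.testTimeout = 60000;
}
```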

@iangilman (Member) left a comment

I'm back! Still not entirely sure how to approach this PR. I'll need to read through it again. Here are a few random comments, though.

@Aiosa What do you consider left to be done here? Are there open questions we should discuss?

We'll want a new documentation page on how this system works and how to use it. That would be helpful for me in reviewing the code as well.

Thank you for taking this project on!

* Unique identifier (unlike toString.call(x)) to be guessed
* from the data value
*
* @function uniqueType
Member:

The name here in the function documentation doesn't agree with the name of the actual function.

* @param {Object} eventArgs - Event-specific data.
* @return {OpenSeadragon.Promise|undefined} - Promise resolved upon the event completion.
*/
raiseEventAwaiting: function ( eventName, eventArgs ) {
Member:

This seems like an elegant solution to adding async events!

@@ -2886,6 +2886,30 @@ function OpenSeadragon( options ){
}
}

/**
* Promise proxy in OpenSeadragon, can be removed once IE11 support is dropped
Member:

IE11 support is already dropped as of #2300!

Contributor Author:

Yeah, that's an old comment from a piece of code I had implemented locally and forgot to remove. Since we don't have polyfills, I thought this might be a good error catcher if someone has an old version of some JS framework or browser. I can remove it...

Member:

Let's remove it. I suppose it could be amusing to have some sort of "you're on an old browser" detection, but if we add that I think it should be explicit.

Contributor Author:

Hmm I've found out why I left it there:

332:58 error Promise.resolve() is not supported in op_mini all compat/compat

Member:

So it's just Opera Mini? We don't explicitly support that browser, but I suppose if it's easy we might as well. It would be good to add an explicit comment that that's why this is here, then.

return data;
},
getCompletionCallback: function () {
$.console.error("[tile-loaded] getCompletionCallback is not supported: it is compulsory to handle the event with async functions if applicable.");
Member:

This isn't a deprecation; this is complete removal. Would it be possible to deprecate it (have it still work but produce an error) for a few releases before we remove it, or is that not going to work because of how things are changing?

I'm concerned about the new version of OSD breaking a bunch of plugins. Of course some of the popular plugins that would be affected (like https://github.com/usnistgov/OpenSeadragonFiltering) aren't in active development, so if we are removing this API eventually, we may need to fork them and fix them ourselves. Regardless, no doubt people's apps use this API and it would be nice to give them a grace period if possible.

Contributor Author:

I think this was impossible (in a nice way), since this handling was fundamentally different from how it works now. Since its behavior was also erroneous in certain use cases, I decided to remove it. I will look into it once again; maybe I will introduce a switch flag that enables the new feature, and later turn that flag on by default.

As for plugins, I wanted to fork the filtering plugin and fix it to move from all the old 'dirty' ways (e.g. context2D prop). Doing so would also allow me to analyze the new system performance.
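A possible shape for keeping the old callback API alive on top of the awaiting event, for the grace period discussed in this thread. `runTileLoadedHandler` and the `warn` parameter are hypothetical; the sketch only shows that the two styles can coexist: an async handler is awaited directly, while an old-style handler is awaited through the callbacks it requested.

```javascript
// Hypothetical compatibility shim, not the PR's implementation: keeps the
// deprecated getCompletionCallback() working on top of an awaiting event.
// Every callback handed out must be called before the event completes; a
// handler that never calls it keeps the tile pipeline waiting forever.
function runTileLoadedHandler(handler, eventArgs, warn = console.warn) {
  const pending = [];
  const args = Object.assign({}, eventArgs, {
    getCompletionCallback() {
      warn("[tile-loaded] getCompletionCallback is deprecated, return a Promise instead.");
      let done;
      pending.push(new Promise((resolve) => { done = resolve; }));
      return done;
    },
  });
  // New style: the handler's returned Promise (or undefined) is awaited.
  // Old style: all callbacks requested before that settles are awaited too.
  return Promise.resolve(handler(args)).then(() => Promise.all(pending));
}
```

It also reproduces the failure mode discussed for the Filtering plugin: if a handler requests a callback and never calls it, the returned Promise never settles.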

Member:

Cool. It would be nice to have such a flag for a release or two if possible, for people's own projects. For plugins I agree it would be great to fork the filtering plugin and bring it up to date if you're up for it!

Contributor Author (@Aiosa, Nov 19, 2023):

Ok, I remembered it wrongly: the support for it can be kept. I just did not like it, since the async way is much cleaner code and not buggy. But I added it back (locally for now). Also:

https://github.com/Aiosa/OpenSeadragonFiltering/blob/f1e350c08e8bf050fdc84b89e9b04ad041c861b9/openseadragon-filtering.js#L106

Even in this popular plugin it's used incorrectly. If you get a callback, you must call it, otherwise the pipeline never finishes. Granted, they probably reset the whole tiled image and want to prevent processing tiles that were loaded before the reset, but I am not sure what happens with the unfinished call: does it leak hanging pieces of memory and closures, or does it get dereferenced and freed? Since I am not sure, I guess nobody really is.

BTW, it also seems the zombie cache will be of much use here:
https://github.com/Aiosa/OpenSeadragonFiltering/blob/f1e350c08e8bf050fdc84b89e9b04ad041c861b9/openseadragon-filtering.js#L195

Member:

Great! And yeah, I agree it needs to be moved away from. I just want us to do it more gradually so people have a chance to update with warning :)

@Aiosa (Contributor Author) commented Nov 10, 2023

@Aiosa What do you consider left to be done here? Are there open questions we should discuss?

Well, the docs, trying out how it performs and behaves (e.g. by re-implementing the Filtering plugin), and possibly better cache handling/ordering using smarter data structures for cache expiration (instead of iterating over the whole cache array every time), and seeing how that performs (the cache system already implements a heap, so why not use it).

@iangilman (Member):

Cool, sounds like a good list!

@Aiosa (Contributor Author) commented Nov 17, 2023

And also write tests. It will still take some time to finish; here is a small demo of the zombie cache feature:
(animated GIF demo)

@Aiosa (Contributor Author) commented Nov 18, 2023

I am facing a (probably final) design issue I would like to resolve.

The problem is that, as of now, the conversion automatically throws away references to the old data in its cache. This might be problematic with user interventions and cache reuse. It all boils down to the tile-loaded event, which is currently inconsistent.

Suppose that we fetch an image. We use a canvas, so the tile cache data is rewritten with a context2d reference. When a new tile has to be loaded for which we find an existing cache record (or revive a zombie), we might find the tile ready for rendering. Cool.
However, the tile-loaded event is then fired with a different input type than when the data was loaded for the first time!
I think in OSD itself this is not an issue, but the user might

  • arrive at an unexpected type within the same tile event (not really an issue functionality-wise: if the API is used correctly, there are no errors, but:)
  • do unnecessary work/conversion

At worst, context2d data will be requested as an image, converted by the canvas API, and then the renderer will once more have to ask for a canvas -> context2d conversion. This also creates a tricky issue: the drawer imposed data type requirements on the cache, so the cache record type is context2d. If we finish the event with the canvas type (given an existing cache or zombie), we try to push this data to a cache record of a distinct type (if it already exists). As of now, such an attempt ignores the new data: it adds the tile to the cache record and uses the existing data instead (which stores context2d!). That means all the work done was pointless. Note that this should be consistent behavior, since the cache key is equal, meaning the data already stored is the correct data.

This brings me to another related problem: we have two types, context2d and canvas, and we might ping-pong conversions between them, as I, for example, usually think of these two interchangeably; it would be better to have just one type.
Extending on this, users might be tempted to do something like

...addHandler('tile-drawing', e => {
      e.tile.getData("image").then(...);
});

which would fire image <-> canvas conversions for each tile per frame. But that's probably just the cost of such flexibility. My point was that even e.tile.getData("canvas").then(...); would trigger conversions, which is not very intuitive (given we use the canvas drawer).

Finally, asynchronous type conversion within a synchronous event (tile-drawing) might be problematic. It means the drawer must ensure the data is available after the event as well (not yet implemented, since the WebGL PR will change the drawers) and delay rendering otherwise. I am not sure myself how it will behave, but type conversion + animation looks like a problematic point to me.

Solutions / Ideas

  • Intentionally drop support for the context2d type.
  • Caching on the data type level seems to me like unnecessary overhead.
  • Always keep the very first data type cached explicitly, run the tile data initialization from the beginning, and somehow solve cache write collisions differently (otherwise, the result of the whole execution is pointless).
  • Do not raise the event when the cache was already initialized (my choice). But the user might expect the event to be called for all tiles, since not all tiles sharing the data also share their position (though IMHO, if this is an issue, they should have different cache keys).
  • Checking that no type conversion is done in tile-drawing seems like a good idea, but there can be use cases where we really want to do this.


This overhaul does not introduce these issues; they have always been there. But being forced to do manual conversion made developers realize the consequences. A simple API might lead people to write super-slow rendering (tile-drawing + conversion). Ideas?

… called.

Fix bugs (zombie was disabled on item replace, fix zombie cache system by separating to its own cache array). Fix CacheRecord destructor & dijkstra. Deduce cache only from originalCacheKey. Force explicit type declaration with types on users.
… Return deprecated support for getCompletionCallback. Turn on zombie cache if sources replaced & equal.
if (newIndex !== -1) {
queueItem.options.index = newIndex;
}
_this.world.removeItem(queueItem.options.replaceItem);
if (!replaced._zombieCache && replaced.source.equals(queueItem.tileSource)) {
Contributor Author:

This is a nice feature: we allow implicit use of zombies when the sources are equal and the item is being replaced, so zombies are used automatically when suitable. The problem is the equals implementation across all sources. For now, I just check base URLs, but query params might vary a lot. Maybe try generating tile URLs and comparing those? But how do we know that a particular tile exists? We might not even know how to write the equals function itself...
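The "compare base URLs" approach could be sketched with the standard `URL` class: two sources count as equal when origin and path match, ignoring query strings and fragments. `sameSourceUrl` is a hypothetical helper, and it errs toward reuse: servers that select content through query parameters would wrongly be treated as equal.

```javascript
// Hypothetical sketch: compare two tile sources by the base URL of their data
// (origin + path), ignoring query strings and fragments, which often carry
// tokens or cache busters. Not a complete equality test for all source types.
function sameSourceUrl(a, b, base = "http://localhost/") {
  try {
    const ua = new URL(a, base);
    const ub = new URL(b, base);
    return ua.origin === ub.origin && ua.pathname === ub.pathname;
  } catch (e) {
    return false; // unparsable URL: treat as different, i.e. no zombie reuse
  }
}
```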

@iangilman (Member):

I don't think I fully understand the scenario you're describing, but I do have a couple of quick impressions:

  • What's the advantage to having both context2d and canvas as different types? It doesn't seem necessary to me.
  • Even if we still support tile-drawing events, nothing should be allowed to delay them. I don't see any problem with people doing asynchronous conversion in tile-drawing events; the result just won't be ready until some future draw. As always, the OSD philosophy is "you draw with the data you have, not the data you wish you had".

As for the bigger question, would it be solved by allowing the system to save intermediate steps? I agree that it probably shouldn't by default, but maybe that could be an option? Maybe by default you always go back to the original when modifying?

@Aiosa (Contributor Author) commented Nov 21, 2023

The advantage would be users being able to use various types - handiness. And it might as well introduce these re-conversion issues. I don't know which one is better. In the end, the same code gets called anyway (either automatically or manually), the difference is that conversion process is asynchronous.

you draw with the data you have, not the data you wish you had

Yes, after thinking about it, I think the best option is to change the tile API a bit so that it does not change the tile cache immediately, but only when the user explicitly says so. E.g. tile.getData() will return a copy of the data (or a reference if no conversion happens), and the tile cache type changes only once the user explicitly asks for it (~save). We might also require implementing a copy pattern, e.g. even same-type data access creates a copy by default (copy = !tile.loading) to avoid artifacts during rendering?

This will help, since the renderer will happily draw the old data until new data arrives. We might also force users to return a drawer-compatible type (or ensure this automatically), so that when the user finishes, the conversion to something drawer-compatible happens ASAP and does not hinder the animation.
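The copy-on-access idea can be sketched independently of OSD. `CopyOnAccessTile`, `dataForDrawing` and the `copy` flag are hypothetical names, and the pixels are a `Uint8ClampedArray` (as in `ImageData`) so that `slice()` gives a cheap independent copy.

```javascript
// Hypothetical sketch of "copy on access, commit on save". The drawer always
// renders the committed buffer, so half-finished edits never reach the screen.
class CopyOnAccessTile {
  constructor(pixels) {
    this._committed = pixels; // Uint8ClampedArray, like ImageData.data
  }

  // copy = true hands out a private working copy; copy = false returns the
  // committed buffer itself (for read-only use, no allocation).
  getData(copy = true) {
    return copy ? this._committed.slice() : this._committed;
  }

  // Explicit commit: from now on the drawer renders the new buffer.
  save(pixels) {
    this._committed = pixels;
  }

  // What a drawer would use on the next frame.
  dataForDrawing() {
    return this._committed;
  }
}
```

Until `save()` is called, every frame keeps drawing the old buffer, which is the "draw with the data you have" behavior from the previous comment.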

Caching intermediate steps on the data-type level would be a bit harder to implement. For now, I am trying to design support for the most basic advanced scenario: a tile will get a cache key and an original cache key (equal by default), and the first change to the cache data will ensure the original data is preserved. Then, if a tile is loaded into the system and a cache exists (zombie or reuse), depending on which cache it uses, it can

  • cache from cacheKey: does not call tile-loaded since we found cache data
  • cache from originalCacheKey: call event if these keys differ - meaning it stores some first version of data and can re-execute the pipeline.

When users decide not to keep the old data (now the default option) and overwrite the cache itself (one per tile), then it makes no sense to re-execute tile-loaded over shared cache, because the cache state will differ with each repeated call.

When users say 'keep the original data' when calling setData, new tiles (as of now, zombies erase this info explicitly, but they don't have to) might not know the changed cacheKey, so they would always re-execute the pipeline, which might be the desired behavior (depending on the design choices; right now it does not make sense, see below). Then

  • change cacheKey = "__mod__" + cacheKey so that the modified cache key is deterministic [current behavior]
    • new tiles will know which key to access to check whether modified data is present
    • downside: might interfere with custom changes to this process, but IMHO this is such a generic thing that everyone would want to do that we can just recommend not touching these keys and creating different custom cache objects for additional use cases
  • change cacheKey = <random> so that the modified cache is never shared and the tile-loaded event can be executed
    • otherwise, users would re-execute their pipeline with the originalCacheKey data and modify it
    • attempt to add a cache with the deterministic changed key "__mod__" + cacheKey to the viewer cache
    • collision: the cache already exists and holds data:
      • the result of the new pipeline overwrites the data for all tiles sharing the object, requires different approach than what discussed above
      • the result of the new pipeline is discarded [current behavior]
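The deterministic-key variant marked [current behavior], including the rule that a colliding result is discarded, can be sketched with a plain `Map`. `TileCache`, `addOriginal`, `setModified` and `hasModified` are hypothetical names; only `cacheKey`, `originalCacheKey` and the `__mod__` prefix come from the discussion.

```javascript
// Hypothetical sketch of the deterministic "__mod__" key convention.
// A colliding modified result is discarded: the tile just joins the record
// that already exists under the same key.
const MOD_PREFIX = "__mod__";

class TileCache {
  constructor() {
    this.records = new Map(); // key -> { data, tiles: Set }
  }

  // Adds the tile to the record under `key`, creating it with `data` if needed.
  // Returns true if a new record was created, false if an existing one was reused.
  _attach(key, tile, data) {
    let record = this.records.get(key);
    const created = record === undefined;
    if (created) {
      record = { data, tiles: new Set() };
      this.records.set(key, record);
    }
    record.tiles.add(tile);
    return created;
  }

  // First load: cacheKey and originalCacheKey are equal.
  addOriginal(tile, key, data) {
    tile.originalCacheKey = key;
    tile.cacheKey = key;
    this._attach(key, tile, data);
  }

  // A plugin modified the data. The tile stays registered with its original
  // record, so the original data is preserved. Returns false when the new
  // data was discarded because modified data already existed.
  setModified(tile, data) {
    const modKey = MOD_PREFIX + tile.originalCacheKey;
    tile.cacheKey = modKey;
    return this._attach(modKey, tile, data);
  }

  // Lets a freshly created tile skip tile-loaded if modified data exists.
  hasModified(originalKey) {
    return this.records.has(MOD_PREFIX + originalKey);
  }

  dataFor(tile) {
    return this.records.get(tile.cacheKey).data;
  }
}
```

Because each tile also stays registered with its original record, the original data survives the modification, which is what would let a new tile re-run the pipeline from `originalCacheKey` if that behavior is chosen.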

Thinking of Mandelbrot, a valid option could also be that the tile has no cache objects at all (undefined; a shader renders the tile in real time on the GPU and needs only the tile position, which is known to the drawer and should be known to the shader too).

This seems to me like workflow that works.

@iangilman (Member):

I'm still only partially following; this cache stuff is confusing to me! Anyway, what you've described sounds sensible.

Regarding the cache key naming, is that something that would be a convention and the developer would have to know what to do, or is it something that we enforce somehow? If it's a convention, what are the failure modes when they don't do it properly? How do we educate them to do it right?

The advantage would be that users can use various types, which is convenient. But it might also introduce these re-conversion issues. I don't know which one is better. In the end, the same code gets called anyway (either automatically or manually); the difference is that the conversion process is asynchronous.

We're talking about context2d vs. canvas, right? It's the same data type, just different parts. We should just use context2d, and if people need the canvas they can use context2d.canvas. Unless we're talking about the possibility of WebGL canvases?

@Aiosa
Contributor Author

Aiosa commented Nov 22, 2023

These cache naming rules would be a convention, but one that is handled internally: users have two cache key properties they can use. If they don't use them properly, the only consequence is that some tile might receive a different cache object than expected, and vice versa. From the system's point of view, it is as if they did not provide a properly unique cache key getter. So we educate them by exposing the cache API through the tile instance only. The only remaining problem is that they could accidentally use the same cache key that we derive automatically; the chance is very low, and we might as well use some URL-unfriendly characters to ensure this does not happen.

Thanks for the last point! Indeed, the canvas type is ambiguous, since its context defines its behavior. Then it's clear.

@iangilman
Member

Okay, sounds reasonable!

… in the rendering process, remove 'canvas' type support, many bugfixes and new tests.
…eaders test: did not finish because we don't call tile-loaded on cached tiles by default.
@Aiosa
Contributor Author

Aiosa commented Nov 27, 2023

I also have the new version of the filtering plugin. There are some improvements in the plugin thanks to the overhaul:

  • much cleaner and simpler data handling logic (compare with the original callback hell, especially in filter application)
  • better animation behavior with data-processing-heavy code in the tile-drawing event (though not by much: although the code is async, all the work is still done on the main thread; it helps avoid 'AnimationFrame handler took X ms' warnings)
  • uses the cache, so there is no ever-growing memory (unlike the older implementation)

The downside is that the first load takes slightly longer to display a fully loaded viewport, since async data processing means delayed execution, and animations (and other logic) get executed at points where we would prefer to do more data processing. One thing that could help here is to reduce the number of redraw requests (each main cache update has to request a redraw explicitly, since it is async). Refreshing content should be faster, since zombie tiles are used automatically when an equal tiledImage is replaced at the same index. It was working for me, but now I checked the console and there are still download jobs fired on refresh, so I guess my latest changes broke something 🗡️
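One way to reduce the number of redraw requests is to coalesce them: many async "this tile has new data" notifications collapse into a single redraw per scheduling tick. A minimal sketch, assuming a hypothetical RedrawScheduler wrapper around whatever redraw call the viewer exposes; the default scheduler is setTimeout(fn, 0) only so that the sketch runs outside a browser, and a real viewer would more likely use requestAnimationFrame.

```javascript
// Hypothetical sketch: RedrawScheduler is illustrative, not OpenSeadragon API.
class RedrawScheduler {
    constructor(redraw, schedule = (fn) => setTimeout(fn, 0)) {
        this.redraw = redraw;     // the actual "viewport needs update" call
        this.schedule = schedule; // e.g. requestAnimationFrame in a browser
        this.pending = false;
    }

    // Called by every tile whose async data became ready.
    request() {
        if (this.pending) {
            return; // a redraw is already queued; this request is absorbed
        }
        this.pending = true;
        this.schedule(() => {
            this.pending = false;
            this.redraw();
        });
    }
}
```

If fifty tiles finish within the same tick, this issues one redraw instead of fifty; a request made after that redraw runs queues a new one.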

And finally, I am still running into a chicken-and-egg problem with plugin data pipelining; I will have to think about it a bit more. I had to add getOriginalData, since I realized that with filtering I never want to access getData: I am interested in the unfiltered version when processing. That said, if some plugin before had modified the tile, I would ignore its output at that point, so I need to make sure plugins transparently access the outputs of their predecessors... One way of doing this is to internally manage a third cache: originalCacheKey always points to the original data, cacheKey is always the data to draw (but not actually used), and drawers would internally use a copy of the cacheKey object that they update once they decide to do so, and they would reset cacheKey = originalCacheKey. But this means more cache overhead and possible problems (users tend to like attaching custom properties to data objects; doing so on cacheKey would not work).

// reloaded, so user code should get re-executed. IMHO,
// with tile propagation the best would be throw away old tile
// and start anew
callTileLoadedWithCachedData: true
Contributor Author

Should a change of the headers either update the tile keys or clear the cache as well, since we don't update the cache key and thus potentially have an incorrect state? I think the only state-consistent way of doing this is to unload all the tiles this image owns and let the system pull them anew. Whether the headers are reflected in the key later is again up to the system (an overridden method for cache key generation?).
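If the headers were reflected in the key, a header change would produce new keys instead of silently reusing stale data. A minimal sketch, assuming a hypothetical tileCacheKey helper (not an OpenSeadragon method) that normalizes header names and order so equivalent header sets map to the same key.

```javascript
// Hypothetical sketch: tileCacheKey is illustrative, not OpenSeadragon API.
function tileCacheKey(url, headers = {}) {
    // Normalize: lowercase names and sort them, so insertion order and
    // header-name casing do not produce different keys.
    const entries = Object.entries(headers)
        .map(([name, value]) => [name.toLowerCase(), String(value)])
        .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
    if (entries.length === 0) {
        return url; // no headers: the key stays the plain URL
    }
    return url + "+" + JSON.stringify(entries);
}
```

The trade-off is that a header change then invalidates nothing explicitly: old entries simply stop being requested and age out of the cache, while unloading the image's tiles (the option proposed above) frees them immediately.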

@iangilman
Member

Great that you're updating the filtering plugin! This will be good to have, but also it's a good test case.

When you say it takes a little longer at startup, how much longer are we talking about?

It does seem like when a plugin asks for the version of the data prior to their changes, it should include the changes from the previous plugins. That said, how do we determine what the plugin order is? And of course it's not just plugins... It could be the app itself. It does seem like some way to "stack" effects in a stable manner would be good.

@Aiosa
Contributor Author

Aiosa commented Nov 27, 2023

When you say it takes a little longer at startup, how much longer are we talking about?

With tiles cached in the browser, the synchronous loading happens instantly. With the async processing you can actually see the tiles jumping from low to high resolution for a few hundred milliseconds. But that might also be because each tile has to say 'I got new data, the viewport needs a refresh', since it eventually gets ready in an async context. Meanwhile, animation frames fire and each time it probably finds out we need a viewport update, instead of waiting a bit to process more tiles at once. So I hope being smarter about when we say 'viewport needs update' will help. Or playing more with how greedy the drawers are (having them render async too, to counterbalance the penalization). I need the WebGL PR merged to test this properly, since it modifies the drawer structure.

It does seem like when a plugin asks for the version of the data prior to their changes, it should include the changes from the previous plugins. That said, how do we determine what the plugin order is? And of course it's not just plugins... It could be the app itself. It does seem like some way to "stack" effects in a stable manner would be good.

The tile-loaded event should be straightforward. If we also make the tile-drawing event awaiting, we can let handlers be async if they need to, and their order is defined via the addHandler arguments. Right now the drawing event does not wait, which might be problematic.
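An "awaiting" event could look roughly like this: raising it resolves only after every handler has finished, and handlers run one after another in a defined order. A minimal sketch, assuming a hypothetical AwaitingEventSource class (not OpenSeadragon's EventSource) where order is given by a numeric priority argument, higher first, with ties kept in registration order.

```javascript
// Hypothetical sketch: AwaitingEventSource is illustrative, not OpenSeadragon API.
class AwaitingEventSource {
    constructor() {
        this.handlers = new Map(); // eventName -> [{ handler, priority, order }]
        this.counter = 0;
    }

    addHandler(eventName, handler, priority = 0) {
        const list = this.handlers.get(eventName) || [];
        list.push({ handler, priority, order: this.counter++ });
        // Higher priority first; equal priorities keep registration order.
        list.sort((a, b) => b.priority - a.priority || a.order - b.order);
        this.handlers.set(eventName, list);
    }

    // Awaits each handler (sync or async) before starting the next one.
    async raiseEvent(eventName, event) {
        for (const { handler } of this.handlers.get(eventName) || []) {
            await handler(event);
        }
    }
}
```

Sequential awaiting is what makes the order meaningful for data pipelining: the next handler sees the data only after the previous one has finished writing it.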

I was talking mainly about the tile cache access. Even if the plugin order is given (which is not possible for async/callback based plugin interaction at all in current version, see the caman usage in the Filtering plugin), I still need to resolve how to pipeline the data correctly.

Let's say plugin A does a basic thing: it gets the data (makes a copy) and stores the modified data (replaces the old data). This creates two caches: one for drawing and one with the original data saved. We have two methods: getData and getOriginalData. The original data is created only the first time; subsequent calls to setData overwrite just the rendering item.

Now, if plugin B does exactly the same and uses getData, it will correctly work with the modified data obtained in any previous step, regardless of the data type. But if some plugin wants to access the original data (e.g. the Filtering plugin notices the filters changed and re-executes them), it ignores the output of any plugin that came before, since only the first plugin to modify the data stores a non-modified copy as the original. So we kind of need three caches (original, data from the previous step, current data; just like swapping works), and the middle one can be deleted after finishing. Or we simply change the behavior of getData to access the original data in the first step (events would somehow have to reset tiles before firing) and the modified instances later.
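The three-cache idea can be sketched as a pipeline that never touches the original, gives each plugin a private copy of the previous step's output, and keeps only the final result as the data to draw. A re-filter then re-runs the whole chain from the original, so earlier plugins' output is not lost. This is a minimal sketch under the assumption that plugins are async functions receiving data and either returning new data or mutating their copy in place; runPipeline and that signature are hypothetical, not the OpenSeadragon API. Arrays stand in for pixel data.

```javascript
// Hypothetical sketch: runPipeline and the plugin signature are illustrative.
async function runPipeline(original, plugins, copy = (d) => d.slice()) {
    let previous = original;              // step N-1 output, treated as read-only
    for (const plugin of plugins) {
        const working = copy(previous);   // step N works on its own copy
        // A plugin may return new data or mutate `working` and return nothing.
        previous = (await plugin(working)) ?? working;
        // Here the intermediate buffer from step N-1 could be released.
    }
    return previous;                      // becomes the render target
}
```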

Edit: I noticed the HTML drawer does not fire tile-drawing at all, and the WebGL drawer will probably have poor performance with event-related animation adjustments, so I need the drawers refactored here as well to see how it behaves and which drawers support which events...

@Aiosa
Contributor Author

Aiosa commented Feb 5, 2024

The code should be merged. There are some minor issues and the tests still do not fully pass: sometimes there are a few weird errors, and it seems CI got stuck too. There are also quite a few TODOs that need to be resolved (particularly around the new async behavior, to fine-tune animation), but from my point of view this is almost ready. @pearcetm could you verify this works on your end?

Btw: I also started cleaning up code where I made more changes (use class, replace var with const and let). One of these cleanups should also be moving files into some hierarchy, as the number of source files keeps increasing. I decided to leave this until this PR is almost ready, to prevent merge issues with any other PRs that might come, since there are quite a few changes. The proposed structure is:

utils/
  - matrix3.js
  - priorityqueue.js
  - spring.js
  - strings.js
  - point.js
  - rectangle.js
  - profiler.js
  - placement.js (more like an enum)
  - displayrectangle.js
  - fullscreen.js
sources/
  ...*source.js
drawers/
  ...*drawer.js

... the rest

@iangilman
Member

Reorganizing the files sounds good, and that looks like a good organization. We should do that in a separate PR after this, to avoid confusion in this one.

@pearcetm
Contributor

pearcetm commented Feb 8, 2024

@Aiosa I see you added a filtering plugin demo page. Awesome! It seems currently broken, to me at least. Is that what you're seeing too?

@Aiosa
Contributor Author

Aiosa commented Feb 8, 2024

For me it works with both drawers. The only issues are that

  • if you apply a filter, you have to move the viewport; the redraw / refresh does not propagate properly
  • the async application of filters makes the renderer blink black frames, since sometimes the data to render is not ready

I did not have time to fix this yet, but I was happy it is already operable - regardless of what drawer you use.

@Aiosa
Contributor Author

Aiosa commented Feb 8, 2024

Added a generic drawer type switcher implementation for the demos; we should add it to most demos to allow flexible manual testing.

@Aiosa
Contributor Author

Aiosa commented Feb 10, 2024

I have a question: how is imageFormatSupported (and related functionality) used, and what is its purpose? E.g. discussed here: #2453. The support for image formats is currently given directly by the renderers (what canvas / WebGL can consume), and in the future will be given by which format type converters are programmed. To me it seems like an outdated feature, yet I struggle to come up with a good replacement design. Remove it completely?

I mean even now with OSDv4 I render zip archives thanks to the 'advanced data pipeline'. This PR makes it possible to render virtually anything. Thoughts?

EDIT: I just realized there is quite an issue with the current design: WebGL creates a texture on the 'original data reference', which never gets freed. So I am now trying a design where drawers by default keep their own internal simplified cache inside cache items for drawing...
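The design where drawers keep their own data inside cache items can be sketched as a cache record that lazily stores per-drawer derived data (for example a WebGL texture) next to the original data, together with a callback that frees it when the record is destroyed. CacheRecord, getDrawerData and the free callback are hypothetical names for illustration; in a real WebGL drawer the free callback would be something like gl.deleteTexture.

```javascript
// Hypothetical sketch: CacheRecord and getDrawerData are illustrative only.
class CacheRecord {
    constructor(data) {
        this.data = data;
        this.drawerData = new Map(); // drawerId -> { value, free }
    }

    // Lazily create the drawer's own representation and remember how to free it.
    getDrawerData(drawerId, create, free) {
        let entry = this.drawerData.get(drawerId);
        if (!entry) {
            entry = { value: create(this.data), free };
            this.drawerData.set(drawerId, entry);
        }
        return entry.value;
    }

    // Freeing the record frees every drawer's derived data with it.
    destroy() {
        for (const { value, free } of this.drawerData.values()) {
            free(value);
        }
        this.drawerData.clear();
        this.data = null;
    }
}
```

Keying by drawer id also fits the multi-context case: each WebGL context would get its own texture, since textures cannot be shared between contexts.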

@iangilman
Member

It looks like imageFormatSupported dates back to at least 2012. I'm not sure exactly why it was added.

The thing I like about it is the idea that it helps with cross-browser compatibility. The formats that we say we support are the ones we know are supported on all of the browsers OSD is expected to run on. For instance, it's not uncommon for people to want to load .tiff files directly into OSD, but only Safari supports them directly (at least according to https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Image_types). This means if they tested only on Safari (for some kooky reason) they would end up thinking that it works.

With the cache overhaul, it seems like maybe this feature is even more important, since we are now going to be supporting a lot more possible formats. I imagine it should tell the developer whether the format they are using is either natively supported by browsers or there is some sort of converter installed for it.

@Aiosa
Contributor Author

Aiosa commented Mar 3, 2024

Okay, I managed to get some free time and fix a few things. I still have to measure the performance, but otherwise it should work. There is no way to say 'we support format XY' with the convertor class, so developers need to

  • program the type conversion
  • add the supported format declaration

To me this looks confusing. Format != data type, much like window != viewport coordinates; people will confuse them all the time. I would like to unify this somehow; in the end, this function is used in just two sources (dzi and iiif): only the second one really uses it, dzi just throws an error. The rest of the sources just ignore it. Moreover, a format declaration cannot be fully trusted anyway. IMHO it would be better to just check that the given data can be converted, and if the conversion fails, print an error that lists several reasons why this could be the case. If you test only on Safari, you still won't see any errors, whether we check formats or not. If you forget to add some format that is supposed to work, you get false alarms. So format support could be just an IIIF class thing...

We could just add an event that fires when a conversion that was supposed to work fails, and users can do whatever they want with it. The current implementation of imageFormatSupported is an inconsistently used, error-prone feature.
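Instead of declaring formats up front, the conversion can simply be attempted and, on failure, a report listing likely causes is handed to a listener. A minimal sketch, assuming a hypothetical convertWithDiagnostics wrapper and report shape (neither is OpenSeadragon API); the onFailure callback stands in for the proposed event, and convert is whatever conversion routine is registered.

```javascript
// Hypothetical sketch: convertWithDiagnostics and the report fields are illustrative.
async function convertWithDiagnostics(convert, data, fromType, toType, onFailure) {
    try {
        return await convert(data, fromType, toType);
    } catch (error) {
        // Report instead of pre-checking formats; listeners decide what to do.
        onFailure({
            fromType,
            toType,
            error,
            possibleCauses: [
                "the browser cannot decode this image format (e.g. TIFF outside Safari)",
                `no converter is registered for ${fromType} -> ${toType}`,
                "the server returned an error page or corrupt data",
            ],
        });
        return null;
    }
}
```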

@pearcetm
Contributor

pearcetm commented Mar 4, 2024

@Aiosa I haven't had time to dive into this extensively or do much testing, but I hope to do so over the next couple of weeks. In your mind is the filtering demo the best place to start? Have you been able to address the issues you noted above?

  • if you apply filter, you have to move viewport, the redraw / refresh does not properly propagate
  • the async application of filters makes renderer blink black frames since sometimes the data to render is not ready

@Aiosa
Contributor Author

Aiosa commented Mar 4, 2024

I managed to mostly fix these issues; however, I discovered a few more.

I tried to implement sharing of the cache between multiple viewers. I did not succeed. Not only can you sometimes not afford to share the cache completely (WebGL cannot share loaded textures between contexts), which I was able to solve more or less, but the overall tile fetching strategy is that a tile record is created once its data is downloaded, which is too late: two jobs per tile have already been requested at that point. So I realized it might be a much better idea to work with TiledImage positioning within one OSD instance. That makes plugins (e.g. annotations) harder to adjust to this separation, though.
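The "too late" problem comes from creating the record only after the download. Registering the job under its cache key at request time lets a second viewer await the pending job instead of starting another one. A minimal sketch, assuming a hypothetical SharedJobQueue (not OpenSeadragon API) and an async fetchTile function supplied by the caller.

```javascript
// Hypothetical sketch: SharedJobQueue is illustrative, not OpenSeadragon API.
class SharedJobQueue {
    constructor(fetchTile) {
        this.fetchTile = fetchTile;
        this.inFlight = new Map(); // cacheKey -> pending Promise
    }

    request(cacheKey) {
        let job = this.inFlight.get(cacheKey);
        if (!job) {
            // Register immediately, before any data arrives.
            job = Promise.resolve()
                .then(() => this.fetchTile(cacheKey))
                // Once settled, a real cache would hold the result; here the
                // entry is simply dropped so later requests fetch again.
                .finally(() => this.inFlight.delete(cacheKey));
            this.inFlight.set(cacheKey, job);
        }
        return job;
    }
}
```

This deduplicates downloads only; the WebGL texture problem still requires per-context data on top of the shared download.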

Concerning the new cache API, there are still easy-to-misuse scenarios; for example, my implementation of the filtering plugin was doing this:

        if (processors.length === 0) {
            //restore the original data
            const context = await tile.getOriginalData('context2d', false);
            tile.setData(context, 'context2d');
            tile._filterIncrement = self.filterIncrement;
            return;
        }

        const contextCopy = await tile.getOriginalData('context2d', true);
        const currentIncrement = self.filterIncrement;
        for (let i = 0; i < processors.length; i++) {
            if (self.filterIncrement !== currentIncrement) {
                break;
            }
            await processors[i](contextCopy);
        }

and I did not realize that my data restoration was in fact sharing the data item between two caches: when one cache was freed, so was the other, resulting in a black screen. This happens only when you add some filters, remove all of them, and then add new filters again. But forcing users to copy on read will have its cost too.
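One alternative to forcing copy on read is to count how many cache records use each data object and destroy it only when the last one lets go. A minimal sketch, assuming a hypothetical DataRegistry; the destroyed flag stands in for freeing a canvas or texture. Without the count, releasing the "restored" cache would destroy the object the original cache still points to, which is the black-screen case described above.

```javascript
// Hypothetical sketch: DataRegistry and the `destroyed` flag are illustrative.
class DataRegistry {
    constructor() {
        this.refs = new Map(); // data object -> number of cache records using it
    }

    attach(data) {
        this.refs.set(data, (this.refs.get(data) || 0) + 1);
        return data;
    }

    // Returns true only if this release actually destroyed the data.
    release(data) {
        const count = (this.refs.get(data) || 0) - 1;
        if (count > 0) {
            this.refs.set(data, count);
            return false; // still used by another cache record
        }
        this.refs.delete(data);
        data.destroyed = true; // stands in for freeing a canvas or texture
        return true;
    }
}
```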

Then there is another issue: when the user triggers tile invalidation too often (e.g. while dragging a range input), my plugin implementation sometimes does not update all tiles to the latest state.
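A per-tile generation counter, similar in spirit to the filterIncrement check in the snippet above, guarantees that only the result of the latest request is committed, whatever order the async results arrive in. A minimal sketch, assuming a hypothetical LatestOnly helper (one instance per tile) and a commit callback that writes the tile data.

```javascript
// Hypothetical sketch: LatestOnly and the commit callback are illustrative.
class LatestOnly {
    constructor() {
        this.generation = 0;
    }

    // Runs an async task; commits its result only if no newer run started meanwhile.
    async run(task, commit) {
        const mine = ++this.generation;
        const result = await task();
        if (mine !== this.generation) {
            return false; // stale: a newer request superseded this one
        }
        commit(result);
        return true;
    }
}
```

Because the newest run is never superseded, every tile eventually ends in the latest state, even when older, slower results finish last.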

It will still take quite some work to get this right.

@Aiosa
Contributor Author

Aiosa commented Mar 4, 2024

Okay, I thought it was fixed, but the WebGL drawer still sometimes shows a black screen on filter adjustments. Ten times already I thought it worked, then I cleaned up the console.log calls and it stopped working. I tried to put redraw requests everywhere I could think of, but it still sometimes does not redraw the viewport properly... We'll see.

Edit: finally fixed. A stupid typo.

@iangilman
Member

This is gnarly stuff! Thank you for working through all of it and keeping us updated ❤️

@Aiosa
Contributor Author

Aiosa commented Mar 7, 2024

@iangilman I would like your opinion on imageFormatSupported. I can see two ways:

  • moving it to the IIIF tile source, which is the only implementation seriously using it
    • and just implementing warnings on conversion failure that list possible causes
  • properly integrating it within the tile source base class
    • this is very hard for me to imagine right now: how do we check formats in a generic way? We no longer have to download just images; it can be vector data, byte streams / archives, text...

To me it seems it tries to solve the single issue where an image type cannot be supported due to its format, and also to give IIIF the ability to specify a desired image format. The former can just check the image conversion routine and alert on error, maybe even raise an event...

convertor.learn("url", "image",  (tile, url) => new $.Promise((resolve, reject) => {
    const img = new Image();
    img.onerror = img.onabort = reject.bind(null, 'Check image format bla bla bla etc');
    img.onload = () => resolve(img);
    img.src = url;
}), ...);

The latter would better be in the IIIF class anyway, since it is a specific protocol feature.

@iangilman
Member

Yeah, it may be time to let go of imageFormatSupported, or just push it down into the components that really need it.

That said, with the new data conversion infrastructure, aren't we expanding our knowledge of what is supported? For instance, if we're loading vectors or bytestreams, either we have modules installed that can convert those into pixels to be drawn to the screen or we don't, right? So it's not like the existence of those new formats makes it harder for us to know what can be loaded, right?

Anyway, as far as I can tell, the only advantage imageFormatSupported gives us is helping to inform the developer that they're using an image format that only some browsers support. Maybe it's fine to leave that to them.

When things fail on the current browser, though, providing meaningful error messages would be good!

@Aiosa
Contributor Author

Aiosa commented May 3, 2024

I am sorry I have not made much progress here, but I have been very busy lately. I will try to return to this in a few weeks when things calm down a bit :)

@iangilman
Member

@Aiosa Thank you for the update!

@Aiosa
Contributor Author

Aiosa commented Jun 2, 2024

Merged the recent changes. I am still unable to get things working smoothly. I replaced OpenSeadragon.Promise with a synchronous proxy and the issues remained... I have some local testing changes that did not work, and so I came to realize the issue is with the approach rather than the implementation:

Since I was building an async pipeline, I somehow assumed that most calls should just 'work' and execute as soon as possible, with the only constraint being to keep the cache state consistent. But cache consistency != screen consistency. Instead, we should restrict the execution flow as much as possible to allow only meaningful routes:

  • if a cache is being used, restrict any access to it until finished
    • this is a bit tricky, since we need a good mechanism that still lets users do what they really need
  • do not store two, but three caches: original data, render target (main cache), and working cache ('swap buffer')
    • we might want to hide the working cache within the main cache, since it must have the very same set of tiles connected; this is tricky to keep consistent otherwise
  • collect all changes and, when ready, swap caches to allow seamless changes
    • what about tiles that arrive during processing? Keep them from being rendered until processing finishes, so that 'new' data is not rendered on the 'old' canvas: we wait for all tiles to finish updating to the new state
    • have a 'swap queue' to let users queue up custom changes, or restrict any custom usage except for the defined routes
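The swap step from the list above can be sketched as a coordinator: new data for each tile is staged in a working slot, and every tile's render data is swapped in the same step once all tiles of the update have finished, so the screen never mixes old and new states. SwapCoordinator, its methods and the renderData field are hypothetical names, not OpenSeadragon API.

```javascript
// Hypothetical sketch: SwapCoordinator and renderData are illustrative only.
class SwapCoordinator {
    constructor() {
        this.pending = new Set(); // tiles whose new data is not ready yet
        this.staged = new Map();  // tile -> new data waiting for the swap
    }

    // While true, drawers keep rendering old data and hold back newly arrived tiles.
    get updating() {
        return this.pending.size > 0;
    }

    beginUpdate(tiles) {
        this.pending = new Set(tiles);
        this.staged = new Map();
    }

    // Returns true if this tile completed the update and the swap happened.
    finishTile(tile, data) {
        if (!this.pending.has(tile)) {
            return false; // not part of the current update (stale or unknown)
        }
        this.pending.delete(tile);
        this.staged.set(tile, data);
        if (this.pending.size > 0) {
            return false; // keep drawing the old render data for now
        }
        for (const [t, d] of this.staged) {
            t.renderData = d; // all tiles switch in the same step
        }
        this.staged = new Map();
        return true;
    }
}
```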

And we can support both async and sync mechanisms simply by switching between window.Promise and the promise stub. This is pretty neat. Any comments on the above?
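Switching implementations works because code written only against then() does not care whether callbacks run immediately or later. A minimal sketch, assuming a hypothetical SyncPromise stub (not OpenSeadragon's $.Promise) that runs callbacks synchronously; it is deliberately incomplete (no rejection handling, and it only supports executors that resolve synchronously).

```javascript
// Hypothetical sketch: SyncPromise is an illustrative, incomplete stub.
class SyncPromise {
    constructor(executor) {
        const resolve = (v) => {
            // Adopt another SyncPromise's value so chains flatten like real promises.
            this.value = v instanceof SyncPromise ? v.value : v;
        };
        executor(resolve);
    }

    // Runs the callback right away instead of in a microtask.
    then(onFulfilled) {
        return new SyncPromise((resolve) => resolve(onFulfilled(this.value)));
    }

    static resolve(v) {
        return new SyncPromise((resolve) => resolve(v));
    }
}

// The same pipeline code runs with either implementation.
function loadAndProcess(PromiseImpl, data) {
    return PromiseImpl.resolve(data)
        .then((d) => d * 2)
        .then((d) => d + 1);
}
```

With SyncPromise the result of loadAndProcess is available before the call returns; with the native Promise the same chain completes asynchronously.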
