Homepage still rough #245

Open
bedeho opened this issue Nov 22, 2023 · 3 comments

@bedeho
Member

bedeho commented Nov 22, 2023

Background

There are two problems

1. Channel Resampling

The list of featured channels is quite long: https://www.notion.so/joystream/Setting-channel-weight-by-YPP-7a6329a186b94893b330ccaf10065a79

Even so, the homepage still has a serious problem: it samples the same channel multiple times.

[Screenshot, 2023-11-22 at 18:11:23]

The same channels are repeated three, four, or more times on the first screen, and the repetition continues as you scroll. This gives the impression that very few distinct creators are present, and it also greatly increases the variance of the home screen.

2. Language

Many videos are in languages the viewer cannot understand, and this should be relatively easy to detect. Some languages can be shown to all users, while others, say Russian, Spanish, or Arabic, should be shown only to specific audiences.

Proposal

Update the home screen display algorithm so that it never resamples the same channel, as a strict rule, even as one scrolls, and so that it has some representation of which languages are appropriate to display to users from different geos. The language code or geo of the user can be provided as an argument to the home page endpoint; that way it is also easy for curators and others to inspect how the page looks for different geos. All of this is premised on the idea that there actually is a language identifier on channels, both those from YouTube sync and those uploaded in Atlas (is this true?).
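To make the language part of the proposal concrete, here is a minimal sketch in Python of how a language argument could narrow the candidate videos. Everything in it is an assumption for illustration: the `Video` shape, the `UNIVERSAL_LANGUAGES` set (English is only a placeholder), and the `include_unknown` switch for videos that carry no language identifier. It is not how the endpoint currently works.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: languages shown to every audience. The real list is a product decision.
UNIVERSAL_LANGUAGES = {"en"}


@dataclass(frozen=True)
class Video:
    id: str
    channel_id: str
    language: Optional[str]  # language code, or None when no identifier is available


def filter_for_viewer(
    videos: List[Video], viewer_language: str, include_unknown: bool = True
) -> List[Video]:
    """Keep videos in a universal language or in the viewer's own language.

    Videos without a language identifier are kept only if include_unknown is True.
    """
    allowed = UNIVERSAL_LANGUAGES | {viewer_language}
    kept = []
    for video in videos:
        if video.language is None:
            if include_unknown:
                kept.append(video)
        elif video.language in allowed:
            kept.append(video)
    return kept
```

Passing the language explicitly, rather than deriving it inside the function, mirrors the idea that curators should be able to request the page for any audience.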

Please describe specification before going to implementation stage.

@WRadoslaw
Contributor

Channel sampling

I've tried to come up with a custom query that would allow us to do such filtering, but the main problem I encountered is performance. To implement this logic, we need at least a group by channel, with videos joined, to determine the top video for each creator; based on that result we can fetch videos ordered by video relevance. This can be done in a single query, but that doesn't change much performance-wise.
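The logic described above (group by channel, keep each creator's top video, then order by relevance) can be sketched in plain Python. This only illustrates the shape of the computation, not the actual query; the `Video` tuple and its field names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Video(NamedTuple):
    id: str
    channel_id: str
    relevance: float


def top_video_per_channel(videos: List[Video], limit: int) -> List[Video]:
    """Return at most `limit` videos, one per channel, ordered by relevance."""
    best: Dict[str, Video] = {}
    for video in videos:
        current = best.get(video.channel_id)
        # Keep only the most relevant video seen so far for each channel.
        if current is None or video.relevance > current.relevance:
            best[video.channel_id] = video
    ranked = sorted(best.values(), key=lambda v: v.relevance, reverse=True)
    return ranked[:limit]
```

Note that the grouping step has to look at every candidate video before the first result can be returned, which is the cost being discussed here.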

I think a materialized view might improve performance: it would take care of grouping by channel and picking the top video, so we could just read from it. The downsides are that the data won't be as fresh (the homepage will update only after the view is refreshed) and that the materialized view will use some memory (not sure how significant that would be, @zeeshanakram3).

@zeeshanakram3
Contributor

Yeah, I don't think materialized views are the right approach here: they are not trivial to set up, and we would also need triggers to refresh them periodically.

An alternative approach is to calculate the relevance score such that the top n videos queried by the video_relevance field belong to n distinct channels. That way you won't have to group videos by channel ID and then find the top video. In short, the background write operation (which periodically calculates the scores) would become a bit more costly, while the read query cost would remain the same.
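One possible way to get that property, sketched in Python under stated assumptions: demote each channel's second, third, and later videos by a fixed step that is wider than the whole range of base scores. The function name, the input shape, and the tiering scheme itself are illustrative choices, not the existing scoring code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def spread_relevance(base_scores: Dict[str, Tuple[str, float]]) -> Dict[str, float]:
    """Map video id -> adjusted relevance.

    `base_scores` maps video id -> (channel id, base relevance).
    Every channel's best video keeps its base score; its k-th best video
    is pushed down by k tier widths.
    """
    if not base_scores:
        return {}
    values = [score for _, score in base_scores.values()]
    # Wider than the full score range, so one tier step outweighs any base difference.
    tier_width = (max(values) - min(values)) + 1.0

    by_channel: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for video_id, (channel_id, score) in base_scores.items():
        by_channel[channel_id].append((score, video_id))

    adjusted: Dict[str, float] = {}
    for entries in by_channel.values():
        entries.sort(reverse=True)  # best video of the channel first
        for tier, (score, video_id) in enumerate(entries):
            adjusted[video_id] = score - tier * tier_width
    return adjusted
```

With this scheme, a plain "order by adjusted relevance, descending" read returns distinct channels for the first n results, where n is the number of channels with at least one video. Beyond that point, second videos of channels start to appear.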

@bedeho
Member Author

bedeho commented Nov 24, 2023

video_relevance field

Does such a field exist in each channel?


Overall I'm pretty sure that the right approach here, in terms of simplicity, performance, and the fact that we are not personalizing the home screen in this step, is to just do a routine pre-computation (e.g. every 5-10 mins) which effectively computes an implicit top 100 videos and stores the result in some very simple way.

That computation can itself be quite slow and have lots of contingencies, such as avoiding resampling a channel, but later querying to fetch these 100 videos will be very fast. If we then want language sensitivity, you would just compute N such distinct lists of K=100 videos, where N is the number of language categories you want.

So one super simple way to put this together is to add one new table, called HomeFeedRankedVideos, where each row represents the home feed rank (0 and up) of a given video for a given language category. This home feed rank could incorporate things like the channel weight we introduced previously, and now also ensure that multiple videos from the same channel are not included. For a given language category, every X mins, a new set of top K videos is computed; all existing rows in the table for this language are then flushed, and the new K videos are inserted. Notice that at any given time, the vast majority of videos will have no representation in this table, as they will not be in the top K of any language. When someone queries the home page, they just provide the language category they are interested in, and the backend returns the results, in ranked order, by querying this table.
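A minimal sketch of the flush-and-insert cycle, using Python's built-in `sqlite3` module purely for illustration. The table is written as `home_feed_ranked_videos` (a snake_case rendering of HomeFeedRankedVideos), and the column names and both helper functions are assumptions; how the ranked list of video IDs is produced is left out.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS home_feed_ranked_videos (
    language_category TEXT NOT NULL,
    feed_rank INTEGER NOT NULL,
    video_id TEXT NOT NULL,
    PRIMARY KEY (language_category, feed_rank)
)
"""


def refresh_feed(
    conn: sqlite3.Connection, language_category: str, ranked_video_ids: List[str]
) -> None:
    """Replace the stored feed for one language category with a new ranked list."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM home_feed_ranked_videos WHERE language_category = ?",
            (language_category,),
        )
        conn.executemany(
            "INSERT INTO home_feed_ranked_videos "
            "(language_category, feed_rank, video_id) VALUES (?, ?, ?)",
            [
                (language_category, rank, video_id)
                for rank, video_id in enumerate(ranked_video_ids)
            ],
        )


def read_feed(
    conn: sqlite3.Connection, language_category: str, limit: int = 100
) -> List[str]:
    """Return the stored video ids for a language category, best rank first."""
    rows = conn.execute(
        "SELECT video_id FROM home_feed_ranked_videos "
        "WHERE language_category = ? ORDER BY feed_rank LIMIT ?",
        (language_category, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Doing the delete and the inserts in one transaction means a reader should see either the old list or the new one for a language, never an empty or half-filled feed. Refreshing one language category leaves the rows of the others untouched.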

Disclaimer:

  • I'm not saying we have to or should do it this way, just saying this is one very simple way to do it.
  • It may be that even this is sufficiently hard (depending on time involved), that just biting the bullet on recommendations and full personalisation may make sense.
