Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Convert private WebUI strings to i18next #20610

Draft
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

sledgehammer999
Copy link
Member

This draft PR is meant for collaboration for the i18next transition of the WebUI private part. The public part was done with PR #20520

You can use my commits into your own work without any kind of attribution.
Each commit provides the regex I used for each converstion from QBT_TR(string)QBT_TR to the i18next way.

There are .html files with nested elements that the regex hasn't converted. I think they need manucal conversion. About 124 instances.
Almost the same issue with the 3 remaining instances in the .js files.

@Chocobo1
For translating HTML attributes we need to do some like the following. For example:

<textarea placeholder="Format: IPv4:port / [IPv6]:port"></textarea>
<textarea class="qbt-translatable" data-i18n="[placeholder]Format: IPv4:port / [IPv6]:port" placeholder="Format: IPv4:port / [IPv6]:port"></textarea>

Aka we set the data-i18n attribute's value to [attribute-name]string. And if we have multiple attributes to translate for the same HTML element we separate it with ; eg [attribute-name1]string1;[attribute-name2]string2
However, I haven't tested if i18next-parser can correctly break out the different strings.
Source: https://stackoverflow.com/a/22822236/10260481

@ OTHER contributors
If you have time please take look at the diffs and try to verify the conversion.
Example:
This HTML

<p class="random">QBT_TR(Some text)QBT_TR</P>

becomes this:

<p class="random qbt-translatable" data-i18n="Some text">Some text</P>

and this Javascript:

let a = 'QBT_TR(Some text)QBT_TR';

becomes this:

let a = i18next.t('Some text');

Search: (<((?!class=).)*?)>QBT_TR\((.+?)\)QBT_TR(\[CONTEXT=.+?\])?
Replace: $1 class="qbt-translatable" data-i18n="$3">$3
File glob: src/webui/www/private/**/*.html
Search: (<.+?class=".+?)(".*?)>QBT_TR\((.+?)\)QBT_TR(\[CONTEXT=.+?\])?
Replace: $1 qbt-translatable$2 data-i18n="$3">$3
File glob: src/webui/www/private/**/*.html
Search: (<(?:(?!class=).)+?)(\S+?)="QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?"
Replace: $1class="qbt-translatable" data-i18n="[$2]$3" $2="$3"
File glob: src/webui/www/private/**/*.html
Search: (<.+?class=".+?)(".*?)(\S+?)="QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?"
Replace: $1 qbt-translatable$2data-i18n="[$3]$4" $3="$4"
File glob: src/webui/www/private/**/*.html
Search: (<.+?data-i18n=".+?)(".*?)(\S+?)="QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?"
Replace: $1;[$3]$4$2$3="$4"
File glob: src/webui/www/private/**/*.html
Search: (["'])QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?\1
Replace: i18next.t($1$2$1)
File glob: src/webui/www/private/**/*.html
Manual conversion to template literal

Search: (["'])QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?\s*?\1
Replace: i18next.t($1$2$1)
File glob: src/webui/www/private/**/*.html
Search: `(.*?)QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?
Replace: `$1\${i18next.t('$2')}
File glob: src/webui/www/private/**/*.html
Search: (["'])(\s*? )?QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?((\\n)*?)\1
Replace: `$2${i18next.t($1$3$1)}$4`
File glob: src/webui/www/private/**/*.html
Search: (["'])QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?\1
Replace: i18next.t($1$2$1)
File glob: src/webui/www/private/**/*.js
Search: (["'])(\s*? )?QBT_TR\((.+?)\)QBT_TR(?:\[CONTEXT=.+?\])?\1
Replace: `$2${i18next.t($1$3$1)}`
File glob: src/webui/www/private/**/*.js
@sledgehammer999 sledgehammer999 added WebUI WebUI-related issues/changes Translations Related to i18n or Transifex labels Mar 26, 2024
@sledgehammer999
Copy link
Member Author

sledgehammer999 commented Mar 26, 2024

Damn it. I worked on an older state of the master branch.
I'll update it tomorrow.
You get the idea. Please comment if you any comment for the methodology.

@sledgehammer999
Copy link
Member Author

Aka we set the data-i18n attribute's value to [attribute-name]string. And if we have multiple attributes to translate for the same HTML element we separate it with ; eg [attribute-name1]string1;[attribute-name2]string2
However, I haven't tested if i18next-parser can correctly break out the different strings.

@Chocobo1
I looked at the source of the HTMLLexer. Very simple code. It does indeed what I describe above.
It also has this tidbit in the end:

// remove any leading [] in the key
key = key.replace(/^\[[a-zA-Z0-9_-]*\]/, '')

// if empty grab innerHTML from regex
key = key || $node.text()

So we can probably de-duplicate the HTML files. Instead of doing this:

<p class="random qbt-translatable" data-i18n="Some text">Some text</P>

we can do this:

<p class="random qbt-translatable">Some text</P>

And our loading function will use the .text() as key if the data-i18n attribute isn't present.

@@ -43,7 +43,7 @@

<body>
<noscript id="noscript">
<h1>QBT_TR(JavaScript Required! You must enable JavaScript for the WebUI to work properly)QBT_TR[CONTEXT=HttpServer]</h1>
<h1 class="qbt-translatable" data-i18n="JavaScript Required! You must enable JavaScript for the WebUI to work properly">JavaScript Required! You must enable JavaScript for the WebUI to work properly</h1>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is noscript, no point in providing translation strings.

@Chocobo1
Copy link
Member

Aka we set the data-i18n attribute's value to [attribute-name]string. And if we have multiple attributes to translate for the same HTML element we separate it with ; eg [attribute-name1]string1;[attribute-name2]string2
Source: https://stackoverflow.com/a/22822236/10260481

It seems the link was using jquery-i18next which we don't use and therefore we don't have .i18n(). This function seems to be capable of parsing the [attribute] part and applying the translated text (to various element types). We probably need to write our own version.

So we can probably de-duplicate the HTML files. Instead of doing this:

<p class="random qbt-translatable" data-i18n="Some text">Some text</P>

we can do this:

<p class="random qbt-translatable">Some text</P>

And our loading function will use the .text() as key if the data-i18n attribute isn't present.

I tried to apply it to the public login page. There are a few issues with it:

  1. i18next-parser couldn't extract the the translation string.
    Probably it only recognize data-i18n by default. Using a empty data-i18n would work though:
    <p class="random qbt-translatable" data-i18n>Some text</P>
  2. We would need to manually handle and apply the [attribute] part.
    i18next itself doesn't seem to provide function for it.

I quickly skimmed through the changes. It seems that the CONTEXT part was thrown away. Would it be better to preserve it in some way?
i18next supports it but I reckon it might be confusing for the translators. I mean the context string would be appended to the key name and that might confuse the translators. Also I didn't found a way to have context on html.
https://www.i18next.com/translation-function/context#basic

Or perhaps we really need to write our own custom string extractor...

@sledgehammer999
Copy link
Member Author

We would need to manually handle and apply the [attribute] part

Yes, because we use the vanilla version. We are responsible for applying the translated strings.
It should be fairly easy to implement in our replaceI18nText() function though.

Using a empty data-i18n would work though:

I wonder if we could further optimize replaceI18nText() to filter nodes that have the data-i18n attr and drop the requirement for adding the qbt-translatable class.

I quickly skimmed through the changes. It seems that the CONTEXT part was thrown away. Would it be better to preserve it in some way?
i18next supports it but I reckon it might be confusing for the translators. I mean the context string would be appended to the key name and that might confuse the translators. Also I didn't found a way to have context on html.

I wanted to also talk about this.
I am not sure what to do here. I understand the value of having context but the ergonomics don't fit our workflow. i18next is designed around having a single variable-like key and not a string-like key.
i18next-parser has a config setting contextSeparator: '_'. But I am not sure what the hell Transifex translators will be presented with.

Or perhaps we really need to write our own custom string extractor...

The regex I have in each commit can probably be made to also capture the CONTEXT value. However, i18next's JSON format saves it as "key_context: translated string".

Or you mean having our own format for runtime and extraction? This might work. I don't know how hard the implementation will be.
In any case, we also need to investigate how it will work with Transifex.

@Chocobo1
Copy link
Member

I wonder if we could further optimize replaceI18nText() to filter nodes that have the data-i18n attr and drop the requirement for adding the qbt-translatable class.

Maybe this, I haven't test: https://stackoverflow.com/questions/2694640/find-an-element-in-dom-based-on-an-attribute-value

I quickly skimmed through the changes. It seems that the CONTEXT part was thrown away. Would it be better to preserve it in some way?
i18next supports it but I reckon it might be confusing for the translators. I mean the context string would be appended to the key name and that might confuse the translators. Also I didn't found a way to have context on html.

I wanted to also talk about this. I am not sure what to do here. I understand the value of having context but the ergonomics don't fit our workflow. i18next is designed around having a single variable-like key and not a string-like key. i18next-parser has a config setting contextSeparator: '_'. But I am not sure what the hell Transifex translators will be presented with.

Transifex would very likely to present the full string which is not what we want for the translators.

I thought about this and I reckon that the CONTEXT isn't that useful. It would be useful if the translation strings is too many to hold it in one js Object/Map and needed to use CONTEXT to access different Object/Map.
What is really useful is the disambiguation field from Qt (https://doc.qt.io/qt-6/qobject.html#tr). We do encounter cases in the past that required this field.
However JSON on Transifex doesn't support it and we would need other file format such as Gettext (.po) or Qt Linguist (.ts).
So the workflow would look like this:
HTML source:

<p class="qbt-translatable" data-i18n="Some text{disambiguation}">Some text</P>

Extracted JSON:

"Some text{disambiguation}": ""

translations_qt2i18next.py would convert the JSON file into .po (or .ts) and the {disambiguation} string will be removed from key field and moved into comment field. Then upload to Transifex.
Downloading from Transifex would reverse the process.

@sledgehammer999
Copy link
Member Author

sledgehammer999 commented Mar 28, 2024

I wonder if we could further optimize replaceI18nText() to filter nodes that have the data-i18n attr and drop the requirement for adding the qbt-translatable class.

Maybe this, I haven't test: https://stackoverflow.com/questions/2694640/find-an-element-in-dom-based-on-an-attribute-value

I quickly skimmed through the changes. It seems that the CONTEXT part was thrown away. Would it be better to preserve it in some way?
i18next supports it but I reckon it might be confusing for the translators. I mean the context string would be appended to the key name and that might confuse the translators. Also I didn't found a way to have context on html.

I wanted to also talk about this. I am not sure what to do here. I understand the value of having context but the ergonomics don't fit our workflow. i18next is designed around having a single variable-like key and not a string-like key. i18next-parser has a config setting contextSeparator: '_'. But I am not sure what the hell Transifex translators will be presented with.

Transifex would very likely to present the full string which is not what we want for the translators.

I thought about this and I reckon that the CONTEXT isn't that useful. It would be useful if the translation strings is too many to hold it in one js Object/Map and needed to use CONTEXT to access different Object/Map. What is really useful is the disambiguation field from Qt (https://doc.qt.io/qt-6/qobject.html#tr). We do encounter cases in the past that required this field. However JSON on Transifex doesn't support it and we would need other file format such as Gettext (.po) or Qt Linguist (.ts). So the workflow would look like this: HTML source:

<p class="qbt-translatable" data-i18n="Some text{disambiguation}">Some text</P>

Extracted JSON:

"Some text{disambiguation}": ""

translations_qt2i18next.py would convert the JSON file into .po (or .ts) and the {disambiguation} string will be removed from key field and moved into comment field. Then upload to Transifex. Downloading from Transifex would reverse the process.

I was thinking about the problem this past day and I came to a similar conclusion.
Optimize replaceI18nText() according to the SO answer. Rework the tstool.py to merge/split JSON from Qt .ts files, therefore not needing to create a new Transifex resource.
Same thoughts about disambugation/context.
Currently our QBT_TR(string)QBT_TR template doesn't support disambugation so I thought we might not need it after all. However after reading your thoughts how about this: Use i18next's context as Qt's disambugation?

@Chocobo1
Copy link
Member

Chocobo1 commented Mar 29, 2024

Optimize replaceI18nText() according to the SO answer.

I'll experiment on public login page first and submit a PR if the results are satisfactory.

However after reading your thoughts how about this: Use i18next's context as Qt's disambugation?

Unless you mean to create our own string extractor, I think there are some problems with i18next context (and probably i18next-parser):

  1. The context will prepend an underscore and appended to the key. The resulting key would be ambiguous to decide which is which. Imagine there are underscores in both key and disambugation field. This might introduce problems to our auxiliary python scripts.
  2. The resulting key will confuse translators on Transifex. They can't tell which is disambugation and will probably translate the string as a whole.
  3. The data-i18n attribute doesn't have a field for disambugation. Only i18next.t() have it.

Which is why I used the form: string{disambugation} (or string{{disambugation}}).
https://www.i18next.com/translation-function/context#basic

@sledgehammer999
Copy link
Member Author

sledgehammer999 commented Mar 29, 2024

Optimize replaceI18nText() according to the SO answer.

I'll experiment on public login page first and submit a PR if the results are satisfactory.

Yeah, I wanted to introduce a PR this weekend too.

However after reading your thoughts how about this: Use i18next's context as Qt's disambugation?

Unless you mean to create our own string extractor, I think there are some problems with i18next context (and probably i18next-parser):

1. The `context` will prepend an underscore and appended to the key. The resulting key would be ambiguous to decide which is which. Imagine there are underscores in both `key` and `disambugation` field. This might introduce problems to our auxiliary python scripts.

2. The resulting key will confuse translators on Transifex. They can't tell which is disambugation and will probably translate the string as a whole.

3. The `data-i18n` attribute doesn't have a field for `disambugation`. Only `i18next.t()` have it.

Which is why I used the form: string{disambugation} (or string{{disambugation}}). https://www.i18next.com/translation-function/context#basic

I have slightly different understanding on how these work and how I intend them to work:

  1. The context separator character (aka underscore) is configurable in both i18next and i18next-parser. We can choose another character that is highly unlikely to appear as a legit character in the english string.
  2. The data-i18n attribute will need extra parsing logic anyway on our end to support multiple keys separated via ; and prefixed with [attr-name]. We just add logic to separate out the context too.
  3. Our python script will separate the context too, and insert it into the <comment> tag of Qt's .ts files. The .ts file will be on Transifex and Transifex will correctly present the main string and the comment as separate things.

Maybe my thoughts will be conveyed better in code. I'll try to have a PR ready today/tomorrow.

@Chocobo1

This comment was marked as off-topic.

@sledgehammer999
Copy link
Member Author

Take a look at #20643 first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Translations Related to i18n or Transifex WebUI WebUI-related issues/changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants