Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree-sitter fixes for December (including a PHP grammar!) #852

Merged

Conversation

savetheclocktower
Copy link
Sponsor Contributor

@savetheclocktower savetheclocktower commented Jan 5, 2024

NOTE: I could spin the new PHP grammar off into its own PR, but I'm not sure what the point would be. The rest of this description is for the PHP grammar itself; read the other fixes’ commit descriptions for further detail.

Description of the Change

This is the modern Tree-sitter PHP grammar that’s been sitting on my hard drive for months.

The base PHP grammar is text.html.php. Anything that looks like HTML gets handled in an HTML injection. The tree-sitter-php parser is weeeeeird in how it decides to arrange and hierarchize nodes, so in order to properly identify PHP blocks (the parts between, and including, <?php and ?>), we’re re-injecting the PHP grammar into itself.

Why is this necessary? For the same reason that there’s no simple way to describe the group consisting of you, your second cousin once removed, your great grand-uncle, and your three unmarried aunts on your father’s side. There’s no name for that group because that’s just a random sampling of your family tree. Now pretend that you had to create one contiguous region of the buffer from the boundaries described by six seemingly random nodes. Same concept!

The least worst best way to do this is via addInjectionPoint, and even then I had to invent a couple of features to pull it off — specifically the includeAdjacentWhitespace option to addInjectionPoint and the ability to apply an injection’s root scope to its “content ranges” rather than the entire extent of its injection.

Appreciations go to @claytonrcarter for wandering into our Discord and causing me to become aware of his Tree-sitter parser for PHPDoc. We now have good highlighting of documentation comments in PHP, much like the JSDoc injection into JavaScript/TypeScript documentation comments.

I had to rebuild web-tree-sitter to add another C stdlib function, so I also took the opportunity to move us to version 0.20.8. This is a modified Tree-sitter version that has some extra externals as described here. It also includes a fix that’s currently pending on the main tree-sitter repo (authored by yours truly) to make some of the newly-introduced predicates work correctly in the web-tree-sitter bindings.

Alternate Designs

Technically, the PHP-injected-into-itself layer doesn’t need to do any parsing. All the parsing is already handled by the base layer. The purpose of creating an injection layer here is just to identify the buffer ranges that need to be annotated with source.php, so once the injection is in place, the parser might as well be a no-op. Since that doesn’t exist, I’ve just chosen to have the injection use tree-sitter-php.

In theory, this means some unnecessary work, but it shouldn’t have any meaningful impact on user experience. I’d care more about finding a solution to this if I thought that solution would have any applicability in any scenario other than this one strange one.

Possible Drawbacks

There’s an annoying thing I realized about the tree-sitter-html grammar only today: it assumes it’s looking at an entire HTML document. Open up a blank document, set the grammar to HTML (either modern or legacy Tree-sitter), then type </div>. It won’t highlight until you put an opening <div> before it!

I only realized this now because I was doing sanity checking on my local WordPress codebase. WordPress themes often break up parts of the page between different PHP includes, so a file is not guaranteed to have a <div> for every </div>, and vice versa. Otherwise this isn’t really PHP’s fault or the fault of this new grammar.

This is pretty silly on tree-sitter-html’s part and I might venture over there to see if this has been discussed.

Verification Process

Check out this PR and throw the weirdest PHP files you have at it. Make sure modern Tree-sitter grammars are enabled as described here, including the grammar-selector setting change so that you can switch between TextMate and Tree-sitter grammars.

Compare the old grammar and the new grammar in how they highlight code on all bunled syntax themes.

Release Notes

Added a modern Tree-sitter grammar for PHP.

@savetheclocktower savetheclocktower changed the title Tree-sitter-fixes for December (including a PHP grammar!) Tree-sitter fixes for December (including a PHP grammar!) Jan 5, 2024
Copy link
Contributor

@claytonrcarter claytonrcarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi this is cool! I have a couple of fiddly comments on some of the injections code, but nothing really major there.

As you know, I'm new to Pulsar and have always struggled trying to wrangle Atom builds with Electron. That said, I'm getting an error when I run this branch; it seems that the first layout of the file is sort of OK, but then subsequent updates don't change anything. This seems to also be affecting injections. I'm running the latest release (1.112.1) but I'm struggling to build from source, so it could be that this is not happening on master, but only on 1.112.

When I open a php file w/ all the tree-sitter stuff turned on, I get this in the console:
Screen Shot 2024-01-05 at 11 02 45 AM

Note that I get this 3-4 times per highlight (eg per keystroke). At this point, I've got like 1800 of them in the console for the little test file I pasted below.

And here it is pasted:

/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:617 Uncaught (in promise) RuntimeError: abort(Assertion failed: undefined). Build with -s ASSERTIONS=1 for more info.
    at abort (/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:617:29)
    at assert (/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:343:25)
    at getDylinkMetadata (/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:801:25)
    at loadWebAssemblyModule (/Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:966:36)
    at /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:2739:52
    at WASMTreeSitterGrammar.getLanguage (/Applications/Pulsar.app/Contents/Resources/app.asar/src/wasm-tree-sitter-grammar.js:139:24)
    at LanguageLayer.update (/Applications/Pulsar.app/Contents/Resources/app.asar/src/wasm-tree-sitter-language-mode.js:3286:11)
    at async Promise.all (index 1)
    at LanguageLayer.update (/Applications/Pulsar.app/Contents/Resources/app.asar/src/wasm-tree-sitter-language-mode.js:3290:9)
abort @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:617
assert @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:343
getDylinkMetadata @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:801
loadWebAssemblyModule @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:966
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/vendor/web-tree-sitter/tree-sitter.js:2739
Promise.then (async)
seek @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/wasm-tree-sitter-language-mode.js:2374
buildScreenLines @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/text-buffer/lib/screen-line-builder.js:92
getScreenLines @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/text-buffer/lib/display-layer.js:637
queryScreenLinesToRender @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:907
updateSyncBeforeMeasuringContent @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:402
updateSync @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:279
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:225
performDocumentUpdate @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/view-registry.js:264
requestAnimationFrame (async)
requestDocumentUpdate @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/view-registry.js:250
updateDocument @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/view-registry.js:208
scheduleUpdate @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:224
didRequestAutoscroll @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor-component.js:2283
scrollToScreenRange @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:5143
autoscroll @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/cursor.js:795
changePosition @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/cursor.js:785
setScreenPosition @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/cursor.js:66
moveLeft @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/cursor.js:313
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/selection.js:276
modifySelection @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/selection.js:1222
selectLeft @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/selection.js:276
backspace @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/selection.js:596
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1786
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1801
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1800
transact @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/text-buffer/lib/text-buffer.js:1320
transact @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:2467
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1799
mergeSelections @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:4053
mergeIntersectingSelections @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:4015
mutateSelectedText @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1798
backspace @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:1786
object.<computed> @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/underscore-plus/lib/underscore-plus.js:77
core:backspace @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/register-default-commands.js:441
(anonymous) @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/register-default-commands.js:691
transact @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/text-buffer/lib/text-buffer.js:1320
transact @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/text-editor.js:2467
newCommandListeners.<computed> @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/register-default-commands.js:691
handleCommandEvent @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/command-registry.js:405
module.exports.KeymapManager.dispatchCommandEvent @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/atom-keymap/lib/keymap-manager.js:617
module.exports.KeymapManager.handleKeyboardEvent @ /Applications/Pulsar.app/Contents/Resources/app.asar/node_modules/atom-keymap/lib/keymap-manager.js:408
handleDocumentKeyEvent @ /Applications/Pulsar.app/Contents/Resources/app.asar/src/window-event-handler.js:153
Show 16 more frames

Here is my test file in TextMate mode:
Screen Shot 2024-01-05 at 11 06 53 AM

And here it is in tree-sitter mode:
Screen Shot 2024-01-05 at 11 08 48 AM

Note that:

  • the TODO is highlighted
  • the phpdoc is not
  • the heredocs are not highlighted as injections, only PHP strings
  • the nowdoc is not highlighted even as a string?

After this first layout, if I add anything, it's not highlighted. See lines 10&11 in this pic:
Screen Shot 2024-01-05 at 11 11 36 AM

I'll tinkering with this. Like I said, it may just be an issue on 1.112 and not on master. Or maybe just something on my system? I'm not in safe mode, but I did remove all plugins except a link to the php package.

Comment on lines +84 to +95
atom.grammars.addInjectionPoint('text.html.php', {
type: 'comment',
language: (node) => {
return TODO_PATTERN.test(node.text) ? 'todo' : undefined;
},
content: (node) => node,
languageScope: null
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see 3 separate comment injections, all of which look very similar aside from the injection language they're looking for and some syntax. I think that these can be consolidated into 1, what do you think?

(The only part that's new to me and that I'm unfamiliar with is languageScope: null and what that does; I see that it's null for the first 2 injections (TODO and hyperlinks), but not for phpDoc. Maybe that means that it's not possible to consolidate these?)

Suggested change
atom.grammars.addInjectionPoint('text.html.php', {
type: 'comment',
language: (node) => {
return TODO_PATTERN.test(node.text) ? 'todo' : undefined;
},
content: (node) => node,
languageScope: null
});
atom.grammars.addInjectionPoint('text.html.php', {
type: 'comment',
language(node) {
if (TODO_PATTERN.test(node.text)) return 'todo';
if (isPhpDoc(node)) return 'phpdoc';
if (HYPERLINK_PATTERN.test(node.text)) return 'hyperlink';
},
content: (node) => node,
languageScope: null
});

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can't be consolidated because we might need to inject more than one of them into each comment. If we want both TODOs and URLs to be highlighted inside an ordinary PHP comment, we need to create two injection points, or else only one or the other will be picked.

languageScope: null means that the root scope of the injected grammar (the one specified in the grammar's CSON file) isn't actually added to the injection's ranges. This makes sense for the TODO and hyperlink injections because there's no point in adding a text.todo scope or a text.hyperlink scope into the buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inject more than one of

Oh! I never even considered that. It's really cool that that works!

packages/language-php/lib/main.js Outdated Show resolved Hide resolved
@claytonrcarter
Copy link
Contributor

I kept flailing hacking around a bit and have this to add:

  • adding (nowdoc) @string.unquoted.nowdoc.php to the highlights fixed the nowdoc highlighting (this was just a guess that seemed to work, though; there's probably a better way to do it!)
  • the heredoc language injection seems like it may work for nowdoc as well if you just repeat the same injection for type: 'nowdoc'

it may just be an issue on 1.112 and not on master

Yes, I think that this is the case. I poked around and realized that the error is coming from the web-tree-sitter code that was also updated in this PR, and which won't be running in 1.112. So you can disregard all of my comments above. 🤦 Now I just need to figure out how to build from source, but that's not relevant to this PR.

@savetheclocktower
Copy link
Sponsor Contributor Author

Yes, I think that this is the case. I poked around and realized that the error is coming from the web-tree-sitter code that was also updated in this PR, and which won't be running in 1.112. So you can disregard all of my comments above. 🤦 Now I just need to figure out how to build from source, but that's not relevant to this PR.

You don't need to build, luckily. You just need to check out the repo, do a yarn install, and then set ATOM_DEV_RESOURCE_PATH to the root of your source directory. Then it should be able to run from source when you launch Pulsar in dev mode.

@savetheclocktower
Copy link
Sponsor Contributor Author

OK, addressed feedback. Also updated the tree-sitter-php WASM file to the latest commit from master because I realized I'd built the original one last March. Amazingly, nothing seems to have broken from that change.

@claytonrcarter
Copy link
Contributor

OK, addressed feedback. Also updated the tree-sitter-php WASM file to the latest commit

👍 Now that I have my build fixed, I'll play with this PR with some of my PHP files and see if anything turns up. This is really great, thank you!

@claytonrcarter
Copy link
Contributor

OK, so you did ask me to be a little nitpicky. 😄 Here's a round of things I noticed after taking screenshots of a file I loaded up and doing a visual diff.

Changes that don't affect highlighting

  • semicolons have scope punctuation.terminator.expression.php in TM, but nothing here
    • this may be a fix:
(expression_statement
  ";" @punctuation.terminator.expression.php)
  • integers in TM have constant.numeric.decimal.php but have constant.numeric.decimal.integer.php here

  • meta. scopes are mostly missing from this PR, but are applied willy nilly in the TM grammar:

    • use declaration gets the scope meta.use.php
    • class declarations (like, the entire thing) get meta.class.php and the class body also gets meta.class.body.php
    • the list goes on and on and on: meta.function.php, meta.array.php, meta.switch-statement.php, etc
    • I have no idea how useful these are, just commenting on the difference

Changes that do affect highlighting

use Exception;
use Foo;
use Illuminate\Support\Arr;
  • in above example, "namespace-less" use declarations differ from those w/ a namespace

    • TM: Exception, Foo and Arr are all colored the same.
    • PR: Exception and Foo are the same (not matching TM), but Arr is different (does match TM)
    • TM: Exception is support.class.builtin.php; Foo and Arr the scope support.class.php
    • PR: Exception and Foo are both entity.name.type.namespace.php; Arr is support.other.function.constructor.php
  • static is not highlighted (eg private static array $config; or public static function foo() {})

    • TM: storage.modifier.php
    • PR: nothing
  • self is not highligted in new self

    • TM: storage.type.php
    • PR: nothing
    • appears to be fine when used as a return type
  • $this is not colored correctly

    • it's all 1 color in TM, but the $ and this are diffent colors here
    • TM: $this the scope variable.language.this.php
    • TM: $ is punctuation.definition.variable.php (same as here)
    • in this PR, only this has the scope constant.language.builtin.this.php
    • visually, the color of this is the same in both, only $ is different in this PR
  • method names are a little complicated in both

    • declaration (eg function foo() {})
      • __construct
        • TM: support.function.constructor.php
        • PR: entity.name.function.method.php (same in all 4 case)
      • magic methods (eg __get, __isset)
        • TM: support.function.magic.php
        • PR: entity.name.function.method.php
      • static & regular methods
        • TM: entity.name.function.php
        • PR: entity.name.function.method.php
    • as for usage (eg $foo->bar() or Foo::bar()), all above cases appear to
      be handled the same w/i each grammar, but nonetheless differently between
      the 2 grammars:
      • TM: meta.function-call.php & entity.name.function.php
      • PR: entity.name.function.method.php
  • dynamic propery names (eg the $bar in $foo->$bar)

    • TM: variable.other.property.php and colored diffently (in my theme) than regular variables
    • PR: variable.other.php and colored the same as regular variables
  • exceptions in catch (eg catch (Exception))

    • TM: support.class.exception.php
    • PR: storage.type.php
  • return types (eg function foo(): Collection)

    • TM: support.class.php
    • PR: storage.type.php
  • some function calls, eg

    • get_class_vars() (the highlighting MATCHES for me)
      • TM: support.function.classobj.php
      • PR: support.function.class-obj.php
    • collect() (the highlighting does NOT match for me)
      • TM: entity.name.function.php
      • PR: support.other.function.php
  • the fn in arrow functions

    • TM: storage.type.function.php
    • PR: nothing
    • note that regular closures do match for me; both get storage.type.function.php, though TM also applies meta.function.closure.php
  • phpdoc: in this snippet@return Collection<Coupon>

    • both grammars apply comment.block.documentation.phpdoc.php to that whole snippet
    • @return
      • TM: keyword.other.phpdoc.php
      • PR: storage.type.class.phpdoc.php
    • PR: Collection<Coupon> is all one color w/ scope entity.other.type.instance.phpdoc.php
    • TM:
      • Collection<Coupon> has scope meta.other.type.phpdoc.php
      • Collection and Coupon both also have scope support.class.php
      • this means that the classes have one color in TM, but the < and > have a different color

Files

text mate screenshot
tree sitter screenshot

example code from screenshots
<?php

namespace SDK;

use Exception;
use Foo;
use Illuminate\Support\Arr;
use Illuminate\Support\Collection;
use Illuminate\Support\Str;

class Coupon
{
    private string $code;
    private string $description;
    private string $expires;
    private bool $single_use;
    private ?int $amount_flat;
    private ?int $amount_percent;
    private ?bool $free_shipping;
    private ?int $minimum_subtotal;
    private null|string|array $branch;
    private ?string $country;

    private static array $config;

    private function __construct(array $attributes, string $code)
    {
        foreach (get_class_vars(get_class($this)) as $key => $value) {
            switch ($key) {
                case 'config':
                    break;
                case 'code':
                    $this->$key = $code;
                    break;
                default:
                    $this->$key = $attributes[$key] ?? null;
            }
        }
    }

    public function __get($name)
    {
        1111111;
        if ($name === 'config') {
            throw new Exception('Call Coupon::find() or ::getConfig() instead');
        }

        return $this->$name;
    }

    public function __isset($name)
    {
        $result = null;
        try {
            $result = $this->__construct($name);
        } catch (Exception $e) {
            // Swallow the exception the __get() can throw.
        } catch (Exception) {
            // Swallow the exception the __get() can throw.
        }

        return isset($this->$name) && !is_null($result);
    }

    public static function find(string $code): ?self
    {
        $normalizedCode = Str::upper(trim($code));

        $code_lookup = self::getConfig()[$normalizedCode] ?? null;

        if (!$code_lookup) {
            return null;
        }

        return new self($code_lookup, $normalizedCode);
    }

    /** @return Collection<Coupon> */
    public static function all(): Collection
    {
        function($item) { $item };
        return collect(self::getConfig())->map(
            fn($config, $code) => new self($config, $code),
        );
    }
}

@claytonrcarter
Copy link
Contributor

BTW, I was playing with the tests that I originally wrote for my old PR at atom/language-php#438 and I was able to get them working w/ this WASM grammar. In case they may be useful, they allow specs to be written like this:

it("should tokenize = correctly", async () => {
  await editor.setPhpText('$test = 1;');

  expect(editor).toHaveScopesAtPosition([1, 0], '$', ["source.php", "variable.other.php", "punctuation.definition.variable.php"]);
  expect(editor).toHaveScopesAtPosition([1, 5], ' ', ["source.php"]);
  expect(editor).toHaveScopesAtPosition([1, 6], '=', ["source.php", "keyword.operator.assignment.php"]);
  expect(editor).toHaveScopesAtPosition([1, 8], '1', ["source.php", "constant.numeric.decimal.integer.php"]);
  expect(editor).toHaveScopesAtPosition([1, 9], ';', ["source.php", "punctuation.terminator.expression.php"]);
});

And then failures look like this:

  Scopes did not match at position [1, 8]:
$test = 1;
        ^
  These scopes were expected but not received:
      constant.numeric.decimal.php
  These scopes were received but not expected:
      constant.numeric.decimal.integer.php
  These were all scopes recieved:
      source.php, constant.numeric.decimal.integer.php

        at jasmine.Spec.<anonymous> (/Users/crcarter/src/Atom/pulsar/packages/language-php/spec/wasm-tree-sitter-spec.js:38:22)
      Failure:
  Scopes did not match at position [1, 9]:
$test = 1;
         ^
  These scopes were expected but not received:
      punctuation.terminator.expression.php
  These scopes were received but not expected:

  These were all scopes recieved:
      source.php

Happy to share if that seems like it may be helpful.

@savetheclocktower
Copy link
Sponsor Contributor Author

OK, the main takeaway here is that a lot of these decisions will end up being different from what the TM grammar did. Part of the purpose of this whole thing is that it’s a rare opportunity to flex some muscle and enforce more scope consistency across languages than what we’ve had in the past. Consistency means that syntax theme authors can make high-level decisions about colors without having to check everything in every language and add a ton of language-specific exceptions. I wrote up a whole reference document for scopes and probably should’ve pointed you to it before this review.

It’s also a chance to break with some bad conventions that had emerged and enforce some clarity and separation. For instance, most community TextMate grammars had decided that all function calls — not just function definitions — would be scoped as entity.name.function, thereby making half of the text in my editor orange and making it impossible to use scope names to make meaningful semantic decisions. (The PHP TextMate grammar seems to have gone the other way and made everything support in one way or another.)

My tactical rules were something like:

  1. Aim for consistency, but embrace existing scope names unless they’re just flat-out wrong.
  2. When possible, add detail. If all keywords were keyword.control.php before, it’s a decent idea to make try be keyword.control.try.php, to make catch be keyword.control.catch.php, et cetera. I only avoid this in situations where the addition of a scope segment would make a theme go haywire. (For instance, most themes will color foo.bar.string.php like a string, because hardly any of them enforce that string is the first segment in the scope name.)
  3. Most scopes in the meta namespace are superfluous. Honestly, their purpose in a TextMate grammar was just as much about helping grammar authors keep track of which rule did which thing. I’ve added them back more sparingly when they can serve a meaningful goal (for example, to allow contextual snippets as described here).

Your list is great for helping me realize where I’d missted stuff, so thanks a bunch. There are still a lot of places where I deviated from what the TM grammar did. If the outcome is nonetheless catastrophic for your editor experience, that’s the point at which we’d talk about changes to the theme. (We made a handful of changes to built-in themes to ensure continuity even when scopes changed.)

Here’s what I addressed (changes soon to be added to the PR):

  • Your screenshots made me realize that the ? and | for optional and union types (respectively) were not addressed at all. Those are now keyword.operator.nullable-type.php and keyword.operator.union-type.php.

  • use Illuminate\Support\Arr; is fixed. I don't remember when I decided to scope these as entity, but it's as valid a choice as any — entity is also used for the names of declared classes and functions.

  • The static modifier is now scoped under storage. Visibility modifiers (public, private, protected) are now scoped under storage, too, just like TM. Not sure why I had them as keywords.

  • self/$this is a fun one. I had this scoped as constant because that’s what I’m used to — I think in JavaScript this was constant.language.builtin.js in the TextMate days — but I forgot that the consensus seems to have moved to variable.language.builtin. I’ll make sure all of $this is scoped that way. It feels weird to have self scoped as a variable in PHP when all PHP variables have sigils in my mind… but I know there are probably lots of counterexamples to my reckonings. self gets scoped as variable.language.builtin.self.php alongside $this’s variable.language.builtin.this.php despite their differences. (The TM grammar has self scoped as storage.type and that’s just nuts.) All of this also applies to self’s friend parent.

  • In the snippet function __construct() {}, __construct is properly regarded as entity rather than support, but we should identify it as a constructor, so I’ll do entity.name.function.magic.constructor.php.

  • Likewise, other magic methods can be scoped as entity.name.function.magic.php. (The leading __ in the names precludes automatic interpolation of each one into the scope name, and I'm too lazy to do each one manually.)

  • Static methods can be scoped as entity.name.function.method.static.php.

  • You say “as for usage (eg $foo->bar() or Foo::bar())…” and then state that the PR grammar scopes them as entity.name.function.method.php. I’ve already got bar scoped on the former as support.other.function.method.php and on the latter as support.other.function.method.static.php. If you’re not seeing that on your end, can you give me a code sample?

  • The $bar in $foo->$bar is now scoped as variable.other.property.php.

  • Exception in } catch (Exception $ex) { is, I think, properly treated as a type. It’s got all the trappings of a type annotation in usage, and tree-sitter-php has it as a node of the type named_type.

  • Same for the Collection in function foo(): Collection — it’s a type annotation. In TypeScript I convinced myself that types were important enough to merit highlighting beyond what you’d get from other things in the storage namespace (like visibility modifiers), so TypeScript types are scoped with both storage.type.ts and support.type.ts. We could do something similar here if you think it’s important.

  • The fn in an arrow function is now scoped as storage.type.function.arrow.php. (And I was missing scopes for the parentheses around parameters, so I've addressed that as well.)

  • Doc comments are generally where the taxonomy falls apart, but I’m realizing as I look over this that my initial decisions were entirely governed by wanting to make different things look different in my own custom theme. Let me try to regroup:

    • Each individual type (rather than the type_list in your PHPDoc parser) will now have a storage.type scope name. This means that the <s in generic syntax will not be colored the same way as the types. Again, this is a place where the TM grammar preferred support, but if those other usages are types, then so is this one. And if you really think they should have both storage and support, we can consider that.

    • Tags like @return will now be scoped as entity.name.tag like other sorts of “tags” — e.g., HTML tags. If that’s just a term you made up for the parser, and they're more commonly called something else in the wild, then I’m happy to revisit this and treat them more like keywords.

Open to more dialogue here, so let me know if any of this feels flat-out wrong. And thanks again; I wish all language packages could have someone advocating for them like this.

@savetheclocktower
Copy link
Sponsor Contributor Author

BTW, I was playing with the tests that I originally wrote for my old PR at atom/language-php#438 and I was able to get them working w/ this WASM grammar.

I might take you up on that. Hardly any of the language packages are testing the new grammars because I didn't want to codify a bunch of tests while scope decisions were still up in the air. There are other styles of scope testing that have historically been used, but I don't love them for various reasons.

@claytonrcarter
Copy link
Contributor

This is very cool. Between atom/language-php#303 and atom/language-php#438, I've been looking forward to this for a long time and I'm really excited to see it happening!

I drove with this today and it seemed stable and consistent. I found a few more small issues, and have a few questions, too.

Scope issues I noticed

  • final keyword (eg final class Foo)
    • TM: storage.modifier.final.php
    • PR: keyword.control.final.php
    • If I'm starting to understand scopes, I wonder if storage.modifier.final.php is right here?
    • Of note, it looks like abstract is also keyword.control.abstract.php
  • readonly (eg public readonly $var)
  • class (as in class Foo {})
    • TM: storage.type.class.php
    • PR: storage.type.TYPE.php
    • that TYPE looks iffy. Maybe it should be storage.type.class.php?
  • Opening and closing tags (eg <?php, <?= and ?>)
    • TM: punctuation.section.embedded.begin.php and ...end.php
    • PR: nothing

Classes scoped differently in different places

Take this example:

function (Aaa $x): Bbb { Ccc::x(); }

use Ddd;
use Eee\Fff;

class Ggg extends Hhh implements Jjj {}
tree-sitter screenshot

Screen Shot 2024-01-09 at 9 23 57 PM

text mate screenshot

Screen Shot 2024-01-09 at 9 24 13 PM

In TM:

  • Aaa, Bbb, Ccc, Ddd and Fff are all support.class.php
  • Eee is support.other.namespace.php
  • Ggg is entity.name.type.class.php
  • Hhh and Jjj are both entity.other.inherited-class.php

In this PR:

  • Aaa, Bbb are both storage.type.php
    • I think that these are right.
  • Ccc, Hhh and Jjj are all support.other.function.constructor.php
    • Is function.constructor right for these?
  • Ddd, Eee and Fff are all entity.name.type.namespace.php
    • I think that this is right, but I have to admit that I like how TM scopes allow me to only highlight the last segment (eg Fff but no Eee) ... what do you think about scoping Ddd and Fff as storage.type?
  • Ggg is entity.name.type.class.php
    • I think that this is right, too.

I'm also noting that the highlighting here in github doesn't really line up with either of these screenshots. 😆

Responses

opportunity to flex some muscle and enforce more scope consistency

This makes sense and I'm totally on board with this.

If the outcome is nonetheless catastrophic for your editor experience

No, not at all. And if it changes something, it think that just means that my theme needs a post-merge update. 😄

magic methods ... too lazy to do each one manually

I think how you handled this (adding them as a regex) makes sense. It's a shortish, well-defined list that doesn't change very often.

You say “as for usage (eg $foo->bar() or Foo::bar())…” and then state that the PR grammar scopes them as entity.name.function.method.php.

Oops. I had this wrong ... I think I copied from the wrong thing. The visual changes I'm seeing aren't a big deal; I'm not sure the TM highlighting is even right so much as it's just what I have to compare against. 😄 In my theme, here's a few examples of how these are a little all over the place in TM (w/ my theme):

  • names of magic methods are green in both declarations and calls, except __construct is purple in the declaration but green when called
  • Names of regular and static methods are blue in declarations, but green in calls
  • function calls ... depend on the function? It looks like PHP builtin functions are purple, while user space functions are blue
  • In the PR, it looks like things are much simpler: all names are blue in declarations, and purple in calls

Exception and Collection ... properly treated as a type

OK, I think this makes sense. Question though, why would Foo be scoped one way when used as an exception type, parameter type or return type, but then differently when used used as a static class? For example, in function (Foo $bar): Foo { Foo::bar(); }, the first 2 Foo are highlighted differently from the last. (They're both storage.type.php, while the latter is support.other.function.constructor.php.) Is this just related to how TM differentiates things like declarations vs usages?

Tags like @return will now be scoped as entity.name.tag like other sorts of “tags” — e.g., HTML tags. If that’s just a term you made up for the parser, and they're more commonly called something else in the wild, then I’m happy to revisit this and treat them more like keywords.

You had me scared for minute 😄, but no, that's the official name of them in phpdocumentor. (See docs or tag reference docs at phpdocumentor.) I don't have a strong opinion here; I can see why TM called them keywords , but I think that entity.name.tag makes sense based on your reference doc.

Thank you again for working on this!

@savetheclocktower
Copy link
Sponsor Contributor Author

Is function.constructor right for these?

No! Let's fix it.

  • ccc is now support.class.php
  • Hhh is now entity.other.inherited-class.php
  • Jjj is now entity.other.implemented-interface.php

I have to admit that I like how TM scopes allow me to only highlight the last segment (eg Fff but no Eee) ... what do you think about scoping Ddd and Fff as storage.type?

Ugh. I'm looking around for what we do for similar constructs in other languages and there is just no consistency or logic. You're right that Ddd and Fff should be distinguishable from Eee, so let's go with support.other.namespace.php for Eee just like the TM grammar.

Question though, why would Foo be scoped one way when used as an exception type, parameter type or return type, but then differently when used used as a static class? For example, in function (Foo $bar): Foo { Foo::bar(); }, the first 2 Foo are highlighted differently from the last. (They're both storage.type.php, while the latter is support.other.function.constructor.php.) Is this just related to how TM differentiates things like declarations vs usages?

Yes. I've never understood the argument that Foo should always be the same color in all contexts, though I do understand that that's how lots of people seem to intuitively approach syntax highlighting.

My feeling is that I want Foo to look one way in the context class Foo {} and another in the context new Foo();. When I'm scrolling through my editor looking for where function do_something is defined, it helps that it's a different color from an invocation of do_something.

Consider this block on which I’ve purposefully omitted syntax highlighting:

function Foo(Foo $foo): Foo {
}

At a glance it’s a bunch of words and some punctuation. Obviously they won’t all have the same name unless you’re a masochist, but the more hints you have to tell things apart, the better. The $ in front of foo helps with that. But if I’m trying to understand their meanings on that line without stopping to think about it, then the two type annotations should definitely look different from the function name.

If that’s not how some people approach highlighting, that’s fine; this approach doesn’t preclude other philosophies. A syntax theme can choose to scope entity.name.function and support.other.function identically — and most of the built-in themes do. (That’s how I was able to win the battle of separating definitions from invocations — many themes scope function as a certain color no matter where it occurs in the scope name.)

Scoping is about assigning meaning, and the aesthetics come later.

I would be happy to get some more detail onto the type annotation scope names if we can figure out how; since int and bool and whatnot are scoped as storage.type.builtin.php, maybe other things can be scoped as storage.type.other.php or storage.type.class.php or something.

@savetheclocktower
Copy link
Sponsor Contributor Author

Whoops, I missed the whole first section of your last comment. Looking again now.

@savetheclocktower
Copy link
Sponsor Contributor Author

If I'm starting to understand scopes, I wonder if storage.modifier.final.php is right here?

Quite so. Changing!

that TYPE looks iffy. Maybe it should be storage.type.class.php?

Yeah, I changed the syntax for interpolating the node type a long time ago and this was a weird holdover. Well spotted!

Opening and closing tags

Also addressed.

claytonrcarter added a commit to claytonrcarter/pulsar that referenced this pull request Jan 10, 2024
These cover some of the cases mentioned at
pulsar-edit#852 (comment)
@claytonrcarter
Copy link
Contributor

Oh, one more before I turn in: self is lacking scopes in new self(). new static() and things like self::foo() seem to be OK, though.

@savetheclocktower
Copy link
Sponsor Contributor Author

savetheclocktower commented Jan 10, 2024

PHP is bizarre.

Just ensured new self(), new static(), and new parent() (whether or not the latter is even valid) are highlighted identically.

claytonrcarter added a commit to claytonrcarter/pulsar that referenced this pull request Jan 10, 2024
These cover some of the cases mentioned at
pulsar-edit#852 (comment)
…when straddling injection boundaries.
Spotted it falling down on a very large JSON file I had. Seems to be fixed on `master`. Not sure which verison I originally built it from.
@savetheclocktower
Copy link
Sponsor Contributor Author

Just rebased after landing #855. I hope it didn't make anything explode.

@savetheclocktower
Copy link
Sponsor Contributor Author

I have placed two small symbols-view fixes into this PR because I am too exhausted to juggle branches at this late hour. I beg your forgiveness. As penance, I have written exhaustive specs for both changes.

@claytonrcarter
Copy link
Contributor

claytonrcarter commented Jan 12, 2024

Hi there, I think the recent rebase had a few casualties, at least in the PHP grammar: https://github.com/pulsar-edit/pulsar/compare/16a6b4b577366f5b1f09ba7698bebd53e30e7dfd..50bfa5141eb2cb95a35c8375b5e003fdf43ef854#diff-9d0f8cee91ca39fc6d6b6e98a21367194ae76b7df77aa5f7bd6b26f1cabd952b I wasn't following the rest of the changes in this PR, so I didn't check on them.

Also, I've been crash coursing myself through tree-sitter grammars and I have a PR that I will try to push soon that adds and fixes a few things. I'll target it at this one for your review.

@savetheclocktower
Copy link
Sponsor Contributor Author

Yup. Not sure how that happened. Let me try to apply those commits again.

@savetheclocktower
Copy link
Sponsor Contributor Author

Amazingly, I think it was just one commit that got skipped somehow. Thanks for catching it.

@claytonrcarter
Copy link
Contributor

Thank you again for all of your work on this, and for putting up w/ my constant nitpicking on behalf of PHP. 😆 FYI I opened savetheclocktower#2 w/ some updates to this. It's targeted at this branch.

@claytonrcarter
Copy link
Contributor

Another question on scope decisions: in namespaces, why is \ scoped as keyword.operator.namespace.php and not punctuation? It seems like it's just a separator, like a comma in an array. Or am I thinking about these wrong?

claytonrcarter added a commit to claytonrcarter/pulsar that referenced this pull request Jan 12, 2024
These cover some of the cases mentioned at
pulsar-edit#852 (comment)
@savetheclocktower
Copy link
Sponsor Contributor Author

Another question on scope decisions: in namespaces, why is \ scoped as keyword.operator.namespace.php and not punctuation? It seems like it's just a separator, like a comma in an array. Or am I thinking about these wrong?

No, it's just a difference of opinion. You'll just as often see stuff like that scoped as punctuation in the wild.

I was on the fence about this until my experience with the JavaScript highlights.scm. In JavaScript, the . in foo.bar is technically an operator, though it's arguable that you don't want it to show up as prominently in your editor as operators like + or ===. But the optional chaining operator?. — is definitely an operator, and you don't want to overlook it on a line of code. So if one of them is an operator, so should the other be for consistency.

Likewise, in PHP, once :: and -> are scoped as operators, it's harder (in my opinion) to justify scoping \ differently.

Copy link
Contributor

@claytonrcarter claytonrcarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple very minor updates to the PHP grammar, then I promise I'll be quiet about this so you can land it. 😄

"final"
"implements"
"namespace"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] is this because namespace controls how we address the class in usage (eg by changing its namespace)? I would have pegged it more as a keyword (which is how TM scopes it).

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace declares an entity. Just like class or function. If those are storage, so is namespace.

[
"&&"
"||"
"??"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is correct. ?? isn't a logical operator like && and ||. I suggest moving it to keyword.operator.comparison.php. [docs]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if you agree, then it looks like <> and <=> could also be added as comparison operators: https://www.php.net/manual/en/language.operators.comparison.php

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If || is a logical operator, so is ??. The former will return the left-hand side unless it's falsy. The latter returns the left-hand side unless it's null.

<> and <=>, on the other hand, are comparison operators, so I'll add them as such.

Copy link
Contributor

@claytonrcarter claytonrcarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, as far as I can tell! I have several small items for improvement, but I don't know that they are critical.

I didn't look at the symbols-view changes very hard, because I'm only familiar with the tree-sitter stuff, and new to it at that. The indent changes seemed to make sense from a read through, but I didn't test them specifically. (Except that I've been driving with this branch this week, and it's been good, and better every day.)

Suggestion: the Release Notes should may also mention that many other languages got new TODO and url highlighting.

packages/language-css/lib/main.js Show resolved Hide resolved
@@ -0,0 +1 @@
; placeholder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] In the new php grammar, empty.scm, folds.scm and highlights-html.scm are all empty. Can they be removed instead of left empty? Or is there a strategic reason to have them empty?

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, shit, folds.scm shouldn't be empty. I just forgot about it.

highlights-html.scm doesn't need to exist, and empty.scm is empty on purpose, as the name should hint at :)

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to answer why we need empty.scm: I thought I'd made it so that no queries were mandatory, but Pulsar complained when I tried to omit highlightsQuery, so I'm doing an empty one for now and I'll fix it later.

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added folds queries for everything I could think of. All classes, functions, control loops, conditionals, and enums should be foldable now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to answer why we need empty.scm: I thought I'd made it so that no queries were mandatory, but Pulsar complained when I tried to omit highlightsQuery, so I'm doing an empty one for now and I'll fix it later.

Might I suggest that we add this as a TODO, just in case someone is searching for TODOs in the project, they can see that it should be removed eventually? Might help future-you

Comment on lines +8 to +11
(tag_name) @entity.name.tag.phpdoc.php
(named_type) @storage.type.instance.phpdoc.php
(variable_name) @variable.other.phpdoc.php
(uri) @markup.underline.link.phpdoc.php
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience so far, the phpdoc highlights have been working really well as is!

Copy link
Contributor

@claytonrcarter claytonrcarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, for what that's worth, and with the previously stated caveats re symbols view etc. Thank you!

Copy link
Member

@Spiker985 Spiker985 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't have any code that I can test any of these changes with, however, glancing through the changes there were not any obvious errors.

Most of the changes seem to be the additional of TODO and url highlighting, as well as Tree-Sitter scope standardization

I had a single question which should be posted in a comment after this review is posted.

From a theoretical reading, this change seems fine and the package tests succeeded as well.

@savetheclocktower savetheclocktower merged commit 7dd52e5 into pulsar-edit:master Jan 14, 2024
103 checks passed
@confused-Techie confused-Techie mentioned this pull request Apr 18, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants