Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow '.' in symbol tokens #3797

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

ahornace
Copy link
Contributor

@ahornace ahornace commented Nov 25, 2021

Suggester suggests OpenGrok's package names as single tokens for search; however, the search will split the text by . into multiple tokens and the final result is empty.

Example:
Screen Shot 2021-11-25 at 17 02 09

http://demo.opengrok.org/search?project=OpenGrok&full=&defs=org.opengrok.indexer.history&refs=&path=&hist=&type=&xrd=&nn=1

This change fixes the problem as now the search is successful:
Screen Shot 2021-11-25 at 17 04 55

However, the xref is still wrong, e.g.: http://demo.opengrok.org/xref/OpenGrok/opengrok-indexer/src/main/java/org/opengrok/indexer/history/PerforceHistoryParser.java?r=51e20d51#26

I'm not sure if this can have an adverse effect on other things and if we should rather try to split the defs terms by . -> but then they do not represent the whole definition (in this instance the package).

@vladak, what do you think?

@vladak
Copy link
Member

vladak commented Nov 26, 2021

Makes me wonder what will happen with e.g. C structure members. I.e. if you have a structure:

struct mystruct {
    int foo;
    char bar;
};

and it gets used in the code like this:

struct mystruct my;
my.foo += my.bar;

will it be still possible to find references of foo and bar no matter what the variable name is ?

@tarzanek
Copy link
Contributor

tarzanek commented Dec 13, 2021

workaround is of course distance search
so look for "org opengrok indexer" not further than 3 away

xref has its own analyzer, hence it might need changes there too

@tarzanek
Copy link
Contributor

if I'd have to vote, I'd vote against it ...
what would happen when you would have a sentence ending with "." and without space after it? (typo, I know)
would it still be able to search for parts?
I expect not (which answers Vlads Q above), since you'd have to use org.* to find the word, which would now be a full token
suggester however might show it as a partial hit, but full search will depend on "*"

@tarzanek
Copy link
Contributor

OR the other option is to allow it only for certain languages that use such notation ... such as java ...
but I am not sure if there is any other language ... likely not

@ahornace
Copy link
Contributor Author

workaround is of course distance search
so look for "org opengrok indexer" not further than 3 away

Unfortunately, no. As the whole term is stored, then the proposed workaround won't work. We can try to run the definitions found by ctags through the tokenizer again but that brings more problems and it's not completely correct.

@ahornace
Copy link
Contributor Author

The idea here was that the C analyzer will take care of things as my.foo += my.bar; and split it accordingly into multiple terms separated by . (the . would stored as a token as well).

The purpose of this PR was that . would be treated fairly when doing a search as now it's completely discarded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants