Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds limit parameter to ansible.builtin.find #83153

Open
wants to merge 14 commits into
base: devel
Choose a base branch
from

Conversation

colin-nolan
Copy link
Contributor

@colin-nolan colin-nolan commented Apr 28, 2024

SUMMARY

ansible.builtin.find currently looks for matches in all non-filtered files. When there is a very big file system in the search scope, and/or an expensive search is made (e.g. based off whole file contents), the operation can be very expensive.

This PR adds the ability for the user to limit the number of matches that are made. Use cases that would benefit/be enabled with this functionality include:

  • Determining if there are > n matching files in a directory.
  • Finding the first directory in a nested tree with any files.
  • Discovering where the file with matching content is (stopping once found!).
ISSUE TYPE
  • Feature Pull Request
ADDITIONAL INFORMATION

Try the module using:

# Can't run module in place due to import issues caused by other modules in directory
cp lib/ansible/modules/find.py /tmp/find.py && python /tmp/find.py <<EOF
{
    "ANSIBLE_MODULE_ARGS": {
        "paths": "/var/log",
        "file_type": "file",
        "contains": "wally",
        "read_whole_file": true,
        "patterns": "^.*\\\.log$",
        "use_regex": true,
        "recurse": true,
        "limit": 1
    }
}
EOF

@ansibot ansibot added feature This issue/PR relates to a feature request. needs_triage Needs a first human triage before being processed. module This issue/PR relates to a module. labels Apr 28, 2024
@mkrizek mkrizek removed the needs_triage Needs a first human triage before being processed. label Apr 30, 2024
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label May 7, 2024
@colin-nolan colin-nolan closed this May 8, 2024
@colin-nolan colin-nolan reopened this May 8, 2024
@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label May 8, 2024
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label May 22, 2024
@colin-nolan colin-nolan reopened this May 30, 2024
@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label May 30, 2024
@colin-nolan
Copy link
Contributor Author

I appreciate you likely have more important work to deal with but tagging module author @bcoca, as it feels like ansibot should have.

@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jun 9, 2024
Copy link
Member

@bcoca bcoca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure this feature is needed as it is a simple thing to cut down the return of files to only those you need returned_files[:5].

Though this can be a performance improvement, it would only be really noticeable with huge amounts of files.

- Set to V(-1) for unlimited matches.
- Default is unlimited matches.
type: int
default: -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use None for 'unset/unlimited'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change. Am I correct in thinking that the type should still be int, given there's no Optional[int] type (https://docs.ansible.com/ansible/latest/dev_guide/developing_program_flow_modules.html#argument-spec)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have changed. I'm not sure what the correct value to set as default in the docs should be now that it's None (see test failure). Any help would be appreciated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just remove it, None is the default for default. All types allow None as a default as it is 'unset'.

@@ -154,6 +154,15 @@
- When doing a C(contains) search, determine the encoding of the files to be searched.
type: str
version_added: "2.17"
max_matches:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would rename limit or 'maximum` as this does not relate to a 'match' field, but to all the conditions the other options can set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-named to limit.

@@ -154,6 +154,15 @@
- When doing a C(contains) search, determine the encoding of the files to be searched.
type: str
version_added: "2.17"
max_matches:
description:
- Set the maximum number of matches to make before returning.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more like:
The maximum number of matching paths returned, after finding this many the find action will stop looking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -530,7 +553,8 @@ def handle_walk_errors(e):
if not os.path.isdir(npath):
raise Exception("'%s' is not a directory" % to_native(npath))

for root, dirs, files in os.walk(npath, onerror=handle_walk_errors, followlinks=params['follow']):
# Setting `topdown=True` to explicitly guarantee matches are made from the shallowest directory first
for root, dirs, files in os.walk(npath, onerror=handle_walk_errors, followlinks=params['follow'], topdown=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this changes the normal search by adding topdown, this is not really backwards compatible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would even add this as an option before i limit matching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

topdown is True by default (https://docs.python.org/3/library/os.html#os.walk) so it should not change the current behaviour. The intention was to make it explicit that the module is guaranteed to be topdown - perhaps the docs is a better place for this though?

I'm not sure how many people will have a use-case where they'd want to switch between topdown/bottomup. However, I'm happy to add it if you think it could be useful.

@ansibot ansibot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Jun 10, 2024
@ansibot ansibot added the stale_review Updates were made after the last review and the last review is more than 7 days old. label Jun 10, 2024
@colin-nolan colin-nolan changed the title Adds max_matches parameter to ansible.builtin.find Adds limit parameter to ansible.builtin.find Jun 10, 2024
@ansibot ansibot added ci_verified Changes made in this PR are causing tests to fail. and removed stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Jun 10, 2024
@ansibot
Copy link
Contributor

ansibot commented Jun 10, 2024

The test ansible-test sanity --test validate-modules [explain] failed with 2 errors:

lib/ansible/modules/find.py:0:0: doc-default-incompatible-type: Argument 'limit' in documentation defines default as ('None') but this is incompatible with parameter type 'int'
lib/ansible/modules/find.py:0:0: incompatible-default-type: DOCUMENTATION.options.limit: Argument defines default as ('None') but this is incompatible with parameter type int: invalid literal for int() with base 10: 'None' for dictionary value @ data['options']['limit']. Got {'description': ['Limit the maximum number of matching paths returned. After finding this many, the find action will stop looking.', 'Matches are made from the top, down (i.e. shallowest directory first).', 'Set to V(None) for unlimited matches.', 'Default is unlimited matches.'], 'type': 'int', 'default': 'None', 'version_added': '2.18', 'version_added_collection': 'ansible.builtin'}

click here for bot help

@colin-nolan
Copy link
Contributor Author

@bcoca many thanks for reviewing. I have addressed most of your comments. Just two things left to resolve:

  1. How to document a default of None, where None is not of type int (currently failing a check).
  2. Whether you think an option to change between topdown/bottom up is required. I'd personally suggest putting it out of scope, as it's difficult to think of many use-cases where it would be useful.

I'm not even sure this feature is needed as it is a simple thing to cut down the return of files to only those you need returned_files[:5].

Though this can be a performance improvement, it would only be really noticeable with huge amounts of files.

This is the exact situation that I have - lot's of files, some of them very large. The module isn't really viable for my use-case without this option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci_verified Changes made in this PR are causing tests to fail. feature This issue/PR relates to a feature request. module This issue/PR relates to a module. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. stale_review Updates were made after the last review and the last review is more than 7 days old.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants