Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] [python-package] add arm64 macOS wheels #6391

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

jameslamb
Copy link
Collaborator

@jameslamb jameslamb commented Mar 29, 2024

Contributes to #5328
Contributes to #4229
Might help with #5764

  • adds arm64 macOS wheels for the Python package (for the M1/M2/M3 Macs)
  • patches OpenMP linking details in lib_lightgbm.dylib so that a pip-installed lightgbm won't segfault when used in a conda environment

Notes for Reviewers

Proposed approach for building arm64 wheels

It seems that Azure DevOps does not support arm64 macOS runners... only GitHub Actions: actions/runner-images#8971.

So I'm proposing using GitHub Actions new macos-14 runners, which are arm64 (#5328 (comment)).

Proposed approach for fixing loading errors

The problem "using a pip-installed lightgbm inside a conda environment segfaults" is not specific to arm64. It's been around in lightgbm's macOS support for a few years. I decided to try to fix it here to be able to test the new arm64 wheels.

I'm proposing the following approach:

  • continue dynamically linking to OpenMP
  • do not vendor a copy of libomp
  • patch the search paths in lib_lightgbm.dylib to be more forgiving about where libomp.dylib is found

In short:

  • the hard-coded path like /opt/homebrew/opt/libomp/lib/libomp.dylib currently stored in lib_lightgb.dylib forces the macOS loader to load a library from that exact path, even if there is already a libomp.dylib loaded from some other source (like the llvm-openmp shipped y conda)
  • this PR replaces that with @rpath/libomp.dylib, which means that at runtime of lightgbm, the loader will do the following (in order)
    1. if there is a libomp.dylib already loaded, just use that
    2. if not, first search for one in /opt/homebrew/opt/libomp/lib/
    3. if there isn't one there, search wherever one was found at build time (which might also be /opt/homebrew/opt/libomp/lib/)
    4. if one still hasn't been found, search standard paths like /usr/local/lib and /usr/lib

See "References" section for the MANY resources I consulted to get to this point. Particularly:

How I tested this

downloaded the wheel built in CI (click me)

Got a run ID from CI, looking at the URL.

e.g. for this build: https://github.com/microsoft/LightGBM/actions/runs/9476495757/job/26109485388?pr=6391

the run id is 9476495757.

Then, following https://docs.github.com/en/actions/managing-workflow-runs/downloading-workflow-artifacts?tool=cli, used the GitHub CLI to download the wheels. A fine-grained access token with just read-only permission on Actions should be enough.

From the root of the repo, on my M2 Mac laptop:

rm -rf ./whl-tmp
mkdir -p ./whl-tmp

RUN_ID=9476495757
gh run download \
    --repo microsoft/LightGBM \
    "${RUN_ID}" \
    -n macosx-arm64-wheel \
    --dir ./whl-tmp
created a new conda environment (click me)

Created a conda environment.

conda create \
  --name delete-me-lgb-dev \
  --yes \
  --file ./.ci/conda-envs/ci-core.txt \
  python=3.10 \
  pydistcheck

source activate delete-me-lgb-dev
Inspected the wheel with `pydistcheck` (click me)
pydistcheck \
    --inspect ./whl-tmp/lightgbm-4.3.0.99-py3-none-macosx_14_0_arm64.whl
==================== running pydistcheck ====================                                                                                                  
                      
checking './whl-tmp/lightgbm-4.3.0.99-py3-none-macosx_14_0_arm64.whl'
----- package inspection summary -----
file size
  * compressed size: 1.5M
  * uncompressed size: 6.5M
  * compression space saving: 76.7%
contents
  * directories: 0
  * files: 16 (1 compiled)
size by extension
  * .dylib - 6.0M (93.2%)
  * .py - 0.4M (6.4%)
  * no-extension - 21.7K (0.3%)
  * .txt - 9.0B (0.0%)
  * .typed - 0.0B (0.0%)
largest files
  * (6.0M) lightgbm/lib/lib_lightgbm.dylib
  * (0.2M) lightgbm/basic.py
  * (65.4K) lightgbm/dask.py
  * (63.6K) lightgbm/sklearn.py
  * (35.7K) lightgbm/engine.py
------------ check results -----------
errors found while checking: 0

Installed from the wheel (on my M2 mac, using macOS 14.4.1).

pip install \
    --no-deps \
    ./whl-tmp/lightgbm-4.3.0.99-py3-none-macosx_14_0_arm64.whl

Ran the tests

pytest tests/python_package_test

Those all passed! Removed the brew-installed libomp on my system and ran that again.

brew uninstall libomp
pytest tests/python_package_test

They passed again 🎉

Can lightgbm really safely link to any libomp.dylib?

This project's macOS wheels have already been doing that for years, by shipping with an absolute path that evaluates to "whatever brew install libomp installs", and no limits comparing the version linked against at build time to the one used at runtime.

The changes in this PR are actually safer than the existing behavior of lightgbm's macOS wheels because they avoid a well-known source of runtime issue (loading a libomp.dylib that conflicts with the one loaded by scikit-learn).

Why not just vendor libomp.dylib like other libraries do?

Other projects like lightgbm bundle a copy of LLVM OpenMP (libomp.dylib) in their wheels.

That comes with a significantly reduced risk of "cannot find libomp.dylib" types of runtime issues, but in exchange for:

  • risk of conflicts caused by multiple copies of the same library being loaded
  • larger wheels (and therefore environments)
  • lack of coordination across libraries (e.g. one's omp_set_num_threads() does not affect all the loaded OpenMP libraries, and so projects' parallelism can fight each other and lead to oversubscription)

Since lightgbm has already been using dynamic linking + not vendoring for several years and since "you have to brew install libomp" has generally not been a problem for macOS users, I'm proposing continuing those practices in this release.

Will these CMake changes break the lightgbm conda package?

They shouldn't.

conda-build rewrites paths in binaries it produces to correctly link between its packages. See "Making packages relocatable" (conda docs)

References

@jameslamb jameslamb mentioned this pull request May 1, 2024
33 tasks
@jameslamb jameslamb changed the title WIP: [ci] [python-package] add arm64 macOS wheels [ci] [python-package] add arm64 macOS wheels Jun 12, 2024
@jameslamb jameslamb marked this pull request as ready for review June 12, 2024 05:05
@jameslamb
Copy link
Collaborator Author

tagging some others who I think might be interested in this approach / topic:

@hcho3 @trivialfis @thomasjpfan

Copy link
Collaborator

@guolinke guolinke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants