Speculative: --only-binary by default? #9140
We could attempt to make the default more intelligent (or maybe just more magical). Basically have the implicit default be that if a wheel is found at all for some project, that project defaults to only allowing wheels.
On Nov 16, 2020, at 12:40 PM, Paul Moore wrote:
What's the problem this feature will solve?
A lot of users are reporting issues when there's no Python 3.9 binary for projects they need, and pip tries to build from source and fails with an obscure error (because the user doesn't have a compiler, or isn't set up to build the relevant packages).
Describe the solution you'd like
Pip shouldn't try to build from source if the user isn't prepared to deal with build errors. As it's not possible to know the user's level of expertise, we should err on the side of caution, and by default only allow wheels to be installed. Users who know they need to install from source and have checked that they can do so, can explicitly say so using a new --allow-source flag, which acts as an "opt-in" to source builds.
Alternative Solutions
Improve the error messages when a source build fails. This is hard, because the details of what went wrong are entirely the responsibility of the build backend.
Additional context
I don't realistically think this can be added without a lot of disruption, but given that significant numbers of projects ship wheels these days, maybe it isn't as unthinkable as it once was. I do think it's worth discussing the implications, if only as a thought experiment, and I don't know where else we could do that apart from here.
One big problem area is that we can't distinguish between "pure Python" projects that are shipped only as sdists, but which only need Python to build, and complex projects that need a compiler. So restricting to wheels only would require an explicit opt-in for some projects which currently install with no issue.
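The proposal above amounts to a filter over the files the finder discovers. The following is a minimal sketch under stated assumptions, not pip code: files are identified by filename alone, allowed_files is a hypothetical name, and allow_source stands in for the suggested --allow-source opt-in, which is not an existing pip option. The Python/platform tag checks that pip also performs are left out.

```python
from __future__ import annotations


def is_wheel(filename: str) -> bool:
    """A wheel is recognised by its .whl extension; anything else is treated as source."""
    return filename.endswith(".whl")


def allowed_files(filenames: list[str], allow_source: bool = False) -> list[str]:
    """Files an installer would consider under the proposed wheels-only default.

    allow_source is a hypothetical stand-in for the proposed --allow-source
    opt-in; tag/platform compatibility checks are deliberately omitted.
    """
    if allow_source:
        return list(filenames)
    return [f for f in filenames if is_wheel(f)]


# A compiled project that ships wheels: its sdist is no longer a candidate.
compiled = ["pkg-1.0.tar.gz", "pkg-1.0-cp38-cp38-win_amd64.whl"]
# A pure-Python project that only ships an sdist: it now needs the opt-in,
# which is the drawback noted above.
pure_python = ["simple-2.0.tar.gz"]

print(allowed_files(compiled))                        # ['pkg-1.0-cp38-cp38-win_amd64.whl']
print(allowed_files(pure_python))                     # []
print(allowed_files(pure_python, allow_source=True))  # ['simple-2.0.tar.gz']
```

Today a user can get the same behaviour explicitly with --only-binary :all:. The proposal is only about making that the default.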
|
But we can’t know what “all projects” means before deciding whether to set the flag, since dependency information is inside the sdist/wheel 🙃 |
@uranusjr I'm suggesting making |
Oops, my previous response was toward @dstufft’s “intelligent” suggestion. Sorry for the confusion. To express my thoughts in more words, I think the “only wheels unless some project needs to compile from source” approach would be very difficult to implement, since the two parts of the logic depend on each other. I would much prefer @pfmoore’s original suggestion of having --only-binary :all: unless the user explicitly allows source distributions. |
The logic isn’t hard and has nothing to do with dependency information.
Current logic is roughly:
1. Fetch a list of links from the index for project X.
2. Filter said list of links using the value of --only-binary (among other things, like platform tags).
3. Return list of links for use in the dep solver.
The proposed change only slightly alters the logic in step 2: unless the user has explicitly configured --only-binary, we will set its value implicitly by inspecting the entire list of links we’ve discovered for project X and determining whether any of them is a wheel file.
This is simple, and would prevent the breakage that Paul is currently seeing: projects which generally make wheels available, but haven’t for this version of Python / OS / whatever.
It wouldn’t change anything for projects which don’t ship wheels at all, some of which will be pure Python, some of which will be compiled code, but in any case there’s no “upgrade to Python 3.9 and suddenly start compiling code” problem for these projects since they are consistent in what they require.
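Step 2 of the logic above, with the implicit per-project default, might look roughly like this. It is a sketch under simplifying assumptions, not pip's finder code: links are bare filenames, filter_links is a hypothetical name, and only_binary=None stands for "the user did not configure --only-binary for this project".

```python
from __future__ import annotations

from typing import Optional


def is_wheel(filename: str) -> bool:
    return filename.endswith(".whl")


def filter_links(links: list[str], only_binary: Optional[bool] = None) -> list[str]:
    """Filter one project's links, inferring only_binary when it was not configured.

    An explicit setting is respected. Otherwise it is inferred from the
    project's whole link list: any wheel at all means only wheels are kept.
    Tag/platform filtering is omitted from this sketch.
    """
    if only_binary is None:
        only_binary = any(is_wheel(link) for link in links)
    if only_binary:
        return [link for link in links if is_wheel(link)]
    return list(links)


# Started shipping wheels at 3.1: the 3.0 sdist disappears as well.
print(filter_links(["p-3.0.tar.gz", "p-3.1.tar.gz", "p-3.1-py3-none-any.whl"]))
# ['p-3.1-py3-none-any.whl']

# Never ships wheels: nothing changes.
print(filter_links(["q-1.0.tar.gz", "q-2.0.tar.gz"]))
# ['q-1.0.tar.gz', 'q-2.0.tar.gz']
```

The first case, where every older sdist vanishes as soon as one wheel appears, is the side effect discussed in the following paragraph.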
The biggest issue I see with this is that, in trying to make our default smarter so that it doesn't break certain kinds of projects, we make it easier for projects to accidentally break their users. If my project historically did not upload wheels, and then I start uploading wheels with version 3.1, all previous versions suddenly stop working unless the user opts in via some flag. This happens without any obvious change on the user's side (upgrading pip is an obvious change, but something I install starting to upload wheels is not).
We could work around that problem by reducing the blast radius of the implicit “wheels only” setting: by default, we would only filter out non-wheel links for versions where we’ve also found a wheel. Thus if we find an sdist for 1.0, 2.0, 3.0, and 3.1 and we find a wheel for 3.1, we will filter the list of links so that it contains the sdists for 1.0, 2.0, and 3.0 and the wheel for 3.1.
This makes it so that as soon as you upload a wheel for a given version, you’re effectively signaling that not only should a wheel version be preferable, but that the sdist should only be used if explicitly configured to by the user.
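The reduced-blast-radius variant drops non-wheel links only for versions that also have a wheel. A minimal sketch, assuming each link is a hypothetical (version, filename) pair so that version parsing can be skipped, reproduces the 1.0/2.0/3.0/3.1 example:

```python
from __future__ import annotations


def filter_links_by_version(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop non-wheel links only for versions that also have a wheel.

    Each link is a (version, filename) pair; deriving the version from a real
    filename is left out of this sketch.
    """
    versions_with_wheels = {v for v, f in links if f.endswith(".whl")}
    return [(v, f) for v, f in links if f.endswith(".whl") or v not in versions_with_wheels]


links = [
    ("1.0", "proj-1.0.tar.gz"),
    ("2.0", "proj-2.0.tar.gz"),
    ("3.0", "proj-3.0.tar.gz"),
    ("3.1", "proj-3.1.tar.gz"),
    ("3.1", "proj-3.1-py3-none-any.whl"),
]
for version, filename in filter_links_by_version(links):
    print(version, filename)
# 1.0 proj-1.0.tar.gz
# 2.0 proj-2.0.tar.gz
# 3.0 proj-3.0.tar.gz
# 3.1 proj-3.1-py3-none-any.whl
```

Older sdist-only releases stay installable; only the sdist of a version that also has a wheel is hidden.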
On Nov 17, 2020, at 4:48 AM, Tzu-ping Chung wrote:
Oops, my previous response was toward @dstufft’s “intelligent” suggestion. Sorry for the confusion.
To express my thoughts in more words, I think the “only wheel unless some project needs to compile from source” would be very difficult to implement since the two parts in the logic depend on each other. I would much prefer @pfmoore’s original suggestion of having --only-binary :all: unless the user explicitly allows source distributions.
|
Maybe we simply make |
I like it? I think something like 98% of packages on PyPI have wheels in the latest release, so I don't think this is catastrophically bad.
IMO one of the improvements we should make here is adding a sentence like: "This failure occurred while trying to generate [a wheel / metadata] for packageName. This is not an error in pip." This applies to the approach proposed here as well: clearer error messaging would be good. :) |
I suspect the number would be significantly lower if you counted by share of downloads instead. There are a bunch of popular pure-Python projects that don’t bother with wheels because the benefit is minimal. |
I'm pretty sure that's a figure I gave you, and I found the bug in my calculation a bit later 🙁 I need to re-do the sums, but I think it's a lot lower than that, unfortunately.
The number's a lot lower without doing the sums incorrectly 🙂 Sorry about that. I don't have download information, but I'm re-doing the numbers right now, and I'll see what things look like if you factor in "uploaded a file in the last 12 months" as well. I might try getting download numbers from the BigQuery data for offline analysis. Downloads per project, per year (or per month?) might be sufficiently interesting, if I can work out how to get them relatively easily in CSV or a similar format.
To confirm, my query has just completed. Comparing "number of projects that distribute sdists but no wheels for their latest version" with "number of projects that distribute wheels for their latest version", the numbers are almost identical (124508 vs 124782). Looking only at projects that have released at least one file in the last year, the values are 32890 and 66635. So about half of all projects, and about two-thirds of projects active in the last year, have wheels. As I say, I think that however we did this, it would result in a lot of breakage. |
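As a quick check of the shares quoted above, here is the arithmetic as a short snippet. It treats each pair of counts as a two-way split (latest version sdist-only vs. latest version with wheels), so the percentages are shares within those two groups, not of every project on PyPI.

```python
from __future__ import annotations

# Counts quoted above: projects whose latest version ships only sdists,
# versus projects whose latest version ships wheels.
all_projects = {"sdist_only": 124508, "with_wheels": 124782}
active_last_year = {"sdist_only": 32890, "with_wheels": 66635}


def wheel_share(counts: dict[str, int]) -> float:
    """Fraction of projects in this two-way split whose latest version has wheels."""
    return counts["with_wheels"] / (counts["sdist_only"] + counts["with_wheels"])


print(f"all projects:     {wheel_share(all_projects):.1%}")      # 50.1%
print(f"active last year: {wheel_share(active_last_year):.1%}")  # 67.0%
```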
It's a backwards-incompatible change, so regardless it's going to break someone. The goal behind my proposal is to limit the blast radius, so that the breakage is confined either to specific projects or to specific versions within a project. I think there are two questions here too:
I'm not sure about the long-term "right" answer. I can see an argument that we want to encourage wheels where possible, but I also think that there are some projects that simply cannot be shipped as wheels, and maybe never will be. We need to figure out whether going wheel-only by default will end up being worth it, or whether it will push too many projects out of viability. For the second one, I think making the default filter out sdists for any project version that has any wheels uploaded solves the main driver of this proposal, without breaking projects that are not shipping wheels (or that used to ship wheels but found that to be problematic). That could be useful as a stepping stone toward a wheel-only default (for instance, we could emit a warning when installing from an sdist), or it could be a reasonable end state that solves the surprising accidental sdist install without dropping support for sdist-only projects by default. |
I think we could do a lot better if we could somehow identify which projects are "hard" to build from source. I feel like blocking sdists that build into universal wheels is going a bit far. In the most general sense, that's basically impossible, but maybe we could add metadata somewhere (in the simple index?) to mark "pure Python" projects? I agree it's not clear what the best long term answer is. We're seeing a lot more people using Python nowadays who honestly don't want to, or know how to, deal with building stuff from source. For those people, pip downloading a sdist that needs a compiler to build is almost certainly just a source of problems. But they are also precisely the sorts of user who won't know enough to add |
I wonder if we can leverage PyPI in some way to encourage wheels, or at least to surface better information about which projects don't ship wheels? This might be a better question for Discourse? I dunno. |
I've got a big chunk of downloaded data from PyPI that I am querying to get a better feel for this sort of stuff. The biggest problem is the vast amount of (to be polite) "limited value projects" on there - without some form of insight, it's hard to know for sure whether it's OK to ignore a project called "0html" or "django-3-jet-zupit" - especially when it comes up in the same query as "090807040506030201testpip"... |
What if PyPI automatically built wheels for the simplest pure-Python projects? There’s recent interest in detecting malicious source distributions on PyPI, and the wheel produced as a side effect of that could be reused. |
Any more thoughts here? I especially like the idea
The metadata option also seems reasonable: the scientific Python community could then mark NumPy, SciPy, TensorFlow, and PyTorch as "prefer binary by default" and save a lot of CI and cloud resources. |
I like the idea as well, maybe with a twist: versions with only an sdist are excluded, unless there are no wheels available for any version prior to that one. Taking django-grappelli as an example, this means that
|
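The twist suggested above (exclude sdist-only versions once any earlier version has shipped wheels), combined with dropping sdists for versions that do have wheels, can be sketched as follows. Assumptions: links are hypothetical (version, filename) pairs, versions are plain dotted integers rather than full PEP 440 versions, and filter_with_twist is a made-up name. The example also shows the objection raised in reply: the pre-wheel 1.0 sdist stays selectable.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    # Simplification: plain dotted integers only, not full PEP 440 versions.
    return tuple(int(part) for part in version.split("."))


def filter_with_twist(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep wheels where a version has them; keep sdist-only versions only
    if no earlier version had any wheel."""
    by_version: dict[str, list[str]] = {}
    for version, filename in links:
        by_version.setdefault(version, []).append(filename)

    kept: list[tuple[str, str]] = []
    wheel_seen_earlier = False
    for version in sorted(by_version, key=parse_version):
        files = by_version[version]
        wheels = [f for f in files if f.endswith(".whl")]
        if wheels:
            # This version has wheels, so its sdist is dropped.
            kept.extend((version, f) for f in wheels)
            wheel_seen_earlier = True
        elif not wheel_seen_earlier:
            # Sdist-only version from before the first wheel: still allowed.
            kept.extend((version, f) for f in files)
        # Otherwise: sdist-only version released after wheels appeared, excluded.
    return kept


links = [
    ("1.0", "proj-1.0.tar.gz"),
    ("2.0", "proj-2.0.tar.gz"),
    ("2.0", "proj-2.0-py3-none-any.whl"),
    ("2.1", "proj-2.1.tar.gz"),
    ("3.0", "proj-3.0.tar.gz"),
    ("3.0", "proj-3.0-py3-none-any.whl"),
]
print(filter_with_twist(links))
# [('1.0', 'proj-1.0.tar.gz'), ('2.0', 'proj-2.0-py3-none-any.whl'), ('3.0', 'proj-3.0-py3-none-any.whl')]
```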
For another data point, here is an issue filed by a Python 3.5 user of cffi who could not build from the sdist; changing the default would have helped them. |
Please edit the title binary-only -> only-binary. I always have to check |
FWIW, that tells me that we should add an alias for that option. |
+1 for a solution via either package metadata or via a simple rule like " Otherwise it has the risk of becoming a
This does not seem like a good idea. Not only is it harder to understand, it also partially defeats the purpose here. If a package has a very old source-only release (e.g., from the pre-wheels era), then that release will be found the moment there's no suitable wheel for a user. In your particular example, |
Makes sense. I think it's quite difficult to gauge the actual impact here, since the people in this discussion all care a lot about Python packaging (for obvious reasons) and likely push for wheels in the projects we are involved in. So I feel the only way forward is to actually try to implement this (maybe as a There are probably still some implementation details we need to sort out. Should we go with |
I would propose an alternate path forward. Rather than changing the default behavior of As has been mentioned above, pip currently mixes two different things (building and installing from source, and installing from pre-built binaries), and I think it is a mistake to tilt pip even further toward being a binary-only package manager. By adding a new CLI entry point, it is possible to make whatever changes are needed to make pip behave like a binary package manager without having to worry about breaking existing users. I think another issue here is a disagreement about what exactly wheels are for. I have always considered (and I may be the only one to hold this position) the sdist to be the canonical source of truth for what the released version of the package on PyPI is, with the wheels provided for the convenience of the user (the Linux wheel spec is "manylinux", which suggests it is a best-effort rather than an authoritative artifact!). I think making pip more like a binary package manager by default will only reinforce the expectation that projects will (promptly) provide a wheel for your platform / Python version / Python implementation, and that one not existing is a "bug". There was a discussion on the NumPy mailing list about the ever-expanding number of platforms that projects are expected to provide wheels for becoming unsustainable (the latest beta release of Matplotlib has 21 wheels and we are not yet covering the full Python version/Python implementation/arch/OS matrix https://pypi.org/manage/project/matplotlib/release/3.5.0b1/). If pip is going to keep going down the path of binary packaging, I think there need to be more discussions about how the burden of filling out the build matrix can be lifted from the projects to some centralized build service, as Homebrew, conda-forge, and the Linux distributions already do. Separating the wheels into their own channel/management chain would also make it much easier to handle things like updating version pinning on the wheels post facto (e.g. putting an upper bound on something or banning known-bad version combinations), re-building with updated versions of non-Python dependencies (xref h5py/h5py#1942), or dealing with CVEs. |
How can we make the abstract discussion here more concrete? I see a couple of subjects being mixed together
I apologize if I missed some of the topics here; please feel free to add to the table. The next question is who will do the work ... |
I’m dropping a link to the RFC proposing to disable install scripts by default for NPM, which would have roughly the same effect as making |
Previously @rgommers was looking into getting funding for this. Is that still in progress, or did it end up not getting anywhere? Regardless, I see no problem with having two attempts to get funding under way 🙂 |
FWIW, I just realised that we have a clear migration mechanism for allowing people to build wheels intentionally: With that, the migration in broad strokes would look something like:
|
I see that I failed to reply to this in August, apologies. Thanks for asking @pfmoore. This topic is still on my radar and of high interest. Regarding funding: I did not manage to get it externally funded, but I did/do plan on self-funding it from my team's budget (assuming the plan I outlined seemed reasonable, and there's good confidence we can execute). A response on that effort and budget estimate would still be great (I'll ping everyone). This year we invested significantly more effort in packaging topics than the year before. This particular one took a back seat to some others that were higher-effort than expected, in particular:
I'm really looking forward to those two things being sorted out completely (I think by Q2 2023).
Agreed. |
That seems very reasonable to me.
This is very unintuitive UX by the way, I've been misled by it multiple times. The |
It's mostly there to let you do |
FWIW, I think we made decent progress on the error-messaging front with #10795 (which was mentioned and discussed as an alternative to doing this). I do want to eventually set up something akin to https://sphinx-theme-builder.readthedocs.io/en/latest/errors/ within pip's documentation, but we're talking longer-term goals for #10421. :) |
As an outsider lurker, I think making When installing from sdists, pip could print a big fat warning like
or
with the wording subject to reflection of course, but you get the idea. |
Some more thoughts. I think pip should try to be much more informative in its error messages than it is currently. Granted, building the package is under the sole responsibility of the build backend. However, 95% of problematic cases should be covered by a few simple heuristics.
Current message:
Desired warning:
Because if there are wheels specific to some Python versions but no cross-Python (
Current message:
Desired warning:
Overall, I have to say that pip's current error messages are sometimes quite uninformative; I would recommend improving them first and seeing if the current problem of zillions of people reporting that pip doesn't work persists. Yes, it's true that many people won't read them — but many will, too. |
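One of the suggested heuristics, "this project has wheels, just not for your Python", could be sketched like this. It is a rough illustration, not pip code: it only inspects the Python tag in wheel filenames and ignores ABI and platform tags (so, for example, abi3 wheels that work across versions would be misreported), and missing_wheel_hint is a hypothetical name. As the reply below points out, the hard part in pip is that this information lives in the finder, before anyone knows whether a build will actually happen.

```python
from __future__ import annotations

import sys


def python_tags(wheel_filename: str) -> set[str]:
    """Python tag(s) of a wheel, e.g. {'cp38'} or {'py2', 'py3'}.

    Wheel filenames end in -{python tag}-{abi tag}-{platform tag}.whl, and a
    python tag may be a '.'-separated set such as 'py2.py3'.
    """
    parts = wheel_filename[: -len(".whl")].split("-")
    return set(parts[-3].split("."))


def missing_wheel_hint(filenames: list[str], major: int, minor: int) -> str | None:
    """Return a hint if the project has wheels, but none for this Python version."""
    wheels = [f for f in filenames if f.endswith(".whl")]
    if not wheels:
        return None  # sdist-only project: building from source is the normal path
    accepted = {f"cp{major}{minor}", f"py{major}", f"py{major}{minor}"}
    if any(python_tags(w) & accepted for w in wheels):
        return None  # a wheel for this Python exists (ABI/platform not checked)
    return (
        f"This project publishes wheels, but none for Python {major}.{minor}; "
        "pip will try to build it from source, which may require a compiler."
    )


files = ["pkg-1.0.tar.gz", "pkg-1.0-cp38-cp38-win_amd64.whl"]
print(missing_wheel_hint(files, 3, 9))  # prints the hint
print(missing_wheel_hint(files, *sys.version_info[:2]))  # depends on the running interpreter
```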
I don't disagree with you, and it would be nice if this could be done. But I hope you don't think that we haven't been trying to improve things here. If anyone has any good ideas on how to improve things, we'd love to work with them on this. But the important point is thinking about how improved messages can be implemented - it's often easy to think "it would be nice if pip could tell me XYZ" but when you look at the code, it becomes impossible to even see how pip can ever know that XYZ is the case. To give a specific example, I have no idea how we could usefully deliver the warnings you suggest. Information on what wheels are available is only available in the finder, and when the finder is called, we have no assurance that we'll ever do a build of that package. Furthermore, if we do select a source-only candidate, there's still no certainty that we'll do a build - we'll call the backend hook to prepare the metadata as part of the resolution process, but the backend might very well not do a build at that point, it might be able to calculate metadata without needing to do a build (for example, setuptools has the All of which is to say that your suggestions are really useful feedback, and match a lot of other suggestions we've seen (including a number we received from the user interface work that was funded a few years back). But unless someone comes along to explain how we can implement these ideas in practice, they will never become anything more than "nice to have" suggestions, I'm afraid. |
Just found this discussion whilst looking for options that help protect me and the inexperienced users I regularly have contact with against the apparently growing trend of malicious packages that act through arbitrary code execution on package install (I wrote about that here referencing the 2022 W4SP stealer amongst other resources). Am I right in thinking the A carefully worded opt-in flag that can be set as an environment variable would seem to give users who decide it's worth the risk a minimal-fuss option to continue as before (maybe |
Using Making it a default is generally considered to be a good idea, as well, but we need to plan the transition, and that's what this is stalled on. Breaking every package that hasn't uploaded wheels isn't really a good move... |
To be clear, IDK if this was just a typo, or a misunderstanding. It may be surprising if you're used to thinking of 'binary' meaning executable, as in the |
I think the confusion comes from thinking about install time versus run time. Both can run arbitrary code at run time (when the package is imported). But at install time, a binary only needs to be downloaded, not executed, because it has already been pre-built (which could include compiled executables). A non-binary, by contrast, is source code that needs to be built, and building it can involve running arbitrary code. |
Btw, as real-world evidence: the now-defunct rip project started off supporting only wheels (binaries), which made lots of resolutions impossible, and I wasn't able to test it against any real-world work project I had until it started to support sdists. |
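The install-time difference can be illustrated with the standard library: a wheel is a ZIP archive (PEP 427), so installing it is essentially unpacking files, whereas an sdist has to be handed to its build backend, which runs the project's build code. In the sketch below, the archive is a hypothetical toy wheel whose module would raise if it were ever executed; unpacking it never runs that code. Real installers also process RECORD, generate entry-point scripts and so on, none of which executes package code.

```python
from __future__ import annotations

import io
import tempfile
import zipfile
from pathlib import Path


def make_toy_wheel() -> bytes:
    """Build a tiny wheel-like ZIP in memory (hypothetical 'demo' package)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as whl:
        # If this module were ever imported or executed, it would raise.
        whl.writestr("demo/__init__.py", "raise RuntimeError('package code ran!')\n")
        whl.writestr(
            "demo-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n",
        )
    return buffer.getvalue()


def unpack(archive: bytes, target: Path) -> list[str]:
    """Extract every member into target; nothing in the archive is executed."""
    with zipfile.ZipFile(io.BytesIO(archive)) as whl:
        whl.extractall(target)
        return sorted(whl.namelist())


with tempfile.TemporaryDirectory() as site_packages:
    names = unpack(make_toy_wheel(), Path(site_packages))
    extracted = (Path(site_packages) / "demo" / "__init__.py").exists()
    print(names)      # ['demo-1.0.dist-info/METADATA', 'demo/__init__.py']
    print(extracted)  # True
```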
@takluyver you're right, it was a typo on my part - I did mean |
I think there's another approach here that's more straightforward to implement:
The inspiration here is that a few projects already do not provide sdists at all, because the build process is so difficult that they don't want users to try. That's the current workaround for this issue, and while it's deeply unfortunate that sdists don't exist for them, the nice thing about that workaround is that it's opt-in on a per-project basis. This proposal merely formalizes that workaround and makes the sdists available if you really want them. Existing projects that provide only sdists, or that do not reliably provide wheels, would continue to upload normal sdists, and those would be resolved as usual. No changes to build tools are strictly required (as they would be with the proposals to add metadata); all you need to do is rename the file. We certainly could make this nicer, but it's not necessary. This is backwards-compatible with existing versions of pip (and other installers): for projects that provide discouraged sdists, they will simply not recognize the discouraged sdists as a file type they can use, and they'll gracefully degrade to the same behavior as if no sdists were uploaded at all. And for projects that don't, the current behavior (including attempting to build normal sdists) is correct. (We can probably come up with a better name than "discouraged" ... "manual"? "intentional"?) Does this seem like a reasonable approach? If so, I can suggest it on the forums, since it's mostly not a pip change. (Thanks @zooba for mentioning this issue to me in another forum thread.) |
Doing that would require a new standard, and hence a PEP. It's not something pip (or PyPI) would adopt unless it was standardised. Personally, I don't think it's something we should try to standardise, but if you want to, then feel free to develop a PEP for it. I'd rather we simply made |
I think the only unavoidable complication is how to handle packages that have an sdist but no wheels at all. In that case, I'd prefer the default to be to use the sdist, but that's a complication (and if I specify |
Maybe it needs a |
Which is what already happens, so it's slightly stronger than that in that the presence of any binary options at all prevents choosing the source option. I don't have a problem with the naming, just that the obvious interpretation isn't what we need here. |
Well, from the discussion on this thread, that is quite complicated itself, right? I think the least complicated proposal is Did I miss something in the discussion above that makes this change easy to implement? I'm happy to write the code if so, I've got a bit of free time and I care about this problem, but I didn't see any designs that looked uncomplicated - but I did see comments about fundraising and a careful rollout process with community outreach. :) My proposal does not change any existing behavior, and is therefore safe to roll out immediately without any coordination.
Yes, agreed this is a PEP and not just a pip issue. I started a discussion here: https://discuss.python.org/t/preventing-unwanted-attempts-to-build-sdists/54169 |
Yes. I just think it's less complicated than adding a new type of sdist into the mix. |
I will say: I think using a rollout coupled with a specific unreleased-at-start-time Python version will be a better mechanism to do this. |