Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: allow pick() to rename in distinct() + some thoughts about arrange() allowing renaming. #7028

Open
olivroy opened this issue May 18, 2024 · 0 comments

Comments

@olivroy
Copy link

olivroy commented May 18, 2024

This would be useful in packages to avoid cran warnings as pick() is the new preferred way.

library(tidyverse)
options(lifecycle_verbosity = "error")
mtcars <- utils::head(mtcars, n = 6)
# distinct -----------
# renaming generally makes a lot of sense, as distinct is like `transmute(...)` + `distinct()`
# consider this in EDA
mtcars |> 
  distinct(x = vs)
#>            x
#> Mazda RX4  0
#> Datsun 710 1
# as it is equivalent to: (to avoid cran warnings)
mtcars |> 
  distinct(x = .data$vs)
# Before dplyr 1.1, `across()` was recommended, which works, but .fns = NULL is soft-deprecated.
mtcars |> 
  distinct(across(all_of(c(x = "vs"))))
# The alternative when you may not know variable names in a package
# with `pick()` should therefore work too, but errors.
# I think it should work
mtcars |> 
  distinct(pick(x = "vs"))
#> Error in `distinct()`:
#>  ℹ In argument: `pick(x = "vs")`.
#> Caused by error in `pick()`:
#> ! Can't rename variables in this context.
# Similarly:
mtcars |> 
  distinct(pick(all_of(c(x = "vs"))))

# I would expect distinct() and transmute()  + distinct() to be same as above (and it is!) ---
# transmute() is still very useful as it is mutate + select in a single call)
mtcars |> 
  transmute(x = vs) |> 
  distinct()
#>            x
#> Mazda RX4  0
#> Datsun 710 1
mtcars |> 
  transmute(x = .data$vs) |> 
  distinct()
# I think this should be allowed.
mtcars |> 
  transmute(across(all_of(c(x = "vs")))) |> 
  distinct()
#> Error in `transmute()`:
#> ℹ In argument: `pick(all_of(c(x = "vs")))`.
#> Caused by error in `pick()`:
#> ! Can't rename variables in this context.

Now comes arrange()...

arrange()

While I believe distinct() should accept renaming under any circonstances, I believe that accepting renaming in arrange() is inconsistent, as arrange() ignores it silently. renaming (or attempt to do so) should error in all cases for arrange().

However, it behaves exactly the same as distinct(). (so it is consistent in that sense, but ignores )

# arrange() is same as distinct() but silently ignores renaming...
# no x variable
mtcars |> 
  arrange(x = vs)
# no x variable
mtcars |> 
  arrange(x = .data$vs)
# no x variable
mtcars |> 
  arrange(across(all_of(c(x = "vs"))))
# this rightfully errors!
mtcars |> 
  arrange(pick(x = "vs"))
#> Error in `arrange()`:
#>  ℹ In argument: `..1 = pick(x = "vs")`.
#> Caused by error in `pick()`:
#> ! Can't rename variables in this context.

Maybe a soft-deprecation warning would be desirable for attempting to rename in arrange()
Something like names ignored, use unnamed, or remove name, or create this variable with mutate() to keep it.

#  A beginner may think that this will create the variable as nothing is stopping them.
y <- mtcars |> arrange(x = vs, y = desc(disp))

## later tries to access the variable and gets a warning or even worse. This confusing data frame.

y |> mutate(z = y **2) # gets very bad output!

Suggestions for signaling that arrange() ignores renaming.

  1. (preferred) soft-deprecation warning

something like: named input is ignored in arrange(). Please omit them.
in all_of(named_external_selection) the adjustment required would just be all_of(unname(named_external_selection))

# before
x |> arrange(new_var = var)
# now
x |> arrange(var)

if you want to create a new variable or rename, use rename() or mutate(), as arrange silently ignores names provided.

using the case of tidyselect deprecation of external selection as an example

it seems pretty similar as the deprecation of select(.data$vs) in favour of select("vs") + requiring any_of() all_of(). it ended up just increasing
general consitency of code base, and improve clarity.

# Before when you read this in someone else's code (i.e. with no context)
mtcars |>
  select(vs)
# you had no idea if the data had a `vs` column, or `vs` was an external vector
# the alternative made everything much clearer.
mtcars |>
  select(all_of(vs))

My proposed change would force users to remove this potentially misleading code
have the benefit that no one would question if the resulting data frame has a x column or not.

#6980 would be addressed automatically as part of this idea. i.e no longer allowing named selection would act a bit like a check_dots_used() condition

# before
mtcars |>
  arrange(x = vs)
# this code is clearer about what it does (and gives the same result as above)
# proposed
mtcars |> 
  arrange(vs)
  1. Breaking change (respect renaming in arrange() i.e. arrange() == mutate() + arrange()), Downsides probably outweigh benefits here at this point..

Quoting the 1.1 release notes:

across(), c_across(), if_any(), and if_all() now require the .cols and .fns arguments. In general, we now recommend that you use pick() instead of an empty across() call or across() with no .fns (e.g. across(c(x, y)). (#6523).

Relying on the previous default of .fns = NULL is not yet formally soft-deprecated, because there was no good alternative until now, but it is discouraged and will be soft-deprecated in the next minor release.

I think that allowing renaming in pick() would allow you to soft-deprecate .fns = NULL of across and warn on distinct() + across().

Edit: pick() should also allow renaming in group_by()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant