Feature request: Support names from the usb.ids database. #331

Open
Gestas opened this issue Oct 3, 2020 · 18 comments

Comments

@Gestas

Gestas commented Oct 3, 2020

Feature: Export vendor and product names as described in the usb.ids file. Slightly related to #330.

The usb.ids file is the canonical mapping of vendor and product names to IDs. It is used by almost all USB-related tooling that returns vendor and product names.

libusb support here is limited and inconsistent, and I'm not sure what's going on. On my Ubuntu 20.04 system it doesn't return vendor names for IDs [0bda, 8087], but it does for [1d6b, 13d3]. The name returned for 13d3 is incorrect. All of these devices are included in the usb.ids file.

>>> import usb.core
>>> _ds = usb.core.find(find_all=True)
>>> for d in _ds:
...    print(d)
# Partial results -
DEVICE ID 0bda:0316 on Bus 004 Address 002 =================
 ...
 iManufacturer          :    0x1 Generic    ## Missing vendor name
 iProduct               :    0x2 USB3.0-CRW  ## Incorrect product description
 iSerialNumber          :    0x3 20120501030900000
 ...
DEVICE ID 1d6b:0003 on Bus 004 Address 001 =================
 ...
 iManufacturer          :    0x3 Linux 5.3.0-7625-generic xhci-hcd
 iProduct               :    0x2 xHCI Host Controller
 iSerialNumber          :    0x1 0000:3b:00.0
 ...
DEVICE ID 13d3:56b2 on Bus 001 Address 003 =================
 ...
 iManufacturer          :    0x1 SunplusIT Inc  ## Incorrect vendor name
 iProduct               :    0x2 Integrated Camera   ## Incorrect product description
 iSerialNumber          :    0x0 
 ...
DEVICE ID 8087:0a2b on Bus 001 Address 002 =================
 ...
 iManufacturer          :    0x0   ## Missing manufacturer name
 iProduct               :    0x0  ## Missing product name
 iSerialNumber          :    0x0 
 ...

There are several implementation options. If you think this belongs in PyUSB, let me know and we can figure something out.

@jonasmalacofilho
Member

First, I think there would be value in being able to reliably access the USB IDs database from Python.

That said, there are two major sources of USB ID -> name mappings:

  • compiled USB IDs databases (like the one used by the Linux kernel)
  • manufacturer and product string descriptors returned by the device (when available)

PyUSB, for instance, only returns the string descriptors; lsusb returns the string descriptors in verbose mode, but in normal mode (IIRC) it uses the names it receives from udev (which in turn should come from the USB IDs database).


So I think we need to answer:

  • Can we reliably access a (already available) USB IDs database on all Linux distros?
  • How about BSDs?
  • And MacOS?
  • And Windows?
  • Based on the above, do we need to ship our copy of the database?
  • What's the size of the Linux USB IDs database? In raw form, 242 KB compressed.
  • What interface would we provide our users?
  • And, finally, do we put this in PyUSB or in a separate library?

@Gestas
Author

Gestas commented Oct 5, 2020

@jonasmalacofilho I did some digging; it looks like the usb.ids database is consistently available on most *nix-like systems (excluding Darwin) via userspace tooling, at one or more of these paths:

/usr/share/doc/usb.ids
/var/lib/usbutils/usb.ids
/usr/share/usb.ids                  # Debian & derivatives
/usr/share/hwdata/usb.ids           # RHEL & derivatives
/usr/local/share/usbids/usb.ids     # BSD & derivatives
/usr/share/misc/usb.ids             # Raspbian 

As an example, Debian's lsusb (lsusb.py.in) searches for the first existing file in this list:

# Lines 30 - 35
usbids = [
	"@usbids@", # CLI parameter
	"/usr/share/usb.ids",
	"/usr/share/libosinfo/usb.ids",
	"/usr/share/kcmusb/usb.ids",
]
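Mirroring that first-existing-file behaviour in Python takes only a few lines. This is a minimal sketch: `find_usb_ids` and `DEFAULT_USB_IDS_PATHS` are hypothetical names, and the candidate list simply reuses the paths collected above.

```python
import os
from typing import Iterable, Optional

# Candidate locations collected above (hypothetical constant name).
DEFAULT_USB_IDS_PATHS = (
    "/usr/share/doc/usb.ids",
    "/var/lib/usbutils/usb.ids",
    "/usr/share/usb.ids",
    "/usr/share/hwdata/usb.ids",
    "/usr/local/share/usbids/usb.ids",
    "/usr/share/misc/usb.ids",
)


def find_usb_ids(candidates: Iterable[str] = DEFAULT_USB_IDS_PATHS) -> Optional[str]:
    """Return the first candidate that is an existing regular file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Like lsusb, the order of the list is the priority order, so a caller-supplied path can simply be prepended.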

There is a similar pattern in other distributions. It doesn't look like the kernel necessarily includes this data; the only references to it I can find are in the USB over IP tooling (here and here), where it's an optional parameter.

I can't find a built-in way to access the same data on Darwin or Windows. For ease of cross-platform compatibility, I suggest that we include it in this project. We could:

  1. Add a ./tools/get_usb.ids.py script.
    • This would download the usb.ids database, convert it to a dict and write it to a file. If I were going to do this today (I'm not), I'd put it in ./usb/_lookup.py.
  2. Add a step to ./deploy.sh that runs the script.
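A rough sketch of the conversion step in option 1, assuming the plain-text usb.ids layout: a vendor line is four hex digits followed by the name, its product lines are indented by one tab, interface lines by two tabs, and later sections (such as `C` device classes) start with a keyword. The function names and the `VENDORS`/`PRODUCTS` output layout are hypothetical, and downloading the file is left out.

```python
import re
from typing import Dict, Optional, Tuple

# "1d6b  Vendor name" at column 0; "\t0003  Product name" one tab in.
VENDOR_RE = re.compile(r"^([0-9a-fA-F]{4})\s+(.*\S)")
PRODUCT_RE = re.compile(r"^\t([0-9a-fA-F]{4})\s+(.*\S)")

Vendors = Dict[int, str]
Products = Dict[Tuple[int, int], str]


def parse_usb_ids(text: str) -> Tuple[Vendors, Products]:
    """Parse the vendor/product part of usb.ids into dicts keyed by integer IDs."""
    vendors: Vendors = {}
    products: Products = {}
    current: Optional[int] = None  # vendor that tab-indented lines belong to
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments
        m = VENDOR_RE.match(line)
        if m:
            current = int(m.group(1), 16)
            vendors[current] = m.group(2)
            continue
        m = PRODUCT_RE.match(line)
        if m:
            if current is not None:
                products[(current, int(m.group(1), 16))] = m.group(2)
            continue
        if not line.startswith("\t"):
            # Any other top-level line (e.g. "C 09  Hub") starts a non-vendor
            # section, so indented lines after it must not go to a vendor.
            current = None
    return vendors, products


def render_module(vendors: Vendors, products: Products) -> str:
    """Render the tables as Python source, e.g. for a generated usb/_lookup.py."""
    return (
        "# Generated from usb.ids; do not edit by hand.\n"
        f"VENDORS = {vendors!r}\n"
        f"PRODUCTS = {products!r}\n"
    )
```

The generated module holds only literals, so importing it needs no parsing at runtime.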

As for the interface, as we've discussed in #330, I'm comfortable with a public Lookup class and requiring users to make a separate call to get these details. I believe you would prefer a more integrated experience, so I'll take your lead here.
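To make the "separate call" interface concrete, a public lookup class could be as small as the sketch below. `Lookup`, `vendor()` and `product()` are hypothetical names, not an existing PyUSB API, and the tables are assumed to come from a parsed usb.ids.

```python
from typing import Dict, Mapping, Optional, Tuple


class Lookup:
    """Hypothetical read-only view over usb.ids vendor and product tables."""

    def __init__(
        self,
        vendors: Mapping[int, str],
        products: Mapping[Tuple[int, int], str],
    ) -> None:
        # Copy so later changes to the caller's mappings don't leak in.
        self._vendors: Dict[int, str] = dict(vendors)
        self._products: Dict[Tuple[int, int], str] = dict(products)

    def vendor(self, id_vendor: int) -> Optional[str]:
        """Name for a vendor ID, or None if the database has no entry."""
        return self._vendors.get(id_vendor)

    def product(self, id_vendor: int, id_product: int) -> Optional[str]:
        """Name for a vendor:product pair, or None if unknown."""
        return self._products.get((id_vendor, id_product))
```

Callers would pass the IDs explicitly, e.g. `lookup.product(dev.idVendor, dev.idProduct)`, which keeps these names visibly separate from the device's own string descriptors.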

As for including this in PyUSB: before version 1.0 it would have been out of scope, but now that the project has grown into "...an API rich, backend neutral Python USB module...", I think this is an appropriate feature. However we decide to implement this, I'm happy to own it; I'd submit a pull request on the same schedule as #330.

@jonasmalacofilho
Member

@Gestas, thanks for looking into this; I'll take a look at what you gathered.

@mcuee
Member

mcuee commented Jul 24, 2021

FYI, right now we use a snapshot of the usb.ids file in the libusbK project (Windows).
https://github.com/mcuee/libusbk/tree/master/libusbK/src/kList

But it would be a good idea to let users update it themselves, if possible.

@jonasmalacofilho
Member

I think we can include a copy of the data, but prefer the system-provided copy on Linux (at the possible paths we have collected).


Regarding the API, I think a mixed approach would be useful: a low-level DB lookup module plus some convenience methods or properties on Device.
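One reading of the mixed approach: low-level lookup functions over the parsed tables, plus read-only properties that any object exposing `idVendor` and `idProduct` (as PyUSB's `Device` does) can pick up. The `NamedDeviceMixin` class, the property names and the module-level tables are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

# Low-level tables; in practice these would be loaded from usb.ids (assumed).
VENDORS: Dict[int, str] = {}
PRODUCTS: Dict[Tuple[int, int], str] = {}


def lookup_vendor(id_vendor: int) -> Optional[str]:
    return VENDORS.get(id_vendor)


def lookup_product(id_vendor: int, id_product: int) -> Optional[str]:
    return PRODUCTS.get((id_vendor, id_product))


class NamedDeviceMixin:
    """Hypothetical convenience properties built on the low-level lookups."""

    idVendor: int
    idProduct: int

    @property
    def vendor_name(self) -> Optional[str]:
        # Name from the IDs database, not the device's iManufacturer string.
        return lookup_vendor(self.idVendor)

    @property
    def product_name(self) -> Optional[str]:
        # Name from the IDs database, not the device's iProduct string.
        return lookup_product(self.idVendor, self.idProduct)
```

Keeping the properties thin means the low-level functions stay usable on their own, for example for IDs that don't come from a connected device.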


@Gestas
Author

Gestas commented Oct 30, 2021

I have some cycles to work on this. @jonasmalacofilho, I can implement this per your suggestion ^^^.

The outstanding question is where to source the data on Darwin and Windows. I can't find a path forward beyond including the usb.ids database in this package as per my comment, #331 (comment). I am open to other options, suggestions?

@jonasmalacofilho
Member

The outstanding question is where to source the data on Darwin and Windows. I can't find a path forward beyond including the usb.ids database in this package as per my comment, #331 (comment). I am open to other options, suggestions?

I can't either, although @mcuee is probably right that there should be some way for our API users to specify their own databases. So, we read the database from:

  • the system, assuming the usual paths (Linux and most Unix-like systems);
  • a copy shipped with PyUSB (maybe only from Windows & Mac OS wheels);
  • an explicitly specified path.

One remaining question is whether the end users should be allowed to influence the path (and thus use a possibly more up-to-date database), perhaps through an environment variable. For now I don't think that's necessary: the USB database isn't updated that often (my personal inclusions took quite a while to be integrated), and there should be a new PyUSB release every 6 months anyway.

@mcuee
Member

mcuee commented Oct 31, 2021

So, we read the database from:

  • the system, assuming the usual paths (Linux and most Unix-like systems);
  • a copy shipped with PyUSB (maybe only from Windows & Mac OS wheels);
  • an explicitly specified path.

I think the above should be good enough, especially since the idea is to have regular PyUSB releases.

For macOS, I use Homebrew, and it does have a formula for usb.ids.
https://formulae.brew.sh/formula/usb.ids

The same goes for MacPorts.
https://ports.macports.org/port/usbids/

@Gestas
Author

Gestas commented Nov 1, 2021

In order of default priority:

  1. If supplied, a user specified path.
  2. The system, assuming the usual paths. I'll include the paths Homebrew and MacPorts use.
  3. The copy shipped with PyUSB.

I'll provide a way for a user to override the defaults.

For 1: any preferences on the environment variable name? Should we fail forward or exit if a path is specified but there isn't a valid file there?

@jonasmalacofilho
Member

I think the order should be:

  0. [possible future addition:] path specified by the end user, probably via an environment variable;
  1. path specified by the PyUSB (API) user/caller, through the API itself;
  2. the system, assuming the usual paths;
  3. the copy shipped with PyUSB.

But it seems that mcuee agrees with me that 0 is not necessary, at least not yet.

Additionally, I don't think we should ship the database on platforms where we assume the database will be available at the system level. It would be useless (the system version would have priority) and potentially confusing.

Regarding any explicitly passed in paths, I think that if the path is invalid we should just log the problem and, like you called it, "fail forward".

In fact, this entire API should only provide a "best effort" guarantee: if none of the supplied path, the system copy or the shipped data is available (or, more commonly, if there simply isn't an entry for the desired vid:pid), it should simply return None and, at most, log any issues at DEBUG level.
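The resolution order and best-effort contract described here could look roughly like the sketch below. It covers the caller-supplied path, the system paths and the shipped copy (no environment variable), skips missing or unreadable files with a DEBUG log message ("fail forward"), and returns None instead of raising. `load_products`, `product_name` and their parameters are hypothetical, and the parser is deliberately reduced to vendor:product lines.

```python
import logging
import os
import re
from typing import Dict, Iterable, Optional, Tuple

log = logging.getLogger(__name__)

Products = Dict[Tuple[int, int], str]

# Vendor lines start with 4 hex digits; product lines are indented by one tab.
_VENDOR = re.compile(r"^([0-9a-fA-F]{4})\s+\S")
_PRODUCT = re.compile(r"^\t([0-9a-fA-F]{4})\s+(.*\S)")


def _read_products(path: str) -> Products:
    """Deliberately minimal parser: vendor:product names only."""
    products: Products = {}
    vendor: Optional[int] = None
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = _VENDOR.match(line)
            if m:
                vendor = int(m.group(1), 16)
                continue
            m = _PRODUCT.match(line)
            if m and vendor is not None:
                products[(vendor, int(m.group(1), 16))] = m.group(2)
            elif line[:1] not in ("\t", "#", "\n"):
                vendor = None  # e.g. "C 09  Hub": a non-vendor section begins
    return products


def load_products(
    explicit_path: Optional[str] = None,
    system_paths: Iterable[str] = (),
    bundled_path: Optional[str] = None,
) -> Optional[Products]:
    """Try the caller's path, then the system paths, then the bundled copy."""
    for path in [explicit_path, *system_paths, bundled_path]:
        if path is None:
            continue
        try:
            return _read_products(path)
        except OSError as exc:
            # Fail forward: record the problem and try the next source.
            log.debug("cannot read usb.ids from %r: %s", path, exc)
    log.debug("no usable usb.ids database found")
    return None


def product_name(
    id_vendor: int,
    id_product: int,
    explicit_path: Optional[str] = None,
    system_paths: Iterable[str] = (),
    bundled_path: Optional[str] = None,
) -> Optional[str]:
    """Best effort: None if no database is usable or the vid:pid is unknown."""
    products = load_products(explicit_path, system_paths, bundled_path)
    if products is None:
        return None
    return products.get((id_vendor, id_product))
```

A real implementation would probably cache the parsed table rather than re-reading the file on every call.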

@Gestas
Copy link
Author

Gestas commented Nov 2, 2021

@jonasmalacofilho got it, I'll work on this.

@mcuee
Member

mcuee commented Nov 3, 2021

But it seems that mcuee agrees with me that 0 is not necessary, at least not yet.

I agree. Thanks.

@tormodvolden
Contributor

Just reading through this and there is something that needs clarification about the original post:
iManufacturer, iProduct and iSerialNumber are indexes to strings stored on the device, which can be retrieved with control requests (often the OS has already done this and the cached device strings are available; this is platform dependent). It is perfectly OK for these strings to be empty if the user doesn't have permission to request them and the OS doesn't offer cached values. These strings are not, and should not be, taken from a usb.ids database (note also that modern Linux systems use their udev database instead of the usb.ids text files)! They should be exactly what the device reports.

The values from udev or usb.ids are used to "pretty-print" the idVendor and idProduct values (e.g. 0bda and 0316). This is what you see e.g. when you run "lsusb" without any verbose options.

It would be wrong to mix this up and transparently fall back to usb.ids values when device strings are missing. The premise of the original post with "Incorrect" or "Missing" values, and using usb.ids to remedy this, is wrong. Displaying device strings and displaying human-readable IDs are two different things and should be kept separate.
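To keep the two kinds of names apart, a display helper can report them under separate keys and never substitute one for the other. This is a sketch: `describe` and its output keys are hypothetical. The device strings are passed in as already-fetched values (with PyUSB they could come from `usb.util.get_string()` or the `Device.manufacturer`/`Device.product` properties, which may fail without sufficient permissions), and the ID names come from any vid:pid lookup.

```python
from typing import Callable, Dict, Optional


def describe(
    id_vendor: int,
    id_product: int,
    manufacturer: Optional[str],
    product: Optional[str],
    lookup_vendor: Callable[[int], Optional[str]],
    lookup_product: Callable[[int, int], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Report device string descriptors and database names side by side.

    Missing device strings stay None; they are never filled from usb.ids.
    """
    return {
        "id": f"{id_vendor:04x}:{id_product:04x}",
        # What the device itself reports (iManufacturer / iProduct).
        "manufacturer": manufacturer,
        "product": product,
        # Pretty-printed IDs from a usb.ids or udev-style database.
        "vendor_name": lookup_vendor(id_vendor),
        "product_name": lookup_product(id_vendor, id_product),
    }
```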

BTW, I remember there was a bug in usbutils (or was it in libusb?) some years ago, I think it was present in e.g. Ubuntu 16.04, where the output from lsusb would be messed up and idVendor/idProduct strings from one device would be printed for another device. This came to my mind when I saw the description here, but I don't think it is related.

@jonasmalacofilho
Member

jonasmalacofilho commented Feb 9, 2022

@tormodvolden,

Especially (but not only) in relation to:

It would be wrong to mix this up and transparently fall back to usb.ids values when device strings are missing.

I absolutely agree with you.

And I don't think I was as clear as you about this in my previous comments in this thread, so thank you!

BTW, I remember there was a bug in usbutils (or was it in libusb?) some years ago, I think it was present in e.g. Ubuntu 16.04, where the output from lsusb would be messed up and idVendor/idProduct strings from one device would be printed for another device. This came to my mind when I saw the description here, but I don't think it is related.

Do you remember anything else about that issue? I saw it as recently as last year (in bug reports, IIRC on liquidctl), but wasn't able to track down the problem with the limited information I had at the time.

@tormodvolden
Contributor

tormodvolden commented Feb 9, 2022

Here is my summary in one of the upstream PRs: gregkh/usbutils#103 (comment)
Since it was only fixed in late 2020, it is probably present in Ubuntu 20.04 LTS as well.

Funnily enough, the bug was introduced when someone added exactly the kind of "fallback" I recommend against here.

@jonasmalacofilho
Member

Thanks again!


5 participants