Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content-Type parsing breaks on valid test cases #51842

Open
timoschwarzer opened this issue May 15, 2024 · 4 comments
Open

Content-Type parsing breaks on valid test cases #51842

timoschwarzer opened this issue May 15, 2024 · 4 comments

Comments

@timoschwarzer
Copy link

Steps to reproduce

The current implementation of the Content-Type parser breaks in some valid test cases:

  • application/pdf; name=test.pdf works
  • application/pdf; name="test with spaces.pdf" does not work, yields in Content-Type: application/pdf; name="test, which is an invalid header value and breaks reverse proxies (for example Traefik 3.0)

The current Regex solution fails with evaluating quoted values with spaces, and all in all is very prone to unexpected errors:

  • application/pdf; name=";charset=utf-8" would parse application/pdf; name=" as the content type and read utf-8" as the charset, which is wrong twice: First of all it was never set, and then it incorrectly includes the quote as part of the charset.

Expected behavior

The Content-Type parser must not yield invalid header values and must parse Content-Type headers according to RFC 9110.

Additional Notes

While working on a fix, a few questions rose up:

  • Why does Rails assume parameters before charset to belong to the media type, but parameters after charset to not belong to it? From what I read in RFC 9110, everything (including charset) is part of the media type.

def test_html_fragment
@request.accept = "text/html; fragment"
get :my_html_fragment
assert_equal "text/html; fragment; charset=utf-8", @response.headers["Content-Type"]
assert_equal "neat", @response.body
end

  • Reading RFC 9110, parameters without a value don't seem to be valid at all. So from what I can see, text/html; fragment is not a valid Content-Type header.

To me, the current way Media- and MIME types, charset and the Content-Type header are parsed and serialized seems to be a mess that breaks in many valid cases (example) and leads to unexpected, hard-to-debug errors. While trying to fix this issue, the special handling of charset everywhere made it impossible to fix the issue for me without breaking changes.

Suggestion

An idea I had while trying to fix the issue is introducing a ContentType type, that holds a Mime string and zero or more key/value pairs for parameter. Such a struct should be easily serializable and deserializable, and setting the charset on a response could then just be setting the according parameter on there without serializing and parsing the Content-Type header string multiple times.

System configuration

Rails version: 7.0.8.1

Ruby version: 3.2.4

@sikachu
Copy link
Member

sikachu commented May 16, 2024

  • Why does Rails assume parameters before charset to belong to the media type, but parameters after charset to not belong to it? From what I read in RFC 9110, everything (including charset) is part of the media type.

I was checking the code and noticed that the original code was written by @kamipo, so I'm curious if that was intentional or an oversight? @kamipo, would you mind giving a quick look?

@justinko
Copy link
Contributor

In HTTP headers, including the Content-Type header, parameters generally follow the parameter=value format. However, certain scenarios and non-standard or experimental use cases might involve parameters without explicit values. One such example is the fragment parameter.

Understanding the fragment Parameter

The fragment parameter is part of the multipart/byteranges content type. It’s used to indicate that a particular part of a multipart message represents a fragment of a larger resource.

Why Parameters Might Not Have Values

  1. Implicit Values:

    • Some parameters are designed to convey information implicitly by their presence alone. Their mere inclusion indicates a specific condition or behavior, so an explicit value is unnecessary.
  2. Non-Standard Extensions:

    • In non-standard or experimental extensions, parameters might be used creatively, sometimes without values, to trigger specific behaviors in custom implementations.
  3. Backward Compatibility:

    • Certain legacy systems or protocols might define parameters without values for backward compatibility or for simplicity in parsing.

Example: fragment Parameter

In the context of multipart/byteranges, the fragment parameter can be used to indicate a fragment of a resource:

Content-Type: multipart/byteranges; boundary=boundary1; fragment

Here’s why the fragment parameter might not have an explicit value:

  1. Semantic Meaning:

    • The presence of the fragment parameter alone conveys that the parts in the multipart message are fragments of a larger resource. There is no additional value needed to convey this meaning.
  2. Specification Simplicity:

    • Simplifies the specification and implementation by not requiring an explicit value. This reduces the complexity of parsing and processing.

Conclusion

While the standard practice for HTTP header parameters is to use the parameter=value format, there are valid reasons for parameters to exist without explicit values, such as conveying implicit information, maintaining simplicity, or adhering to specific use cases like the fragment parameter in multipart/byteranges. The presence of such parameters alone is enough to provide the necessary context or behavior expected by the client or server handling the HTTP message.

@justinko
Copy link
Contributor

Why does Rails assume parameters before charset to belong to the media type, but parameters after charset to not belong to it?

Considering the simplicity or naivety of the existing parser, probably oversight.

While trying to fix this issue, the special handling of charset everywhere made it impossible to fix the issue for me without breaking changes.

Breaking changes are sometimes necessary.

@timoschwarzer
Copy link
Author

Thanks for the clarification! Once the confusion with parameter orders is resolved, I can try updating my branch with a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants