File.dirname: add a spec for Shift JIS handling #1330
While trying to speed up various `File.*` methods, I realized they were much slower and more complicated than they should be, for no apparent reason. However, after I asked Nobu, he explained that Shift JIS encoded text can contain `0x5C` (ASCII backslash) as the second byte of a two-byte character sequence. Since `0x5C` is `File::ALT_SEPARATOR` on Windows, this can easily break naive path-related algorithms that search for directory separators.
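A minimal sketch of the failure mode, assuming a Windows-style path whose last component is `表`, which Shift JIS encodes as the two bytes `0x95 0x5C`. The helpers `naive_dirname` and `char_aware_dirname` are hypothetical illustrations, not Ruby's actual `File.dirname` implementation: one scans raw bytes for `0x5C`, the other walks whole characters so a trail byte can never be mistaken for a separator.

```ruby
# Hypothetical helpers for illustration only; not Ruby's real implementation.
SJIS = Encoding::Shift_JIS

# Naive: find the last 0x5C byte and cut there, ignoring the encoding.
def naive_dirname(path)
  idx = path.bytes.rindex(0x5C)
  return "." if idx.nil?
  path.byteslice(0, idx)
end

# Character-aware: iterate over whole characters and remember the byte
# offset of the last "\\" character, so the second byte of a double-byte
# character is never treated as a separator.
def char_aware_dirname(path)
  sep = "\\".encode(path.encoding)
  last = nil
  offset = 0
  path.each_char do |ch|
    last = offset if ch == sep
    offset += ch.bytesize
  end
  return "." if last.nil?
  path.byteslice(0, last)
end

path = "C:\\foo\\表".encode(SJIS)
p path.bytes.map { |b| format("%02X", b) }  # last two bytes are 95 5C

naive = naive_dirname(path)      # cuts inside "表", leaving a dangling 0x95
good  = char_aware_dirname(path)

p naive.valid_encoding?  # false
p good.encode("UTF-8")   # "C:\\foo"
```

The naive version returns `"C:\\foo\\"` plus a lone lead byte `0x95`, which is not even a valid Shift JIS string; this is the kind of breakage that makes a careful, encoding-aware scan necessary.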
Thanks! Nasty edge case indeed 😅
Yep. I can add it to more
Yeah, that'd be great for … The extra ASCII-compatible and non-ASCII-compatible examples are good to have too.
OK, I'll add some as I go while optimizing these methods (e.g. ruby/ruby#15898, ruby/ruby#15902, ruby/ruby#15912).
Indeed. But since the same spec should pass on both, I figured it was simpler not to restrict it.
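Why one unrestricted spec can pass on every platform can be illustrated with a standalone check. This is a hypothetical plain-Ruby sketch, not the spec added in this PR. It assumes a path that uses `/`, which `File.dirname` treats as a separator on all platforms, followed by `表` (Shift JIS bytes `0x95 0x5C`). On Windows, where `File::ALT_SEPARATOR` is `"\\"`, a naive byte scan would stop at the trailing `0x5C`; on other platforms `0x5C` is not a separator at all. A correct implementation should return `"dir"` in both cases.

```ruby
# Hypothetical standalone check, not the actual ruby/spec code.
# "/" is a separator everywhere; "表" ends with the byte 0x5C, which is
# only a separator (File::ALT_SEPARATOR) on Windows.
path = "dir/表".encode(Encoding::Shift_JIS)
dir = File.dirname(path)

puts "ALT_SEPARATOR: #{File::ALT_SEPARATOR.inspect}"
puts "dirname: #{dir.inspect} (valid encoding: #{dir.valid_encoding?})"
```

Because the expected value does not depend on the platform, no Windows-only guard is needed; the check is simply trivial where `0x5C` is not a separator.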
Thanks. Feel free to just add them directly as part of the ruby/ruby PRs for convenience. I don't think they need extra reviews, given they would be similar to this PR.
cc @eregon, what do you think of that spec? I think it would have helped me figure things out much sooner, so it would have been valuable. If you agree, I can generalize it to more "path handling" methods.