Improve heredoc end detection for embedded languages #3920
Motivation
Related to #3919 (see footnote 1).
Embedded heredoc patterns (e.g., `<<C`, `<<SQL`, `<<HTML`) could fail to terminate properly when the heredoc content contained characters that start begin/end patterns in the embedded language grammar. This caused syntax highlighting to "leak" into subsequent Ruby code.

When a heredoc with embedded language highlighting contained certain characters (like `?`, which starts C's ternary operator pattern, or `(`, which starts a group), the embedded grammar's `begin`/`end` pattern would start but never find its closing match. Because the inner heredoc pattern used `end` to detect the terminator, and `end` is only checked after nested patterns are processed, the heredoc terminator was never recognized.

Example that failed:
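As a rough illustration of the kind of input described above, here is a hypothetical snippet (an assumption written for this description, not copied from the issue). The C body contains a `(` and a `?` that are never closed, so an embedded grammar that opens a group or ternary rule on them has nothing to close it with before the `C` terminator. Whether a particular character triggers the problem depends on the embedded grammar in use.

```ruby
# Hypothetical reproduction: valid Ruby, but the embedded C is unbalanced.
source = <<~C
  int pick(int a, int b) {
    return (a > b ? a : b;
  }
C

# Before the fix, everything from here on could still be scoped as C
# instead of Ruby, because the terminator above was never recognized.
line_count = source.lines.count
puts line_count
```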
Implementation
Changed all 15 embedded heredoc language patterns from using `end` to `while` for the inner pattern:

- `end` behavior: Check terminator after processing nested patterns (gets blocked if a nested pattern is stuck open)
- `while` behavior: Check terminator before processing nested patterns on each line (exits immediately when terminator is found)

The `while` condition is evaluated at the start of each line before any nested patterns run, so when it fails (the line matches the terminator), the entire block closes, including any nested patterns that might still be open. This prevents the inner syntax highlighting from leaking beyond the closing terminator, even when the embedded grammar has been left in an unterminated state.
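The difference between the two behaviors can be sketched with a toy model. This is not the real tokenizer: `heredoc_close_index`, `TERMINATOR`, and the `:end` / `:while` strategy names are hypothetical, and only one nested rule (a group opened by `(` and closed by `)`) is modeled.

```ruby
# Toy model of where an embedded heredoc scope closes under each strategy.
# Assumption: the terminator is a line containing only "C".
TERMINATOR = /\A\s*C\s*\z/

# Returns the index of the line that closes the heredoc scope, or nil if
# the scope never closes (the highlighting "leaks").
def heredoc_close_index(lines, strategy)
  open_groups = 0
  lines.each_with_index do |line, index|
    if strategy == :while
      # `while`: tested at the start of every line, before nested rules run.
      return index if line.match?(TERMINATOR)
    elsif open_groups.zero? && line.match?(TERMINATOR)
      # `end`: only reachable when no nested rule is still open.
      return index
    end
    # Track the nested group rule: "(" opens it, ")" closes it.
    open_groups += line.count("(")
    open_groups = [open_groups - line.count(")"), 0].max
  end
  nil
end

lines = ["int x = (a ? b : c;", "C", "puts 'ruby again'"]
p heredoc_close_index(lines, :end)   # => nil
p heredoc_close_index(lines, :while) # => 1
```

With balanced content both strategies agree; they only diverge when a nested rule is left open, which is the case this change targets.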
Automated Tests
See the added automated test.
Manual Tests
I ran a local VSCode instance with this change to check that tokens after the closing heredoc delimiter were correctly identified as Ruby again.
Before:
After:
Footnotes
1. Note that I think there is more to do to close that issue out (see my comment on interpolation highlighting); this is just a stopgap to prevent things being broken after the heredoc ends.