mirror of
https://github.com/ruby/ruby.git
synced 2025-08-15 21:49:06 +02:00

While #177 is reported as being caused by a comment, the underlying problem is the newline that we generated (from a comment). The prior commit fixed that problem by preserving whitespace before the comment. That guarantees that a block will form there from the frontier before it is expanded there via a "neighbors" method. Since empty lines are valid Ruby code, the block will be hidden and be safe.
## Problem setup
This failure mode is not fixed by the prior commit, because the indentation is 0. To provide good results, we must make the algorithm less greedy. One heuristic/signal to follow is developer-added newlines. If a developer puts a blank line between two pieces of code, they are more likely to be unrelated. For example:
```
port = rand(1000...9999)
stub_request(:any, "localhost:#{port}")

query = Cutlass::FunctionQuery.new(
  port: port
).call

expect(WebMock).to have_requested(:post, "localhost:#{port}").
  with(body: "{}")
```
This code is split into three chunks by the developer. Each is likely (but not guaranteed) to be intended to stand on its own (in terms of syntax). This behavior is good for scanning neighbors (same indent or higher) within a method, but bad for parsing neighbors across methods.
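The newline signal can be sketched roughly as follows. This is an illustration only, not the syntax_suggest implementation; `chunks_by_blank_line` is a hypothetical helper introduced here.

```ruby
# Hypothetical helper (not part of syntax_suggest): group source lines
# into chunks separated by one or more blank lines.
def chunks_by_blank_line(source)
  source.lines
    .slice_when { |a, b| a.strip.empty? != b.strip.empty? } # split where blank-ness changes
    .reject { |group| group.first.strip.empty? }            # drop the blank-only groups
    .map(&:join)
end

source = <<~'RUBY'
  port = rand(1000...9999)
  stub_request(:any, "localhost:#{port}")

  query = Cutlass::FunctionQuery.new(
    port: port
  ).call

  expect(WebMock).to have_requested(:post, "localhost:#{port}").
    with(body: "{}")
RUBY

chunks = chunks_by_blank_line(source)
puts chunks.length # => 3
```

Each chunk here happens to be syntactically complete on its own, which is the property the heuristic relies on.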
## Problem
Code is expanded to capture all neighbors, and then the indent level is decreased, which allows it to capture the surrounding scope (think moving from within a method to also capturing the `def/end` definition). Once the indentation level has been decreased, we go back to scanning neighbors, but now the neighbors also contain keywords.
For example:
```
1 def bark
2
3 end
4
5 def sit
6 end
```
In this case, if lines 4, 5, and 6 are in a block, then when it tries to expand neighbors it will expand up. If it stops after line 2 or 3 it may cause problems, since there's a valid kw/end pair but the block will be checked without the keyword.
TL;DR: It's good to stop scanning code after hitting a newline when you're in a method, but it causes a problem when scanning code between methods if everything inside one of the methods is an empty line.
In this case it grabs the `end` on line 3, and since the problem was an extra `end`, the program now compiles correctly. It incorrectly assumes that the block it captured was causing the problem.
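A small sketch of why that capture is misleading. It assumes the six example lines are followed by a stray `end` (line 7 is an assumption added for illustration) and uses Ripper's `error?` as the validity check; `invalid?` is a local helper defined here.

```ruby
require "ripper"

# True when Ripper reports a syntax error for the given source.
def invalid?(source)
  Ripper.new(source).tap(&:parse).error?
end

lines = [
  "def bark\n", # 1
  "\n",         # 2
  "end\n",      # 3
  "\n",         # 4
  "def sit\n",  # 5
  "end\n",      # 6
  "end\n"       # 7 (assumed extra `end`, the real problem)
]

# A badly formed block: lines 3..6. It holds the `end` from line 3
# without its matching `def bark` on line 1.
block = (3..6)
remaining = lines.each_with_index
  .reject { |_line, index| block.cover?(index + 1) }
  .map(&:first)

puts invalid?(lines.join)     # true: the document has an extra `end`
puts invalid?(remaining.join) # false: hiding the block makes the rest parse
```

Lines 1, 2, and 7 form a valid `def`/`end` pair once the block is hidden, so the search would blame lines 3..6 even though they are not the cause.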
## Extra bit of context
One other technical detail is that after we've decided to stop scanning code for a new neighbor block expansion, we look around the block and grab any empty lines. Adding empty lines before or after a code block does not affect the parsing of that block.
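As a rough sketch of that step, the helper below widens an index range to take in adjacent blank lines. `absorb_blank_lines` is a hypothetical name and works on plain strings rather than the project's own line objects.

```ruby
# Hypothetical helper: widen a zero-based (first..last) index range over
# `lines` so it includes any blank lines directly above or below it.
def absorb_blank_lines(lines, range)
  first = range.begin
  last = range.end
  first -= 1 while first > 0 && lines[first - 1].strip.empty?
  last += 1 while last < lines.length - 1 && lines[last + 1].strip.empty?
  first..last
end

lines = ["def bark\n", "\n", "puts 1\n", "\n", "\n", "end\n"]
p absorb_blank_lines(lines, 2..2) # => 1..4
```

After this step the line directly above or below the block can no longer be blank, unless the block already touches the start or end of the file.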
## The fix
Since we know that this problem only happens when there's a newline inside of a method, and we know this particular failure mode is due to having an invalid block (capturing an extra `end`, but not its keyword), we have all the metadata we need to detect this scenario and correct it.
We know that the next line above our block must be code or empty (since we grabbed extra newlines). The same goes for the line below it. We can count all the keywords and ends in the block. If they are balanced, it's likely (but not guaranteed) that we formed the block correctly. If they're imbalanced, we look above or below (depending on the nature of the imbalance) and check whether adding that line would balance the count.
This concept of balance and "leaning" comes from work in https://github.com/ruby/syntax_suggest/pull/152 and has proven useful, but has not been formally introduced into the main branch.
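The balance check described above can be sketched like this. It is a simplified illustration under stated assumptions: the regex classifiers, `balance`, and `rebalance` are hypothetical and stand in for the project's lexer-based line metadata.

```ruby
# Simplified, assumed line classifiers. A real implementation would lex
# the source; these regexes only cover a few common keywords.
KEYWORD_LINE = /\A\s*(def|class|module|if|unless|while|until|begin|case)\b/
END_LINE = /\A\s*end\b/

# Hypothetical helper: report how keywords and `end`s compare in `lines`.
def balance(lines)
  keywords = lines.count { |line| line.match?(KEYWORD_LINE) }
  ends = lines.count { |line| line.match?(END_LINE) }
  return :balanced if keywords == ends
  keywords > ends ? :missing_end : :missing_keyword
end

# Hypothetical helper: if the block at `range` is imbalanced, try adding the
# single line above (missing keyword) or below (missing end). Keep the wider
# range only when it balances the count.
def rebalance(lines, range)
  case balance(lines[range])
  when :missing_keyword
    wider = (range.begin - 1)..range.end
    return wider if range.begin > 0 && balance(lines[wider]) == :balanced
  when :missing_end
    wider = range.begin..(range.end + 1)
    return wider if range.end < lines.length - 1 && balance(lines[wider]) == :balanced
  end
  range
end

lines = ["def bark\n", "\n", "end\n", "\n", "def sit\n", "end\n"]
block = 1..5 # zero-based: the blank line inside `bark` through the end of `sit`

p balance(lines[block])   # => :missing_keyword
p rebalance(lines, block) # => 0..5
```

The check only ever grows the block by one line, which matches the expectation that it changes results rarely and only slightly.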
## Outcome
Adding this extra check introduced no regressions and fixed the test case. It might be possible there's a mirror or similar problem that we're not handling. That will come out in time. It might also be possible that this causes a worse case in some code not under test. That too would come out in time.
One other possible concern with adding logic in this area (which is a hot codepath) is performance. This extra count check will be performed for every block. In general, the two most helpful performance strategies I've found are reducing the total number of blocks (therefore reducing overall N internal iterations) and making better matches (the parser call to determine whether a block is valid or not is a major bottleneck). If we can split valid code into valid blocks, then it's only evaluated by the parser once, whereas invalid code must be continuously re-checked by the parser until it becomes valid, or is determined to be the cause of the core problem.
This extra logic should very rarely result in a change, but when it does it should tend to produce slightly larger blocks (by one line) and more accurate blocks.
Informally it seems to have no impact on performance:
```
This branch:
DEBUG_DISPLAY=1 bundle exec rspec spec/ --format=failures 3.01s user 1.62s system 113% cpu 4.076 total
```
```
On main:
DEBUG_DISPLAY=1 bundle exec rspec spec/ --format=failures 3.02s user 1.64s system 113% cpu 4.098 total
```
13739c6946
232 lines
6.5 KiB
Ruby
# frozen_string_literal: true

module SyntaxSuggest
  # Turns an "invalid block(s)" into useful context
  #
  # There are three main phases in the algorithm:
  #
  # 1. Sanitize/format input source
  # 2. Search for invalid blocks
  # 3. Format invalid blocks into something meaningful
  #
  # This class handles the third part.
  #
  # The algorithm is very good at capturing all of a syntax
  # error in a single block in number 2, however the results
  # can contain ambiguities. Humans are good at pattern matching
  # and filtering and can mentally remove extraneous data, but
  # they can't add extra data that's not present.
  #
  # In the case of known ambiguous cases, this class adds context
  # back to the ambiguity so the programmer has full information.
  #
  # Beyond handling these ambiguities, it also captures surrounding
  # code context information:
  #
  #   puts block.to_s # => "def bark"
  #
  #   context = CaptureCodeContext.new(
  #     blocks: block,
  #     code_lines: code_lines
  #   )
  #
  #   lines = context.call.map(&:original)
  #   puts lines.join
  #   # =>
  #     class Dog
  #       def bark
  #     end
  #
  class CaptureCodeContext
    attr_reader :code_lines

    def initialize(blocks:, code_lines:)
      @blocks = Array(blocks)
      @code_lines = code_lines
      @visible_lines = @blocks.map(&:visible_lines).flatten
      @lines_to_output = @visible_lines.dup
    end

    def call
      @blocks.each do |block|
        capture_first_kw_end_same_indent(block)
        capture_last_end_same_indent(block)
        capture_before_after_kws(block)
        capture_falling_indent(block)
      end

      @lines_to_output.select!(&:not_empty?)
      @lines_to_output.uniq!
      @lines_to_output.sort!

      @lines_to_output
    end

    # Shows the context around code provided by "falling" indentation
    #
    # Converts:
    #
    #       it "foo" do
    #
    # into:
    #
    #   class OH
    #     def hello
    #       it "foo" do
    #     end
    #   end
    #
    def capture_falling_indent(block)
      AroundBlockScan.new(
        block: block,
        code_lines: @code_lines
      ).on_falling_indent do |line|
        @lines_to_output << line
      end
    end

    # Shows surrounding kw/end pairs
    #
    # The purpose of showing these extra pairs is due to cases
    # of ambiguity when only one visible line is matched.
    #
    # For example:
    #
    #   1 class Dog
    #   2   def bark
    #   4   def eat
    #   5   end
    #   6 end
    #
    # In this case either line 2 could be missing an `end` or
    # line 4 was an extra line added by mistake (it happens).
    #
    # When we detect the above problem it shows the issue
    # as only being on line 2
    #
    #   2   def bark
    #
    # Showing "neighbor" keyword pairs gives extra context:
    #
    #   2   def bark
    #   4   def eat
    #   5   end
    #
    def capture_before_after_kws(block)
      return unless block.visible_lines.count == 1

      around_lines = AroundBlockScan.new(code_lines: @code_lines, block: block)
        .start_at_next_line
        .capture_neighbor_context

      around_lines -= block.lines

      @lines_to_output.concat(around_lines)
    end

    # When there is an invalid block with a keyword
    # missing an end right before another end,
    # it is unclear which keyword is missing the
    # end
    #
    # Take this example:
    #
    #   class Dog       # 1
    #     def bark      # 2
    #       puts "woof" # 3
    #   end             # 4
    #
    # However due to https://github.com/ruby/syntax_suggest/issues/32
    # the problem line will be identified as:
    #
    #   > class Dog     # 1
    #
    # Because lines 2, 3, and 4 are technically valid code and are expanded
    # first, deemed valid, and hidden. We need to un-hide the matching end
    # line 4. Also work backwards and if there's a mis-matched keyword, show it
    # too
    def capture_last_end_same_indent(block)
      return if block.visible_lines.length != 1
      return unless block.visible_lines.first.is_kw?

      visible_line = block.visible_lines.first
      lines = @code_lines[visible_line.index..block.lines.last.index]

      # Find first end with same indent
      # (this would return line 4)
      #
      #   end             # 4
      matching_end = lines.detect { |line| line.indent == block.current_indent && line.is_end? }
      return unless matching_end

      @lines_to_output << matching_end

      # Work backwards from the end to
      # see if there are mis-matched
      # keyword/end pairs
      #
      # Return the first mis-matched keyword
      # this would find line 2
      #
      #     def bark      # 2
      #       puts "woof" # 3
      #   end             # 4
      end_count = 0
      kw_count = 0
      kw_line = @code_lines[visible_line.index..matching_end.index].reverse.detect do |line|
        end_count += 1 if line.is_end?
        kw_count += 1 if line.is_kw?

        !kw_count.zero? && kw_count >= end_count
      end
      return unless kw_line
      @lines_to_output << kw_line
    end

    # The logical inverse of `capture_last_end_same_indent`
    #
    # When there is an invalid block with an `end`
    # missing a keyword right after another `end`,
    # it is unclear which end is missing the
    # keyword.
    #
    # Take this example:
    #
    #   class Dog       # 1
    #       puts "woof" # 2
    #     end           # 3
    #   end             # 4
    #
    # the problem line will be identified as:
    #
    #   > end           # 4
    #
    # This happens because lines 1, 2, and 3 are technically valid code and are expanded
    # first, deemed valid, and hidden. We need to un-hide the matching keyword on
    # line 1. Also work backwards and if there's a mis-matched end, show it
    # too
    def capture_first_kw_end_same_indent(block)
      return if block.visible_lines.length != 1
      return unless block.visible_lines.first.is_end?

      visible_line = block.visible_lines.first
      lines = @code_lines[block.lines.first.index..visible_line.index]
      matching_kw = lines.reverse.detect { |line| line.indent == block.current_indent && line.is_kw? }
      return unless matching_kw

      @lines_to_output << matching_kw

      kw_count = 0
      end_count = 0
      orphan_end = @code_lines[matching_kw.index..visible_line.index].detect do |line|
        kw_count += 1 if line.is_kw?
        end_count += 1 if line.is_end?

        end_count >= kw_count
      end

      return unless orphan_end
      @lines_to_output << orphan_end
    end
  end
end