ruby/spec/syntax_suggest/unit/code_search_spec.rb
schneems 5487ee4fe8 [ruby/syntax_suggest] Fix sibling bug to #177
Although #177 was reported as being caused by a comment, the underlying problem is the newline that we generate in place of that comment. The prior commit fixed that problem by preserving the whitespace before the comment. That guarantees a block will form there from the frontier before it is expanded there via a "neighbors" method. Since empty lines are valid Ruby code, that block will be hidden and is safe.

## Problem setup

This failure mode is not fixed by the prior commit, because the indentation is 0. To provide good results, we must make the algorithm less greedy. One heuristic/signal to follow is developer-added newlines. If a developer puts a newline between two pieces of code, it's more likely they're unrelated. For example:

```
port = rand(1000...9999)
stub_request(:any, "localhost:#{port}")

query = Cutlass::FunctionQuery.new(
  port: port
).call

expect(WebMock).to have_requested(:post, "localhost:#{port}").
  with(body: "{}")
```

This code is split into three chunks by the developer. Each is likely (but not guaranteed) to be intended to stand on its own (in terms of syntax). This behavior is good for scanning neighbors (same indent or higher) within a method, but bad for parsing neighbors across methods.
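
As a rough illustration of the newline signal (not how syntax_suggest itself is implemented), the sketch below splits the example above on blank lines. The helper name `split_on_blank_lines` is hypothetical and exists only for this example.

```ruby
# Hypothetical helper, not part of syntax_suggest: split source text into
# chunks wherever the developer left one or more blank lines.
def split_on_blank_lines(source)
  source.split(/\n\s*\n/)
end

source = <<~'EOM'
  port = rand(1000...9999)
  stub_request(:any, "localhost:#{port}")

  query = Cutlass::FunctionQuery.new(
    port: port
  ).call

  expect(WebMock).to have_requested(:post, "localhost:#{port}").
    with(body: "{}")
EOM

chunks = split_on_blank_lines(source)
puts chunks.length # prints 3
```

Each of the three chunks happens to be syntactically complete on its own, which is the property the heuristic relies on.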

## Problem

Code is expanded to capture all neighbors, and then the indent level is decreased, which allows it to capture the surrounding scope (think moving from within the method to also capturing the `def/end` definition). Once the indentation level has been decreased, we go back to scanning neighbors, but now the neighbors also contain keywords.

For example:

```
  1 def bark
  2
  3 end
  4
  5 def sit
  6 end
```

In this case, if lines 4, 5, and 6 are in a block, then when it tries to expand neighbors it will expand up. If it stops after line 2 or 3, it may cause problems: there's a valid kw/end pair, but the block will be checked with only part of that pair.

TL;DR: It's good to stop scanning code after hitting a newline when you're in a method, but it causes a problem when scanning code between methods if everything inside one of the methods is an empty line.

In this case it grabs the end on line 3 and since the problem was an extra end, the program now compiles correctly. It incorrectly assumes that the block it captured was causing the problem.

## Extra bit of context

One other technical detail is that after we've decided to stop scanning code for a new neighbor block expansion, we look around the block and grab any empty lines. Adding empty lines before or after a code block does not affect the parsing of that block.
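
A minimal sketch of that "grab the surrounding empty lines" step, assuming a block is represented as a range of indexes into an array of lines. The method name `capture_surrounding_blank_lines` is made up for this example and is not the library's API.

```ruby
# Hypothetical sketch: widen a range of line indexes so that it also covers
# any blank lines directly above or below it.
def capture_surrounding_blank_lines(lines, range)
  first = range.begin
  last = range.end

  # Walk upward while the previous line is blank.
  first -= 1 while first > 0 && lines[first - 1].strip.empty?
  # Walk downward while the next line is blank.
  last += 1 while last < lines.length - 1 && lines[last + 1].strip.empty?

  first..last
end

lines = ["def bark\n", "\n", "end\n", "\n", "def sit\n", "end\n"]

# Start with `def sit` / `end` (indexes 4 and 5); the blank line at index 3
# is absorbed, and the `end` at index 2 stops the upward walk.
expanded = capture_surrounding_blank_lines(lines, 4..5)
puts expanded.inspect # prints 3..5
```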

## The fix

Since we know that this problem only happens when there's a newline inside of a method, and we know this particular failure mode is due to having an invalid block (capturing an extra end, but not its keyword), we have all the metadata we need to detect this scenario and correct it.

We know that the next line above our block must be code or empty (since we grabbed extra newlines). The same is true for the line below it. We can count all the keywords and ends in the block. If they are balanced, it's likely (but not guaranteed) we formed the block correctly. If they're imbalanced, we look above or below (depending on the nature of the imbalance) and check whether adding that line would balance the count.
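
The balance check described above can be sketched roughly as follows. This is a simplified illustration, not the code in this commit: it detects keywords with regular expressions rather than a lexer, and the names `KEYWORD_LINE`, `END_LINE`, `lean`, and `rebalance` are all hypothetical.

```ruby
# Assumption: a line "opens" a kw/end pair if it starts with one of these
# keywords or ends with `do` (optionally followed by block arguments).
KEYWORD_LINE = /\A\s*(?:def|class|module|if|unless|while|until|case|begin)\b|\bdo\s*(?:\|[^|]*\|)?\s*\z/
# Assumption: a line "closes" a pair if it starts with `end`.
END_LINE = /\A\s*end\b/

# Report which way a group of lines leans.
def lean(lines)
  keywords = lines.count { |line| line.match?(KEYWORD_LINE) }
  ends = lines.count { |line| line.match?(END_LINE) }

  return :balanced if keywords == ends
  keywords > ends ? :extra_keyword : :extra_end
end

# If the block is imbalanced, try adding the single line above (for an extra
# `end`) or below (for an extra keyword). Keep the wider range only when it
# balances the count.
def rebalance(lines, range)
  block = lines[range]

  case lean(block)
  when :extra_end
    above = range.begin - 1
    return above..range.end if above >= 0 && lean([lines[above]] + block) == :balanced
  when :extra_keyword
    below = range.end + 1
    return range.begin..below if below < lines.length && lean(block + [lines[below]]) == :balanced
  end

  range
end

lines = ["def bark\n", "\n", "end\n", "\n", "def sit\n", "end\n"]

# The block grabbed the `end` of `bark` (index 2) without its `def` (index 0).
puts lean(lines[1..5])               # prints extra_end
puts rebalance(lines, 1..5).inspect  # prints 0..5
```

Only one extra line is ever considered, which matches the expectation that a correction grows the block by a single line.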

This concept of balance and "leaning" comes from work in https://github.com/ruby/syntax_suggest/pull/152 and has proven useful, but has not been formally introduced into the main branch.

## Outcome

Adding this extra check introduced no regressions and fixed the test case. It might be possible there's a mirror or similar problem that we're not handling. That will come out in time. It might also be possible that this causes a worse case in some code not under test. That too would come out in time.

One other possible concern with adding logic in this area (which is a hot codepath) is performance. This extra count check will be performed for every block. In general, the two most helpful performance strategies I've found are reducing the total number of blocks (therefore reducing overall N internal iterations) and making better matches (calling the parser to determine whether a block is valid is a major bottleneck). If we can split valid code into valid blocks, then it's only evaluated by the parser once, whereas invalid code must be continuously re-checked by the parser until it becomes valid, or is determined to be the cause of the core problem.

This extra logic should very rarely result in a change, but when it does it should tend to produce slightly larger blocks (by one line) and more accurate blocks.

Informally it seems to have no impact on performance:

```
This branch:
DEBUG_DISPLAY=1 bundle exec rspec spec/ --format=failures  3.01s user 1.62s system 113% cpu 4.076 total
```

```
On main:
DEBUG_DISPLAY=1 bundle exec rspec spec/ --format=failures  3.02s user 1.64s system 113% cpu 4.098 total
```

13739c6946
2023-04-06 15:45:28 +09:00

505 lines
12 KiB
Ruby

# frozen_string_literal: true

require_relative "../spec_helper"

module SyntaxSuggest
  RSpec.describe CodeSearch do
    it "rexe regression" do
      lines = fixtures_dir.join("rexe.rb.txt").read.lines
      lines.delete_at(148 - 1)
      source = lines.join

      search = CodeSearch.new(source)
      search.call

      expect(search.invalid_blocks.join.strip).to eq(<<~'EOM'.strip)
        class Lookups
      EOM
    end
    it "squished do regression" do
      source = <<~'EOM'
        def call
          trydo
            @options = CommandLineParser.new.parse
            options.requires.each { |r| require!(r) }
            load_global_config_if_exists
            options.loads.each { |file| load(file) }
            @user_source_code = ARGV.join(' ')
            @user_source_code = 'self' if @user_source_code == ''
            @callable = create_callable
            init_rexe_context
            init_parser_and_formatters
            # This is where the user's source code will be executed; the action will in turn call `execute`.
            lookup_action(options.input_mode).call unless options.noop
            output_log_entry
          end # one
        end # two
      EOM

      search = CodeSearch.new(source)
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM'.indent(2))
        trydo
        end # one
      EOM
    end
    it "regression test ambiguous end" do
      source = <<~'EOM'
        def call # 0
            print "lol" # 1
          end # one # 2
        end # two # 3
      EOM

      search = CodeSearch.new(source)
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM')
        end # two # 3
      EOM
    end
    it "regression dog test" do
      source = <<~'EOM'
        class Dog
          def bark
            puts "woof"
          end
      EOM

      search = CodeSearch.new(source)
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM')
        class Dog
      EOM
      expect(search.invalid_blocks.first.lines.length).to eq(4)
    end
    it "handles mismatched |" do
      source = <<~EOM
        class Blerg
          Foo.call do |a
          end # one
          puts lol
          class Foo
          end # two
        end # three
      EOM

      search = CodeSearch.new(source)
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM'.indent(2))
        Foo.call do |a
        end # one
      EOM
    end

    it "handles mismatched }" do
      source = <<~EOM
        class Blerg
          Foo.call do {
          puts lol
          class Foo
          end # two
        end # three
      EOM

      search = CodeSearch.new(source)
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM'.indent(2))
        Foo.call do {
      EOM
    end
    it "handles no spaces between blocks and trailing slash" do
      source = <<~'EOM'
        require "rails_helper"
        RSpec.describe Foo, type: :model do
          describe "#bar" do
            context "context" do
              it "foos the bar with a foo and then bazes the foo with a bar to"\
                "fooify the barred bar" do
                travel_to DateTime.new(2020, 10, 1, 10, 0, 0) do
                  foo = build(:foo)
                end
              end
            end
          end
          describe "#baz?" do
            context "baz has barred the foo" do
              it "returns true" do # <== HERE
            end
          end
        end
      EOM

      search = CodeSearch.new(source)
      search.call

      expect(search.invalid_blocks.join.strip).to eq('it "returns true" do # <== HERE')
    end

    it "handles no spaces between blocks" do
      source = <<~'EOM'
        context "foo bar" do
          it "bars the foo" do
            travel_to DateTime.new(2020, 10, 1, 10, 0, 0) do
            end
          end
        end
        context "test" do
          it "should" do
        end
      EOM

      search = CodeSearch.new(source)
      search.call

      expect(search.invalid_blocks.join.strip).to eq('it "should" do')
    end
    it "records debugging steps to a directory" do
      Dir.mktmpdir do |dir|
        dir = Pathname(dir)
        search = CodeSearch.new(<<~'EOM', record_dir: dir)
          class OH
            def hello
            def hai
            end
          end
        EOM
        search.call

        expect(search.record_dir.entries.map(&:to_s)).to include("1-add-1-(3__4).txt")
        expect(search.record_dir.join("1-add-1-(3__4).txt").read).to include(<<~EOM)
          1 class OH
          2 def hello
          > 3 def hai
          > 4 end
          5 end
        EOM
      end
    end
    it "def with missing end" do
      search = CodeSearch.new(<<~'EOM')
        class OH
          def hello
          def hai
            puts "lol"
          end
        end
      EOM
      search.call

      expect(search.invalid_blocks.join.strip).to eq("def hello")

      search = CodeSearch.new(<<~'EOM')
        class OH
          def hello
          def hai
          end
        end
      EOM
      search.call

      expect(search.invalid_blocks.join.strip).to eq("def hello")

      search = CodeSearch.new(<<~'EOM')
        class OH
          def hello
          def hai
          end
        end
      EOM
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM'.indent(2))
        def hello
      EOM
    end
    describe "real world cases" do
      it "finds hanging def in this project" do
        source_string = fixtures_dir.join("this_project_extra_def.rb.txt").read
        search = CodeSearch.new(source_string)
        search.call

        document = DisplayCodeWithLineNumbers.new(
          lines: search.code_lines.select(&:visible?),
          terminal: false,
          highlight_lines: search.invalid_blocks.flat_map(&:lines)
        ).call

        expect(document).to include(<<~'EOM')
          > 36 def filename
        EOM
      end

      it "Format Code blocks real world example" do
        search = CodeSearch.new(<<~'EOM')
          require 'rails_helper'
          RSpec.describe AclassNameHere, type: :worker do
            describe "thing" do
              context "when" do
                let(:thing) { stuff }
                let(:another_thing) { moarstuff }
                subject { foo.new.perform(foo.id, true) }
                it "stuff" do
                  subject
                  expect(foo.foo.foo).to eq(true)
                end
              end
            end # line 16 accidental end, but valid block
              context "stuff" do
                let(:thing) { create(:foo, foo: stuff) }
                let(:another_thing) { create(:stuff) }
                subject { described_class.new.perform(foo.id, false) }
                it "more stuff" do
                  subject
                  expect(foo.foo.foo).to eq(false)
                end
              end
            end # mismatched due to 16
          end
        EOM
        search.call

        document = DisplayCodeWithLineNumbers.new(
          lines: search.code_lines.select(&:visible?),
          terminal: false,
          highlight_lines: search.invalid_blocks.flat_map(&:lines)
        ).call

        expect(document).to include(<<~'EOM')
          1 require 'rails_helper'
          2
          3 RSpec.describe AclassNameHere, type: :worker do
          > 4 describe "thing" do
          > 16 end # line 16 accidental end, but valid block
          > 30 end # mismatched due to 16
          31 end
        EOM
      end
    end
    # For code that's not perfectly formatted, we ideally want to do our best
    # These examples represent the results that exist today, but I would like to improve upon them
    describe "needs improvement" do
      describe "mis-matched-indentation" do
        it "extra space before end" do
          search = CodeSearch.new(<<~'EOM')
            Foo.call
              def foo
                puts "lol"
                puts "lol"
               end # one
            end # two
          EOM
          search.call

          expect(search.invalid_blocks.join).to eq(<<~'EOM')
            Foo.call
            end # two
          EOM
        end

        it "stacked ends 2" do
          search = CodeSearch.new(<<~'EOM')
            def cat
              blerg
            end
            Foo.call do
            end # one
            end # two
            def dog
            end
          EOM
          search.call

          expect(search.invalid_blocks.join).to eq(<<~'EOM')
            Foo.call do
            end # one
            end # two
          EOM
        end

        it "stacked ends " do
          search = CodeSearch.new(<<~'EOM')
            Foo.call
              def foo
                puts "lol"
                puts "lol"
            end
            end
          EOM
          search.call

          expect(search.invalid_blocks.join).to eq(<<~'EOM')
            Foo.call
            end
          EOM
        end

        it "missing space before end" do
          search = CodeSearch.new(<<~'EOM')
            Foo.call
              def foo
                puts "lol"
                puts "lol"
             end
            end
          EOM
          search.call

          # expand-1 and expand-2 seem to be broken?
          expect(search.invalid_blocks.join).to eq(<<~'EOM')
            Foo.call
            end
          EOM
        end
      end
    end
    it "returns syntax error in outer block without inner block" do
      search = CodeSearch.new(<<~'EOM')
        Foo.call
          def foo
            puts "lol"
            puts "lol"
          end # one
        end # two
      EOM
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM')
        Foo.call
        end # two
      EOM
    end

    it "doesn't just return an empty `end`" do
      search = CodeSearch.new(<<~'EOM')
        Foo.call
        end
      EOM
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM')
        Foo.call
        end
      EOM
    end

    it "finds multiple syntax errors" do
      search = CodeSearch.new(<<~'EOM')
        describe "hi" do
          Foo.call
          end
        end
        it "blerg" do
          Bar.call
          end
        end
      EOM
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM'.indent(2))
        Foo.call
        end
        Bar.call
        end
      EOM
    end
    it "finds a typo def" do
      search = CodeSearch.new(<<~'EOM')
        defzfoo
          puts "lol"
        end
      EOM
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM')
        defzfoo
        end
      EOM
    end

    it "finds a mis-matched def" do
      search = CodeSearch.new(<<~'EOM')
        def foo
          def blerg
        end
      EOM
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM'.indent(2))
        def blerg
      EOM
    end

    it "finds a naked end" do
      search = CodeSearch.new(<<~'EOM')
        def foo
          end # one
        end # two
      EOM
      search.call

      expect(search.invalid_blocks.join).to eq(<<~'EOM'.indent(2))
        end # one
      EOM
    end

    it "returns when no invalid blocks are found" do
      search = CodeSearch.new(<<~'EOM')
        def foo
          puts 'lol'
        end
      EOM
      search.call

      expect(search.invalid_blocks).to eq([])
    end

    it "expands frontier by eliminating valid lines" do
      search = CodeSearch.new(<<~'EOM')
        def foo
          puts 'lol'
        end
      EOM
      search.create_blocks_from_untracked_lines

      expect(search.code_lines.join).to eq(<<~'EOM')
        def foo
        end
      EOM
    end
  end
end