-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky test with DataWriter (part 2) #376
Conversation
test/helpers/data_writer_helper.rb
Outdated
@started_callback_cmd << n unless @started | ||
n += 1 | ||
# Kind of makes the following race condition a bit better... | ||
# https://github.com/Shopify/ghostferry/issues/280 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a problem 👀
test/helpers/data_writer_helper.rb
Outdated
begin | ||
until @stop_requested do | ||
write_data(connection, &on_write) | ||
@started_callback_cmd << n unless @started |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets #start
return after first write
@number_of_writers.times do |i| | ||
@threads << Thread.new do | ||
@logger.info("data writer thread in wait mode #{i}") | ||
connection = Mysql2::Client.new(@db_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should still close the connection after the stop is requested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I forgot to add it back here after an experiment with the ConnectionPool gem (but thought it's a an overkill for this test setup)
95f5d10
to
8c817bf
Compare
test/helpers/data_writer_helper.rb
Outdated
sleep(0.03) | ||
end | ||
ensure | ||
connection.close |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this ensure
will only get called if something goes wrong in begin
block right? So if anything goes wrong before that we won't close the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's right, ensures are always called after any rescues or at the end of the block. Just moved it from here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I do this:
def test
raise "here"
begin
puts "Hi"
ensure
puts "always ensure"
end
end
ensure is not called.
irb(main):010> test
(irb):2:in `test': here (RuntimeError)
but when I do this:
def test
puts "here"
begin
raise "Hi"
ensure
puts "always ensure"
end
end
I get this:
irb(main):020> test
here
always ensure
(irb):15:in `test': Hi (RuntimeError)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhh right, just saw what happened there as I moved the connection
outside of the block. Simplified and fixed in 50c78b2
50c78b2
to
837f027
Compare
Currently re-running the CI suite several times to evaluate its stability ⌛ I don't think it's going to resolve all concurrency issues, but hopefully Edit: 96f92f5 is a side-quest for another |
Speed up the setup by creating threads and connecting earlier in initialize. Includes synchronization on DataWriter threads, so that calling #start ensures a first write to db before returning. This attempts to improve flakiness of test_interrupt_resume_inline_verifier_with_datawriter.
…ped before first write
7148582
to
96f92f5
Compare
…nts, potential source of flakiness Hypothesis: MixedActionDataWriter doesn't has a chance to start (goroutine), give it more time. Ideal solution would be to introduce some form of synchronization, but let's try this first. Partially reverts a speedup change from a288a28
96f92f5
to
7750793
Compare
Improves the
DataWriter
setup by creating threads and connecting earlier ininitialize
. Includes synchronization onDataWriter
threads, so that calling#start
ensures a first write to db before returning and letting the test proceed.This attempts to improve flakiness of
test_interrupt_resume_inline_verifier_with_datawriter
, which sometimes fails as expectedBinlogVerifyStore
contents are empty on interrup, when they should contain some data fromDataWriter
.I found out with the below log entry that nothing gets to the inline verifier before interrupt in cases where
test_interrupt_resume_inline_verifier_with_datawriter
fails (feel free to try out locally).Hoping this will be the definitive (and correct) fix, as opposed to the speed-up from #371.