Ruby Threads: failing fast

Ruby Threads: failing fast.md

In our code base we spin up some Threads…

metas.map do |meta|
  Thread.new do
    fetch(meta, timeout)
  end
end.map(&:value)

The fetch function will call some code from within another class which itself internally will throw and catch lots of exceptions.

In one instance it’ll throw an exception that isn’t caught by the class, and so it bubbles up to the top level of the application.

The issue is that although the exception is thrown and is eventually rescued by a function in the top level of the application, the current code takes far too long to get there. The reason being: our loop keeps going - regardless of the unhandled exception - until all the other threads have finished (so we DON’T “fail fast” in this scenario).

The following code replicates this behaviour and then tries to solve it…

Ruby Threads: failing fast.rb

# SETUP...

class ::FooError < StandardError
  def initialize(msg)
    super "msg: #{msg}"
  end
end

class ::BarError < StandardError
  def initialize(msg)
    super "msg: #{msg}"
  end
end

def foo(i)
  sleep i; fail ::FooError.new("new foo error")
rescue ::FooError => e
  e
end

def bar(i)
  sleep i; fail ::BarError.new("new bar error") 
  # to see a full length run through, change this to... 
  # sleep i; "some success value"
end

# ACTUAL SOLUTION...

start = Time.now

times = [10, 10, 2].map do |i|
  Thread.new do
    begin
      if i == 2
        bar(i) # run bar with 2 second sleep
      else
        foo(i) # run foo with 10 second sleep
      end
    rescue
      Thread.new { check(times) }
      Thread.kill(Thread.current)
    end
  end
end

def check(threads)
  threads.select { |t| t unless t.alive? }.tap do |result|
    threads.each { |t| t.kill } if result.size > 0
  end
end

p times.map { |i| i.value }

finish = Time.now - start

p finish # should be ~2 seconds, definitely not 10! unless you modify the code as indicated above