Say I have a big composite formula to compute the quality of a widget

quality = 0.4(factory_quality) + 0.3(1/days_since_manufacture) + 0.3(materials_quality)

Each of these three factors is itself a function, requiring joins to the factories table and possibly to a bill-of-materials join table with materials, where the associated records are averaged or otherwise aggregated.

Architecturally, how would you manage this in a Rails project? What's the best practice to a) produce the correct query and b) manage the code in Rails?

Currently, for the SQL, I'm using a subquery in the FROM clause:

SELECT *,
  (0.4 * factory_quality + 0.3 * (1/days_since_manufacture) + 0.3 * materials_quality) AS quality
FROM (
  SELECT widgets.*,
    ((factories.last_inspection_score + factories.variance_score)/2) AS factory_quality,
    -- now() - created_at is an interval, so convert it to a number of days
    (EXTRACT(EPOCH FROM (now() - widgets.created_at)) / 86400) AS days_since_manufacture,
    SUM(materials.quality_score) AS materials_quality
  FROM widgets
  JOIN factories ON widgets.factory_id = factories.id
  JOIN bills_of_materials ON widgets.id = bills_of_materials.widget_id
  JOIN materials ON bills_of_materials.material_id = materials.id
  -- factories.id is grouped too, so the factory columns can be selected
  GROUP BY widgets.id, factories.id
) AS widgets;

In Rails, I have this implemented mostly using ActiveRecord:

class Widget < ActiveRecord::Base
  belongs_to :factory
  has_many :bills_of_material
  has_many :materials, through: :bills_of_material

  class << self
    # Widgets plus a computed "quality" column, built on top of the subquery below
    def with_quality
      select([
        "widgets.*",
        "(0.4 * factory_quality + 0.3 * (1/days_since_manufacture) + 0.3 * materials_quality) AS quality"
      ].join(","))
      .from("(#{subquery}) AS widgets")
    end

    private

      # SQL for the inner query that computes the three component scores per widget
      def subquery
        select([
          "widgets.*",
          "((factories.last_inspection_score + factories.variance_score)/2) AS factory_quality",
          "(EXTRACT(EPOCH FROM (now() - widgets.created_at)) / 86400) AS days_since_manufacture",
          "SUM(materials.quality_score) AS materials_quality"
        ].join(","))
        .joins(:factory, :materials)
        .group("widgets.id, factories.id")
        .to_sql
      end
  end
end

That said, I feel like I could make this a custom function in Postgres, move all this SQL into that function, migrate it, and clean up the Rails code to look like:

def with_scores
  select("*,quality_score_func(id) AS quality")
end

or something to that effect, but I feel like it will be a pain in the ass to manage what will be an evolving formula through database migrations, not to mention somewhat of a task to find out what the current form of the formula is (and also difficult to test).

How have other people solved this problem? Any tips or suggestions?

  • Just to clarify, I'm not trying to avoid SQL, and I believe the calculation belongs in the database for performance reasons. I'm just wondering if anyone has developed a design pattern to clean this code up. (Commented Mar 17, 2014 at 5:24)

1 Answer

Here is the least SQL-ly way I could think of to do this. I couldn't really test this, but hopefully it's a helpful exercise at least. As I understand it, if you use includes, Rails will eager load the associated records (either in a few separate queries or in one query with joins, depending on the query), so the methods below don't trigger a new query for every widget.

# All of these are additional Widget instance methods; you decide if they are private
#
# Example use:
#
#   @widget = Widget.includes(:factory, :materials).find(1)
#   puts @widget.quality_score
# or
#   @widgets = Widget.includes(:factory, :materials).all
#   @widgets.each { |widget| puts widget.quality_score }

# Consider making these weights named constants
def quality_score
  0.4 * factory_quality +
    0.3 * (1 / days_since_manufacture) +
    0.3 * materials_quality_score
end

def days_since_manufacture
  # Subtracting times gives seconds, so convert to days
  (Time.now - created_at) / 86_400.0
end

def factory_quality
  # Divide by 2.0 to avoid integer division if the scores are integers
  (factory.last_inspection_score + factory.variance_score) / 2.0
end

def materials_quality_score
  materials.inject(0) { |sum, material| sum + material.quality_score }
end
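
Following the note about named constants, here is a minimal sketch of the same calculation that runs without Rails. The Factory and Material structs and the WidgetSketch class are hypothetical stand-ins for the real models, and the optional now argument is only there to make the result repeatable; none of these names come from the question.

```ruby
# Hypothetical stand-ins for the ActiveRecord models, so the sketch runs without Rails.
Factory  = Struct.new(:last_inspection_score, :variance_score)
Material = Struct.new(:quality_score)

class WidgetSketch
  # Weights from the formula in the question, named so they are easy to find and change.
  FACTORY_WEIGHT   = 0.4
  FRESHNESS_WEIGHT = 0.3
  MATERIALS_WEIGHT = 0.3
  SECONDS_PER_DAY  = 86_400.0

  attr_reader :factory, :materials, :created_at

  def initialize(factory:, materials:, created_at:)
    @factory = factory
    @materials = materials
    @created_at = created_at
  end

  # Weighted sum of the three component scores.
  def quality_score(now = Time.now)
    FACTORY_WEIGHT * factory_quality +
      FRESHNESS_WEIGHT * (1.0 / days_since_manufacture(now)) +
      MATERIALS_WEIGHT * materials_quality_score
  end

  # Time subtraction yields seconds; convert to (fractional) days.
  def days_since_manufacture(now = Time.now)
    (now - created_at) / SECONDS_PER_DAY
  end

  def factory_quality
    (factory.last_inspection_score + factory.variance_score) / 2.0
  end

  def materials_quality_score
    materials.inject(0) { |sum, material| sum + material.quality_score }
  end
end

# Example: a widget made two days before a fixed reference time.
now = Time.at(1_000_000)
widget = WidgetSketch.new(
  factory: Factory.new(8, 6),
  materials: [Material.new(1.0), Material.new(3.0)],
  created_at: now - 2 * 86_400
)
puts widget.quality_score(now).round(2) # prints 4.15
```

Passing the reference time in, rather than calling Time.now inside each method, is a design choice for the sketch that makes the score easy to check by hand: 0.4 * 7.0 + 0.3 * (1 / 2.0) + 0.3 * 4.0.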

2 Comments

Thanks for this answer, but I'm not really looking to do more of the work in Ruby, quite the opposite. If I do the calculation in Ruby, yes the code will be cleaner, but in order to do queries like Top 5 by quality I will need to instantiate all records, calculate for all records, sort in memory, then discard N - 5. I am willing to live with code ugliness to avoid that performance hit, but I feel like this must be a common enough problem that there might be a less ugly way than I've done it.
Apologies for the misunderstanding. If that's the case I agree defining a Postgres function is a bad idea. Instead I think I would define Ruby methods that generate the appropriate snippets and compose those into something you can send to find_by_sql. I don't expect this will look that much better, but it allows you to separate out calculations that might change from the blocking and tackling.
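
The snippet-composition idea in the comment above could look roughly like the sketch below. The WidgetQualitySql module, its method names, and the WEIGHTS hash are all hypothetical, and the EXTRACT(EPOCH ...) conversion assumes PostgreSQL. It only builds a SQL string, so it runs without Rails; nothing here has been run against a real database.

```ruby
# Hypothetical module: each method returns one SQL fragment, so the parts of the
# formula that change often live apart from the joins and grouping.
module WidgetQualitySql
  WEIGHTS = { factory: 0.4, freshness: 0.3, materials: 0.3 }.freeze

  module_function

  def factory_quality
    "((factories.last_inspection_score + factories.variance_score) / 2.0)"
  end

  # Assumes PostgreSQL: interval -> seconds -> days.
  def days_since_manufacture
    "(EXTRACT(EPOCH FROM (now() - widgets.created_at)) / 86400.0)"
  end

  def materials_quality
    "SUM(materials.quality_score)"
  end

  # The composite formula, assembled from the fragments above.
  def quality
    "(#{WEIGHTS[:factory]} * #{factory_quality} + " \
      "#{WEIGHTS[:freshness]} * (1 / #{days_since_manufacture}) + " \
      "#{WEIGHTS[:materials]} * #{materials_quality})"
  end

  # Full query, ordered by quality; pass limit: 5 for a "top 5" query.
  def ranked(limit: nil)
    sql = <<~SQL
      SELECT widgets.*, #{quality} AS quality
      FROM widgets
      JOIN factories ON widgets.factory_id = factories.id
      JOIN bills_of_materials ON widgets.id = bills_of_materials.widget_id
      JOIN materials ON bills_of_materials.material_id = materials.id
      GROUP BY widgets.id, factories.id
      ORDER BY quality DESC
    SQL
    limit ? "#{sql}LIMIT #{Integer(limit)}\n" : sql
  end
end

puts WidgetQualitySql.ranked(limit: 5)
```

In a Rails app the resulting string could presumably be handed to Widget.find_by_sql(WidgetQualitySql.ranked(limit: 5)), which keeps the sorting and the limit in the database. Because each fragment is an ordinary Ruby method, the current form of the formula is easy to read, and simple string-level tests can be written against it.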
