codedecoder

breaking into the unknown…

raw sql for rails joins

1 Comment

There are time when ActiveRecord query will not server our purpose and you needs to write raw sql query.

Lets understands with examples when such case arise and how to write raw sql. We will also see how irresponsibly written code crash your system.

One of our application facing a lots of out of memory exception in Heroku and also some of payments not at all getting render in UI as Heroku kills any process which take more than 30 seconds.

Now question is why any request will need more than 30 seconds to load ?.

I was given the task to find the issue and fix it.

The model association in the Application look something as below:

class Payment < ActiveRecord::Base
  has_many :ledgers, dependent: :destroy, autosave: true
  has_many :invoices, through: :ledgers, autosave: true
  has_many :remittances, dependent: :destroy, autosave: true
end

class Ledger < ActiveRecord::Base
  belongs_to :invoice
  belongs_to :payment
end

class Invoice < ActiveRecord::Base belongs_to :customer_information has_many :ledgers has_many :payments, :through => :ledgers
end

class Remittance < ActiveRecord::Base
  belongs_to :payment
end

The business requirement look straight forward, say you have raised invoice for any service and than client will send you Payment for a bunch of invoices. Ledger is the join table between invoice and payment.

Remittance is something like you received extra payment which you return back to Client.
Here the Remittance table do not have direct link with Invoice table. The money returned may be part on a invoice or without it. The Remittance table do have a name column which keep Invoice number and possibly match to a invoice in Invoice table.

Now let see, how current code is written and Problem with it.

Case 1 : Display all the invoices with a payment on UI

Existing Code look as below:

payment_ledgers = payment.ledgers.includes(:invoice).where(matched: true)
payment_ledgers.each do |ledger|
  payment_invoice = ledger.invoice
  if payment_invoice && payment_invoice.remaining_amount > 0        
    customer = payment_invoice.customer_information.customer
    customer_name = customer.legal_name.present? ?  customer.legal_name : customer.name         
    date = payment_invoice.date.strftime("%m/%d/%Y") if payment_invoice.try(:date)
    invoice_hash = {
      number: payment_invoice.number, date: date,
      customer_name: customer_name, remaining_amount: payment_invoice.remaining_amount.to_f.round(2),
      total_amount: payment_invoice.original_amount,applied_amount: payment_invoice.applied_amount,
      id: payment_invoice.id
    }
    invoice_hash.merge!({ledger_id: ledger.id, ledger_applied_amount: ledger.applied_amount.to_f.round(2)})
    invoices << invoice_hash
  end
end

So what developer did here is that. He fetched all the ledgers with a payment and eager loaded invoices so he do not needs to hit a query again on DB.He looping over them as he needs data from 2 other tables also – invoices and customer_informations.

But, this approach have below issue:
It will return say 500 ledgers and than 500 invoices also. So there will be 1000 object occupying your memory. Worst part is you are looping over the ledgers, so the memory will be not released till process complete.

The optimized code look as below:

# Find Payment's Ledger Invoices whose remaining amount greater than zero
# Query will pluck required columns from all the three tables - Ledger, Invoice, CustomerInformation
invoices_data = Ledger.includes(invoice: :customer_information)
                .where("ledgers.payment_id = ?  and matched = ? and invoices.remaining_amount > ?", payment.id, true, 0)
                .pluck('ledgers.id', 'ledgers.applied_amount', 'customer_informations.legal_name', 'customer_informations.name',
                 'invoices.id', 'invoices.number', 'invoices.date', 'invoices.remaining_amount', 'invoices.original_amount', 'invoices.applied_amount'
                )
#Convert invoices_data to array of hash
invoices_data.each do |inv_data|
  invoices << {
    ledger_id: inv_data[0], 
    ledger_applied_amount: inv_data[1].to_f.round(2),
    customer_name: inv_data[2].present? ? inv_data[2] : inv_data[3],
    id: inv_data[4],
    number: inv_data[5], 
    date: inv_data[6],
    remaining_amount: inv_data[7].to_f.round(2),
    total_amount: inv_data[8].to_f.round(2),
    applied_amount: inv_data[9].to_f.round(2)
  }

The above Code with includes generate below sql query:
#The Sql Query above will generate is:
SELECT ledgers.applied_amount, invoices.number, customer_informations.legal_name FROM “ledgers” LEFT OUTER JOIN “invoices” ON “invoices”.”id” = “ledgers”.”invoice_id” LEFT OUTER JOIN “customer_informations” ON “customer_informations”.”id” = “invoices”.”customer_information_id” WHERE “ledgers”.”payment_id” = $1 [[“payment_id”, 4]]

So what we do is:

  1. we removed any looping in ruby code.
  2. We fetched the needed data from 3 tables from DB query itself.
  3. We are not fetching ActiveRecord object but set of info as strings.

You can see that we manged to write DB query with ActiveRecord itself. We do not need any raw sql here…. Rails is great !

Case 2 : Show all Remittances with a payment on UI

Existing Code look as below:

payment_rmh = []
payment.remittances.order('remittance_sequence_number').each do |rmh|
  invoice_id = Invoice.find_by('lower(number) = ? ',rmh.name.downcase).try(:id) if rmh.name.present?
  payment_rmh << {name: rmh.name, amount: rmh.amount,invoice_id: invoice_id}
end

So what developer is doing here is first finding all remittances with Payment, looping over each remittance. Finding Invoice whose number match with remittance name.

Problem here is that Your DB query will become proportional to number of remittances. If 1000 remittances you will make 1000 call to DB. Your system start chocking with memory and execution time will increase.

NOTE: Never ever make DB query within a loop

Now, lets try to optimized the code by eager loading approach in last case:

Remittance.includes(invoice: :customer_information)
Remittance Load (31.0ms) SELECT “remittances”.* FROM “remittances”
ActiveRecord::AssociationNotFoundError: Association named ‘invoice’ was not found on Remittance; perhaps you misspelled it?

So we can see that, includes will not work here unlike last case, as there is no direct association between remittances and invoices. So ActiveRecord failed to generated the Query for us.

Here we need to move out of ActiveRecord queries and build our own raw sql query.

The optimized code look as below:

# Find Payment RMH
rmh_raw_query = "SELECT ri.name, ri.amount, inv.id FROM remittances ri LEFT JOIN invoices inv ON LOWER(ri.name) = LOWER(inv.number) WHERE ri.payment_id = #{payment.id} ORDER BY remittance_sequence_number ASC"
rmh_records = ActiveRecord::Base.connection.exec_query(rmh_raw_query).rows
payment_rmh = rmh_records.map{|rme| {name: rme[0], amount: rme[1].to_f.round(2),invoice_id: rme[2]}}

Here the way to execute raw sql query with rails to call the exec_query method of ActiveRecord::Base.connection.

We have called row method on result returned by exec_query method which basically will return all column you need as an array.

With the above optimization :

  1. 1000 remittances took less than half seconds as compared to 20 seconds earlier.
  2. It will be independent of number of children.Unlike earlier approach, it will always take less than 1 second even if no of remittances reach 1 lakhs or more.
  3. There will be no load on memory.

Author: arunyadav4u

over 10 years experience in web development with Ruby on Rails.Involved in all stage of development lifecycle : requirement gathering, planing, coding, deployment & Knowledge transfer. I can adept to any situation, mixup very easily with people & can be a great friend.

One thought on “raw sql for rails joins

  1. Pingback: 週刊Railsウォッチ(20181009)Rails 6の新機能:WYSIWYGエディタ「Action Text」、Rails 6の青写真スライド、Apache POIはスゴイほか

Leave a comment