How to Read & Parse CSV Files With Ruby

CSV stands for “Comma-Separated Values”.

It’s a common data format that consists of rows of values separated by commas. It’s used for exporting & importing data.

For example:

You can export your Gmail contacts as a CSV file, and you can also import them using the same format.

This is what a CSV file looks like:

id,name
1,chocolate
2,bacon
3,apple
4,banana
5,almonds

Now you’re going to learn how to use the Ruby CSV library to read & write CSV files.

Ruby CSV Parsing

Ruby comes with a built-in CSV library.

You can read a file directly:

require 'csv'

CSV.read("favorite_foods.csv")

Or you can parse a string with CSV data:

require 'csv'

CSV.parse("1,chocolate\n2,bacon\n3,apple")

The result?

You get a two-dimensional array where every entry is one row in the table.

For the file shown earlier, it looks like this:

[
  ["id", "name"],
  ["1", "chocolate"],
  ["2", "bacon"],
  ["3", "apple"],
  ["4", "banana"],
  ["5", "almonds"]
]

You can use array indices like data[1][1] to work with this data.
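Here is a minimal sketch of working with that array. The sample data is inlined as a string (an assumption, so the example runs without a file on disk), and the variable names are only illustrative.

```ruby
require 'csv'

# Sample data inlined as a string so no file is needed.
csv_text = "id,name\n1,chocolate\n2,bacon\n3,apple\n"

data = CSV.parse(csv_text)

# The first row holds the column names, the rest are the records.
header, *rows = data

first_name = data[1][1]            # row 1, column 1
names = rows.map { |row| row[1] }  # every value in the second column

puts first_name     # chocolate
puts names.inspect  # ["chocolate", "bacon", "apple"]
```

Notice that you have to remember which position each column is in, which is the part that headers make easier.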

But there is a better way!

CSV Options

If your file has headers you can tell the CSV parser to use them.

Like this:

table = CSV.parse(File.read("favorite_foods.csv"), headers: true)

Now instead of a multi-dimensional array you get a CSV::Table object.

Here’s the description:

“A CSV::Table is a two-dimensional data structure for representing CSV documents. Tables allow you to work with the data by row or column, manipulate the data, and even convert the results back to CSV.”

Given one of these tables, you can get the data you need from any row.

Example:

table[0]["id"]
# "1"

table[0]["name"]
# "chocolate"

Here 0 is the first row, and id & name are the column names.
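You can also loop over every row of the table. This is a small sketch, again assuming the data is inlined as a string instead of read from a file. Each row is a CSV::Row, which you can turn into a plain hash with to_h.

```ruby
require 'csv'

csv_text = "id,name\n1,chocolate\n2,bacon\n3,apple\n"
table = CSV.parse(csv_text, headers: true)

column_names = table.headers         # the header row, as an array
foods = table.map { |row| row.to_h } # one hash per data row

puts column_names.inspect
foods.each { |food| puts "#{food["id"]}: #{food["name"]}" }
```

Keep in mind that the values are still strings at this point.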

By default a table is in mixed mode: a number selects a row & a header name selects a column.

There are also two dedicated table modes:

  • by_col
  • by_row

By changing the table mode you can look at the data from different angles.

For example:

table.by_col[0]
# ["1", "2", "3", "4", "5"]

table.by_col[1]
# ["chocolate", "bacon", "apple", "banana", "almonds"]

Here 0 is the first column, and 1 is the second column.

These two methods return a copy of the table.

If you want to make changes to the original table then you can use the by_col! & by_row! methods.

This is going to be more memory-efficient because no copy of the table is created.
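To see the difference between the two versions, here is a short sketch. It assumes inlined sample data, and it uses the table's mode reader to check which mode each object is in.

```ruby
require 'csv'

csv_text = "id,name\n1,chocolate\n2,bacon\n3,apple\n"
table = CSV.parse(csv_text, headers: true)

original_mode = table.mode  # the mode a freshly parsed table starts in

# by_col returns another table object; the original keeps its mode.
columns = table.by_col
name_column = columns[1]

copy_mode = columns.mode
mode_after_copy = table.mode

# by_col! switches the original table itself.
table.by_col!
mode_after_bang = table.mode

puts name_column.inspect
puts [original_mode, copy_mode, mode_after_copy, mode_after_bang].inspect
```

As far as I can tell from the library, the copy shares its row objects with the original, so treat it as a different view and not as a fully independent deep copy.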

How to Use CSV Converters

You may have noticed that we got our id column as an array of strings.

What if we need Integers?

You can get them by calling to_i on each string…

But there is a shortcut!

The Ruby CSV library implements something called converters.

A converter will automatically transform values for you.

For example:

CSV.parse("1,2,3,4,5")
# [["1", "2", "3", "4", "5"]]

CSV.parse("1,2,3,4,5", converters: :numeric)
# [[1, 2, 3, 4, 5]]

The built-in converters include:

  • :integer
  • :float
  • :numeric (:integer + :float)
  • :date
  • :date_time
  • :all (:date_time + :numeric)
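You can pass more than one converter as an array. The sketch below uses a made-up line of data to show three of the built-in converters together. Each field goes through the converters in order until one of them changes it into something that is not a string.

```ruby
require 'csv'
require 'date'

# Hypothetical sample line: an integer, a float, an ISO date & plain text.
line = "1,2.5,2024-01-15,chocolate"

fields = CSV.parse_line(line, converters: [:integer, :float, :date])

fields.each { |field| puts "#{field.inspect} (#{field.class})" }
```

Values that no converter understands, like "chocolate", are left as strings.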

But you can also create your own custom converters.

Here’s how:

CSV::Converters[:symbol] = ->(value) { value.to_sym rescue value }

You can use your new converter like this:

CSV.parse("a,b,c", headers: false, converters: :symbol)

# [[:a, :b, :c]]
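If you don't want to register a converter globally, you can also pass a lambda directly in the converters option. This is a sketch with a hypothetical strip converter that cleans up extra spaces before the built-in :integer converter runs.

```ruby
require 'csv'

# Hypothetical converter: remove surrounding whitespace from string values.
strip = ->(value) { value.is_a?(String) ? value.strip : value }

# Converters run in the order given: strip first, then :integer.
rows = CSV.parse(" 1 , chocolate \n2,bacon", converters: [strip, :integer])

puts rows.inspect
```

The order matters here, because a value stops being converted as soon as it is no longer a string.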

How to Create a New CSV File

On top of being able to parse & read CSV files in different ways you can also create a CSV from scratch.

This is the easy way, but it only works when no value contains a comma, a quote, or a line break:

cats = [
  [:blue, 1],
  [:white, 2],
  [:black_and_white, 3]
]

cats.map { |c| c.join(",") }.join("\n")

You can also use the generate method:

CSV.generate do |csv|
  csv << [:blue, 1]
  csv << [:white, 2]
  csv << [:black_and_white, 3]
end

The generate method returns the CSV data as a string, with values quoted where needed.
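Why does the format matter? Here is a small sketch that compares a plain join with generate, using a made-up value that contains a comma.

```ruby
require 'csv'

# Hypothetical row where the first value contains a comma.
row = ["black, white", 3]

naive = row.join(",")                     # no quoting at all
safe  = CSV.generate { |csv| csv << row } # quotes the first value

naive_fields = CSV.parse_line(naive)
safe_fields  = CSV.parse_line(safe)

puts naive_fields.inspect  # three fields, the value got split in two
puts safe_fields.inspect   # two fields, as intended
```

With the plain join the row comes back with an extra column, while generate keeps the value in one piece.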

If you want to write to a file you can pass the generated string to something like File.write("cats.csv", data). Or, instead of generate, you can use open with a file name & write mode enabled.

Like this:

CSV.open("cats.csv", "w") do |csv|
  csv << [:white, 2]
end

Now you have a new CSV file!
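If you also want a header row in the file, one option is to combine the headers & write_headers options. This sketch writes into a temporary directory (an assumption, so it doesn't leave files around), and the column names color & id are made up for the example.

```ruby
require 'csv'
require 'tmpdir'

written = Dir.mktmpdir do |dir|
  path = File.join(dir, "cats.csv")

  # write_headers: true puts the headers on the first line of the file.
  CSV.open(path, "w", write_headers: true, headers: ["color", "id"]) do |csv|
    csv << [:blue, 1]
    csv << [:white, 2]
  end

  File.read(path) # return the file contents before the directory is removed
end

puts written
```

You can then read this file back with headers: true, like in the earlier examples.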

CSV Gems & Performance

The built-in library is fine & it will get the job done.

But you can also find a few CSV parsing gems with different features.

For example, the smarter_csv gem will convert your CSV data into an array of hashes.

Example:

require 'smarter_csv'

IntegerConverter = Object.new

def IntegerConverter.convert(value)
  Integer(value)
end

SmarterCSV.process('testing.csv', value_converters: { id: IntegerConverter })

# [{:id=>1, :name=>"a"}, {:id=>2, :name=>"b"}, {:id=>3, :name=>"c"}]

Here's a performance comparison (i/s means iterations per second, so higher is faster):

Comparison:
       CSV:      112.9 i/s
Smarter CSV:     21.7 i/s - 5.21x  slower
   Tabular:      17.3 i/s - 6.52x  slower

Summary

You've learned how to read & write CSV files in Ruby! You've also learned about converters & alternative Ruby gems to process your CSV data.

If you want to process big CSV files (> 10MB) you may want to use the CSV.foreach(file_name) method with a block. This will read one row at a time & use a lot less memory.
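Here is a minimal sketch of that approach. It creates a small temporary file only so the example can run on its own; with real data you would pass the path to your big file instead.

```ruby
require 'csv'
require 'tempfile'

total = 0
names = []

Tempfile.create(["foods", ".csv"]) do |file|
  file.write("id,name\n1,chocolate\n2,bacon\n3,apple\n")
  file.flush

  # Each row is read, handled & then discarded, so memory use stays low.
  CSV.foreach(file.path, headers: true, converters: :numeric) do |row|
    total += row["id"]
    names << row["name"]
  end
end

puts total
puts names.inspect
```

The same options you saw earlier, like headers & converters, also work with foreach.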
