
How do I construct an array of different types given a comma-separated string and another array dictating the type?


By parsing CSV input taken from stdin, I have an array of column header Symbols:

cols = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]

and a line of raw input:

raw = "$JX.T.CA,Open,T,933.36T 11:10:00.000"

I would like to construct an array, cells, from the raw input, where each element of cells is of the type identified by the corresponding element in cols. What are the idiomatic Ruby-ish ways of doing this?


I have tried this, which works but doesn't really feel right.

1) First, define a class for each type which needs to be encapsulated:

class Sku
  attr_accessor :mRoot, :mExch, :mCountry
  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end
end

class Price
  attr_accessor :mPrice, :mExchange, :mTime
  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    @mTime = time
  end
end

2) Then, define conversion functions for each unique column type which needs to be converted:

def to_sku(raw)
  raw.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| Sku.new(m[1], m[2], m[3])}
end

def to_price(raw)
  raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3]) }
end

3) Create an array of strings from the input:

cells = raw.split(",")

4) And finally modify each element of cells in-place by constructing the type dictated by the corresponding column header:

cells.each_index do |i|
    cells[i] = case cols[i]
        when :IndexSymbol
            to_sku(cells[i])
        when :PriceStatus
            cells[i].split(";").collect {|st| st.to_sym}
        when :UpdateExchange
            cells[i]
        when :Last
            cells[i].match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3])}
        else
            puts "Unhandled column type (#{cols[i]}) from input string: \n#{cols}\n#{raw}"
            exit -1
    end
end
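For reference, here is what the two regular expressions from steps 2 and 4 capture from the sample row. This is only a sanity check of the question's own patterns, written as regex literals (SKU_RE and PRICE_RE are illustrative names):

```ruby
# The patterns used by to_sku and the :Last branch, as regex literals
SKU_RE   = /(\w+)\.(\w{0,1})\.(\w{,2})/
PRICE_RE = /(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})/

raw   = "$JX.T.CA,Open,T,933.36T 11:10:00.000"
cells = raw.split(",")

# String#match with a block yields the MatchData only on success;
# when there is no match it returns nil without calling the block
sku_parts   = cells[0].match(SKU_RE)   { |m| m.captures }
price_parts = cells[3].match(PRICE_RE) { |m| m.captures }

p sku_parts   # => ["JX", "T", "CA"]
p price_parts # => ["933.36", "T", "11:10:00.000"]
p "nonsense".match(PRICE_RE) { |m| m.captures } # => nil
```

Note that the leading $ is not a word character, so the SKU match simply starts at "JX".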

The parts that don't feel right are steps 3 and 4. How is this done in a more Ruby fashion? I was imagining some kind of super concise method like this, which exists only in my imagination:

cells = raw.split_using_convertor(",")
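Nothing like split_using_convertor exists in Ruby's core String API, but a small helper gets close. A minimal sketch, assuming the converters live in a Hash of lambdas keyed by column Symbol; split_using_converters and CONVERTERS are hypothetical names, and the lambdas return plain capture arrays instead of Sku/Price objects to keep the example self-contained:

```ruby
# Hypothetical helper: converters maps a column Symbol to a callable.
# Hash#fetch raises KeyError for a column without a converter,
# instead of silently passing the raw string through.
def split_using_converters(raw, cols, converters, sep = ",")
  raw.split(sep).zip(cols).map do |cell, col|
    converters.fetch(col).call(cell)
  end
end

CONVERTERS = {
  IndexSymbol:    ->(s) { s.match(/(\w+)\.(\w{0,1})\.(\w{,2})/) { |m| m.captures } },
  PriceStatus:    ->(s) { s.split(";").map(&:to_sym) },
  UpdateExchange: ->(s) { s },
  Last:           ->(s) { s.match(/(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})/) { |m| m.captures } }
}.freeze

cols  = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]
raw   = "$JX.T.CA,Open,T,933.36T 11:10:00.000"
cells = split_using_converters(raw, cols, CONVERTERS)
p cells
# => [["JX", "T", "CA"], [:Open], "T", ["933.36", "T", "11:10:00.000"]]
```

A plain method is used here rather than adding a method to String, which keeps the conversion table explicit at the call site.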
  • Be warned that you cannot always parse CSV with .split(","). One common format of CSV allows individual fields to be quoted with " on either side, and requires " characters to be escaped as "". Ruby’s CSV library can handle that: require 'csv', then use CSV.parse or CSV.parse_line. Commented Aug 19, 2013 at 15:09
  • Thanks for the warning, @RoryO'Kane. In my particular case the CSV comes from a proprietary tool, so I can be sure of the format. Commented Aug 19, 2013 at 15:35
  • As an aside, you can improve the Ruby "feel" further by sticking to Ruby formatting conventions where possible: 2-space indents (looking @ your last example), :snake_case for symbols, and no Hungarian notation. Also, if you are going to be creating a lot of POROs (Plain Old Ruby Objects), then you could use a gem like virtus to get rid of the attr_accessor/initialize boilerplate and make the attribute types explicit. Commented Aug 19, 2013 at 17:28
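The attr_accessor/initialize boilerplate mentioned in that last comment can also be trimmed without a gem. As a sketch, this uses Ruby's built-in Struct (a swap-in, not virtus, so it adds no attribute type checking), with snake_case member names in place of the Hungarian-style mRoot/mExch:

```ruby
# Struct generates the readers, writers, and a positional initializer
Sku   = Struct.new(:root, :exch, :country)
Price = Struct.new(:price, :exchange, :time)

sku = Sku.new("JX", "T", "CA")
puts sku.root    # prints JX
puts sku.country # prints CA

# Struct instances compare by value, which is handy in tests
p Sku.new("JX", "T", "CA") == sku # => true
```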

4 Answers

2

You can make the fourth step simpler with #zip, #map, and destructuring assignment:

cells = cells.zip(cols).map do |cell, col|
    case col
    when :IndexSymbol
        to_sku(cell)
    when :PriceStatus
        cell.split(";").collect {|st| st.to_sym}
    when :UpdateExchange
        cell
    when :Last
        cell.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3])}
    else
        puts "Unhandled column type (#{col}) from input string: \n#{cols}\n#{raw}"
        exit -1
    end
end

I wouldn’t recommend combining that step with the splitting, because parsing a line of CSV is complicated enough to be its own step. See my comment for how to parse the CSV.
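To make the CSV point concrete, here is a sketch assuming a hypothetical input line in which one field is quoted because it contains a comma. split(",") breaks that field in two, while CSV.parse_line from Ruby's standard csv library keeps it whole, after which the zip/map step above applies unchanged:

```ruby
require 'csv'

cols = [:IndexSymbol, :PriceStatus, :UpdateExchange, :Last]
# Hypothetical line: the second field is quoted because it contains a comma
line = %q{$JX.T.CA,"Open,Halted",T,933.36T 11:10:00.000}

p line.split(",").size # => 5, the naive split tears the quoted field apart

cells = CSV.parse_line(line)
p cells.size # => 4

# zip pairs each cell with its column; the block destructures each pair
labelled = cells.zip(cols).map { |cell, col| "#{col}=#{cell}" }
puts labelled
```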


2 Comments

You can just write cells.zip(cols).map do |cell, col| ....
@toro2k Good idea. I hadn’t known that. Fixed.
2

You could have the different types inherit from a base class and put the lookup knowledge in that base class. Then you could have each class know how to initialize itself from a raw string:

class Header
  @@lookup = {}

  def self.symbol(*syms)
    syms.each{|sym| @@lookup[sym] = self}
  end

  def self.lookup(sym)
    @@lookup[sym]
  end
end

class Sku < Header
  symbol :IndexSymbol
  attr_accessor :mRoot, :mExch, :mCountry

  def initialize(root, exch, country)
    @mRoot = root
    @mExch = exch
    @mCountry = country
  end

  def to_s
    "@#{mRoot}-#{mExch}-#{mCountry}"
  end

  def self.from_raw(str)
    str.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| new(m[1], m[2], m[3])}
  end
end

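require 'time' # for Time.strptime

class Price < Header
  symbol :Last, :Bid
  attr_accessor :mPrice, :mExchange, :mTime

  def initialize(price, exchange, time)
    @mPrice = price
    @mExchange = exchange
    # Time.new does not parse a bare "HH:MM:SS.mmm" string; missing date parts come from the current date
    @mTime = Time.strptime(time, "%H:%M:%S.%L")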
  end

  def to_s
    "$#{mPrice}-#{mExchange}-#{mTime}"
  end

  def self.from_raw(raw)
    raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| new(m[1], m[2], m[3])}
  end
end

class SymbolList < Header
  symbol :PriceStatus
  attr_accessor :mSymbols

  def initialize(symbols)
    @mSymbols = symbols
  end

  def self.from_raw(str)
    new(str.split(";").map(&:to_sym))
  end

  def to_s
    mSymbols.to_s
  end
end

class ExchangeIdentifier < Header
  symbol :UpdateExchange
  attr_accessor :mExch

  def initialize(exch)
    @mExch = exch
  end

  def self.from_raw(raw)
    new(raw)
  end

  def to_s
    mExch
  end
end

Then you can replace step #4 like so (CSV parsing not included):

cells = cells.each_index.map do |i|
  Header.lookup(cols[i]).from_raw(cells[i])
end
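One gap in this version is that Header.lookup returns nil for a column nobody registered, so the caller gets a NoMethodError on nil rather than a useful message. A minimal sketch of a stricter lookup using Hash#fetch with a block, cut down to a single class with a snake_case attribute; the error message is an assumption modelled on the question's else branch:

```ruby
# Cut-down copy of the Header registry, with lookup failing loudly
class Header
  @@lookup = {}

  def self.symbol(*syms)
    syms.each { |sym| @@lookup[sym] = self }
  end

  def self.lookup(sym)
    @@lookup.fetch(sym) { raise ArgumentError, "Unhandled column type (#{sym})" }
  end
end

class ExchangeIdentifier < Header
  symbol :UpdateExchange
  attr_reader :exch

  def initialize(exch)
    @exch = exch
  end

  def self.from_raw(raw)
    new(raw)
  end
end

cols   = [:UpdateExchange, :UpdateExchange]
cells  = ["T", "V"]
parsed = cells.zip(cols).map { |cell, col| Header.lookup(col).from_raw(cell) }
p parsed.map(&:exch) # => ["T", "V"]
```

Because @@lookup is a class variable defined in Header, the symbol calls made inside each subclass body all write into the same shared table.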

7 Comments

You have not handled these cases: Header.lookup(:UpdateExchange) and Header.lookup(:PriceStatus).
A name for this type of refactoring is “converting a case statement to polymorphism”. It’s heuristic G23 in the book Clean Code (which is about clean code in Java).
@toro2k I left those out on purpose since he doesn't have example class names/definitions for those symbols, but yeah I guess I should have noted that.
@RoryO'Kane Thanks, I believe that book is on my Amazon wishlist but have yet to purchase it!
Thanks, I think this is probably how I'll go. Will accept after I've written the code.
1

Ruby’s CSV library includes support for this sort of thing directly (as well as better handling of the actual parsing), although the docs are a bit awkward.

You need to provide a proc that will do your conversions for you, and pass it as an option to CSV.parse:

converter = proc do |field, info|
  case info.header.strip # in case you have spaces after your commas
  when "IndexSymbol"
    field.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| Sku.new(m[1], m[2], m[3]) }
  when "PriceStatus"
    field.split(";").collect { |st| st.to_sym }
  when "UpdateExchange"
    field
  when "Last"
    field.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| Price.new(m[1], m[2], m[3]) }
  else
    field # leave any other column unchanged
  end
end

Then you can parse it almost directly into the format you want:

require 'csv'

# s holds the CSV text, header line included
c = CSV.parse(s, :headers => true, :converters => converter).by_row!.map do |row|
  row.map { |_, field| field } # we only want the field now, not the header
end
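Here is a self-contained run of the same idea, trimmed to the PriceStatus column so it needs no Sku or Price classes. The input string is hypothetical and includes the header line that headers: true expects. A two-argument converter receives the field plus a CSV::FieldInfo whose header member names the column:

```ruby
require 'csv'

# Called with (field, field_info) because the lambda takes two arguments
converter = lambda do |field, info|
  case info.header
  when "PriceStatus" then field.split(";").map(&:to_sym)
  else field # other columns stay as strings
  end
end

# Hypothetical input: one header line plus one data row
s = <<~CSV_TEXT
  IndexSymbol,PriceStatus,UpdateExchange
  $JX.T.CA,Open;Halted,T
CSV_TEXT

table = CSV.parse(s, headers: true, converters: [converter])
row   = table.first
p row["PriceStatus"] # => [:Open, :Halted]
p row.fields         # => ["$JX.T.CA", [:Open, :Halted], "T"]
```

Once a converter returns something other than a String, CSV stops applying further converters to that field, so the Array of symbols comes through as-is.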

Comments

1

@AbeVoelker's answer steered me in the right direction, but I had to make a pretty major change because of something I failed to mention in the OP.

Some of the cells will be of the same type but will still have different semantics. Those semantic differences don't come into play here (and aren't elaborated on), but they do in the larger context of the tool I'm writing.

For example, there will be several cells of type Price, such as :Last, :Bid, and :Ask. They are all the same type (Price), but they are still different enough that there can't be a single Header @@lookup entry for all Price columns.

So what I actually did was write a self-decoding class (credit to Abe for this key part) for each type of cell:

class Sku
    attr_accessor :mRoot, :mExch, :mCountry
    def initialize(root, exch, country)
        @mRoot = root
        @mExch = exch
        @mCountry = country
    end

    def to_s
        "@#{mRoot}-#{mExch}-#{mCountry}"
    end

    def self.from_raw(str)
        str.match('(\w+)\.(\w{0,1})\.(\w{,2})') { |m| new(m[1], m[2], m[3])}
    end
end

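require 'time' # for Time.strptime

class Price
    attr_accessor :mPrice, :mExchange, :mTime
    def initialize(price, exchange, time)
        @mPrice = price
        @mExchange = exchange
        # Time.new does not parse a bare "HH:MM:SS.mmm" string; missing date parts come from the current date
        @mTime = Time.strptime(time, "%H:%M:%S.%L")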
    end
    def to_s
        "$#{mPrice}-#{mExchange}-#{mTime}"
    end
    def self.from_raw(raw)
        raw.match('(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})') { |m| new(m[1], m[2], m[3])}
    end
end

class SymbolList
    attr_accessor :mSymbols
    def initialize(symbols)
        @mSymbols = symbols
    end
    def self.from_raw(str)
        new(str.split(";").collect {|s| s.to_sym})
    end
    def to_s
        mSymbols.to_s
    end
end

class ExchangeIdentifier
    attr_accessor :mExch
    def initialize(exch)
        @mExch = exch
    end
    def self.from_raw(raw)
        new(raw)
    end
    def to_s
        mExch
    end
end

...Create a typelist, mapping each column identifier to the type:

ColumnTypes =
{
    :IndexSymbol => Sku,
    :PriceStatus => SymbolList,
    :UpdateExchange => ExchangeIdentifier,
    :Last => Price,
    :Bid => Price
}

...and finally construct my Array of cells by calling the appropriate type's from_raw:

cells = raw.split(",").each_with_index.collect { |cell,i|
    puts "Cell: #{cell}, ColType: #{ColumnTypes[cols[i]]}"
    ColumnTypes[cols[i]].from_raw(cell)
}

The result is code that is clean and expressive in my eyes, and seems more Ruby-ish than what I had originally done.

Complete example here.
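Because several columns map to the same class, the column symbol is the only thing that tells a :Last Price from a :Bid Price once the row is parsed. A minimal sketch of keeping it around by zipping the columns with the parsed cells into a Hash; the input line, the cut-down Price class, and the column_types name are illustrative assumptions, not the linked complete example:

```ruby
# Cut-down stand-in for the Price class above
class Price
  attr_reader :price, :exchange, :time

  def initialize(price, exchange, time)
    @price, @exchange, @time = price, exchange, time
  end

  def self.from_raw(raw)
    raw.match(/(\d*\.*\d*)(\w?) (\d{1,2}:\d{2}:\d{2}\.\d{3})/) { |m| new(m[1], m[2], m[3]) }
  end
end

# Two different columns, one type
column_types = { Last: Price, Bid: Price }

cols = [:Last, :Bid]
raw  = "933.36T 11:10:00.000,933.30T 11:09:59.500"

# Keying by column keeps :Last and :Bid distinguishable after parsing
row = cols.zip(raw.split(",")).map do |col, cell|
  [col, column_types.fetch(col).from_raw(cell)]
end.to_h

p row[:Last].price # => "933.36"
p row[:Bid].price  # => "933.30"
```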

Comments
