How do I use an array of an array of strings to match another string?

Question

I have a list of messy titles (let's say 1000 of them). I want to analyze these titles for "keywords" that match a small number of genres I have created (the titles aren't a model, but the genres are).

For example, say the first title string is "awesome playlist of house, EDM and ambient".

Now, say I also have 15 Genres, each with a name attribute.

My end goal is to assign genres to that title string. This is easy enough with some string normalization followed by .include?

But it doesn't help when there are synonyms. For example, one @genre.name is chill, which SHOULD match ambient in the string above. Likewise, my @genre.name for dance music is dance, and it should match EDM in the string above (EDM = electronic dance music).

So what I'd love to do is add 10 or so synonyms to each genre so the check covers those as well.

The problem is I'm not sure how to do this in the loop. I guess a loop inside a loop?

This is my code for a 'single level', without synonyms:

  def determine_genres(title)
    relevant_genres = []
    @genres.each do |genre|
      if normalize_string(title).include? normalize_string(genre.name)
        relevant_genres << genre.id
      end
    end
    relevant_genres
  end
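To try the single-level version outside the app: the question never shows normalize_string, so this sketch assumes a hypothetical one (lowercase, turn punctuation into spaces, squeeze whitespace), uses a plain Struct as a stand-in for the Genre model, and passes the genres in as an argument instead of reading @genres.

```ruby
# Hypothetical stand-in for the Genre model (only id and name are used)
Genre = Struct.new(:id, :name)

# Hypothetical normalizer: the real normalize_string is not shown in the question
def normalize_string(str)
  str.downcase.gsub(/[^a-z0-9\s]/, " ").squeeze(" ").strip
end

def determine_genres(title, genres)
  normalized_title = normalize_string(title)
  # Substring check: a genre matches if its normalized name appears anywhere in the title
  genres.select { |genre| normalized_title.include?(normalize_string(genre.name)) }
        .map(&:id)
end

genres = [Genre.new(1, "chill"), Genre.new(2, "dance"), Genre.new(3, "house")]
p determine_genres("awesome playlist of house, EDM and ambient", genres)
# => [3]
```

Only "house" matches here; "ambient" and "EDM" are missed, which is exactly the synonym problem described above. Note also that a substring check would match "house" inside "lighthouse".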

Too verbose. Make it shorter. Just say what the input is and what output you want. – SwiftMango, Oct 19, 2012 at 23:46

Steph Rose · Accepted Answer · 2012-10-20 00:25:24Z

1

You're definitely on the right track when you say array of array of strings. I'd structure it more like:

genres = {
    'chill' => ['ambient','mood','chill'],
    'dance' => ['edm','trance','house',]
}

etc. So each key in the hash is a genre name (@genre.name), and the corresponding array lists all of the possible synonyms / subgenres for that genre.

In Ruby, the Array#& method lets you "intersect" two arrays and get the values they have in common. Like so:

[1,2,3,4,5] & [0,3,5,6,8]  #=> [3, 5]

See more here: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-26

If you split the normalized title into words and intersect that array with a genre's array of key terms, a non-empty result means at least one key term matched, so that genre is relevant.
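A quick illustration of that idea, assuming the normalized title has already been split on whitespace (Ruby's String has no & method, so the title has to become an array of words before it can be intersected):

```ruby
title_words = "awesome playlist of house edm and ambient".split
dance_terms = ["edm", "trance", "house"]
chill_terms = ["ambient", "mood", "chill"]
rock_terms  = ["rock", "punk"]

# Array#& keeps the elements common to both arrays, in the order of the left operand
p title_words & dance_terms  # => ["house", "edm"]
p title_words & chill_terms  # => ["ambient"]
p title_words & rock_terms   # => []

# A non-empty intersection means the genre is relevant
rock_relevant = (title_words & rock_terms).any?
p rock_relevant  # => false
```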

So you would edit the loop like this (using the genres hash of arrays above):

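def determine_genres(title, genres)
  # Split the normalized title into words; a String can't be intersected with an Array
  words = normalize_string(title).split
  relevant_genres = []
  genres.each do |genre, terms|
    # Array#& keeps only the words that appear in both arrays
    intersecting_terms = words & terms
    if intersecting_terms.any?
      relevant_genres << Genre.find_by_name(genre).id
    end
  end
  relevant_genres
end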
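Outside Rails, where Genre.find_by_name isn't available, the same logic can be tried with a hypothetical name-to-id hash (GENRE_IDS) standing in for the database lookup and a hypothetical normalize_string, since the question doesn't show the real one:

```ruby
# Hypothetical normalizer; the question does not show the real one
def normalize_string(str)
  str.downcase.gsub(/[^a-z0-9\s]/, " ").squeeze(" ").strip
end

# Hypothetical stand-in for Genre.find_by_name(name).id
GENRE_IDS = { "chill" => 1, "dance" => 2 }

GENRES = {
  "chill" => ["ambient", "mood", "chill"],
  "dance" => ["edm", "trance", "house"]
}

def determine_genres(title, genres)
  words = normalize_string(title).split
  # Keep genres whose term list shares at least one word with the title
  genres.select { |_genre, terms| (words & terms).any? }
        .map { |genre, _terms| GENRE_IDS.fetch(genre) }
end

p determine_genres("awesome playlist of house, EDM and ambient", GENRES)
# => [1, 2]
```

Because the title is split on whitespace, multi-word terms such as "electronic dance music" would not match this way; they would need a substring check instead.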

You could also add a field to the Genre model in the DB that stores the hash / array of synonymous terms.
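With the synonyms stored on each genre, the check becomes the "loop inside a loop" from the question: an outer pass over the genres and an inner any? over each genre's name plus synonyms. A sketch using a Struct as a stand-in for the model, with a hypothetical synonyms attribute (how it is persisted, e.g. a serialized column, is an assumption) and a simple inline word split in place of normalize_string:

```ruby
# Hypothetical stand-in for a Genre model that carries its own synonym list
Genre = Struct.new(:id, :name, :synonyms)

GENRE_LIST = [
  Genre.new(1, "chill", ["ambient", "mood", "chillout"]),
  Genre.new(2, "dance", ["edm", "trance", "house"]),
  Genre.new(3, "rock",  ["punk", "grunge"])
]

def determine_genres(title, genres)
  # Simple normalization: lowercase words made of letters and digits
  words = title.downcase.scan(/[a-z0-9]+/)
  # Outer loop over genres, inner loop over the genre name plus its synonyms
  genres.select { |genre| ([genre.name] + genre.synonyms).any? { |term| words.include?(term) } }
        .map(&:id)
end

p determine_genres("awesome playlist of house, EDM and ambient", GENRE_LIST)
# => [1, 2]
```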

answered Oct 20, 2012 at 0:25

Steph Rose

2,136 · 3 gold badges · 23 silver badges · 35 bronze badges


If you need any clarifications, let me know. – Steph Rose

yeyo · Accepted Answer · 2012-10-19 23:51:03Z

0

mmm ok

What do you think about this approach: for each genre, pick a generic name (like ambient), and map every synonym to that name in a hash, i.e.:

hsh = {"chill" => "ambient",
 "chillout" => "ambient",
 "chilloff" => "ambient",
 "ambient" => "ambient",
 "trance"  => "electronic"
}

#then you just need to check the Hash like this:

puts hsh['chill']  #=> ambient
puts hsh['chillout'] #=> ambient
puts hsh['trance'] #=> electronic

The downside is that you need to write out all of these synonyms.
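To apply that lookup to a whole title, one option (a sketch; the word-splitting step is an assumption, not part of the answer) is to split the title into lowercase words, look each one up, and drop the words that have no entry:

```ruby
hsh = {
  "chill"    => "ambient",
  "chillout" => "ambient",
  "chilloff" => "ambient",
  "ambient"  => "ambient",
  "trance"   => "electronic"
}

title = "Chillout and trance mix"
# Words with no entry look up as nil; compact drops them and uniq removes repeats
genres = title.downcase.scan(/[a-z0-9]+/).map { |word| hsh[word] }.compact.uniq
p genres  # => ["ambient", "electronic"]
```

Each word costs a single hash lookup, so this scales well as the synonym list grows; the trade-off is the one noted above, that every synonym has to be listed explicitly.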

answered Oct 19, 2012 at 23:51

yeyo

3,029 · 2 gold badges · 32 silver badges · 40 bronze badges


sawa · Accepted Answer · 2012-10-20 00:25:11Z

0

For each synonym, create an instance of Genre whose name is the synonym and whose id is the same as the representative genre's id.

I am not sure your structure is the most effective, but with it you can still refactor your method like this:

def determine_genres(title)
  title = normalize_string(title)
  # uniq drops repeated ids when several synonyms of the same genre match
  @genres.select{|genre| title.include? normalize_string(genre.name)}.map(&:id).uniq
end
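A runnable sketch of this approach, with a Struct standing in for the Genre model and a hypothetical normalize_string (the real one isn't shown). Each synonym gets its own entry that reuses the representative genre's id; since several entries can share an id, uniq keeps a title containing both "chill" and "ambient" from reporting the same genre twice.

```ruby
# Hypothetical stand-in for the Genre model
Genre = Struct.new(:id, :name)

# Hypothetical normalizer; the question does not show the real one
def normalize_string(str)
  str.downcase.gsub(/[^a-z0-9\s]/, " ").squeeze(" ").strip
end

# One entry per synonym; synonyms reuse the id of their representative genre
@genres = [
  Genre.new(1, "chill"), Genre.new(1, "ambient"), Genre.new(1, "mood"),
  Genre.new(2, "dance"), Genre.new(2, "edm"),     Genre.new(2, "house")
]

def determine_genres(title)
  title = normalize_string(title)
  @genres.select { |genre| title.include?(normalize_string(genre.name)) }.map(&:id).uniq
end

p determine_genres("awesome playlist of house, EDM and ambient")
# => [1, 2]
```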

answered Oct 20, 2012 at 0:25

sawa

169k · 51 gold badges · 287 silver badges · 398 bronze badges
