I'm trying to create an object from each recurring block in the text below (an .srt subtitle file):

1
00:02:12,446 --> 00:02:14,406
The Hovitos are near.

2
00:02:15,740 --> 00:02:18,076
The poison is still fresh,
three days.

3
00:02:18,076 --> 00:02:19,744
They're following us.

For example, I could take the three or four lines and assign them to attributes of the new object. So for the first set, I could have Sentence.create(number: 1, time_marker: '00:02:12', content: "The Hovitos are near.")

If I start with script.each_line, what general structure would put me on the right track? I'm having a hard time with this and any help would be fantastic!

Edit

Some of the messy unfinished code I have so far is below. It actually works (I think). Would you have taken a totally different route? I don't have any experience with this.

number = nil
time_marker = nil
content = []

# Append a blank line so the final entry is also flushed by the else branch.
script = script.strip + "\n\n"
script.each_line do |line|
  line = line.strip
  if line =~ /^\d+$/
    number = line.to_i           # sequence number
  elsif line =~ /-->/
    time_marker = line[0..7]     # start time as "HH:MM:SS"
  elsif !line.empty?
    content << line              # subtitle text, possibly several lines
  else
    # Blank line: the current entry is complete.
    Sentence.create(movie: @movie, number: number,
      time_marker: time_marker, content: content.join("\n"))
    content = []
  end
end

2 Answers

Here is a way you can do it:

File.read('subtitles.srt').split(/^\s*$/).each do |entry| # Read in the entire text and split on empty lines
  sentence = entry.strip.split("\n")
  number = sentence[0] # First element after empty line is 'number'
  time_marker = sentence[1][0..7] # Second element is 'time_marker'
  content = sentence[2..-1].join("\n") # Everything after that is 'content'
end
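
The same split-on-blank-lines idea can be wrapped in a method that returns objects. This is only a sketch: `Subtitle` and `parse_srt` are made-up names standing in for the `Sentence` model from the question, and it assumes the file contents are already in a string and that every entry has at least a number line and a timing line.

```ruby
# Hypothetical stand-in for the Sentence model from the question.
Subtitle = Struct.new(:number, :time_marker, :content)

# Splits SRT text into entries on blank lines and builds one Subtitle per entry.
def parse_srt(text)
  text.strip.split(/\n\s*\n/).map do |entry|
    lines = entry.strip.split("\n").map(&:strip)
    Subtitle.new(
      lines[0].to_i,           # sequence number
      lines[1][0..7],          # start time as "HH:MM:SS" (fixed width assumed)
      lines[2..-1].join("\n")  # remaining lines are the subtitle text
    )
  end
end

srt = <<~SRT
  1
  00:02:12,446 --> 00:02:14,406
  The Hovitos are near.

  2
  00:02:15,740 --> 00:02:18,076
  The poison is still fresh,
  three days.
SRT

subs = parse_srt(srt)
subs.each { |s| puts "#{s.number} #{s.time_marker} #{s.content.inspect}" }
```

Swapping `Subtitle.new(...)` for `Sentence.create(number: ..., time_marker: ..., content: ...)` would give the behaviour asked for in the question.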

4 Comments

Assuming a fixed width for time_marker is wrong; the time can run past an hour. But I like the split on whitespace lines. Somehow, it didn't occur to me :)
@Stoic, in theory, yes, it's wrong, but I don't know any movies that last longer than 99 hours ;-)
(offtopic) :P try watching a Bollywood movie.
lol, 99 hours? I like both your suggestions, @Stoic and Mischa. Many thanks. The whitespace splitting idea is awesome.
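
On the fixed-width point raised in the comments: a sketch that pulls the start time out with a regular expression instead of `[0..7]`, so it doesn't depend on the hour field being exactly two digits. The names `TIME_LINE` and `start_time` are made up for illustration.

```ruby
# Matches "H...:MM:SS,mmm -->" and captures the start time without milliseconds.
TIME_LINE = /\A(\d+:\d{2}:\d{2})[,.]\d{1,3}\s*-->/

# Returns the start time as a string, or nil if the line is not a timing line.
def start_time(line)
  match = TIME_LINE.match(line.strip)
  match && match[1]
end

start_time("00:02:12,446 --> 00:02:14,406")   # => "00:02:12"
start_time("100:02:12,446 --> 100:02:14,406") # => "100:02:12"
start_time("The Hovitos are near.")           # => nil
```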

Assume that subtitles are in the following variable:

subtitles = %q{1
00:02:12,446 --> 00:02:14,406
The Hovitos are near.

2
00:02:15,740 --> 00:02:18,076
The poison is still fresh,
three days.

3
00:02:18,076 --> 00:02:19,744
They're following us.}

Then, you can do this:

def split_subs subtitles
  grouped, splitted = [], []
  subtitles.split("\n").push("\n").each do |sub|
    if sub.strip.empty?
      splitted.push({
        number: grouped[0],
        time_marker: grouped[1].split(",").first,
        content: grouped[2..-1].join(" ")
      })
      grouped = []
    else
      grouped.push sub.strip
    end
  end
  splitted
end

puts split_subs(subtitles)

# output:
# $ ruby 23025546.rb
# {:number=>"1", :time_marker=>"00:02:12", :content=>"The Hovitos are near."}
# {:number=>"2", :time_marker=>"00:02:15", :content=>"The poison is still fresh, three days."}
# {:number=>"3", :time_marker=>"00:02:18", :content=>"They're following us."}
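
One caveat with the loop above: two blank lines in a row leave `grouped` empty, so `grouped[1].split` raises a NoMethodError. A variant that tolerates repeated blank lines, sketched with `Enumerable#slice_when`; the name `split_subs_tolerant` is made up, and it still assumes each entry has at least a number line and a timing line.

```ruby
def split_subs_tolerant(subtitles)
  subtitles.each_line.map(&:strip)
           # Start a new group whenever we cross between blank and non-blank lines.
           .slice_when { |prev, cur| prev.empty? != cur.empty? }
           # Drop the groups that consist only of blank lines.
           .reject { |group| group.first.empty? }
           .map do |group|
    {
      number: group[0],
      time_marker: group[1].split(",").first,
      content: group[2..-1].join(" ")
    }
  end
end

sample = "1\n00:02:12,446 --> 00:02:14,406\nThe Hovitos are near.\n\n\n" \
         "2\n00:02:15,740 --> 00:02:18,076\nThe poison is still fresh,\nthree days.\n"

result = split_subs_tolerant(sample)
puts result
```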
