
I am trying to store the results of my scraping exercise in a CSV file.

The current CSV file gives me the following output:

Name of Movie 1

Rating 1

Name of Movie 2 

Rating 2     

I would like to get the following output:

Name of Movie 1 Rating 1 

Name of Movie 2 Rating 2 

Here is my code. I suspect the problem has to do with the row/column separator:

require 'open-uri'
require 'nokogiri'
require 'csv'

array = []


for i in 1..10
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i}"
  html_file = open(url).read
  html_doc = Nokogiri::HTML(html_file)


  html_doc.search('.img_side_content').each do |element|
    array << element.search('.no_underline').inner_text
    element.search('.note').each do |data|
      array << data.inner_text
    end
  end
end

puts array


csv_options = { row_sep: ',', force_quotes: true, quote_char: '"' }
filepath    = 'allocine.csv'

CSV.open(filepath, 'wb', csv_options) do |csv|
  array.each { |item| csv << [item] }
end
  • Hi @Pierrre, there are 2 types of rating: Presse and Spectateurs. What is your expected output? Coco,4.1,4.6? Commented May 23, 2018 at 13:09

1 Answer

I think the problem here is that you are not pushing the elements correctly into your array variable. Basically, your array ends up looking like this:

['Movie 1 Title', 'Movie 1 rating', 'Movie 2 Title', 'Movie 2 rating', ...]

What you actually want is an array of arrays, like so:

[
  ['Movie 1 Title', 'Movie 1 rating'],
  ['Movie 2 Title', 'Movie 2 rating'],
  ...
]

And once your array is correctly set, you don't even need to specify a row separator in your CSV options.
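To see the difference without hitting the network, here is a minimal sketch using only the standard csv library. The sample titles and ratings are made up for illustration, and CSV.generate is used in place of CSV.open so the result can be printed directly. Each call to csv << writes exactly one row, so the shape of what you append decides the layout:

```ruby
require 'csv'

# Hypothetical sample data standing in for the scraped results
flat   = ['Movie 1 Title', '4.5', 'Movie 2 Title', '4.2']
nested = [['Movie 1 Title', '4.5'], ['Movie 2 Title', '4.2']]

# Wrapping each item in its own array writes one value per row
one_per_row = CSV.generate(force_quotes: true) do |csv|
  flat.each { |item| csv << [item] }
end

# Appending each inner array writes the title and rating on the same row
one_row_per_movie = CSV.generate(force_quotes: true) do |csv|
  nested.each { |row| csv << row }
end

puts one_per_row
puts one_row_per_movie
```

The first string has four lines (one value each), while the second has two lines with the title and rating separated by a comma.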

The following should do the trick:

require 'open-uri'
require 'nokogiri'
require 'csv'

array = []


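10.times do |i|
  # times counts from 0, so add 1 to request pages 1 through 10
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i + 1}"
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
  html_file = URI.open(url).read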
  html_doc = Nokogiri::HTML(html_file)


  html_doc.search('.img_side_content').each do |element|
    title = element.search('.no_underline').inner_text.strip
    notes = element.search('.note').map { |note| note.inner_text }
    array << [title, notes].flatten
  end
end

puts array

filepath    = 'allocine.csv'
csv_options = { force_quotes: true, quote_char: '"' }

CSV.open(filepath, 'w', **csv_options) do |csv|
  array.each do |item|
    csv << item
  end
end

(I also took the liberty of changing your for loop to times, which is more idiomatic Ruby ;) )
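One caveat with that swap, shown as a quick sketch: times counts from 0, whereas the original for loop over 1..10 started at 1. If the site's page numbers start at 1 (which the original loop assumed), the block variable needs an offset, or a range can be iterated instead:

```ruby
# times yields indices starting at 0
from_times = 10.times.to_a

# a range yields the same values the original for loop did
from_range = (1..10).to_a

puts from_times.inspect
puts from_range.inspect
```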
