I would like to ask how I can count duplicate values.
My data has the format: USER, ITEM, EVENT.
I want to count how many times each item appears.
Here are some examples:

(US50137,IT1548,7), (US42215,IT6298,7), (US98606,IT5305,7), (US34696,IT5914,7), (US74972,IT2796,7), (US1729,IT7696,7), (US76310,IT9790,7), (US49102,IT6487,7), (US25430,IT7901,7), (US50600,IT4156,7), (US65972,IT9830,7), (US50879,IT1902,7), (US36024,IT6484,7), (US46284,IT3281,7), (US55565,IT5303,7), (US18932,IT2025,7), (US39467,IT8677,7), (US12477,IT9678,7), (US94819,IT8427,7), (US19956,IT1402,7), (US41507,IT3624,7), (US845,IT4823,7), (US18860,IT7860,7), (US68784,IT4759,7), (US79752,IT421,7), (US18563,IT5329,7), (US79628,IT2351,7), (US83729,IT6082,7), (US61097,IT9643,7), (US69368,IT3162,7), (US59566,IT814,7), (US9726,IT7519,7), (US1157,IT5908,7), (US1176,IT3981,7), (US79409,IT8578,7), (US11786,IT5147,7), (US88604,IT8501,7), (US6857,IT2333,7), (US82349,IT6143,7), (US27666,IT9085,7), (US90508,IT352,7), (US48578,IT4503,7), (US14526,IT9551,7), (US29031,IT1992,7), (US57012,IT4353,7), (US97235,IT77,7), (US88666,IT2715,7), (US31035,IT7865,7), (US45054,IT6664,7), (US92069,IT9951,7), (US27175,IT913,7), (US60402,IT8480,7), (US28426,IT9309,7), (US23641,IT4518,7), (US10889,IT7348,7), (US16950,IT6087,7), (US68766,IT683,7), (US87726,IT7594,7), (US63638,IT8101,7), (US78079,IT4344,7), (US47257,IT3315,7), (US3915,IT8971,7), (US59440,IT3441,7), (US64466,IT3980,7), (US79624,IT3502,7), (US29356,IT6778,7)


I based my attempt on this question:
Scala how can I count the number of occurrences in a list

My code:

val RATING_SPLITER = N1.map { baris =>
  (
    baris(0),
    baris(1),
    baris(2) match {
      case "read"  => 10
      case "play"  => 6
      case "share" => 7
    }
  )
}.take(1000)

val MM = RATING_SPLITER.groupBy(kk => kk._2).map(x1 => (x1._2))
MM.foreach(println)

This produces the output below:

[Lscala.Tuple3;@fd53053
[Lscala.Tuple3;@4527f70a
[Lscala.Tuple3;@707b1a44
[Lscala.Tuple3;@7132a9dc
[Lscala.Tuple3;@57435801
[Lscala.Tuple3;@2da66a44
[Lscala.Tuple3;@527fc8e
[Lscala.Tuple3;@61bfc9bf
[Lscala.Tuple3;@2c7106d9
[Lscala.Tuple3;@329bad59


Any idea why the output looks like that? And is my code correct for counting duplicate values?

    Try printing the tuple field by field, e.g. MM.foreach(tup => println(tup._1 + tup._2 ...)), instead of throwing the entire object into the output. Commented Aug 23, 2016 at 8:56

1 Answer

You should map the values resulting from the groupBy to their size. groupBy creates key-value pairs where each value is the collection of all items with the same key, and you're only interested in the size of that collection:

// sample data:
val RATING_SPLITER = List(("A", "b", 4), ("A", "b", 5), ("A", "c", 6), ("A", "e", 7))

val result: Map[String,Int] = RATING_SPLITER.groupBy(_._2).mapValues(_.size)
result.foreach(println)
// prints:
// (e,1)
// (b,2)
// (c,1)
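
As for why the original output looks like `[Lscala.Tuple3;@fd53053`: the `[L...;` prefix is how the JVM names an array class, so each grouped value is most likely an `Array` of tuples, and printing it falls back to the default `toString`. The sketch below assumes the data is an `Array[(String, String, Int)]` in the (USER, ITEM, EVENT) format from the question. The sample rows, the `ItemCounts` object and the `countByItem` helper are hypothetical names made up for illustration. It uses a plain `map` rather than `mapValues`, since `mapValues` is deprecated and returns a view in Scala 2.13.

```scala
object ItemCounts {
  // Hypothetical sample rows in the (USER, ITEM, EVENT) format from the question.
  val events: Array[(String, String, Int)] = Array(
    ("US50137", "IT1548", 7),
    ("US42215", "IT6298", 7),
    ("US98606", "IT1548", 10),
    ("US34696", "IT1548", 6)
  )

  // Count how many times each item (second tuple field) appears.
  def countByItem(rows: Array[(String, String, Int)]): Map[String, Int] =
    rows.groupBy(_._2).map { case (item, group) => (item, group.length) }

  def main(args: Array[String]): Unit = {
    // Grouping an Array yields Array values; mkString prints their
    // elements instead of the default "[Lscala.Tuple3;@..." text.
    events.groupBy(_._2).foreach { case (item, group) =>
      println(s"$item -> ${group.mkString(", ")}")
    }

    // One (item, count) pair per distinct item; iteration order is not guaranteed.
    countByItem(events).foreach(println)
  }
}
```

With these sample rows, `ItemCounts.countByItem(ItemCounts.events)` should yield a count of 3 for `IT1548` and 1 for `IT6298`.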