Scala: How to use a variable from a for loop outside the loop block

Question

How can I create a DataFrame from all my JSON files, when after reading each file I need to add the file name as a field in the DataFrame? It seems a variable declared in a for loop is not recognized outside the loop. How can I overcome this issue?

for (jsonfilename <- fileArray) {
  var df = hivecontext.read.json(jsonfilename)
  var tblLanding = df.withColumn("source_file_name", lit(jsonfilename))
}

// trying to create a temp table from the DataFrame created in the loop
tblLanding.registerTempTable("LandingTable") // ERROR here: cannot resolve tblLanding

Thanks in advance,
Hossain

Abhishek Anand · Accepted Answer · 2016-12-07 06:22:05Z

3

I think you are new to programming itself. Anyway, here you go.

Basically, you declare the variable with its type and initialise it before the loop, so it is still in scope after the loop.

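import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Declared before the loop, so it is still in scope after the loop
var df: DataFrame = null
for (jsonfilename <- fileArray) {
  // Reassign the outer var (no new var/val here), keeping the file name column
  df = hivecontext.read.json(jsonfilename)
         .withColumn("source_file_name", lit(jsonfilename))
}

df.registerTempTable("LandingTable") // holds only the LAST file's DataFrame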
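The scoping rule itself is plain Scala and has nothing to do with Spark. A minimal sketch, using strings as a hypothetical stand-in for DataFrames (the `tag` helper and the file names are made up for illustration): a `var` declared before the loop is visible after it, but it only holds the value from the last iteration. Using `Option` instead of `null` also avoids a NullPointerException when the array is empty.

```scala
object DeclareBeforeLoop {
  // Hypothetical stand-in for reading a file and adding source_file_name
  def tag(fileName: String): String = s"data from $fileName"

  // The var is declared outside the loop, so it is still in scope afterwards.
  // Option (instead of null) makes the "no files" case explicit.
  def lastTagged(fileArray: Array[String]): Option[String] = {
    var result: Option[String] = None
    for (fileName <- fileArray) {
      // Each iteration overwrites the previous value
      result = Some(tag(fileName))
    }
    result
  }

  def main(args: Array[String]): Unit = {
    val files = Array("1.json", "2.json", "3.json", "4.json")
    println(lastTagged(files))               // Some(data from 4.json)
    println(lastTagged(Array.empty[String])) // None
  }
}
```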

Update

OK, so you are completely new to programming, even to loops.

Suppose fileArray contains [1.json, 2.json, 3.json, 4.json].

Then the loop actually creates 4 DataFrames by reading 4 JSON files, and each iteration overwrites the previous one. Which one do you want to register as the temp table?

If you want all of them:

var df: DataFrame = null
var count = 0
for (jsonfilename <- fileArray) {
  df = hivecontext.read.json(jsonfilename)
         .withColumn("source_file_name", lit(jsonfilename))
  df.registerTempTable(s"LandingTable_$count")
  count += 1 // Scala has no ++ operator
}
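The manual counter can also be avoided. A sketch in plain Scala, assuming you only need the generated table names (the actual `registerTempTable` call is left out so it runs without Spark): `zipWithIndex` pairs each file with its position, producing the same `LandingTable_N` names.

```scala
object TableNames {
  // Pair each file with its index instead of maintaining a mutable counter
  def tableNames(fileArray: Array[String]): Seq[(String, String)] =
    fileArray.toSeq.zipWithIndex.map { case (fileName, i) =>
      (s"LandingTable_$i", fileName)
    }

  def main(args: Array[String]): Unit = {
    val files = Array("1.json", "2.json", "3.json", "4.json")
    // In Spark, this is roughly where each file's DataFrame
    // would be registered under the generated name
    tableNames(files).foreach { case (name, file) => println(s"$name <- $file") }
  }
}
```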

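And the reason df was null (your NullPointerException) before this update is most likely that fileArray is empty, so the loop body never ran and df kept its initial null value, or that Spark failed to read the files. Print fileArray and check.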

To query any of those registered LandingTable_N tables:

val df2 = hivecontext.sql("SELECT * FROM LandingTable_0")

Update: the question has changed to making a single DataFrame from all the JSON files.

var dataFrame: DataFrame = null
for (jsonfilename <- fileArray) {
  val eachDataFrame = hivecontext.read.json(jsonfilename)
    .withColumn("source_file_name", lit(jsonfilename))
  if (dataFrame == null)
    dataFrame = eachDataFrame
  else
    dataFrame = eachDataFrame.unionAll(dataFrame) // use union in Spark 2.x
}
dataFrame.registerTempTable("LandingTable")

Ensure that fileArray is not empty and that all JSON files in fileArray have the same schema.
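Why the non-empty check matters can be shown with plain Scala collections. A sketch, with `List[String]` as a hypothetical stand-in for a DataFrame and `++` standing in for `unionAll`: `reduce` throws on an empty collection, while `reduceOption` returns `None`, which the caller can handle explicitly instead of ending up with a null.

```scala
object SafeUnion {
  // Each inner list stands in for one file's rows (illustrative only)
  def unionAll(parts: Seq[List[String]]): Option[List[String]] =
    parts.reduceOption(_ ++ _)

  def main(args: Array[String]): Unit = {
    println(unionAll(Seq(List("a"), List("b", "c")))) // Some(List(a, b, c))
    println(unionAll(Seq.empty[List[String]]))        // None
    // Seq.empty[List[String]].reduce(_ ++ _) would throw UnsupportedOperationException
  }
}
```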

edited Dec 7, 2016 at 6:22

answered Dec 7, 2016 at 2:44

Abhishek Anand



3 Comments

Jhon Over a year ago

Thanks for your response. Please note, I am trying to create the temp table outside the for loop, with the DataFrame created in the for loop. I believe the var df: DataFrame = null declared at the beginning is not the same one I created in the for loop. I got a NullPointerException when I tried df.show(). Expecting a prudent response.

Jhon Over a year ago

Thanks again for your time. I feel I could not make you understand my query. First, I need ONE table with all the JSON files. Secondly, I need to access that table outside the for loop block. Note, I can't avoid the loop, because I need to add the file name as a DF column.

Shyamendra Solanki · Accepted Answer · 2016-12-07 06:49:27Z

2

import org.apache.spark.sql.functions.lit

// Create a list of DataFrames, each tagged with its source file name
val dfList = fileArray.map { filename =>
  hivecontext.read.json(filename)
    .withColumn("source_file_name", lit(filename))
}

// Union the DataFrames (assuming all have the same schema)
val df = dfList.reduce(_ unionAll _)  // or use union in Spark 2.x

// Register as a table (createOrReplaceTempView in Spark 2.x)
df.registerTempTable("LandingTable")
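The same map-then-reduce shape can be sketched without Spark. Here each hypothetical "file" is a sequence of records (`Map[String, String]`), `readFile` is a made-up stand-in for `hivecontext.read.json`, `withSourceFile` plays the role of `withColumn("source_file_name", lit(filename))`, and `++` plays the role of `unionAll`. None of these are Spark APIs. No variable needs to escape a loop, because `map` and `reduce` return their results directly.

```scala
object MapThenReduce {
  type Row = Map[String, String]

  // Hypothetical stand-in for hivecontext.read.json(fileName): fixed rows
  def readFile(fileName: String): Seq[Row] =
    Seq(Map("id" -> "1"), Map("id" -> "2"))

  // Plays the role of withColumn("source_file_name", lit(fileName))
  def withSourceFile(rows: Seq[Row], fileName: String): Seq[Row] =
    rows.map(_ + ("source_file_name" -> fileName))

  def combine(fileArray: Seq[String]): Seq[Row] = {
    val dfList = fileArray.map(f => withSourceFile(readFile(f), f))
    dfList.reduce(_ ++ _) // throws on an empty fileArray, like the Spark version
  }

  def main(args: Array[String]): Unit =
    combine(Seq("1.json", "2.json")).foreach(println)
}
```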

answered Dec 7, 2016 at 6:49

Shyamendra Solanki


2 Comments

Jhon Over a year ago

Thanks, @Shyamendra Solanki! This is what I was looking for. I have tested your code. It works!

Shyamendra Solanki Over a year ago

Glad it was helpful. Please consider accepting the answer: stackoverflow.com/help/accepted-answer
