
I am trying to read data from a table stored in a CSV file. The file does not have a header, and when I query the table using Spark SQL, all the results are null.

I have tried creating a schema with StructType. It is displayed correctly by printSchema(), but when I run select * from tableName every value is null. I have also tried StructType().add(colName) instead of StructField, and that gave the same result.
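For comparison, here is a sketch of the StructType().add variant mentioned above, written out for the same seventeen columns. It assumes only Spark's org.apache.spark.sql.types package; the object name AgreementSchemaViaAdd is hypothetical.

```scala
import org.apache.spark.sql.types._

object AgreementSchemaViaAdd {
  // Same column names, types, and nullability as schemaStruct1 below,
  // built incrementally with StructType.add instead of a StructField list
  val schema: StructType = new StructType()
    .add("AgreementVersionID", IntegerType, nullable = true)
    .add("ProgramID", IntegerType, nullable = true)
    .add("AgreementID", IntegerType, nullable = true)
    .add("AgreementVersionNumber", IntegerType, nullable = true)
    .add("AgreementStatusID", IntegerType, nullable = true)
    .add("AgreementEffectiveDate", DateType, nullable = true)
    .add("AgreementEffectiveDateDay", IntegerType, nullable = true)
    .add("AgreementEndDate", DateType, nullable = true)
    .add("AgreementEndDateDay", IntegerType, nullable = true)
    .add("MasterAgreementNumber", IntegerType, nullable = true)
    .add("MasterAgreementEffectiveDate", DateType, nullable = true)
    .add("MasterAgreementEffectiveDateDay", IntegerType, nullable = true)
    .add("MasterAgreementEndDate", DateType, nullable = true)
    .add("MasterAgreementEndDateDay", IntegerType, nullable = true)
    .add("SalesContactName", StringType, nullable = true)
    .add("RevenueSubID", IntegerType, nullable = true)
    .add("LicenseAgreementContractTypeID", IntegerType, nullable = true)
}
```

It can be passed to .schema(...) in the same way as a StructField-list schema. Because the two forms produce equal schemas, switching between them is not expected to change the null results.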

        import org.apache.spark.sql.types._

        val schemaStruct1 = StructType(
            StructField( "AgreementVersionID", IntegerType, true )::
            StructField( "ProgramID", IntegerType, true )::
            StructField( "AgreementID", IntegerType, true )::
            StructField( "AgreementVersionNumber", IntegerType, true )::
            StructField( "AgreementStatusID", IntegerType, true )::
            StructField( "AgreementEffectiveDate", DateType, true )::
            StructField( "AgreementEffectiveDateDay", IntegerType, true )::
            StructField( "AgreementEndDate", DateType, true )::
            StructField( "AgreementEndDateDay", IntegerType, true )::
            StructField( "MasterAgreementNumber", IntegerType, true )::
            StructField( "MasterAgreementEffectiveDate", DateType, true )::
            StructField( "MasterAgreementEffectiveDateDay", IntegerType, true )::
            StructField( "MasterAgreementEndDate", DateType, true )::
            StructField( "MasterAgreementEndDateDay", IntegerType, true )::
            StructField( "SalesContactName", StringType, true )::
            StructField( "RevenueSubID", IntegerType, true )::
            StructField( "LicenseAgreementContractTypeID", IntegerType, true )::Nil
        )

        val df1 = session.read
            .option( "header", true )
            .option( "delimiter", "," )
            .schema( schemaStruct1 )
            .csv( LicenseAgrmtMaster )
        df1.printSchema()
        df1.createOrReplaceTempView( "LicenseAgrmtMaster" )

Calling printSchema() prints the following, which is the schema I expect:

        root
         |-- AgreementVersionID: integer (nullable = true)
         |-- ProgramID: integer (nullable = true)
         |-- AgreementID: integer (nullable = true)
         |-- AgreementVersionNumber: integer (nullable = true)
         |-- AgreementStatusID: integer (nullable = true)
         |-- AgreementEffectiveDate: date (nullable = true)
         |-- AgreementEffectiveDateDay: integer (nullable = true)
         |-- AgreementEndDate: date (nullable = true)
         |-- AgreementEndDateDay: integer (nullable = true)
         |-- MasterAgreementNumber: integer (nullable = true)
         |-- MasterAgreementEffectiveDate: date (nullable = true)
         |-- MasterAgreementEffectiveDateDay: integer (nullable = true)
         |-- MasterAgreementEndDate: date (nullable = true)
         |-- MasterAgreementEndDateDay: integer (nullable = true)
         |-- SalesContactName: string (nullable = true)
         |-- RevenueSubID: integer (nullable = true)
         |-- LicenseAgreementContractTypeID: integer (nullable = true)

However, querying the view returns only null values, even though the file itself is not full of nulls. I need to be able to read this table so that I can join it to another table to complete a stored procedure.
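One way to find out which lines are turning into nulls is to have Spark keep the raw text of every line it fails to parse. The sketch below assumes Spark 2.3 or later and uses the reader's PERMISSIVE mode and columnNameOfCorruptRecord options; the object and method names, the two-column schema, and the sample lines are hypothetical stand-ins for the real file. Because the file has no header, it reads with header set to false.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object CorruptRecordCheck {
  // Extra string column that receives the raw text of any line Spark could not parse
  val CorruptCol = "_corrupt_record"

  def readWithDiagnostics(session: SparkSession, lines: Seq[String], schema: StructType): DataFrame = {
    import session.implicits._
    session.read
      .option("header", "false")                      // the file has no header line
      .option("mode", "PERMISSIVE")                   // keep malformed lines instead of failing the read
      .option("columnNameOfCorruptRecord", CorruptCol)
      .schema(schema.add(CorruptCol, StringType, nullable = true))
      .csv(lines.toDS())                              // a file path works here too
      .cache()                                        // Spark requires caching before querying only the corrupt-record column
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local[*]").appName("csv-corrupt-records").getOrCreate()
    // Hypothetical sample: the second line has a ProgramID that is not an integer
    val schema = new StructType()
      .add("AgreementVersionID", IntegerType, nullable = true)
      .add("ProgramID", IntegerType, nullable = true)
    val df = readWithDiagnostics(session, Seq("1,10", "2,abc"), schema)
    // Lines shown here failed to parse; compare their values against the schema types
    df.filter(col(CorruptCol).isNotNull).show(false)
    session.stop()
  }
}
```

If the reported lines contain dates in a format other than yyyy-MM-dd, setting the reader's dateFormat option to match the file is one thing to try.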

1 Answer


I would suggest following the steps below; you can then adapt the code to your needs:

    // Read the headerless file; without a schema every column is a string named _c0, _c1, ...
    val df = session.read.option("delimiter", ",").csv("<Path of your file/dir>")
    // Example names only: list exactly as many names as the file has columns
    val columnNames = Seq("name", "id")
    val dfWithHeader = df.toDF(columnNames: _*)
    // The columns now have names and the data is present; check the types and cast where needed
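The approach above can be extended with the casting step its last comment mentions. This is a minimal sketch assuming Spark 2.2 or later; the object and method names, the four-column subset of the question's columns, the sample rows, and the yyyy-MM-dd date pattern are all hypothetical and should be adjusted to the real file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date}
import org.apache.spark.sql.types.IntegerType

object HeaderlessCsvCast {
  // Hypothetical subset of the question's columns; list exactly as many names as the file has columns
  val columnNames = Seq("AgreementVersionID", "ProgramID", "AgreementEffectiveDate", "SalesContactName")

  def load(session: SparkSession, lines: Seq[String]): DataFrame = {
    import session.implicits._
    // No header and no schema: every value is read as a string
    val raw = session.read.option("header", "false").option("delimiter", ",").csv(lines.toDS())
    raw.toDF(columnNames: _*)
      // Cast column by column; with ANSI mode off (the default before Spark 4.0),
      // a value that cannot be cast becomes null in that column
      .withColumn("AgreementVersionID", col("AgreementVersionID").cast(IntegerType))
      .withColumn("ProgramID", col("ProgramID").cast(IntegerType))
      // The date pattern is an assumption; change it to match the file
      .withColumn("AgreementEffectiveDate", to_date(col("AgreementEffectiveDate"), "yyyy-MM-dd"))
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local[*]").appName("headerless-csv").getOrCreate()
    val df = load(session, Seq("1,10,2020-01-31,Alice", "2,20,2021-06-15,Bob"))
    df.printSchema()
    df.createOrReplaceTempView("LicenseAgrmtMaster")
    session.sql("SELECT * FROM LicenseAgrmtMaster").show()
    session.stop()
  }
}
```

Casting after naming keeps the read itself free of type conversions, so any nulls that remain can be traced to the specific columns whose values did not convert.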