1

Here is my Spark data frame output of dfMainOutput.

4295858898,177,SelfSourcedPublic,INC,Cost of sales,Umsatzkosten,,ECOR,false,,,,,false,False,,,,505096,505074,505074,505096,505096,,505074,False,,3014830,,I|!|

Now i want to replace , with |^| and dropping one column DataPartition

This is what I am doing:

val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

val headerColumn = df.columns.filter(v => (!v.contains("^") && !v.contains("_c"))).toSeq

val header = headerColumn.dropRight(1).mkString("", "|^|", "|!|")

val dfMainOutputFinalWithoutNull = dfMainOutputFinal.withColumn("concatenated", regexp_replace(col("concatenated"), "null", "")).withColumnRenamed("concatenated", header)


dfMainOutputFinalWithoutNull.repartition(1).write.partitionBy("DataPartition","StatementTypeCode")
  .format("csv")
  .option("nullValue", "")
  .option("header", "true")
  .option("codec", "gzip")
  .save("s3://trfsmallfffile/FinancialLineItem/output")

With this code the output below is generated:

4295858898|^|177|^|INC|^|Cost of sales|^|Umsatzkosten|^|ECOR|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|False|^|3014830|^|I|!|

Where empty elements are missing. I would like it to be:

4295858898|^|177|^|INC|^|Cost of sales|^|Umsatzkosten|^||^|ECOR|^|False|^||^||^||^||^|False|^|False|^||^||^||^|505096|^|505074|^|505074|^|505096|^|505096|^||^|505074|^|False|^||^|3014830|^||^|I|!|

Also in the data frame output i am getting false where as we want False Please help me what II am missing ..

Here is my schema

root
 |-- LineItem_organizationId: long (nullable = true)
 |-- LineItem_lineItemId: integer (nullable = true)
 |-- DataPartition: string (nullable = true)
 |-- StatementTypeCode: string (nullable = true)
 |-- LineItemName: string (nullable = true)
 |-- LocalLanguageLabel: string (nullable = true)
 |-- FinancialConceptLocal: string (nullable = true)
 |-- FinancialConceptGlobal: string (nullable = true)
 |-- IsDimensional: boolean (nullable = true)
 |-- InstrumentId: string (nullable = true)
 |-- LineItemSequence: string (nullable = true)
 |-- PhysicalMeasureId: string (nullable = true)
 |-- FinancialConceptCodeGlobalSecondary: string (nullable = true)
 |-- IsRangeAllowed: boolean (nullable = true)
 |-- IsSegmentedByOrigin: string (nullable = true)
 |-- SegmentGroupDescription: string (nullable = true)
 |-- SegmentChildDescription: string (nullable = true)
 |-- SegmentChildLocalLanguageLabel: string (nullable = true)
 |-- LocalLanguageLabel_languageId: string (nullable = true)
 |-- LineItemName_languageId: string (nullable = true)
 |-- SegmentChildDescription_languageId: string (nullable = true)
 |-- SegmentChildLocalLanguageLabel_languageId: string (nullable = true)
 |-- SegmentGroupDescription_languageId: string (nullable = true)
 |-- SegmentMultipleFundbDescription: string (nullable = true)
 |-- SegmentMultipleFundbDescription_languageId: string (nullable = true)
 |-- IsCredit: string (nullable = true)
 |-- FinancialConceptLocalId: string (nullable = true)
 |-- FinancialConceptGlobalId: string (nullable = true)
 |-- FinancialConceptCodeGlobalSecondaryId: string (nullable = true)
 |-- FFAction: string (nullable = true)

In the data DataPartition=SelfSourcePublic and StatementTypeCode=INC

Output from dfMaainOutput

+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+
|LineItem_organizationId|LineItem_lineItemId|DataPartition    |StatementTypeCode|LineItemName                                                                              |LocalLanguageLabel                                                                                    |FinancialConceptLocal|FinancialConceptGlobal|IsDimensional|InstrumentId|LineItemSequence|PhysicalMeasureId|FinancialConceptCodeGlobalSecondary|IsRangeAllowed|IsSegmentedByOrigin|SegmentGroupDescription|SegmentChildDescription|SegmentChildLocalLanguageLabel|LocalLanguageLabel_languageId|LineItemName_languageId|SegmentChildDescription_languageId|SegmentChildLocalLanguageLabel_languageId|SegmentGroupDescription_languageId|SegmentMultipleFundbDescription|SegmentMultipleFundbDescription_languageId|IsCredit|FinancialConceptLocalId|FinancialConceptGlobalId|FinancialConceptCodeGlobalSecondaryId|FFAction|
+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+
|4295858898             |707                |SelfSourcedPublic|INC              |Revenue from long-term construction contracts                                             |Erlöse aus langfristigen Fertigungsaufträgen                                                          |null                 |ROBR                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3015278                 |null                                 |I|!|    |
|4295858898             |3289               |SelfSourcedPublic|INC              |Balancing Item - Net Income available to Controlling Interest                             |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295858922             |808                |SelfSourcedPublic|INC              |Income Taxes - Total                                                                      |Ertragsteuern                                                                                         |null                 |XTAX                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |False   |null                   |3019589                 |null                                 |I|!|    |
|4295858922             |1507               |SelfSourcedPublic|INC              |Balancing Item - Operating Expenses                                                       |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295858951             |1574               |SelfSourcedPublic|INC              |Admin/General Expenses                                                                    |null                                                                                                  |null                 |ESGA                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018991                 |null                                 |I|!|    |
|4295859007             |1645               |SelfSourcedPublic|INC              |Exploration Expenses - Balancing value                                                    |null                                                                                                  |null                 |EEXP                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018916                 |null                                 |I|!|    |
|4295859038             |954                |SelfSourcedPublic|INC              |Sale Investments                                                                          |null                                                                                                  |null                 |EGFA                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018929                 |null                                 |I|!|    |
|4295859038             |1967               |SelfSourcedPublic|INC              |Restructuring Charges/Provisions                                                          |Ergebnis aus Umstrukturierungen                                                                       |null                 |ERES                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |False   |null                   |3018980                 |null                                 |I|!|    |
|4295859038             |1996               |SelfSourcedPublic|INC              |Diluted Weighted Average Shares on Instrument Level multiplied to its Participation Factor|null                                                                                                  |null                 |DWASEPFI              |false        |8590926849  |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |1002023919              |null                                 |I|!|    |
|4295859045             |864                |SelfSourcedPublic|INC              |Results of valuation gains/losses and disposals of non-current securities                 |Ergebnis aus Kursänderungen und Abgängen von Wertpapieren des langfristigen Finanzvermögens („@FVTPL“)|null                 |EGIT                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018932                 |null                                 |I|!|    |
|4295859045             |1092               |SelfSourcedPublic|INC              |Excep. Depreciation                                                                       |null                                                                                                  |null                 |EGLO                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018938                 |null                                 |I|!|    |
|4295859071             |1840               |SelfSourcedPublic|INC              |Other Operating Expense                                                                   |null                                                                                                  |null                 |EOOE                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018974                 |null                                 |I|!|    |
|4295859078             |914                |SelfSourcedPublic|INC              |Balancing Item - Non Operating Income/(Expense), net                                      |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295859106             |514                |SelfSourcedPublic|INC              |Personnel Expenses                                                                        |null                                                                                                  |null                 |ELAS                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018944                 |null                                 |I|!|    |
|4295859106             |903                |SelfSourcedPublic|INC              |Balancing Item - Non Operating Income/(Expense), net                                      |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295859216             |499                |SelfSourcedPublic|INC              |BC - Depreciation of Fixed Assets                                                         |null                                                                                                  |null                 |BCDEP                 |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505084                       |505074                 |505074                            |505084                                   |505084                            |null                           |505074                                    |False   |null                   |1002023928              |null                                 |I|!|    |
|4295859236             |172                |SelfSourcedPublic|INC              |Total Revenue                                                                             |Ventes                                                                                                |null                 |XTLR                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3016345                 |null                                 |I|!|    |
|4295859241             |492                |SelfSourcedPublic|INC              |Diluted Net Income excluding Extra Items applicable to Common - (Instrument Level)        |null                                                                                                  |null                 |XNCNDI                |false        |8589989623  |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |1001214357              |null                                 |I|!|    |
|4295859279             |124                |SelfSourcedPublic|INC              |Income Available to Com Excl ExtraOrd                                                     |Toerekenbaar aan de aandeelhouders van de moederonderneming                                           |null                 |XNCN                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505084                       |505074                 |505074                            |505084                                   |505084                            |null                           |505074                                    |True    |null                   |3016316                 |null                                 |I|!|    |
|4295859298             |488                |SelfSourcedPublic|INC              |Other operating income/expenses                                                           |Other operating expenses                                                                              |null                 |EOIE                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3018969                 |null                                 |I|!|    |
+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+

Output After Code

val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

Here is output

+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|DataPartition    |StatementTypeCode|concatenated                                                                                                                                                                                                                                                                                                            |
+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|SelfSourcedPublic|INC              |4295858898|^|707|^|INC|^|Revenue from long-term construction contracts|^|Erlöse aus langfristigen Fertigungsaufträgen|^|ROBR|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3015278|^|I|!|                                                                                      |
|SelfSourcedPublic|INC              |4295858898|^|3289|^|INC|^|Balancing Item - Net Income available to Controlling Interest|^|IIII|^|false|^|false|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3014960|^|I|!|                                                                                                                            |
|SelfSourcedPublic|INC              |4295858922|^|808|^|INC|^|Income Taxes - Total|^|Ertragsteuern|^|XTAX|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|False|^|3019589|^|I|!|                                                                                                                                             |
|SelfSourcedPublic|INC              |4295858922|^|1507|^|INC|^|Balancing Item - Operating Expenses|^|IIII|^|false|^|false|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3014960|^|I|!|                                                                                                                                                      |
|SelfSourcedPublic|INC              |4295859236|^|172|^|INC|^|Total Revenue |^|Ventes|^|XTLR|^|false|^|false|^|False|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|3016345|^|I|!|                                                                                                                                                           |
|SelfSourcedPublic|INC              |4295859241|^|492|^|INC|^|Diluted Net Income excluding Extra Items applicable to Common - (Instrument Level) |^|XNCNDI|^|false|^|8589989623|^|false|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|1001214357|^|I|!|                                                                                     |
|SelfSourcedPublic|INC              |4295859279|^|124|^|INC|^|Income Available to Com Excl ExtraOrd|^|Toerekenbaar aan de aandeelhouders van de moederonderneming|^|XNCN|^|false|^|false|^|False|^|505084|^|505074|^|505074|^|505084|^|505084|^|505074|^|True|^|3016316|^|I|!|                                                                               |
|SelfSourcedPublic|INC              |4295859298|^|488|^|INC|^|Other operating income/expenses|^|Other operating expenses|^|EOIE|^|false|^|false|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|3018969|^|I|!|                                                                                                                                |
+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

So just be make clear for example 4295858898 |3289 null value in the column LocalLanguageLabel is gone

I dont know how it is missed ...

3
  • 2
    This is exactly what it produces for me, when I execute your code. val test = (432343, 177, "Cost of sales", "Umsatzkosten", "" ,"Ecor", false, "", "", "", false) and val df = List(test).toDF() and df.select(concat_ws("|^|", df.schema.fieldNames.map(c => col(c)): _*).as("concatenated")).collect().head result in [432343|^|177|^|Cost of sales|^|Umsatzkosten|^||^|Ecor|^|false|^||^||^||^|false]. Could you clarify what the issue is? Commented Oct 28, 2017 at 9:25
  • @Anupam I use spark version 2.2.0 with scala 2.11.8 and Java 1.8.0_152. Could you try with the code from my comment and see if you can reproduce your problem with it? Then it looks like a problem with your version. You would need to replace "DataPartition" with "_3" and "StatementTypeCode" with "_4" in my example. Commented Oct 28, 2017 at 12:16
  • I am also using same version but not sure what am i missing Commented Oct 28, 2017 at 13:01

1 Answer 1

2

Your main culprit is that you have null values in your dataframe and concat_ws is filtering out all the null values. So the solution is to replace all null values to "" which should solve your issue. It won't be a problem as you have all the null datatypes in schema as string.

So, replacing the following

val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

with

val dfMainOutputFinal = dfMainOutput.na.fill("").select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

should solve your issue

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.