I have the following problem: in my Android TV app I can add (and later update) EPG sources as .xml, .gz or .xz files (.gz and .xz are decompressed to .xml). The user adds a URL for a file; it gets downloaded, then parsed and saved to the ObjectBox database. I tried XmlPullParser and a SAX parser and everything worked fine. For an XML file of about 50 MB and 700,000 lines (350 channels and about 80,000 programs) it took:

XmlPullParser -> 50 seconds on the emulator, 1 min 30 s directly on my TV

SAX parser -> 55 seconds on the emulator, 1 min 50 s directly on my TV

I would have preferred it to be a bit faster, but it was OK. Then I realized that if I update the EPG source (download the XML again, parse it, and add the new EPG data to the ObjectBox database) and navigate in my app in the meantime:

  1. it takes much longer (several minutes for both XmlPullParser and the SAX parser)

  2. the app begins to lag while I use it, and on my TV it also crashed after some time, probably for memory reasons. If I updated the EPG source without doing anything else in my app, that didn't happen.

I noticed two things when investigating with the Profiler.

  1. While parsing (especially the programs), the garbage collector is called very often, between 20 and 40 times in 5 seconds.
  2. When the process is finished, the Java part in the memory profiler jumps up to 200 MB and takes some time before it gets garbage collected.

I am not sure, but I read that constant garbage collector runs could cause the lag in my app. So I tried to minimize object creation, but it didn't change anything (or maybe I didn't do it correctly). I also tested the process without creating the database object (EpgDataOB), so no EpgData was added to the database either. But I could still see the many garbage collector calls in the Profiler, so my parsing code must be the problem.

The only things that helped were adding a delay of 100 ms after each parsed program (obviously not a viable solution, as it stretches the processing time to hours), or reducing the batch size, which also increases the processing time. For example, with a batch size of 500 the processing time on the emulator is 2 min 10 s and the garbage collector is called about 6-10 times in 5 seconds; with a batch size of 100 it is nearly 3 min on the emulator, with the GC called 4-5 times in 5 seconds.

I'll post both my versions.

XmlPullParser

Repository code:

    var currentChannel: Channel? = null
    var epgDataBatch = mutableListOf<EpgDataOB>()
    val batchSize = 10000

    suspend fun parseXmlStream(
        inputStream: InputStream,
        epgSourceId: Long,
        maxDays: Int,
        minDays: Int,
        sourceUrl: String
    ): Resource<String> = withContext(Dispatchers.Default) {
        try {
            val thisEpgSource = epgSourceBox.get(epgSourceId)
            val factory = XmlPullParserFactory.newInstance()
            val parser = factory.newPullParser()
            parser.setInput(inputStream, null)
            var eventType = parser.eventType
          
            while (eventType != XmlPullParser.END_DOCUMENT) {
                when (eventType) {
                    XmlPullParser.START_TAG -> {
                        when (parser.name) {
                            "channel" -> {
                                parseChannel(parser, thisEpgSource)
                            }
                            "programme" -> {
                                parseProgram(parser, thisEpgSource)
                            }
                        }
                    }
                }
                eventType = parser.next()
            }
            if (epgDataBatch.isNotEmpty()) {
                epgDataBox.put(epgDataBatch)
            }

            assignEpgDataToChannels(thisEpgSource)

            _epgProcessState.value = ExternEpgProcessState.Success
            Resource.Success("OK")
        } catch (e: Exception) {
            Log.d("ERROR PARSING", "Error parsing XML: ${e.message}")
            _epgProcessState.value = ExternEpgProcessState.Error("Error parsing XML: ${e.message}")
            Resource.Error("Error parsing XML: ${e.message}")
        } finally {
            withContext(Dispatchers.IO) {
                inputStream.close()
            }
        }
    }

    private fun resetChannel() {
        currentChannel = Channel("", mutableListOf(), mutableListOf(), "")
    }

    private fun parseChannel(parser: XmlPullParser, thisEpgSource: EpgSource) {
        resetChannel()
        currentChannel?.id = parser.getAttributeValue(null, "id")

        while (parser.next() != XmlPullParser.END_TAG) {
            if (parser.eventType == XmlPullParser.START_TAG) {
                when (parser.name) {
                    "display-name" -> currentChannel?.displayName = mutableListOf(parser.nextText())
                    "icon" -> currentChannel?.icon = mutableListOf(parser.getAttributeValue(null, "src"))
                    "url" -> currentChannel?.url = parser.nextText()
                }
            }
        }

        val channelInDB = epgChannelBox.query(EpgSourceChannel_.chEpgId.equal("${thisEpgSource.id}_${currentChannel?.id}")).build().findUnique()
        if (channelInDB == null) {
            val epgChannelToAdd = EpgSourceChannel(
                0,
                "${thisEpgSource.id}_${currentChannel?.id}",
                currentChannel?.id ?: "",
                currentChannel?.icon,
                currentChannel?.displayName?.firstOrNull() ?: "",
                thisEpgSource.id,
                currentChannel?.displayName ?: mutableListOf(),
                true
            )
            epgChannelBox.put(epgChannelToAdd)
        } else {
            channelInDB.display_name = currentChannel?.displayName ?: channelInDB.display_name
            channelInDB.icon = currentChannel?.icon
            channelInDB.name = currentChannel?.displayName?.firstOrNull() ?: channelInDB.name
            epgChannelBox.put(channelInDB)
        }
    }

    private fun parseProgram(parser: XmlPullParser, thisEpgSource: EpgSource) {

        val start = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.getDefault())
            .parse(parser.getAttributeValue(null, "start"))?.time ?: -1

        val stop = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.getDefault())
            .parse(parser.getAttributeValue(null, "stop"))?.time ?: -1

        val channel = parser.getAttributeValue(null, "channel")

        val isAnUpdate = if (isUpdating) {
            epgDataBox.query(EpgDataOB_.idByAccountData.equal("${channel}_${start}_${thisEpgSource.id}")).build().findUnique() != null
        } else {
            false
        }

        if (!isAnUpdate) {
            val newEpgData = EpgDataOB(
                id = 0, 
                idByAccountData = "${channel}_${start}_${thisEpgSource.id}",
                epgId = channel ?: "",
                chId = channel ?: "",
                datum = SimpleDateFormat("yyyy-MM-dd", Locale.getDefault()).format(start),
                name = "",
                sub_title = "",
                descr = "",
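                // Lists must be non-null, otherwise the ?.add(...) calls below silently drop the values.
                category = mutableListOf(),
                director = mutableListOf(),
                actor = mutableListOf(),
                date = "",
                country = mutableListOf(),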
                showIcon = "",
                episode_num = "",
                rating = "",
                startTimestamp = start,
                stopTimestamp = stop,
                mark_archive = null,
                accountData = thisEpgSource.url,
                epgSourceId = thisEpgSource.id.toInt(),
                epChId = "${thisEpgSource.id}_${channel}"
            )
     
            while (parser.next() != XmlPullParser.END_TAG) {
                if (parser.eventType == XmlPullParser.START_TAG) {
                    when (parser.name) {
                        "title" -> newEpgData.name = parser.nextText()
                        "sub-title" -> newEpgData.sub_title = parser.nextText()
                        "desc" -> newEpgData.descr = parser.nextText()
                        "director" -> newEpgData.director?.add(parser.nextText())
                        "actor" -> newEpgData.actor?.add(parser.nextText())
                        "date" -> newEpgData.date = parser.nextText()
                        "category" -> newEpgData.category?.add(parser.nextText())
                        "country" -> newEpgData.country?.add(parser.nextText())
                        "episode-num" -> newEpgData.episode_num = parser.nextText()
                        "value" -> newEpgData.rating = parser.nextText()
                        "icon" -> newEpgData.showIcon = parser.getAttributeValue(null, "src") ?: ""
                    }
                }
            }

            epgDataBatch.add(newEpgData)
            if (epgDataBatch.size >= batchSize) {
                epgDataBox.put(epgDataBatch)
                epgDataBatch.clear()
            }
        }
    }

    private fun assignEpgDataToChannels(thisEpgSource: EpgSource) {
        epgChannelBox.query(EpgSourceChannel_.epgSourceId.equal(thisEpgSource.id)).build().find().forEach { epgChannel ->
            epgChannel.epgSource.target = thisEpgSource
            epgChannel.epgDataList.addAll(epgDataBox.query(EpgDataOB_.epChId.equal(epgChannel.chEpgId)).build().find())
            epgChannelBox.put(epgChannel)
        }
        epgDataBatch.clear()
    }

Sax Parser

Repository code:

    suspend fun parseXmlStream(
        inputStream: InputStream,
        epgSourceId: Long,
        maxDays: Int,
        minDays: Int,
        sourceUrl: String
    ): Resource<String> = withContext(Dispatchers.Default) {
        try {
            val thisEpgSource = epgSourceBox.get(epgSourceId)
            inputStream.use { input ->
                val saxParserFactory = SAXParserFactory.newInstance()
                val saxParser = saxParserFactory.newSAXParser()
                val handler = EpgSaxHandler(thisEpgSource.id, maxDays, minDays, thisEpgSource.url, isUpdating)
                saxParser.parse(input, handler)
                if (handler.epgDataBatch.isNotEmpty()) {
                    epgDataBox.put(handler.epgDataBatch)
                    handler.epgDataBatch.clear()
                }
                _epgProcessState.value = ExternEpgProcessState.Success
                return@withContext Resource.Success("OK")
            }
        } catch (e: Exception) {
            Log.e("ERROR PARSING", "${e.message}")
            _epgProcessState.value = ExternEpgProcessState.Error("Error parsing XML: ${e.message}")
            return@withContext Resource.Error("Error parsing XML: ${e.message}")
        }
    }

Handler:

class EpgSaxHandler(
    private val epgSourceId: Long,
    private val maxDays: Int,
    private val minDays: Int,
    private val sourceUrl: String,
    private val isUpdating: Boolean
) : DefaultHandler() {

    private val epgSourceBox: Box<EpgSource>
    private val epgChannelBox: Box<EpgSourceChannel>
    private val epgDataBox: Box<EpgDataOB>


    init {
        val store = ObjectBox.store
        epgSourceBox = store.boxFor(EpgSource::class.java)
        epgChannelBox = store.boxFor(EpgSourceChannel::class.java)
        epgDataBox = store.boxFor(EpgDataOB::class.java)
    }

    var epgDataBatch = mutableListOf<EpgDataOB>()
    private val batchSize = 10000
    private var currentElement = ""
    private var currentChannel: Channel? = null
    private var currentProgram: EpgDataOB? = null
    private var stringBuilder = StringBuilder()


    override fun startElement(uri: String?, localName: String?, qName: String?, attributes: Attributes?) {
        currentElement = qName ?: ""
        when (qName) {
            "channel" -> {
                val id = attributes?.getValue("id") ?: ""
                currentChannel = Channel(id, mutableListOf(), mutableListOf(), "")
            }
            "programme" -> {

                val start = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.getDefault())
                    .parse(attributes?.getValue("start") ?: "")?.time ?: -1

                val stop = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.getDefault())
                    .parse(attributes?.getValue("stop") ?: "")?.time ?: -1

                val channel = attributes?.getValue("channel") ?: ""

                if (isUpdating) {
                    val existingProgram = epgDataBox.query(EpgDataOB_.idByAccountData.equal("${channel}_${start}_${epgSourceId}",)).build().findUnique()
                    if (existingProgram != null) {
                        currentProgram = null
                        return
                    }
                }
                currentProgram = EpgDataOB(
                    id = 0,
                    idByAccountData = "${channel}_${start}_${epgSourceId}",
                    epgId = channel,
                    chId = channel,
                    datum = SimpleDateFormat("yyyy-MM-dd", Locale.getDefault()).format(start),
                    name = "",
                    sub_title = "",
                    descr = "",
                    category = mutableListOf(),
                    director = mutableListOf(),
                    actor = mutableListOf(),
                    date = "",
                    country = mutableListOf(),
                    showIcon = "",
                    episode_num = "",
                    rating = "",
                    startTimestamp = start,
                    stopTimestamp = stop,
                    mark_archive = null,
                    accountData = sourceUrl,
                    epgSourceId = epgSourceId.toInt(),
                    epChId = "${epgSourceId}_$channel"
                )
            }
            "icon" -> {
                val src = attributes?.getValue("src") ?: ""
                currentChannel?.icon?.add(src)
                currentProgram?.showIcon = src
            }
            "desc", "title", "sub-title", "episode-num", "rating", "country", "director", "actor", "date", "display-name" -> {
                stringBuilder = StringBuilder()
            }
        }
    }

    override fun characters(ch: CharArray?, start: Int, length: Int) {
        ch?.let {
            stringBuilder.append(it, start, length)
        }
    }

    override fun endElement(uri: String?, localName: String?, qName: String?) {
        when (qName) {
            "channel" -> {
                currentChannel?.let { channel ->
                    val channelInDB = epgChannelBox.query(EpgSourceChannel_.chEpgId.equal("${epgSourceId}_${channel.id}")).build().findUnique()
                    if (channelInDB == null) {
                        val newChannel = EpgSourceChannel(
                            id = 0,
                            chEpgId = "${epgSourceId}_${channel.id}",
                            chId = channel.id,
                            icon = channel.icon,
                            display_name = channel.displayName,
                            name = channel.displayName.firstOrNull() ?: "",
                            epgSourceId = epgSourceId,
                            isExternalEpg = true
                        )
                        epgChannelBox.put(newChannel)
                    } else {
                        channelInDB.display_name = channel.displayName
                        channelInDB.icon = channel.icon
                        channelInDB.name = channel.displayName.firstOrNull() ?: channelInDB.name
                        epgChannelBox.put(channelInDB)
                    }
                }
                currentChannel = null
            }
            "programme" -> {
                currentProgram?.let { program ->
                    addEpgDataToBatch(program)
                }
                currentProgram = null
            }
            "desc" -> {
                currentProgram?.descr = stringBuilder.toString()
            }
            "title" -> {
                currentProgram?.name = stringBuilder.toString()
            }
            "sub-title" -> {
                currentProgram?.sub_title = stringBuilder.toString()
            }
            "episode-num" -> {
                currentProgram?.episode_num = stringBuilder.toString()
            }
            "rating" -> {
                currentProgram?.rating = stringBuilder.toString()
            }
            "country" -> {
                currentProgram?.country?.add(stringBuilder.toString())
            }
            "director" -> {
                currentProgram?.director?.add(stringBuilder.toString())
            }
            "actor" -> {
                currentProgram?.actor?.add(stringBuilder.toString())
            }
            "date" -> {
                currentProgram?.date = stringBuilder.toString()
            }
            "display-name" -> {
                currentChannel?.displayName?.add(stringBuilder.toString())
            }
        }
        currentElement = ""
    }



    private fun addEpgDataToBatch(epgData: EpgDataOB) {
        epgDataBatch.add(epgData)
        if (epgDataBatch.size >= batchSize) {
            processEpgDataBatch()
        }
    }

    private fun processEpgDataBatch() {
        if (epgDataBatch.isNotEmpty()) {
            epgDataBox.put(epgDataBatch)
            epgDataBatch.clear()
        }
    }
}
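A note on the allocation rate: both versions above build a new SimpleDateFormat for the start, stop and datum values, which is three formatter objects per program. Whether that accounts for the GC counts was not measured. A minimal sketch of reusing the formatters instead, assuming parsing stays on a single thread (SimpleDateFormat is not thread-safe). The class name XmltvDates and its method names are made up for illustration, and Locale.US is a choice made here, not taken from the code above:

```kotlin
import java.text.ParseException
import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale

// Hypothetical helper: creates the two formatters once instead of once per <programme>.
// SimpleDateFormat is not thread-safe, so use one instance of this class per parsing thread.
class XmltvDates {
    private val timestampFormat = SimpleDateFormat("yyyyMMddHHmmss Z", Locale.US)
    private val dayFormat = SimpleDateFormat("yyyy-MM-dd", Locale.US)

    // Returns epoch milliseconds, or -1 when the attribute is missing or malformed.
    fun parseTimestamp(raw: String?): Long {
        if (raw == null) return -1L
        return try {
            timestampFormat.parse(raw)?.time ?: -1L
        } catch (e: ParseException) {
            -1L
        }
    }

    // Formats the day in the device's default time zone, like the original code does.
    fun formatDay(epochMillis: Long): String = dayFormat.format(Date(epochMillis))
}

fun main() {
    val dates = XmltvDates()
    println(dates.parseTimestamp("20240717191500 +0000")) // 1721243700000
    println(dates.parseTimestamp("not a date"))           // -1
}
```

In the handler this would be one `private val dates = XmltvDates()` field, with `dates.parseTimestamp(attributes?.getValue("start"))` in place of the inline formatter.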

So I am looking for a fast way to parse the XML data and insert it into the database without lags or crashes in my app :-) Is there something wrong in my code that causes the lags? Or is it simply not possible without slowing down the parsing and database inserting process?

If any other code is needed, I can post it. Here is what the memory profiler looks like while parsing the programs with XmlPullParser: [screenshot: Parsing with XmlPullParser]

UPDATE:

Memory usage & GC -> only parsing, no database usage. I used data classes Channel & Programme to hold the parsed data and always reused the same channel/programme instance: [screenshot]

Memory usage & GC -> parsing and creating EpgDataOB objects (no DB inserting): [screenshot]

Memory usage & GC -> parsing and adding data to the database (DB = last 10 seconds): [screenshot]

Memory usage & GC -> parsing, adding data to the DB and managing the relation between an EPG channel and its list of EpgData with:

    private fun addEpgDataToDatabase() {
        GlobalScope.launch {
            withContext(Dispatchers.IO) {
                epgDataBatch.chunked(15000).forEach { batch ->
                    epgDataBox.put(batch)
                    epgChannelBatch.forEach { epgChannel ->
                        epgChannel.epgDataList.addAll(batch.filter { it.epChId == epgChannel.chEpgId })
                    }
                    Log.d("EPGPARSING ADD TO DB", "OK")
                    delay(500)
                }
                epgDataBatch.clear()
            }
        }
    }

[screenshot: Memory usage & GC -> parsing, adding data to the DB and managing the relation between an EPG channel and its list of EpgData]

New code for putting the parsed data into the database (also tested 3 times on the TV; it runs much better than the code in my question). Adding the whole epgDataBatch (a mutableListOf) to the database with a single put is even a little faster.

    private fun addEpgDataToDatabase() {
        epgDataBatch.chunked(30000).forEach { batch ->
            epgDataBox.store.runInTx {
                epgDataBox.put(batch)
                epgDataBox.closeThreadResources()
            }
        }
        addEpgDataToChannel()
    }

    private fun addEpgDataToChannel() {
        epgChannelBox.store.runInTx {
            for (epgCh in epgChannelBatch) {
                epgCh.epgDataList.addAll(epgDataBatch.filter { it.epChId == epgCh.chEpgId })
            }
            epgChannelBox.put(epgChannelBatch)
            epgChannelBox.closeThreadResources()
        }
        epgChannelBatch.clear()
        epgDataBatch.clear()
    }
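A possible refinement of addEpgDataToChannel(): the filter call walks the whole epgDataBatch once per channel, so with the numbers from the question (350 channels, about 80,000 programs) that is roughly 28 million comparisons. Grouping the list once by epChId and looking each channel up in the resulting map avoids that. A sketch with simplified stand-in classes (Programme and ChannelEntry are not the real entities, and this has not been measured against ObjectBox relations):

```kotlin
// Simplified stand-ins for EpgDataOB and EpgSourceChannel (hypothetical, for illustration only).
data class Programme(val epChId: String, val name: String)

class ChannelEntry(val chEpgId: String) {
    val epgDataList = mutableListOf<Programme>()
}

// One pass over the programmes builds the map; each channel then does a single lookup.
fun assignProgrammes(channels: List<ChannelEntry>, programmes: List<Programme>) {
    val byChannel: Map<String, List<Programme>> = programmes.groupBy { it.epChId }
    for (channel in channels) {
        channel.epgDataList.addAll(byChannel[channel.chEpgId].orEmpty())
    }
}

fun main() {
    val channels = listOf(ChannelEntry("1_a"), ChannelEntry("1_b"), ChannelEntry("1_c"))
    val programmes = listOf(
        Programme("1_a", "News"),
        Programme("1_b", "Film"),
        Programme("1_a", "Weather")
    )
    assignProgrammes(channels, programmes)
    channels.forEach { println("${it.chEpgId}: ${it.epgDataList.map(Programme::name)}") }
}
```

groupBy keeps the original order within each group, so the programs of a channel stay in the order in which they were parsed.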
  • What type of database are we talking about? Are you sure the parsing implementation is the problem (have you tried the parsing without saving it to the database)? Commented Jul 17, 2024 at 19:15
  • To follow up on what @Robert said, if you are doing a lot of inserts into the database this can be very expensive, because each one has an implicit transaction, which is costly. Better to control the transaction yourself and start just one before parsing and end it when finished. Commented Jul 17, 2024 at 19:35
  • I'm using the ObjectBox database. I updated the question with some tests (& code) I made. Only parsing = memory usage is stable, but the GC is called very often. Adding the data to the database naturally raises memory usage (last picture under Update). But as I need all the data in the DB and the relations, I see no other way to handle that. @Andrew what do you mean by "control the transaction yourself and start just one before parsing and end it when finished"? So both of you think that it is not the garbage collector calls that make my app laggy, but the DB operations I used while parsing? Commented Jul 18, 2024 at 9:35
  • Another question: are the frequent calls of the garbage collector while parsing large XML files normal? Don't they hurt the user experience? Commented Jul 18, 2024 at 9:37
  • See docs.objectbox.io/transactions#transaction-costs: every time you do a put you do blocking I/O, which might not be good for the coroutine. It might be worth profiling the CPU as well, as the impact of coroutines on the main thread is less clear to me. I was parsing a large CSV file, and changing from a transaction per insert (not ObjectBox) to one per file improved the speed greatly. ObjectBox is a bit weird but does offer objectbox.io/docfiles/java/current/io/objectbox/… Commented Jul 18, 2024 at 9:53

1 Answer

Database inserts can be costly if you do a lot of them, for example when you insert your parsed XML data one data object after another. See the ObjectBox docs.

This is because it uses blocking I/O and file locks to write the database to disk as each put is in an implicit transaction.

Thus you can speed up parsing by speeding up the database inserts.

You can batch up the data into an array and put (insert) it all in one go, so that it all happens in a single transaction; this costs more memory but is faster.
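The batching idea can be sketched independently of the database. In the sketch below, BatchWriter and its flush callback are hypothetical names; with ObjectBox the callback would presumably be a single put of the list, like the epgDataBox.put(epgDataBatch) call already used in the question:

```kotlin
// Hypothetical helper: collects items and hands them to `flush` in batches,
// so the database sees one put (one transaction) per batch instead of one per item.
class BatchWriter<T>(
    private val batchSize: Int,
    private val flush: (List<T>) -> Unit
) {
    private val buffer = ArrayList<T>(batchSize)

    fun add(item: T) {
        buffer.add(item)
        if (buffer.size >= batchSize) flushNow()
    }

    // Call once after parsing has finished to write the remaining items.
    fun finish() = flushNow()

    private fun flushNow() {
        if (buffer.isEmpty()) return
        flush(buffer.toList()) // copy, because the buffer is cleared and reused
        buffer.clear()
    }
}

fun main() {
    val batchSizes = mutableListOf<Int>()
    val writer = BatchWriter<Int>(batchSize = 1000) { batch -> batchSizes.add(batch.size) }
    for (i in 1..2500) writer.add(i)
    writer.finish()
    println(batchSizes) // [1000, 1000, 500]
}
```

The batch size is the trade-off mentioned above: larger batches mean fewer transactions but more objects held in memory at once.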

Alternatively, ObjectBox has BoxStore.runInTx(), which takes a Runnable, to do multiple puts in a single transaction.

ObjectBox seems to want you to avoid simply beginning a transaction at the start of the XML parsing and ending it when the parsing has finished, although it does have an internal low-level method to do this.

Note that this also applies to other file-based databases such as SQLite.

5 Comments

Thanks for your explanation. First I parsed all the data and saved the EPG channels in one mutableList and all EPG data in another. Then I tried both: 1. Adding all EPG data with one put was very fast but increased the Native part in the Profiler to 130-140 MB, which wasn't freed until I restarted the app. 2. I chunked the mutableList of EpgData into batches of 30,000 (80,000 EpgData in total) and used runInTx(); that was also fast and the Native part in the Profiler increased "only" to 90 MB, but it wasn't freed either, only after restarting the app. Shouldn't that be freed after some time?
I'm not an expert in ObjectBox internals, but it is probably freed when you call BoxStore.close().
I read about it and tried it already, but got some problems in the fragment that calls the update-EPG-source functions. I will do some testing next week; if I find the solution I'll post it here. I added my code for using the batches at the bottom of the question. I still have one question: can the garbage collections that run while parsing be ignored (even if there are so many)?
Usually the GC runs in its own thread, so it should not slow things down unless you have too many other threads running and doing too much work.
Andrew's answer summed up the reason for my problem. I was so fixated on the parsing code that I didn't realize that the database insertions, not the parsing and the GC, could be causing the problem. For the issue with the growing native part when inserting the whole EPG data into the database I asked a new question, as it's a separate matter; see: https://stackoverflow.com/questions/78783374/..
