I'm trying to parse a huge JSON file into a 2D array.

Parsing works, but it needs almost 10 times as much memory as the file size.

My sample.json file has 100,000 rows, each with different items.

If sample.json is 500 MB, this code needs 5 GB.

How can I reduce memory usage?

I use Newtonsoft.Json on .NET 6.0.

Read from JSON:


        static void Read()
        {
            List<Dictionary<string, string>> rows = new List<Dictionary<string, string>>();
            string path = @"D:\small.json";
           
            using (FileStream fsRead = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (BufferedStream bsRead = new BufferedStream(fsRead))
            using (StreamReader srRead = new StreamReader(bsRead))
            {
                string? line;
                while ((line = srRead.ReadLine()) != null)
                {
                    JObject jsonObject = JObject.Parse(line);
                    MakeRowData(jsonObject, out var row);

                    rows.Add(row);
                }
            }
        }

Make row

        private static void MakeRowData(JObject jsonData, out Dictionary<string, string> row)
        {
            Dictionary<string, string> output = new Dictionary<string, string>();

            foreach (var item in jsonData)
            {
                int childSize = 0;

                if (item.Value != null)
                {
                    childSize = item.Value.Children().Count();

                    // If the item has children, explore deeper
                    if (childSize > 0)
                    {
                        ExploreChild(item.Value, ref output);
                    }
                    // Otherwise just add the new item
                    else
                    {
                        string str = item.Value.ToString();
                        output[item.Key] = str ?? "";
                    }
                }
            }
            row = output;
        }

        private static void ExploreChild(JToken jToken, ref Dictionary<string, string> row)
        {
            foreach (var item in jToken)
            {
                int childSize = item.Children().Count();

                // If the item has children, explore deeper
                if (childSize > 0)
                {
                    ExploreChild(item, ref row);
                }
                // Otherwise just add the new item
                else
                {
                    // Use the leaf's own path and value, so that every array element
                    // gets its own entry instead of repeating the first one
                    string path = item.Path.Replace('[', '(').Replace(']', ')');

                    string str = item.ToString();

                    row[path] = str ?? "";
                }
            }
        }
    

EDIT: Added Sample.json.

It is a set of JSON strings, one object per line.

The fields are not fixed.

Sample.json:

    {Field1:0,Field2:1,Field2:3}
    {Field1:0,Field5:1,Field6:3}
    {Field1:0,Field7:1,Field9:3}
    {Field1:0,Field13:1,Field50:3,Field57:3}
    ...

  • "How can I reduce memory usage?" By not storing the whole content of the file in memory. Maybe you can stream the output to a file instead of putting it into rows, but you didn't say what you want to do with the results. Commented Jun 23, 2022 at 7:06
  • What do you do with the data after deserializing them? Do you write them to a database? Can you process them row by row instead of reading all of the data into a list? Commented Jun 23, 2022 at 7:16
  • These rows will be displayed in a GUI, such as a DataGrid or a chart, and used for ordering and data filtering. Commented Jun 23, 2022 at 7:22
  • For such big data, I don't think using JSON is a good idea; an RDB is more suitable. Commented Jun 23, 2022 at 7:28
  • Consider using real classes rather than JObject. Also, why BufferedStream? Can we have a sample of your JSON objects? Commented Jun 23, 2022 at 8:36

1 Answer

You can try replacing the recursive exploration of children with an iterative one. Something like this:

    private static void MakeRowData(JObject jsonData, out Dictionary<string, string> row)
    {
        Dictionary<string, string> output = new Dictionary<string, string>();
        foreach (var item in jsonData)
        {
            if (item.Value != null)
            {
                // If the item has children, walk them iteratively with a queue
                if (item.Value.HasValues)
                {
                    var queue = new Queue<JToken>();
                    queue.Enqueue(item.Value);
                    while (queue.Count > 0)
                    {
                        var currItem = queue.Dequeue();
                        if (currItem.HasValues)
                        {
                            // Enqueue the children of the current token,
                            // not of the top-level item
                            foreach (var child in currItem)
                                queue.Enqueue(child);
                        }
                        else
                        {
                            // Leaf token: store it under its path, with the same
                            // bracket replacement as ExploreChild in the question
                            string path = currItem.Path.Replace('[', '(').Replace(']', ')');
                            output[path] = currItem.ToString();
                        }
                    }
                }
                // Otherwise just add the new item
                else
                {
                    output[item.Key] = item.Value.ToString();
                }
            }
        }
        row = output;
    }

Unless it is a tail call, each recursive call keeps the stack frame of the method it was called from alive until it returns. With deeply nested data this can lead to extensive memory usage.
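
To combine this with the suggestion from the comments to process rows one at a time, here is a minimal sketch, assuming the file holds one JSON object per line as in Sample.json. `RowReader`, `ReadRows` and `Flatten` are made-up names for illustration, not part of Newtonsoft.Json. It flattens each object with an explicit stack instead of recursion and yields rows lazily. Whether this lowers peak memory depends on the caller not collecting every row into a list.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json.Linq;

public static class RowReader
{
    // Hypothetical helper (not from the question): yields one flattened row per
    // input line, so the caller decides how many rows stay in memory at once.
    public static IEnumerable<Dictionary<string, string>> ReadRows(TextReader reader)
    {
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            yield return Flatten(JObject.Parse(line));
        }
    }

    // Iterative flattening: an explicit stack replaces the recursive calls.
    private static Dictionary<string, string> Flatten(JObject root)
    {
        var row = new Dictionary<string, string>();
        var pending = new Stack<JToken>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            JToken current = pending.Pop();
            if (current.HasValues)
            {
                // Container or property: visit its children later
                foreach (JToken child in current)
                    pending.Push(child);
            }
            else if (!ReferenceEquals(current, root))
            {
                // Leaf: the key is the token path, with brackets replaced
                // as in the question's ExploreChild
                string key = current.Path.Replace('[', '(').Replace(']', ')');
                row[key] = current.ToString();
            }
        }
        return row;
    }
}
```

A caller could then write `foreach (var row in RowReader.ReadRows(File.OpenText(path)))` and keep only the rows the GUI currently needs, for example one page of a grid.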
