
Currently, I'm trying to implement a data reader to perform a particularly large query. The current implementation uses Entity Framework, but due to the nature of the query it's incredibly slow (around four and a half minutes).

Here is the current implementation with EF:

public List<SomeDataModel> GetSomeData(List<string> SomeValues, string setId)
{
    var ret = new List<SomeDataModel>();

    using (var context = new SomeDBContext())
    {
        var data = context.SomeEntity
            .Where(x => x.SetId == setId && SomeValues.Contains(x.SomeValue))
            .ToList(); // materialize first; IQueryable has no ForEach

        // mapper is an instance of AutoMapper injected via dependency injection
        data.ForEach(x => ret.Add(mapper.Map<SomeDataModel>(x)));
    }
    return ret;
}

Ideally I'd like to generate a more basic query string and pull the data through an OracleDataReader. The issue is this: an IN list in Oracle can only hold 1,000 values. The SomeValues parameter can contain anywhere from 5,000 to 25,000 items, so I imagine EF is generating multiple queries on its own on the back end, but like I said, it's incredibly slow.

This is sort of the direction I'm trying to take it:

public List<SomeDataModel> GetSomeData(List<string> SomeValues, string setId)
{
    var ret = new List<SomeDataModel>();
    const int MAX_CHUNK_SIZE = 1000;
    var pageList = new List<List<string>>();
    var totalPages = (int)Math.Ceiling((decimal)SomeValues.Count / MAX_CHUNK_SIZE);

    for (var i = 0; i < totalPages; i++)
    {
        var chunkItems = SomeValues.Skip(i * MAX_CHUNK_SIZE).Take(MAX_CHUNK_SIZE).ToList();
        pageList.Add(chunkItems);
    }

    using (var context = new SomeDBContext())
    {
        var connStr = context.Database.Connection.ConnectionString;
        using (var conn = new OracleConnection(connStr))
        {
            conn.Open(); // the connection must be opened before ExecuteReader

            foreach (var page in pageList)
            {
                // Placeholder only: string values need quoting, and bind
                // parameters are strongly preferable to concatenation.
                var queryStr = string.Format(
                    "SELECT * FROM SomeTable WHERE SomeColumn IN ({0})",
                    string.Join(",", page.Select(v => "'" + v + "'")));

                using (var cmd = new OracleCommand(queryStr, conn))
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        var newItem = new SomeDataModel();
                        newItem.Something = reader["Something"].ToString();
                        ret.Add(newItem);
                    }
                }
            }
        }
    }
    return ret;
}

The desired result, I suppose, is either to efficiently generate multiple queries for the reader, or to construct a single query that can handle this scenario effectively. What I have in that second example is placeholder code at the moment.
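One direction for the "multiple queries" route: instead of concatenating values into the SQL text, generate a bind-variable placeholder list per chunk and attach the values as parameters (avoids both SQL injection and per-chunk hard parsing). This is only a sketch of the string-building part under the question's placeholder names (SomeTable, SomeColumn); the OracleCommand wiring is assumed to follow the usual ODP.NET pattern:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InClauseBuilder
{
    // Builds "SELECT * FROM SomeTable WHERE SomeColumn IN (:p0,:p1,...)"
    // plus the matching bind-variable names, for one chunk of at most
    // 1000 values (Oracle's single-column IN-list limit).
    public static (string Sql, string[] ParamNames) Build(IReadOnlyList<string> chunk)
    {
        if (chunk.Count == 0 || chunk.Count > 1000)
            throw new ArgumentException("Chunk must contain 1..1000 values.");

        var names = Enumerable.Range(0, chunk.Count)
                              .Select(i => ":p" + i)
                              .ToArray();
        var sql = "SELECT * FROM SomeTable WHERE SomeColumn IN ("
                  + string.Join(",", names) + ")";
        return (sql, names);
    }
}
```

Usage would then pair each returned name with its value via `cmd.Parameters.Add(...)` before `ExecuteReader()`, once per chunk.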

  • Why are you using OracleDataReader? Why don't you run a stored procedure using EF? Commented Aug 15, 2019 at 17:37
  • Unfortunately, there's a sort of superficial requirement to not use a Stored Procedure. The table it references is incredibly volatile, and there have been problems with other stored procedures referencing it, so I was told to not use one. Commented Aug 15, 2019 at 17:39
  • If you're splitting the list of items to query into pages anyway, did you try doing that with EF? How's the performance on that? I mean build the pageList like in your second example, and then do the Contains query for each page? Commented Aug 15, 2019 at 17:43
  • Shame you can't use a proc, as otherwise you could have passed all the values as an array and the query would probably take no time at all. Commented Aug 15, 2019 at 17:43
  • @gnud I tried going that route as well, but there wasn't really much of a difference in performance. Commented Aug 15, 2019 at 17:46

2 Answers


First of all:

Oracle can accept more than 1,000 values in an IN list if we use a pair of columns.

So the following will throw an error (ORA-01795: maximum number of expressions in a list is 1000):

SELECT * FROM TABLE 
WHERE COL1 IN (VAL1, VAL2,... VAL1000, VAL1001);

But the following will work:

SELECT * FROM TABLE 
WHERE (COL1, 1) IN ( (VAL1, 1), (VAL2, 1), ... (VAL1000, 1), 
(VAL1001, 1) ..... (VAL9999, 1) ); 
-- each value is paired with a constant 1 so the pair is compared with (COL1, 1)
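The pair list above can be generated mechanically. A minimal sketch in C# (the column name and bind-variable names are placeholders, not anything from the question's actual schema):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TupleInBuilder
{
    // Rewrites a single-column IN list as a multi-column ("pair") IN list:
    //   (COL1, 1) IN ((v1, 1), (v2, 1), ...)
    // which is not subject to ORA-01795's 1000-expression limit on
    // single-column IN lists.
    public static string Build(string column, IEnumerable<string> values)
    {
        var pairs = string.Join(", ", values.Select(v => "(" + v + ", 1)"));
        return "(" + column + ", 1) IN (" + pairs + ")";
    }
}
```

Passing bind-variable names (e.g. ":p0", ":p1") rather than literal values keeps the statement parameterizable.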

Hope this gives you a direction for solving the problem. And note, I'm not sure about the performance, so please check it on your end.

Cheers!!


1 Comment

Seems like a good way to work around Oracle's limitations. I've only tried it with a data-reader, but performance doesn't quite hold up to EF.

Something that might help:

Depending on what your SomeEntity and SomeDataModel look like, you may be loading values from the DB that you don't need, or even tripping lazy loading because populating the data model references a related entity that wasn't eager-loaded. With AutoMapper I would recommend leveraging the ProjectTo method in the LINQ query. This ensures the database is hit only once and returns just the fields needed for the data model.

var results = context.SomeEntity
    .Where(x => x.SetId == setId && SomeValues.Contains(x.SomeValue))
    .ProjectTo<SomeDataModel>()
    .ToList();

or with AutoMapper 9, where ProjectTo is available on the injected IMapper instance:

var results = mapper.ProjectTo<SomeDataModel>(context.SomeEntity
    .Where(x => x.SetId == setId && SomeValues.Contains(x.SomeValue)))
    .ToList();

Querying across such a large number of arbitrary values is never going to be fast. You should also check whether the column is indexed. Once you capture the SQL being run to retrieve your data, feed it through a suitable analyzer (I'm not that familiar with Oracle's tooling) to see if there are index suggestions.

