How to sort 56k complex objects quickly.

This is not something I will go in-depth for, because the actual implementation
will probably change for your project AND I just want to get this out there
quickly.

So my issue was that I was trying to sort 56 000 objects quickly, for the sake
of pagination, using LINQs .Take and .Skip methods.

Say I wanted the 30th page
of items and I wanted to have 15 items per page, I’d Skip the first 29 pages
worth of items ( 29 * 15) and then .Take the next 15 items. It’d look like

db.Table.Skip(435).Take(15).ToList();

but was running into issues when I tried to do that because you need to use
OrderBy first on the set of objects, in order to be able to skip the first ones,
since they’re not in any reliable order to .Skip in the first place.

db.Table.OrderBy(item => item.ID).Skip(435).Take(15).ToList();

I’m sure with more simple objects, just adding the OrderBy call would be enough
to do the trick, but my objects were a wrapper around 4 or 5 other objects.
Definitely not a good idea, but it’s what I’ve got! So anyway, the issue was
that it was taking over 3 seconds for just 56k objects. I tried not only
Orderby, but Take’ing the 30 pages and GetRange’ing the last page worth. That
was actually a small improvement initially, since I only ever tested it on the
first page, which meant that there was only 30 or so objects being pulled and
dealt with.

But as you might imagine, as I got to page 1000, there was a Timeout exception
naturally, as it was trying to deal with 15000 items in memory. So this very
last thing I tried, I didn’t really expect it to work for me, since I was
already sorting by ID:

I .Select’ed only the IDs for the objects I wanted to pull, rather that the
entire object. Simple right? What that means is that once I was ready to list
only the 15 items I wanted to show, I didn’t have to OrderBy all the objects and
then Skip and Take from that, I only had to deal with simple integers instead,
and then find the associated ids later with a line that looks something like the
following, with a healthy mix of .Where and .Any:

db.Table.Where( item =>
                sorted_item_ids.Any(
                    item_id =>
                       item_id ==
                     item.deeper_object.item_id)) .ToList();
                

In the likely case that that’s not as clear as it seems, it’s looking to match
any objects from db.Table with any of the item_ids from the sorted_item_ids
list, which is an array of ints, and then finally turns the resulting IQueryable
into a List<table_object>.

And there you have it! I’m not sure this post makes any sense as I wrote it
pretty quickly, so maybe I’ll come back to it later and clean it up. (Fat
chance, right?)

Leave a Reply

Your email address will not be published.