How to sort 56k complex objects quickly.

This is not something I will go indepth for, because the actual implementation
will probably change for your project AND I just want to get this out there
quickly.

So my issue was that I was trying to sort 56 000 objects quickly, for the sake
of pagination, using LINQs .Take and .Skip methods. Say I wanted the 30th page
of items and I wanted to have 15 items per page, I’d Skip the first 29 pages
worth of items ( 29 * 15) and then .Take the next 15 items. It’d look like

db.Table.Skip(435).Take(15).ToList();

but was running into issues when I tried to do that because you need to use
OrderBy first on the set of objects, in order to be able to skip the first ones,
since they’re not in any reliable order to .Skip in the first place.

db.Table.OrderBy(item => item.ID).Skip(435).Take(15).ToList();

I’m sure with more simple objects, just adding the OrderBy call would be enough
to do the trick, but my objects were a wrapper around 4 or 5 other objects.
Definitely not a good idea, but it’s what I’ve got! So anyway, the issue was
that it was taking over 3 seconds for just 56k objects. I tried not only
Orderby, but Take’ing the 30 pages and GetRange’ing the last page worth. That
was actually a small improvement initially, since I only ever tested it on the
first page, which meant that there was only 30 or so objects being pulled and
dealt with.

But as you might imagine, as I got to page 1000, there was a Timeout exception
naturally, as it was trying to deal with 15000 items in memory. So this very
last thing I tried, I didn’t really expect it to work for me, since I was
already sorting by ID:

I .Select’ed only the IDs for the objects I wanted to pull, rather that the
entire object. Simple right? What that means is that once I was ready to list
only the 15 items I wanted to show, I didn’t have to OrderBy all the objects and
then Skip and Take from that, I only had to deal with simple integers instead,
and then find the associated ids later with a line that looks something like the
following, with a healthy mix of .Where and .Any:

db.Table.Where( item =>
                sorted_item_ids.Any(
                    item_id =>
                       item_id ==
                     item.deeper_object.item_id)) .ToList();
                

In the likely case that that’s not as clear as it seems, it’s looking to match
any objects from db.Table with any of the item_ids from the sorted_item_ids
list, which is an array of ints, and then finally turns the resulting IQueryable
into a List<table_object>.

And there you have it! I’m not sure this post makes any sense as I wrote it
pretty quickly, so maybe I’ll come back to it later and clean it up. (Fat
chance, right?)

Netflix Catalog January update

Took a few  weeks off, first for school, and then for christmas. Back only a few days, and I’ve learned the basics about Cookies and Filters in ASP.NET MVC. They’re both pretty simple, even moreso than I thought. Instead of rewriting a lot of the stuff I read, I’ll just link to the StackOverflow answers that I used directly.

For the cookies, it’s here, where it shows both how to add and delete (via setting the Expiry of the cookie to yesterday) a cookie. I thought that was pretty neat. Basically all it is is a number that the browser stores, and then the server has to do something with that.

Which brings me to Filters, with a SO answer in two parts: Here and the offical docs, which are very surprisingly handy this time. Filters just run before and after certain events, like Authorization, Actions, Results and Exceptions. I’ve only played with “OnResultExecuting” which mean before the ViewResult has been finished being processed, since if I change the ViewBag in the filter, the returned view with contain the modified data.

Other than that, I’ve run into a strange issue which I’m really hoping to solve: the catalog index I got the from Netflix API is incomplete! It’s got about 56k titles on there, individual shows included as separate entries (something I still have to sort out) and I can’t find certain titles, like Dexter and The 4400, which are both available for streaming off Netflix.com. I’ve made a post asking for clarification here but looking at the post, it seems like I’ll need to cross post into the “Help me” subforum, rather than the “API forum” forum.

My buddy is still working on the Rotten Tomatoes thing, which has taken a few months now. He hasn’t made much progress as far entering things into the database, but then I realized that without the huge /catalog/titles/streaming resource, I’d probably be in the same spot he is. I’ll have to remind him about the OMDB tool, and how we can just pull the RottenTomatoes data from there instead. Then it’ll be a matter of associating the Netflix movie with the RottenTomatoes movie. That’ll be tricky.

Anyway wanted to make a post here because  it’s been a while, and it doesn’t get any easier to write these. And although I now manage to get about 1000 people to the site a month, I realized 75%+ of them are here for the tutorials I have written. I really should keep working on those… If you ever have something you’d like to learn about, please hit me up!

You should follow me on twitter @tankorsmash to hear me complain about the official docs, or how All That Remains is dead

Latest Project: A new website!

What’s up dudes! It’s been a while, but I’ve been busy. If you’ve been following me on Twitter, you’ll see that I’ve been working on this new website. It’s basically just like 5000best.com/movies or easyqueue or whatever it’s called. I’ve never used them, since I only use it on the xbox, and the interface is pretty bad!

But anyways, the project has been underway for a few weeks now, and while it’s my first website that I’m building from as scratch as you can get with C#’s ASP.NET MVC4, it’s coming along nicely. So far, I’ve managed to pull down the Netflix API’s Index, so that’s about 55k titles that I’ve got to play around with. The dataset is pretty complete, which means that it’s pretty comprehensive.

Once I got MVC4 set up and working, which was not easy, even with sec_goat’s help by the way, was getting a database local server working. I don’t want to get into that too much, because I’m still sick of the idea, but basically MVC4 uses 2012 SQL databases, and I had SQL SERVER 2008, so I needed to update to 2012 to match the server I had created. That meant I needed to update to ’08, ’08 SP1, ’08 R2, ’08 R2 SP1, ’08 R2 SP3, then finally ’12. I think I even needed SP1 on the ’12. But it really isn’t clear the order to me, since at that point I was just installing everything I could and trying to get it to work. It was frustrating.

Another issue I am currently dealing with is the idea of Database Normalization. What that is is essentially splitting up a database in order to get to be the least redundant as it could possibly be. Reducing columns that work out to the same thing. Like say you had an ingredient list, and you added ketchup and mustard and pickles. But EVERY single time you didn’t use ketchup, you took away mustard too, so that means that you might as well have one column called toppings yes or no, rather than a ketchup and a mustard column. Well, that’s actually a really poor example. As usual, Stack Overflow has a better example, as well as About.com actually.

So back to the issue. I used to have ALL the relevant data from the Netflix API in one massive, ugly table, but I wasn’t happy with how long that it was taking to get data from it (and apparently DB Normalization doesn’t really help with that, but maybe it does, I really don’t know at this point) so I endeavored to break it up. So it’s most of the movie data, like year released, duration, rating etc in one table, then a list of all the genres into another, then a cross table, linking one genre to one movie ID. And with Box Art, I’ve got a table of movie id’s linked with the box art of relevant sizes. That way whenever I need to box art for a movie ID, I can just select it from the BoxArt table, rather than look through the whole movies database. Dunno if it helps my speed case at all, but I hope so.

Right this minute, I’m building the list of Movie to Genre table to make. It’s been about 20 minutes since I’ve started writing this, and it’s at movie 909 out of 55k. It’ll take about 10 hours I think to create the table in full, but I’m told that isn’t a terrible amount of time. But again, I don’t know for sure.

So there you have it, good news and bad new. A lot of frustration but a lot of new information, so you take what you can get. I definitely don’t like knowing absolutely nothing about DBM and SQL, and asking really basic questions, but I guess that’s just part of the job. Er, hobby.

Learning C#

So I’ve been quiet the last few weeks, and that’s because I’ve been busy learning C#. I never got into it because I was always worried that it’d be archaic and obtuse to learn. That was compounded by the fact that I had tried C++ a few times, with Code::Blocks and I always got stuck with choosing a compiler or something. I really didn’t know what I was doing. Luckily, a friend of mine, sec_goat, convinced me to try out C#.

I’m really glad he did, because the whole language really isn’t as nasty as I was expecting. The compiler issue was taken care of by Visual Studio 2010 Pro, which I get for free from Microsoft DreamSpark, since I’m still in College. I’m actually still pretty fuzzy on the stuff that’s going on under the hood in C#, so I probably couldn’t compile my own project, but hey, it’s only been a few weeks.

Another reason I’m pretty stoked is that I’ve always wanted the be able to develop some games, and that just wasn’t easy in Python, because of how it runs. It’s super easy to get the design going, but unless you’d optimize the shit out of it, it’s going to run a bit slow. Now, not only can I develop games that’ll run at a decent pace, I can actually make games for the Xbox 360 too, with the XNA package.

It’s similar to PyGame as far as I can tell, but allows you to develop for the Xbox 360, and it’s integrated into Visual Studio quite nicely. It’s only a shame that I can’t use Visual Studio 2012 with it, without tweaking some files around; but that’s not a huge issue, since VS 2010 is pretty great anyways.

Relatedly, I’m using the VSvim plugin for Visual Studio, so a great majority of the vim controls are allowed. This greatly helps me learn vim. I have also set up gVim as an external tool, so whenever I hit F1, the code gets loaded into gVim for editing, should I ever need to do some extreme vimming.

Right now, we’re toying with the idea of making a Facebook pet raising game. Sec_goat is seeing what he can do as far as setting up a networked leaderboard, and I’m experimenting with making a tamigotchi clone. I’ve got a little console game going now that allows you to feed and do damage to your pet, along with an inventory system so you can store the food and a rudimentary status system where you can get buffs and debuffs such as happy or sick etc. I’m now trying to move the Console game to XNA, so I can make a graphical interface for it, along with having a true game loop, rather than just one prompt after another.

As far as tutorials go, that’s going to be put on the sidelines, if you can’t tell already. Like most people, I’m motivated by my own interests, so I just jump from project to project when it suits me. But if you’d like to get a simple tutorial on a Python feature, send me an email at tankorsmash@tankorsmash.com or on reddit under the username tankorsmash and maybe it’ll be the motivation I need to write another!

Thanks for reading, and as always,  I love hearing feedback. Ask me some questions or something!