Looks like most of the other folks still messing around in the Netflix competition are in a similar situation to mine.
This new article in the New York Times gives some decent insight into the folks still working on it and the challenges that remain before anyone wins the prize.
The article discusses one of the problems I’ve found too: movies that suck by most accounts but that people have very polarized opinions on, either love or hate. This is very difficult for an algorithm working on sparse data to handle when MOST of the data is mean-reverting, meaning most ratings sit close to the average.
“Bertoni says it’s partly because of “Napoleon Dynamite,” an indie comedy from 2004 that achieved cult status and went on to become extremely popular on Netflix. It is, Bertoni and others have discovered, maddeningly hard to determine how much people will like it. When Bertoni runs his algorithms on regular hits like “Lethal Weapon” or “Miss Congeniality” and tries to predict how any given Netflix user will rate them, he’s usually within eight-tenths of a star. But with films like “Napoleon Dynamite,” he’s off by an average of 1.2 stars.
The reason, Bertoni says, is that “Napoleon Dynamite” is very weird and very polarizing. It contains a lot of arch, ironic humor, including a famously kooky dance performed by the titular teenage character to help his hapless friend win a student-council election. It’s the type of quirky entertainment that tends to be either loved or despised. The movie has been rated more than two million times in the Netflix database, and the ratings are disproportionately one or five stars.
Worse, close friends who normally share similar film aesthetics often heatedly disagree about whether “Napoleon Dynamite” is a masterpiece or an annoying bit of hipster self-indulgence. When Bertoni saw the movie himself with a group of friends, they argued for hours over it. “Half of them loved it, and half of them hated it,” he told me. “And they couldn’t really say why. It’s just a difficult movie.””
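Here is a quick sketch of why a love-it-or-hate-it movie is so painful for anything that leans on averages. It is not Bertoni's algorithm or anything from the real Netflix data: both rating histograms are made up for illustration, and `best_constant_rmse` is just a hypothetical helper that scores the simplest possible predictor, one that always guesses the movie's mean rating.

```python
import math


def best_constant_rmse(counts):
    """Return (mean rating, RMSE of always predicting that mean).

    counts maps a star value (1-5) to how many users gave that rating.
    The mean is the constant guess that minimizes squared error, so this
    RMSE is simply the standard deviation of the ratings.
    """
    total = sum(counts.values())
    mean = sum(star * n for star, n in counts.items()) / total
    mse = sum(n * (star - mean) ** 2 for star, n in counts.items()) / total
    return mean, math.sqrt(mse)


# Hypothetical histograms (NOT real Netflix data), 100 ratings each.
typical_hit = {1: 5, 2: 10, 3: 35, 4: 35, 5: 15}      # ratings bunch up in the middle
polarizing_film = {1: 40, 2: 5, 3: 10, 4: 5, 5: 40}   # mostly one or five stars

for name, hist in [("typical hit", typical_hit), ("polarizing film", polarizing_film)]:
    mean, rmse = best_constant_rmse(hist)
    print(f"{name}: mean = {mean:.2f}, RMSE of guessing the mean = {rmse:.2f}")
```

With these made-up numbers the mean guess is off by about 1.0 star for the typical hit and about 1.8 stars for the polarizing film. The polarizing film's mean lands at 3 stars, a rating that hardly anyone actually gave, so falling back toward the average is about the worst guess available.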
This is exactly the problem I’ve run into. For the most part, the algorithm for this prize has already been figured out for most of the users and movies in the database. It’s probably at the point where, if this were an internal dev team working on the problem, they would have said “good enough” and moved on to other projects.
I’m glad I’m not alone in still working hard on this but getting almost no further. Whew.