Creating a “what’s hot” algorithm with PHP and MySQL

This is a small guide on how to implement a Reddit-style “what’s hot” algorithm with PHP and MySQL.

There are two primary benefits to using a “what’s hot” algorithm.

  1. It gives newer posts a chance. The problem with many “most popular” pages is that they give older posts and items a huge advantage over their younger counterparts. This leads to a situation where superior content is losing out to mediocre content that was created months beforehand.
  2. It keeps things “fresh” and prevents content stagnation. Most users will quickly lose interest in a website that only displays a “most popular” list. This is because the content does not change enough to warrant any kind of long-term attention.

For this example, we are going to assume that we are running a website that provides users with the functionality to “thumbs up” or “thumbs down” videos that have been submitted.

Our table design.

Note that I am going to try to keep our database design as simple as possible, lest we allow other topics to creep into the article. Here is the structure of an example table. Note that the three columns that are most important to us are “thumbs_up”, “thumbs_down” and “created.” Without these columns, we cannot implement a “what’s hot” page:

CREATE TABLE IF NOT EXISTS `videos` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `url` text COLLATE utf8_unicode_ci NOT NULL,
  `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `thumbs_up` smallint(5) NOT NULL,
  `thumbs_down` smallint(5) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;

Our SQL query.

Personally, I think that it is better to implement the “what’s hot” algorithm in SQL, and not in PHP. i.e. Let the database handle the heavy work. This is because:

  1. It allows us to avail of any indexes that we created.
  2. We can cache the results of the query in memory if we need to.

The query looks like this (based on the Reddit algorithm):

SELECT 
    id, url, title, thumbs_up, thumbs_down, created
FROM 
    videos
ORDER BY 
    LOG10(ABS(thumbs_up - thumbs_down) + 1) * SIGN(thumbs_up - thumbs_down)
    + (UNIX_TIMESTAMP(created) / 300000) DESC
LIMIT 100

As you can see, the algorithm is implemented in the ORDER BY clause and we’ve limited the results to 100. Here is some quick test data, if you need it:

INSERT INTO `videos` (`id`, `url`, `title`, `thumbs_up`, `thumbs_down`, `created`) VALUES
(1, 'http://google.com', 'Test post, please ignore.', 12, 1, '2014-03-15 11:00:00'),
(2, 'http://youtube.com', 'Another example, submitted a while back.', 90, 12, '2014-03-10 08:00:00'),
(3, 'http://example.com/video.php?id=3421', 'Funny video.', 120, 45, '2014-03-14 06:00:00'),
(4, 'http://website.com/v/36782', 'Cat wearing slippers.', 7, 1, '2014-03-15 12:00:00');

After adding a few videos to your table and running the “what’s hot” query above, you’ll see that a row’s total score does not guarantee it the top spot. Instead, our algorithm takes both the record’s total score and its creation time into account.

Tying it together with PHP.

Because the SQL query does all of the hard work, tying it into PHP is as simple as running any kind of SELECT statement.

Here’s an example with PDO:

$sql = "SELECT id, url, title, thumbs_up, thumbs_down, created
        FROM videos
        ORDER BY 
            LOG10( ABS( thumbs_up - thumbs_down ) + 1 ) * 
            SIGN( thumbs_up - thumbs_down ) + 
            ( UNIX_TIMESTAMP( created ) /300000 ) DESC 
        LIMIT 100";

$statement = $pdo->prepare($sql);
$statement->execute();
while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
    var_dump($row);
}

If, for whatever reason, you need to use the mysql extension.

$sql = "SELECT id, url, title, thumbs_up, thumbs_down, created
        FROM videos
        ORDER BY 
            LOG10( ABS( thumbs_up - thumbs_down ) + 1 ) * 
            SIGN( thumbs_up - thumbs_down ) + 
            ( UNIX_TIMESTAMP( created ) /300000 ) DESC 
        LIMIT 100";

$result = mysql_query($sql, $conn);
if($result === false){
    throw new Exception(mysql_error($conn));
}

while($row = mysql_fetch_assoc($result)){
    var_dump($row);
}

Extra Pointers.

  • Your vote buttons should send off an Ajax request instead of redirecting the user or reloading the page. Websites like Youtube and Reddit have conditioned users to believe that their vote is in “real time.” Sending Ajax requests is extremely easy with JavaScript libraries such as JQuery.
  • If performance is a concern, be sure to read up on indexes. You might also want to look into the possibility of using an object caching daemon such as Memcached.
  • To prevent the user from voting multiple times, you should store votes against the User’s ID. There are other methods of preventing vote rigging, but they fall outside the scope of this article.

If you have any questions or concerns, feel free to post them in the comment section below.