Creating a “Hot” algorithm with PHP and MySQL

In this tutorial, we are going to show you how to create a Reddit-style “Hot” algorithm using PHP and MySQL.

There are two primary benefits to using a “Hot” algorithm.

Firstly, it gives newer posts a chance. The problem with many “most popular” pages is that they give older posts a huge advantage over newer submissions. This leads to a situation where good content loses out to mediocre content that was created months beforehand.

Secondly, it keeps things “fresh” and prevents content stagnation. Most users will quickly lose interest in a website that only displays a “most popular” list. This is because the content does not change enough to warrant any kind of long-term attention.

For this example, we are going to assume that we are running a website that provides users with the ability to “thumbs up” or “thumbs down” videos that have been submitted.

Our table design.

We are going to try to keep our database design as simple as possible.

Here is the structure of an example “videos” table.

Note that the three columns that are most important to us are “thumbs_up”, “thumbs_down” and “created.” Without these columns, we cannot implement a “what’s hot” page:

CREATE TABLE IF NOT EXISTS `videos` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `url` text COLLATE utf8_unicode_ci NOT NULL,
  `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `thumbs_up` smallint(5) NOT NULL,
  `thumbs_down` smallint(5) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;

Our “Hot” SQL query.

It is better to implement the “what’s hot” algorithm using SQL instead of PHP.

In other words, let the database handle the heavy work. There are two reasons for this:

  1. It allows us to take advantage of indexes.
  2. We can cache the results of the query in memory if we need to.

The query, which is based on the Reddit algorithm, looks like this:

SELECT 
    id, url, title, thumbs_up, thumbs_down, created
FROM 
    videos
ORDER BY 
    LOG10(ABS(thumbs_up - thumbs_down) + 1) * SIGN(thumbs_up - thumbs_down)
    + (UNIX_TIMESTAMP(created) / 300000) DESC
LIMIT 100

As you can see, the algorithm is implemented in the ORDER BY clause. We’ve also limited the results to 100.

Here is some quick test data, if you need it:

INSERT INTO `videos` (`id`, `url`, `title`, `thumbs_up`, `thumbs_down`, `created`) VALUES
(1, 'http://google.com', 'Test post, please ignore.', 12, 1, '2014-03-15 11:00:00'),
(2, 'http://youtube.com', 'Another example, submitted a while back.', 90, 12, '2014-03-10 08:00:00'),
(3, 'http://example.com/video.php?id=3421', 'Funny video.', 120, 45, '2014-03-14 06:00:00'),
(4, 'http://website.com/v/36782', 'Cat wearing slippers.', 7, 1, '2014-03-15 12:00:00');

After adding a few videos to your table and running the “what’s hot” query above, you’ll see that a row’s total score does not guarantee it the top spot.

Instead, our algorithm takes both the record’s total score and its creation time into account.

Tying it together with PHP.

Because the SQL query does all of the hard work, using it with PHP is as simple as running any kind of SELECT statement.

Here is an example that uses the PDO object:

$sql = "SELECT id, url, title, thumbs_up, thumbs_down, created
        FROM videos
        ORDER BY 
            LOG10( ABS( thumbs_up - thumbs_down ) + 1 ) * 
            SIGN( thumbs_up - thumbs_down ) + 
            ( UNIX_TIMESTAMP( created ) /300000 ) DESC 
        LIMIT 100";

$statement = $pdo->prepare($sql);
$statement->execute();
while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
    var_dump($row);
}

If, for whatever reason, you need to use the older mysql extension:

$sql = "SELECT id, url, title, thumbs_up, thumbs_down, created
        FROM videos
        ORDER BY 
            LOG10( ABS( thumbs_up - thumbs_down ) + 1 ) * 
            SIGN( thumbs_up - thumbs_down ) + 
            ( UNIX_TIMESTAMP( created ) /300000 ) DESC 
        LIMIT 100";

$result = mysql_query($sql, $conn);
if($result === false){
    throw new Exception(mysql_error($conn));
}

while($row = mysql_fetch_assoc($result)){
    var_dump($row);
}

Extra Pointers.

  • Your vote buttons should send an Ajax request instead of redirecting the user or reloading the page. Websites like Youtube and Reddit have conditioned users to believe that their vote is in “real time.” Sending Ajax requests is extremely easy with JavaScript libraries such as jQuery.
  • If performance is a concern, make sure that you read about indexes. You might also want to look into the possibility of using an object caching daemon such as Memcached.
  • To prevent the user from voting multiple times, you should store votes against the User’s ID. There are other methods of preventing vote rigging, but they fall outside the scope of this article.

If you have any questions or concerns, feel free to post them in the comment section below.