How to stop Drupal from crashing your Linux/Apache/MySQL server.
If you run Drupal 6 or an earlier version of this popular open source content management system, commonly installed on Linux servers running Apache and MySQL, you may have a problem. The same applies even if you installed Drupal in a much less common environment such as Windows, or run it on a PostgreSQL database. The problem I am talking about lurks inside Drupal's core search module. It was reportedly fixed starting with Drupal 7, but I have not verified that myself.
Drupal's optional core search module makes internal site searches possible by parsing the site's nodes for the keywords they contain. The module scores the keywords it collects according to criteria set in its admin interface, indexes them, and stores the results in four MySQL search tables: search_dataset, search_index, search_node_links and search_total. Although the module is optional, it is enabled by default at installation, and most Drupal sites keep it that way.
That’s all good, except that this search module indexes both published and unpublished nodes. Now, what’s the problem with that? Well, we live in a world full of spammers, and many good Drupal site administrators deal with the unwanted “content” spammers generate not by deleting it outright, but by unpublishing it. In many other setups, all content must be explicitly approved by an admin, and whatever is not approved remains unpublished.
Now imagine a very likely scenario: a site with a thousand published, real content articles and a hundred times as many unpublished spam articles. Also assume that each piece of spam is two to three times the length of the real published content – somehow, spammers always have that much more to say! All that unpublished spam grows the four Drupal MySQL search tables, and it grows them a lot.
In our case, the largest of these tables, search_index, alone grew to over 20 million keyword records and took up almost a gigabyte of database storage. For a site with relatively little real content, this was an unjustifiably heavy load: it caused the MySQL server to time out on a nearly daily basis, often taking the co-resident application server down with it. Something had to be done.
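If you want to check whether your own search tables have ballooned like this, MySQL's information_schema can report their row counts and on-disk sizes. A minimal sketch (note that table_rows is an estimate for InnoDB tables, and this assumes you are connected to your Drupal database):

```sql
-- Approximate row counts and sizes of Drupal's search tables
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name LIKE 'search\_%';
```

If search_index is orders of magnitude larger than your published content warrants, the unpublished-spam problem described above is a likely culprit.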
To solve the problem, I slightly hacked the node_update_index() function to restrict search indexing to published nodes only. I then truncated all four MySQL search tables and rebuilt the indexes. When all was done, the search_index table held only about 35 thousand keyword records and took just over two megabytes of database space.
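The hack itself amounts to one extra condition in the query that selects nodes for indexing. Here is a sketch of the relevant part of node_update_index() with the change applied; the exact query text varies slightly between Drupal 6 minor releases, so treat this as an illustration of the WHERE clause change rather than a verbatim copy of the function:

```php
<?php
// In modules/node/node.module (Drupal 6) -- sketch, not verbatim.
function node_update_index() {
  $limit = (int) variable_get('search_cron_limit', 100);
  // Original clause: WHERE d.sid IS NULL OR d.reindex <> 0
  // Patched clause adds n.status = 1, so unpublished nodes are skipped:
  $result = db_query_range("SELECT n.nid FROM {node} n
    LEFT JOIN {search_dataset} d ON d.type = 'node' AND d.sid = n.nid
    WHERE (d.sid IS NULL OR d.reindex <> 0) AND n.status = 1
    ORDER BY d.reindex ASC, n.nid ASC", 0, $limit);
  while ($node = db_fetch_object($result)) {
    _node_index_node($node);
  }
}
```

Since n.status is 1 for published nodes and 0 for unpublished ones, this single condition keeps all unpublished spam out of the index from then on.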
If you are in a similar situation, you may appreciate these three easy steps to save your server from crashing all the time:
- Find the node_update_index() function inside the node.module file, located in the modules/node directory of your Drupal installation. Change the WHERE clause of the SELECT statement used in this function from WHERE d.sid IS NULL OR d.reindex <> 0 to WHERE (d.sid IS NULL OR d.reindex <> 0) AND n.status = 1. Remember to reapply this patch every time you upgrade Drupal 6 core.
- Empty the four Drupal MySQL search tables by issuing the TRUNCATE TABLE search_dataset; TRUNCATE TABLE search_index; TRUNCATE TABLE search_node_links; and TRUNCATE TABLE search_total; commands from the MySQL command line, or by using phpMyAdmin to do it.
- To rebuild the search indexes, go to your Drupal website’s /admin/settings/search URL and click the “Re-index site” button. Drupal will take it from there, or you can expedite the rebuild by running cron manually a few times, until the message “100% of the site has been indexed. There are 0 items left to index.” appears above the “Re-index site” button. To run cron manually, go to your Drupal website’s /admin/reports/status/run-cron URL.
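Instead of clicking through the admin page repeatedly, you can also trigger cron by fetching the cron.php file that Drupal 6 ships in its root directory; each run indexes another batch of nodes (the batch size is the search_cron_limit setting). A sketch, assuming your site lives at example.com – substitute your own base URL:

```
# Trigger a Drupal 6 cron run from the command line (or a crontab entry).
# Replace example.com with your site's base URL.
wget -q -O /dev/null http://example.com/cron.php

# or, with curl:
curl -s -o /dev/null http://example.com/cron.php
```

Repeat until the search admin page reports 100% of the site indexed; putting one of these lines in a crontab also keeps the index fresh going forward.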