Summary
A high-volume insert workload drove a MongoDB server running on Ubuntu to 100% disk utilization and crashed the process. Because the WiredTiger storage engine could not complete its mandatory recovery, the database could not bind to its connection port until sufficient disk space was freed.
- The MongoDB process crashed after the underlying Linux mount reached 100% utilization. When restarted, WiredTiger automatically entered non-optional recovery mode.
- The recovery process failed to complete because WiredTiger requires 5–10% free disk space for replaying journals and writing temporary recovery data, which the system lacked with only 1 GB of free space.
- Attempted workarounds, such as reducing the cache size or adjusting journal settings, failed because the core issue was the insufficient physical disk headroom necessary for recovery operations.
- The ultimate fix required moving the data directory to a larger filesystem, which provided 13 GB of free space and allowed WiredTiger to complete recovery and accept client connections successfully.
To prevent severe database outages, organizations must adopt proactive capacity monitoring and ensure database mounts consistently maintain at least 10–15% free disk space for necessary operational headroom.
As a database administrator supporting MongoDB in production and test environments, I often remind teams that databases are unforgiving when the operating system runs out of disk space. Recently, I worked through a case that illustrates this point clearly — and it highlights the importance of monitoring, capacity planning, and recovery strategies.
The Problem: MongoDB Will Not Allow Connections After a Restart
The incident began after a high-volume insert workload pushed the underlying Linux mount to 100% disk utilization. MongoDB was running on Ubuntu, with the data directory (/var/lib/mongo) and the WiredTiger journal on the same file system.
Once the disk was filled completely, the mongod process crashed. After shutting down mongod, I manually freed up about 1 GB of space. Unfortunately, that was not nearly enough.
When I attempted to restart the database, the process started, but I could not connect using the MongoDB shell. From the OS perspective, the port was not listening:
ss -tulnp | grep 27017
# (no output)
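At this point it helps to confirm whether mongod is down or simply up but not yet listening. A couple of standard Linux/systemd checks (not part of the original incident output) make the distinction clear:

# Is the mongod process alive even though port 27017 is closed?
systemctl status mongod --no-pager
ps -ef | grep [m]ongod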
Symptoms in the Logs
The MongoDB logs told the real story. On startup, WiredTiger entered recovery mode:
{"t":{"$date":"2025-09-30T22:31:49.537-05:00"},"s":"W","c":"STORAGE","id":22302,"ctx":"initandlisten","msg":"Recovering data from the last clean checkpoint."}
{"t":{"$date":"2025-09-30T22:31:49.769-05:00"},"s":"I","c":"STORAGE","id":22430,"ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"Recovering log 467 through 476"}}
{"t":{"$date":"2025-09-30T22:31:54.880-05:00"},"s":"I","c":"STORAGE","id":22430,"ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"Recovering log 476 through 476"}}
What was happening:
- MongoDB was replaying journal files from the last checkpoint.
- Until that recovery finished, MongoDB would not bind to port 27017 and therefore refused all connections.
This behavior is expected. WiredTiger recovery is non-optional — it ensures committed transactions are applied before the database is made available again.
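Because the server accepts no connections during this phase, the mongod log is the only place to follow progress. Assuming the default Ubuntu package log path (which may differ in your environment), a simple filter keeps the recovery messages in view:

# Watch WiredTiger recovery/rollback messages as they are written.
# /var/log/mongodb/mongod.log is the Ubuntu package default; adjust if needed.
tail -f /var/log/mongodb/mongod.log | grep -iE 'recover|rollback'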
Why Recovery Was Stuck
WiredTiger requires 5–10% free disk space to operate properly. That space is used for:
- replaying journals,
- allocating cache pages, and
- writing temporary recovery data.
With only 1 GB of free space, the system lacked sufficient headroom for recovery to succeed. The result was a “limbo” state: MongoDB appeared to start, but never reached the point where clients could establish a connection.
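Before retrying a start, it is worth checking exactly how much headroom the data mount has; a quick df against the data directory is enough:

# Show free space on the filesystem that holds the MongoDB data directory.
df -h /var/lib/mongo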
Attempted Workarounds
As a DBA, I tried a few safe adjustments:
Reducing the cache size in mongod.conf:
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 0.5
This reduced memory/disk pressure during startup, but it did not resolve the lack of overall free space.
Restarting with minimal journal settings:
Journaling remained enabled, but with reduced overhead. Again, recovery refused to complete.
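The exact journal settings are not reproduced here. Purely as an illustration, one documented knob is storage.journal.commitIntervalMs, which controls how often WiredTiger flushes the journal to disk; the value below is an assumption, not the configuration used during the incident:

storage:
  journal:
    # Flush the journal less often (500 ms is the maximum allowed value).
    commitIntervalMs: 500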
At this point, it was clear: MongoDB could not and would not start until more disk space was made available.
The Resolution: Move the Data Directory
The ultimate fix was straightforward but required decisive action. With mongod shut down:
1. I moved the data directory from the nearly full / mount to a larger filesystem:
mv /var/lib/mongo /u01/mongo
ln -s /u01/mongo /var/lib/mongo
2. Updated mongod.conf to point to the new location (a sample fragment follows these steps).
3. Restarted MongoDB:
systemctl restart mongod
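Step 2 is described only in prose above; a minimal mongod.conf fragment for the relocated directory might look like the following. The /u01/mongo path comes from the commands in step 1, while the mongodb:mongodb ownership mentioned in the comment is the Ubuntu package default and an assumption here:

storage:
  # Point WiredTiger at the relocated data directory.
  dbPath: /u01/mongo

# The service user must still own the new location, for example:
#   chown -R mongodb:mongodb /u01/mongo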
This time, WiredTiger had 13 GB of free space to complete recovery. The logs showed progress:
{"msg":"WiredTiger message","attr":{"message":"txn rollback_to_stable: Rollback to stable has been running for 23 seconds and has inspected 93 files."}}
After some time, MongoDB successfully bound to port 27017 and accepted client connections. From there, I was able to drop the oversized test collection and restore the system to a stable state.
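The write-up does not name the collection that was dropped; purely as a hypothetical example, cleaning up an oversized test collection from the shell could look like this:

# Hypothetical cleanup: database and collection names are placeholders.
mongosh --eval 'db.getSiblingDB("test").bulk_insert_test.drop()'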
Lessons Learned
For IT leaders and infrastructure teams, there are clear takeaways from this incident:
- MongoDB requires headroom — Always maintain at least 10–15% free disk space on database mounts.
- Recovery is unavoidable — WiredTiger will always replay journaled operations after a crash. There is no “skip recovery” option.
- Capacity monitoring matters — Alerts on filesystem utilization can prevent a crisis before it happens; a minimal check is sketched after this list.
- Design with growth in mind — Large insert workloads, especially in test/dev environments, can consume space rapidly. Use quotas or batch inserts to manage risk.
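For the monitoring point above, even a minimal scheduled check adds value. The sketch below flags the data mount once usage crosses 85%; the threshold and mount point are assumptions, not values from the incident:

# Warn when the MongoDB data mount exceeds 85% utilization (e.g. from cron).
df -P /var/lib/mongo | awk 'NR==2 && $5+0 >= 85 {print "Disk usage above 85% on " $6}'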
Final Thoughts
In this case, the root cause was simple: uncontrolled inserts filled the disk. But the impact was severe: the database was effectively offline until corrective action was taken.
For production environments, this type of outage is unacceptable. Proactive monitoring, disk growth planning, and workload management are essential.
As DBAs, our role is not just to fix problems when they occur, but to build resilient database platforms that prevent them in the first place.
For questions, please contact us.