Alan McKinnon on Tue, 5 May 2009 20:59:26 +0200 (SAST)
[GLUG-tech] Disk space being consumed somewhere
Hi all,

Odd one this. A colleague asked me to look at a RHEL5 server running Oracle that was running out of disk space on /. It's a standard RHEL fs layout on ext3, with 66G on LogVol00 (mounted at /). df says 62G is used; du says around 12G is used.

First check was "dd bs=1024 count=1000000000 if=/dev/zero of=/1", which failed at 2.4G. OK, so there really is only 2.4G of space available.

I checked the obvious things:

- bind-mounted / somewhere else: du still says 12G used (so no log files hidden under the /var mount, for example)
- du -l (count hard links multiple times): small increase, and no file is hard-linked more than 3 times, verified with ls and sort
- du --apparent-size: small increase (thinking maybe a runaway java process had created a million small files of < 10 bytes each)
- /tmp is about 200k (smaller than expected, as tmpwatch isn't being used and /tmp is on /, but no worries)
- tune2fs -l shows 5% reserved for root (the default), only a few % of inodes consumed, and 75% of blocks free (consistent with du's numbers)
- {pv,vg,lv}display: all normal, set to RH defaults. This is expected; the machine owner clicked yes, yes, yes at install, not being familiar with LVM
- a read-only fsck (which doesn't do much, as / is mounted) at least shows the superblock is consistent and usable (numbers agree with tune2fs, obviously)
- all on-disk filesystems are ext3, mkfs'ed with RH defaults

Checking further, a java parent process (not a child thread) was using 100% of one CPU, and lsof showed a huge number of FIFOs in use. It looked like >500, but I neglected to pipe it through wc to make sure :-(

Unfortunately, it didn't occur to me to check how much swap was in use, but AFAIK swap files show up as regular files, so du would have counted one like any other file.

A reboot put things back to normal, but I fully expect it to happen again (the owner says he's been watching usage creep up on the machine while du stays static).
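The df/du comparison and the FIFO count can be scripted rather than eyeballed. What follows is a minimal Python sketch, not what was actually run on the box: the function names are made up for illustration, and it assumes Linux with /proc and a Python 3 interpreter. fs_used_bytes mirrors df's "Used" figure via statvfs, walked_used_bytes approximates du -sx (one filesystem, each hard-linked inode counted once), and open_fd_summary counts the FIFOs a process holds open and lists any regular files it still holds open after they have been unlinked. Such files keep their blocks allocated, so df counts them while du, which only follows names, cannot.

```python
# Hypothetical diagnostic helpers (names invented for this sketch).
# Assumes Linux with /proc mounted; run as root to see every process.
import os
import stat


def fs_used_bytes(mount):
    """What df reports as "Used": total blocks minus free blocks."""
    st = os.statvfs(mount)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def walked_used_bytes(mount):
    """Roughly what `du -sx MOUNT` reports: allocated blocks of every inode
    reachable by name on this filesystem, each hard-linked inode counted once."""
    root_dev = os.lstat(mount).st_dev
    seen = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(mount):
        # Like du -x: do not descend into other filesystems mounted below.
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    kept.append(d)
            except OSError:
                pass  # directory vanished while walking
        dirnames[:] = kept
        # Count the directory itself plus every non-directory entry in it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished between listing and stat
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for an inode already counted
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks is in 512-byte units
    return total


def open_fd_summary(pid):
    """Count FIFOs held open by one process, and list regular files it still
    holds open after they were unlinked (link count 0)."""
    fd_dir = "/proc/%d/fd" % pid
    fifos = 0
    deleted = []
    for fd in os.listdir(fd_dir):
        link = os.path.join(fd_dir, fd)
        try:
            st = os.stat(link)          # follows the link to the open file itself
            target = os.readlink(link)  # e.g. "pipe:[1234]" or "/path (deleted)"
        except OSError:
            continue  # fd was closed while we were looking
        if stat.S_ISFIFO(st.st_mode):
            fifos += 1
        elif stat.S_ISREG(st.st_mode) and st.st_nlink == 0:
            deleted.append((target, st.st_blocks * 512))
    return fifos, deleted
```

Comparing fs_used_bytes("/") with walked_used_bytes("/") would be expected to show the same roughly 50G gap if the problem recurs; a small difference would not be surprising. Like du -x, the walker cannot see files hidden underneath a mount point, so running it against a bind mount of / covers that case. Calling open_fd_summary with the spinning java PID gives an exact FIFO count, plus the name and allocated size of anything that process has unlinked but not yet closed.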
My current working theory: something in the Oracle/Java/whatever stack is buggy, creating temp files of some form and not releasing them. Am I correct in saying that if a FIFO blocks, it's backed by disk? And if so, where? /tmp is the obvious place, but it had only 188k used per du.

It's a nice theory, but if I'm right, what would my next step be?

<Alan concedes to being baffled at this point>

--
alan dot mckinnon at gmail dot com
--
To unsubscribe: send the line "unsubscribe glug-tech" in the subject of a mail to "glug-tech-request@xxxxxxxxxxxx".
Problems? Email "glug-tech-admins@xxxxxxxxxxxx".
Archives are at http://www.linux.org.za/Lists-Archives/
RULES: http://www.linux.org.za/glugrules.html