Invalidating snapshot unable to allocate exception

Currently setting up a small KVM host to run a few VMs for a small business. The server has 2 drives in software md RAID 1, and that array is set up as a PV in LVM. The / partitions of the KVM guests are disk images, but for one particular guest that will have higher I/O requirements, I've added a 2nd HDD to the VM, which is a logical volume from the host storage pool.

Most commands fail with an Input/Output error (tail, dmesg, umount, mc, blkid, ls). A message is displayed but the reboot doesn't happen.

Pressing Ctrl+Alt+Del does nothing. After reading a few messages on the screen, I have to power off.

I'll check the console through the DRAC shortly but I'm expecting to see a bunch of i/o errors on the console.

I don't have virtual media access so can't load systemrescuecd to do repairs, so I'm a bit wary of rebooting at this stage.

Install from: USB flash drive
Install mode: x64
Install options: mdraid per partition (or not)

As of the Nov 2014 ISO, if too many packages are installed by pacstrap, pacstrap crashes shortly after mkinitcpio runs.

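The last pacstrap commands and user-typed commands show these errors.

Even power-on hours were relatively low, about 150 days or so. Just tried rebooting that VM and that LVM-backed virtual disk isn't even detected now, even though it's attached in virt-manager.

Based on all that, what's the likelihood of this being the start of a drive failure?

Severity column represents the severity of the PMR at the time the APAR was opened.

Over the past couple of days I read some material on LVM snapshots and, with some guidance from pczou, came to understand snapshots a bit better. My notes follow, shared for everyone:

The Logical Volume Manager (LVM) can take a "snapshot" of any Logical Volume (LV), which gives you a state-consistent backup of a partition.

While a backup is being taken, an application may be accessing a file or a database, so the file is in one state when the backup starts and in another when it finishes. The backup is then inconsistent, and restoring database data from such a backup almost never succeeds.

The usual workaround is to mount the partition read-only and then back up the data using table-level write locks, or even by stopping the database. All of these approaches seriously hurt service availability. An LVM snapshot gives you a consistent backup without affecting the availability of the server.

One reminder: this snapshot method only works with LVM; it does not apply to file systems that are not on LVM.

Snapshots can be implemented in several ways (see the links at the end of the article). Here I describe the copy-on-write (CoW) implementation used for snapshots in LVM.

When a snapshot is created, only the metadata of the data in the original volume is copied. No physical copy of the data is made at creation time, so creating a snapshot is almost instantaneous. When a write is performed on the original volume, the snapshot tracks the changes to the original volume's blocks: the data that is about to change on the original volume is copied into the space reserved for the snapshot before it is changed. That is why this approach is called copy-on-write.

Before a write reaches a block, CoW moves the original data into the snapshot space, which guarantees that all data stays consistent with the moment the snapshot was created. For reads from the snapshot, if the block has not been modified, the read is redirected to the original volume; if the block has been modified, the copy stored in the snapshot is read instead.

This changes the usual file I/O path: a CoW layer is added between the file system and the device driver, so the path looks like this: file I/O --- block I/O

[Figure: the CoW principle]

With CoW, the snapshot does not need to be as large as the original volume. Its size depends on only two things: how many blocks are expected to change between creating and releasing the snapshot, and how frequently the data is updated. Once the snapshot's space is full of change records for the original volume's blocks, the snapshot is released immediately and can no longer be used, which makes it invalid. So it is very important to finish whatever you need to do within the lifetime of the snapshot. Of course, if the snapshot is as large as the original volume, or even larger, it will effectively live forever.

Besides backups, snapshots have many other uses.

1) Virtualization

With LVM2, snapshots do not have to be read-only. This means that after creating a snapshot, you can mount it and read and write it like a regular block device.

Because popular virtualization systems (such as Xen, VMWare, Qemu and KVM) can use block devices as guest images, you can create full copies of these images and use them as needed, as if they were low-footprint virtual machines. The benefits are fast deployment (creating a snapshot often takes no more than a few seconds) and space savings (guests share most of the data of the original image). The setup steps are as follows: 1.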
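The copy-on-write behaviour described above, including a snapshot becoming invalid once its reserved space is full, can be sketched as a toy model. This is only an illustration under stated assumptions, not how LVM or device-mapper is implemented: the class names `Origin` and `Snapshot`, the `capacity` parameter (counted in chunks), and the in-memory lists are all hypothetical. The word "exception" in the comments follows the thread title, where it appears to refer to a preserved chunk.

```python
class Origin:
    """Toy origin volume: a list of chunk values (hypothetical model)."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.snapshots = []

    def write(self, index, value):
        # Copy-on-write: before overwriting, hand the old chunk to every
        # snapshot so it can preserve the data as it was at creation time.
        for snap in self.snapshots:
            snap.preserve(index, self.chunks[index])
        self.chunks[index] = value


class Snapshot:
    """Toy snapshot with room for `capacity` preserved chunks."""

    def __init__(self, origin, capacity):
        self.origin = origin
        self.capacity = capacity
        self.store = {}      # chunk index -> original data ("exceptions")
        self.valid = True
        origin.snapshots.append(self)

    def preserve(self, index, old_value):
        if not self.valid or index in self.store:
            return           # already preserved, or snapshot already dead
        if len(self.store) >= self.capacity:
            # No room to record another changed chunk: the snapshot can no
            # longer represent the original state, so it is invalidated.
            self.valid = False
            self.store.clear()
            return
        self.store[index] = old_value

    def read(self, index):
        if not self.valid:
            raise OSError("snapshot is invalid")
        if index in self.store:
            return self.store[index]        # modified block: use the copy
        return self.origin.chunks[index]    # unmodified: read the origin


def demo():
    origin = Origin(["a", "b", "c", "d"])
    snap = Snapshot(origin, capacity=2)
    origin.write(0, "A")
    origin.write(0, "AA")    # chunk 0 is already preserved, no new space used
    origin.write(1, "B")
    view = [snap.read(i) for i in range(4)]
    origin.write(2, "C")     # third distinct chunk does not fit
    return view, snap.valid
```

In this sketch `demo()` returns the snapshot's view taken while it was still valid, plus its final validity flag. Rewriting the same chunk twice costs snapshot space only once, which is why the number of distinct changed blocks, rather than the raw number of writes, drives the sizing in this model.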
Nov 22 mail kernel: __ratelimit: 20 callbacks suppressed
Nov 22 mail kernel: __ratelimit: 2295 callbacks suppressed
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 47270479
Nov 22 mail kernel: lost page write due to I/O error on vdb1
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 47271504
Nov 22 mail kernel: end_request: I/O error, dev vdb, sector 378116680
Nov 22 mail kernel: end_request: I/O error, dev vdb, sector 378157680
Nov 22 mail kernel: end_request: I/O error, dev vdb, sector 378432440
Nov 22 mail kernel: EXT3-fs (vdb1): error: ext3_journal_start_sb: Detected aborted journal
Nov 22 mail kernel: EXT3-fs (vdb1): error: remounting filesystem read-only
Nov 22 mail kernel: __ratelimit: 35 callbacks suppressed
Nov 22 mail kernel: __ratelimit: 35 callbacks suppressed
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 64003824
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 64003839
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 256
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 32
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 64
Nov 22 mail kernel: end_request: I/O error, dev vdb, sector 6144
Nov 22 mail yum[19139]: Installed: lsof-4.82-4.el6.x86_64
Nov 22 mail kernel: __ratelimit: 1 callbacks suppressed
Nov 22 mail kernel: __ratelimit: 1 callbacks suppressed
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 64003824
Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 512

There were plenty more than that, full excerpt here H8SDr Cg

Note the point where there's an I/O error when "updating journal superblock"; later the volume is remounted read-only because of an aborted journal.

- 'cat /proc/mdstat' returns 'UU' for both RAID 1 arrays ('boot' and main PV).
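To get an overview of a log like the one above without reading every line, the failing sectors and logical blocks can be pulled out per device. The following is a rough sketch: the function name `summarize` is hypothetical, and the two regular expressions only assume the message wording that appears in the excerpt above.

```python
import re
from collections import defaultdict

# Patterns match the two kernel message shapes seen in the excerpt.
SECTOR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
BLOCK_RE = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def summarize(lines):
    """Return (sectors, blocks): failing positions grouped by device name."""
    sectors = defaultdict(list)
    blocks = defaultdict(list)
    for line in lines:
        match = SECTOR_RE.search(line)
        if match:
            sectors[match.group(1)].append(int(match.group(2)))
            continue
        match = BLOCK_RE.search(line)
        if match:
            blocks[match.group(1)].append(int(match.group(2)))
    return dict(sectors), dict(blocks)


# A few lines copied from the excerpt above.
SAMPLE = [
    "Nov 22 mail kernel: Buffer I/O error on device vdb1, logical block 47270479",
    "Nov 22 mail kernel: end_request: I/O error, dev vdb, sector 378116680",
    "Nov 22 mail kernel: EXT3-fs (vdb1): error: remounting filesystem read-only",
    "Nov 22 mail kernel: end_request: I/O error, dev vdb, sector 6144",
]

if __name__ == "__main__":
    sectors, blocks = summarize(SAMPLE)
    for dev, values in sectors.items():
        print(dev, "failing sectors:", len(values), "min", min(values), "max", max(values))
    for dev, values in blocks.items():
        print(dev, "failing blocks:", len(values), "min", min(values), "max", max(values))
```

Looking at the minimum and maximum per device shows whether the errors cluster in one region or span the whole virtual disk. In the excerpt they range from sector 6144 up to sector 378432440, which may point at the whole backing device rather than a few bad spots, though that is an inference and not something the log states.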
