Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibly corrupt qcow2 image #221

Open
198d opened this issue Jun 11, 2018 · 4 comments
Open

Possibly corrupt qcow2 image #221

198d opened this issue Jun 11, 2018 · 4 comments

Comments

@198d
Copy link

@198d 198d commented Jun 11, 2018

I'm evaluating Hyperkit as a platform for managing a development VM. I've built the hyperkit binary with the qcow2 support and have run into an issue with a possibly corrupt disk image. I was installing NixOS on the disk when I believe the nixos-install binary ran out of memory. I forcefully killed the hyperkit process, started it again with more memory and was going to try the install again. Unfortunately, now I can't get the VM to interact with the disk correctly: commands like fsck and mount just hang. Here's the output I received the last time I tried to run fsck after booting up with the install CD (disk attached to Hyperkit with ahci-hd):

# fsck /dev/sda1
fsck from util-linux 2.31.1
e2fsck 1.44.1 (24-Mar-2018)
NIXOS: recovering journal
hyperkit: [INFO] Allocator: 32765 used; 0 junk; 0 erased; 0 available; 0 copies; 0 roots; 0 Copying; 0 Copied; 0 Flushed; 0 Referenced; max_cluster = 32767
                                                                                                                                                           hyperkit: [INFO] Allocator: file contains cluster 0 .. 32767 will enlarge file to 0 .. 33279
              hyperkit: [INFO] resize: adding available clusters (Node ((x 32768) (y 33279) (l Empty) (r Empty) (h 1) (cardinal 512)))
                                                                                                                                      hyperkit: [ERROR] write sector = 4263936 length = 131072: I/O deadline exceeded
                                                                                                                                                                                                                     hyperkit: [INFO] (((description "Anonymous client 1784")
                                       (locks (((description "cluster 1") (mode Read) (released false)))))
                                                                                                            ((description "write sector = 4263936 length = 131072")
                                                                                                                                                                       (locks
                                                                                                                                                                                   (((description "cluster 3154") (mode Read) (released false))
              ((description "cluster 3") (mode Read) (released false))))))
                                                                          [   87.045565] ata7.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
[   87.046829] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.047719] ata7.00: cmd 61/00:00:00:17:41/01:00:00:00:00/40 tag 0 ncq dma 131072 out
[   87.047719]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.049771] ata7.00: status: { DRDY }
[   87.050278] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.050883] ata7.00: cmd 61/00:08:00:18:41/01:00:00:00:00/40 tag 1 ncq dma 131072 out
[   87.050883]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.052611] ata7.00: status: { DRDY }
[   87.052994] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.053608] ata7.00: cmd 61/00:10:00:19:41/01:00:00:00:00/40 tag 2 ncq dma 131072 out
[   87.053608]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.055271] ata7.00: status: { DRDY }
[   87.055657] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.056167] ata7.00: cmd 61/00:18:00:1a:41/01:00:00:00:00/40 tag 3 ncq dma 131072 out
[   87.056167]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.057549] ata7.00: status: { DRDY }
[   87.057897] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.058416] ata7.00: cmd 61/00:20:00:1b:41/01:00:00:00:00/40 tag 4 ncq dma 131072 out
[   87.058416]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.059683] ata7.00: status: { DRDY }
[   87.060054] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.060546] ata7.00: cmd 61/00:28:00:1c:41/01:00:00:00:00/40 tag 5 ncq dma 131072 out
[   87.060546]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.061984] ata7.00: status: { DRDY }
[   87.062370] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.062801] ata7.00: cmd 61/00:30:00:1d:41/01:00:00:00:00/40 tag 6 ncq dma 131072 out
[   87.062801]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.064205] ata7.00: status: { DRDY }
[   87.064516] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.064937] ata7.00: cmd 61/00:38:00:1e:41/01:00:00:00:00/40 tag 7 ncq dma 131072 out
[   87.064937]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.066147] ata7.00: status: { DRDY }
[   87.066450] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.066871] ata7.00: cmd 61/00:40:00:1f:41/01:00:00:00:00/40 tag 8 ncq dma 131072 out
[   87.066871]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.068103] ata7.00: status: { DRDY }
[   87.068424] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.068861] ata7.00: cmd 61/20:48:00:20:41/00:00:00:00:00/40 tag 9 ncq dma 16384 out
[   87.068861]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.070141] ata7.00: status: { DRDY }
[   87.070446] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.070866] ata7.00: cmd 61/08:50:00:08:04/00:00:03:00:00/40 tag 10 ncq dma 4096 out
[   87.070866]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.072064] ata7.00: status: { DRDY }
[   87.072363] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.072782] ata7.00: cmd 61/10:58:00:08:80/00:00:03:00:00/40 tag 11 ncq dma 8192 out
[   87.072782]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.073996] ata7.00: status: { DRDY }
[   87.074316] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.074737] ata7.00: cmd 61/08:60:80:08:80/00:00:03:00:00/40 tag 12 ncq dma 4096 out
[   87.074737]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.075933] ata7.00: status: { DRDY }
[   87.076231] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.076670] ata7.00: cmd 61/08:68:00:09:80/00:00:03:00:00/40 tag 13 ncq dma 4096 out
[   87.076670]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.077919] ata7.00: status: { DRDY }
[   87.078220] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.078657] ata7.00: cmd 61/10:70:00:09:81/00:00:03:00:00/40 tag 14 ncq dma 8192 out
[   87.078657]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.079859] ata7.00: status: { DRDY }
[   87.080178] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.080641] ata7.00: cmd 61/10:78:00:08:00/00:00:05:00:00/40 tag 15 ncq dma 8192 out
[   87.080641]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.081959] ata7.00: status: { DRDY }
[   87.082325] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.082789] ata7.00: cmd 61/08:80:80:08:00/00:00:05:00:00/40 tag 16 ncq dma 4096 out
[   87.082789]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.084021] ata7.00: status: { DRDY }
[   87.084363] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.084813] ata7.00: cmd 61/00:88:00:09:00/01:00:05:00:00/40 tag 17 ncq dma 131072 out
[   87.084813]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.086208] ata7.00: status: { DRDY }
[   87.086536] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.086985] ata7.00: cmd 61/00:90:00:0a:00/01:00:05:00:00/40 tag 18 ncq dma 131072 out
[   87.086985]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.088313] ata7.00: status: { DRDY }
[   87.088631] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.089080] ata7.00: cmd 61/00:98:00:0b:00/01:00:05:00:00/40 tag 19 ncq dma 131072 out
[   87.089080]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.090400] ata7.00: status: { DRDY }
[   87.090718] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.091167] ata7.00: cmd 61/00:a0:00:0c:00/01:00:05:00:00/40 tag 20 ncq dma 131072 out
[   87.091167]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.092472] ata7.00: status: { DRDY }
[   87.092793] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.093286] ata7.00: cmd 61/00:a8:00:0d:00/01:00:05:00:00/40 tag 21 ncq dma 131072 out
[   87.093286]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.095026] ata7.00: status: { DRDY }
[   87.095364] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.095796] ata7.00: cmd 61/00:b0:00:0e:00/01:00:05:00:00/40 tag 22 ncq dma 131072 out
[   87.095796]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.097039] ata7.00: status: { DRDY }
[   87.097364] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.097791] ata7.00: cmd 61/00:b8:00:0f:00/01:00:05:00:00/40 tag 23 ncq dma 131072 out
[   87.097791]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.099019] ata7.00: status: { DRDY }
[   87.099353] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.099814] ata7.00: cmd 61/00:c0:00:10:41/01:00:00:00:00/40 tag 24 ncq dma 131072 out
[   87.099814]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.104163] ata7.00: status: { DRDY }
[   87.104474] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.104925] ata7.00: cmd 61/00:c8:00:11:41/01:00:00:00:00/40 tag 25 ncq dma 131072 out
[   87.104925]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.106229] ata7.00: status: { DRDY }
[   87.106536] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.106987] ata7.00: cmd 61/00:d0:00:12:41/01:00:00:00:00/40 tag 26 ncq dma 131072 out
[   87.106987]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.108441] ata7.00: status: { DRDY }
[   87.108746] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.109236] ata7.00: cmd 61/00:d8:00:13:41/01:00:00:00:00/40 tag 27 ncq dma 131072 out
[   87.109236]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.110582] ata7.00: status: { DRDY }
[   87.110940] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.111459] ata7.00: cmd 61/00:e0:00:14:41/01:00:00:00:00/40 tag 28 ncq dma 131072 out
[   87.111459]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.112774] ata7.00: status: { DRDY }
[   87.113162] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.113651] ata7.00: cmd 61/00:e8:00:15:41/01:00:00:00:00/40 tag 29 ncq dma 131072 out
[   87.113651]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.115094] ata7.00: status: { DRDY }
[   87.115399] ata7.00: failed command: WRITE FPDMA QUEUED
[   87.115823] ata7.00: cmd 61/00:f0:00:16:41/01:00:00:00:00/40 tag 30 ncq dma 131072 out
[   87.115823]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   87.117039] ata7.00: status: { DRDY }

qemu-img check and qcow-tool check both say everything is OK. I've also ran qcow-tool repair on the image. Here's the output from qcow-tool info:

((version Three) (backing_file_offset 0) (backing_file_size 0)
 (cluster_bits 16) (size 53687091200) (crypt_method None) (l1_size 100)
 (l1_table_offset ((bytes 196608) (is_mutable false) (is_compressed false)))
 (refcount_table_offset
  ((bytes 65536) (is_mutable false) (is_compressed false)))
 (refcount_table_clusters 1) (nb_snapshots 0) (snapshots_offset 0)
 (additional
  (((dirty false) (corrupt false) (lazy_refcounts false)
    (autoclear_features 0) (refcount_order 4))))
 (extensions
  ((Feature_name_table
    (((ty Incompatible) (bit 0) (feature Dirty))
     ((ty Incompatible) (bit 1) (feature Corrupt))
     ((ty Compatible) (bit 0) (feature Lazy_refcounts)))))))

Any advice or guidance is much appreciated.

@198d
Copy link
Author

@198d 198d commented Jun 11, 2018

So, after trying again with a new disk image (created with qemu-img create) and watching the hyperkit process while running nixos-install, I noticed that the install hung when the new disk image got to roughly the same size (in terms of number of "clusters") as the other. I went ahead and used qcow-tool create to create a new disk image and everything worked. What is going on there?

@djs55
Copy link
Collaborator

@djs55 djs55 commented Jun 12, 2018

hyperkit and qcow-tool share the same implementation of qcow2 and so are 100% compatible with each other.

Since the majority of our testing focuses on qcow2 files created by qcow-tool there may still be incompatibilities with images created by qemu-img. So for hyperkit I recommend sticking to qcow-tool.

Is your reproduction case easily scriptable? If so, could you file an issue against https://github.com/mirage/ocaml-qcow? Thanks!

@cfergeau
Copy link

@cfergeau cfergeau commented Jun 6, 2019

I've been having this issue too. No automated way to reproduce, but I can reproduce easily starting with the images at http://download.cirros-cloud.net/0.4.0/. I ran the first steps on a linux box though.

$ qemu-img create -f qcow2 cirros.qcow2 3G 
$ virt-resize --expand /dev/sda1 cirros-0.4.0-x86_64-disk.img cirros.qcow2

Start this image with hyperkit:

$ /Users/$USER/bin/hyperkit   -A -m 1G  -s 0:0,hostbridge -s 31,lpc \
-s 2:0,virtio-blk,file://$BASEDIR/cirros.qcow2,format=qcow \
-l com1,autopty=/Users/$USER/.hyperkit/crc/tty \
-f kexec,$BASEDIR/cirros-0.4.0-x86_64-kernel,$BASEDIR/cirros-0.4.0-x86_64-initramfs,"LABEL=cirros-rootfs ro console=tty0 console=ttyS0,115200n8"  

Connect to its serial console:

sudo screen /dev/ttys001 115200

Login to the VM, and then run:

$ dd if=/dev/zero of=/test

The freeze described here should happen shortly after.

If in the very first step, we create the qcow2 image with lazy refcounts enabled (which qcow-tool create enables by default) , then I no longer get the freeze:

$ qemu-img create -f qcow2 -o lazy_refcounts=on cirros.qcow2 3G
@cfergeau
Copy link

@cfergeau cfergeau commented Jun 6, 2019

Forgot to mention, I was initially reproducing the bug on a totally different linux VM, so I suspect it can be reproduced by triggering a big enough growth of a qcow2 file without lazy refcounts

cfergeau added a commit to cfergeau/snc that referenced this issue Jun 6, 2019
This works around a bug in hyperkit qcow2 implementation on macos which
would cause the whole VM to freeze when it tries to dynamically grow the
qcow2 image
moby/hyperkit#221
cfergeau added a commit to cfergeau/snc that referenced this issue Jun 6, 2019
This works around a bug in hyperkit qcow2 implementation on macos which
would cause the whole VM to freeze when it tries to dynamically grow the
qcow2 image
moby/hyperkit#221

Lazy refcounts are a qcow2 performance improvement which should not have
many adverse effects (slower startup in case of improper VM shutdown),
see https://lists.gnu.org/archive/html/qemu-devel/2012-06/msg03827.html

code-ready#38
cfergeau added a commit to cfergeau/snc that referenced this issue Jun 6, 2019
This works around a bug in hyperkit qcow2 implementation on macos which
would cause the whole VM to freeze when it tries to dynamically grow the
qcow2 image
moby/hyperkit#221

Lazy refcounts are a qcow2 performance improvement which should not have
many adverse effects (slower startup in case of improper VM shutdown),
see https://lists.gnu.org/archive/html/qemu-devel/2012-06/msg03827.html

This fixes code-ready#38.
code-ready#38
praveenkumar added a commit to code-ready/snc that referenced this issue Jun 6, 2019
This works around a bug in hyperkit qcow2 implementation on macos which
would cause the whole VM to freeze when it tries to dynamically grow the
qcow2 image
moby/hyperkit#221

Lazy refcounts are a qcow2 performance improvement which should not have
many adverse effects (slower startup in case of improper VM shutdown),
see https://lists.gnu.org/archive/html/qemu-devel/2012-06/msg03827.html

This fixes #38.
#38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.