{"id":8581,"date":"2024-06-15T06:14:30","date_gmt":"2024-06-15T06:14:30","guid":{"rendered":"https:\/\/www.infinitivehost.com\/knowledge-base\/?p=8581"},"modified":"2024-07-30T05:49:20","modified_gmt":"2024-07-30T05:49:20","slug":"fixing-openstack-gpu-passthrough-prevents-instance-launch","status":"publish","type":"post","link":"https:\/\/www.infinitivehost.com\/knowledge-base\/fixing-openstack-gpu-passthrough-prevents-instance-launch\/","title":{"rendered":"Fixing OpenStack: GPU Passthrough Prevents Instance Launch"},"content":{"rendered":"<p>When you enable GPU passthrough on OpenStack, instances may fail to launch. Diagnosing this can be difficult because several layers are involved: the firmware, the host kernel, libvirt\/QEMU, and Nova itself. Here\u2019s a step-by-step guide to diagnose and resolve the issue:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common Reasons for Instances Failing to Launch with GPU Passthrough<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Misconfigured Nova Compute Service<\/strong>:<\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Nova must be configured for GPU passthrough on both the compute nodes and the controller. Incorrect settings in <code>nova.conf<\/code>, such as a scheduler filter list that omits <code>PciPassthroughFilter<\/code>, can prevent instances from launching.<\/li>\n<\/ul>\n\n\n\n<p>2. <strong>Incorrect or Missing PCI Passthrough Configuration<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenStack needs to know which PCI devices (GPUs) are available for passthrough. This is set in <code>nova.conf<\/code> under the <code>[pci]<\/code> section.<\/li>\n\n\n\n<li>Ensure that the GPU and any companion devices (such as the HDMI audio function that many GPUs expose) are specified correctly. In general, every device in the GPU&#8217;s IOMMU group must be bound to <code>vfio-pci<\/code> or left without a host driver before the GPU can be assigned.<\/li>\n\n\n\n<li>The flavor used to launch the instance must request the GPU through the <code>pci_passthrough:alias<\/code> extra spec, using an alias name that is defined in <code>[pci]<\/code>.<\/li>\n<\/ul>\n\n\n\n<p>3. 
<strong>Inadequate BIOS\/UEFI Settings<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The host&#8217;s BIOS or UEFI firmware must support and enable the IOMMU (VT-d on Intel, AMD-Vi on AMD), as well as SR-IOV if you plan to pass through virtual functions.<\/li>\n\n\n\n<li>Check that the IOMMU is active after boot (<code>\/sys\/kernel\/iommu_groups\/<\/code> should contain entries) and that the GPU has not been disabled or hidden by a firmware setting.<\/li>\n<\/ul>\n\n\n\n<p>4. <strong>Kernel and Driver Issues<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The host kernel must support IOMMU and VFIO, and the IOMMU must be enabled on the kernel command line where required (for example, <code>intel_iommu=on<\/code> on Intel hosts).<\/li>\n\n\n\n<li>Verify that the GPU is bound to the <code>vfio-pci<\/code> driver rather than a host graphics driver such as <code>nouveau<\/code> or <code>nvidia<\/code>. The <code>Kernel driver in use<\/code> line in the output of <code>lspci -nnk<\/code> shows which driver currently owns the device.<\/li>\n<\/ul>\n\n\n\n<p>5. <strong>Cinder or Storage Issues<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If your instance boots from a Cinder volume, a misconfigured or unavailable Cinder service will also cause the launch to fail, which can be mistaken for a passthrough problem.<\/li>\n\n\n\n<li>Check that your storage backend is properly configured and available.<\/li>\n<\/ul>\n\n\n\n<p>6. <strong>Resource Allocation and NUMA Topology<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ensure that the host has enough free vCPUs and memory for the flavor, plus at least one matching GPU that is not already assigned to another instance.<\/li>\n\n\n\n<li>If the flavor defines a NUMA topology (for example, through CPU pinning), Nova by default requires a GPU that reports NUMA affinity to be attached to one of the instance&#8217;s NUMA nodes. Check that this node has enough free resources, or adjust the behavior with the <code>hw:pci_numa_affinity_policy<\/code> flavor extra spec.<\/li>\n<\/ul>\n\n\n\n<p>7. <strong>Libvirt and QEMU Configuration<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenStack often uses QEMU and libvirt for virtualization. 
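Incorrect settings in the <code>libvirt<\/code> or <code>qemu<\/code> configuration files can prevent proper PCI passthrough.<\/li>\n\n\n\n<li>Verify the <code>libvirt<\/code> settings and confirm that the GPU appears as a <code>&lt;hostdev&gt;<\/code> device in the instance&#8217;s libvirt domain definition.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Steps to Troubleshoot and Resolve<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Check Nova Configuration<\/strong>:<\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Verify that the <code>[pci]<\/code> section in <code>nova.conf<\/code> is correctly configured.<\/li>\n\n\n\n<li>Ensure that the <code>passthrough_whitelist<\/code> option (renamed <code>device_spec<\/code> in the Zed release) and the <code>alias<\/code> option include the correct vendor and product IDs for your GPU. The <code>pci_passthrough_whitelist<\/code> and <code>pci_alias<\/code> names are legacy <code>[DEFAULT]<\/code> options from older releases.<\/li>\n\n\n\n<li>Define <code>alias<\/code> on the controller (where <code>nova-api<\/code> runs) as well as on the compute nodes, and add <code>PciPassthroughFilter<\/code> to <code>enabled_filters<\/code> in the <code>[filter_scheduler]<\/code> section used by <code>nova-scheduler<\/code>.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-red-color has-text-color has-link-color wp-elements-8365068e9fd92bd133fc7682cca81e78\"><code>&#91;pci]\n# 1234 and 5678 are placeholders: use the vendor and product IDs reported by lspci -nn\npassthrough_whitelist = {\"vendor_id\":\"1234\", \"product_id\":\"5678\"}\n# type-PCI matches a GPU passed through whole; type-PF is only for SR-IOV physical functions\nalias = {\"vendor_id\":\"1234\", \"product_id\":\"5678\", \"name\":\"gpu\", \"device_type\":\"type-PCI\"}<\/code><\/pre>\n\n\n\n<ol start=\"2\" class=\"wp-block-list\">\n<li><strong>Verify BIOS\/UEFI Settings<\/strong>:<\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Restart the host and enter BIOS\/UEFI settings.<\/li>\n\n\n\n<li>Enable IOMMU (Intel VT-d or AMD-Vi).<\/li>\n\n\n\n<li>Ensure that the GPU&#8217;s PCIe slot is enabled and that other PCIe options relevant to passthrough are configured; on some platforms, GPUs with large memory apertures also need an option such as Above 4G Decoding.<\/li>\n<\/ul>\n\n\n\n<p>3. 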
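<strong>Update Kernel and GPU Drivers<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ensure that your kernel supports IOMMU and VFIO and that the IOMMU is enabled on the kernel command line where required (for example, <code>intel_iommu=on<\/code> on Intel hosts).<\/li>\n\n\n\n<li>Load the <code>vfio-pci<\/code> module and bind your GPU to it. Run these commands in a root shell (with <code>sudo echo ... &gt; file<\/code>, the redirection runs in your unprivileged shell and fails); the <code>unbind<\/code> line is needed only if another driver currently owns the device:<br><code><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">modprobe vfio-pci<br>echo vfio-pci &gt; \/sys\/bus\/pci\/devices\/0000:01:00.0\/driver_override<br>echo 0000:01:00.0 &gt; \/sys\/bus\/pci\/devices\/0000:01:00.0\/driver\/unbind<br>echo 0000:01:00.0 &gt; \/sys\/bus\/pci\/drivers\/vfio-pci\/bind<\/mark><\/code><\/li>\n\n\n\n<li>Replace <code>0000:01:00.0<\/code> with your GPU\u2019s PCI address, as reported by <code>lspci -D<\/code>. These sysfs changes do not survive a reboot; to make the binding persistent, you can pass the GPU&#8217;s IDs to the module instead (for example, <code>options vfio-pci ids=10de:1db6<\/code> in a file under <code>\/etc\/modprobe.d\/<\/code>), provided <code>vfio-pci<\/code> loads before the host graphics driver.<\/li>\n<\/ul>\n\n\n\n<p>4. <strong>Inspect Libvirt and QEMU Settings<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Check <code>\/etc\/libvirt\/qemu.conf<\/code> for settings that could interfere with GPU passthrough.<\/li>\n\n\n\n<li>Nova generates each instance&#8217;s libvirt XML itself. If the instance reached the hypervisor, run <code>virsh dumpxml &lt;domain&gt;<\/code> on the compute node and confirm that the output contains a <code>&lt;hostdev&gt;<\/code> entry with your GPU&#8217;s PCI address.<\/li>\n<\/ul>\n\n\n\n<p>5. <strong>Validate Resource Availability<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <code>lscpu<\/code> and <code>lspci<\/code> to check the host&#8217;s CPUs and PCI devices.<\/li>\n\n\n\n<li>Ensure the host has enough free vCPUs and memory for the flavor, and that a matching GPU is not already assigned to another instance; each passed-through GPU can serve only one instance at a time.<\/li>\n<\/ul>\n\n\n\n<p>6. 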
<strong>Review Logs for Errors<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Check the Nova and libvirt logs on the compute node for error messages related to PCI passthrough. Log locations vary by distribution and deployment tool; on a typical package-based install:<br><code><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">sudo tail -f \/var\/log\/nova\/nova-compute.log \/var\/log\/libvirt\/libvirtd.log<\/mark><\/code><\/li>\n\n\n\n<li>Look for specific errors that give clues about what is going wrong. The per-instance QEMU logs in <code>\/var\/log\/libvirt\/qemu\/<\/code> often contain the underlying VFIO error. If the launch fails with <code>No valid host was found<\/code>, check the <code>nova-scheduler<\/code> log on the controller instead; this usually means the scheduler found no compute node with a free device matching the requested alias.<\/li>\n<\/ul>\n\n\n\n<p>7. <strong>Check Host and Hypervisor Compatibility<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ensure that your host and hypervisor software versions are compatible with GPU passthrough.<\/li>\n\n\n\n<li>Look up known issues or limitations in the documentation and release notes for your OpenStack version.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example Configuration for GPU Passthrough<\/h3>\n\n\n\n<p>Here&#8217;s a basic example of how to set up GPU passthrough in <code>nova.conf<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code has-vivid-red-color has-text-color has-link-color wp-elements-de29dce6b349d206a1715eb39804e41b\"><code>&#91;pci]\npassthrough_whitelist = &#91;{\"vendor_id\": \"10de\", \"product_id\": \"1db6\", \"address\": \"0000:04:00.0\"}]\nalias = {\"vendor_id\":\"10de\", \"product_id\":\"1db6\", \"device_type\":\"type-PCI\", \"name\":\"nvidia_gpu\"}<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Replace <code>10de<\/code> (NVIDIA&#8217;s vendor ID) with your GPU\u2019s vendor ID and <code>1db6<\/code> with its product ID; <code>lspci -nn<\/code> prints both in the form <code>&#91;vendor:product]<\/code>.<\/li>\n\n\n\n<li>Replace <code>0000:04:00.0<\/code> with your GPU&#8217;s PCI address.<\/li>\n\n\n\n<li>On Zed and later releases, use <code>device_spec<\/code> in place of <code>passthrough_whitelist<\/code>.<\/li>\n\n\n\n<li>Request the GPU from a flavor with the <code>pci_passthrough:alias<\/code> extra spec, for example <code>openstack flavor set &lt;flavor&gt; --property pci_passthrough:alias=nvidia_gpu:1<\/code>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Final Checks<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Reboot the Host<\/strong>: After making changes to BIOS, kernel, or driver settings, a reboot is often 
necessary. After editing <code>nova.conf<\/code>, restart the affected Nova services (<code>nova-compute<\/code> on compute nodes; <code>nova-api<\/code> and <code>nova-scheduler<\/code> on the controller).<\/li>\n\n\n\n<li><strong>Test with a Simple Instance<\/strong>: Launch a small test VM with minimal resources and a flavor that requests the GPU, then confirm from inside the guest (for example, with <code>lspci<\/code>) that the device is visible.<\/li>\n\n\n\n<li><strong>Documentation and Community<\/strong>: Consult the OpenStack and hardware-specific documentation, and consider asking for help in forums or communities if issues persist.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Conclusion<\/h3>\n\n\n\n<p>Working through these causes systematically will usually reveal why instances fail to launch. In most cases, the fix comes down to making sure that OpenStack knows exactly which PCI devices are available for passthrough, such as the GPUs in your <a href=\"https:\/\/www.infinitivehost.com\/gpu-dedicated-server\"><mark style=\"background-color:#8ed1fc\" class=\"has-inline-color\"><strong>best GPU dedicated server<\/strong><\/mark><\/a>, and that the firmware, host kernel, libvirt, and Nova configuration all agree on them. By following the troubleshooting steps above in order, you can usually resolve the issues preventing your OpenStack instances from launching with GPU passthrough enabled.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When trying to enable GPU passthrough on OpenStack, you may encounter issues where instances fail to launch. This can be a complex problem due to the various components and configurations involved. Here\u2019s a step-by-step guide to diagnose and resolve the issue: Common Reasons for Instances Failing to Launch with GPU Passthrough 2. 
Incorrect [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[202],"tags":[],"class_list":["post-8581","post","type-post","status-publish","format-standard","hentry","category-gpu-server"],"_links":{"self":[{"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/posts\/8581","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/comments?post=8581"}],"version-history":[{"count":2,"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/posts\/8581\/revisions"}],"predecessor-version":[{"id":8760,"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/posts\/8581\/revisions\/8760"}],"wp:attachment":[{"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/media?parent=8581"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/categories?post=8581"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.infinitivehost.com\/knowledge-base\/wp-json\/wp\/v2\/tags?post=8581"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}