Creating PDF and Converting to PDF/A using Python and Ghostscript

Vinod Charan Kumar

Creating PDFs programmatically is a common need in modern applications, whether for reports, invoices, certificates, or other documents. However, when systems require long-term archiving, regular PDFs are not sufficient. This is where PDF/A comes in, a specialized version of PDF designed for digital preservation.

What is PDF/A?

PDF/A is an ISO-standardized subset of PDF (ISO 19005) that restricts the format to features suitable for the long-term preservation and archiving of electronic documents.

Key Features of PDF/A:

  • Self-Contained: All the elements needed to display the document, such as fonts, color profiles, and images, are embedded directly within the file.
  • No External Dependencies: PDF/A prohibits features that rely on external content or runtime behavior, including audio, video, and JavaScript, so the document renders fully without outside resources.
  • Device-Independent: PDF/A ensures that the document will look the same across different platforms, both now and in the future, preserving its appearance for years to come.

Tools We'll Use

In this guide, we’ll:

  • Generate a PDF from HTML using xhtml2pdf
  • Convert the resulting PDF to PDF/A using Ghostscript

Python offers several other ways to create PDFs, but this post sticks to these two tools.

Generating PDF Using xhtml2pdf

Install xhtml2pdf

pip install xhtml2pdf

Create a PDF Template (invoice_template.html)

This is a Django-compatible HTML template with a header, a footer, table formatting, and dynamic placeholders:

<!DOCTYPE html>
<html>
<head>
    <style>
        body { font-family: Arial, sans-serif; font-size: 12px; }
        .header, .footer { text-align: center; position: fixed; width: 100%; }
        .header { top: 0; }
        .footer { bottom: 0; font-size: 10px; }
        .content { margin-top: 80px; margin-bottom: 50px; }
        table { width: 100%; border-collapse: collapse; margin-top: 20px; }
        table, th, td { border: 1px solid #000; }
        th, td { padding: 5px; text-align: left; }
        .page-break { page-break-after: always; }
    </style>
</head>
<body>

<div class="header">
    <img src="{{ logo_url }}" width="100" alt="Company Logo"><br>
    <strong>{{ company }}</strong><br>
    Invoice #: {{ invoice_number }} | Date: {{ date }}
</div>

<div class="footer">
    Page <pdf:pagenumber /> of <pdf:pagecount />
</div>

<div class="content">
    <p><strong>Billed To:</strong> {{ customer }}</p>

    <table>
        <thead>
            <tr>
                <th>Item</th>
                <th>Qty</th>
                <th>Price</th>
                <th>Total</th>
            </tr>
        </thead>
        <tbody>
        {% for item in items %}
            <tr>
                <td>{{ item.name }}</td>
                <td>{{ item.qty }}</td>
                <td>${{ item.price }}</td>
                {# item.total is computed in the view: Django has no built-in multiplication filter #}
                <td>${{ item.total }}</td>
            </tr>
        {% endfor %}
        </tbody>
    </table>

    <div class="page-break"></div>

    <p>Terms &amp; Conditions:</p>
    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed facilisis nulla at erat vulputate.</p>
</div>

</body>
</html>

Replace variables like {{ customer }}, {{ items }}, {{ logo_url }}, etc., with your dynamic context data.
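
Django's built-in template filters cannot multiply two values, so one option is to compute each line total in Python before rendering. The following is a minimal sketch, assuming every item is a dict with name, qty, and price keys. The helper name build_invoice_context and the total key are illustrative and not part of xhtml2pdf or Django.

```python
from decimal import Decimal, ROUND_HALF_UP


def build_invoice_context(customer, company, invoice_number, date, items, logo_url=""):
    """Return a template context in which every item has a precomputed total."""
    cents = Decimal("0.01")
    rows = []
    for item in items:
        # Go through str() so float prices do not carry binary rounding noise.
        price = Decimal(str(item["price"]))
        qty = int(item["qty"])
        rows.append({
            "name": item["name"],
            "qty": qty,
            "price": price.quantize(cents, rounding=ROUND_HALF_UP),
            "total": (price * qty).quantize(cents, rounding=ROUND_HALF_UP),
        })
    return {
        "customer": customer,
        "company": company,
        "invoice_number": invoice_number,
        "date": date,
        "logo_url": logo_url,
        "items": rows,
    }
```

The template can then print {{ item.total }} directly instead of attempting arithmetic with filters.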

PDF Render Function in Django

Once you’ve created your HTML template for the PDF, the next step is rendering it dynamically with Django and converting it into a PDF using xhtml2pdf.

Here’s a reusable utility function to do just that:

from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa


def render_to_pdf(template_src, context_dict=None):
    """Render a Django template to PDF and return it as an HttpResponse."""
    template = get_template(template_src)
    # A None default avoids sharing one mutable dict between calls.
    html = template.render(context_dict or {})
    response = HttpResponse(content_type="application/pdf")
    pisa_status = pisa.CreatePDF(html, dest=response)
    if pisa_status.err:
        # A rendering failure is a server-side problem, not a bad request.
        return HttpResponse("Error Rendering PDF", status=500)
    return response

How It Works:

  • get_template(template_src) loads your HTML template.
  • render(context_dict) injects dynamic data.
  • CreatePDF() converts the HTML to a PDF and writes it directly to the HttpResponse object.
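
The Ghostscript step later in this post takes an in-memory buffer rather than an HttpResponse, so a variant that writes the PDF into io.BytesIO can be handy. This is a sketch under that assumption. The function name render_html_to_buffer is hypothetical, and the import sits inside the function only so the module loads where xhtml2pdf is not installed.

```python
import io


def render_html_to_buffer(html):
    """Render an HTML string to PDF; return an io.BytesIO, or None on error."""
    # Imported lazily so this module can be loaded without xhtml2pdf present.
    from xhtml2pdf import pisa

    buffer = io.BytesIO()
    pisa_status = pisa.CreatePDF(html, dest=buffer)
    if pisa_status.err:
        return None
    # Rewind so callers can read the PDF from the start.
    buffer.seek(0)
    return buffer
```

In a Django view, the html argument would typically come from get_template(...).render(context).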

Convert PDF to PDF/A Using Ghostscript

After generating the PDF, convert it to PDF/A with Ghostscript, which must be installed on the machine that runs the conversion.

Install Ghostscript (Linux)

sudo apt install ghostscript

PDF to PDF/A Conversion Command

import io
import logging
import os
import subprocess
import tempfile

# Standard-library logger; substitute your project's logging helper if it has one.
logger = logging.getLogger(__name__)


def convert_pdf_to_pdfa(input_buffer):
    """Convert an in-memory PDF (io.BytesIO) to PDF/A-1; return a BytesIO, or None on failure."""
    input_path = output_path = None
    try:
        logger.info("Initiating PDF/A conversion process")
        # Ghostscript works on files, so stage the data in temporary files.
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_input:
            temp_input.write(input_buffer.getvalue())
            input_path = temp_input.name
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_output:
            output_path = temp_output.name
        # Both handles are closed here, so Ghostscript can open the files on any platform.
        gs_command = [
            "gs",
            "-dPDFA=1",
            "-dBATCH",
            "-dNOPAUSE",
            "-sDEVICE=pdfwrite",
            "-sOutputFile=" + output_path,
            "-sColorConversionStrategy=UseDeviceIndependentColor",
            "-dPDFACompatibilityPolicy=1",
            "-dAutoRotatePages=/None",
            "-dEmbedAllFonts=true",
            "-dSubsetFonts=false",
            input_path,
        ]
        # Capture Ghostscript's output so failures can be logged instead of discarded.
        subprocess.run(gs_command, check=True, capture_output=True)
        with open(output_path, "rb") as f:
            output_buffer = io.BytesIO(f.read())
        logger.info("PDF/A conversion completed successfully")
        return output_buffer
    except subprocess.CalledProcessError as e:
        logger.error("Ghostscript conversion failed: %s\n%s", e, e.stderr.decode(errors="replace"))
        return None
    except Exception as e:
        logger.error("Failed to run conversion process: %s", e)
        return None
    finally:
        # Clean up temporary files (a path is None if its file was never created)
        for temp_file in (input_path, output_path):
            try:
                if temp_file and os.path.exists(temp_file):
                    os.unlink(temp_file)
            except OSError as e:
                logger.error("Failed to clean up file %s: %s", temp_file, e)
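
A quick sanity check on the converted bytes is to look for the PDF/A identification that conforming files declare in their XMP metadata, through the pdfaid:part and pdfaid:conformance properties. The sketch below is only a heuristic, assuming the XMP packet is stored uncompressed. It reads what the file claims and does not validate conformance. The function name read_pdfa_claim is illustrative.

```python
import re

# XMP may carry the values as attributes (pdfaid:part="1") or as elements
# (<pdfaid:part>1</pdfaid:part>), so both spellings are accepted.
_PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_PDFA_CONF = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def read_pdfa_claim(pdf_bytes):
    """Return (part, conformance) as declared in the XMP metadata, or None."""
    part = _PDFA_PART.search(pdf_bytes)
    if part is None:
        return None
    conf = _PDFA_CONF.search(pdf_bytes)
    level = conf.group(1).decode("ascii").upper() if conf else None
    return int(part.group(1)), level
```

For example, read_pdfa_claim(output_buffer.getvalue()) can be logged after a conversion, with a dedicated validator remaining the authority on actual compliance.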

Why Use a Custom Ghostscript Wrapper for PDF/A Conversion?

Even though there are Python libraries like ghostscript, PyMuPDF, or reportlab that offer some level of PDF handling or Ghostscript integration, using a custom subprocess wrapper around the Ghostscript command-line tool is often the most reliable, flexible, and production-ready solution for PDF/A conversion.

Full Control Over PDF/A Compliance

By calling the Ghostscript CLI directly using subprocess, you gain access to all critical flags and fine-tuned parameters, such as:

  • -dEmbedAllFonts=true: Embeds every font the document uses
  • -sColorConversionStrategy=UseDeviceIndependentColor: Converts colors to device-independent color spaces
  • -dPDFA=1 and -dPDFACompatibilityPolicy=1: Target PDF/A-1 and tell Ghostscript to drop non-conforming features rather than fall back to an ordinary PDF
  • -dAutoRotatePages=/None: Prevents unexpected page rotation

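Many Python wrappers expose only a limited subset of these options or abstract away important flags, making it harder to control PDF/A compliance. Flags alone do not prove conformance, so it is worth checking the output with a dedicated validator such as veraPDF.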
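
To keep the flags in one reviewable place, the argument list can be built by a small helper. This is a sketch, not a definitive recipe. The name build_pdfa_command and its parameters are assumptions, and the pdfa_part parameter relies on Ghostscript accepting -dPDFA=1, 2, or 3.

```python
def build_pdfa_command(input_path, output_path, gs_executable="gs", pdfa_part=1):
    """Return the Ghostscript argument list for a PDF/A conversion."""
    if pdfa_part not in (1, 2, 3):
        raise ValueError("pdfa_part must be 1, 2, or 3")
    return [
        gs_executable,
        f"-dPDFA={pdfa_part}",          # target PDF/A-1, -2, or -3
        "-dBATCH",                      # exit after the last input file
        "-dNOPAUSE",                    # do not pause between pages
        "-sDEVICE=pdfwrite",            # write PDF output
        f"-sOutputFile={output_path}",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        "-dPDFACompatibilityPolicy=1",  # drop non-conforming features
        "-dAutoRotatePages=/None",      # keep page orientation as-is
        "-dEmbedAllFonts=true",
        "-dSubsetFonts=false",          # embed whole fonts, not subsets
        input_path,
    ]
```

The returned list can be passed straight to subprocess.run, which avoids shell quoting issues with file paths.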

Reliable and Battle-Tested

The Ghostscript CLI has been used for decades in enterprise environments. It’s:

  • Stable and maintained
  • Trusted in compliance-sensitive industries
  • More feature-complete than any wrapper

Using it directly ensures consistent behavior across environments.

Efficient Memory Handling

The function works with BytesIO memory buffers:

  • No need to save files permanently to disk
  • Fits well in web server or API contexts
  • Handles temporary file cleanup safely with tempfile

Enhanced Logging and Error Handling

Your own wrapper allows you to:

  • Log exactly what’s happening at each step
  • Catch and respond to errors gracefully
  • Provide better debugging and observability

Libraries often abstract these details away, making issues harder to track down.

Cross-Platform and Decoupled

  • Works on Linux, macOS, and Windows
  • No dependency on any specific Python library version
  • Easier to maintain in Docker or cloud environments
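
One cross-platform detail is the executable name: Linux and macOS installs provide gs, while Windows installs ship the console binaries gswin64c or gswin32c. A hedged sketch of locating whichever is available on PATH follows. The helper name find_ghostscript is illustrative.

```python
import shutil

# Console executables shipped by Ghostscript on different platforms.
GHOSTSCRIPT_CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(candidates=GHOSTSCRIPT_CANDIDATES):
    """Return the path of the first Ghostscript executable found, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

Calling this once at startup and failing fast when it returns None gives a clearer error than a FileNotFoundError raised in the middle of a request.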
