Creating PDF and Converting to PDF/A using Python and Ghostscript

Vinod Charan Kumar

Creating PDFs programmatically is a common need in modern applications, whether for reports, invoices, certificates, or other documents. However, when systems require long-term archiving, regular PDFs are not sufficient. This is where PDF/A comes in, a specialized version of PDF designed for digital preservation.

What is PDF/A?

PDF/A is an ISO-standardized subset of PDF (ISO 19005), created with the primary goal of ensuring the long-term preservation and archiving of electronic documents.

Key Features of PDF/A:

Self-Contained: All the elements needed to display the document, such as fonts, color profiles, and images, are embedded directly within the file.
No External Dependencies: PDF/A eliminates the use of features that rely on external content, including audio, video, or JavaScript, ensuring that the document is fully functional without needing external resources.
Device-Independent: PDF/A ensures that the document will look the same across different platforms, both now and in the future, preserving its appearance for years to come.

Tools We'll Use

In this guide, we’ll:

Generate a PDF from HTML using xhtml2pdf
Convert the resulting PDF to PDF/A using Ghostscript

There are several ways to create PDFs in Python, but in this post, we’ll focus on using xhtml2pdf to generate a styled PDF from HTML and then convert it to PDF/A format using Ghostscript.

Generating PDF Using xhtml2pdf

Install xhtml2pdf

pip install xhtml2pdf

Create a PDF Template (invoice_template.html)

This is a Django-compatible HTML template with a header, a footer, table formatting, and dynamic placeholders:

<!DOCTYPE html>
<html>
<head>
  <style>
    body { font-family: Arial, sans-serif; font-size: 12px; }
    .header, .footer { text-align: center; position: fixed; width: 100%; }
    .header { top: 0; }
    .footer { bottom: 0; font-size: 10px; }
    .content { margin-top: 80px; margin-bottom: 50px; }
    table { width: 100%; border-collapse: collapse; margin-top: 20px; }
    table, th, td { border: 1px solid #000; }
    th, td { padding: 5px; text-align: left; }
    .page-break { page-break-after: always; }
  </style>
</head>
<body>
  <div class="header">
    <strong>{{ company }}</strong><br>
    Invoice #: {{ invoice_number }} | Date: {{ date }}
  </div>
  <div class="footer">
    Page <pdf:pagenumber> of <pdf:pagecount>
  </div>
  <div class="content">
    <p><strong>Billed To:</strong> {{ customer }}</p>
    <table>
      <thead>
        <tr>
          <th>Price</th>
          <th>Total</th>
        </tr>
      </thead>
      <tbody>
        {% for item in items %}
        <tr>
          <td>${{ item.price }}</td>
          {# Assumption: the view precomputes item.total (price * qty); Django has no built-in multiplication filter #}
          <td>${{ item.total|floatformat:2 }}</td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
    <p>Terms &amp; Conditions:</p>
    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed facilisis nulla at erat vulputate.</p>
  </div>
</body>
</html>

Replace variables like {{ company }}, {{ customer }}, and {{ items }} with your dynamic context data.
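Django templates have no built-in multiplication filter, so per-line totals are easiest to compute in Python before rendering. The following is a minimal sketch of building the context, assuming each item is a dict with "price" and "qty" keys. The function name build_invoice_context and the "total" key are hypothetical names chosen for illustration.

```python
from decimal import Decimal


def build_invoice_context(company, customer, invoice_number, date, items):
    """Return a template context with a precomputed total on every item.

    Assumption: `items` is an iterable of dicts with "price" and "qty" keys.
    The "total" key added here is a hypothetical name for illustration.
    """
    priced_items = []
    for item in items:
        # Decimal avoids binary floating-point rounding in currency math
        price = Decimal(str(item["price"]))
        qty = Decimal(str(item["qty"]))
        priced_items.append({**item, "price": price, "qty": qty, "total": price * qty})
    return {
        "company": company,
        "customer": customer,
        "invoice_number": invoice_number,
        "date": date,
        "items": priced_items,
    }
```

The returned dict can be passed as the context when the template is rendered.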

PDF Render Function in Django

Once you’ve created your HTML template for the PDF, the next step is rendering it dynamically with Django and converting it into a PDF using xhtml2pdf.

Here’s a reusable utility function to do just that:

from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa


def render_to_pdf(template_src, context_dict=None):
    # A None default avoids sharing one mutable dict across calls
    template = get_template(template_src)
    html = template.render(context_dict or {})
    response = HttpResponse(content_type="application/pdf")
    # pisa writes the generated PDF bytes straight into the response
    pisa_status = pisa.CreatePDF(html, dest=response)
    if pisa_status.err:
        # A rendering failure is a server-side problem, not a bad request
        return HttpResponse("Error Rendering PDF", status=500)
    return response

How It Works:

get_template(template_src) loads your HTML template.
render(context_dict) injects dynamic data.
CreatePDF() converts the HTML to a PDF and writes it directly to the HttpResponse object.
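Because the PDF/A conversion step later in this post takes an in-memory buffer, it can help to render into a BytesIO object instead of an HttpResponse. The following is a minimal sketch under that assumption. The name render_html_to_buffer and its create_pdf parameter are hypothetical; the parameter exists only so the helper can be exercised without xhtml2pdf installed, and it defaults to pisa.CreatePDF.

```python
import io


def render_html_to_buffer(html, create_pdf=None):
    """Render an HTML string to PDF and return it as a BytesIO buffer."""
    if create_pdf is None:
        # Imported lazily so the helper can be tested without xhtml2pdf
        from xhtml2pdf import pisa
        create_pdf = pisa.CreatePDF
    buffer = io.BytesIO()
    status = create_pdf(html, dest=buffer)
    if status.err:
        raise RuntimeError("xhtml2pdf reported %d rendering error(s)" % status.err)
    # Rewind so the caller can read the PDF from the start
    buffer.seek(0)
    return buffer
```

Raising an exception here, rather than returning an error response, keeps the helper independent of Django and leaves the HTTP handling to the view.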

Convert PDF to PDF/A Using Ghostscript

After generating the PDF, convert it to PDF/A using Ghostscript (which must be installed):

Install Ghostscript (Linux)

sudo apt install ghostscript

PDF to PDF/A Conversion Command

import io
import logging
import os
import subprocess
import tempfile

logger = logging.getLogger(__name__)


def convert_pdf_to_pdfa(input_buffer):
    """Convert a PDF held in a BytesIO buffer to PDF/A; return a new buffer or None."""
    input_path = output_path = None
    try:
        logger.info("Initiating PDF/A conversion process")
        # delete=False lets Ghostscript reopen the files by name;
        # both files are closed before it runs and removed in the finally block.
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_input:
            input_path = temp_input.name
            # Write input buffer content to temporary file
            temp_input.write(input_buffer.getvalue())
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_output:
            output_path = temp_output.name

        # Prepare Ghostscript command
        gs_command = [
            "gs",
            "-dPDFA=1",
            "-dBATCH",
            "-dNOPAUSE",
            "-sDEVICE=pdfwrite",
            "-sOutputFile=" + output_path,
            "-sColorConversionStrategy=UseDeviceIndependentColor",
            "-dPDFACompatibilityPolicy=1",
            "-dAutoRotatePages=/None",
            "-dEmbedAllFonts=true",
            "-dSubsetFonts=false",
            input_path,
        ]

        # Execute conversion, discarding Ghostscript's console output
        subprocess.check_call(
            gs_command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )

        # Read the converted file back into memory
        with open(output_path, "rb") as f:
            output_buffer = io.BytesIO(f.read())
        logger.info("PDF/A conversion completed successfully")
        return output_buffer
    except subprocess.CalledProcessError as e:
        logger.error("Ghostscript conversion failed: %s", e)
        return None
    except Exception as e:
        logger.error("Failed to initialize conversion process: %s", e)
        return None
    finally:
        # Clean up temporary files; a path is None if its file was never created
        for temp_file in (input_path, output_path):
            try:
                if temp_file and os.path.exists(temp_file):
                    os.unlink(temp_file)
            except OSError as e:
                logger.error("Failed to clean up file %s: %s", temp_file, e)

Why Use a Custom Ghostscript Wrapper for PDF/A Conversion?

Even though there are Python libraries like ghostscript, PyMuPDF, or reportlab that offer some level of PDF handling or Ghostscript integration, using a custom subprocess wrapper around the Ghostscript command-line tool is often the most reliable, flexible, and production-ready solution for PDF/A conversion.

Full Control Over PDF/A Compliance

By calling the Ghostscript CLI directly using subprocess, you gain access to all critical flags and fine-tuned parameters, such as:

-dEmbedAllFonts=true: Embeds every font the document uses
-sColorConversionStrategy=UseDeviceIndependentColor: Converts colors to device-independent color spaces
-dPDFA=1 and -dPDFACompatibilityPolicy=1: Target PDF/A-1 and tell Ghostscript to drop features that would break conformance instead of keeping them in the output
-dAutoRotatePages=/None: Prevents unexpected page rotation

Many Python wrappers expose only a limited subset of these options or abstract away important flags, which makes it harder to control how compliant the output actually is.
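To keep these flags in one place, the command can be assembled by a small helper. This is a sketch only: build_pdfa_command and its gs_executable and pdfa_level parameters are hypothetical names, and the flags are the same ones used in the conversion function above.

```python
def build_pdfa_command(input_path, output_path, gs_executable="gs", pdfa_level=1):
    """Return the Ghostscript argument list for a PDF/A conversion.

    Assumption: pdfa_level is passed straight to -dPDFA (1 in this post).
    """
    return [
        gs_executable,
        "-dPDFA=%d" % pdfa_level,
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_path,
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        "-dPDFACompatibilityPolicy=1",
        "-dAutoRotatePages=/None",
        "-dEmbedAllFonts=true",
        "-dSubsetFonts=false",
        input_path,  # the input file comes last, after all options
    ]
```

Passing the command as a list, rather than a single shell string, means file paths with spaces need no quoting and no shell is involved.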

Reliable and Battle-Tested

The Ghostscript CLI has been used for decades in enterprise environments. It’s:

Stable and maintained
Trusted in compliance-sensitive industries
More feature-complete than any wrapper

Using it directly ensures consistent behavior across environments.

Efficient Memory Handling

The function works with BytesIO memory buffers:

No need to save files permanently to disk
Fits well in web server or API contexts
Handles temporary file cleanup safely with tempfile
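The buffer-to-file-to-buffer round trip can also be written with tempfile.TemporaryDirectory, which removes everything inside it when the block exits. This is a minimal sketch of that alternative, not the approach used in the function above. The name process_buffer_via_temp_files and the process callback are hypothetical, and shutil.copyfile stands in for the Ghostscript call.

```python
import io
import os
import shutil
import tempfile


def process_buffer_via_temp_files(input_buffer, process):
    """Write the buffer to disk, call process(input_path, output_path),
    and return the output file's contents as a new BytesIO."""
    with tempfile.TemporaryDirectory() as workdir:
        input_path = os.path.join(workdir, "input.pdf")
        output_path = os.path.join(workdir, "output.pdf")
        with open(input_path, "wb") as f:
            f.write(input_buffer.getvalue())
        process(input_path, output_path)
        with open(output_path, "rb") as f:
            # Read fully before the directory is deleted on exit
            return io.BytesIO(f.read())


# Usage: shutil.copyfile is a stand-in for the real conversion step
result = process_buffer_via_temp_files(io.BytesIO(b"%PDF-1.4 demo"), shutil.copyfile)
```

With this layout no explicit cleanup loop is needed, because the directory and both files disappear together even if the callback raises.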

Enhanced Logging and Error Handling

Your own wrapper allows you to:

Log exactly what’s happening at each step
Catch and respond to errors gracefully
Provide better debugging and observability

Libraries often abstract these details away, making issues harder to track down.
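One way to improve observability is to capture Ghostscript's error output instead of discarding it, so a failed conversion leaves a usable message in the log. The sketch below assumes the standard logging module; run_logged and the "pdfa" logger name are hypothetical, and the Python interpreter is used as a stand-in command in the usage lines.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("pdfa")


def run_logged(command):
    """Run a command; log its stderr and return False if it fails."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, errors="replace"
        )
    except FileNotFoundError:
        logger.error("Executable not found: %s", command[0])
        return False
    if result.returncode != 0:
        logger.error(
            "%s exited with code %d: %s",
            command[0], result.returncode, result.stderr.strip(),
        )
        return False
    logger.info("%s completed successfully", command[0])
    return True


# Usage: the interpreter stands in for the Ghostscript executable
ok = run_logged([sys.executable, "-c", "import sys; sys.exit(0)"])
failed = run_logged(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
)
```

Here ok is True and failed is False, and the second call logs the text the command wrote to stderr.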

Cross-Platform and Decoupled

Works on Linux, macOS, and Windows (on Windows the console executable is typically gswin64c rather than gs)
No dependency on any specific Python library version
Easier to maintain in Docker or cloud environments
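Since the executable name differs between platforms, a wrapper can look it up at startup instead of hard-coding "gs". This is a minimal sketch, assuming the common names gs, gswin64c, and gswin32c; find_ghostscript is a hypothetical helper name.

```python
import shutil


def find_ghostscript(candidates=("gs", "gswin64c", "gswin32c")):
    """Return the full path of the first Ghostscript executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

Checking for None before attempting a conversion lets the application report a missing Ghostscript installation clearly, rather than failing inside the subprocess call.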
