Creating PDF and Converting to PDF/A using Python and Ghostscript
Vinod Charan Kumar

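What is PDF/A?
PDF/A is an ISO-standardized subset of PDF (ISO 19005) designed for the long-term archiving of electronic documents.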
Key Features of PDF/A:
- Self-Contained: All the elements needed to display the document, such as fonts, color profiles, and images, are embedded directly within the file.
- No External Dependencies: PDF/A prohibits features that rely on external or executable content, such as audio, video, and JavaScript, so the document renders fully without outside resources.
- Device-Independent: A PDF/A document is meant to look the same on any platform, now and in the future.
Tools We'll Use
In this guide, we’ll:
- Generate a PDF from HTML using xhtml2pdf
- Convert the resulting PDF to PDF/A using Ghostscript
There are several ways to create PDFs in Python; this post covers only the combination of xhtml2pdf and Ghostscript.
Generating PDF Using xhtml2pdf
Install xhtml2pdf
pip install xhtml2pdf
Create a PDF Template (invoice_template.html)
This is a Django-compatible HTML template with a header, a footer, table formatting, and dynamic placeholders:
<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; font-size: 12px; }
.header, .footer { text-align: center; position: fixed; width: 100%; }
.header { top: 0; }
.footer { bottom: 0; font-size: 10px; }
.content { margin-top: 80px; margin-bottom: 50px; }
table { width: 100%; border-collapse: collapse; margin-top: 20px; }
table, th, td { border: 1px solid #000; }
th, td { padding: 5px; text-align: left; }
.page-break { page-break-after: always; }
</style>
</head>
<body>
<div class="header">
<img src="{{ logo_url }}" width="100" alt="Company Logo"><br>
<strong>{{ company }}</strong><br>
Invoice #: {{ invoice_number }} | Date: {{ date }}
</div>
<div class="footer">
Page <pdf:pagenumber /> of <pdf:pagecount />
</div>
<div class="content">
<p><strong>Billed To:</strong> {{ customer }}</p>
<table>
<thead>
<tr>
<th>Item</th>
<th>Qty</th>
<th>Price</th>
<th>Total</th>
</tr>
</thead>
<tbody>
{% for item in items %}
<tr>
<td>{{ item.name }}</td>
<td>{{ item.qty }}</td>
<td>${{ item.price }}</td>
{# item.total is assumed to be precomputed in the view as qty * price #}
<td>${{ item.total|floatformat:2 }}</td>
</tr>
{% endfor %}
</tbody>
</table>
<div class="page-break"></div>
<p>Terms &amp; Conditions:</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed facilisis nulla at erat vulputate.</p>
</div>
</body>
</html>
Supply values for variables such as {{ customer }}, {{ items }}, and {{ logo_url }} through the template context.
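As a rough sketch, the context for this template could be assembled as below. The function name `build_invoice_context` and the per-item `total` key are assumptions made for illustration; precomputing the line total in Python is convenient because Django templates have no built-in multiplication filter.

```python
from decimal import Decimal, ROUND_HALF_UP


def build_invoice_context(company, customer, invoice_number, date, logo_url, items):
    """Build a template context (hypothetical helper, not part of any library).

    `items` is a list of dicts with "name", "qty", and "price" keys.
    """
    cents = Decimal("0.01")
    rows = []
    for item in items:
        # Go through str() so float prices such as 19.99 convert exactly
        price = Decimal(str(item["price"]))
        total = (price * item["qty"]).quantize(cents, rounding=ROUND_HALF_UP)
        rows.append(
            {"name": item["name"], "qty": item["qty"], "price": price, "total": total}
        )
    return {
        "company": company,
        "customer": customer,
        "invoice_number": invoice_number,
        "date": date,
        "logo_url": logo_url,
        "items": rows,
    }
```

Django resolves `item.name` against dictionary keys, so plain dicts work in the `{% for item in items %}` loop.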
PDF Render Function in Django
Once you’ve created your HTML template for the PDF, the next step is rendering it dynamically with Django and converting it into a PDF using xhtml2pdf.
Here’s a reusable utility function to do just that:
from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa


def render_to_pdf(template_src, context_dict=None):
    # Load the template and render it with the supplied context.
    # None (rather than {}) avoids sharing a mutable default between calls.
    template = get_template(template_src)
    html = template.render(context_dict or {})

    # Write the generated PDF directly into the HTTP response
    response = HttpResponse(content_type="application/pdf")
    pisa_status = pisa.CreatePDF(html, dest=response)
    if pisa_status.err:
        return HttpResponse("Error Rendering PDF", status=400)
    return response
How It Works:
- get_template(template_src) loads your HTML template.
- render(context_dict) injects dynamic data.
- CreatePDF() converts the HTML to a PDF and writes it directly to the HttpResponse object.
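If the PDF needs further processing before it is sent to the client, it can be rendered into an in-memory buffer instead of an HttpResponse. The sketch below is one possible shape for that. The name `render_html_to_buffer` and its `create_pdf` parameter are hypothetical; the parameter exists only so the renderer can be swapped out, and by default the function calls `pisa.CreatePDF` as in the utility above.

```python
import io


def render_html_to_buffer(html, create_pdf=None):
    """Render an HTML string to a BytesIO PDF buffer; return None on error.

    Hypothetical helper: `create_pdf` defaults to xhtml2pdf's pisa.CreatePDF.
    """
    if create_pdf is None:
        # Imported lazily so this module loads even without xhtml2pdf installed
        from xhtml2pdf import pisa
        create_pdf = pisa.CreatePDF

    buffer = io.BytesIO()
    status = create_pdf(html, dest=buffer)
    if status.err:
        return None
    buffer.seek(0)  # rewind so callers can read from the start
    return buffer
```

The returned buffer supports `getvalue()`, so it can be passed on to any function that expects a BytesIO input.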
Convert PDF to PDF/A Using Ghostscript
After generating the PDF, convert it to PDF/A using Ghostscript (which must be installed):
Install Ghostscript (Linux)
sudo apt install ghostscript
PDF to PDF/A Conversion Function
import io
import logging
import os
import subprocess
import tempfile

# Standard-library logger; substitute your project's logging helper if needed
logger = logging.getLogger(__name__)


def convert_pdf_to_pdfa(input_buffer):
    """Convert a PDF held in a BytesIO buffer to PDF/A; return BytesIO or None."""
    input_path = output_path = None
    try:
        logger.info("Initiating PDF/A conversion process")

        # Write the input buffer to a temporary file and reserve an output path.
        # Both files are closed before Ghostscript runs so that it can open
        # them on every platform.
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_input:
            temp_input.write(input_buffer.getvalue())
            input_path = temp_input.name
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_output:
            output_path = temp_output.name

        # Prepare Ghostscript command
        gs_command = [
            "gs",
            "-dPDFA=1",
            "-dBATCH",
            "-dNOPAUSE",
            "-sDEVICE=pdfwrite",
            "-sOutputFile=" + output_path,
            "-sColorConversionStrategy=UseDeviceIndependentColor",
            "-dPDFACompatibilityPolicy=1",
            "-dAutoRotatePages=/None",
            "-dEmbedAllFonts=true",
            "-dSubsetFonts=false",
            input_path,
        ]

        # Execute conversion, discarding Ghostscript's console output
        subprocess.check_call(
            gs_command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )

        # Read the converted file back into memory
        with open(output_path, "rb") as f:
            output_buffer = io.BytesIO(f.read())
        logger.info("PDF/A conversion completed successfully")
        return output_buffer
    except subprocess.CalledProcessError as e:
        logger.error("Ghostscript conversion failed: %s", e)
        return None
    except Exception as e:
        logger.error("Failed to initialize conversion process: %s", e)
        return None
    finally:
        # Clean up temporary files (paths stay None if creation failed)
        for temp_file in (input_path, output_path):
            try:
                if temp_file and os.path.exists(temp_file):
                    os.unlink(temp_file)
            except OSError as e:
                logger.error("Failed to clean up file %s: %s", temp_file, e)
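A cheap sanity check on the result is to look for the PDF/A identification that conforming files carry in their XMP metadata (the `pdfaid:part` property). The sketch below assumes the metadata is stored uncompressed, which is typical for PDF/A output but not verified here; the function name `declared_pdfa_part` is hypothetical. It only reports what the file declares and is no substitute for a dedicated validator such as veraPDF.

```python
import re

# Matches both XMP forms: pdfaid:part="1" and <pdfaid:part>1</pdfaid:part>
_PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")


def declared_pdfa_part(pdf_bytes):
    """Return the PDF/A part number declared in the XMP metadata, or None.

    Hypothetical helper: scans raw bytes, so it cannot see compressed metadata.
    """
    match = _PDFA_PART.search(pdf_bytes)
    return int(match.group(1)) if match else None
```

With the converter above, a call might look like `declared_pdfa_part(output_buffer.getvalue())`.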
Why Use a Custom Ghostscript Wrapper for PDF/A Conversion?
Full Control Over PDF/A Compliance
By calling the Ghostscript CLI directly using subprocess, you gain access to all critical flags and fine-tuned parameters, such as:
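- -dEmbedAllFonts=true: Embeds every font used by the document
- -sColorConversionStrategy=UseDeviceIndependentColor: Converts colors to device-independent color spaces
- -dPDFA=1 and -dPDFACompatibilityPolicy=1: Target PDF/A-1 and drop or alter features that would break conformance rather than keeping them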
- -dAutoRotatePages=/None: Prevents unexpected page rotation
Many Python wrappers expose only a subset of these options or hide important flags, making it harder to control how the PDF/A output is produced.
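The flags above can be collected in a small helper so that the PDF/A level and the executable name are easy to change. This is only a sketch: `build_gs_command` and its parameters are hypothetical names, and the flag values are the ones used in this post.

```python
def build_gs_command(input_path, output_path, pdfa_level=1, executable="gs"):
    """Return the Ghostscript argument list for a PDF/A conversion.

    Hypothetical helper; it builds the command but does not run it.
    """
    if pdfa_level not in (1, 2, 3):
        raise ValueError("pdfa_level must be 1, 2, or 3")
    return [
        executable,
        f"-dPDFA={pdfa_level}",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output_path}",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        "-dPDFACompatibilityPolicy=1",
        "-dAutoRotatePages=/None",
        "-dEmbedAllFonts=true",
        "-dSubsetFonts=false",
        input_path,  # input file goes last, after all options
    ]
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with paths that contain spaces.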
Reliable and Battle-Tested
The Ghostscript CLI has been used for decades in enterprise environments. It’s:
- Stable and maintained
- Trusted in compliance-sensitive industries
- More feature-complete than the wrappers built on top of it
Using it directly helps keep behavior consistent across environments.
Efficient Memory Handling
The function accepts and returns BytesIO memory buffers and uses temporary files only while Ghostscript runs:
- No need to save files permanently to disk
- Fits well in web server or API contexts
- Handles temporary file cleanup safely with tempfile
Enhanced Logging and Error Handling
Your own wrapper allows you to:
- Log exactly what’s happening at each step
- Catch and respond to errors gracefully
- Provide better debugging and observability
Libraries often abstract these details away, making issues harder to track down.
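One way to make failures easier to diagnose is to capture the error output of the external process and include it in the log message. The sketch below uses only the standard library; `run_and_log` is a hypothetical name, and the function works for any command, not just Ghostscript.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def run_and_log(command):
    """Run a command; return True on success, log stderr and return False otherwise.

    Hypothetical helper for illustration.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, errors="replace"
        )
    except OSError as e:
        # Raised, for example, when the executable is not installed
        logger.error("Could not start %s: %s", command[0], e)
        return False
    if result.returncode != 0:
        logger.error(
            "Command failed (exit %d): %s", result.returncode, result.stderr.strip()
        )
        return False
    return True
```

Capturing stderr instead of discarding it means the log shows the tool's own diagnostic, not just the exit status.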
Cross-Platform and Decoupled
- Works on Linux, macOS, and Windows (on Windows the Ghostscript console executable is typically named gswin64c rather than gs)
- No dependency on any specific Python library version
- Easier to maintain in Docker or cloud environments
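Because the executable name differs between platforms, a wrapper can look it up at runtime instead of hard-coding "gs". The following is a minimal sketch: `find_ghostscript` is a hypothetical name, and the candidate list assumes the usual executable names (gs on Linux and macOS, gswin64c or gswin32c on Windows).

```python
import shutil


def find_ghostscript(candidates=("gs", "gswin64c", "gswin32c"), which=shutil.which):
    """Return the path of the first Ghostscript executable found on PATH, or None.

    Hypothetical helper; `which` is injectable so the lookup can be tested.
    """
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

A caller can then fail early with a clear message when the function returns None, rather than waiting for the subprocess call to raise.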