
I am having an issue with Unicode in a variable's contents when writing to a .pdf with Python.
Basically, it is getting caught on an em dash.
I have tried taking the variable whose contents include the em dash and redefining it with ' .encode('utf-8') ', for example, as shown below:
Below is my full code. How can I simply fix the Unicode error in the contents of the ' Body ' variable?
I am open to converting to utf-8 or western, anything outside of ' latin-1 '. Any suggestions?
A workaround is to convert all text to latin-1 encoding before passing it on to the library, storing the result in a new variable, text2.
text2 will then be free of any non-latin-1 characters. However, some characters may be replaced with ?.
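A minimal sketch of that conversion, assuming the input string is held in a variable named text (a hypothetical name and sample value chosen for illustration):

```python
# Hypothetical input; the em dash (U+2014) is not part of latin-1.
text = "Unicode \u2014 em dash"

# errors="replace" substitutes "?" for every character latin-1 cannot encode;
# decoding afterwards turns the bytes back into a str.
text2 = text.encode("latin-1", "replace").decode("latin-1")

print(text2)  # Unicode ? em dash
```

Using "ignore" instead of "replace" would drop the unsupported characters rather than marking them with ?.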
The reason for this error is that you are trying to render a character in your PDF that is outside the code range of the latin-1 encoding. FPDF uses latin-1 as the default encoding for all its built-in fonts.
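To illustrate the point above without involving FPDF at all, the following sketch encodes an em dash to latin-1 directly, which is assumed to be roughly what happens internally when the built-in fonts are used. The variable names are illustrative only.

```python
em_dash = "\u2014"  # the character from the question

try:
    em_dash.encode("latin-1")
    error_message = None
except UnicodeEncodeError as exc:
    # latin-1 only covers code points 0 to 255, so U+2014 cannot be encoded.
    error_message = str(exc)

print(error_message)

# A character inside the latin-1 range encodes without problems.
e_acute = "\u00e9".encode("latin-1")
print(e_acute)  # b'\xe9'
```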
So as a workaround you can remove all characters from your text that do not fit into the latin-1 encoding (see my other answer for this workaround).
To fix this error and actually render those characters in your PDF, you need to use fonts that support a wider range of characters. For this purpose the FPDF library supports Unicode fonts.
For example, you could get the free Google Noto fonts, which support a wide range of Unicode code points. For most western languages I would recommend the NotoSans font set, but you can also get fonts for many other languages and scripts, including Chinese, Hebrew and Arabic.
Here is how to enable Unicode fonts in your code for FPDF:
First, tell the FPDF library where it can find the font files. In this example I am setting it to the sub-folder fonts of the current folder.
Then add the fonts to your PDF document. In this example I am adding the NotoSans fonts for the styles normal, bold, italic and bold-italic.
Now you can use the new fonts normally in your PDF document with set_font(), for example for normal text.
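The three steps above can be sketched roughly as follows. This is a sketch under assumptions, not the original code: it assumes the PyFPDF-style add_font(family, style, fname, uni) call, a fonts folder next to the working directory, and the font file names shown. The helper names add_noto_fonts and build_pdf are hypothetical. Instead of a global font directory setting, it passes the full path of each font file to add_font().

```python
import os

# Assumption: the .ttf files live in ./fonts under these file names.
FONT_DIR = os.path.join(os.getcwd(), "fonts")
NOTO_FILES = {
    "": "NotoSans-Regular.ttf",
    "B": "NotoSans-Bold.ttf",
    "I": "NotoSans-Italic.ttf",
    "BI": "NotoSans-BoldItalic.ttf",
}


def add_noto_fonts(pdf, font_dir=FONT_DIR):
    """Register the four NotoSans styles as Unicode fonts on an FPDF object."""
    for style, filename in NOTO_FILES.items():
        pdf.add_font(
            "NotoSans",
            style=style,
            fname=os.path.join(font_dir, filename),
            uni=True,  # PyFPDF flag for Unicode TTF fonts
        )


def build_pdf(body, out_path="output.pdf"):
    """Write body to a PDF using the Unicode-capable NotoSans font."""
    from fpdf import FPDF  # third-party package, imported only when needed

    pdf = FPDF()
    add_noto_fonts(pdf)
    pdf.add_page()
    pdf.set_font("NotoSans", size=12)  # normal style
    pdf.multi_cell(0, 10, body)
    pdf.output(out_path)
```

Calling build_pdf("Unicode \u2014 em dash") should then write the em dash as is, provided the font files exist. Newer fpdf2 releases may not need the uni argument.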
You can also change the encoding through the .set_doc_option() method (see the documentation). I tried Erik's method, which worked for me, but after adding more complexity (such as a second PDF and the write_html() method, which required creating a new class), I ran into the same error again. Changing the encoding for the whole document should solve the overall problem, as you said.
The readthedocs page says you can only use latin-1 or windows-1252, but pdf.set_doc_option('core_fonts_encoding', 'utf-8') worked for me according to the debugger. Just be aware that some characters will need fixing, like the apostrophe (') showing as â€ÂTM in the PDF.
Hope this is the global fix for this issue you were looking for, even if several months late!
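The garbled apostrophe mentioned above is consistent with UTF-8 bytes being read back as windows-1252, although that is an assumption about what happens inside the library. The sketch below reproduces that kind of garbling with plain Python and shows one possible fix-up: replacing typographic punctuation with ASCII equivalents before writing. The helper to_ascii_punctuation and its mapping are hypothetical and cover only a few characters.

```python
apostrophe = "\u2019"  # right single quotation mark

# The character is three bytes in UTF-8; read as windows-1252,
# each byte becomes its own character.
utf8_bytes = apostrophe.encode("utf-8")
garbled = utf8_bytes.decode("windows-1252")
print(len(garbled))  # 3

# Illustrative mapping from typographic punctuation to ASCII.
ASCII_PUNCTUATION = {
    0x2018: "'",
    0x2019: "'",
    0x201C: '"',
    0x201D: '"',
    0x2014: "-",
    0x2026: "...",
}


def to_ascii_punctuation(text):
    """Replace a few common typographic characters with ASCII ones."""
    return text.translate(ASCII_PUNCTUATION)


print(to_ascii_punctuation("It\u2019s done\u2026"))  # It's done...
```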
I tried Erik's solution with some changes, and it works great with a mix of English and Arabic text. Sample code to generate the PDF using pyFPDF is posted below.
October 27, 2021

When you are using Python to crawl a web page, you may get this error: UnicodeEncodeError: 'latin-1' codec can't encode character '\u2026'. In this tutorial, we will show you how to fix it.
In our case, the cause was in the HTTP request header, so check yours first.
We found that our HTTP request header contained the string … (a horizontal ellipsis, U+2026).
After we removed it, the error was fixed.
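The check described above can be sketched as a small helper that scans header values for characters outside latin-1. The function name find_non_latin1 and the sample headers are hypothetical. It relies only on the fact that latin-1 covers code points 0 to 255.

```python
def find_non_latin1(headers):
    """Return {header name: [offending characters]} for values latin-1 cannot encode."""
    offenders = {}
    for name, value in headers.items():
        bad_chars = [ch for ch in value if ord(ch) > 0xFF]
        if bad_chars:
            offenders[name] = bad_chars
    return offenders


# Hypothetical headers; the User-Agent was pasted with a truncating ellipsis.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0\u2026",
    "Accept": "text/html",
}

problems = find_non_latin1(headers)
print(problems)
```

Any header reported here can then be corrected or removed before the request is sent.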
UnicodeEncodeError: 'latin-1' codec can't encode character

What could be causing this error when I try to insert a foreign character into the database?

