itext 1.5 Html to Pdf

January 3, 2011 46 Comments

Hello all, and welcome to my first blog post.

I recently ran into a big problem: how to use iText 1.5 to turn HTML that is held in a String into a PDF file. The project I am working on is quite old, and we are migrating it from Java 1.4 to Java 6. As part of that, we are also trying to move to the latest versions of all our jars.

The old iText also had an itextXml library that used a SAXmyHtmlHandler class to handle the content and parsed it with a SAXParser. So I began researching ways to rewrite SAXmyHtmlHandler with the new iText library, since itextXml.jar wasn’t available for the new version. The documentation is very weak and lacks the right examples. The examples on the lowagie site were of little use to me, since they write their own parsers for one very particular case instead of handling the general case the way SAXmyHtmlHandler did.

After days of research I discovered HTMLWorker. I had seen it before in their library, but there were no examples explaining how to use it properly. The code is below; put it in a main(String[] args) method. Hope this helps!

// Imports needed: java.io.*, com.itextpdf.text.DocumentException, com.itextpdf.text.PageSize,
// com.itextpdf.text.html.simpleparser.HTMLWorker, com.itextpdf.text.pdf.PdfWriter
try {

    com.itextpdf.text.Document document = new com.itextpdf.text.Document(PageSize.A4);
    // The writer listens to the document and streams its content to the file
    PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream("D://testpdf.pdf"));
    document.open();
    document.addAuthor("Author of the Doc");
    document.addCreator("Creator of the Doc");
    document.addSubject("Subject of the Doc");
    document.addCreationDate();
    document.addTitle("This is the title");

    // Old approach, no longer available:
    //SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    //SAXmyHtmlHandler shh = new SAXmyHtmlHandler(document);

    // HTMLWorker adds the parsed HTML elements directly to the document
    HTMLWorker htmlWorker = new HTMLWorker(document);
    String str = "<html><head><title>titlu</title></head><body><table><tr><td>" +
        "Cher Monsieur, " +
        "asdasdasdsadas " +
        "En vous remerciant à nouveau de la confiance que vous nous témoignez," +
        " Bien Cordialement, " +
        " ADMINISTRATEUR ADMINISTRATEUR Ligne directe : 04 42 91 52 10 Acadomia® – " +
        "37 BD Aristide Briand – 13100 Aix en Provence </td></tr></table></body></html>";
    htmlWorker.parse(new StringReader(str));

    document.close();

} catch (DocumentException e) {
    e.printStackTrace();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

Filed under iText 1.5

About javaxtendsolutions
Java programmer

46 Responses to itext 1.5 Html to Pdf

Mohamed El-Beltagy says:

January 5, 2011 at 2:04 pm

First of all, thank you very much for your tiny but to the point example. I really appreciate it very much.

Second, I just wanted to add the following piece of info:
The string passed to HTMLWorker must start with the following tags: <html><body><table><tr><td>

I tried iText 1.4.5 (we’re still using Java 1.4 :( ) and things didn’t work as expected until I put my text inside the HTML+BODY+TABLE+TR+TD tags.
Otherwise, it generates a PDF with the text in it, but overlapping.
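
The wrapping described in this comment can be sketched as a small helper. This is only an illustration: the class name HtmlFragmentWrapper and its wrap method are made up for this example and are not part of iText, and the set of wrapper tags comes from the comment above rather than from any iText documentation.

```java
import java.util.Locale;

// Hypothetical helper, not part of iText: wraps a bare HTML fragment in the
// html/body/table/tr/td tags described in the comment above.
final class HtmlFragmentWrapper {

    private static final String PREFIX = "<html><body><table><tr><td>";
    private static final String SUFFIX = "</td></tr></table></body></html>";

    // Returns the trimmed input unchanged if it already starts with an <html> tag,
    // otherwise returns the trimmed input surrounded by the wrapper tags.
    static String wrap(String fragment) {
        String trimmed = fragment.trim();
        if (trimmed.toLowerCase(Locale.ROOT).startsWith("<html")) {
            return trimmed;
        }
        return PREFIX + trimmed + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(wrap("Cher Monsieur, Bien Cordialement"));
    }
}
```

The wrapped string can then be passed to htmlWorker.parse(new StringReader(...)) in the same way as the string in the post.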

- javaxtendsolutions says:
  
  January 5, 2011 at 2:10 pm
  
  First of all, thank you for reading my post and for your comment.
  
  I am glad it finally worked for you.
  
  I use iText 1.5 and it works fine under all conditions. Maybe they fixed that problem in this latest version.
  
- Durgesh says:
  
  July 6, 2011 at 12:06 pm
  
  Really, it’s a good help for the programmers who are working on this.
  Thanks a ton. It helps me too….
  
- leelavathi says:
  
  September 26, 2012 at 11:26 am
  
  How can we convert the HTML into a PDF and then also send it by mail?
  
archix says:

January 20, 2011 at 10:22 pm

hi,
Thanks for the blog entry. My customer was getting very impatient, you(r code) saved me from painful discussions. Don’t forget to get free beer from me 😉

Robert Baldock says:

March 8, 2011 at 2:18 pm

That’s a very useful post but can I make one observation:

The variable pdfWriter doesn’t seem to be used after it is initialised.

Is that intentional?

Robert

- javaxtendsolutions says:
  
  March 8, 2011 at 2:22 pm
  
  It isn’t used directly because, as you can see, its role is to write the content of the document to the PDF file.
  
  - Robert Baldock says:
    
    March 8, 2011 at 2:39 pm
    
    OK, thanks for that.
    
    I guess this would achieve the same thing then:
    
    PdfWriter.getInstance(document, new FileOutputStream("D://testpdf.pdf"));
    
    Robert
Prafull says:

March 29, 2011 at 7:32 am

Hi,

Thanks for the post,
I tried the code in the post, but it only prints the header in the PDF, not the rest of the HTML design and the data in it. I am using iText 5.0.6.

please help.

Thanks,
Prafull

- freund says:
  
  December 9, 2011 at 11:40 am
  
  I am facing the same issue: only the header is printed, not the body and table content.
  
  - freund says:
    
    December 12, 2011 at 8:23 am
    
    http://mvnrepository.com/artifact/com.itextpdf/itextpdf/5.1.3
    http://mvnrepository.com/artifact/com.itextpdf.tool/xmlworker/1.1.1
    
    import java.io.FileOutputStream;
    import java.io.Reader;
    import java.io.StringReader;

    import com.itextpdf.text.Document;
    import com.itextpdf.text.PageSize;
    import com.itextpdf.text.html.simpleparser.HTMLWorker;
    import com.itextpdf.text.pdf.PdfWriter;
    import com.itextpdf.tool.xml.XMLWorkerHelper;
    
    Document document = new Document(PageSize.LETTER);
    PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream("c://temp//testpdf.pdf"));
    document.open();
    document.addAuthor("Real Gagnon");
    document.addCreator("Real's HowTo");
    document.addSubject("Thanks for your support");
    document.addCreationDate();
    document.addTitle("Please read this");
    
    // Use one of the two workers:
    XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
    // HTMLWorker worker = new HTMLWorker(document);
    
    String str = ""+
    "Real's HowTo" +
    "Show your support" +
    "Java HowTo" +
    "Javascript HowTo" +
    "Powerbuilder HowTo" +
    "";
    Reader htmlReader = new StringReader(str);
    
    // With XMLWorkerHelper:
    worker.parseXHtml(pdfWriter, document, htmlReader);
    // With HTMLWorker instead:
    // worker.parse(htmlReader);
    document.close();
aranik says:

April 12, 2011 at 4:48 pm

Using the current iText 5.0.6 doesn’t show the table. Using an older iText 2.1.3 shows everything. I’m having trouble using the HTMLWorker.parseToList with tables, regardless of version. The table just doesn’t show up for some reason.

- javaxtendsolutions says:
  
  April 12, 2011 at 9:53 pm
  
  I don’t recommend using parseToList. Just pass the document you want to the HTMLWorker constructor and then parse the HTML; all of its content will be added to the document you passed in.
  
mostafa says:

May 9, 2011 at 1:50 am

hi everyone
when I run this code from the iText example:
// step 1: creation of a document-object
Document document = new Document(PageSize.A4, 80, 50, 30, 65);

// step 2:
// we create a writer that listens to the document
// and directs a XML-stream to a file
PdfWriter.getInstance(document, new FileStream("Chap0707.pdf", FileMode.Create));

// step 3: we parse the document
HtmlParser.parse(document, "Chap0702.html");
it gives me this error:
Exception of type 'com.lowagie.text.ExceptionConverter' was thrown. Any help?

Mano Senthil says:

June 3, 2011 at 10:36 am

Hi, could you please share the iText 1.5 jar? I could not find the jar through Google. The above code works fine only for text; it is not displaying lines(). I also tried iText 2.1.3, but with no positive result.

Avipe says:

June 13, 2011 at 5:15 am

Hi, I have a problem inserting an image in my HTML file: the conversion fails if I add an image.

Do you have any tips?

thanks a lot !

- javaxtendsolutions says:
  
  June 14, 2011 at 8:20 am
  
  Hi Avipe,
  
  I used the same code for inserting images also and it worked.
  
  Please send me a sample of your Java code and HTML code. Maybe you have an alignment problem.
  
  Andrei.
  
Kirti says:

July 11, 2011 at 9:48 am

Hi Andrei,

Thanks for the code.
I am working on generating a PDF report dynamically. My report contains more than one table, and each table has list entries inside it. When I tried it with one such table it worked perfectly fine, but once I completed my report and tried to parse it, it gave me the following error. Can you please help me? I have a demo tomorrow.

com.itextpdf.text.html.simpleparser.TableWrapper cannot be cast to com.itextpdf.text.TextElementArray

I am using iText 5.1.1

- javaxtendsolutions says:
  
  July 11, 2011 at 1:03 pm
  
  Hi Kirti,
  
  Can you please show me the html you are trying to parse and also the code that parses it?
  
  Andrei.
  
  - Kirti says:
    
    July 12, 2011 at 11:49 am
    
    Hi Andrei,
    
    I found the problem in my code. The HTML was not well formed and was missing one tag; that’s why I was getting the error.
    Thanks for the help.
    
    If anyone else is facing a problem like this, save the whole HTML as an XML file on your machine and then open it with IE. IE points out the missing tags :).
    
    Kirti
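
The check Kirti describes (load the markup as XML and see where the parser complains) can also be done without a browser. Below is a minimal sketch using the SAX parser that ships with the JDK. The names WellFormedCheck and firstError are hypothetical, and the sketch assumes the HTML is meant to be well-formed XML; markup that relies on HTML-only entities such as &nbsp; will be reported as an error even though a browser accepts it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports the first place where the markup stops being well-formed XML.
final class WellFormedCheck {

    // Returns null when the markup parses cleanly, otherwise a short description
    // of the first parse error (with line and column when the parser provides them).
    static String firstError(String markup) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
        } catch (SAXException e) {
            return "parse error: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        } catch (ParserConfigurationException e) {
            return "parser setup error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The closing </td> is missing on purpose.
        String html = "<html><body><table><tr><td>text</tr></table></body></html>";
        String error = firstError(html);
        System.out.println(error == null ? "well-formed" : error);
    }
}
```

Running a check like this on the generated HTML before handing it to HTMLWorker gives the position of the first unclosed or mismatched tag.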
OxxOno says:

July 12, 2011 at 6:03 pm

Thnks man … your post is so useful … I appreciate u … regards from Peru

- javaxtendsolutions says:
  
  July 12, 2011 at 6:57 pm
  
  Thank you for your response. I am glad that you find it useful.
  
sandesh says:

August 8, 2011 at 1:28 pm

Hi,

Thanks for the post. When I tried to change the font in the HTML (String), the change was not reflected in the PDF. Any suggestions? My situation is that my JSP sends HTML code as a string to my Java class, where I can use this code. So if I have to pass the styles from my JSP itself, how would I do that?

Regards,
Sandesh.

amrit says:

August 24, 2011 at 7:36 pm

Hey buddy, I am facing a problem while converting HTML to PDF using the itext-5.1.2 jars.
Whenever my HTML contains controls, images and a style sheet (CSS file),
it gives me the error below:
Exception in thread "main" java.lang.ClassCastException: com.itextpdf.text.html.simpleparser.CellWrapper
at com.itextpdf.text.html.simpleparser.HTMLWorker.processLink(HTMLWorker.java:499)
at com.itextpdf.text.html.simpleparser.HTMLTagProcessors$2.endElement(HTMLTagProcessors.java:152)
at com.itextpdf.text.html.simpleparser.HTMLWorker.endElement(HTMLWorker.java:231)
at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.processTag(SimpleXMLParser.java:589)
at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.go(SimpleXMLParser.java:340)
at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.parse(SimpleXMLParser.java:607)
at com.itextpdf.text.html.simpleparser.HTMLWorker.parse(HTMLWorker.java:147)
at com.Test.main(Test.java:60)

- javaxtendsolutions says:
  
  August 25, 2011 at 8:27 am
  
  Hey Amrit,
  
  Can you please send me a sample piece of code of your HTML?
  
  I think you are using " instead of ' and it breaks because of that. It is just a blind guess; I will have a better answer once I take a look at your code.
  
  Andrei VISAN.
  
Fazle Rokib says:

August 30, 2011 at 12:53 am

I was exactly looking for this one. This is the most concise and to the point code to get started on the html to pdf conversion. It helped me a lot. Thank you very much.

Singa says:

September 12, 2011 at 1:01 pm

It was really useful and simple to understand,, Thanks

Sashi says:

October 5, 2011 at 9:06 pm

HI

I have a requirement where I receive unstructured mails that contain HTML content as raw data. I am using the JavaMail API’s MimeMessage to instantiate the message. I get the multipart message and, if the type is text/html, I need to render the content into a PDF. I tried iText 2.1 with HTMLWorker. I am able to render the content, but it runs past the edge of the PDF page. I am using org.apache.commons.lang.StringEscapeUtils to convert the MIME part into HTML.

else if (mimeBodyType.contains("text/html;")) {
    System.out.println("———HTML");

    HTMLWorker htmlWorker = new HTMLWorker(document);
    String strEscapeHTML = StringEscapeUtils.escapeHtml(innerPart.getContent().toString());
    String strUnEscapeHTML = StringEscapeUtils.unescapeHtml(strEscapeHTML);
    String str2 = "" + (strUnEscapeHTML.toString()) + "\n";
    String strUnEscapeHTML2 = StringEscapeUtils.unescapeHtml(str2);
    System.out.println(strUnEscapeHTML2);
    htmlWorker.parse(new StringReader(strUnEscapeHTML2));
    System.out.println("———HTML");

}

- javaxtendsolutions says:
  
  October 6, 2011 at 11:54 am
  
  Hi,
  
  If I understand correctly, your problem is that your text runs past the width of the PDF page?
  
  Andrei.
  
hdgirlmd says:

October 6, 2011 at 12:02 am

Hi. I would like to know, if I were to parse HTML, what would I replace it with to have my rendered PDF not display a but actually force a newline?

- hdgirlmd says:
  
  October 6, 2011 at 12:03 am
  
  sorry. replace with what pdf code to force a newline?
  
  - javaxtendsolutions says:
    
    October 6, 2011 at 11:51 am
    
    Hi,
    Sorry, I don’t understand very well what problems you have. Can you please be more explicit?
    
    Thanks,
    Andrei.
venkatesan says:

October 7, 2011 at 10:38 am

Hi,
Many thanks for your post.

I am facing a problem here; please help me clear it up.

I am following the same approach you described above, converting an HTML string to PDF.

The problem is that I am assigning fonts (system fonts only: Verdana, Arial) and font sizes to the data, and they are not rendered when converting to PDF.
The PDF conversion uses its own font and size.

Likewise, if I assign a width to each table cell (Ex: <td width='30%'…… ), that is also not reflected in the converted PDF.

please help me….

regards
venkatesan.

venki says:

October 8, 2011 at 7:26 am

please help…

I am using the code given above.
I have an HTML string which I want to convert to PDF.

In the HTML, I set the font type (system font only: Verdana) and font size, but they are not rendered in the converted PDF.

Likewise, if I specify the width for each cell, that is also not rendered.

please help me….

venkatesan.

chandu18in says:

November 7, 2011 at 8:45 pm

Hello Author,

I am also receiving Casting error:

encountered [java.lang.ClassCastException] : [com.itextpdf.text.html.simpleparser.TableWrapper at com.itextpdf.text.html.simpleparser.HTMLWorker.processTable(HTMLWorker.java:592)]

Could you please help me? using itext 5.1.2 and my jre is 1.5.

- chandu18in says:
  
  November 7, 2011 at 8:48 pm
  
  My HTML text is generated dynamically and I have tried saving the html stream to a file and opening it with IE etc, everything is working perfectly fine.
  
chandu18in says:

November 7, 2011 at 8:52 pm

If you can send me a test email at chandu18in(at)yahoo.com, i will reply with my code snippet and HTML file i am trying to convert.

waseem says:

November 16, 2011 at 10:51 am

The code at the top works, but I want to add CSS to it. How can I do this? Can anyone help, please?

Sachin says:

December 1, 2011 at 12:23 pm

Hi,
Is there a way to write the complete HTML from a URL stream to a PDF, including images, CSS, etc.? My code is below:
URL url = new URL("http://localhost:8180/loan");
com.itextpdf.text.Document pdfDocument = new com.itextpdf.text.Document();
Reader htmlreader = new BufferedReader(new InputStreamReader(url.openStream()));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
com.itextpdf.text.pdf.PdfWriter.getInstance(pdfDocument, baos);
pdfDocument.open();
com.itextpdf.text.html.simpleparser.StyleSheet styles = new com.itextpdf.text.html.simpleparser.StyleSheet();
// styles.loadTagStyle("body", "font", "Bitstream Vera Sans");
List arrayElementList = com.itextpdf.text.html.simpleparser.HTMLWorker.parseToList(htmlreader, styles);
// Add each parsed element to the PDF document
for (int i = 0; i < arrayElementList.size(); ++i)
{
    com.itextpdf.text.Element e = (com.itextpdf.text.Element) arrayElementList.get(i);
    pdfDocument.add(e);
}
pdfDocument.close();
byte[] bs = baos.toByteArray();
String pdfBase64 = Base64.encodeBytes(bs);
// Write the in-memory PDF bytes to disk
File pdfFile = new File("D:/pdfExample.pdf");
FileOutputStream out = new FileOutputStream(pdfFile);
out.write(bs);
out.close();

With this, the PDF is created, but without the CSS and images.
Thanks in advance.

-Sachin

Luca says:

May 31, 2012 at 2:20 pm

Thank you so much.

Ranjit says:

July 24, 2012 at 5:48 am

I am using the following code to generate a PDF file, but it gives me the following error:

“ExceptionConverter: com.lowagie.text.DocumentException: The document is open; you can only add Elements with content.” Please help me….

try {

    Document document = new Document(PageSize.A4);
    document.open();
    PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream("D://testpdf.pdf"));

    document.addAuthor("Author of the Doc");
    document.addCreator("Creator of the Doc");
    document.addSubject("Subject of the Doc");
    document.addCreationDate();
    document.addTitle("This is the title");

    //SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    //SAXmyHtmlHandler shh = new SAXmyHtmlHandler(document);

    HTMLWorker htmlWorker = new HTMLWorker(document);
    String str = "HELLLO";
    htmlWorker.parse(new StringReader(str));
    document.close();
} catch (DocumentException e) {
    e.printStackTrace();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

@nil says:

July 31, 2012 at 1:24 pm

Hi everyone,

I need a little help converting my HTML output into a PDF along with its CSS and images (not static images, and including Flash chart images). I am saving the JSP output into an HTML file and then trying to convert that into a PDF. I am getting the CSS styles, but the images fail to show in the PDF.

Ankit jain says:

November 18, 2012 at 6:21 pm

Hello Author,

I am also facing Casting error:

i am using itextpdf-5.1.0 jar file regarding this example.

please help me in this problem

——————————————————————————————————————
java.lang.ClassCastException: com.itextpdf.text.html.simpleparser.TableWrapper cannot be cast to com.itextpdf.text.TextElementArray
at com.itextpdf.text.html.simpleparser.HTMLWorker.processTable(HTMLWorker.java:588)
at com.itextpdf.text.html.simpleparser.HTMLTagProcessors$11.endElement(HTMLTagProcessors.java:361)
at com.itextpdf.text.html.simpleparser.HTMLWorker.endElement(HTMLWorker.java:227)
at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.processTag(SimpleXMLParser.java:589)
at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.go(SimpleXMLParser.java:340)
at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.parse(SimpleXMLParser.java:607)
at com.itextpdf.text.html.simpleparser.HTMLWorker.parse(HTMLWorker.java:143)
at test.PdfFormGene.main(PdfFormGene.java:97)

————————————————————————————————————————
My Program is the following:-

package test;

import java.io.FileOutputStream;
import java.io.StringReader;

import com.itextpdf.text.PageSize;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
import com.itextpdf.text.pdf.PdfWriter;

public class PdfFormGene {
    public static void main(String[] args) {
        try {

            com.itextpdf.text.Document document = new com.itextpdf.text.Document(
                    PageSize.A4);
            PdfWriter pdfWriter = PdfWriter.getInstance(document,
                    new FileOutputStream("f:/testpdf.pdf"));
            document.open();
            document.addAuthor("Author of the Doc");
            document.addCreator("Creator of the Doc");
            document.addSubject("Subject of the Doc");
            document.addCreationDate();
            document.addTitle("This is the title");

            // SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // SAXmyHtmlHandler shh = new SAXmyHtmlHandler(document);

            HTMLWorker htmlWorker = new HTMLWorker(document);
            String str = " "+
                " "+
                "Application No. "+
                "NREGS ID "+
                "Date "+
                "Branch "+
                "Habitation "+
                "Village "+
                "Panchayat "+
                "Mandal "+
                "District "+
                "Pincode "+
                "State "+
                "Country "+
                "First NameMiddle NameLast Name "+
                " "+
                " "+
                "Display Name "+
                "Father /Husband Name "+
                "Gender "+
                "Date Of Birth   Age   "+
                " "+
                "Spouse Name "+
                "Address Details:    "+
                "Category Spl Category "+
                " Caste   "+
                "Profession Details:(Tick In appr. Box)1. "+
                "Farmer    2.Vendor     "+
                "3.Labour    4.Other "+
                "Id Proof:(Tick In appr. Box)Ration Card   "+
                "Job/House Hold Card      "+
                "SHG Card     "+
                "Voter ID Card   Other ID    ID Number : "+
                "Assets Details:(Tick In appr. Box) "+
                "LAND : YesNoHOUSE :Yes "+
                "NoNominee Details: "+
                " Name :     Age    "+
                "         RelationShip :   "+
                "    "+
                "Declaration "+
                "I hereby apply for opening of a SBH Biometrics Based Smart Card Acc issue of a Biometric Based Smart Card to me. I declared that the info provided "+
                "by me in this application form is true and correct.I am aware that I hereby apply for opening of a SBH Biometrics Based Smart Card Acc issue "+
                "of a Biometric Based Smart Card to me.I declared that the info provided by me in this application form is true and correct. "+
                "  Date:  LTI/RTI/Signature Of Customer "+
                "Introducer Details:(Tick In appr. Box) "+
                "  BO  FA "+
                "  PS  VA "+
                "  SARP  VS "+
                "  VOP  VOS "+
                "  I know the above customer for_______Years. "+
                "Introducer Signature      "+
                "    Signature Of BC    "+
                "For Bank Use "+
                "Payment Accepted Payment Not Accepted "+
                "A/C No. "+
                "CustomerId "+
                "Date of AuthorizationSGL Code "+
                " "+
                "  Signature Of Bank/Link Br. Official   "+
                " "+
                "———————————————————————————————————————— "+
                " "+
                "Acknowledgment To CustomerDate "+
                "Application No. "+
                "NREGS ID "+
                "A/C Holder Name "+
                "Father Name "+
                " ";
            htmlWorker.parse(new StringReader(str));

            document.close();

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}

——————————————————————————————————————

please help me

Thanks
Ankit Jain

meeta says:

November 20, 2012 at 11:09 am

Hi, I have a PdfPCell whose content contains HTML tags. How can I parse that content and present it in the cell? Sample code is below:
PdfPTable tablePolDet = new PdfPTable(1);
tablePolDet.setWidthPercentage(105f);
cell = new PdfPCell(new Phrase(policySearchItem.getSummary(), smallBold));
cell.setBorder(Rectangle.NO_BORDER);
htmlWorker.parse(new StringReader(cell)); // giving error
tablePolDet.addCell(cell);
document.add(tablePolDet);

Thanks in advance
