iText 1.5 HTML to PDF

Hello all, and welcome to my first blog post!

I recently ran into a big problem: how to use iText 1.5 to parse HTML held in a String into a PDF file. The project I am working on is quite old, and we are migrating it from Java 1.4 to Java 6. At the same time, we are moving to the latest versions of all our jars.

The old iText also came with an itextXml library, which used a SAXmyHtmlHandler class to handle the content and parsed it with a SAXParser. Since itextXml.jar isn't available for the new version, I began researching ways to replace SAXmyHtmlHandler using the new iText library. The documentation is very thin and lacks good examples. The examples on the lowagie site were useless to me, because they write their own parsers for very particular cases rather than for the general case that SAXmyHtmlHandler handled.
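For readers who never used itextXml, the old approach worked roughly like this: a SAX parser walks the markup and calls back into a handler for every start tag, end tag and run of text, and the handler turns those events into PDF elements. The sketch below shows only that event flow, using the JDK's own SAX API. SaxEventsDemo and RecordingHandler are hypothetical names, and the handler just records events instead of building iText objects. Note that a SAX parser only accepts well-formed XML, so HTML with an unclosed <br> is rejected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsDemo {

    // Hypothetical handler: an iText handler would create PDF elements in these
    // callbacks; this one only records which events arrive, in order.
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<String>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks; whitespace-only chunks are skipped.
            String text = new String(ch, start, length).trim();
            if (text.length() > 0) {
                events.add("text:" + text);
            }
        }
    }

    // Parses well-formed (X)HTML and returns the recorded events.
    public static List<String> parse(String xhtml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RecordingHandler handler = new RecordingHandler();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.events;
        } catch (Exception e) {
            // ParserConfigurationException, SAXException (malformed markup) or IOException
            throw new IllegalArgumentException("Could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<html><body><p>Cher Monsieur,</p></body></html>"));
    }
}
```

This strictness is one reason a more lenient HTML parser is attractive when the HTML comes from templates or user input.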

After days of research I discovered HTMLWorker. I had seen it in their library before, but there were no examples explaining how to use it properly. The code is below; it runs from a main(String[] args) method. Hope this helps!

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
import com.itextpdf.text.pdf.PdfWriter;

public class HtmlToPdf {

    public static void main(String[] args) {
        try {
            Document document = new Document(PageSize.A4);
            // Create the writer before opening the document: it listens to the
            // document and writes everything added to it into the PDF file.
            PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream("D://testpdf.pdf"));
            document.open();
            document.addAuthor("Author of the Doc");
            document.addCreator("Creator of the Doc");
            document.addSubject("Subject of the Doc");
            document.addCreationDate();
            document.addTitle("This is the title");

            // The old itextXml approach, not available for the new version:
            //SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            //SAXmyHtmlHandler shh = new SAXmyHtmlHandler(document);

            // HTMLWorker parses the HTML and adds the resulting elements to the document.
            HTMLWorker htmlWorker = new HTMLWorker(document);
            String str = "<html><head><title>titlu</title></head><body><table><tr><td><p style='font-size: 10pt; font-family: Times'>" +
                "Cher Monsieur,</p><br><p align='justify' style='text-indent: 2em; font-size: 10pt; font-family: Times'>" +
                "asdasdasdsadas<br></p><p align='justify' style='text-indent: 2em; font-size: 10pt; font-family: Times'>" +
                "En vous remerciant &agrave; nouveau de la confiance que vous nous t&eacute;moignez,</p>" +
                "<br><p style='font-size: 10pt; font-family: Times'>Bien Cordialement,<br>" +
                "<br>ADMINISTRATEUR ADMINISTRATEUR<br>Ligne directe : 04 42 91 52 10<br>Acadomia&reg; - " +
                "37 BD Aristide Briand  - 13100 Aix en Provence  </p></td></tr></table></body></html>";
            htmlWorker.parse(new StringReader(str));

            document.close();
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            // Also covers FileNotFoundException and UnsupportedEncodingException.
            e.printStackTrace();
        }
    }
}
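The example hard-codes the HTML as a String literal. When the HTML comes from a file or an HTTP response instead, it has to be read into a String first. Doing that with an explicit charset avoids garbled accented characters when the platform default encoding differs from the page's. Below is a minimal JDK-only sketch (Java 6 compatible), assuming UTF-8 input. HtmlSource and readAll are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class HtmlSource {

    // Reads the whole stream into a String with an explicit charset instead of
    // the platform default; I/O failures are rethrown unchecked to keep it short.
    public static String readAll(InputStream in, String charsetName) {
        try {
            Reader reader = new InputStreamReader(in, Charset.forName(charsetName));
            try {
                StringBuilder sb = new StringBuilder();
                char[] buffer = new char[4096];
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    sb.append(buffer, 0, n);
                }
                return sb.toString();
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not read HTML", e);
        }
    }

    public static void main(String[] args) {
        // "\u00e0" is the letter a-grave, encoded as two bytes in UTF-8.
        byte[] bytes = "<p>En vous remerciant \u00e0 nouveau</p>".getBytes(Charset.forName("UTF-8"));
        String html = readAll(new ByteArrayInputStream(bytes), "UTF-8");
        System.out.println(html);
    }
}
```

The resulting String can then be passed to the parser exactly as in the example above, with htmlWorker.parse(new StringReader(html)).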

About javaxtendsolutions
Java programmer

46 Responses to iText 1.5 HTML to PDF

  1. Mohamed El-Beltagy says:

    First of all, thank you very much for your short but to-the-point example. I really appreciate it.

    Second, I just wanted to add the following piece of info:
    The string passed to HTMLWorker must start with the following tags: <html><body><table><tr><td>

    I tried using iText 1.4.5 (we're still on Java 1.4 :( ), and things didn't work as expected until I put my text inside the HTML+BODY+TABLE+TR+TD tags.
    Otherwise, it generates a PDF with the text in it, but the text overlaps.

    • javaxtendsolutions says:

      First of all, thank you for reading my post and for your comment.

      I am glad it finally worked for you.

      I use iText 1.5 and it works fine under all conditions. Maybe they fixed that problem in this latest version.

    • Durgesh says:

      Really, it's a good help for programmers working on this.
      Thanks a ton, it helped me too...

    • leelavathi says:

      How do I convert the HTML into a PDF and also send it by mail?

  2. archix says:

    hi,
    Thanks for the blog entry. My customer was getting very impatient, you(r code) saved me from painful discussions. Don’t forget to get free beer from me 😉

  3. Robert Baldock says:

    That’s a very useful post but can I make one observation:

    The variable pdfWriter doesn’t seem to be used after it is initialised.

    Is that intentional?

    Robert

    • javaxtendsolutions says:

      It isn't used afterwards because, as you can see, its role is to write the document's content into the PDF file.

      • Robert Baldock says:

        OK, thanks for that.

        I guess this would achieve the same thing then:

        PdfWriter.getInstance(document, new FileOutputStream("D://testpdf.pdf"));

        Robert

  4. Prafull says:

    Hi,

    Thanks for the post,
    I tried the code in the post, but it only prints the header in the PDF, not the rest of the HTML layout and the data in it. I am using iText 5.0.6.

    Please help.

    Thanks,
    Prafull

    • freund says:

      I am facing the same issue: only the header is printed, not the body and table content.

      • freund says:

        http://mvnrepository.com/artifact/com.itextpdf/itextpdf/5.1.3
        http://mvnrepository.com/artifact/com.itextpdf.tool/xmlworker/1.1.1

        import java.io.FileOutputStream;
        import java.io.Reader;
        import java.io.StringReader;

        import com.itextpdf.text.Document;
        import com.itextpdf.text.PageSize;
        import com.itextpdf.text.html.simpleparser.HTMLWorker;
        import com.itextpdf.text.pdf.PdfWriter;
        import com.itextpdf.tool.xml.XMLWorkerHelper;

        Document document = new Document(PageSize.LETTER);
        PdfWriter pdfWriter = PdfWriter.getInstance
        (document, new FileOutputStream("c://temp//testpdf.pdf"));
        document.open();
        document.addAuthor("Real Gagnon");
        document.addCreator("Real's HowTo");
        document.addSubject("Thanks for your support");
        document.addCreationDate();
        document.addTitle("Please read this");

        XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
        // or, with the older parser: HTMLWorker worker = new HTMLWorker(document);

        String str = "" +
        "Real's HowTo" +
        "Show your support" +
        "Java HowTo" +
        "Javascript HowTo" +
        "Powerbuilder HowTo" +
        "";
        Reader htmlReader = new StringReader(str);

        worker.parseXHtml(pdfWriter, document, htmlReader);
        // or, with HTMLWorker: worker.parse(htmlReader);
        document.close();
        document.close();

  5. aranik says:

    With the current iText 5.0.6 the table doesn't show up, while the older iText 2.1.3 shows everything. I'm having trouble using HTMLWorker.parseToList with tables, regardless of version: the table just doesn't show up for some reason.

    • javaxtendsolutions says:

      I don't recommend using parseToList. Just pass the document you want as a parameter to the HTMLWorker constructor and then parse your HTML; all of its content will be added to the document you passed in.

  6. Pingback: Co nowego « Wiadomości o technologiach IT

  7. mostafa says:

    Hi everyone,
    when I run this code from the iText examples:
    // step 1: creation of a document-object
    Document document = new Document(PageSize.A4, 80, 50, 30, 65);

    // step 2:
    // we create a writer that listens to the document
    // and directs a XML-stream to a file
    PdfWriter.getInstance(document, new FileStream("Chap0707.pdf", FileMode.Create));

    // step 3: we parse the document
    HtmlParser.parse(document, "Chap0702.html");
    it gives me this error:
    Exception of type 'com.lowagie.text.ExceptionConverter' was thrown. Any help?

  8. Mano Senthil says:

    Hi, could you please share the iText 1.5 jar? I could not find it through Google. The code above works fine, but only for text... it is not displaying lines (). I also tried iText 2.1.3, but with no success.

  9. Avipe says:

    Hi, I have a problem inserting an image in my HTML file: the conversion fails if I add an image.

    Do you have any tips?

    Thanks a lot!

    • javaxtendsolutions says:

      Hi Avipe,

      I used the same code to insert images as well, and it worked.

      Please send me a sample of your Java code and HTML code. Maybe you have an alignment problem.

      Andrei.

  10. Kirti says:

    Hi Andrei,

    Thanks for the code.
    I am working on generating a PDF report dynamically. My report contains more than one table, and each table has list entries inside it. When I tried it with one such table it worked perfectly, but once I completed my report and tried to parse it, I got the following error. Can you please help me? I have a demo tomorrow.

    com.itextpdf.text.html.simpleparser.TableWrapper cannot be cast to com.itextpdf.text.TextElementArray

    I am using iText 5.1.1

    • javaxtendsolutions says:

      Hi Kirti,

      Can you please show me the html you are trying to parse and also the code that parses it?

      Andrei.

      • Kirti says:

        Hi Andrei,

        I found the problem in my code. The HTML was not properly formed: one tag was missing, which is why I was getting the error.
        Thanks for the help.

        If anyone else is facing a problem like this, take the whole HTML code, save it as an XML file on your machine and open it in IE. IE shows the missing tags :).

        Kirti

  11. OxxOno says:

    Thanks man... your post is so useful... I appreciate it... regards from Peru

  12. sandesh says:

    Hi,

    Thanks for the post. When I change the font in the HTML (String), the change is not reflected in the PDF. Any suggestions? My problem is that my JSP sends the HTML code as a string to the Java class that writes it out. So if I have to pass the styles from the JSP itself, how would I do that?

    Regards,
    Sandesh.

  13. amrit says:

    Hey buddy, I am facing a problem converting HTML to PDF with the itext-5.1.2 jars
    whenever my HTML contains controls, images and a style sheet (CSS file).
    It gives me the error below:
    Exception in thread "main" java.lang.ClassCastException: com.itextpdf.text.html.simpleparser.CellWrapper
    at com.itextpdf.text.html.simpleparser.HTMLWorker.processLink(HTMLWorker.java:499)
    at com.itextpdf.text.html.simpleparser.HTMLTagProcessors$2.endElement(HTMLTagProcessors.java:152)
    at com.itextpdf.text.html.simpleparser.HTMLWorker.endElement(HTMLWorker.java:231)
    at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.processTag(SimpleXMLParser.java:589)
    at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.go(SimpleXMLParser.java:340)
    at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.parse(SimpleXMLParser.java:607)
    at com.itextpdf.text.html.simpleparser.HTMLWorker.parse(HTMLWorker.java:147)
    at com.Test.main(Test.java:60)

    • javaxtendsolutions says:

      Hey Amrit,

      Can you please send me a sample piece of code of your HTML?

      I think you are using " instead of ' and it breaks because of that. It is just a blind guess; I will have a better answer once I take a look at your code.

      Andrei VISAN.

  14. Fazle Rokib says:

    I was exactly looking for this one. This is the most concise and to the point code to get started on the html to pdf conversion. It helped me a lot. Thank you very much.

  15. Singa says:

    It was really useful and simple to understand. Thanks!

  16. Sashi says:

    Hi,

    I have a requirement where I receive (unstructured) mails that contain HTML content as raw data. I am using the JavaMail API's MimeMessage to instantiate the message. I get the multipart message, and if a part's type is text/html I need to render its content into a PDF. I tried iText 2.1 with HTMLWorker: I am able to render the content, but it overflows the PDF page. I am using org.apache.commons.lang.StringEscapeUtils to convert the MIME part into HTML.

    else if (mimeBodyType.contains("text/html;")) {
    System.out.println("---------HTML");

    HTMLWorker htmlWorker = new HTMLWorker(document);
    String strEscapeHTML = StringEscapeUtils.escapeHtml(innerPart.getContent().toString());
    String strUnEscapeHTML = StringEscapeUtils.unescapeHtml(strEscapeHTML);
    String str2 = "" + (strUnEscapeHTML.toString()) + "\n";
    String strUnEscapeHTML2 = StringEscapeUtils.unescapeHtml(str2);
    System.out.println(strUnEscapeHTML2);
    htmlWorker.parse(new StringReader(strUnEscapeHTML2));
    System.out.println("---------HTML");

    }

  17. hdgirlmd says:

    Hi. I would like to know: if I were to parse HTML, what would I replace it with to have my rendered PDF not display a but actually force a newline?

  18. venkatesan says:

    Hi,
    Many thanks for your post.

    I am facing a problem here... please clarify it for me.

    I am following the same approach you described above, converting an HTML string to PDF.

    The problem is that I assign some fonts (system fonts only: Verdana, Arial) and font sizes to the data, and they are not rendered when converting to PDF.
    The PDF conversion uses its own font and size.

    Likewise, if I assign a width to each table cell (e.g. <td width='30%'... ), that is not reflected in the converted PDF either.

    Please help me...

    regards
    venkatesan.

  19. venki says:

    Please help...

    I am using the code above...
    I have an HTML string that I want to convert to PDF.

    In the HTML, I set the font type (system font only: Verdana) and the font size, but they are not rendered in the converted PDF.

    Likewise, if I specify a width for each cell, that is not rendered either...

    Please help me...

    venkatesan.

  20. chandu18in says:

    Hello Author,

    I am also receiving a casting error:

    encountered [java.lang.ClassCastException] : [com.itextpdf.text.html.simpleparser.TableWrapper at com.itextpdf.text.html.simpleparser.HTMLWorker.processTable(HTMLWorker.java:592)]

    Could you please help me? I am using iText 5.1.2 and my JRE is 1.5.

    • chandu18in says:

      My HTML text is generated dynamically. I have tried saving the HTML stream to a file and opening it in IE etc., and everything works perfectly fine.

  21. chandu18in says:

    If you can send me a test email at chandu18in(at)yahoo.com, I will reply with my code snippet and the HTML file I am trying to convert.

  22. waseem says:

    It works with the code at the top, but I want to add CSS to it. How can I do this? Can anyone help, please?

  23. Sachin says:

    Hi,
    Is there a way to write the complete HTML from a URL stream to a PDF, including images, CSS, etc.? My code is below:
    URL url = new URL("http://localhost:8180/loan");
    com.itextpdf.text.Document pdfDocument = new com.itextpdf.text.Document();
    Reader htmlreader = new BufferedReader(new InputStreamReader(url.openStream()));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    com.itextpdf.text.pdf.PdfWriter.getInstance(pdfDocument, baos);
    pdfDocument.open();
    com.itextpdf.text.html.simpleparser.StyleSheet styles = new com.itextpdf.text.html.simpleparser.StyleSheet();
    // styles.loadTagStyle("body", "font", "Bitstream Vera Sans");
    List arrayElementList = com.itextpdf.text.html.simpleparser.HTMLWorker.parseToList(htmlreader, styles);
    for (int i = 0; i < arrayElementList.size(); ++i)
    {
    com.itextpdf.text.Element e = (com.itextpdf.text.Element) arrayElementList.get(i);
    pdfDocument.add(e);
    }
    pdfDocument.close();
    byte[] bs = baos.toByteArray();
    String pdfBase64 = Base64.encodeBytes(bs);
    File pdfFile = new File("D:/pdfExample.pdf");
    FileOutputStream out = new FileOutputStream(pdfFile);
    out.write(bs);
    out.close();

    With this, the PDF is created, but without the CSS and images.
    Thanks in advance.

    -Sachin

  24. Pingback: Confluence: Shared Development Group Space

  25. Luca says:

    Thank you so much.

  26. Ranjit says:

    I am using the following code to generate a PDF file, but it gives me the following error:

    "ExceptionConverter: com.lowagie.text.DocumentException: The document is open; you can only add Elements with content." Please help me...

    try {

    Document document = new Document(PageSize.A4);
    document.open();
    PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream("D://testpdf.pdf"));

    document.addAuthor("Author of the Doc");
    document.addCreator("Creator of the Doc");
    document.addSubject("Subject of the Doc");
    document.addCreationDate();
    document.addTitle("This is the title");

    //SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    //SAXmyHtmlHandler shh = new SAXmyHtmlHandler(document);

    HTMLWorker htmlWorker = new HTMLWorker(document);
    String str = "HELLLO";
    htmlWorker.parse(new StringReader(str));
    document.close();
    } catch(DocumentException e) {
    e.printStackTrace();
    } catch (FileNotFoundException e) {
    e.printStackTrace();
    } catch (UnsupportedEncodingException e) {
    e.printStackTrace();
    } catch (IOException e) {
    e.printStackTrace();
    }

  27. @nil says:

    Hi everyone,

    I need a little help converting my HTML output into PDF along with CSS and images (not static images; they also include Flash chart images). I save the JSP output into an HTML file and then try to convert it into PDF. I get the CSS styles, but the images do not show in the PDF.

  28. Ankit jain says:

    Hello Author,

    I am also getting a casting error.

    I am using the itextpdf-5.1.0 jar with this example.

    Please help me with this problem.

    ----------------------------------------------------------------------
    java.lang.ClassCastException: com.itextpdf.text.html.simpleparser.TableWrapper cannot be cast to com.itextpdf.text.TextElementArray
    at com.itextpdf.text.html.simpleparser.HTMLWorker.processTable(HTMLWorker.java:588)
    at com.itextpdf.text.html.simpleparser.HTMLTagProcessors$11.endElement(HTMLTagProcessors.java:361)
    at com.itextpdf.text.html.simpleparser.HTMLWorker.endElement(HTMLWorker.java:227)
    at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.processTag(SimpleXMLParser.java:589)
    at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.go(SimpleXMLParser.java:340)
    at com.itextpdf.text.xml.simpleparser.SimpleXMLParser.parse(SimpleXMLParser.java:607)
    at com.itextpdf.text.html.simpleparser.HTMLWorker.parse(HTMLWorker.java:143)
    at test.PdfFormGene.main(PdfFormGene.java:97)

    ----------------------------------------------------------------------
    My program is the following:

    package test;

    import java.io.FileOutputStream;
    import java.io.StringReader;

    import com.itextpdf.text.PageSize;
    import com.itextpdf.text.html.simpleparser.HTMLWorker;
    import com.itextpdf.text.pdf.PdfWriter;

    public class PdfFormGene {
    public static void main(String[] args) {
    try {

    com.itextpdf.text.Document document = new com.itextpdf.text.Document(
    PageSize.A4);
    PdfWriter pdfWriter = PdfWriter.getInstance(document,
    new FileOutputStream("f:/testpdf.pdf"));
    document.open();
    document.addAuthor("Author of the Doc");
    document.addCreator("Creator of the Doc");
    document.addSubject("Subject of the Doc");
    document.addCreationDate();
    document.addTitle("This is the title");

    // SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    // SAXmyHtmlHandler shh = new SAXmyHtmlHandler(document);

    HTMLWorker htmlWorker = new HTMLWorker(document);
    String str = " "+
    " "+
    "Application No. "+
    "NREGS ID "+
    "Date "+
    "Branch "+
    "Habitation "+
    "Village "+
    "Panchayat "+
    "Mandal "+
    "District "+
    "Pincode "+
    "State "+
    "Country "+
    "First NameMiddle NameLast Name "+
    " "+
    " "+
    "Display Name "+
    "Father /Husband Name "+
    "Gender "+
    "Date Of Birth   Age   "+
    " "+
    "Spouse Name "+
    "Address Details:    "+
    "Category Spl Category "+
    " Caste   "+
    "Profession Details:(Tick In appr. Box)1. "+
    "Farmer    2.Vendor     "+
    "3.Labour    4.Other "+
    "Id Proof:(Tick In appr. Box)Ration Card   "+
    "Job/House Hold Card      "+
    "SHG Card     "+
    "Voter ID Card   Other ID    ID Number : "+
    "Assets Details:(Tick In appr. Box) "+
    "LAND : YesNoHOUSE :Yes "+
    "NoNominee Details: "+
    " Name :     Age    "+
    "         RelationShip :   "+
    "    "+
    "Declaration "+
    "I hereby apply for opening of a SBH Biometrics Based Smart Card Acc issue of a Biometric Based Smart Card to me. I declared that the info provided "+
    "by me in this application form is true and correct.I am aware that I hereby apply for opening of a SBH Biometrics Based Smart Card Acc issue "+
    "of a Biometric Based Smart Card to me.I declared that the info provided by me in this application form is true and correct. "+
    "  Date:  LTI/RTI/Signature Of Customer "+
    "Introducer Details:(Tick In appr. Box) "+
    "  BO  FA "+
    "  PS  VA "+
    "  SARP  VS "+
    "  VOP  VOS "+
    "  I know the above customer for_______Years. "+
    "Introducer Signature      "+
    "    Signature Of BC    "+
    "For Bank Use "+
    "Payment Accepted Payment Not Accepted "+
    "A/C No. "+
    "CustomerId "+
    "Date of AuthorizationSGL Code "+
    " "+
    "  Signature Of Bank/Link Br. Official   "+
    " "+
    "---------------------------------------------------------------------- "+
    " "+
    "Acknowledgment To CustomerDate "+
    "Application No. "+
    "NREGS ID "+
    "A/C Holder Name "+
    "Father Name "+
    " ";
    htmlWorker.parse(new StringReader(str));

    document.close();

    } catch (Exception e) {
    e.printStackTrace();
    }

    }

    }

    ----------------------------------------------------------------------

    Please help me.

    Thanks
    Ankit Jain

  29. meeta says:

    Hi, I have a PdfPCell whose content contains HTML tags. How do I parse it and present it in the cell? Sample code below:
    PdfPTable tablePolDet = new PdfPTable(1);
    tablePolDet.setWidthPercentage(105f);
    cell = new PdfPCell(new Phrase(policySearchItem.getSummary(), smallBold));
    cell.setBorder(Rectangle.NO_BORDER);
    htmlWorker.parse(new StringReader(cell)); // giving error
    tablePolDet.addCell(cell);
    document.add(tablePolDet);

    Thanks in advance
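Comment 1 reports that with iText 1.4.5 the text overlapped until it was wrapped in HTML, BODY, TABLE, TR and TD tags. If you hit the same problem, a tiny helper can apply that wrapping to every fragment before parsing. wrapForHtmlWorker is a hypothetical name, and whether the wrapper is needed at all depends on your iText version (the author reports iText 1.5 works without it).

```java
public class HtmlWrapper {

    // Hypothetical helper: wraps an HTML fragment in the
    // html/body/table/tr/td structure described in comment 1.
    public static String wrapForHtmlWorker(String fragment) {
        return "<html><body><table><tr><td>"
                + fragment
                + "</td></tr></table></body></html>";
    }

    public static void main(String[] args) {
        // Prints the fragment inside the wrapper tags.
        System.out.println(wrapForHtmlWorker("<p>Bien Cordialement,</p>"));
    }
}
```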
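Comment 10 (Kirti) traced the "TableWrapper cannot be cast to TextElementArray" error to a missing tag, and suggested opening the HTML as XML in IE to find it. The same check can be done in code. Below is a rough JDK-only sketch, not a real HTML parser: TagBalanceChecker and firstProblem are hypothetical names, the tag-matching regular expression is an approximation, and it assumes every non-void element must be closed explicitly. That is stricter than HTML itself, which lets you omit tags such as </td> or </p>.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagBalanceChecker {

    // Group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closing tag.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");

    // HTML elements that never take a closing tag.
    private static final Set<String> VOID_ELEMENTS = new HashSet<String>(Arrays.asList(
            "br", "hr", "img", "meta", "link", "input", "col", "area", "base", "param"));

    // Returns null if every non-void tag is closed in the right order,
    // otherwise a short description of the first problem found.
    public static String firstProblem(String html) {
        Deque<String> open = new ArrayDeque<String>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = m.group(1).length() > 0;
            boolean selfClosing = m.group(3).length() > 0;
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (VOID_ELEMENTS.contains(name) || selfClosing) {
                continue;
            }
            if (!closing) {
                open.push(name);
            } else if (open.isEmpty()) {
                return "unexpected </" + name + ">";
            } else if (!open.peek().equals(name)) {
                return "expected </" + open.peek() + "> but found </" + name + ">";
            } else {
                open.pop();
            }
        }
        return open.isEmpty() ? null : "unclosed <" + open.peek() + ">";
    }

    public static void main(String[] args) {
        System.out.println(firstProblem("<table><tr><td>x</tr></table>"));
    }
}
```

Running it over the generated HTML before calling htmlWorker.parse(...) points at the first unclosed or mismatched tag.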
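Regarding comment 17 (forcing a newline): in HTML, a newline character inside text is treated as ordinary whitespace, so a line break has to be written as a <br/> tag. Assuming the input is plain text whose line breaks should become line breaks in the PDF, the minimal sketch below escapes the HTML special characters first and then turns each newline into <br/>. TextToHtml and toHtml are hypothetical names.

```java
public class TextToHtml {

    // Escapes characters that are special in HTML, then turns each "\n"
    // into a <br/> tag. Carriage returns are dropped, so "\r\n" yields one break.
    public static String toHtml(String plainText) {
        StringBuilder sb = new StringBuilder(plainText.length() + 16);
        for (int i = 0; i < plainText.length(); i++) {
            char c = plainText.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\r': break;
                case '\n': sb.append("<br/>");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("Bien Cordialement,\r\nADMINISTRATEUR"));
    }
}
```

Escaping comes first so that a literal "<" in the text is not mistaken for the start of a tag.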
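Regarding comment 23 (images missing when the HTML is read from a URL): one common cause is that the page refers to images with relative paths such as images/logo.png. A browser resolves those against the page address, but in that code HTMLWorker only receives the text through a Reader and is never told the page address. Assuming that is the cause, the JDK-only sketch below rewrites each quoted src attribute to an absolute URL before parsing. ImageUrlResolver and absolutizeSrc are hypothetical names, and the regular expression is a rough approximation, not an HTML parser.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageUrlResolver {

    // Rough pattern for src="..." or src='...'; group 3 is the URL, \2 repeats the quote.
    private static final Pattern SRC =
            Pattern.compile("(\\bsrc\\s*=\\s*)([\"'])(.*?)\\2", Pattern.CASE_INSENSITIVE);

    // Rewrites every quoted src attribute to an absolute URL resolved against pageUrl.
    // Already-absolute URLs come back unchanged from URI.resolve; a src containing
    // characters that are illegal in a URI (e.g. spaces) makes URI.resolve throw.
    public static String absolutizeSrc(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        Matcher m = SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String absolute = base.resolve(m.group(3)).toString();
            String replacement = m.group(1) + m.group(2) + absolute + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<img src='images/logo.png'>";
        System.out.println(absolutizeSrc(html, "http://localhost:8180/loan"));
    }
}
```

The rewritten String can then go to the parser through a StringReader instead of the raw URL stream.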
