iTextSharp generates corrupted PDF file

2018-06-16 12:24:45

I am trying to generate a PDF file from an HTML string and external css files and save the PDF to disk. As you can see from this example, I am using very simple html. I can see in IntelliSense that the css files are being read into the cssResolver.

Here is the code I am using:

internal string Create(PdfDocumentDefinition documentDefinition)
{
    MemoryStream output = new MemoryStream();
    MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body>Hello, World!</body></html>"));

    string pathName = @WebConfigurationManager.AppSettings["StagingPath"] + documentDefinition.DocumentName + ".pdf";
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    PdfWriter writer = PdfWriter.GetInstance(document, output);

    using (output)
    {
        using (document)
        {
            document.Open();

            CssResolverPipeline pipeline = SetCssResolver(documentDefinition.CssFiles, document, writer);

            XMLWorker worker = new XMLWorker(pipeline, true);

            XMLParser parser = new XMLParser(worker);
            parser.Parse(input);

            output.Position = 0;
        }

        Byte[] data = output.ToArray();
        File.WriteAllBytes(pathName, data);
    }

    return pathName;
}

private CssResolverPipeline SetCssResolver(List<String> cssFiles, Document document, PdfWriter writer)
{
    var htmlContext = new HtmlPipelineContext(null);
    htmlContext.SetTagFactory(iTextSharp.tool.xml.html.Tags.GetHtmlTagProcessorFactory());
    ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
    if (cssFiles != null)
    {
        foreach (String cssFile in cssFiles)
        {
             //cssResolver.AddCssFile(cssFile, true);
        }
    }

    return new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));            
}

Here is the output as viewed in Notepad++:

2 0 obj
<</Length 117/Filter/FlateDecode>>stream
xœ+ä*ä2Ð³P€á¢t.c 256U0·0R(JåJã
ÄªÊÜÒXÏÔHÁÌBÏÌBÁÐPÏ¢Ø@!¨¤Å)¤ÌÂÐH!$(¬khbè»*€„Ò¸4<RsròuÂó‹rR5C²€Š@JC€ú¼i!*
endstream
endobj
4 0 obj
<</Type/Page/MediaBox[0 0 595 842]/Resources<</Font<</F1 1 0 R>>>>/Contents 2 0 R/Parent 3 0 R>>
endobj
1 0 obj
<</Type/Font/Subtype/Type1/BaseFont/Helvetica/Encoding/WinAnsiEncoding>>
endobj
3 0 obj
<</Type/Pages/Count 1/Kids[4 0 R]>>
endobj
5 0 obj
<</Type/Catalog/Pages 3 0 R>>
endobj
6 0 obj
<</Producer(iTextSharp’ 5.5.7 ©2000-2015 iText Group NV (AGPL-version))/CreationDate(D:20151026102026-05'00')/ModDate(D:20151026102026-05'00')>>
endobj
xref
0 7
0000000000 65535 f 
0000000311 00000 n 
0000000015 00000 n 
0000000399 00000 n 
0000000199 00000 n 
0000000450 00000 n 
0000000495 00000 n 
trailer
<</Size 7/Root 5 0 R/Info 6 0 R/ID [<055082e8139638e35ce08dedae069690><055082e8139638e35ce08dedae069690>]>>
%iText-5.5.7
startxref
657
%%EOF

I've been working on this for about 4 hours now. Can anyone see why it is not generating a valid PDF?

Trying it

I simplified the OP's original code to:

[Test]
public void ResetStreamPositionAtEndOfUsing()
{
    string outputFilePath = @"test-results\misc\resetStreamPosition.pdf";
    Directory.CreateDirectory(@"test-results\misc");

    MemoryStream output = new MemoryStream();

    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    PdfWriter writer = PdfWriter.GetInstance(document, output);

    using (output)
    {
        using (document)
        {
            document.Open();
            document.Add(new Paragraph("Test"));
            output.Position = 0;
        }

        Byte[] data = output.ToArray();
        File.WriteAllBytes(outputFilePath, data);
    }
}

Running it produced an invalid PDF file nearly identical to the one pasted by the OP into the question. In particular the PDF header was missing.
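
A missing header is easy to check for programmatically. The following is a minimal sketch, assuming only that a PDF is expected to begin with the bytes `%PDF-`; the class and method names are hypothetical and not part of iTextSharp.

```csharp
using System;
using System.Text;

static class PdfHeaderCheck
{
    // Hypothetical helper (not an iTextSharp API): returns true when the
    // data begins with the "%PDF-" marker that opens the PDF header line.
    public static bool StartsWithPdfHeader(byte[] data)
    {
        byte[] marker = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < marker.Length)
        {
            return false;
        }
        for (int i = 0; i < marker.Length; i++)
        {
            if (data[i] != marker[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling `PdfHeaderCheck.StartsWithPdfHeader(output.ToArray())` before `File.WriteAllBytes` would flag a result like the one above, which starts with `2 0 obj` instead.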

As recommended by Chris Haas, I then removed the spurious line:

            output.Position = 0;

And indeed, the output PDF is now valid; in particular, it has its header.

Analysis

What happens in the MemoryStream output?

    MemoryStream output = new MemoryStream();

output is created empty.

    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    PdfWriter writer = PdfWriter.GetInstance(document, output);

The new PdfWriter is merely instantiated; nothing has been written yet, so output is still empty.

    using (output)
    {
        using (document)
        {
            document.Open();

document informs writer that document construction has started, so writer begins by writing the PDF prologue, i.e. the header line and a "binary" comment; output now contains %PDF-1.4\n%âãÏÓ\n, with the current stream position at the end.

            document.Add(new Paragraph("Test"));

A new paragraph is added to the current (first) page, but only in memory: the objects constituting the content of the current page are written only when a new page is started or the document is finished. output still contains %PDF-1.4\n%âãÏÓ\n, with the current stream position still at the end.

            output.Position = 0;

The stream position is reset. output still contains %PDF-1.4\n%âãÏÓ\n, but the current stream position is now at the start!

This is the end of the using (document) block, so the Dispose method of document is called. In it, document tells writer that document creation is finished. writer therefore writes all document objects still in memory and then adds the PDF file epilogue (cross references, trailer, ...).

As the stream position is now at the start of the stream, the existing content is overwritten! output now contains 2 0 obj...%%EOF, i.e. the complete PDF minus the PDF prologue.
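
The sequence above can be reproduced without iTextSharp. The sketch below uses a plain MemoryStream and writes stand-in bytes where the writer would: the 15-byte prologue as described, and a made-up placeholder for the rest of the file. The class name, method name and placeholder body are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamOverwriteDemo
{
    // Simulates the writes described above and returns the stream content.
    // resetPosition = true mimics the spurious "output.Position = 0;" line.
    public static byte[] Run(bool resetPosition)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Stand-in for document.Open(): header line plus binary comment.
            byte[] headerLine = Encoding.ASCII.GetBytes("%PDF-1.4\n");
            byte[] binaryComment = { 0x25, 0xE2, 0xE3, 0xCF, 0xD3, 0x0A };
            output.Write(headerLine, 0, headerLine.Length);
            output.Write(binaryComment, 0, binaryComment.Length);

            if (resetPosition)
            {
                // Rewinds the write cursor; the bytes already written stay put.
                output.Position = 0;
            }

            // Stand-in for what the writer emits when the document is closed.
            byte[] rest = Encoding.ASCII.GetBytes("2 0 obj ... %%EOF");
            output.Write(rest, 0, rest.Length);

            return output.ToArray();
        }
    }
}
```

With `Run(true)` the returned bytes begin with `2 0 obj`, because the later write lands on top of the prologue; with `Run(false)` they begin with `%PDF-1.4` and the placeholder body follows it.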

Thanks to mkl's hint I was able to solve this, although it doesn't seem right that it has to be done this way; there must be a better way. The solution was to copy the output into one array to get the first 15 bytes, then close the document and copy the output into a second array to get everything after the first 15 bytes (as far as I can see, the output stream never contains all of the bytes at once), and finally to create a third array and copy the first two into it. Here is the complete code:

internal string Create(PdfDocumentDefinition documentDefinition)
{
    string pathName = @WebConfigurationManager.AppSettings["StagingPath"] + documentDefinition.DocumentName + ".pdf";

    MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(documentDefinition.Source));

    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    MemoryStream output = new MemoryStream();
    using (output)
    { 
        PdfWriter writer = PdfWriter.GetInstance(document, output);
        document.Open();

        CssResolverPipeline pipeline = SetCssResolver(documentDefinition.CssFiles, document, writer);

        XMLWorker worker = new XMLWorker(pipeline, true);

        XMLParser parser = new XMLParser(worker);
        parser.Parse(input);

        output.Position = 0;

        Byte[] firstBytes = output.ToArray();

        document.Close();

        Byte[] lastBytes = output.ToArray();
        Byte[] allBytes = new Byte[firstBytes.Length + lastBytes.Length];

        firstBytes.CopyTo(allBytes, 0);
        lastBytes.CopyTo(allBytes, firstBytes.Length);
        File.WriteAllBytes(pathName, allBytes);
    }

    return pathName;
}

private CssResolverPipeline SetCssResolver(List<String> cssFiles, Document document, PdfWriter writer)
{
    var htmlContext = new HtmlPipelineContext(null);
    htmlContext.SetTagFactory(iTextSharp.tool.xml.html.Tags.GetHtmlTagProcessorFactory());
    ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
    if (cssFiles != null)
    {
        foreach (String cssFile in cssFiles)
        {
            cssResolver.AddCssFile(cssFile, true);
        }
    }
    return new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));            
}
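
For comparison, here is a rough sketch of why the snapshot-and-concatenate approach ends up with the same bytes as simply never touching `output.Position`. It uses a plain MemoryStream with made-up stand-in bytes in place of what iTextSharp writes; the class, method and field names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class WorkaroundComparison
{
    // Stand-ins for the real output: the bytes written on Open() and on Close().
    static readonly byte[] Header = Encoding.ASCII.GetBytes("%PDF-1.4\n");
    static readonly byte[] Body = Encoding.ASCII.GetBytes("2 0 obj ... %%EOF");

    // Mirrors the workaround: rewind, snapshot, let the body overwrite, concatenate.
    public static byte[] SnapshotAndConcatenate()
    {
        using (MemoryStream output = new MemoryStream())
        {
            output.Write(Header, 0, Header.Length);   // stand-in for document.Open()
            output.Position = 0;
            byte[] firstBytes = output.ToArray();     // ToArray ignores Position
            output.Write(Body, 0, Body.Length);       // stand-in for document.Close()
            byte[] lastBytes = output.ToArray();
            return firstBytes.Concat(lastBytes).ToArray();
        }
    }

    // Same writes, but the stream position is never reset.
    public static byte[] LeavePositionAlone()
    {
        using (MemoryStream output = new MemoryStream())
        {
            output.Write(Header, 0, Header.Length);
            output.Write(Body, 0, Body.Length);
            return output.ToArray();
        }
    }
}
```

In this simulation both methods return identical arrays, which suggests the extra arrays only undo the effect of the rewind. The match depends on the second write being at least as long as the snapshot; if more content had already been flushed before the rewind, leftover bytes from the first write would remain in the second array.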

Link: http://www.djcxy.com/p/46758.html
